Human-in-the-Loop

⚡We Care. Period.

AI Voice+

HITL Protections

Last updated: 2026-03-12

What is Human-in-the-Loop?

Human-in-the-Loop (HITL) is an AI safety pattern where human judgement, consent, or oversight is required at critical points in an automated pipeline. Instead of letting AI operate with unchecked autonomy, HITL ensures that humans remain the final authority on sensitive decisions.

In AI Voice+, HITL does not mean a human reviews every single AI response (that would destroy the real-time experience). Instead, it means the system enforces human checkpoints at the moments that matter most: consent, identity, safety, and audit.

Our 6 HITL Mechanisms

1. Content Moderation Blocks (Fail-Closed)

Every AI chat function (agent-one-chat, convo-chat, chat-with-data) runs user input through moderateContent() before the AI model ever sees it. If the moderation check flags the content, or if the moderation service itself is unreachable, the request is blocked rather than allowed through.

Code: moderateContent() call in each edge function
Logging: Blocks are logged to ai_usage_logs with feature tags agent_one_moderation_block, convo_moderation_block, chat_data_moderation_block
Human element: The user receives a clear safety notice and can rephrase their request
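
The fail-closed behaviour described above can be sketched roughly as follows. This is an illustration only: the real moderateContent() signature is not shown in this document, so the `ModerationOutcome` and `Decision` types, the `decide` and `moderateFailClosed` helpers, and the notice wording are all assumptions.

```typescript
// Hypothetical shapes; the production moderation API may look different.
type ModerationOutcome =
  | { kind: "checked"; flagged: boolean }
  | { kind: "unreachable"; reason: string };

type Decision =
  | { allowed: true }
  | { allowed: false; featureTag: string; notice: string };

// Assumed wording for the safety notice shown to the user.
const SAFETY_NOTICE =
  "Your message could not be processed for safety reasons. Please rephrase and try again.";

// Pure decision step: only a completed check that did not flag the input passes.
// A flagged result and an unreachable service both end in a block.
function decide(outcome: ModerationOutcome, blockTag: string): Decision {
  if (outcome.kind === "checked" && !outcome.flagged) {
    return { allowed: true };
  }
  return { allowed: false, featureTag: blockTag, notice: SAFETY_NOTICE };
}

// Async wrapper: an error thrown by the moderation call is treated as "unreachable".
async function moderateFailClosed(
  input: string,
  moderate: (text: string) => Promise<{ flagged: boolean }>,
  blockTag: string,
): Promise<Decision> {
  let outcome: ModerationOutcome;
  try {
    const result = await moderate(input);
    outcome = { kind: "checked", flagged: result.flagged };
  } catch (err) {
    outcome = { kind: "unreachable", reason: String(err) };
  }
  return decide(outcome, blockTag);
}
```

Keeping the decision in a pure function makes the fail-closed rule easy to test without a live moderation service. The returned `featureTag` would be the value written to ai_usage_logs, for example convo_moderation_block.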

2. Safety Refusals via System Prompt

The SAFETY_PREAMBLE is injected into every AI conversation. It contains non-negotiable instructions that the model must follow, including:

No medical, legal, or financial advice: the AI will politely decline and suggest consulting a professional
Uncertainty disclosure: if the AI is not confident, it says so rather than guessing
Bias protection: equitable treatment regardless of caller demographics (see BiasProtections.md)

Code: SAFETY_PREAMBLE constant in agent-one-chat, convo-chat
Human element: The AI actively redirects users to human experts for sensitive topics
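
One way the preamble injection could look is sketched below. The preamble text here is placeholder wording based on the three rules listed above, not the real SAFETY_PREAMBLE, and `ChatMessage` and `withSafetyPreamble` are hypothetical names. Dropping caller-supplied system messages is an assumption of this sketch, not a documented behaviour.

```typescript
type ChatMessage = { role: "system" | "user" | "assistant"; content: string };

// Placeholder wording; the production SAFETY_PREAMBLE is not reproduced in this document.
const SAFETY_PREAMBLE = [
  "Do not give medical, legal, or financial advice; suggest consulting a professional.",
  "If you are not confident in an answer, say so instead of guessing.",
  "Treat every caller equitably regardless of demographics.",
].join("\n");

// Put the preamble first in every request. Any system messages already present
// in the history are removed (an assumption of this sketch) so that nothing
// supplied with the conversation can sit ahead of or replace the preamble.
function withSafetyPreamble(history: ChatMessage[]): ChatMessage[] {
  const withoutSystem = history.filter((m) => m.role !== "system");
  return [{ role: "system", content: SAFETY_PREAMBLE }, ...withoutSystem];
}
```

Building the message list in one place means each edge function only has to call a single helper before contacting the model.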

3. Call Recording Consent

Before any voice call is recorded, the caller must provide explicit consent. The recordConsent tool captures:

Whether consent was given (consent_given: boolean)
The method of consent (consent_method: string)
Caller identity (name, number)
Timestamp and metadata

If the caller declines, recording does not proceed. Consent records are stored in the call_consents table with full audit trail.

Code: recordConsent tool
Human element: The caller — a real human — has the final say on whether their call is recorded
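
A minimal sketch of the consent gate follows. The field names consent_given and consent_method come from the list above; the remaining field names, the in-memory `callConsents` array standing in for the call_consents table, and the `mayRecordCall` helper are assumptions made for illustration. Storing declined consents alongside granted ones is also an assumption.

```typescript
interface ConsentRecord {
  consent_given: boolean;
  consent_method: string;
  caller_name: string;   // hypothetical column name
  caller_number: string; // hypothetical column name
  recorded_at: string;   // ISO 8601 timestamp, hypothetical column name
}

// In-memory stand-in for the call_consents table.
const callConsents: ConsentRecord[] = [];

// Store the caller's answer, whether they agreed or declined.
function recordConsent(
  consentGiven: boolean,
  consentMethod: string,
  caller: { name: string; number: string },
  now: Date = new Date(),
): ConsentRecord {
  const record: ConsentRecord = {
    consent_given: consentGiven,
    consent_method: consentMethod,
    caller_name: caller.name,
    caller_number: caller.number,
    recorded_at: now.toISOString(),
  };
  callConsents.push(record);
  return record;
}

// Recording may start only when a consent record exists and it says yes.
function mayRecordCall(record: ConsentRecord | undefined): boolean {
  return record !== undefined && record.consent_given === true;
}
```

A missing record is treated the same as a decline, so a call with no consent step never reaches the recorder.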

4. Identity Verification

Before AI agents can perform sensitive actions (accessing account details, making changes), callers must verify their identity through one or more methods:

Security PIN — caller provides their PIN
Date of birth — caller confirms their DOB
Account number — caller provides their account number

The Verify Identity tool checks these against client_records and logs every attempt (successful or not) to the Identity Verifications table.

Code: Verify Identity tool
Human element: The caller must prove who they are before the AI proceeds — no verification, no access
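
The check-and-log behaviour can be illustrated with the sketch below. It assumes a simplified client record with plaintext fields and a plain string comparison; a production system would more likely store hashed values and compare in constant time. The `verifyIdentity` function name, the field names, and the in-memory `identityVerifications` array are hypothetical.

```typescript
type VerificationMethod = "pin" | "dob" | "account_number";

// Simplified stand-in for a row in client_records.
interface ClientRecord {
  pin: string;
  dob: string; // assumed format: YYYY-MM-DD
  accountNumber: string;
}

interface VerificationAttempt {
  clientId: string;
  method: VerificationMethod;
  success: boolean;
  attemptedAt: string;
}

// In-memory stand-in for the Identity Verifications table.
const identityVerifications: VerificationAttempt[] = [];

function expectedValue(record: ClientRecord, method: VerificationMethod): string {
  switch (method) {
    case "pin":
      return record.pin;
    case "dob":
      return record.dob;
    case "account_number":
      return record.accountNumber;
  }
}

// Compare the supplied value with the stored one and log the attempt either way.
function verifyIdentity(
  clientId: string,
  method: VerificationMethod,
  supplied: string,
  clientRecords: Map<string, ClientRecord>,
  now: Date = new Date(),
): boolean {
  const record = clientRecords.get(clientId);
  // An unknown client is a failed attempt, not an error.
  const success = record !== undefined && supplied.trim() === expectedValue(record, method);
  identityVerifications.push({ clientId, method, success, attemptedAt: now.toISOString() });
  return success;
}
```

Because the log entry is written before the function returns, failed and successful attempts leave the same kind of trail.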

5. Injection Safe-Wrapping

When prompt injection attempts are detected (10 regex patterns covering DAN mode, system prompt extraction, role-play jailbreaks, etc.), the system does not silently block them. Instead, it:

Wraps the injection in safety markers so the AI model can see it's been flagged
Logs the detection to ai_usage_logs with feature tag injection_detected
Allows the conversation to continue safely

This approach preserves UX (no mysterious failures) while neutralizing the attack.

Code: Scan For Injection in agent-one-safety.ts and edge functions
Human element: The superadmin can review all injection attempts in the audit log and take action if patterns emerge
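
The detect, wrap, and log steps above might look something like this. The three patterns are illustrative stand-ins for the 10 production patterns, which are not listed in this document. The marker strings, the `scanForInjection` name, and the log shape are also assumptions.

```typescript
// Illustrative patterns only; they are not the production set.
const INJECTION_PATTERNS: RegExp[] = [
  /\bDAN\s+mode\b/i,
  /\b(reveal|show|print|repeat)\b.{0,40}\bsystem\s+prompt\b/i,
  /\bignore\s+(all\s+)?(previous|prior)\s+instructions\b/i,
];

// Hypothetical marker strings.
const OPEN_MARKER = "[FLAGGED_UNTRUSTED_INPUT]";
const CLOSE_MARKER = "[/FLAGGED_UNTRUSTED_INPUT]";

// In-memory stand-in for ai_usage_logs.
const injectionLog: Array<{ feature: string; created_at: string }> = [];

interface ScanResult {
  detected: boolean;
  text: string; // what is passed on to the next pipeline stage
}

function scanForInjection(input: string, now: Date = new Date()): ScanResult {
  const detected = INJECTION_PATTERNS.some((pattern) => pattern.test(input));
  if (!detected) {
    return { detected: false, text: input };
  }
  // Log the detection, then pass the input on wrapped rather than dropping it.
  injectionLog.push({ feature: "injection_detected", created_at: now.toISOString() });
  return { detected: true, text: `${OPEN_MARKER}\n${input}\n${CLOSE_MARKER}` };
}
```

The patterns carry no `g` flag, so `test()` keeps no state between calls. A hardened version would also strip any marker strings that appear inside the user's own text before wrapping.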

6. Output Moderation

AI responses are checked after generation but before delivery to the user. If the output contains flagged content:

The response is replaced with a safety notice
The event is logged to ai_usage_logs with feature tag convo_output_moderation_block
The user is informed that the response was filtered

Code: Output moderation in convo-chat edge function
Human element: Harmful content never reaches the end user; the superadmin can review what was blocked
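
A rough sketch of the replace-and-log step is shown below. The `isFlagged` callback stands in for whatever moderation check convo-chat actually uses. The notice wording, the `deliverOrReplace` name, and the log shape are hypothetical.

```typescript
// In-memory stand-in for ai_usage_logs.
const outputModerationLog: Array<{ feature: string; created_at: string }> = [];

// Assumed wording for the notice the user sees in place of a filtered response.
const FILTERED_NOTICE =
  "This response was filtered for safety. Please try rephrasing your request.";

interface Delivery {
  text: string;      // what the user actually receives
  filtered: boolean; // true when the model output was replaced
}

function deliverOrReplace(
  modelOutput: string,
  isFlagged: (text: string) => boolean,
  now: Date = new Date(),
): Delivery {
  if (!isFlagged(modelOutput)) {
    return { text: modelOutput, filtered: false };
  }
  // The original output is discarded; only the notice is delivered.
  outputModerationLog.push({
    feature: "convo_output_moderation_block",
    created_at: now.toISOString(),
  });
  return { text: FILTERED_NOTICE, filtered: true };
}
```

Returning the `filtered` flag alongside the text lets the caller tell the user that a response was filtered, as described above.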

Request Pipeline — Where HITL Checkpoints Sit

User Input
    │
    ▼
┌─────────────────────┐
│  Rate Limiting       │ ← IP-based, 30 msg / 15 min
└─────────┬───────────┘
          │
          ▼
┌─────────────────────┐
│  Injection Scanning  │ ← 10 regex patterns, safe-wrap
└─────────┬───────────┘
          │
          ▼
┌─────────────────────┐
│  Input Moderation    │ ← moderateContent(), fail-closed  ✦ HITL
└─────────┬───────────┘
          │
          ▼
┌─────────────────────┐
│  PII Redaction       │ ← Email, phone, SSN, NINO patterns
└─────────┬───────────┘
          │
          ▼
┌─────────────────────┐
│  AI Model + Safety   │ ← SAFETY_PREAMBLE enforced        ✦ HITL
│  Preamble            │
└─────────┬───────────┘
          │
          ▼
┌─────────────────────┐
│  Output Moderation   │ ← Response checked before delivery ✦ HITL
└─────────┬───────────┘
          │
          ▼
   Response to User
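
Two stages in the diagram above, rate limiting and PII redaction, have no mechanism section of their own, so a small sketch of each may help. Both are assumptions about how such stages are commonly built: the limiter uses a sliding window per IP with the 30 messages per 15 minutes figure from the diagram, and the redaction patterns are simplified examples rather than the production ones. All function and constant names are hypothetical.

```typescript
const WINDOW_MS = 15 * 60 * 1000;
const MAX_MESSAGES = 30;

// Timestamps of recent messages, keyed by client IP.
const recentByIp = new Map<string, number[]>();

// Sliding-window limiter: allow a message only if fewer than MAX_MESSAGES
// were accepted from this IP during the last WINDOW_MS milliseconds.
function allowMessage(ip: string, nowMs: number): boolean {
  const recent = (recentByIp.get(ip) ?? []).filter((t) => nowMs - t < WINDOW_MS);
  const allowed = recent.length < MAX_MESSAGES;
  if (allowed) {
    recent.push(nowMs);
  }
  recentByIp.set(ip, recent);
  return allowed;
}

// Simplified patterns. Order matters: the SSN and NINO patterns run before
// the broader phone pattern so they get the more specific label.
const PII_PATTERNS: Array<[RegExp, string]> = [
  [/[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g, "[EMAIL]"],
  [/\b\d{3}-\d{2}-\d{4}\b/g, "[SSN]"],
  [/\b[A-CEGHJ-PR-TW-Z]{2}\s?\d{2}\s?\d{2}\s?\d{2}\s?[A-D]\b/gi, "[NINO]"],
  [/\+?\d[\d\s().-]{8,}\d/g, "[PHONE]"],
];

// Replace each match with its label before the text reaches the AI model.
function redactPii(text: string): string {
  return PII_PATTERNS.reduce((acc, [pattern, label]) => acc.replace(pattern, label), text);
}
```

Regex-based redaction of this kind will miss unusual formats and can over-match long digit runs, so it is best read as a first line of defence rather than a guarantee.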

For voice calls, two additional HITL checkpoints apply before the conversation begins:

Incoming Call
    │
    ▼
┌─────────────────────┐
│  Consent Flow        │ ← Caller must agree to recording   ✦ HITL
└─────────┬───────────┘
          │
          ▼
┌─────────────────────┐
│  Identity Verify     │ ← PIN / DOB / Account number       ✦ HITL
└─────────┬───────────┘
          │
          ▼
   AI Agent Conversation

How Users Benefit

| Benefit | How It Works |
| --- | --- |
| Callers are never recorded without consent | The consent flow is mandatory. No consent = no recording. Period. |
| AI cannot give dangerous advice | Medical, legal, and financial advice is refused by the safety preamble. The AI redirects to human professionals. |
| Client PII is protected automatically | Email addresses, phone numbers, SSNs, and NINOs are redacted before they reach the AI model. |
| AI asks instead of guessing | The safety preamble instructs the AI to disclose uncertainty and ask for clarification rather than hallucinating answers. |
| Harmful content is blocked | Both input and output moderation catch inappropriate content before it affects the conversation. |
| Identity theft is prevented | Callers must verify their identity before accessing sensitive account information. |

How Superadmins Benefit

| Benefit | Implementation |
| --- | --- |
| Full audit trail | Every moderation block, injection detection, and safety event is logged to ai_usage_logs with feature tags |
| Attack pattern visibility | Injection attempts are logged (not silently dropped), making patterns visible over time |
| Moderation metrics | Block counts per feature tag enable monitoring of content safety trends |
| Consent compliance | The call_consents table provides a complete record for regulatory compliance |
| Identity verification audit | The Identity Verifications table logs every attempt, including failed ones |
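
As an illustration of the moderation metrics item, block counts per feature tag could be derived from log rows along these lines. The `feature` and `created_at` column names and the `countByFeature` helper are assumptions; the actual ai_usage_logs schema is not described in this document.

```typescript
interface UsageLogRow {
  feature: string;    // e.g. "convo_moderation_block" (assumed column name)
  created_at: string; // ISO 8601 timestamp (assumed column name)
}

// Count log rows per feature tag, ignoring rows older than `sinceIso`.
function countByFeature(rows: UsageLogRow[], sinceIso: string): Record<string, number> {
  const since = Date.parse(sinceIso);
  const counts: Record<string, number> = {};
  for (const row of rows) {
    if (Date.parse(row.created_at) < since) continue;
    counts[row.feature] = (counts[row.feature] ?? 0) + 1;
  }
  return counts;
}
```

Running the same count over successive periods would give the kind of trend view a superadmin needs to spot a rise in injection attempts or moderation blocks.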

What HITL Does NOT Do

| Item | Reason |
| --- | --- |
| Manual review queue | Would add unacceptable latency to real-time conversations. We use automated moderation instead. |
| Human approval before every response | Would destroy the conversational UX. The safety preamble and output moderation provide equivalent protection at machine speed. |
| Human-reviewed training data | We use third-party models (OpenAI, Google). We mitigate via prompts, not training. |
| Escalation to human agents | Currently out of scope. The transfer-to-phone-number feature provides a manual fallback for complex situations. |

Related Documentation

Admin - need-to-know basis

Framework Alignment

Our HITL implementation aligns with:

NIST AI RMF (Govern function): The Govern function covers policies, roles, and responsibilities for human oversight of AI systems. Our consent flows and identity verification support this.
EU AI Act (Article 14): Requires "human oversight measures" for high-risk AI systems. Our fail-closed moderation and consent flows contribute to this.
ISO/IEC 42001 (Clause 6.1.3, AI risk treatment): Requires selecting and implementing treatments for identified AI risks. Our 6 mechanisms map directly to identified risk areas.

These are reference alignments, not certifications. External auditing is recommended for formal compliance.
