Page cover

fingerprintHuman-in-the-Loop

🟒 Smarter AI 🟒

⚑We Care. Period.

fingerprintHuman-in-the-Loopchevron-rightarrows-to-circleAdvanced Agent Verifierchevron-rightshield-checkAI Safety Guardrailschevron-rightmicroscopeTestingchevron-right

AI Voice+

HITL Protections

Tests Vitestarrow-up-right Languages BYOK Providers

Updated: 02-19-2026


What is Human-in-the-Loop?

Human-in-the-Loop (HITL) is an AI safety pattern where human judgement, consent, or oversight is required at critical points in an automated pipeline. Instead of letting AI operate with unchecked autonomy, HITL ensures that humans remain the final authority on sensitive decisions.

In AI Voice+, HITL does not mean a human reviews every single AI response (that would destroy the real-time experience). Instead, it means the system enforces human checkpoints at the moments that matter most: consent, identity, safety, and audit.


Our 6 HITL Mechanisms

1. Content Moderation Blocks (Fail-Closed)

Every AI chat function (agent-one-chat, convo-chat, chat-with-data) runs user input through moderate functions before the AI model ever sees it. If the moderation check flags the content or if the moderation service itself is unreachable, the request is blocked β€” not allowed through.

  • Code: in each edge function

  • Logging: Blocks are logged to ai_usage_logs with feature tags agent_one_moderation_block, convo_moderation_block, chat_data_moderation_block

  • Human element: The user receives a clear safety notice and can rephrase their request

2. Safety Refusals via System Prompt

The SAFETY_PREAMBLE is injected into every AI conversation. It contains non-negotiable instructions that the model must follow, including:

  • No medical, legal, or financial advice β€” the AI will politely decline and suggest consulting a professional

  • Uncertainty disclosure β€” if the AI is not confident, it says so rather than guessing

  • Bias protection β€” equitable treatment regardless of caller demographics (see BiasProtections.md)

  • Code: SAFETY_PREAMBLE constant in agent-one-chat, convo-chat

  • Human element: The AI actively redirects users to human experts for sensitive topics

Before any voice call is recorded, the caller must provide explicit consent. The recordConsent tool captures:

  • Whether consent was given (consent_given: boolean)

  • The method of consent (consent_method: string)

  • Caller identity (name, number)

  • Timestamp and metadata

If the caller declines, recording does not proceed. Consent records are stored in the call_consents table with full audit trail.

  • Code: Record Consent tool

  • Human element: The caller β€” a real human β€” has the final say on whether their call is recorded

4. Identity Verification

Before AI agents can perform sensitive actions (accessing account details, making changes), callers must verify their identity through one or more methods:

  • Security PIN β€” caller provides their PIN

  • Date of birth β€” caller confirms their DOB

  • Account number β€” caller provides their account number

The Verify Identity tool checks these against client_records and logs every attempt (successful or not) to the Identity Verifications table.

  • Code: Verify Identity tool

  • Human element: The caller must prove who they are before the AI proceeds β€” no verification, no access

5. Injection Safe-Wrapping

When prompt injection attempts are detected (10 regex patterns covering DAN mode, system prompt extraction, role-play jailbreaks, etc.), the system does not silently block them. Instead, it:

  1. Wraps the injection in safety markers so the AI model can see it's been flagged

  2. Logs the detection to ai_usage_logs with feature tag injection_detected

  3. Allows the conversation to continue safely

This approach preserves UX (no mysterious failures) while neutralizing the attack.

  • Code: Scan For Injection in agent-one-safety.ts and edge functions

  • Human element: The superadmin can review all injection attempts in the audit log and take action if patterns emerge

6. Output Moderation

AI responses are checked after generation but before delivery to the user. If the output contains flagged content:

  • The response is replaced with a safety notice

  • The event is logged to ai_usage_logs with feature tag convo_output_moderation_block

  • The user is informed that the response was filtered

  • Code: Output moderation in convo-chat edge function

  • Human element: Harmful content never reaches the end user; the superadmin can review what was blocked


Request Pipeline β€” Where HITL Checkpoints Sit

For voice calls, two additional HITL checkpoints apply before the conversation begins:


How Users Benefit

Benefit
How It Works

Callers are never recorded without consent

The consent flow is mandatory. No consent = no recording. Period.

AI cannot give dangerous advice

Medical, legal, and financial advice is refused by the safety preamble. The AI redirects to human professionals.

Client PII is protected automatically

Email addresses, phone numbers, SSNs, and NINOs are redacted before they reach the AI model.

AI asks instead of guessing

The safety preamble instructs the AI to disclose uncertainty and ask for clarification rather than hallucinating answers.

Harmful content is blocked

Both input and output moderation catch inappropriate content before it affects the conversation.

Identity theft is prevented

Callers must verify their identity before accessing sensitive account information.


How Superadmins Benefit

Benefit
Implementation

Full audit trail

Every moderation block, injection detection, and safety event is logged to ai_usage_logs with feature tags

Attack pattern visibility

Injection attempts are logged (not silently dropped), making patterns visible over time

Moderation metrics

Block counts per feature tag enable monitoring of content safety trends

Consent compliance

The Call Consents table provides a complete record for regulatory compliance

Identity verification audit

The Identity Verifications table logs every attempt, including failed ones


What HITL Does NOT Do

Item
Reason

Manual review queue

Would add unacceptable latency to real-time conversations. We use automated moderation instead.

Human approval before every response

Would destroy the conversational UX. The safety preamble and output moderation provide equivalent protection at machine speed.

Human-reviewed training data

We use third-party models (OpenAI, Google). We mitigate via prompts, not training.

Escalation to human agents

Currently out of scope. The transfer-to-phone-number feature provides a manual fallback for complex situations.


  • Admin - need-to-know basis


Framework Alignment

Our HITL implementation aligns with:

  • NIST AI RMF (Govern function): Human oversight is a core requirement of the Govern function. Our consent flows and identity verification satisfy this.

  • EU AI Act (Article 14): Requires "human oversight measures" for AI systems. Our fail-closed moderation and consent flows provide this.

  • ISO/IEC 42001 (Section 6.1.3): Requires identification of AI risks and human intervention points. Our 6 mechanisms map directly to identified risk areas.


These are reference alignments, not certifications. External auditing is recommended for formal compliance.


Last updated