
Bias Protections



Updated: 2-19-2026


AI Voice+

Overview

AI Voice+ implements a 5-layer bias protection stack to ensure equitable treatment across all AI-powered features. These protections are designed to be practical and performant: they run inline without degrading response times or user experience.


5-Layer Bias Protection Stack

Layer 1: Input Classifier

  • Regex-based injection scanning detects attempts to override safety instructions

  • PII redaction strips identifying information before it reaches the AI model

  • Input sanitization removes control characters that could manipulate behavior
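
A minimal sketch of the injection scan and sanitization steps in a TypeScript edge function. The identifiers (INJECTION_PATTERNS, sanitize, scanInput) and the specific patterns are illustrative assumptions, not the production code:

```typescript
// Illustrative subset of injection patterns; the production list is broader.
const INJECTION_PATTERNS: RegExp[] = [
  /ignore (all )?(previous|prior) instructions/i,
  /disregard (the )?system prompt/i,
  /you are now in developer mode/i,
];

// Strip ASCII control characters (except \n and \t) that could steer the model.
function sanitize(text: string): string {
  return text.replace(/[\u0000-\u0008\u000B\u000C\u000E-\u001F\u007F]/g, "");
}

// Classify an incoming message before it reaches the model.
function scanInput(raw: string): { text: string; injectionFlagged: boolean } {
  const text = sanitize(raw);
  const injectionFlagged = INJECTION_PATTERNS.some((p) => p.test(text));
  return { text, injectionFlagged };
}
```

PII redaction, the remaining Layer 1 check, is sketched under Layer 5, where the pattern list is enumerated.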

Layer 2: Policy-Constrained Execution

Location: SAFETY_PREAMBLE in each chat edge function

The system prompt includes explicit anti-bias instructions:

  • "Treat all users equitably regardless of demographics"

  • "Do not make assumptions based on names, accents, or other personal attributes"

  • "If uncertain about a claim, state your uncertainty rather than guessing"

Layer 3: Output Verification

Location: Post-response moderation in agent-one-chat and convo-chat

  • AI-powered content moderation scans all responses before delivery

  • Flagged responses are replaced with safe fallback messages

  • Moderation blocks are logged to ai_usage_logs for audit
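
A minimal sketch of the post-response gate. Here moderate and logBlock stand in for the real moderation call and the ai_usage_logs write, and the fallback text is illustrative:

```typescript
const FALLBACK_MESSAGE =
  "I'm unable to share that response. Could you rephrase your question?";

// Gate a model response before it is delivered to the user.
async function verifyOutput(
  response: string,
  moderate: (text: string) => Promise<{ flagged: boolean }>,
  logBlock: () => Promise<void>, // writes a *_output_moderation_block row
): Promise<string> {
  const { flagged } = await moderate(response);
  if (flagged) {
    await logBlock(); // audit trail in ai_usage_logs
    return FALLBACK_MESSAGE; // the user never sees the raw flagged output
  }
  return response;
}
```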

Layer 4: Evidence Enforcement

Location: SAFETY_PREAMBLE professional boundaries section

  • AI agents are instructed to recommend professional consultation for medical, legal, and financial questions

  • Agents must indicate when information may be incomplete or uncertain

  • Agents must not make guarantees or promises on behalf of the business
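
Layer 4 is enforced through prompt text rather than code. Purely as an illustration, the boundaries section might be held as a constant alongside the preamble; this wording is an assumption, not the shipped text:

```typescript
// Assumed wording; the SAFETY_PREAMBLE boundaries section is authoritative.
const PROFESSIONAL_BOUNDARIES = [
  "For medical, legal, or financial questions, recommend consulting a qualified professional.",
  "If your information may be incomplete or uncertain, say so explicitly.",
  "Never make guarantees or promises on behalf of the business.",
].join("\n");
```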

Layer 5: Data Minimization

Location: PII redaction + system prompt instructions

  • PII patterns (credit cards, SSNs, emails, phone numbers, UK NINOs) are redacted on both input and output

  • System prompt instructs: "Do not echo back personal data shared by the user"

  • Conversation history is limited to 20 messages to minimize data exposure
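
A rough sketch of the redaction pass and the history cap. The regexes below are simplified approximations of each format, not the production patterns:

```typescript
// Simplified approximations; production patterns are stricter.
const PII_PATTERNS: Array<[RegExp, string]> = [
  [/\b(?:\d[ -]?){13,16}\b/g, "[CARD]"],        // credit card numbers
  [/\b\d{3}-\d{2}-\d{4}\b/g, "[SSN]"],          // US Social Security numbers
  [/\b[A-Z]{2}\d{6}[A-D]\b/g, "[NINO]"],        // UK National Insurance numbers
  [/\b[\w.+-]+@[\w-]+\.[\w.]+\b/g, "[EMAIL]"],  // email addresses
  [/\b\+?\d[\d ()-]{8,}\d\b/g, "[PHONE]"],      // phone numbers
];

// Applied to both inbound user text and outbound model text.
function redactPII(text: string): string {
  return PII_PATTERNS.reduce((t, [pattern, label]) => t.replace(pattern, label), text);
}

// Cap conversation history at the 20 most recent messages.
const MAX_HISTORY = 20;
function trimHistory<T>(messages: T[]): T[] {
  return messages.slice(-MAX_HISTORY);
}
```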


Multi-Agent Bias Mitigations

For organizations using multiple AI agents (via Agent-to-Agent transfer rules):

  • Each agent operates within org-scoped isolation (RLS policies)

  • Transfer rules use keyword/intent matching, not demographic data

  • Agent skills use a numeric proficiency scale (15-100), not subjective labels

  • All agents share the same safety preamble and bias protection instructions
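
A sketch of what a keyword-based transfer match could look like. TransferRule and matchTransfer are illustrative shapes, not the real schema; the point is that the match key is message content, never user attributes:

```typescript
// Illustrative shape; production rules are org-scoped rows protected by RLS.
interface TransferRule {
  keywords: string[];   // matched against message content, never user attributes
  targetAgentId: string;
}

// Route to the first rule whose keywords appear in the user's message.
function matchTransfer(message: string, rules: TransferRule[]): string | null {
  const lower = message.toLowerCase();
  const rule = rules.find((r) =>
    r.keywords.some((k) => lower.includes(k.toLowerCase())),
  );
  return rule?.targetAgentId ?? null;
}
```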


What We Track

Metric                   | Table                                              | Purpose
Moderation blocks        | ai_usage_logs (feature: *_moderation_block)        | Track false positive rate
Injection detections     | ai_usage_logs (feature: injection_detected)        | Monitor attack patterns
Output moderation blocks | ai_usage_logs (feature: *_output_moderation_block) | Track output safety
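
As a sketch, each of these events could be recorded with a small helper. The column names below are assumptions; only the table name ai_usage_logs and the feature values come from this page:

```typescript
// Column names (org_id, feature, created_at) are assumed, not the confirmed schema.
interface UsageLogWriter {
  insert(table: string, row: Record<string, unknown>): Promise<void>;
}

async function logSafetyEvent(
  db: UsageLogWriter,
  orgId: string,
  feature: string, // e.g. "chat_moderation_block" or "injection_detected"
): Promise<void> {
  await db.insert("ai_usage_logs", {
    org_id: orgId,
    feature,
    created_at: new Date().toISOString(),
  });
}
```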


Known Limitations

  1. Model-level biases: We use third-party models (Gemini, GPT). We cannot retrain them to remove biases; instead we mitigate via prompt engineering and output filtering.

  2. Language coverage: PII redaction patterns are optimized for English, US SSN, and UK NINO formats. Other national ID formats may not be caught.

  3. Cultural context: The safety preamble is written in English. Non-English conversations may have reduced bias protection coverage.

  4. No demographic auditing: We do not collect demographic data about users, so we cannot audit for disparate impact across groups.


Future Improvements

  • Multi-language PII pattern support

  • Automated bias testing with synthetic personas

  • Output fairness scoring via secondary model

  • Configurable sensitivity levels per workspace
