Bias Protections
Updated: 2-19-2026
AI Voice+
Overview
AI Voice+ implements a 5-layer bias protection stack to ensure equitable treatment across all AI-powered features. These protections are designed to be practical and performant: they run inline without degrading response times or user experience.
5-Layer Bias Protection Stack
Layer 1: Input Classifier
Regex-based injection scanning detects attempts to override safety instructions
PII redaction strips identifying information before it reaches the AI model
Input sanitization removes control characters that could manipulate behavior
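As a rough illustration of how this layer can work, here is a minimal sketch in TypeScript. The actual patterns and function names used by AI Voice+ are not documented here, so the regex list, the function name, and the return shape below are all assumptions.

```ts
// Hypothetical Layer 1 input classifier. Patterns are illustrative only;
// a production list would be far more extensive.
const INJECTION_PATTERNS: RegExp[] = [
  /ignore (all |any )?(previous|prior) instructions/i,
  /you are now (in )?developer mode/i,
  /reveal (your )?(system|hidden) prompt/i,
];

function classifyInput(raw: string): { safe: boolean; sanitized: string } {
  // Strip control characters (keeping tab, newline, carriage return)
  // that could manipulate downstream parsing.
  const sanitized = raw.replace(/[\u0000-\u0008\u000B\u000C\u000E-\u001F\u007F]/g, "");
  const safe = !INJECTION_PATTERNS.some((p) => p.test(sanitized));
  return { safe, sanitized };
}
```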
Layer 2: Policy-Constrained Execution
Location: SAFETY_PREAMBLE in each chat edge function
The system prompt includes explicit anti-bias instructions:
"Treat all users equitably regardless of demographics"
"Do not make assumptions based on names, accents, or other personal attributes"
"If uncertain about a claim, state your uncertainty rather than guessing"
Layer 3: Output Verification
Location: Post-response moderation in agent-one-chat and convo-chat
AI-powered content moderation scans all responses before delivery
Flagged responses are replaced with safe fallback messages
Moderation blocks are logged to ai_usage_logs for audit
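A hedged sketch of this flow is below. The `moderate` and `logUsage` helpers, the fallback wording, and the feature string are assumptions, not the actual agent-one-chat / convo-chat implementation.

```ts
// Assumed helper signatures; the real moderation API is not public.
declare function moderate(text: string): Promise<{ flagged: boolean }>;
declare function logUsage(orgId: string, feature: string): Promise<void>;

const FALLBACK = "I'm sorry, I can't share that response. Could you rephrase?";

async function deliverResponse(orgId: string, text: string): Promise<string> {
  const verdict = await moderate(text); // AI-powered moderation pass
  if (verdict.flagged) {
    // Record the block in ai_usage_logs for later audit.
    await logUsage(orgId, "chat_output_moderation_block");
    return FALLBACK;
  }
  return text;
}
```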
Layer 4: Evidence Enforcement
Location: SAFETY_PREAMBLE professional boundaries section
AI agents are instructed to recommend professional consultation for medical, legal, and financial questions
Agents must indicate when information may be incomplete or uncertain
Agents must not make guarantees or promises on behalf of the business
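Continuing the hypothetical SAFETY_PREAMBLE sketch from Layer 2, the professional-boundaries section might look like this; the wording is assumed, not quoted from the product.

```ts
// Hypothetical professional-boundaries section of the preamble.
const PROFESSIONAL_BOUNDARIES = `
- For medical, legal, or financial questions, recommend consulting a qualified professional.
- Indicate when information may be incomplete or uncertain.
- Never make guarantees or promises on behalf of the business.
`;
```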
Layer 5: Data Minimization
Location: PII redaction + system prompt instructions
PII patterns (credit cards, SSNs, emails, phone numbers, UK NINOs) are redacted on input AND output
System prompt instructs: "Do not echo back personal data shared by the user"
Conversation history is limited to 20 messages to minimize data exposure
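For illustration, a simplified redaction pass and history cap might look like the sketch below. These regexes are deliberately loose; production patterns would need Luhn checks for card numbers, stricter NINO prefix rules, and international phone formats.

```ts
// Illustrative-only PII patterns, applied to both input and output.
const PII_PATTERNS: Array<[RegExp, string]> = [
  [/\b(?:\d[ -]?){13,16}\b/g, "[CARD]"],     // credit card numbers (loose)
  [/\b\d{3}-\d{2}-\d{4}\b/g, "[SSN]"],       // US Social Security numbers
  [/[\w.+-]+@[\w-]+\.[\w.-]+/g, "[EMAIL]"],  // email addresses
  [/\+?\d[\d ().-]{7,14}\d/g, "[PHONE]"],    // phone numbers (loose)
  [/\b[A-Z]{2}\d{6}[A-D]\b/g, "[NINO]"],     // UK National Insurance numbers (simplified)
];

function redactPII(text: string): string {
  return PII_PATTERNS.reduce((acc, [pattern, label]) => acc.replace(pattern, label), text);
}

// Cap conversation history at 20 messages to minimize data exposure.
declare const fullHistory: Array<{ role: string; content: string }>;
const recentHistory = fullHistory.slice(-20);
```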
Multi-Agent Bias Mitigations
For organizations using multiple AI agents (via Agent-to-Agent transfer rules):
Each agent operates within org-scoped isolation (RLS policies)
Transfer rules use keyword/intent matching, not demographic data
Agent skills use a numeric proficiency scale (15-100), not subjective labels
All agents share the same safety preamble and bias protection instructions
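To make the routing model concrete, here is a hypothetical sketch of transfer rules and skills. The field names are assumptions; the point is that routing keys off keywords and intent plus a numeric proficiency, never demographic attributes.

```ts
// Assumed shapes for Agent-to-Agent transfer rules and agent skills.
interface TransferRule {
  keywords: string[];   // e.g. ["refund", "billing"]
  intent?: string;      // optional intent label from an upstream classifier
  targetAgentId: string;
}

interface AgentSkill {
  name: string;
  proficiency: number;  // numeric scale (15-100); no subjective labels
}

// Match purely on message content, never on who the user is.
function matchTransfer(message: string, rules: TransferRule[]): TransferRule | undefined {
  const lower = message.toLowerCase();
  return rules.find((r) => r.keywords.some((k) => lower.includes(k.toLowerCase())));
}
```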
What We Track
| Signal | Where it's logged | Purpose |
| --- | --- | --- |
| Moderation blocks | ai_usage_logs (feature: *_moderation_block) | Track false positive rate |
| Injection detections | ai_usage_logs (feature: injection_detected) | Monitor attack patterns |
| Output moderation blocks | ai_usage_logs (feature: *_output_moderation_block) | Track output safety |
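As a sketch of how such signals could be monitored, the query below counts recent injection detections with a Supabase client. The environment constants and the `created_at` column are assumptions based only on the table name above.

```ts
import { createClient } from "@supabase/supabase-js";

declare const SUPABASE_URL: string;         // assumed env-provided
declare const SUPABASE_SERVICE_KEY: string; // assumed env-provided

const supabase = createClient(SUPABASE_URL, SUPABASE_SERVICE_KEY);

// Count injection detections over the past 7 days.
async function injectionDetectionsLastWeek(): Promise<number | null> {
  const since = new Date(Date.now() - 7 * 86_400_000).toISOString();
  const { count, error } = await supabase
    .from("ai_usage_logs")
    .select("*", { count: "exact", head: true })
    .eq("feature", "injection_detected")
    .gte("created_at", since); // `created_at` column is an assumption
  if (error) throw error;
  return count;
}
```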
Known Limitations
Model-level biases: We use third-party models (Gemini, GPT). We cannot retrain them to remove biases; we mitigate via prompt engineering and output filtering.
Language coverage: PII redaction patterns are optimized for English, US SSN, and UK NINO formats. Other national ID formats may not be caught.
Cultural context: The safety preamble is written in English. Non-English conversations may have reduced bias protection coverage.
No demographic auditing: We do not collect demographic data about users, so we cannot audit for disparate impact across groups.
Future Improvements
Multi-language PII pattern support
Automated bias testing with synthetic personas
Output fairness scoring via secondary model
Configurable sensitivity levels per workspace