Bias Protections

🟢 Smarter AI 🟢

⚡We Care. Period.

Human-in-the-Loop Advanced Agent Verifier AI Safety Guardrails Testing

Updated: -2-19-2026

AI Voice+

Overview

AI Voice+ implements a 5-layer bias protection stack to ensure equitable treatment across all AI-powered features. These protections are designed to be practical and performant — they run inline without degrading response times or user experience.

5-Layer Bias Protection Stack

Layer 1: Input Classifier

Regex-based injection scanning detects attempts to override safety instructions
PII redaction strips identifying information before it reaches the AI model
Input sanitization removes control characters that could manipulate behavior

Layer 2: Policy-Constrained Execution

Location: SAFETY_PREAMBLE in each chat edge function

The system prompt includes explicit anti-bias instructions:

"Treat all users equitably regardless of demographics"
"Do not make assumptions based on names, accents, or other personal attributes"
"If uncertain about a claim, state your uncertainty rather than guessing"

Layer 3: Output Verification

Location: Post-response moderation in agent-one-chat and convo-chat

AI-powered content moderation scans all responses before delivery
Flagged responses are replaced with safe fallback messages
Moderation blocks are logged to ai_usage_logs for audit

Layer 4: Evidence Enforcement

Location: SAFETY_PREAMBLE professional boundaries section

AI agents are instructed to recommend professional consultation for medical, legal, and financial questions
Agents must indicate when information may be incomplete or uncertain
No guarantees or promises on behalf of the business

Layer 5: Data Minimization

Location: PII redaction + system prompt instructions

PII patterns (credit cards, SSNs, emails, phone numbers, UK NINOs) are redacted on input AND output
System prompt instructs: "Do not echo back personal data shared by the user"
Conversation history is limited to 20 messages to minimize data exposure

Multi-Agent Bias Mitigations

For organizations using multiple AI agents (via Agent-to-Agent transfer rules):

Each agent operates within org-scoped isolation (RLS policies)
Transfer rules use keyword/intent matching, not demographic data
Agent skills use a numeric proficiency scale (15-100), not subjective labels
All agents share the same safety preamble and bias protection instructions

What We Track

Metric

Table

Purpose

Moderation blocks

ai_usage_logs (feature: *_moderation_block)

Track false positive rate

Injection detections

ai_usage_logs (feature: injection_detected)

Monitor attack patterns

Output moderation blocks

ai_usage_logs (feature: *_output_moderation_block)

Track output safety

Known Limitations

Model-level biases: We use third-party models (Gemini, GPT). We cannot retrain them to remove biases — we mitigate via prompt engineering and output filtering.
Language coverage: PII redaction patterns are optimized for English, US SSN, and UK NINO formats. Other national ID formats may not be caught.
Cultural context: The safety preamble is written in English. Non-English conversations may have reduced bias protection coverage.
No demographic auditing: We do not collect demographic data about users, so we cannot audit for disparate impact across groups.

Future Improvements

Multi-language PII pattern support
Automated bias testing with synthetic personas
Output fairness scoring via secondary model
Configurable sensitivity levels per workspace

PreviousAI Safety Guardrails NextAdvanced AI Safety

Last updated 16 hours ago

hashtag⚡We Care. Period.

hashtagAI Voice+

hashtagOverview

hashtag5-Layer Bias Protection Stack

hashtagLayer 1: Input Classifier

hashtagLayer 2: Policy-Constrained Execution

hashtagLayer 3: Output Verification

hashtagLayer 4: Evidence Enforcement

hashtagLayer 5: Data Minimization

hashtagMulti-Agent Bias Mitigations

hashtagWhat We Track

hashtagKnown Limitations

hashtagFuture Improvements