
Advanced Agent Verifier

🟢 Smarter AI 🟢

Responsible. We Care.


Updated: 02-19-2026


AI Voice+

Overview

The Agent Verifier is a conceptual security framework that ensures AI agents operating within the AI Voice+ platform are trustworthy, sandboxed, and auditable. This document maps the 18 verifier concepts to our actual implementation.


Implementation Status

✅ Already Implemented

| Verifier Concept | Our Implementation | Code Location |
| --- | --- | --- |
| Agent Identity & Provenance | Org-scoped ai_agents table with unique IDs, API key hashing (SHA-256) for MCP | ai_agents table, mcp-server/index.ts |
| Capability Declaration | agent_one_tools table declares per-workspace tools with type and definition | agent_one_tools table |
| Prompt & Policy Compliance | 10-pattern injection scanning, expanded safety preamble with bias/fairness/boundaries | agent-one-chat/index.ts, convo-chat/index.ts |
| Tool & API Sandboxing | Per-request context isolation, org-scoped queries, daily quotas (100 msgs/day) | mcp-server/index.ts (RequestContext class) |
| DLP (Data Leakage Prevention) | 5-pattern PII redaction on input AND output (CC, SSN, email, phone, UK NINO) | All chat edge functions |
| Audit Logs | ai_usage_logs table tracks moderation blocks, injection detections, and usage | ai_usage_logs table |
| Human-in-the-Loop | Content moderation blocks with safety refusals; fail-closed moderation | moderateContent() in chat functions |
| Self-Restricting Behavior | SAFETY_PREAMBLE includes: "ask for clarification rather than guessing", "may decline tasks outside capabilities" | System prompts |
| Rate Limiting | IP-based (30/15min) + org-based daily quotas; fail-closed rate limiter | _shared/rate-limit.ts |
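To illustrate the DLP row, here is a minimal sketch of redaction applied to both the user input and the model output. The regular expressions, the `PII_PATTERNS_SKETCH` name, and the `redactPii` and `safeExchange` helpers are assumptions made for illustration; the platform's real PII_PATTERNS are not reproduced in this document and may differ.

```typescript
// Hypothetical patterns for illustration only; not the platform's PII_PATTERNS.
interface PiiPattern {
  label: string;
  regex: RegExp;
}

// Order matters: more specific patterns run before the broad phone pattern.
const PII_PATTERNS_SKETCH: PiiPattern[] = [
  { label: "EMAIL", regex: /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g },
  { label: "CC", regex: /\b\d(?:[ -]?\d){12,15}\b/g },
  { label: "SSN", regex: /\b\d{3}-\d{2}-\d{4}\b/g },
  { label: "UK_NINO", regex: /\b[A-Z]{2} ?\d{2} ?\d{2} ?\d{2} ?[A-D]\b/gi },
  { label: "PHONE", regex: /\+?\d[\d ().-]{7,}\d/g },
];

// Replace every match of every pattern with a labelled placeholder.
function redactPii(text: string): string {
  return PII_PATTERNS_SKETCH.reduce(
    (acc, { label, regex }) => acc.replace(regex, `[REDACTED_${label}]`),
    text,
  );
}

// Redact on the way in AND on the way out, so PII that the model
// produces on its own is also masked before it reaches the caller.
function safeExchange(
  userInput: string,
  model: (prompt: string) => string,
): string {
  const cleanInput = redactPii(userInput);
  const rawOutput = model(cleanInput);
  return redactPii(rawOutput);
}
```

Running the same function on both sides keeps the input and output rules from drifting apart. Simple regular expressions like these will produce false positives and misses, so a production pattern set would need tuning against real traffic.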

🔮 Planned (Future Roadmap)

| Verifier Concept | Status | Notes |
| --- | --- | --- |
| Multi-Agent Cross-Verification | Planned | Requires multi-model voting system; would use agent transfer rules |
| Agent Reputation/Trust Scores | Planned | Needs historical behavior data collection over time |
| Behavioral Drift Detection | Planned | Requires baseline behavior collection and comparison |
| Certification Badges | Planned | UI feature showing agent compliance status |
| Version Control & Rollback | Planned | Agent configuration versioning with rollback capability |
| Automated Red-Teaming | Planned | Periodic injection testing against live agents |


Verification Architecture


MCP Server Security

The MCP (Model Context Protocol) server uses a RequestContext class instead of global state to prevent cross-tenant data leaks during concurrent requests:

  • Each request authenticates via SHA-256 hashed API key

  • Context (Supabase client, org ID, user ID) is stored per-request

  • All tool handlers read from the request-scoped context

  • Org-scoped queries prevent data access across tenants
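The points above can be sketched roughly as follows. This is an illustration under stated assumptions, not the contents of mcp-server/index.ts: the `hashApiKey`, `authenticate`, and `listAgentsTool` helpers, the field names, and the in-memory key store are hypothetical, and the Supabase client that the real context carries is replaced by a plain array so the example stays self-contained.

```typescript
import { createHash } from "node:crypto";

// Only the SHA-256 digest of an API key is stored or compared.
function hashApiKey(rawKey: string): string {
  return createHash("sha256").update(rawKey, "utf8").digest("hex");
}

interface KeyRecord {
  orgId: string;
  userId: string;
}

// Hypothetical request-scoped context: created per request, never shared.
class RequestContext {
  readonly orgId: string;
  readonly userId: string;

  constructor(orgId: string, userId: string) {
    this.orgId = orgId;
    this.userId = userId;
  }

  // Org-scoped filter standing in for an org-scoped database query.
  scope<T extends { orgId: string }>(rows: T[]): T[] {
    return rows.filter((row) => row.orgId === this.orgId);
  }
}

// Look up the hashed key; return a fresh context or null when unknown.
function authenticate(
  rawKey: string,
  keyStore: Map<string, KeyRecord>,
): RequestContext | null {
  const record = keyStore.get(hashApiKey(rawKey));
  return record ? new RequestContext(record.orgId, record.userId) : null;
}

// A tool handler receives the context as an argument instead of
// reading module-level state, so concurrent requests cannot collide.
function listAgentsTool(
  ctx: RequestContext,
  allAgents: { id: string; orgId: string }[],
): string[] {
  return ctx.scope(allAgents).map((agent) => agent.id);
}
```

Because each handler takes the context as a parameter, two requests from different organizations that interleave on the same server instance each see only their own `orgId`, which is the property the per-request design is meant to guarantee.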


How Existing Safety Layers Map to Verifier Concepts

| Safety Layer | Verifier Concept |
| --- | --- |
| INJECTION_PATTERNS (10 patterns) | Prompt & Policy Compliance |
| PII_PATTERNS (5 patterns) | DLP / Data Leakage Prevention |
| SAFETY_PREAMBLE (bias, boundaries, honesty) | Policy Compliance + Self-Restriction |
| moderateContent() (fail-closed) | Human-in-the-Loop (automated) |
| RequestContext class | Tool Sandboxing |
| ai_usage_logs audit entries | Audit Logs |
| checkRateLimit() (fail-closed) | Rate Limiting |
| BLOCKED_VOICE_PHRASES | DLP for Voice |
| Error masking (generic messages) | Information Disclosure Prevention |
| encrypt_sensitive() / decrypt_sensitive() | Data Protection at Rest |
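Several layers above are described as fail-closed, meaning that when the safety check itself breaks, the request is denied rather than allowed through. The sketch below shows that behavior for a fixed-window rate limiter using the 30 requests per 15 minutes figure from this page. The `CounterStore` interface, `MemoryCounterStore`, and `checkRateLimitSketch` are hypothetical names for illustration; the real checkRateLimit() in _shared/rate-limit.ts is not shown here and would most likely be asynchronous and backed by a shared store.

```typescript
// Hypothetical counter store; a real one would be a shared database or cache.
interface CounterStore {
  // Increment and return the hit count for `key` in the current window. May throw.
  increment(key: string, windowMs: number, now: number): number;
}

class MemoryCounterStore implements CounterStore {
  private windows = new Map<string, { start: number; count: number }>();

  increment(key: string, windowMs: number, now: number): number {
    const current = this.windows.get(key);
    // Start a new window when none exists or the old one has expired.
    if (!current || now - current.start >= windowMs) {
      this.windows.set(key, { start: now, count: 1 });
      return 1;
    }
    current.count += 1;
    return current.count;
  }
}

interface RateLimitResult {
  allowed: boolean;
  reason: "ok" | "limit_exceeded" | "limiter_unavailable";
}

function checkRateLimitSketch(
  store: CounterStore,
  key: string,
  limit: number,
  windowMs: number,
  now: number = Date.now(),
): RateLimitResult {
  try {
    const count = store.increment(key, windowMs, now);
    return count <= limit
      ? { allowed: true, reason: "ok" }
      : { allowed: false, reason: "limit_exceeded" };
  } catch {
    // Fail closed: if the limiter cannot answer, deny the request.
    return { allowed: false, reason: "limiter_unavailable" };
  }
}

// Values taken from the table above: 30 requests per 15 minutes per IP.
const IP_LIMIT = 30;
const IP_WINDOW_MS = 15 * 60 * 1000;
```

The trade-off of failing closed is availability: an outage in the counter store blocks legitimate traffic, which is usually preferable to silently removing the limit for an AI endpoint that incurs per-request cost.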


🚀 What's Next? (see Roadmap)
