Advanced Agent Verifier


⚡Responsible. We Care.



Last updated: 2026-03-12

AI Voice+

Overview

The Agent Verifier is a conceptual security framework that ensures AI agents operating within the AI Voice+ platform are trustworthy, sandboxed, and auditable. This document maps the 18 verifier concepts to our actual implementation.

Implementation Status

✅ Already Implemented

| Verifier Concept | Our Implementation | Code Location |
| --- | --- | --- |
| Agent Identity & Provenance | Org-scoped ai_agents table with unique IDs; API key hashing (SHA-256) for MCP | ai_agents table, mcp-server/index.ts |
| Capability Declaration | agent_one_tools table declares per-workspace tools with type and definition | agent_one_tools table |
| Prompt & Policy Compliance | 10-pattern injection scanning; expanded safety preamble covering bias, fairness, and boundaries | agent-one-chat/index.ts, convo-chat/index.ts |
| Tool & API Sandboxing | Per-request context isolation, org-scoped queries, daily quotas (100 msgs/day) | mcp-server/index.ts (RequestContext class) |
| DLP (Data Leakage Prevention) | 5-pattern PII redaction on both input and output (credit card, SSN, email, phone, UK NINO) | All chat edge functions |
| Audit Logs | ai_usage_logs table tracks moderation blocks, injection detections, and usage | ai_usage_logs table |
| Human-in-the-Loop | Content moderation blocks with safety refusals; fail-closed moderation | moderateContent() in chat functions |
| Self-Restricting Behavior | SAFETY_PREAMBLE includes: "ask for clarification rather than guessing", "may decline tasks outside capabilities" | System prompts |
| Rate Limiting | IP-based (30/15min) plus org-based daily quotas; fail-closed rate limiter | _shared/rate-limit.ts |
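
To make the "fail-closed" behaviour of the rate limiter concrete, here is a minimal sketch. It is not the contents of _shared/rate-limit.ts: the `CounterStore` interface, `checkRateLimitSketch`, and `MemoryStore` are hypothetical names, and the sketch assumes that "30/15min" means 30 requests per IP in a fixed 15-minute window. The point it illustrates is that a failure to read the counter results in a rejection, not an allowance.

```typescript
// Hypothetical sketch; names and shapes are assumptions, not the real module.
interface CounterStore {
  // Increments the counter for (key, window) and returns the new count.
  // May throw if the backing store is unavailable.
  increment(key: string, windowStart: number): number;
}

interface RateLimitResult {
  allowed: boolean;
  reason: "ok" | "limit_exceeded" | "store_unavailable";
}

const WINDOW_MS = 15 * 60 * 1000; // assumed: 15-minute fixed window
const MAX_REQUESTS = 30; // assumed: 30 requests per IP per window

function checkRateLimitSketch(
  store: CounterStore,
  ip: string,
  nowMs: number
): RateLimitResult {
  // Align the timestamp to the start of its window.
  const windowStart = Math.floor(nowMs / WINDOW_MS) * WINDOW_MS;
  try {
    const count = store.increment(ip, windowStart);
    return count <= MAX_REQUESTS
      ? { allowed: true, reason: "ok" }
      : { allowed: false, reason: "limit_exceeded" };
  } catch {
    // Fail closed: if the counter cannot be updated, reject the request.
    return { allowed: false, reason: "store_unavailable" };
  }
}

// In-memory store, for illustration only.
class MemoryStore implements CounterStore {
  private counts = new Map<string, number>();

  increment(key: string, windowStart: number): number {
    const k = `${key}:${windowStart}`;
    const next = (this.counts.get(k) ?? 0) + 1;
    this.counts.set(k, next);
    return next;
  }
}
```

A fail-open limiter would return `allowed: true` in the `catch` branch; the fail-closed choice trades availability for safety when the store is down.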

🔮 Planned (Future Roadmap)

| Verifier Concept | Status | Notes |
| --- | --- | --- |
| Multi-Agent Cross-Verification | Planned | Requires multi-model voting system; would use agent transfer rules |
| Agent Reputation/Trust Scores | Planned | Needs historical behavior data collection over time |
| Behavioral Drift Detection | Planned | Requires baseline behavior collection and comparison |
| Certification Badges | Planned | UI feature showing agent compliance status |
| Version Control & Rollback | Planned | Agent configuration versioning with rollback capability |
| Automated Red-Teaming | Planned | Periodic injection testing against live agents |

Verification Architecture

User Request
         │
         ▼
┌──────────────────┐
│  Rate Limiter    │  ← IP-based, fail-closed
│  (Layer 1)       │
└────────┬─────────┘
         │
         ▼
┌──────────────────┐
│  Input Validator │  ← Length, type, UUID format
│  (Layer 2)       │
└────────┬─────────┘
         │
         ▼
┌──────────────────┐
│  Sanitizer       │  ← Control char stripping
│  (Layer 3)       │
└────────┬─────────┘
         │
         ▼
┌──────────────────┐
│  Injection Scan  │  ← 10 regex patterns, safe-wrapping
│  (Layer 4)       │
└────────┬─────────┘
         │
         ▼
┌──────────────────┐
│  PII Redactor    │  ← 5 PII patterns on input
│  (Layer 5)       │
└────────┬─────────┘
         │
         ▼
┌──────────────────┐
│  Content Mod     │  ← AI gateway, fail-closed
│  (Layer 6)       │
└────────┬─────────┘
         │
         ▼
┌──────────────────┐
│  Auth & Org      │  ← JWT verification, org membership
│  (Layer 7)       │
└────────┬─────────┘
         │
         ▼
┌──────────────────┐
│  Daily Quota     │  ← Per-org message limits
│  (Layer 8)       │
└────────┬─────────┘
         │
         ▼
┌──────────────────┐
│  AI Model Call   │  ← Safety preamble + context
│  (Layer 9)       │
└────────┬─────────┘
         │
         ▼
┌──────────────────┐
│  Output PII      │  ← PII redaction on response
│  Redaction       │
│  (Layer 10)      │
└────────┬─────────┘
         │
         ▼
┌──────────────────┐
│  Output Mod      │  ← Content moderation on response
│  (Layer 11)      │
└────────┬─────────┘
         │
         ▼
  Response to User
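
The layered flow above can be sketched as an ordered list of checks in which each layer either passes (possibly transformed) text to the next layer or blocks the request. This is an illustrative sketch only: `Layer`, `Verdict`, `runPipeline`, the 4000-character limit, and the single injection regex are assumptions, not the platform's code. In particular, the diagram mentions "safe-wrapping" for the injection scan, whereas this sketch simply blocks for brevity.

```typescript
// Hypothetical sketch; layer names, limits, and patterns are assumptions.
type Verdict =
  | { status: "pass"; text: string }
  | { status: "block"; reason: string };

interface Layer {
  name: string;
  check(text: string): Verdict;
}

type PipelineResult =
  | { status: "pass"; text: string }
  | { status: "block"; layer: string; reason: string };

function runPipeline(layers: Layer[], input: string): PipelineResult {
  let text = input;
  for (const layer of layers) {
    let verdict: Verdict;
    try {
      verdict = layer.check(text);
    } catch {
      // Fail closed: a layer that errors is treated as a block.
      verdict = { status: "block", reason: "layer error" };
    }
    if (verdict.status === "block") {
      // The first blocking layer stops the pipeline.
      return { status: "block", layer: layer.name, reason: verdict.reason };
    }
    text = verdict.text; // later layers see the transformed text
  }
  return { status: "pass", text };
}

const MAX_INPUT_LENGTH = 4000; // assumed limit, for illustration

const exampleLayers: Layer[] = [
  {
    name: "input-validator",
    check: (text) =>
      text.length > 0 && text.length <= MAX_INPUT_LENGTH
        ? { status: "pass", text }
        : { status: "block", reason: "invalid length" },
  },
  {
    name: "sanitizer",
    // Strip ASCII control characters, keeping tab, LF, and CR.
    check: (text) => ({
      status: "pass",
      text: text.replace(/[\u0000-\u0008\u000B\u000C\u000E-\u001F\u007F]/g, ""),
    }),
  },
  {
    name: "injection-scan",
    check: (text) =>
      /ignore (all )?previous instructions/i.test(text)
        ? { status: "block", reason: "possible prompt injection" }
        : { status: "pass", text },
  },
];
```

Calling `runPipeline(exampleLayers, userMessage)` returns either the cleaned text or the name of the layer that stopped the request, which is the kind of detail an audit log entry could record.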

MCP Server Security

The MCP (Model Context Protocol) server uses a RequestContext class instead of global state to prevent cross-tenant data leaks during concurrent requests:

- Each request authenticates via a SHA-256-hashed API key.
- Context (Supabase client, org ID, user ID) is stored per request.
- All tool handlers read from the request-scoped context.
- Org-scoped queries prevent data access across tenants.
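
The request-scoped pattern described above can be illustrated with a small sketch. It is an assumption-laden illustration, not the code in mcp-server/index.ts: `ApiKeyRecord`, `HashFn`, `authenticate`, and `listAgents` are hypothetical names, the Supabase client is omitted, and the hash function is injected so the sketch stays dependency-free (the real server is described as using SHA-256).

```typescript
// Hypothetical sketch; the real RequestContext may hold more (e.g. a Supabase client).
interface ApiKeyRecord {
  orgId: string;
  userId: string;
}

class RequestContext {
  readonly orgId: string;
  readonly userId: string;

  constructor(orgId: string, userId: string) {
    this.orgId = orgId;
    this.userId = userId;
  }
}

// Stand-in for a SHA-256 hex digest function; injected to keep the sketch self-contained.
type HashFn = (rawKey: string) => string;

// Look up the hashed key and build a fresh context for this request only.
function authenticate(
  rawKey: string,
  hash: HashFn,
  keysByHash: Map<string, ApiKeyRecord>
): RequestContext | null {
  const record = keysByHash.get(hash(rawKey));
  return record ? new RequestContext(record.orgId, record.userId) : null;
}

interface AgentRow {
  orgId: string;
  name: string;
}

// A tool handler takes the context as an argument instead of reading
// a module-level variable, so concurrent requests cannot see each other's org.
function listAgents(ctx: RequestContext, rows: AgentRow[]): string[] {
  return rows.filter((row) => row.orgId === ctx.orgId).map((row) => row.name);
}
```

Because only the hash of the key is stored and compared, a leaked key table does not directly reveal usable API keys, and because the context is created per call, there is no shared variable for a second in-flight request to overwrite.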

How Existing Safety Layers Map to Verifier Concepts

| Safety Layer | Verifier Concept |
| --- | --- |
| INJECTION_PATTERNS (10 patterns) | Prompt & Policy Compliance |
| PII_PATTERNS (5 patterns) | DLP / Data Leakage Prevention |
| SAFETY_PREAMBLE (bias, boundaries, honesty) | Policy Compliance + Self-Restriction |
| moderateContent() (fail-closed) | Human-in-the-Loop (automated) |
| RequestContext class | Tool Sandboxing |
| ai_usage_logs audit entries | Audit Logs |
| checkRateLimit() (fail-closed) | Rate Limiting |
| BLOCKED_VOICE_PHRASES | DLP for Voice |
| Error masking (generic messages) | Information Disclosure Prevention |
| encrypt_sensitive() / decrypt_sensitive() | Data Protection at Rest |
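
As an illustration of how a pattern list such as PII_PATTERNS can serve as a DLP layer on both input and output, here is a minimal sketch. The regular expressions, the `PII_PATTERNS_SKETCH` name, and the `[REDACTED:…]` placeholder format are all assumptions made for this example; the platform's actual five patterns are not shown in this document and are likely stricter.

```typescript
// Hypothetical sketch; these regexes are simplified stand-ins, not the real PII_PATTERNS.
const PII_PATTERNS_SKETCH: { label: string; pattern: RegExp }[] = [
  { label: "EMAIL", pattern: /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g },
  // 13 to 16 digits, optionally separated by spaces or hyphens.
  { label: "CREDIT_CARD", pattern: /\b\d(?:[ -]?\d){12,15}\b/g },
  { label: "SSN", pattern: /\b\d{3}-\d{2}-\d{4}\b/g },
  // UK National Insurance number in its compact form, e.g. AB123456C.
  { label: "UK_NINO", pattern: /\b[A-Z]{2}\d{6}[A-D]\b/gi },
  // Applied last so it does not swallow card numbers or SSNs.
  { label: "PHONE", pattern: /\+?\d[\d ()-]{7,}\d/g },
];

// Replace every match of every pattern with a labelled placeholder.
function redactPii(text: string): string {
  return PII_PATTERNS_SKETCH.reduce(
    (acc, { label, pattern }) => acc.replace(pattern, `[REDACTED:${label}]`),
    text
  );
}

// The same function is applied on the way in and on the way out.
function redactExchange(userInput: string, modelOutput: string) {
  return { input: redactPii(userInput), output: redactPii(modelOutput) };
}
```

Pattern order matters in this sketch: the broad phone pattern runs last so that more specific identifiers are labelled correctly before it sees the text.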

🚀 What's Next? (see Roadmap)

Give feedback and suggest new ideas for Lisaiceland: future.lisaiceland.com
