Security & Trust Model

This document describes the unified security and trust model across the full Mnemom stack: the Agent Alignment Protocol (AAP), the Agent Integrity Protocol (AIP), and smoltbot — the gateway that ties them together. It covers what data is stored, how authentication works, how data is encrypted, and what threats the system does and does not defend against.

Trust Model Overview

Mnemom provides transparency, not guarantees. The two protocols serve complementary roles:

AAP provides post-hoc verification. It records agent decisions as AP-Traces, detects behavioral drift, and verifies traces against declared Alignment Cards. AAP operates after the action has been taken — typically 30-60 seconds later.
AIP provides real-time detection. It analyzes the agent’s thinking blocks between turns, producing integrity verdicts before the next action executes. AIP operates in the ~2-second window between agent reasoning and agent action.

Together they provide full-coverage observability: AIP catches problems as the agent thinks; AAP catches problems after the agent acts. The cross-protocol linkage (`IntegrityCheckpoint.linked_trace_id` references `APTrace.trace_id`) creates a complete audit chain from reasoning to decision.
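As an illustration of the linkage, the following minimal sketch groups checkpoints under the trace they reference. Only `trace_id`, `linked_trace_id`, `checkpoint_id`, and the verdict values come from this document; the `action` field, the sample IDs, and the `build_audit_chain` helper are hypothetical and are not part of the AAP or AIP schemas.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class APTrace:
    trace_id: str
    action: str  # illustrative field, not taken from the AAP schema


@dataclass
class IntegrityCheckpoint:
    checkpoint_id: str
    verdict: str  # "clear", "review_needed", or "boundary_violation"
    linked_trace_id: Optional[str] = None


def build_audit_chain(traces, checkpoints):
    """Group checkpoints under the trace whose trace_id they link to."""
    chain = {t.trace_id: {"trace": t, "checkpoints": []} for t in traces}
    unlinked = []
    for cp in checkpoints:
        entry = chain.get(cp.linked_trace_id)
        if entry is None:
            # No matching trace (yet): keep the checkpoint visible, do not drop it.
            unlinked.append(cp)
        else:
            entry["checkpoints"].append(cp)
    return chain, unlinked


traces = [APTrace("tr-1", "send_email")]
checkpoints = [
    IntegrityCheckpoint("ic-a", "clear", linked_trace_id="tr-1"),
    IntegrityCheckpoint("ic-b", "review_needed"),
]
chain, unlinked = build_audit_chain(traces, checkpoints)
```

Because AAP traces typically arrive later than AIP checkpoints, a consumer built this way would likely see checkpoints sit in `unlinked` for a short time before the matching trace appears.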

AAP is a transparency protocol, not a trust protocol. A verified trace means the agent’s declared behavior is internally consistent — it does not mean the agent is trustworthy, safe, or aligned with human values. Verification checks consistency, not correctness.

smoltbot is the operational layer that runs both protocols. It is a Cloudflare Workers gateway that intercepts LLM API traffic, extracts thinking blocks for AIP analysis, generates AP-Traces for AAP verification, and applies a configurable enforcement mode based on the combined results.

Data Handling

What Is Stored

Data Type	Description	Storage Location
AP-Traces	Structured decision records: what action the agent took, which values were applied, what alternatives were considered, and whether escalation was evaluated	Supabase Postgres
Integrity Checkpoints	Verdict (`clear`, `review_needed`, `boundary_violation`), concern categories, severity, reasoning summary, and a SHA-256 hash of the thinking block	Supabase Postgres
Alignment Cards	Agent identity declarations: values, autonomy envelope, escalation triggers, audit commitment	Supabase Postgres
Drift Alerts	Alerts generated when behavioral similarity drops below threshold over sustained turns (AAP) or when integrity patterns degrade within a session window (AIP)	Supabase Postgres
Enforcement Records	Records of enforcement actions taken (observe, nudge, enforce) in response to violations	Supabase Postgres

What Is NOT Stored

Data Type	Why Not
Prompts	User prompts are not persisted by the Mnemom infrastructure. They pass through the gateway transiently.
Responses	LLM responses are not persisted. Only the structured trace and checkpoint are retained.
Thinking block content	The full thinking block text is never stored. Only its SHA-256 hash (`thinking_block_hash`) is retained in the Integrity Checkpoint for content-addressed reference.
API keys	API keys are never stored in plaintext. Only SHA-256 hashes are stored, used for agent identification and billing attribution.

The SHA-256 hash of the thinking block serves two purposes: it provides tamper evidence (proving that a specific thinking block was analyzed) and it enables correlation (linking a checkpoint back to the exact content that was evaluated) — all without retaining the agent’s private reasoning.
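A minimal sketch of how such a hash could be computed and later checked by someone who still holds the original thinking block. The UTF-8 encoding and lowercase hex digest are assumptions; this document does not specify how `thinking_block_hash` is serialized, and the function names are hypothetical.

```python
import hashlib
import hmac


def thinking_block_hash(thinking_block: str) -> str:
    # Assumption: the hash is taken over the UTF-8 bytes and stored as hex.
    return hashlib.sha256(thinking_block.encode("utf-8")).hexdigest()


def matches_checkpoint(thinking_block: str, stored_hash: str) -> bool:
    # Constant-time comparison of the recomputed hash with the stored one.
    return hmac.compare_digest(thinking_block_hash(thinking_block), stored_hash)
```

The hash cannot be reversed to recover the reasoning, so correlation only works for a party that kept its own copy of the content.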

Authentication & Authorization

The Mnemom API supports three authentication patterns, each serving a different use case.

Bearer Token (Supabase JWT)

Used for authenticated user operations — dashboard access, card management, agent configuration.

Header	Value
`Authorization`	`Bearer <supabase-jwt>`

The JWT is issued by Supabase Auth and contains the user’s identity, organization membership, and role. Tokens are validated on every request.
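For illustration, a request carrying the bearer token might be built as follows. The base URL is a placeholder, and whether a given endpoint accepts bearer tokens rather than API keys is an assumption; only the header format comes from the table above.

```python
import urllib.request

API_BASE = "https://api.example.invalid"  # hypothetical base URL


def authenticated_request(path: str, supabase_jwt: str) -> urllib.request.Request:
    """Build (but do not send) a GET request with the bearer header set."""
    return urllib.request.Request(
        API_BASE + path,
        headers={"Authorization": f"Bearer {supabase_jwt}"},
        method="GET",
    )


req = authenticated_request("/agents/agent-123/traces", "<supabase-jwt>")
```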

API Key (`x-api-key` header)

Used for programmatic access — billing API calls, automated agent registration, CI/CD integrations.

Header	Value
`x-api-key`	`<api-key>`

API keys are scoped to a user or organization. The key itself is never stored; only its SHA-256 hash is persisted for lookup and validation.
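The hash-only storage pattern described above can be sketched as follows. The `ApiKeyStore` class, its in-memory dictionary, and the key format are hypothetical stand-ins, not the actual Mnemom key table or issuance logic.

```python
import hashlib
import secrets
from typing import Optional


def hash_api_key(api_key: str) -> str:
    return hashlib.sha256(api_key.encode("utf-8")).hexdigest()


class ApiKeyStore:
    """In-memory stand-in for a key table that maps key hash to owner."""

    def __init__(self) -> None:
        self._owner_by_hash: dict[str, str] = {}

    def issue(self, owner_id: str) -> str:
        api_key = secrets.token_urlsafe(32)
        # Persist only the hash; the plaintext key is returned once and dropped.
        self._owner_by_hash[hash_api_key(api_key)] = owner_id
        return api_key

    def authenticate(self, presented_key: str) -> Optional[str]:
        # Hash the presented key and look it up; None means unknown key.
        return self._owner_by_hash.get(hash_api_key(presented_key))


store = ApiKeyStore()
key = store.issue("org-1")
```

A consequence of this design is that a lost key cannot be recovered from storage; it can only be revoked and reissued.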

Service Role Key

Used for internal and administrative operations — database migrations, system maintenance, admin export endpoints. Service role access bypasses row-level security.

Service role keys must never be exposed to client-side code or included in agent configurations. They are intended exclusively for server-side administrative operations.

Organization RBAC

Resources are scoped to organizations. Each organization member holds one of three roles:

Role	Capabilities
Owner	Full access. Can delete the organization, manage billing, transfer ownership.
Admin	Can manage members, invite users, configure agents, manage API keys, view billing.
Member	Can view agents, traces, and checkpoints within the organization. Cannot modify configuration or manage members.

Role-based access is enforced at the API layer via Supabase row-level security policies. Organization-scoped API keys inherit the permissions of the role they were created under.
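The role table can be read as nested capability sets, sketched below. The capability strings and the `can` helper are hypothetical; in the real system the checks live in Supabase row-level security policies rather than in application code like this.

```python
from enum import Enum


class Role(Enum):
    OWNER = "owner"
    ADMIN = "admin"
    MEMBER = "member"


# Capability names are illustrative labels derived from the role table.
MEMBER_CAPS = {"view_agents", "view_traces", "view_checkpoints"}
ADMIN_CAPS = MEMBER_CAPS | {
    "manage_members", "invite_users", "configure_agents",
    "manage_api_keys", "view_billing",
}
OWNER_CAPS = ADMIN_CAPS | {
    "delete_organization", "manage_billing", "transfer_ownership",
}

CAPABILITIES = {
    Role.MEMBER: MEMBER_CAPS,
    Role.ADMIN: ADMIN_CAPS,
    Role.OWNER: OWNER_CAPS,
}


def can(role: Role, capability: str) -> bool:
    return capability in CAPABILITIES[role]
```

Under this reading, an organization-scoped API key created by an Admin would be checked against `ADMIN_CAPS`, matching the inheritance rule described above.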

Encryption

In Transit

All endpoints enforce TLS 1.2+. This applies to:

- The smoltbot gateway (Cloudflare Workers edge network)
- The Mnemom API (Supabase edge functions)
- Webhook signal delivery (AIP POST to registered endpoints, authenticated with HMAC-SHA256)
- Cross-protocol communication between AAP and AIP components

Cloudflare’s edge network terminates TLS and re-encrypts traffic to origin servers. Certificate management is handled automatically.

At Rest

Database storage uses Supabase Postgres with encryption at rest. This covers all persisted data: AP-Traces, Integrity Checkpoints, Alignment Cards, drift alerts, enforcement records, and user/organization data.

Thinking Block Privacy

Thinking blocks receive special treatment. The full thinking block content is never written to any persistent storage — not in the database, not in logs, not in object storage. The analysis flow is:

1. The thinking block is extracted from the LLM response (in the gateway or SDK).
2. The thinking block is sent to the analysis LLM for evaluation.
3. The analysis result is captured as an Integrity Checkpoint.
4. The thinking block content is discarded.
5. Only the SHA-256 hash is retained in the checkpoint.

There is one deliberate exception: the `IntegrityConcern.evidence` field may contain a short excerpt (recommended maximum 200 characters) from the thinking block to support human review of flagged concerns. This is an intentional tradeoff between privacy and reviewability. Implementations may configure evidence redaction for maximum privacy.
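The flow and the evidence limit can be sketched together as follows. This is a simplified illustration, not the smoltbot implementation: the `analyze` callable stands in for the analysis LLM, and the dictionary layout, the `redact_evidence` flag, and the `prompt_injection` category name are assumptions.

```python
import hashlib
import uuid
from typing import Callable

MAX_EVIDENCE_CHARS = 200  # recommended maximum from the text above


def run_integrity_check(
    thinking_block: str,
    analyze: Callable[[str], dict],
    redact_evidence: bool = False,
) -> dict:
    """Analyze a thinking block and return a checkpoint without its content."""
    result = analyze(thinking_block)
    concerns = []
    for concern in result.get("concerns", []):
        excerpt = (concern.get("evidence") or "")[:MAX_EVIDENCE_CHARS]
        concerns.append({
            "category": concern["category"],
            "evidence": None if redact_evidence else excerpt,
        })
    # Only the hash of the thinking block goes into the checkpoint.
    return {
        "checkpoint_id": f"ic-{uuid.uuid4()}",
        "verdict": result["verdict"],
        "concerns": concerns,
        "thinking_block_hash": hashlib.sha256(
            thinking_block.encode("utf-8")
        ).hexdigest(),
    }


def stub_analyzer(text: str) -> dict:
    # Stand-in for the analysis LLM; echoes the input as evidence.
    return {
        "verdict": "review_needed",
        "concerns": [{"category": "prompt_injection", "evidence": text}],
    }


checkpoint = run_integrity_check("x" * 500, stub_analyzer)
redacted = run_integrity_check("x" * 500, stub_analyzer, redact_evidence=True)
```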

Threat Model

The Mnemom stack is designed with explicit awareness of what it can and cannot defend against. The following threat categories are addressed.

Adversarial Agents

An agent can produce perfect AP-Traces and pass every integrity checkpoint while acting against its principal’s interests. If the agent controls what it logs and reasons in ways not captured by the thinking block, the protocols provide visibility into what the agent declares and exposes, not what it actually does. AAP provides visibility, not prevention. AIP provides detection, not certainty. Together they raise the cost of deception — an adversarial agent must maintain consistency across both real-time thinking analysis and post-hoc trace verification — but they do not make deception impossible.

Do not treat protocol compliance as evidence of alignment. A verified trace and a clear integrity verdict mean the agent’s observable behavior is internally consistent with its declarations. They do not mean the agent is safe.

Prompt Injection

AIP detects injection patterns in thinking blocks. When an agent’s reasoning reveals it is responding to injected instructions (e.g., instructions embedded in tool output rather than the user’s original request), the analysis LLM flags the concern and the checkpoint verdict reflects it. However, AIP operates after the agent has already processed the injected content. The injection has occurred. AIP detects its effects on reasoning and can intervene before the action lands (in gateway or enforce mode), but it does not prevent the injected content from reaching the model in the first place.

Replay Attacks

Trace IDs (`trace_id`) and timestamps guard against AP-Trace replay. Each trace is uniquely identified and temporally bound, so a resubmitted trace can be recognized as a duplicate or as stale. Integrity Checkpoint IDs (`checkpoint_id`, format `ic-{uuid}`) are unique per checkpoint. AIP webhook signals include timestamps, and receivers should reject signals older than a configurable threshold (recommended: 5 minutes). HMAC-SHA256 signatures on webhooks prevent signal forgery.
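A receiver-side check combining both webhook defenses might look like the sketch below. The signed message layout (`timestamp.body`), the hex encoding of the signature, and the function names are assumptions; this document does not define the wire format, so consult the AIP specification for the actual scheme.

```python
import hashlib
import hmac
import time
from typing import Optional

MAX_SIGNAL_AGE_SECONDS = 300  # recommended 5-minute threshold


def sign(secret: bytes, timestamp: int, body: bytes) -> str:
    # Assumed scheme: HMAC-SHA256 over "<timestamp>.<body>", hex-encoded.
    message = str(timestamp).encode("ascii") + b"." + body
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


def verify_signal(
    secret: bytes,
    timestamp: int,
    body: bytes,
    signature: str,
    now: Optional[float] = None,
    max_age: int = MAX_SIGNAL_AGE_SECONDS,
) -> bool:
    current = time.time() if now is None else now
    # Reject stale signals (and signals dated implausibly far in the future).
    if abs(current - timestamp) > max_age:
        return False
    # Constant-time comparison so timing does not leak the expected signature.
    return hmac.compare_digest(sign(secret, timestamp, body), signature)
```

Binding the timestamp into the signed message matters: if only the body were signed, an attacker could replay an old signal with a fresh timestamp.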

LLM-as-Judge Limits

AIP uses an LLM to evaluate another LLM’s thinking. This introduces a fundamental ceiling: the analysis model can be fooled by sophisticated adversarial content crafted specifically to manipulate the evaluation. This is the meta-injection threat — the content under analysis attempts to subvert the analysis itself. Mitigations include structured output validation, explicit meta-injection warnings in the conscience prompt, and using a separate model instance (ideally a different model family) for analysis. These raise the bar but do not eliminate the risk.
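As an example of the first mitigation, structured output validation can reject any analysis response that does not parse into the expected shape, so free-form text injected into the analysis cannot masquerade as a verdict. The JSON layout and the `parse_analysis_output` helper are assumptions; only the three verdict values come from this document.

```python
import json

ALLOWED_VERDICTS = {"clear", "review_needed", "boundary_violation"}


class InvalidAnalysisOutput(ValueError):
    """Raised when the analysis model's output fails validation."""


def parse_analysis_output(raw: str) -> dict:
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise InvalidAnalysisOutput("output is not valid JSON") from exc
    if not isinstance(data, dict):
        raise InvalidAnalysisOutput("output is not a JSON object")
    verdict = data.get("verdict")
    if not isinstance(verdict, str) or verdict not in ALLOWED_VERDICTS:
        raise InvalidAnalysisOutput(f"unexpected verdict: {verdict!r}")
    concerns = data.get("concerns", [])
    if not isinstance(concerns, list):
        raise InvalidAnalysisOutput("concerns must be a list")
    # Return only the validated fields; anything else is dropped.
    return {"verdict": verdict, "concerns": concerns}
```

How a caller should treat a validation failure (retry, escalate for review, or block) is a policy decision that this sketch leaves open.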

Detection accuracy is bounded by the analysis model’s capability. A smaller, cheaper analysis model will miss subtleties that a larger model would catch. AIP provides probabilistic detection, not mathematical proof of alignment.

Model Dependency

AIP requires models that expose thinking blocks. Current provider support:

Provider	Thinking Source	Extraction Confidence
Anthropic	Native `thinking` content blocks	1.0
OpenAI	`reasoning_content` field (GPT-5 Thinking)	0.9
Google Gemini	Thinking content parts	0.9
Fallback	Regex-based inference from response text	0.3

Models without exposed thinking blocks cannot be analyzed by AIP. In this case, the system generates synthetic `clear` verdicts: the agent proceeds, but without integrity analysis. A synthetic `clear` therefore indicates that no analysis took place, not that the reasoning was checked and found clean. This is a known gap: models with opaque reasoning are invisible to AIP.
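One way a consumer could keep synthetic verdicts distinguishable from analyzed ones is sketched below. The `synthetic` flag, the dictionary keys, and the `integrity_verdict` helper are hypothetical; the confidence values are copied from the provider table.

```python
from typing import Callable, Optional

# Extraction confidence per provider, from the table above.
# The keys are illustrative labels, not identifiers from the AIP spec.
EXTRACTION_CONFIDENCE = {
    "anthropic": 1.0,
    "openai": 0.9,
    "google_gemini": 0.9,
    "fallback": 0.3,
}


def integrity_verdict(
    thinking_block: Optional[str],
    analyze: Callable[[str], str],
) -> dict:
    if not thinking_block:
        # No exposed reasoning: emit a synthetic clear, flagged (hypothetical
        # field) so it is never mistaken for an analyzed clear.
        return {"verdict": "clear", "synthetic": True}
    return {"verdict": analyze(thinking_block), "synthetic": False}
```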

Retention & Audit

Configurable Retention

Trace and checkpoint retention is configurable via the `audit_commitment.retention_days` field in the Alignment Card. This field declares how long the agent’s operator commits to retaining audit data.

Use Case	Recommended Retention
Development / testing	7-30 days
Production general purpose	90 days
EU AI Act compliance	90+ days (see EU Compliance)
Enterprise / regulated	365+ days
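A minimal sketch of how a retention commitment could be evaluated. It assumes the Alignment Card is available as a nested dictionary with `audit_commitment.retention_days`, and it treats the value as a minimum retention period; the helper names are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def retention_cutoff(alignment_card: dict, now: datetime) -> datetime:
    """Earliest record time still covered by the retention commitment."""
    days = alignment_card["audit_commitment"]["retention_days"]
    return now - timedelta(days=days)


def must_still_be_retained(
    record_time: datetime, alignment_card: dict, now: datetime
) -> bool:
    return record_time >= retention_cutoff(alignment_card, now)


card = {"audit_commitment": {"retention_days": 90}}
now = datetime(2025, 6, 1, tzinfo=timezone.utc)
cutoff = retention_cutoff(card, now)  # 2025-03-03 00:00 UTC
```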

Queryability

AP-Traces and Integrity Checkpoints are queryable via the Mnemom API:

- `GET /agents/{agent_id}/traces`: Retrieve traces for an agent
- `GET /traces/{trace_id}`: Retrieve a specific trace
- `GET /agents/{agent_id}/checkpoints`: Retrieve integrity checkpoints for an agent
- `GET /agents/{agent_id}/checkpoints/{checkpoint_id}`: Retrieve a specific checkpoint
- `GET /drift/{agent_id}`: Retrieve drift analysis for an agent
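The endpoint paths above can be turned into request URLs with a small helper like the one below. The base URL, the dictionary keys, and the `endpoint_url` function are hypothetical; only the path templates come from this document.

```python
from urllib.parse import quote

API_BASE = "https://api.example.invalid"  # hypothetical base URL

# Path templates as listed above; the dictionary keys are illustrative.
ENDPOINTS = {
    "agent_traces": "/agents/{agent_id}/traces",
    "trace": "/traces/{trace_id}",
    "agent_checkpoints": "/agents/{agent_id}/checkpoints",
    "checkpoint": "/agents/{agent_id}/checkpoints/{checkpoint_id}",
    "drift": "/drift/{agent_id}",
}


def endpoint_url(name: str, **path_params: str) -> str:
    # Percent-encode every path parameter so IDs cannot inject extra segments.
    escaped = {key: quote(value, safe="") for key, value in path_params.items()}
    return API_BASE + ENDPOINTS[name].format(**escaped)
```

For example, `endpoint_url("checkpoint", agent_id="agent-1", checkpoint_id="ic-123")` yields the URL for a single checkpoint of that agent.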

Compliance Exports

Enterprise customers have access to compliance export endpoints for bulk data retrieval:

- `GET /billing/export/usage`: Export usage data for billing and compliance reporting

Admin export endpoints are available for organization-level data extraction, supporting regulatory audit requirements.

Responsible Disclosure

If you discover a security vulnerability in any Mnemom component, please report it responsibly via GitHub Security Advisories on the relevant repository:

Component	Repository
Agent Alignment Protocol	github.com/mnemom/aap
Agent Integrity Protocol	github.com/mnemom/aip
smoltbot Gateway	github.com/mnemom/smoltbot

Do not open public GitHub issues for security vulnerabilities. Use the Security Advisories feature (Security tab on each repository) to report vulnerabilities privately. We will acknowledge receipt within 48 hours and aim to provide a fix or mitigation within 7 days.
