XFD security architecture
Mnemom’s XFD system is a multi-layer security envelope around every agent interaction. It combines two gates — the Context Front Door (CFD) and the Context Back Door (CBD) — with a synchronous conscience check (AIP) running inside the agent’s reasoning loop. Together they form the Safe House: a protected execution environment with isolation, integrity attestation via ZK proofs, confidentiality via per-org encrypted storage, and remote auditability via the alignment card and proof chain.
“XFD” is the collective name. Any door, any direction.
The Safe House model
                       ╔══════════════════════════════════╗
                       ║            SAFE HOUSE            ║
                       ║                                  ║
Inbound ──[ CFD ]─────►║  AGENT                           ║────[ CBD ]──► Outbound
           Front       ║  + AIP Conscience (sync)         ║     Back
           Door        ║  + Alignment Card                ║     Door
                       ║  + Canary Alarms                 ║
                       ║  + ZK Proof Chain                ║
                       ║                                  ║
                       ╚══════════════════════════════════╝
- CFD screens every inbound message before the agent processes it. Threats are caught before they reach the LLM.
- CBD screens every outbound response before it reaches the caller. Catches what the agent should not have produced.
- AIP runs a synchronous conscience analysis inside the Safe House on the agent’s thinking trace and output. It is the last line of defense against subtle reasoning compromise.
- Alignment Cards define the agent’s intent: its values, autonomy envelope, escalation triggers, and audit commitment. Every proof binds an execution to a specific card version.
- ZK proof chain makes the audit trail independently verifiable. Third parties can reconstruct and verify the sequence of decisions without trusting Mnemom.
The Safe House is an application-layer analog to a Trusted Execution Environment. Isolation is enforced at the gateway; integrity is attested by cryptographic verdicts; confidentiality is per-org by default; remote auditability is the whole point.
The request pipeline
INBOUND MESSAGE
│
▼
[ CFD FAST PATH ] ~4ms $0 always on
├── Preprocessor Unicode normalization, emoji strip, encoding detection
├── PatternMatcher Injection, spoofing patterns
├── SignalScorer BEC, social engineering — 10-language word lists
├── DLPScanner Inbound PII/credential detection
├── FingerprintMatcher MinHash similarity against multilingual card corpus
├── CanaryMatcher Canary values referenced inbound
└── SessionTracker Cross-message risk escalation
│
├─ verdict ≥ block → HTTP 403, no content stored ──► exit
├─ verdict = quarantine → HTTP 400, encrypt + store ──► exit
│
▼
[ CFD SEMANTIC ] ~200ms paid enforce only conditional
└── SemanticAnalyzer (Haiku) — triggered when score ≥ 0.4
OR surface = tool_result
OR non-Latin-script language detected
│
▼
[ QUOTA + POLICY ] ~2ms KV-cached
│
▼
[ AI PROVIDER ] 500–5000ms streaming ← dominates user latency
│
▼
[ CBD FAST PATH ] ~3ms $0 always on per stream chunk
├── CanaryMatcher Canary in output → interrupt stream, P0 alert
├── DLPScanner Outbound PII/PHI (HIPAA-extended)
└── PromptLeakDetector System prompt signature in output
│
▼
RESPONSE DELIVERED TO CALLER ◄── user latency ends here
│
▼
[ ASYNC — waitUntil, never blocks ]
├── AIP + CBD SEMANTIC (one combined call):
│ Conscience check | LaunderDetector
│ RegComplianceChecker | Prompt-leak semantic
├── Observer logging (Tier-2 signals, batched)
├── Usage increment (billing)
├── Session state update (KV)
└── Card dedup check (if new pattern candidate flagged)
On Enforce and Sovereign tiers, the AIP + CBD combined call becomes synchronous: it runs between the CBD fast path and response delivery. The caller waits 1–3s longer, and in exchange real-time blocking is enabled.
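The two branch points in the diagram (the fast-path early exits and the semantic trigger) can be sketched as below. This is a minimal illustration, not Mnemom's implementation: the function names, the `user_message` surface label, and the Unicode-name heuristic for spotting non-Latin script are all assumptions. Only the 0.4 threshold, the `tool_result` surface, and the HTTP status codes come from the diagram.

```python
import unicodedata
from typing import Optional, Tuple

SEMANTIC_SCORE_THRESHOLD = 0.4  # "triggered when score >= 0.4" in the diagram


def has_non_latin_script(text: str) -> bool:
    """Heuristic (assumption): a letter whose Unicode name does not start
    with 'LATIN' is treated as non-Latin script."""
    for ch in text:
        if ch.isalpha() and not unicodedata.name(ch, "").startswith("LATIN"):
            return True
    return False


def should_run_semantic(fast_path_score: float, surface: str, text: str) -> bool:
    """Any one of the three conditions is enough to trigger SemanticAnalyzer."""
    return (
        fast_path_score >= SEMANTIC_SCORE_THRESHOLD
        or surface == "tool_result"
        or has_non_latin_script(text)
    )


def fast_path_exit(verdict: str) -> Optional[Tuple[int, bool]]:
    """Return (http_status, store_encrypted) for an early exit,
    or None when the request continues down the pipeline."""
    if verdict == "block":
        return (403, False)  # blocked: no content stored
    if verdict == "quarantine":
        return (400, True)   # quarantined: content encrypted and stored
    return None
```

For example, `should_run_semantic(0.1, "user_message", "привет")` is true because of the script check alone, while the same score with plain English text skips the semantic call.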
The detector registry
XFD is a named registry of detectors. There are no layers — the legacy L1/L2/L3 terminology is retired. Each detector contributes evidence independently, and the Verdict Engine combines them.
CFD detectors
| Detector | Modality | Latency | Cost | Always on | What it catches |
|---|---|---|---|---|---|
| PatternMatcher | Regex | ~1ms | $0 | Yes | Injection, spoofing, explicit exfiltration commands |
| SignalScorer | Word-signal scoring | ~1ms | $0 | Yes | BEC fraud, social engineering (10 language word lists) |
| DLPScanner | Format regex + Luhn | ~1ms | $0 | Yes | Inbound PII and credentials |
| FingerprintMatcher | MinHash LSH | ~2ms | $0 | Yes | Near-duplicate known attacks; all scripts/languages |
| CanaryMatcher | Exact string | <1ms | $0 | Yes | Canary values referenced inbound |
| SessionTracker | Score windowing | ~2ms | $0 | Yes | Cross-message escalation (risk multiplier) |
| SemanticAnalyzer | LLM (Haiku) | ~200ms | ~$0.0003 | No | Indirect injection, hijack, nuanced threats, any language |
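To illustrate the "Format regex + Luhn" modality listed for DLPScanner, here is a minimal sketch of how a format match can be confirmed with the Luhn checksum before it counts as a card number. The regex and function names are illustrative assumptions; the real scanner's patterns are not documented here.

```python
import re

# Assumed candidate pattern: 13 to 19 digits, optionally separated by spaces or hyphens.
CARD_CANDIDATE = re.compile(r"\b(?:\d[ -]?){13,19}\b")


def luhn_valid(digits: str) -> bool:
    """Standard Luhn checksum: double every second digit from the right."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def find_card_numbers(text: str) -> list[str]:
    """Return digit strings that match the format AND pass the Luhn check."""
    hits = []
    for match in CARD_CANDIDATE.finditer(text):
        digits = re.sub(r"\D", "", match.group())
        if luhn_valid(digits):
            hits.append(digits)
    return hits
```

The checksum step is what keeps arbitrary 16-digit identifiers (order numbers, timestamps) from being flagged as payment cards.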
CBD detectors
| Detector | Modality | Latency | Cost | Sync? | What it catches |
|---|---|---|---|---|---|
| CanaryMatcher | Exact string | <1ms | $0 | Yes | Canary in output: definitive compromise proof |
| DLPScanner | Format regex + HIPAA 18 | ~2ms | $0 | Yes | PII/PHI in output, credential leakage |
| PromptLeakDetector | Signature match | ~1ms | $0 | Yes | System prompt content in output |
| LaunderDetector | LLM (via AIP) | merged | merged | async* | CFD-warned message influenced output |
| RegComplianceChecker | LLM (via AIP) | merged | merged | async* | HIPAA, SEC, COPPA violations |
* Sync on the Enforce, Sovereign, and Regulated tiers.
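Because the CBD fast path runs per stream chunk, an exact-string canary match has to cope with a canary that is split across two chunks. One way to handle that, sketched here under assumptions (the class name and API are hypothetical, not Mnemom's), is to carry over a short tail of the previous chunk:

```python
from typing import Optional


class StreamingCanaryMatcher:
    """Exact-string canary detection over a chunked output stream."""

    def __init__(self, canaries: list[str]):
        self.canaries = [c for c in canaries if c]
        # Keep (longest canary - 1) trailing characters so a canary that
        # straddles a chunk boundary is still seen whole.
        self.keep = max((len(c) for c in self.canaries), default=1) - 1
        self.tail = ""

    def feed(self, chunk: str) -> Optional[str]:
        """Return the matched canary (caller should interrupt the stream),
        or None when the chunk is clean."""
        window = self.tail + chunk
        for canary in self.canaries:
            if canary in window:
                return canary
        self.tail = window[-self.keep:] if self.keep > 0 else ""
        return None
```

Usage: call `feed()` on every chunk before forwarding it; a non-`None` result corresponds to the "interrupt stream, P0 alert" branch in the pipeline.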
The AIP + CBD combined call
AIP already sees the thinking trace, conversation context, and the agent’s output. Rather than a separate CBD semantic call, the AIP prompt is extended to return both:
{
"conscience_verdict": "pass | warn | flag",
"conscience_concerns": [],
"cbd_verdict": "pass | warn | hold | block",
"cbd_threats": [{ "type": "...", "confidence": 0.0 }],
"compliance_flags": []
}
One LLM call covers both conscience analysis and CBD semantic screening. This eliminates a redundant round-trip at the most expensive point in the pipeline.
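A consumer of this combined response might validate it before acting on it. The sketch below assumes the field names and verdict values shown in the JSON above; the dataclass, the function name, and the requirement that `confidence` lies in [0, 1] are illustrative assumptions.

```python
import json
from dataclasses import dataclass, field

CONSCIENCE_VERDICTS = {"pass", "warn", "flag"}
CBD_VERDICTS = {"pass", "warn", "hold", "block"}


@dataclass
class CombinedVerdict:
    conscience_verdict: str
    cbd_verdict: str
    conscience_concerns: list = field(default_factory=list)
    cbd_threats: list = field(default_factory=list)
    compliance_flags: list = field(default_factory=list)


def parse_combined_response(raw: str) -> CombinedVerdict:
    """Parse the combined AIP + CBD response and reject unknown verdicts."""
    data = json.loads(raw)
    result = CombinedVerdict(
        conscience_verdict=data["conscience_verdict"],
        cbd_verdict=data["cbd_verdict"],
        conscience_concerns=data.get("conscience_concerns", []),
        cbd_threats=data.get("cbd_threats", []),
        compliance_flags=data.get("compliance_flags", []),
    )
    if result.conscience_verdict not in CONSCIENCE_VERDICTS:
        raise ValueError(f"unknown conscience_verdict: {result.conscience_verdict!r}")
    if result.cbd_verdict not in CBD_VERDICTS:
        raise ValueError(f"unknown cbd_verdict: {result.cbd_verdict!r}")
    for threat in result.cbd_threats:
        # Assumption: confidences are probabilities in [0, 1].
        if not 0.0 <= float(threat["confidence"]) <= 1.0:
            raise ValueError(f"confidence out of range: {threat!r}")
    return result
```

Rejecting unknown verdict strings matters here because the output comes from an LLM call, so the response shape cannot be taken for granted.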
The Verdict Engine
Detector outputs are combined into a single decision:
overall_risk = max(detector_confidences)
+ min((unique_threat_types - 1) × 0.05, 0.15) ← multi-threat bonus
× session_multiplier ← from SessionTracker
verdict = threshold_lookup(overall_risk, agent_config)
Default thresholds (configurable per agent and org):
| Verdict | Default | Conservative | Sovereign |
|---|---|---|---|
| warn | 0.60 | 0.50 | 0.50 |
| quarantine | 0.80 | 0.70 | 0.70 |
| block | 0.95 | 0.90 | 0.85 |
Session multiplier: two or more messages scoring ≥ 0.6 in the last 10 minutes → 1.15×. Three or more → 1.30×.
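The formula, thresholds, and session multiplier above can be combined into a short worked sketch. Several details are assumptions rather than documented behavior: the multiplier is applied to the sum of the maximum confidence and the bonus, the bonus is floored at zero when no threat type is present, thresholds are inclusive, and the verdict below `warn` is called `pass`.

```python
THRESHOLDS = {
    "default":      {"warn": 0.60, "quarantine": 0.80, "block": 0.95},
    "conservative": {"warn": 0.50, "quarantine": 0.70, "block": 0.90},
    "sovereign":    {"warn": 0.50, "quarantine": 0.70, "block": 0.85},
}


def session_multiplier(recent_scores: list[float]) -> float:
    """recent_scores: scores of messages seen in the last 10 minutes."""
    high = sum(1 for s in recent_scores if s >= 0.6)
    if high >= 3:
        return 1.30
    if high >= 2:
        return 1.15
    return 1.0


def overall_risk(confidences: dict[str, float], threat_types: set[str],
                 multiplier: float) -> float:
    if not confidences:
        return 0.0
    # Multi-threat bonus: 0.05 per extra threat type, capped at 0.15.
    bonus = min(max(len(threat_types) - 1, 0) * 0.05, 0.15)
    # Assumption: the session multiplier scales (max + bonus).
    return (max(confidences.values()) + bonus) * multiplier


def verdict_for(risk: float, profile: str = "default") -> str:
    limits = THRESHOLDS[profile]
    for name in ("block", "quarantine", "warn"):
        if risk >= limits[name]:
            return name
    return "pass"
```

With detector confidences of 0.70 and 0.40, two distinct threat types, and two recent high-scoring messages, the risk is (0.70 + 0.05) × 1.15 = 0.8625. That is `quarantine` under the default thresholds and `block` under the Sovereign ones.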
Evaluation records are persisted as Tier-2 signals. They contain the verdict, the combined risk, per-detector scores, and the detection sources — never the original payload.
The trust continuum
Security is a user-configured dial. Every element adds latency and cost. Operators choose where they sit on the continuum; Mnemom engineers the full stack.
OpenClaw Developer Standard Enforce Sovereign Regulated
(Free) (Free) (Paid) (Paid+) (Enterprise) (Enterprise)
CFD fast CFD fast CFD full CFD sync CFD sync CFD sync
CBD fast CBD fast AIP async AIP async AIP sync AIP sync
No AIP No AIP CBD async CBD sync CBD sync CBD + regs
via AIP via AIP
~9ms ~9ms ~9ms ~9ms+1–3s ~9ms+1–3s ~9ms+1–3s
$0 $0 $$ $$$ $$$$ Custom
Free-tier guarantee. Every agent — even on OpenClaw — gets CFD fast path plus CBD fast path. Zero LLM calls. ~9ms added latency. $0 marginal cost. Catches roughly 65–70% of threats deterministically.
The 1–3s on Enforce+ is intentional. Safety-critical agents pay for synchronous verification. This is not a bug to engineer away — it is the price of the guarantee. Operators see the cost transparently and choose their tier.
Mode names
| Mode | Behavior |
|---|---|
| off | No XFD |
| observe | All detectors async; no blocking; full logging |
| enforce | CFD fast path sync; AIP + CBD async |
| enforce_sync | Everything sync including AIP + CBD; 1–3s added latency |
| sovereign | enforce_sync + regulated compliance checks |
Isolation model
The Safe House is not a runtime sandbox around the LLM provider — Mnemom does not host the model. Isolation is enforced at the control-plane boundary:
- Per-org encryption keys protect Tier-3 content (quarantined messages, canary evidence). Keys are held in Supabase Vault, scoped per organization, and rotated on compromise signal.
- Row-level security enforces tenant isolation in Postgres. Every table carrying traces, checkpoints, quarantine records, or cards is scoped by org_id with RLS policies.
- Gateway isolation — each org’s traffic is tagged at ingress with a workload identity that flows into every downstream call (CFD, AIP, AAP, observer). Cross-tenant bleed is prevented at the query layer, not just the application layer.
- Canary credentials — fake but format-valid credentials are planted in agent system prompts. Values are high-entropy and unique; they cannot be guessed, only leaked. A canary match in any output is definitive proof of system-prompt extraction.
- ZK proof chain — every verdict is signed with Ed25519, hash-chained, and (for the Integrity Protocol) accompanied by a STARK proof that the verdict was correctly derived from the analysis output. See AIP verifiable verdicts.
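The hash-chaining part of the proof chain can be illustrated with the standard library alone. This sketch deliberately omits the Ed25519 signatures and STARK proofs mentioned above, and the record layout, genesis value, and function names are assumptions made for illustration.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed placeholder for "no previous entry"


def entry_hash(prev_hash: str, verdict: dict) -> str:
    """Hash the previous entry's hash together with a canonical JSON verdict."""
    payload = json.dumps(verdict, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()


def append_verdict(chain: list[dict], verdict: dict) -> None:
    prev = chain[-1]["hash"] if chain else GENESIS
    chain.append({"verdict": verdict, "prev_hash": prev,
                  "hash": entry_hash(prev, verdict)})


def verify_chain(chain: list[dict]) -> bool:
    """Recompute every link; editing or reordering any entry breaks the chain."""
    prev = GENESIS
    for entry in chain:
        if entry["prev_hash"] != prev:
            return False
        if entry["hash"] != entry_hash(prev, entry["verdict"]):
            return False
        prev = entry["hash"]
    return True
```

A third party holding the chain can run `verify_chain` without any access to the original payloads, which is the property the audit trail relies on.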
Latency and cost tradeoffs
The pipeline is designed so the free tier pays nothing and still catches most threats deterministically. Semantic analysis is conditional and paid-only, so operators never pay for LLM calls that fast-path detectors have already settled.
| Stage | Typical cost per request | Latency impact |
|---|---|---|
| CFD fast path | $0 | ~4ms |
| CFD semantic (conditional) | ~$0.0003 | ~200ms when triggered |
| CBD fast path | $0 | ~3ms (per chunk) |
| AIP + CBD semantic async | ~$0.001 | 0ms to user |
| AIP + CBD semantic sync (Enforce+) | ~$0.001 | +1–3s to user |
Fast-path coverage alone catches BEC fraud, canary exfiltration, PII leakage, system-prompt leakage, and the majority of known injection payloads — all at zero marginal cost.
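As a worked example of the arithmetic behind the table, the expected marginal cost per request depends on how often the conditional semantic stage fires. The trigger rate is a hypothetical input, since no typical rate is documented here; the per-call costs are the approximate figures from the table.

```python
CFD_SEMANTIC_COST = 0.0003  # approx. USD per triggered semantic call
AIP_CBD_COST = 0.001        # approx. USD per combined AIP + CBD call


def expected_cost_per_request(semantic_trigger_rate: float, aip_enabled: bool) -> float:
    """Expected marginal cost in USD; both fast paths contribute $0."""
    if not 0.0 <= semantic_trigger_rate <= 1.0:
        raise ValueError("semantic_trigger_rate must be between 0 and 1")
    cost = semantic_trigger_rate * CFD_SEMANTIC_COST
    if aip_enabled:
        cost += AIP_CBD_COST
    return cost
```

With an assumed 20% trigger rate and AIP enabled, this comes to 0.2 × $0.0003 + $0.001 = $0.00106 per request, or about $1,060 per million requests. With no semantic stage and no AIP, as on the free tiers, the result is $0.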
Storage model
Three tiers, aligned to the regulatory posture. Content is stored only when required, encrypted at rest, and subject to defined TTLs.
| Tier | What is stored | Retention | GDPR scope |
|---|---|---|---|
| Tier 1 — Commitment | SHA-256(content + timestamp + agent_id + session_id) and verdict. No personal data. | Permanent | Out of scope (Article 19 satisfied) |
| Tier 2 — Signal | Verdict, detector scores, threat types, surface, pseudonymized session, duration. No content. | 30–90 days | Article 5(1)(e) — defined TTL |
| Tier 3 — Content | Actual message or response content. Encrypted AES-256-GCM per-org. CFD: only on quarantine. CBD: default off (analyze in flight, discard); exception for canary hit or regulated enterprise hold. | 30 days → auto-delete | Article 17 — erasable |
For compliance detail, see Compliance posture and EU AI Act compliance.
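The Tier 1 commitment is described only as SHA-256(content + timestamp + agent_id + session_id); how the fields are joined is not specified. The sketch below assumes each field is length-prefixed before hashing so that field boundaries are unambiguous. The function name and encoding are illustrative, not the documented format.

```python
import hashlib


def tier1_commitment(content: str, timestamp: str, agent_id: str, session_id: str) -> str:
    """SHA-256 over the four fields, each prefixed with its byte length
    (assumption) so that ("ab", "c") and ("a", "bc") hash differently."""
    h = hashlib.sha256()
    for part in (content, timestamp, agent_id, session_id):
        data = part.encode("utf-8")
        h.update(len(data).to_bytes(8, "big"))
        h.update(data)
    return h.hexdigest()
```

Only the resulting 64-character digest and the verdict would be stored at this tier. Anyone who holds the original content can recompute the digest to prove what was evaluated.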
Multilingual coverage
XFD supports 10 languages (EN, FR, DE, IT, ES, PT, JA, ZH, AR, KO). The design principle: maximize language-agnostic coverage; localize only what must be localized; use the LLM as the universal fallback.
| Detector | Language dependency | How handled |
|---|---|---|
| DLPScanner | None: format patterns are language-agnostic | Works as-is |
| FingerprintMatcher | None: character-trigram MinHash works on any Unicode script | Works as-is |
| CanaryMatcher | None: exact string match | Works as-is |
| SessionTracker | None: score-based | Works as-is |
| SemanticAnalyzer | None: LLM handles 100+ languages natively | Works as-is |
| SignalScorer | Per-language word lists | 10 language packs |
| PatternMatcher | Latin-script: shared normalized patterns. Non-Latin: structural ASCII markers only. | Semantic always triggered for non-Latin |
Non-Latin-script content always triggers the SemanticAnalyzer, regardless of fast-path score, because regex and word-signal detectors are weaker there. The LLM compensates.
The card corpus is multilingual at build time, not runtime: when a new pattern is confirmed in one language, Arena V2 auto-generates equivalents in the other nine and fingerprints them as child cards. FingerprintMatcher then catches cross-language variants with no language-specific runtime code.
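Character-trigram MinHash, the technique named for FingerprintMatcher, works on any script because it never tokenizes words. The following is a minimal sketch of the idea, not the production matcher: the hash function (salted BLAKE2b), the signature length, and the function names are assumptions, and the LSH bucketing step is omitted.

```python
import hashlib


def trigrams(text: str) -> set[str]:
    """Character trigrams over lowercased, whitespace-normalized text."""
    t = " ".join(text.lower().split())
    if len(t) < 3:
        return {t}
    return {t[i:i + 3] for i in range(len(t) - 2)}


def minhash_signature(text: str, num_hashes: int = 128) -> list[int]:
    """For each of num_hashes salted hash functions, keep the minimum
    hash value seen over all trigrams."""
    grams = trigrams(text)
    signature = []
    for seed in range(num_hashes):
        salt = seed.to_bytes(8, "big")
        signature.append(min(
            int.from_bytes(
                hashlib.blake2b(g.encode("utf-8"), digest_size=8, salt=salt).digest(),
                "big",
            )
            for g in grams
        ))
    return signature


def estimated_similarity(sig_a: list[int], sig_b: list[int]) -> float:
    """The fraction of matching slots estimates the Jaccard similarity."""
    return sum(x == y for x, y in zip(sig_a, sig_b)) / len(sig_a)
```

A lightly reworded attack string yields a signature that agrees with the original in most slots, while unrelated text agrees in very few. The same code applies unchanged to Japanese, Arabic, or Korean input.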
What XFD does not protect against
Being explicit about limits is as important as documenting coverage.
- Agent reasoning errors. If the agent misinterprets a legitimate request and takes a harmful action, CFD does not catch this — the input was benign. AIP’s conscience check is the layer that reasons about the agent’s reasoning; see AIP specification.
- Novel indirect injection. Semantically indistinguishable injection in large tool results may pass the fast path and the semantic layer. FingerprintMatcher reduces the gap, and Arena V2 promotes confirmed patterns into the corpus, but novel, carefully crafted payloads remain a residual risk.
- Model-level compromise. If the underlying LLM has been fine-tuned or otherwise modified to comply with certain instruction patterns, pattern detection at the gateway does not change the model’s behavior. It only determines whether the payload reaches the model, and only when blocking is enabled.
- Tool-call side channels. CFD screens inbound text; CBD screens outbound text. Tool call arguments are currently not covered by the synchronous path. The Agent Activity Protocol records all tool calls for post-hoc analysis (~60s latency); event-driven AAP screening on Enforce+ is on the roadmap.
For the full threat-class breakdown mapped to MITRE ATLAS, see Safe House threat model.