XFD security architecture

Mnemom’s XFD system is a multi-layer security envelope around every agent interaction. It combines two gates — the Context Front Door (CFD) and the Context Back Door (CBD) — with a synchronous conscience check (AIP) running inside the agent’s reasoning loop. Together they form the Safe House: a protected execution environment with isolation, integrity attestation via ZK proofs, confidentiality via per-org encrypted storage, and remote auditability via the alignment card and proof chain. “XFD” is the collective name. Any door, any direction.
This page is the canonical reference for the full pipeline. For attack-class detail on what CFD catches, see Safe House threat model. For the broader AAP and AIP threat analysis, see Security and trust model.

The Safe House model

                       ╔══════════════════════════════════╗
                       ║           SAFE HOUSE             ║
                       ║                                  ║
Inbound ──[ CFD ]─────►║  AGENT                           ║────[ CBD ]──► Outbound
               Front   ║    + AIP Conscience (sync)       ║   Back
                Door   ║    + Alignment Card              ║   Door
                       ║    + Canary Alarms               ║
                       ║    + ZK Proof Chain              ║
                       ║                                  ║
                       ╚══════════════════════════════════╝
  • CFD screens every inbound message before the agent processes it. Threats are caught before they reach the LLM.
  • CBD screens every outbound response before it reaches the caller. Catches what the agent should not have produced.
  • AIP runs a synchronous conscience analysis inside the Safe House on the agent’s thinking trace and output. It is the last line of defense against subtle reasoning compromise.
  • Alignment Cards define the agent’s intent: its values, autonomy envelope, escalation triggers, and audit commitment. Every proof binds an execution to a specific card version.
  • ZK proof chain makes the audit trail independently verifiable. Third parties can reconstruct and verify the sequence of decisions without trusting Mnemom.
The Safe House is an application-layer analog to a Trusted Execution Environment. Isolation is enforced at the gateway; integrity is attested by cryptographic verdicts; confidentiality is per-org by default; remote auditability is the whole point.

The request pipeline

INBOUND MESSAGE


[ CFD FAST PATH ]   ~4ms   $0   always on
  ├── Preprocessor         Unicode normalization, emoji strip, encoding detection
  ├── PatternMatcher       Injection, spoofing patterns
  ├── SignalScorer         BEC, social engineering — 10-language word lists
  ├── DLPScanner           Inbound PII/credential detection
  ├── FingerprintMatcher   MinHash similarity against multilingual card corpus
  ├── CanaryMatcher        Canary values referenced inbound
  └── SessionTracker       Cross-message risk escalation

      ├─ verdict ≥ block       → HTTP 403, no content stored  ──► exit
      ├─ verdict = quarantine  → HTTP 400, encrypt + store    ──► exit


[ CFD SEMANTIC ]   ~200ms   paid enforce only   conditional
  └── SemanticAnalyzer (Haiku) — triggered when score ≥ 0.4
      OR surface = tool_result
      OR non-Latin-script language detected


[ QUOTA + POLICY ]   ~2ms   KV-cached


[ AI PROVIDER ]   500–5000ms   streaming  ← dominates user latency


[ CBD FAST PATH ]   ~3ms   $0   always on   per stream chunk
  ├── CanaryMatcher        Canary in output → interrupt stream, P0 alert
  ├── DLPScanner           Outbound PII/PHI (HIPAA-extended)
  └── PromptLeakDetector   System prompt signature in output


RESPONSE DELIVERED TO CALLER  ◄── user latency ends here


[ ASYNC — waitUntil, never blocks ]
  ├── AIP + CBD SEMANTIC (one combined call):
  │     Conscience check | LaunderDetector
  │     RegComplianceChecker | Prompt-leak semantic
  ├── Observer logging (Tier-2 signals, batched)
  ├── Usage increment (billing)
  ├── Session state update (KV)
  └── Card dedup check (if new pattern candidate flagged)
On Enforce and Sovereign tiers, the combined AIP + CBD call becomes synchronous: it runs between the CBD fast path and response delivery. The caller waits 1–3s longer, and in exchange real-time blocking is enabled.
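The trigger rule for the conditional semantic stage can be sketched as a small predicate. This is an illustrative sketch only: the `FastPathResult` type, its field names, and the `paid_enforce` flag are assumptions introduced here, while the 0.4 threshold and the three trigger conditions come from the diagram above.

```python
from dataclasses import dataclass

SEMANTIC_SCORE_THRESHOLD = 0.4  # "triggered when score >= 0.4" in the pipeline diagram


@dataclass
class FastPathResult:
    score: float            # combined fast-path risk, 0.0 to 1.0
    surface: str            # e.g. "user_message" or "tool_result" (names assumed)
    non_latin_script: bool  # assumed to be set by the Preprocessor


def should_run_semantic(result: FastPathResult, paid_enforce: bool) -> bool:
    """Return True when the conditional SemanticAnalyzer stage should run."""
    if not paid_enforce:
        return False  # the stage is marked "paid enforce only"
    return (
        result.score >= SEMANTIC_SCORE_THRESHOLD
        or result.surface == "tool_result"
        or result.non_latin_script
    )
```

Under these assumptions, a low-scoring tool result still reaches the semantic stage, while a low-scoring Latin-script user message does not.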

The detector registry

XFD is a named registry of detectors. There are no layers — the legacy L1/L2/L3 terminology is retired. Each detector contributes evidence independently, and the Verdict Engine combines them.

CFD detectors

| Detector | Modality | Latency | Cost | Always on | What it catches |
| --- | --- | --- | --- | --- | --- |
| PatternMatcher | Regex | ~1ms | $0 | Yes | Injection, spoofing, explicit exfiltration commands |
| SignalScorer | Word-signal scoring | ~1ms | $0 | Yes | BEC fraud, social engineering (10 language word lists) |
| DLPScanner | Format regex + Luhn | ~1ms | $0 | Yes | Inbound PII and credentials |
| FingerprintMatcher | MinHash LSH | ~2ms | $0 | Yes | Near-duplicate known attacks; all scripts/languages |
| CanaryMatcher | Exact string | <1ms | $0 | Yes | Canary values referenced inbound |
| SessionTracker | Score windowing | ~2ms | $0 | Yes | Cross-message escalation (risk multiplier) |
| SemanticAnalyzer | LLM (Haiku) | ~200ms | ~$0.0003 | No | Indirect injection, hijack, nuanced threats, any language |
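As an illustration of the "format regex + Luhn" modality listed for DLPScanner, the following sketch finds candidate payment card numbers with a regex and keeps only those that pass the Luhn checksum. The regex, function names, and the choice of card numbers as the example format are assumptions for illustration; the actual DLPScanner rules are not documented here.

```python
import re

# Candidate card numbers: 13 to 19 digits, optionally separated by spaces or hyphens.
CARD_CANDIDATE = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")


def luhn_valid(digits: str) -> bool:
    """Standard Luhn checksum: double every second digit from the right."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def find_card_numbers(text: str) -> list[str]:
    """Return digit strings that match the card format and pass Luhn."""
    hits = []
    for match in CARD_CANDIDATE.finditer(text):
        digits = re.sub(r"[ -]", "", match.group())
        if luhn_valid(digits):
            hits.append(digits)
    return hits
```

The checksum step is what keeps a pure-regex detector from flagging every long digit run, such as order numbers or timestamps.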

CBD detectors

| Detector | Modality | Latency | Cost | Sync? | What it catches |
| --- | --- | --- | --- | --- | --- |
| CanaryMatcher | Exact string | <1ms | $0 | Yes | Canary in output: definitive compromise proof |
| DLPScanner | Format regex + HIPAA 18 | ~2ms | $0 | Yes | PII/PHI in output, credential leakage |
| PromptLeakDetector | Signature match | ~1ms | $0 | Yes | System prompt content in output |
| LaunderDetector | LLM (via AIP) | merged | merged | async* | CFD-warned message influenced output |
| RegComplianceChecker | LLM (via AIP) | merged | merged | async* | HIPAA, SEC, COPPA violations |
* Sync on Sovereign / Regulated tier.
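Because the CBD fast path runs per stream chunk, an exact-string matcher has to handle a canary that is split across two chunks. The sketch below shows one way to do that by carrying over a short tail of the previous chunk. The class, the `sk-canary-` prefix, and the carry-over strategy are assumptions for illustration, not a description of the production CanaryMatcher.

```python
import secrets


def make_canary(prefix: str = "sk-canary-") -> str:
    """Generate a high-entropy, unique canary value (the prefix is hypothetical)."""
    return prefix + secrets.token_hex(16)


class StreamingCanaryMatcher:
    """Exact-match canary detection across stream chunk boundaries."""

    def __init__(self, canaries: list[str]):
        self.canaries = canaries
        # Keep enough trailing text to catch a canary split across two chunks.
        self.keep = max((len(c) for c in canaries), default=1) - 1
        self.tail = ""

    def feed(self, chunk: str) -> bool:
        """Return True if a canary is completed by this chunk."""
        window = self.tail + chunk
        hit = any(c in window for c in self.canaries)
        self.tail = window[-self.keep:] if self.keep > 0 else ""
        return hit
```

A caller would interrupt the stream and raise the alert as soon as `feed` returns `True`. The retained tail is one character shorter than the longest canary, so a single leak is reported once rather than on every later chunk.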

The AIP + CBD combined call

AIP already sees the thinking trace, conversation context, and the agent’s output. Rather than a separate CBD semantic call, the AIP prompt is extended to return both:
{
  "conscience_verdict": "pass | warn | flag",
  "conscience_concerns": [],
  "cbd_verdict": "pass | warn | hold | block",
  "cbd_threats": [{ "type": "...", "confidence": 0.0 }],
  "compliance_flags": []
}
One LLM call covers both conscience analysis and CBD semantic screening. This eliminates a redundant round-trip at the most expensive point in the pipeline.
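A consumer of the combined response would typically validate it before acting on it. The following is a minimal sketch that assumes the response arrives as a JSON string with exactly the fields shown above; the `CombinedVerdict` class, the `parse_combined` function, and the choice to raise on unknown values are illustrative assumptions.

```python
import json
from dataclasses import dataclass, field

CONSCIENCE_VERDICTS = {"pass", "warn", "flag"}
CBD_VERDICTS = {"pass", "warn", "hold", "block"}


@dataclass
class CombinedVerdict:
    conscience_verdict: str
    cbd_verdict: str
    conscience_concerns: list = field(default_factory=list)
    cbd_threats: list = field(default_factory=list)
    compliance_flags: list = field(default_factory=list)


def parse_combined(raw: str) -> CombinedVerdict:
    """Parse and validate the combined AIP + CBD response."""
    data = json.loads(raw)
    if data.get("conscience_verdict") not in CONSCIENCE_VERDICTS:
        raise ValueError("unknown conscience_verdict")
    if data.get("cbd_verdict") not in CBD_VERDICTS:
        raise ValueError("unknown cbd_verdict")
    threats = data.get("cbd_threats", [])
    for threat in threats:
        if not 0.0 <= float(threat["confidence"]) <= 1.0:
            raise ValueError("confidence out of range")
    return CombinedVerdict(
        conscience_verdict=data["conscience_verdict"],
        cbd_verdict=data["cbd_verdict"],
        conscience_concerns=data.get("conscience_concerns", []),
        cbd_threats=threats,
        compliance_flags=data.get("compliance_flags", []),
    )
```

Rejecting malformed output outright, rather than defaulting to "pass", is a design choice of this sketch: an LLM response that does not match the schema is treated as an error to handle, not as a clean verdict.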

The Verdict Engine

Detector outputs are combined into a single decision:
overall_risk = ( max(detector_confidences)
               + min((unique_threat_types - 1) × 0.05, 0.15) )   ← multi-threat bonus
             × session_multiplier                                  ← from SessionTracker

verdict = threshold_lookup(overall_risk, agent_config)
Default thresholds (configurable per agent and org):
| Verdict | Default | Conservative | Sovereign |
| --- | --- | --- | --- |
| warn | 0.60 | 0.50 | 0.50 |
| quarantine | 0.80 | 0.70 | 0.70 |
| block | 0.95 | 0.90 | 0.85 |
Session multiplier: two or more messages scoring ≥ 0.6 in the last 10 minutes → 1.15×. Three or more → 1.30×.

Evaluation records are persisted as Tier-2 signals. They contain the verdict, the combined risk, per-detector scores, and the detection sources, but never the original payload.
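The scoring rule can be sketched in a few lines. Three points in this sketch are assumptions rather than documented behavior: the session multiplier is applied to the sum of the maximum confidence and the multi-threat bonus, the result is clamped to 1.0, and a risk below the warn threshold is labeled "pass". The `Finding` type and function names are also introduced here for illustration.

```python
from dataclasses import dataclass

DEFAULT_THRESHOLDS = {"warn": 0.60, "quarantine": 0.80, "block": 0.95}


@dataclass
class Finding:
    threat_type: str
    confidence: float


def session_multiplier(recent_high_scores: int) -> float:
    """Multiplier from the count of messages scoring >= 0.6 in the last 10 minutes."""
    if recent_high_scores >= 3:
        return 1.30
    if recent_high_scores >= 2:
        return 1.15
    return 1.0


def overall_risk(findings: list[Finding], multiplier: float) -> float:
    if not findings:
        return 0.0
    base = max(f.confidence for f in findings)
    unique_types = len({f.threat_type for f in findings})
    bonus = min((unique_types - 1) * 0.05, 0.15)  # multi-threat bonus, capped
    return min((base + bonus) * multiplier, 1.0)  # clamp is an assumption


def verdict(risk: float, thresholds: dict = DEFAULT_THRESHOLDS) -> str:
    # Check the most severe threshold first.
    for name in ("block", "quarantine", "warn"):
        if risk >= thresholds[name]:
            return name
    return "pass"  # label for "below warn" is an assumption
```

For example, findings of 0.7 (injection) and 0.5 (DLP) with two recent high-scoring messages give (0.7 + 0.05) × 1.15, about 0.86, which is a quarantine under the default thresholds. Without the session multiplier the same findings score 0.75, a warn.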

The trust continuum

Security is a user-configured dial. Every element adds latency and cost. Operators choose where they sit on the continuum; Mnemom engineers the full stack.
OpenClaw    Developer    Standard    Enforce    Sovereign    Regulated
 (Free)      (Free)      (Paid)     (Paid+)   (Enterprise) (Enterprise)

CFD fast    CFD fast    CFD full    CFD sync    CFD sync    CFD sync
CBD fast    CBD fast    AIP async   AIP async   AIP sync    AIP sync
No AIP      No AIP      CBD async   CBD sync    CBD sync    CBD + regs
                                    via AIP     via AIP

~9ms        ~9ms        ~9ms        ~9ms+1–3s   ~9ms+1–3s   ~9ms+1–3s
$0          $0          $$          $$$         $$$$        Custom
Free-tier guarantee. Every agent, even on OpenClaw, gets the CFD fast path plus the CBD fast path: zero LLM calls, ~9ms added latency, $0 marginal cost. This catches roughly 65–70% of threats deterministically.

The 1–3s on Enforce+ is intentional. Safety-critical agents pay for synchronous verification. This is not a bug to engineer away; it is the price of the guarantee. Operators see the cost transparently and choose their tier.

Mode names

| Mode | Behavior |
| --- | --- |
| off | No XFD |
| observe | All detectors async; no blocking; full logging |
| enforce | CFD fast path sync; AIP + CBD async |
| enforce_sync | Everything sync including AIP + CBD; 1–3s added latency |
| sovereign | enforce_sync + regulated compliance checks |

Isolation model

The Safe House is not a runtime sandbox around the LLM provider — Mnemom does not host the model. Isolation is enforced at the control-plane boundary:
  • Per-org encryption keys protect Tier-3 content (quarantined messages, canary evidence). Keys are held in Supabase Vault, scoped per organization, and rotated on compromise signal.
  • Row-level security enforces tenant isolation in Postgres. Every table carrying traces, checkpoints, quarantine records, or cards is scoped by org_id with RLS policies.
  • Gateway isolation — each org’s traffic is tagged at ingress with a workload identity that flows into every downstream call (CFD, AIP, AAP, observer). Cross-tenant bleed is prevented at the query layer, not just the application layer.
  • Canary credentials — fake but format-valid credentials are planted in agent system prompts. Values are high-entropy and unique; they cannot be guessed, only leaked. A canary match in any output is definitive proof of system-prompt extraction.
  • ZK proof chain — every verdict is signed with Ed25519, hash-chained, and (for the Integrity Protocol) accompanied by a STARK proof that the verdict was correctly derived from the analysis output. See AIP verifiable verdicts.
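The hash-chaining part of the proof chain can be illustrated with the standard library alone. This sketch covers only the chaining and its verification; the Ed25519 signature and the STARK proof are out of scope here. The record layout, the all-zero genesis value, and the canonical JSON encoding are assumptions for illustration.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder previous-hash for the first record (assumed)


def record_hash(prev_hash: str, verdict: dict) -> str:
    """Hash a verdict together with the hash of the record before it."""
    payload = json.dumps(verdict, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()


def append(chain: list[dict], verdict: dict) -> None:
    prev = chain[-1]["hash"] if chain else GENESIS
    chain.append({"verdict": verdict, "prev": prev, "hash": record_hash(prev, verdict)})


def verify(chain: list[dict]) -> bool:
    """Recompute every link; any edited or reordered record breaks the chain."""
    prev = GENESIS
    for rec in chain:
        if rec["prev"] != prev or rec["hash"] != record_hash(prev, rec["verdict"]):
            return False
        prev = rec["hash"]
    return True
```

Because each hash covers the previous hash, changing one stored verdict invalidates that record and every record after it, which is what lets a third party check the sequence without trusting the party that stores it.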

Latency and cost tradeoffs

The pipeline is designed so the free tier pays nothing and still catches most threats deterministically. Semantic analysis is conditional and paid-only, so operators never pay for LLM calls that fast-path detectors have already settled.
| Stage | Typical cost per request | Latency impact |
| --- | --- | --- |
| CFD fast path | $0 | ~4ms |
| CFD semantic (conditional) | ~$0.0003 | ~200ms when triggered |
| CBD fast path | $0 | ~3ms (per chunk) |
| AIP + CBD semantic async | ~$0.001 | 0ms to user |
| AIP + CBD semantic sync (Enforce+) | ~$0.001 | +1–3s to user |
Fast-path coverage alone catches BEC fraud, canary exfiltration, PII leakage, system-prompt leakage, and the majority of known injection payloads — all at zero marginal cost.
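To make the per-request figures concrete, the following sketch estimates LLM spend for a given request volume. The per-call costs are the approximate figures from the table above; the semantic trigger rate is a hypothetical input that each operator would need to measure, and the function name is introduced here for illustration.

```python
CFD_SEMANTIC_COST = 0.0003  # approximate, per triggered request
AIP_CBD_COST = 0.001        # approximate, per request when AIP + CBD semantic runs


def estimated_llm_cost(requests: int, semantic_trigger_rate: float,
                       aip_enabled: bool = True) -> float:
    """Estimated LLM cost in dollars for a batch of requests."""
    per_request = semantic_trigger_rate * CFD_SEMANTIC_COST
    if aip_enabled:
        per_request += AIP_CBD_COST
    return requests * per_request
```

With one million requests and an assumed 20% semantic trigger rate, this comes to about $1,060: $60 for conditional CFD semantic calls and $1,000 for the combined AIP + CBD call. With no semantic stage and no AIP, as on the free tiers, the estimate is $0.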

Storage model

Three tiers, aligned to the regulatory posture. Content is stored only when required, encrypted at rest, and subject to defined TTLs.
| Tier | What is stored | Retention | GDPR scope |
| --- | --- | --- | --- |
| Tier 1: Commitment | SHA-256(content + timestamp + agent_id + session_id) and verdict. No personal data. | Permanent | Out of scope (Article 19 satisfied) |
| Tier 2: Signal | Verdict, detector scores, threat types, surface, pseudonymized session, duration. No content. | 30–90 days | Article 5(1)(e): defined TTL |
| Tier 3: Content | Actual message or response content. Encrypted AES-256-GCM per-org. CFD: only on quarantine. CBD: default off (analyze in flight, discard); exception for canary hit or regulated enterprise hold. | 30 days → auto-delete | Article 17: erasable |
For compliance detail, see Compliance posture and EU AI Act compliance.
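A Tier 1 commitment can be sketched with `hashlib`. The table specifies SHA-256 over content, timestamp, agent_id, and session_id but not how the fields are joined, so the length-prefixed encoding below is an assumption chosen to keep field boundaries unambiguous. The function name and string-typed parameters are likewise illustrative.

```python
import hashlib


def tier1_commitment(content: str, timestamp: str, agent_id: str, session_id: str) -> str:
    """SHA-256 commitment over the four fields; only this digest would be stored."""
    h = hashlib.sha256()
    for part in (content, timestamp, agent_id, session_id):
        data = part.encode("utf-8")
        # Length prefix (assumed): prevents "ab" + "c" colliding with "a" + "bc".
        h.update(len(data).to_bytes(8, "big"))
        h.update(data)
    return h.hexdigest()
```

Anyone who holds the original message and its metadata can recompute the digest and compare it with the stored commitment, while the digest alone does not reveal the content.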

Multilingual coverage

XFD supports 10 languages (EN, FR, DE, IT, ES, PT, JA, ZH, AR, KO). The design principle: maximize language-agnostic coverage; localize only what must be localized; use the LLM as the universal fallback.
| Detector | Language dependency | How handled |
| --- | --- | --- |
| DLPScanner | None: format patterns are language-agnostic | Works as-is |
| FingerprintMatcher | None: character-trigram MinHash works on any Unicode script | Works as-is |
| CanaryMatcher | None: exact string match | Works as-is |
| SessionTracker | None: score-based | Works as-is |
| SemanticAnalyzer | None: LLM handles 100+ languages natively | Works as-is |
| SignalScorer | Per-language word lists | 10 language packs |
| PatternMatcher | Latin-script: shared normalized patterns. Non-Latin: structural ASCII markers only. | Semantic always triggered for non-Latin |
Non-Latin-script content always triggers the SemanticAnalyzer, regardless of fast-path score, because regex and word-signal detectors are weaker there. The LLM compensates.

The card corpus is multilingual at build time, not runtime: when a new pattern is confirmed in one language, Arena V2 auto-generates equivalents in the other nine and fingerprints them as child cards. FingerprintMatcher then catches cross-language variants with no language-specific runtime code.
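Character-trigram MinHash is script-agnostic because it operates on Unicode code points rather than words. The sketch below computes a signature and estimates similarity between two texts. It is a generic illustration of the technique, not the FingerprintMatcher implementation: the signature size, the use of salted BLAKE2b as the hash family, and the whitespace normalization are assumptions, and the LSH banding step used for fast corpus lookup is omitted.

```python
import hashlib


def trigrams(text: str) -> set[str]:
    """Character trigrams over lowercased, whitespace-normalized text."""
    t = " ".join(text.lower().split())
    return {t[i:i + 3] for i in range(len(t) - 2)} or {t}


def minhash(text: str, num_hashes: int = 64) -> list[int]:
    """One minimum per salted hash function; similar sets share many minima."""
    grams = trigrams(text)
    signature = []
    for seed in range(num_hashes):
        salt = seed.to_bytes(8, "big")  # distinct salt per hash function
        signature.append(min(
            int.from_bytes(
                hashlib.blake2b(g.encode("utf-8"), digest_size=8, salt=salt).digest(),
                "big",
            )
            for g in grams
        ))
    return signature


def similarity(a: list[int], b: list[int]) -> float:
    """Fraction of matching positions, an estimate of Jaccard similarity."""
    return sum(x == y for x, y in zip(a, b)) / len(a)
```

A lightly reworded copy of a known attack string keeps most of its trigrams and therefore scores high against the stored signature, while unrelated text scores near zero. The same code runs unchanged on Japanese, Arabic, or Korean input.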

What XFD does not protect against

Being explicit about limits is as important as documenting coverage.
  • Agent reasoning errors. If the agent misinterprets a legitimate request and takes a harmful action, CFD does not catch this — the input was benign. AIP’s conscience check is the layer that reasons about the agent’s reasoning; see AIP specification.
  • Novel indirect injection. Semantically indistinguishable injection in large tool results may pass the fast path and the semantic layer. FingerprintMatcher reduces the gap, and Arena V2 promotes confirmed patterns into the corpus, but novel, carefully crafted payloads remain a residual risk.
  • Model-level compromise. If the underlying LLM has been fine-tuned or otherwise modified to comply with certain instruction patterns, pattern detection at the gateway does not change the model’s behavior. It only controls whether the payload reaches the model, and only when blocking is enabled.
  • Tool-call side channels. CFD screens inbound text; CBD screens outbound text. Tool call arguments are currently not covered by the synchronous path. The Agent Activity Protocol records all tool calls for post-hoc analysis (~60s latency); event-driven AAP screening on Enforce+ is on the roadmap.
For the full threat-class breakdown mapped to MITRE ATLAS, see Safe House threat model.

See also