A trust plane for agentic systems
The composition of post-hoc, real-time, and contextual oversight — unified behind a managed gateway and made verifiable by cryptographic attestation
Version: 4.0
Date: April 2026
Authors: Mnemom Research
License: CC BY 4.0
Abstract
We describe Mnemom as a trust plane for agentic systems: a single governance stance, composed of three complementary layers of oversight and a cryptographic attestation system that makes the oversight itself verifiable. The platform’s contribution is not any one protocol but the composition.
Three failure modes characterize the agentic internet. Behavioral failures — the agent takes an action that violates its declared alignment. Cognitive failures — the agent reasons itself toward an action outside its boundaries. Contextual failures — adversarial content reaches the agent from outside, or sensitive content leaves the agent unnoticed. A trust plane that addresses only one of these is a partial governance story. A trust plane that addresses all three must work as a whole, not as three systems stapled together.
Mnemom organizes that whole around a single integration point. The managed gateway is a drop-in proxy for LLM provider traffic. Customers point their existing provider client at the gateway base URL and, from that one configuration, receive three layers of oversight working in concert:
- The Agent Integrity Protocol (AIP) catches problems as the agent thinks. It analyzes the model’s thinking blocks between turns, producing an integrity verdict before the next action executes.
- The Agent Alignment Protocol (AAP) catches problems after the agent acts. It records structured decision traces (AP-Traces), verifies them against the agent’s declared alignment posture, and detects drift across sessions.
- Safe House catches problems that arrive from outside the agent or attempt to leave it unnoticed. It screens inbound user messages, tool results, and retrieved content for adversarial patterns, and optionally screens outbound agent output for data leakage.
The layers share one unified alignment contract — a pair of YAML cards that declare what the agent is and how it is defended, composed at storage time across platform, organization, and agent scopes.
Cryptographic attestation makes all of this independently verifiable. Every integrity checkpoint is signed with Ed25519, linked into a per-session hash chain, accumulated into a per-agent Merkle tree, and — for selected checkpoints — proven in an SP1 zkVM that the deterministic verdict-derivation rules were applied honestly. Aggregated Merkle roots and reputation scores are anchored on Base L2 via an ERC-8004-aligned registry, so third parties can verify an agent’s history without trusting Mnemom’s API.
This paper articulates Mnemom as a governance stance. Section 2 defines the three failure modes and what a trust plane owes each of them. Section 3 describes the platform architecture and the managed gateway as integration point. Section 4 introduces the unified alignment card, the contract the whole platform evaluates against. Sections 5.1–5.3 describe AAP, AIP, and Safe House as peers in that contract. Section 6 describes the cryptographic attestation chain, including the three zero-knowledge proof types (verdict derivation, individual risk, team risk). Section 7 describes team coherence and fleet governance. Section 8 briefly covers reputation, risk, and accountability surfaces. Section 9 states the security model and its explicit non-goals. Section 10 preserves and expands the limitations disclosure. Mnemom is transparency and verification infrastructure, not trust infrastructure. It makes behavior observable, it makes oversight integrity cryptographically verifiable, and it is explicit about what it cannot prove.
1. Introduction
1.1 The problem
Modern AI agents take actions on behalf of principals across systems the principal does not directly observe. The substrate that makes this practical — LLMs routing through tool surfaces, agents coordinating with other agents, autonomous transactions at machine speed — does not yet have a governance layer that lets principals, auditors, or downstream agents see what an agent is doing and verify that what they see is honest.
Four gaps leave this governance problem open. The post-hoc audit layer does not exist by default; where it exists, it is operator-controlled. The real-time reasoning layer is not addressed by current products, which observe behavior after the fact. The contextual-defense layer is fragmented across web application firewalls, guardrail libraries, and red-team products that do not share observability with the audit layer. And none of these layers, today, produce evidence an independent verifier can check without trusting the oversight service.
This paper describes a single governance stance that addresses all four gaps.
1.2 The gap in the current stack
The industry has built protocols for agent coordination:
| Protocol | Function | Limitation |
|---|---|---|
| MCP (Model Context Protocol) | Agent-to-tool connectivity | No alignment semantics |
| A2A (Agent-to-Agent Protocol) | Task negotiation | No value verification |
| AP2 (Agent Payments Protocol) | Payment authorization | No decision audit trail |
None of these answer:
- Is this agent serving its principal (the human or organization who delegated authority)?
- What alternatives did it consider before this decision?
- When will it escalate versus act autonomously?
- Are its values compatible with my agent’s values for this coordination?
- What is it thinking before it acts?
- What reached the agent and what left the agent?
- Can we prove the oversight was conducted honestly?
The last three questions reveal successively deeper gaps. Even a perfect post-hoc audit trail cannot prevent harm that occurs before the log is processed. Real-time integrity analysis introduces a new trust dependency in the analysis service itself. And both real-time and post-hoc analysis presume the content reaching the model is the content the user intended — a presumption that adversarial inputs break.
Comprehensive agent governance requires retroactive accountability, real-time intervention, defense against untrusted context, and cryptographic proof that the oversight process itself was honestly conducted. These four properties do not decompose cleanly; they are not four products. They are the facets of a single governance stance.
1.3 Design principles
Every layer of the trust plane follows the same seven design principles:
- Transparency over guarantee. Make decisions observable, not provably correct.
- Composability. Extend existing protocols rather than replace them.
- Honest limitations. Be explicit about what cannot be provided.
- Empirical calibration. Derive thresholds from observed behavior, not theory.
- Defense in depth. Multiple layers that constrain each other’s failure modes.
- Daimonion philosophy. Silence means aligned; voice means outside boundaries. The integrity system never commands, only warns.
- Trust the math, not the service. Verification endpoints are public, certificates are self-describing, and every cryptographic check can be reproduced independently. Security relies on mathematical guarantees, not access control.
2. What a trust plane owes the agentic internet
Before describing the architecture, we define what the architecture is for. A trust plane for agentic systems must address three distinct failure modes — distinct in time, distinct in causality, and distinct in remedy. A design that collapses them into one class of problem cannot produce a coherent response to any of them.
2.1 The three failure modes
| Failure mode | What goes wrong | When it surfaces | Whose intent is involved |
|---|---|---|---|
| Behavioral | The agent takes an action outside its declared alignment — a forbidden tool, an unescalated high-value transaction, a value it did not declare | After the action, in the record | The agent’s |
| Cognitive | The agent’s reasoning departs from its declared values — responding to injected instructions, planning a forbidden action, degenerating into incoherent chain-of-thought | During reasoning, before the action | The agent’s |
| Contextual | Adversarial content reaches the agent, or sensitive content leaves it — prompt injection in a tool result, PII smuggled in an outbound message, a canary credential that proves an exfiltration attempt | Before the agent reads, or after the agent writes | A third party’s |
Each mode is visible at a different point in the request lifecycle and requires a different kind of evidence.
- Behavioral evidence lives in the action stream. It is produced by logging, post-hoc verification against a declared contract, and cross-session pattern analysis. The natural primitive is the audit trail: a structured record of what was considered, what was selected, and why.
- Cognitive evidence lives in the thinking stream. Modern reasoning models expose internal reasoning (thinking blocks, reasoning content, thought parts) between turns. The natural primitive is the integrity checkpoint: an analysis of whether the thinking is consistent with the declared alignment, performed by a separate model acting as a conscience.
- Contextual evidence lives in the message stream entering and leaving the agent. The natural primitive is the inbound/outbound screen: a pipeline that inspects user messages, tool results, and agent outputs for adversarial patterns, hidden injection, or data leakage before they cross the agent boundary.
2.2 Why the failure modes don’t decompose
It is tempting to ship three independent products — a logging tool, a guardrail tool, and a WAF. Several incumbent stacks are shaped this way. The approach has three structural problems.
Observability is shared. A prompt injection that slips past a WAF produces behavioral evidence downstream (the agent takes a forbidden action), cognitive evidence upstream (the thinking block reveals the injection), and contextual evidence at the entry point (the injection payload). Three independent products cannot correlate these into a single incident. One platform can.
Contracts are shared. The agent’s declared alignment — what it serves, what it may do, what it must never do — is evaluated by every layer. If the declarations live in three formats across three products, every change requires three synchronized updates, and the invariant between them is operator discipline. A single contract removes the synchronization problem by design.
Attestations are shared. When a customer shows an auditor that an agent is governed, they want one chain of evidence — not three. Ed25519 signatures over a checkpoint should cover every layer’s verdict about that request. A Merkle accumulator over an agent’s history should include behavioral, cognitive, and contextual verdicts. On-chain anchoring should make the whole chain tamper-evident, not selectively so.
These three arguments — shared observability, shared contracts, shared attestations — are the architectural reason the platform is a platform. Section 3 describes the shape that results.
2.3 Non-goal: trust replacement
The trust plane does not replace trust. It makes trust legible.
A verified AP-Trace does not mean the agent is safe. A clear integrity checkpoint does not mean the thinking was sound. A Safe House pass does not mean the inbound message was benign. What these outcomes mean is that the observable substrate is consistent with the agent’s declarations — that operators, principals, auditors, and downstream agents can inspect the substrate and form their own judgments. The platform’s job is to produce honest, non-fabricatable evidence. The judgment remains human.
3.1 The managed gateway as integration point
Customers interact with Mnemom’s trust plane by pointing their existing LLM provider client at the managed gateway at gateway.mnemom.ai. The gateway proxies three provider surfaces — /anthropic/*, /openai/*, and /gemini/* — and nothing else. Requests are transparently forwarded to the upstream provider; responses are transparently returned to the client. The only thing that changes in customer code is the base URL.
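As a minimal sketch of that one-line change, the helper below builds a gateway base URL for each proxied provider surface. The `https` scheme, the helper name `gateway_base_url`, and the assumption that the provider prefix is the whole path are illustrative and not taken from the gateway documentation; the resulting string is what a customer would pass as the base URL option of their existing provider client.

```python
# Hypothetical helper: maps a provider name to its gateway surface.
GATEWAY_HOST = "gateway.mnemom.ai"
PROVIDER_SURFACES = ("anthropic", "openai", "gemini")


def gateway_base_url(provider: str) -> str:
    """Return the gateway base URL for one of the three proxied providers."""
    if provider not in PROVIDER_SURFACES:
        # The gateway proxies these three surfaces and nothing else.
        raise ValueError(f"gateway does not proxy provider: {provider!r}")
    # Assumption: the gateway is reached over HTTPS.
    return f"https://{GATEWAY_HOST}/{provider}"
```

The rest of the client code (model names, request bodies, API key) stays as it was.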
Behind that single proxy hop, the gateway activates all three oversight layers.
The key properties of this integration:
- Single configuration change. No agent framework changes, no SDK adoption required. Customers keep their existing provider library and change one base URL.
- Agent identification by key hash. The gateway computes a SHA-256 of the upstream provider’s API key to derive a stable agent identity. Raw keys are never persisted.
- Zero added synchronous latency for most traffic. The gateway uses tee() on streaming responses: one fork goes to the client immediately, the other feeds background analysis via ctx.waitUntil(). The client sees its stream at provider-native latency.
- Asynchronous oversight. Thinking-block extraction, AIP analysis, checkpoint signing, Merkle updating, OTel span emission, and webhook delivery all run in the background.
- Synchronous oversight where required. Safe House L1 heuristics run before the provider call in enforce mode, so blocked messages never reach the model. AIP analysis runs inline on non-streaming responses.
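The key-hash identification described in the list above can be sketched as follows. The function name `agent_identity` and the choice of a hex-encoded digest with no salt or prefix are assumptions; the source only states that a SHA-256 of the provider API key is used and that raw keys are never persisted.

```python
import hashlib


def agent_identity(provider_api_key: str) -> str:
    """Derive a stable agent identifier from the upstream provider API key.

    Only the digest is returned and stored; the raw key is discarded.
    Assumption: UTF-8 encoding and hex output.
    """
    return hashlib.sha256(provider_api_key.encode("utf-8")).hexdigest()
```

Because the digest is deterministic, the same key always maps to the same agent, and rotating the key yields a new identity unless the platform links the two by other means.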
The three layers are not three parallel products. They share the gateway’s request context, the agent’s unified alignment contract, a single session identity, and a single attestation chain. An AIP boundary_violation in the thinking block flags the linked AP-Trace that the observer subsequently produces. A Safe House high-risk verdict elevates the session’s cognitive scrutiny. A drift event detected post-hoc informs the conscience prompt’s session window for subsequent real-time checks.
3.2 Per-request lifecycle
The gateway processes every request through the same pipeline, regardless of which layers are active.
The lifecycle has four timescales. Synchronous (Safe House L1 heuristics on inbound): single-digit milliseconds. Near-synchronous (AIP analysis): typically 800–2000 ms, dominated by the conscience-prompt LLM call. Asynchronous (AAP trace build): within a minute, as the observer processes gateway logs. Periodic (on-chain anchoring): typically every few hours, batching many checkpoint Merkle roots into one Base L2 transaction.
Customers do not need to understand these timescales to use the platform. They matter architecturally because they determine what the platform can guarantee and when.
3.3 Shape of the trust plane
The managed gateway is the primary integration path and the one most customers adopt. It is not the only one.
- SDKs. The AAP and AIP protocols are published as open-source libraries (@mnemom/agent-alignment-protocol, @mnemom/agent-integrity-protocol). Customers who want to embed oversight in-process, rather than route through the gateway, can verify traces and run integrity checks locally. The protocols are the contract; the gateway is one implementation.
- OTel export. Customers already operating an OpenTelemetry backend (Datadog, Grafana Cloud, Splunk, Arize, Langfuse) can install @mnemom/aip-otel-exporter and receive AIP, AAP, and attestation spans directly in their existing observability stack. See Observability for the span attribute specification.
- Public verification endpoints. Third parties — regulators, downstream agents, enterprise buyers — can verify an agent’s integrity certificates without authenticating to Mnemom. See Section 6.9 and Security & Trust.
Whichever path a customer uses, the substrate is the same: one declared contract, three oversight layers, one cryptographic chain.
4. The unified alignment contract
Every layer of the trust plane evaluates against the same declared contract. In Mnemom, that contract is expressed as two YAML cards:
- The alignment card — who the agent is, what it values, what it may do, what it must never do, and how it commits to being audited.
- The protection card — how the agent is defended at runtime against adversarial inputs and outbound leakage.
The two cards are the central schema innovation in the v4 platform. They replace the pre-unified state in which an AAP alignment card, a CLPI policy YAML, and a Safe House JSON configuration lived as three disjoint artifacts that had to be synchronized by hand. The alignment card absorbs AAP-level declarations, tool-policy concerns, and the conscience-value injection schema; the protection card elevates Safe House configuration from an implementation detail to a first-class card. See Agent Cards for the customer-facing overview.
4.1 Card surfaces
The alignment card declares:
| Section | What it holds |
|---|---|
| identity | Card ID, agent ID, issued_at, optional expires_at |
| principal | Whom the agent serves (human, organization, agent, or unspecified) and the relationship type (delegated_authority, advisory, autonomous) |
| values | Declared values, their definitions, conflicts_with, optional hierarchy |
| conscience | Typed conscience values (BOUNDARY, FEAR, COMMITMENT, BELIEF, HOPE) with augment/replace mode, evaluated by AIP |
| integrity | The enforcement_mode applied to AIP verdicts (observe, nudge, enforce) |
| autonomy | bounded_actions, forbidden_actions, escalation_triggers, max_autonomous_value |
| capabilities | Tool-pattern-to-action mappings (absorbed from the former CLPI policy) |
| enforcement | Tool-policy knobs: forbidden_tools, unmapped_tool_action, fail_open, mode |
| audit | Trace format, retention, queryable, query_endpoint, tamper_evidence |
| extensions | Protocol-specific metadata (e.g., EU AI Act disclosures) |
The protection card declares:
| Section | What it holds |
|---|---|
| mode | The Safe House mode (disabled, simulate, observe, enforce) |
| thresholds | Per-signal cutoffs: warn, quarantine, block |
| screen_surfaces | Which message surfaces Safe House inspects (inbound, tool results, agent output) |
| trusted_sources | Per-source risk multipliers and trust tiers |
The full normative schemas are at /specifications/alignment-card-schema and /specifications/protection-card-schema.
4.2 Three-scope composition
Both cards compose across three scopes:
Platform policy (applied to every agent on Mnemom; the absolute floor)
    └── Org template (optional, applied to every agent in an organization)
        └── Agent card (the agent's own declaration)
            └── Canonical card (system-computed, stored, served)
Composition runs at storage time, not per request. When any input changes — an agent publishes a new card, an org updates its template, a platform policy changes, an exemption is granted — the platform recomposes the agent’s canonical card and persists it. Every gateway and observer read hits the pre-composed canonical row. The request path has zero merge cost. See Card Composition.
Field-level composition semantics vary by section, chosen to match governance intent. values.declared unions across scopes — any scope can require a value. autonomy.forbidden_actions unions with deny-overrides — once forbidden at any scope, the field cannot be un-forbidden downstream. integrity.enforcement_mode uses a strictness floor — an organization that requires enforce cannot be downgraded to observe by an individual agent. Audit commitments (retention, tamper evidence) compose toward the stronger guarantee. Protection-card thresholds compose toward the more sensitive setting. The full per-field rule table is in Card Composition.
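Three of these rules can be illustrated with a short sketch. The `ScopeCard` type, the `compose` function, and the numeric strictness ranking are hypothetical; the ranking assumes strictness increases in the order the modes are listed (observe, nudge, enforce). The normative per-field rules remain those in Card Composition.

```python
from dataclasses import dataclass, field

# Assumption: strictness increases in the order the modes are listed.
STRICTNESS = {"observe": 0, "nudge": 1, "enforce": 2}


@dataclass
class ScopeCard:
    """Hypothetical, heavily reduced view of one scope's contribution."""
    declared_values: set[str] = field(default_factory=set)
    forbidden_actions: set[str] = field(default_factory=set)
    enforcement_mode: str = "observe"


def compose(platform: ScopeCard, org: ScopeCard, agent: ScopeCard) -> ScopeCard:
    scopes = (platform, org, agent)
    return ScopeCard(
        # Union: any scope can require a value.
        declared_values=set().union(*(s.declared_values for s in scopes)),
        # Union with deny-overrides: nothing downstream can remove an entry.
        forbidden_actions=set().union(*(s.forbidden_actions for s in scopes)),
        # Strictness floor: the strictest mode across scopes wins.
        enforcement_mode=max(
            (s.enforcement_mode for s in scopes), key=STRICTNESS.__getitem__
        ),
    )
```

In this sketch an agent that declares `nudge` under an organization that requires `enforce` ends up with `enforce` in its composed card, which is the downgrade protection the paragraph describes.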
Exemptions replace the pre-unified boolean org_card_exempt flag. An exemption waives a specific section (optionally scoped to specific patterns) of a scope’s contribution for a specific agent, with a documented reason, a granted_by, and an expiry. Exemptions on BOUNDARY conscience entries are rejected at the API layer — inviolable commitments cannot be waived.
4.3 Why the composer matters
Beyond mechanical merging, the composer guarantees schema invariants. Some fields only cohere when certain other fields are present — for example, declaring audit.queryable: true requires a non-empty audit.query_endpoint, because a queryable commitment without an endpoint is a hollow governance promise. Authors only supply governance choices; operational fields (URLs, endpoints) are resolved and synthesized by the composer.
This is structurally stronger than validator-only enforcement. The validator rejects malformed input, but the composer produces well-formed output by construction — across authored layers, synthesized defaults, and applied exemptions. Legacy-data fixes become recompose operations; new invariants are added by tightening the validator and extending the composer, without schema migrations.
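The difference between the two roles can be sketched for the `audit.queryable` invariant. Both function names and the example endpoint are hypothetical; the sketch only assumes what is stated above, namely that the validator rejects a queryable commitment without an endpoint while the composer fills the endpoint in itself.

```python
def validate_audit(audit: dict) -> list[str]:
    """Validator role: report violations of the invariant, change nothing."""
    errors = []
    if audit.get("queryable") and not audit.get("query_endpoint"):
        errors.append("audit.queryable requires a non-empty audit.query_endpoint")
    return errors


def compose_audit(authored: dict, resolved_endpoint: str) -> dict:
    """Composer role: emit an audit section that satisfies the invariant.

    Authors supply the governance choice (queryable or not); the
    operational field is synthesized here. Assumption: a resolved
    endpoint is always available to the composer.
    """
    canonical = dict(authored)
    if canonical.get("queryable"):
        canonical["query_endpoint"] = resolved_endpoint
    return canonical
```

An authored section of `{"queryable": True}` fails validation on its own but passes once composed, which is what "well-formed by construction" means here.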
Every oversight layer maps from the canonical card to its internal evaluation shape, rather than reading from the authored card directly:
- The AAP SDK reads an AlignmentCard shape derived from the canonical card’s principal, values, autonomy, audit, and extensions sections.
- The AIP engine derives its card summary from the canonical card’s values, conscience, and autonomy sections; the integrity.enforcement_mode field determines how the gateway reacts to boundary verdicts.
- The policy engine reads capabilities and enforcement for tool-policy evaluation.
- The Safe House engine reads the canonical protection card for mode, thresholds, and source trust.
Because all four paths read from a single composed artifact, there is no “which card is authoritative?” question at request time. When an operator updates an org template, every affected agent’s oversight updates atomically at the next composition.
This is the schema innovation that makes the rest of the paper possible. Sections 5.1, 5.2, and 5.3 describe the three layers that evaluate against this contract.
5. The three oversight layers
5.1 Agent Alignment Protocol (AAP) — post-hoc behavioral oversight
AAP answers the behavioral question: what did this agent actually do, and is that consistent with what it declared?
5.1.1 AP-Trace
An AP-Trace is an audit log entry recording a single agent decision:
- Action: type, name, category (bounded vs. unbounded).
- Decision: alternatives considered (with optional scores and flags), selected option, selection reasoning, and values_applied.
- Escalation: which triggers were checked, whether escalation was required, and why.
- Context: session_id, optional parent trace ID for multi-step decisions.
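The four blocks listed above could be modeled roughly as follows. Only `values_applied` and `session_id` are field names taken from the text; every other name, and the flat layout itself, is an assumption made for illustration and is not the published AP-Trace schema.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Alternative:
    option: str
    score: Optional[float] = None           # optional score
    flags: list[str] = field(default_factory=list)


@dataclass
class APTrace:
    # Action
    action_type: str
    action_name: str
    action_category: str                    # "bounded" or "unbounded"
    # Decision
    alternatives: list[Alternative]
    selected: str
    selection_reasoning: str
    values_applied: list[str]
    # Escalation
    triggers_checked: list[str]
    escalation_required: bool
    escalation_reason: Optional[str]
    # Context
    session_id: str
    parent_trace_id: Optional[str] = None   # set for multi-step decisions
```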
AP-Traces are generated by the observer from gateway logs (in the managed-gateway path) or emitted by the AAP SDK (in the embedded path). The decision block is where AAP’s transparency value concentrates. By recording alternatives considered, selection reasoning, and values applied, the trace makes the agent’s decision process auditable without retaining the underlying prompts or responses.
Critical design choice: AP-Traces are sampled, not complete. They capture decision points the agent (or the gateway on its behalf) chooses to log. This is intentional — comprehensive logging of every LLM computation is neither practical nor necessarily desirable. Section 10 addresses the implications.
5.1.2 Verification
AAP provides three verification functions against the canonical alignment card:
| Function | Input | Output | Purpose |
|---|---|---|---|
| verify_trace | AP-Trace, card | VerificationResult | Check trace consistency with declared contract |
| check_coherence | Two cards | CoherenceResult | Assess pairwise multi-agent compatibility |
| detect_drift | Card, trace sequence | DriftAlert[] | Identify behavioral divergence over time |
Trace verification performs six structural checks (card reference, card expiration, autonomy compliance, forbidden compliance, escalation compliance, value consistency) plus one semantic check (behavioral similarity — how similar the trace reasoning is to the declared value semantics). Violations are typed and severity-ranked: FORBIDDEN_ACTION (critical), UNBOUNDED_ACTION (high), MISSED_ESCALATION (high), UNDECLARED_VALUE (medium), CARD_EXPIRED (medium), CARD_MISMATCH (low).
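A reduced sketch of the six structural checks follows. It is not the SDK's `verify_trace`: the dictionary field names, the pairing of each check with a violation type, and the omission of the semantic similarity check are all assumptions made to keep the example short.

```python
from datetime import datetime, timezone

SEVERITY = {
    "FORBIDDEN_ACTION": "critical",
    "UNBOUNDED_ACTION": "high",
    "MISSED_ESCALATION": "high",
    "UNDECLARED_VALUE": "medium",
    "CARD_EXPIRED": "medium",
    "CARD_MISMATCH": "low",
}


def verify_trace(trace: dict, card: dict, now: datetime) -> list[tuple[str, str]]:
    """Return (violation, severity) pairs for the structural checks only."""
    found = []
    if trace["card_id"] != card["card_id"]:            # card reference
        found.append("CARD_MISMATCH")
    expires_at = card.get("expires_at")
    if expires_at is not None and now >= expires_at:   # card expiration
        found.append("CARD_EXPIRED")
    action = trace["action_name"]
    if action in card["forbidden_actions"]:            # forbidden compliance
        found.append("FORBIDDEN_ACTION")
    elif action not in card["bounded_actions"]:        # autonomy compliance
        found.append("UNBOUNDED_ACTION")
    if trace["escalation_required"] and not trace["escalated"]:
        found.append("MISSED_ESCALATION")              # escalation compliance
    if set(trace["values_applied"]) - set(card["declared_values"]):
        found.append("UNDECLARED_VALUE")               # value consistency
    return [(name, SEVERITY[name]) for name in found]
```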
5.1.3 Drift detection
Drift detection identifies when agent behavior diverges from declared alignment over multiple sessions. The algorithm uses Self-Similarity Matrix (SSM) analysis on trace feature vectors. Thresholds were derived empirically: similarity below 0.30 sustained across 3 consecutive turns indicates genuine divergence. In the calibration corpus, a single-turn drop below the threshold was usually stylistic variation (only ≈23% reflected genuine drift), whereas a 3-turn streak reflected genuine drift in ≈87% of cases. Detected drift is classified by direction: value_drift, autonomy_expansion, principal_misalignment, or unknown.
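The streak rule can be sketched independently of how the similarity scores are produced. The function below takes a precomputed list of per-turn similarity scores (the SSM computation itself is not shown) and returns the index of the turn at which a run of sub-threshold scores first reaches the required length. The function name and the choice to return a turn index are assumptions.

```python
from typing import Optional, Sequence


def first_drift_turn(
    similarities: Sequence[float],
    threshold: float = 0.30,
    streak: int = 3,
) -> Optional[int]:
    """Return the turn index where `streak` consecutive scores fall below
    `threshold`, or None if no such run exists."""
    run = 0
    for turn, score in enumerate(similarities):
        # A score at or above the threshold resets the run, so isolated
        # single-turn drops never trigger an alert.
        run = run + 1 if score < threshold else 0
        if run >= streak:
            return turn
    return None
```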
5.1.4 Value coherence
Before two agents coordinate, they can exchange cards and check value compatibility. AAP 1.0 provides a baseline handshake — Jaccard-flavored scoring over the declared value sets — preserved for protocol stability. For production team governance, a separate v2 coherence scorer is introduced in Section 7; its contribution is that it refuses to emit a single headline score and instead reports a dimensional vector with narrative helpers.
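For reference, plain Jaccard similarity over two declared value sets looks like the sketch below. The text describes the AAP 1.0 scoring only as "Jaccard-flavored", so this is the textbook formula and not necessarily the exact handshake score; the treatment of two empty sets is an assumption.

```python
def jaccard(values_a: set[str], values_b: set[str]) -> float:
    """Size of the intersection divided by size of the union."""
    union = values_a | values_b
    if not union:
        # Assumption: two agents that declare no values are treated as compatible.
        return 1.0
    return len(values_a & values_b) / len(union)
```

Two agents declaring {honesty, privacy, safety} and {privacy, safety, speed} share two of four distinct values, giving a score of 0.5. A single number of this kind hides which values differ, which is the limitation the v2 scorer is meant to address.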
5.2 Agent Integrity Protocol (AIP) — real-time cognitive oversight
AIP answers the cognitive question: what is this agent thinking, and is that consistent with what it declared?
5.2.1 The real-time gap
AAP’s post-hoc verification is comprehensive but operates after the action has landed — typically within a minute, as the observer processes gateway logs. This delay creates a window of vulnerability:
- Prompt injection: A malicious input causes the agent to exfiltrate data. The trace records it faithfully, but the data is already gone.
- Gradual drift: Reasoning shifts subtly across turns. Each individual action passes verification; the cumulative direction is misaligned.
- Misaligned intent: The thinking block reveals intent to take a forbidden action, but the thinking is only examined retroactively.
All three scenarios are visible in thinking blocks before the action lands. AIP fills this gap.
5.2.2 Integrity Checkpoints
The Integrity Checkpoint is AIP’s primary data structure. Where an AP-Trace records what the agent did, a checkpoint records the analysis engine’s conclusions about what the agent was thinking.
Verdict types:
- clear: the thinking is consistent with the card. The daimonion is silent. Proceed.
- review_needed: the thinking raises concerns that do not constitute hard violations. The daimonion speaks a warning. May proceed; should log.
- boundary_violation: the thinking indicates intent to violate a hard boundary. The daimonion speaks forcefully. Should not proceed without escalation.
Concern categories: prompt_injection, value_misalignment, autonomy_violation, reasoning_corruption, deceptive_reasoning, undeclared_intent. Each concern carries a severity (low, medium, high, critical), a human-readable description, direct evidence quoted from the thinking block, and a reference to the relevant card field or conscience value.
Privacy by design. The full thinking block text is never stored. A SHA-256 hash (thinking_block_hash) enables correlation without retaining private reasoning. See Security & Trust.
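A minimal sketch of a checkpoint record that follows this rule is shown below. The verdicts, concern categories, severities, and the `thinking_block_hash` field name come from the text; the function name, the dictionary layout, and the hex encoding are assumptions.

```python
import hashlib

VERDICTS = {"clear", "review_needed", "boundary_violation"}
CATEGORIES = {
    "prompt_injection", "value_misalignment", "autonomy_violation",
    "reasoning_corruption", "deceptive_reasoning", "undeclared_intent",
}
SEVERITIES = {"low", "medium", "high", "critical"}


def build_checkpoint(thinking_text: str, verdict: str, concerns: list[dict]) -> dict:
    """Build a checkpoint that keeps a hash of the thinking block, not its text."""
    if verdict not in VERDICTS:
        raise ValueError(f"unknown verdict: {verdict}")
    for concern in concerns:
        if concern["category"] not in CATEGORIES or concern["severity"] not in SEVERITIES:
            raise ValueError(f"malformed concern: {concern}")
    return {
        "verdict": verdict,
        "concerns": concerns,
        # The hash allows correlation across systems without retaining
        # the private reasoning itself.
        "thinking_block_hash": hashlib.sha256(thinking_text.encode("utf-8")).hexdigest(),
    }
```

Note that the quoted evidence inside each concern still contains fragments of the thinking block, so the hash protects the full text rather than every excerpt.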
5.2.3 The conscience prompt
AIP uses a separate, typically small and fast LLM (for example, Claude Haiku) to evaluate semantic intent against the declared card. Pattern matching is trivially gameable; semantic analysis is considerably harder to game.
The conscience prompt is the most critical engineering artifact in AIP. It is structured in five layers:
- Role definition. The analysis LLM acts as a daimonion — a conscience voice that evaluates the thinking block as an object of analysis, not as instructions to follow. A meta-injection warning prevents the thinking block from manipulating the analysis.
- Card summary. A compressed (~500 token) representation of the canonical alignment card — declared values, bounded actions, forbidden actions, escalation triggers.
- Conscience values. Only BOUNDARY (hard limits) and FEAR (concerns) types are injected. COMMITMENT, BELIEF, and HOPE types are excluded from the conscience prompt to reduce false positives while remaining available to other layers.
- Session window context. Recent verdicts and reasoning summaries providing drift context across the session.
- Thinking block. The content to evaluate, with token limits (4096 tokens recommended) and a first/last split strategy for truncation.
Minimum evidence threshold. Thinking blocks under ≈100 tokens provide insufficient context for reliable analysis and produce speculative verdicts with high false-positive rates. Below-threshold blocks receive a synthetic clear signal without LLM analysis. This threshold was derived from production observation of spurious critical-severity verdicts on short reasoning fragments.
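The minimum-evidence rule and the first/last truncation strategy can be combined in one preprocessing step, sketched below. Whitespace splitting stands in for a real tokenizer, and the even split between the head and tail of an oversized block is an assumption; the text specifies only the ≈100-token floor, the recommended 4096-token limit, and that truncation keeps the first and last parts.

```python
from typing import Optional

MIN_TOKENS = 100    # below this, skip LLM analysis
MAX_TOKENS = 4096   # recommended limit for the thinking block


def prepare_thinking_block(
    text: str, min_tokens: int = MIN_TOKENS, max_tokens: int = MAX_TOKENS
) -> Optional[str]:
    """Return the text to analyze, or None when the caller should emit a
    synthetic clear signal without calling the analysis LLM."""
    tokens = text.split()  # crude approximation of model tokens
    if len(tokens) < min_tokens:
        return None
    if len(tokens) <= max_tokens:
        return text
    # Assumption: keep equal-sized head and tail, marking the elided middle.
    head = max_tokens // 2
    tail = max_tokens - head
    return " ".join(tokens[:head] + ["[...]"] + tokens[-tail:])
```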
5.2.4 Provider adapters
AIP is model-agnostic. Different providers expose reasoning differently:
| Provider | Location | Extraction confidence |
|---|---|---|
| Anthropic | content[].type === "thinking" | 1.0 |
| OpenAI | reasoning_content field | 0.9 |
| Google Gemini | Thinking content parts | 0.9 |
| Fallback | Response-text inference | 0.3 |
Adapters extract thinking from both JSON responses and SSE streams. The confidence score flows into the checkpoint’s analysis_metadata.extraction_confidence, enabling downstream consumers to weight verdicts by the reliability of their input.
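A sketch of the Anthropic adapter with the low-confidence fallback follows. It handles only a parsed JSON response shaped like an Anthropic Messages reply, where thinking blocks carry their text in a `thinking` field; SSE handling and the other providers are omitted. Treating the visible text blocks as the fallback input is an assumption about what "response-text inference" starts from.

```python
def extract_thinking(response: dict) -> tuple[str, float]:
    """Return (thinking_text, extraction_confidence) for one response."""
    blocks = response.get("content", [])
    thinking = [b.get("thinking", "") for b in blocks if b.get("type") == "thinking"]
    if thinking:
        # Native thinking blocks: highest confidence in the table above.
        return "\n".join(thinking), 1.0
    # Fallback: no exposed reasoning, so only the visible text is available.
    visible = [b.get("text", "") for b in blocks if b.get("type") == "text"]
    return "\n".join(visible), 0.3
```

A consumer can then carry the second element into `analysis_metadata.extraction_confidence` and discount verdicts built on fallback input.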
5.2.5 Session windowing and integrity drift
AIP maintains a sliding window of recent checkpoints per session. The window feeds the conscience prompt’s context, supports session-scale drift detection (3+ consecutive non-clear verdicts triggers a drift alert), and exposes an integrity ratio — the proportion of clear verdicts in the window — as a single-number health metric. Integrity drift directions mirror AAP drift: injection_pattern, value_erosion, autonomy_creep, deception_pattern, unknown.
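The window, the integrity ratio, and the consecutive-verdict rule can be sketched together. The class name, the default window size of 10, and returning 1.0 for an empty window are assumptions; the three-verdict streak comes from the text.

```python
from collections import deque


class SessionWindow:
    """Sliding window over the most recent checkpoint verdicts of a session."""

    def __init__(self, size: int = 10):  # window size is an assumption
        self.verdicts = deque(maxlen=size)

    def add(self, verdict: str) -> None:
        self.verdicts.append(verdict)  # the oldest verdict falls out when full

    def integrity_ratio(self) -> float:
        """Proportion of clear verdicts in the window."""
        if not self.verdicts:
            return 1.0  # assumption: an empty window counts as healthy
        return sum(v == "clear" for v in self.verdicts) / len(self.verdicts)

    def drift_alert(self, streak: int = 3) -> bool:
        """True when the latest `streak` verdicts are all non-clear."""
        recent = list(self.verdicts)[-streak:]
        return len(recent) == streak and all(v != "clear" for v in recent)
```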
5.2.6 Signal delivery and enforcement
AIP produces a signal with a proceed boolean and a recommended action (continue, log_and_continue, pause_for_review, deny_and_escalate). In the managed-gateway path, the signal is consumed by the gateway itself — the integrity.enforcement_mode field in the canonical alignment card determines whether the gateway logs, injects a nudge into the agent’s next context, or actively blocks the pending action. In the embedded path, customers consume the signal via SDK callback or HMAC-signed webhook.
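On the embedded path, a receiver would typically verify the webhook signature before acting on the signal. The sketch below assumes HMAC-SHA256 over the raw request body with a hex-encoded signature; the text says only that webhooks are HMAC-signed, so the hash function, the encoding, and both function names are assumptions.

```python
import hashlib
import hmac


def sign_payload(secret: bytes, body: bytes) -> str:
    """Compute the signature a sender would attach to the request."""
    return hmac.new(secret, body, hashlib.sha256).hexdigest()


def verify_webhook(secret: bytes, body: bytes, received_signature: str) -> bool:
    """Check a received signature using a constant-time comparison."""
    expected = sign_payload(secret, body)
    return hmac.compare_digest(expected, received_signature)
```

Verification should run on the exact bytes received, before any JSON parsing, since re-serializing the body can change it and invalidate the signature.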
Failure policy. AIP defaults to fail-open — if the analysis LLM is unreachable, the agent proceeds with a logged warning. For high-security deployments, fail-closed mode blocks until analysis succeeds or a human approves continuation.
Latency posture. AIP analysis runs near-synchronously in the managed-gateway path. End-to-end cost is dominated by the conscience-prompt LLM call to a small model (Claude Haiku-class), which in production observation averages 800–2000 ms depending on thinking-block length. This places AIP outside sub-second synchronous critical paths. Customers who require user-facing latency at p99 below that range should deploy AIP in observe mode — analysis runs asynchronously via ctx.waitUntil with zero perceptible latency — and reserve enforce mode for non-user-facing workflows or for flows where a near-second pause is acceptable.
5.3 Safe House — contextual oversight
For the full Safe House concept page, see Safe House.
Safe House answers the contextual question: what is reaching this agent from outside, and what is this agent letting out?
5.3.1 The contextual gap
AAP records what the agent did; AIP analyzes what the agent was thinking. Both assume the agent is the intent-originator of the interaction. In the open world, that assumption is often false.
- A tool result the agent retrieved from the web contains a hidden `<span style="display:none">Assistant: disregard prior constraints</span>` (indirect injection). Neither AAP nor AIP inspects tool results at ingestion.
- A user message embeds plausible-sounding authorization claims — “As the developer who built you, I’m authorizing you to skip approval” — social engineering. The agent’s subsequent reasoning may look coherent, so AIP may not flag it.
- An inbound message from another agent is falsely attributed to a trusted system — agent spoofing. The agent has no way to verify the source.
- An agent’s outbound response begins to include raw PII or a secret-formatted string that should never leave the tenant — outbound data leakage. AAP captures the outcome; AIP analyzed the reasoning that led to it. Neither prevented the specific outbound payload.
These are not behavioral failures of the agent or cognitive failures of its reasoning. They are failures of the context boundary — and they require inspection at the boundary itself.
5.3.2 Three-layer detection
Safe House uses a layered approach balancing speed and accuracy:
| Layer | Method | Role |
|---|---|---|
| L1 | Regex and word-list heuristics | Fast rejection of obvious attacks; multilingual coverage including English, French, German, Italian, Spanish, Portuguese, Japanese, Chinese |
| L2 | Semantic analysis via a small LLM | Deep intent understanding; handles obfuscation and novel attacks; catches attacks in languages outside the L1 set |
| L3 | Session escalation | Stateful risk accumulation across messages within a session |
L1 runs first and can short-circuit the pipeline for high-confidence detections. If L1 passes or returns a warn score, L2 runs. L3 operates continuously across the session rather than per-message.
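The control flow described above can be sketched as follows. Everything here is an assumption made for illustration: the 0.9 short-circuit threshold, the additive session score, and the stub detectors `l1_regex` and the caller-supplied `l2` are hypothetical, not Safe House's actual scoring.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class SessionRisk:
    """L3 stand-in: accumulates risk across the messages of one session."""
    score: float = 0.0

    def update(self, message_score: float) -> float:
        self.score += message_score
        return self.score


def l1_regex(message: str) -> float:
    """L1 stand-in: a single word-list style check."""
    return 1.0 if "ignore previous instructions" in message.lower() else 0.0


def screen(
    message: str,
    l1: Callable[[str], float],
    l2: Callable[[str], float],
    session: SessionRisk,
    block_at: float = 0.9,  # hypothetical high-confidence threshold
) -> dict:
    l1_score = l1(message)
    if l1_score >= block_at:
        # High-confidence L1 detection: short-circuit, skip the L2 call.
        score, layer = l1_score, "L1"
    else:
        # L1 passed or only warned: run the slower semantic layer.
        score, layer = max(l1_score, l2(message)), "L2"
    # L3 is stateful: it sees every message, whichever layer decided.
    session.update(score)
    return {"score": score, "decided_by": layer, "session_risk": session.score}
```

The point of the ordering is cost: the expensive semantic call is only paid for messages the cheap layer could not decide.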
5.3.3 Threat taxonomy
Safe House classifies detected threats into a taxonomy that is independent of which layer detected them:
- prompt_injection: direct attempts to override the agent’s system prompt.
- indirect_injection: adversarial content in retrieved documents, tool outputs, or external data.
- social_engineering: psychological manipulation to bypass agent judgment.
- bec_fraud: business-email-compromise patterns targeting financial workflows.
- agent_spoofing: messages falsely claiming to originate from a trusted agent or system.
- hijack_attempt: attempts to redirect agent goals mid-session.
- data_exfiltration: attempts to surface protected data.
- privilege_escalation: attempts to claim permissions the sender does not have.
- pii_in_inbound: personal data in user messages that should not enter agent context.
5.3.4 Bidirectional screening
Safe House screens in both directions:
- Inbound. Evaluates user messages, tool results, and retrieved context before they reach the agent. This is the primary use case; most attacks arrive here.
- Outbound. Scans agent responses for data leaks — PII patterns, secret formats, or content that should not leave the agent’s context — before the response is returned to the caller. Configured separately; applies regardless of inbound mode.
The symmetry matters. An agent compromised by a contextual attack rarely telegraphs the compromise in a boundary-violating action; the tell is often in the outbound payload. Outbound screening turns “the agent was tricked into sending X” into a blockable event rather than a post-hoc investigation.
5.3.5 Canary credentials
Safe House supports planting canary credentials: fake API keys, tokens, or identifiers placed inside agent context. Any inbound message containing a canary pattern triggers an automatic block verdict regardless of other scoring, and emits a webhook event. Canaries produce zero false positives by construction: a canary has no legitimate use, so it can appear in an inbound message only after leaking from the agent’s context. They convert silent compromise into loud, first-class evidence.
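A minimal sketch of the canary override, assuming plain substring matching. The canary value, the `inbound_verdict` function, and the `on_canary` callback (standing in for the webhook emission) are hypothetical.

```python
from typing import Callable, Iterable, Optional

# Hypothetical planted value; a real canary would be generated per tenant.
CANARIES = {"sk-canary-0000EXAMPLE"}


def inbound_verdict(
    message: str,
    scored_verdict: str,
    canaries: Iterable[str] = CANARIES,
    on_canary: Optional[Callable[[str], None]] = None,
) -> str:
    """Return 'block' on any canary hit, otherwise the normally scored verdict."""
    for canary in canaries:
        if canary in message:
            if on_canary is not None:
                on_canary(canary)  # stand-in for emitting the webhook event
            return "block"  # overrides whatever the scoring layers decided
    return scored_verdict
```

The override is deliberately unconditional: the canary check does not feed into the score, it replaces the verdict.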
5.3.6 Source trust
Not all message sources carry the same risk. Per-source risk_multiplier values tune sensitivity — an internal-scheduler source with multiplier 0.0 is fully trusted (analysis still runs but scores are suppressed); an unverified-user-input source with multiplier 2.0 doubles the computed risk score before thresholding. This turns per-source trust from a binary into a continuous dial.
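As a worked example of the arithmetic, the sketch below applies the two multipliers named above. The 0.7 threshold, the default multiplier of 1.0 for unlisted sources, and the function names are assumptions made for illustration.

```python
SOURCE_MULTIPLIERS = {
    "internal-scheduler": 0.0,      # fully trusted: scores are suppressed
    "unverified-user-input": 2.0,   # doubles the computed risk score
}
DEFAULT_MULTIPLIER = 1.0  # assumption: unlisted sources are scored as-is
BLOCK_THRESHOLD = 0.7     # hypothetical threshold


def effective_risk(raw_score: float, source: str) -> float:
    """Scale the computed risk score by the source's multiplier."""
    return raw_score * SOURCE_MULTIPLIERS.get(source, DEFAULT_MULTIPLIER)


def exceeds_threshold(raw_score: float, source: str,
                      threshold: float = BLOCK_THRESHOLD) -> bool:
    """Thresholding happens after the multiplier is applied."""
    return effective_risk(raw_score, source) >= threshold
```

Under these assumptions the same raw score of 0.4 stays below the threshold for an unlisted source but crosses it for `unverified-user-input`, where it becomes 0.8.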
5.3.7 Modes
Safe House modes align with the same observe / enforce vocabulary used by the alignment card’s integrity.enforcement_mode:
- Disabled. Default. No analysis runs.
- Simulate. Full analysis runs synchronously; no messages are blocked. Verdicts are returned in a response header. Used to understand the threat landscape before enabling enforcement.
- Observe. Analysis runs asynchronously via waitUntil, with zero latency impact. Verdicts and session risk scores are logged.
- Enforce. Full synchronous analysis. Messages receiving quarantine or block verdicts do not reach the agent.
Simulate is typically the first mode a new customer enables. Deploying Safe House in simulate first produces a no-impact risk picture; observe turns that picture into a signal; enforce makes it a control.
5.3.8 Cross-layer feedback
Safe House is not an independent gate. Its output enriches the other oversight layers:
- Into AIP. A message that passed Safe House but scored close to a threshold is surfaced to the AIP conscience prompt as context — enabling the cognitive layer to apply extra scrutiny to reasoning that operates on suspicious input.
- From AIP. A high boundary-violation rate within a session elevates Safe House’s L3 session risk score — sustained cognitive distress makes the session’s contextual pipeline more sensitive for subsequent messages.
- Into AAP. Safe House events are emitted to the same observer pipeline as AP-Traces and integrity checkpoints, so a behavioral audit can correlate a suspicious action with an inbound contextual event.
This cross-layer feedback is the architectural reason Safe House is a peer of AAP and AIP, not an add-on.
6. Cryptographic attestation
6.1 The verification gap
AAP provides post-hoc transparency. AIP provides real-time integrity analysis. Safe House provides contextual defense. Each produces evidence, and each introduces a new trust dependency: the service that produces the evidence.
A centralized oversight service could, in principle:
- Fabricate verdicts. Return clear for genuinely problematic reasoning, or boundary_violation for benign reasoning, depending on incentive.
- Modify history. Alter stored checkpoints after the fact to cover missed violations or insert false ones.
- Operate inconsistently. Apply different rules to different customers, or change rules silently.
- Fail silently. Stop performing analysis while returning plausible-looking results.
These failure modes are not hypothetical for any centralized service. The verification gap is the distance between “this service says the agent is aligned” and “we can independently confirm this service applied its stated rules honestly.”
Cryptographic attestation closes this gap. It transforms oversight verdicts from claims into evidence — evidence that can be verified by any party, without trusting the Mnemom API, without trusting the analysis service, and without trusting any component beyond the underlying cryptographic primitives.
6.2 The attestation chain
The attestation architecture comprises five layers, each providing a distinct security guarantee:
| Layer | Primitive | Guarantee | Verification |
|---|---|---|---|
| 1. Input commitment | SHA-256 over sorted-key canonical inputs | Verdict is bound to specific inputs | Fully offline |
| 2. Signatures | Ed25519 | Verdict was issued by a known key | Fully offline |
| 3. Hash chains | SHA-256 preimage per session | Checkpoint sequence is tamper-evident | Fully offline |
| 4. Merkle trees | Per-agent binary Merkle accumulator | Checkpoint exists in the complete log | Online root lookup |
| 5. ZK proofs | SP1 STARK via zkVM | Deterministic rules were applied correctly | Fully offline |
Layers degrade gracefully. If the ZK prover is unavailable, the first four layers still provide meaningful guarantees. If the Merkle tree has not yet been updated, signatures and chains still hold. This layered design ensures that attestation enhances rather than gates the oversight pipeline.
6.3 Input commitment
Before analysis begins, a deterministic SHA-256 commitment is computed over all inputs:
input_commitment = SHA-256(
    thinking_block_hash ||
    SHA-256(alignment_card) ||
    SHA-256(conscience_values) ||
    analysis_model_version ||
    prompt_template_version ||
    SHA-256(window_context)
)
Individual component hashes are also stored separately, enabling selective disclosure: a verifier can confirm that a specific card was used without requiring access to the conscience values. Sorted-key JSON serialization ensures determinism across implementations; the pipe delimiter prevents ambiguity between field boundaries.
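The commitment can be sketched with the standard library. Several details below are assumptions rather than the specified encoding: hex-encoded component hashes, a single `|` character as the field delimiter, UTF-8 encoding, and compact JSON separators. A conforming verifier would need to match the published specification byte for byte.

```python
import hashlib
import json


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def canonical_json(obj) -> bytes:
    """Sorted-key JSON, so key order in the input cannot change the hash."""
    return json.dumps(obj, sort_keys=True, separators=(",", ":")).encode("utf-8")


def input_commitment(
    thinking_block_hash: str,
    alignment_card: dict,
    conscience_values: dict,
    analysis_model_version: str,
    prompt_template_version: str,
    window_context: list,
) -> str:
    # Component hashes are computed first so they can be stored and
    # disclosed individually.
    parts = [
        thinking_block_hash,
        sha256_hex(canonical_json(alignment_card)),
        sha256_hex(canonical_json(conscience_values)),
        analysis_model_version,
        prompt_template_version,
        sha256_hex(canonical_json(window_context)),
    ]
    # The delimiter keeps adjacent fields from running into each other.
    return sha256_hex("|".join(parts).encode("utf-8"))
```

Because the card is serialized with sorted keys, two implementations that build the same card in a different key order arrive at the same commitment.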
6.4 Ed25519 checkpoint signing
Every checkpoint carries a digital signature from the analysis service’s Ed25519 keypair. Third parties verify signatures against published public keys without contacting Mnemom’s servers.
The signed payload is a deterministic JSON string with lexicographically sorted keys over {agent_id, chain_hash, checkpoint_id, input_commitment, thinking_block_hash, timestamp, verdict}. Key rotation is supported from day one: signing keys carry a key_id, a public_key (hex-encoded), an algorithm, a created_at, and an is_active flag. The public-key listing endpoint (GET /v1/keys) returns all active keys for cached verification.
6.5 Hash chain linking
Each checkpoint includes the hash of the previous checkpoint for the same session, creating a tamper-evident sequence:
chain_hash = SHA-256(
    prev_chain_hash || checkpoint_id || verdict ||
    thinking_block_hash || input_commitment || timestamp
)
For the first checkpoint, prev_chain_hash is the literal string genesis. Deleting or reordering checkpoints breaks the chain; modifying any checkpoint invalidates all subsequent chain hashes. This is blockchain-flavored tamper evidence without the consensus overhead.
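A minimal sketch of building and re-verifying such a chain. The `|` delimiter, hex encoding, and the `example_checkpoint` fixture are assumptions made for illustration; only the field order and the `genesis` starting value come from the text above.

```python
import hashlib
from typing import Dict, List

GENESIS = "genesis"  # prev_chain_hash for the first checkpoint


def chain_hash(prev_chain_hash: str, cp: Dict[str, str]) -> str:
    fields = [
        prev_chain_hash,
        cp["checkpoint_id"],
        cp["verdict"],
        cp["thinking_block_hash"],
        cp["input_commitment"],
        cp["timestamp"],
    ]
    return hashlib.sha256("|".join(fields).encode("utf-8")).hexdigest()


def build_chain(checkpoints: List[Dict[str, str]]) -> List[str]:
    """Return the chain hash of every checkpoint, in session order."""
    prev, hashes = GENESIS, []
    for cp in checkpoints:
        prev = chain_hash(prev, cp)
        hashes.append(prev)
    return hashes


def verify_chain(checkpoints: List[Dict[str, str]], claimed: List[str]) -> bool:
    """Recompute the chain from genesis and compare with the claimed hashes."""
    return build_chain(checkpoints) == claimed


def example_checkpoint(n: int) -> Dict[str, str]:
    """Hypothetical fixture with placeholder field values."""
    return {
        "checkpoint_id": f"cp-{n}",
        "verdict": "clear",
        "thinking_block_hash": f"tb-{n}",
        "input_commitment": f"ic-{n}",
        "timestamp": f"2026-04-01T00:00:0{n}Z",
    }
```

Changing the verdict of the first checkpoint changes its chain hash, and because each hash feeds the next, every later hash changes with it.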
6.6 Merkle accumulator
A per-agent append-only binary Merkle tree accumulates all checkpoint leaf hashes. This enables:
- Inclusion proofs. Prove that a specific checkpoint exists in the agent’s history in O(log N) hashes, without revealing other checkpoints.
- Completeness proofs. Any deletion changes the Merkle root; previously-cached roots make deletion detectable.
Each leaf: SHA-256(checkpoint_id || verdict || thinking_block_hash || chain_hash || timestamp). Inclusion proofs contain O(log N) sibling hashes, each annotated with its position relative to the path node. The Merkle root is publicly accessible at GET /v1/agents/{id}/merkle-root.
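Inclusion-proof verification can be sketched as below. The tree construction is an assumption: this sketch duplicates the last node on odd-sized levels and hashes plain concatenations, which is a common textbook scheme and not necessarily the accumulator's actual layout. The sibling position labels `left` and `right` are likewise hypothetical names for the annotation the text describes.

```python
import hashlib
from typing import List, Tuple


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def _next_level(level: List[bytes]) -> List[bytes]:
    return [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]


def merkle_root(leaves: List[bytes]) -> bytes:
    """Root of a binary tree over a non-empty list of leaf hashes."""
    level = list(leaves)
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])  # assumption: duplicate the odd node
        level = _next_level(level)
    return level[0]


def inclusion_proof(leaves: List[bytes], index: int) -> List[Tuple[bytes, str]]:
    """Sibling hashes from leaf to root, each tagged with its side."""
    proof, level = [], list(leaves)
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])
        sibling = index ^ 1  # the other node of the same pair
        proof.append((level[sibling], "left" if sibling < index else "right"))
        level = _next_level(level)
        index //= 2
    return proof


def verify_inclusion(leaf: bytes, proof: List[Tuple[bytes, str]], root: bytes) -> bool:
    node = leaf
    for sibling, position in proof:
        node = h(sibling + node) if position == "left" else h(node + sibling)
    return node == root
```

A verifier needs only the leaf, the proof, and a trusted root; for five leaves the proof holds three sibling hashes, and no other checkpoint is revealed.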
6.7 Integrity certificates
The integrity certificate packages all cryptographic evidence for a checkpoint into a self-describing JSON document. The structure draws on C2PA Content Credentials and the W3C Verifiable Credentials Data Model.
A certificate contains: the subject (checkpoint, agent, session, card IDs); the claims (verdict, concerns, confidence, analysis model, duration); the input commitments (hash components and the combined hash); and the proofs (the Ed25519 signature, the chain hash and position, the Merkle leaf hash, leaf index, root, tree size, and inclusion proof, and — when available — the SP1 STARK receipt and image ID for the verdict-derivation proof). A verification block exposes the URLs for the public keys endpoint, the Merkle root endpoint, the verify endpoint, and the specification.
Certificates are designed for offline verification of signatures, chains, and ZK proofs; Merkle inclusion requires a single online lookup of the current root.
6.8 Zero-knowledge proofs: three types
The platform generates three distinct classes of zero-knowledge proof, each answering a different question a verifier may ask. All are produced by SP1 guest programs on GPU-accelerated proving infrastructure.
6.8.1 Verdict derivation proof
“Given this LLM analysis output, applying the deterministic checkIntegrity rules produces the claimed verdict.”
This is the central attestation contribution of the platform. An integrity verdict is the composition of two stages:
- An LLM analyzes a thinking block and produces structured JSON with typed concerns (semantic, expensive, non-deterministic).
- Deterministic rules map those concerns to a verdict and recommended action (mechanical, cheap, deterministic).
Stage 1 is prohibitively expensive to prove cryptographically with current zero-knowledge technology. Stage 2 is a few thousand RISC-V cycles — trivial for a zkVM.
What the proof proves. The verdict was derived by applying specific, deterministic rules to the analysis response; the analysis response (private witness) produces the claimed verdict (public output); the thinking hash, card hash, and values hash match the committed inputs; no verdict fabrication or substitution occurred.
What the proof does not prove. That the LLM was actually called (input commitments and signing handle this). That the LLM response was unmodified (Layer 2 signatures handle this). That the LLM correctly identified all concerns — this is the residual trust assumption, explicitly stated in Section 10.9.
The rules (executed inside the zkVM):
- Any concern with critical severity produces boundary_violation.
- Any concern with high severity in the prompt_injection, deceptive_reasoning, or value_misalignment categories produces boundary_violation.
- Any remaining medium or higher severity concern produces review_needed.
- No medium-or-higher concerns produces clear.
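The rule list above can be sketched as a small pure function. This is an illustration in Python rather than the TypeScript or Rust implementations the platform runs; the `low` severity level and the concern dictionary shape are assumptions.

```python
from typing import Dict, List

# Assumption: "low" exists below the three severities named in the rules.
SEVERITY_RANK = {"low": 0, "medium": 1, "high": 2, "critical": 3}
ESCALATING_CATEGORIES = {"prompt_injection", "deceptive_reasoning", "value_misalignment"}


def derive_verdict(concerns: List[Dict[str, str]]) -> str:
    """Map typed concerns to a verdict; rules are checked in priority order."""
    # Rule 1: any critical concern is a boundary violation.
    if any(c["severity"] == "critical" for c in concerns):
        return "boundary_violation"
    # Rule 2: high severity in one of the escalating categories.
    if any(
        c["severity"] == "high" and c["category"] in ESCALATING_CATEGORIES
        for c in concerns
    ):
        return "boundary_violation"
    # Rule 3: anything remaining at medium or above needs review.
    if any(SEVERITY_RANK[c["severity"]] >= SEVERITY_RANK["medium"] for c in concerns):
        return "review_needed"
    # Rule 4: nothing at medium or above.
    return "clear"
```

Because the function has no hidden state and no randomness, the same concerns always yield the same verdict, which is the property that makes this stage cheap to prove.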
Cross-language conformance tests guarantee that TypeScript and Rust implementations produce bit-identical outputs for identical inputs. The guest program runs in no_std Rust against the RISC-V target, deserializes the analysis JSON, normalizes concern evidence, re-derives the verdict, and commits the output to the STARK journal.
6.8.2 Individual risk proof
“Given this agent’s reputation components and violation history, applying the risk formula with context weighting, recency decay, and composite scoring produces the claimed risk score.”
The individual-risk proof generalizes the auditor-model distinction to risk assessment. The inputs — reputation components, violation history, context parameters — are committed; the deterministic scoring function is executed inside the zkVM; the output risk score is public. Customers and auditors can verify that a published risk score was derived from the stated inputs by the stated formula, without trusting the risk service.
6.8.3 Team risk proof
“Given this team’s member composition and per-agent signals, applying the three-pillar team risk model produces the claimed team risk score and per-agent attributions.”
The three-pillar model decomposes team risk into Aggregate Quality, Coherence Quality, and Structural Risk. Per-agent attributions use leave-one-out Shapley values — marginal contributions computed by comparing the full-team score to each single-member-removed score — with circuit-breaker logic to guard against pathological inputs. The ZK proof commits the team composition, proves the three-pillar computation, and exposes the per-agent attribution vector.
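The leave-one-out comparison can be sketched as follows. The scoring function here is a hypothetical mean of made-up per-agent quality values, standing in for the three-pillar computation, and the size guard is only a placeholder for the circuit-breaker logic. Note that exact Shapley values average marginal contributions over all coalitions; the single-removal comparison below is the cheaper calculation the paragraph describes.

```python
from statistics import mean
from typing import Callable, Dict, Sequence


def leave_one_out(
    members: Sequence[str],
    team_score: Callable[[Sequence[str]], float],
) -> Dict[str, float]:
    """Attribute to each member: full-team score minus score without them."""
    if len(members) < 2:
        # Placeholder guard: removing the only member leaves nothing to score.
        return {m: 0.0 for m in members}
    full = team_score(members)
    return {
        m: full - team_score([x for x in members if x != m])
        for m in members
    }


# Hypothetical per-agent quality values, for illustration only.
QUALITY = {"planner": 0.75, "executor": 0.5, "reviewer": 0.25}


def mean_quality(members: Sequence[str]) -> float:
    return mean(QUALITY[m] for m in members)
```

With these values the full-team score is 0.5; removing `planner` drops it to 0.375, so `planner` is attributed +0.125, `executor` 0, and `reviewer` -0.125.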
6.9 Optimistic proving strategy
Not every checkpoint requires a STARK proof. The attestation layer uses a hybrid approach:
| Mode | When | Cost posture |
|---|---|---|
| Immediate | Every checkpoint | Layers 1–4 applied synchronously |
| Always | Every boundary_violation | STARK proof generated automatically |
| Stochastic | Random sample of other checkpoints | Amortized random sampling |
| On-demand | Any checkpoint, upon request | Customer-initiated via API |
| Enterprise | Every checkpoint (enterprise tier) | All checkpoints proven |
The row names above are a conceptual taxonomy of current proving policy, not official API enums. The two surfaces that reach customer-visible controls are the on_demand_proving feature gate (the “On-demand” row — POST /v1/checkpoints/{id}/prove) and the zk_proofs feature gate (GET /v1/checkpoints/{id}/proof). The remaining rows are internal policy that the platform applies on behalf of the customer.
The highest-stakes verdicts (boundary_violation) always receive cryptographic proof; stochastic sampling of other checkpoints maintains probabilistic deterrence — an attacker cannot predict which checkpoints will be proven.
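The proving policy can be sketched as a single decision function. The 5% sample rate, the parameter names, and the function itself are hypothetical; the table rows are a conceptual taxonomy, so this is not a description of an API.

```python
import random
from typing import Callable


def should_prove(
    verdict: str,
    requested: bool = False,        # customer asked for a proof on demand
    enterprise: bool = False,       # tier where every checkpoint is proven
    sample_rate: float = 0.05,      # hypothetical stochastic sampling rate
    rng: Callable[[], float] = random.random,
) -> bool:
    """Decide whether a checkpoint gets a STARK proof."""
    if verdict == "boundary_violation":
        return True  # highest-stakes verdicts are always proven
    if requested or enterprise:
        return True
    # Everything else is sampled, so provers cannot be predicted in advance.
    return rng() < sample_rate
```

Injecting `rng` keeps the sketch testable; a production sampler would need an unpredictable source for the deterrence argument to hold.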
6.10 On-chain anchoring
Aggregated Merkle roots and published reputation scores are anchored on Base L2 via two ERC-8004-aligned smart contracts:
- MnemoReputationRegistry at 0xfba717a6c4eb481a74f6911954a625242a048425: publishes individual agent reputation scores (0–1000), letter grades (AAA–CCC), and batch roots. Exposes publishScore, publishBatch (up to 200 agents), getScore, getScoreHistory, getLatestBatchRoot, and getTotalPublishedAgents.
- MnemoMerkleAnchor at 0xdfdbc9374907d5adea8f100a0d1e07e16c99816d: anchors the aggregated Merkle root derived from per-agent checkpoint Merkle trees. Exposes anchorRoot, getLatestRoot, getRootByIndex, getRootCount, and isRootAnchored.
Anchoring is periodic, not per-checkpoint. A frequency of several hours balances cost efficiency against tamper-evidence windows; typical Base L2 cost per anchor is a small number of US cents at current network conditions. Any party can call isRootAnchored(root) to confirm that a root obtained from the off-chain API has been anchored — establishing tamper-evidence without relying on Mnemom’s infrastructure. See On-Chain Verification for the customer-facing flow.
Agent identifiers do not appear in any Merkle leaf pre-image or in on-chain data. The linkage from agent to Merkle root lives only in Mnemom’s off-chain mapping table. This is load-bearing for GDPR: deleting the off-chain mapping severs all linkage to the anchored root, without requiring any on-chain mutation (which would be impossible). See GDPR data subject rights.
6.11 Verification API and offline verification
Public verification endpoints require no authentication — security relies on cryptographic guarantees, not access control:
| Endpoint | Purpose |
|---|---|
| GET /v1/keys | List active signing public keys (Ed25519, hex-encoded) |
| GET /v1/checkpoints/{id}/certificate | Retrieve the full integrity certificate |
| POST /v1/verify | Verify a certificate (all layer checks) |
| GET /v1/agents/{id}/merkle-root | Current Merkle root plus metadata |
| GET /v1/checkpoints/{id}/inclusion-proof | Merkle inclusion proof for a specific checkpoint |
| POST /v1/checkpoints/{id}/prove | Request a STARK proof on demand |
| GET /v1/checkpoints/{id}/proof | Retrieve STARK proof status and data |
Verification performs five independent checks: signature validity against the published public key; chain hash recomputation; Merkle root recomputation from the leaf hash and inclusion proof siblings; combined_commitment consistency with the individual input hashes; and — when available — the SP1 STARK receipt verified against the expected image ID, with the journal output compared to the certificate’s verdict and concerns.
An offline verifier can check signatures, chains, and ZK proofs without any network calls, given a trusted public key. Only Merkle inclusion requires a single online lookup of the current root. This enables air-gapped auditing, CI/CD pipelines, and regulatory tooling to verify certificates independently.
6.12 The auditor-model distinction
The central conceptual move of the attestation layer is the distinction between proving model inference and proving auditor judgment.
| Approach | What is proven | Posture |
|---|---|---|
| Full inference proofs (Modulus Labs, Giza) | Neural network forward pass | Research-stage for frontier models |
| Hardware attestation (EQTY, Intel SGX) | Code executed in a secure enclave | Requires specific hardware and manufacturer trust |
| Signature-only attestation (C2PA, W3C VC) | Issuer signed a claim | Proves authorship, not computation |
| Verdict-derivation proof (this work) | Deterministic rules applied honestly | Production; operational |
The platform occupies a practical middle ground: stronger than signature-only attestation, because computation is proven rather than only signed; far cheaper than full inference proofs, because only the deterministic judgment layer is proven; complementary to hardware attestation, which could cover the LLM analysis stage that the proof explicitly does not cover.
The guarantee is precise: “Given these concerns, the verdict derivation was honest.” The non-guarantee is equally precise: “The concerns themselves were correctly identified.” Both are communicated to verifiers alongside the certificate.
7. Team coherence and fleet governance
An agent rarely operates alone. The trust plane’s per-agent primitives (cards, traces, checkpoints, certificates) compose upward into fleet-level governance: “does this team of agents share the governance floor we require?”, “where are the fault lines?”, “who is the outlier?”
7.1 Why fleets need a different primitive
The natural first instinct is to compute a single “fleet coherence score” from pairwise comparisons. The platform shipped this shape first, with Jaccard-flavored scoring over declared value sets. Observations on the platform’s showcase fixtures revealed three structural problems with the single-number output:
- Silence counted as disagreement. A value declared by one agent that another agent does not mention deflates the score — even though absence from a role-specialist card is specialization, not disagreement.
- Role specialization was penalized. A monitoring agent and a remediation agent sharing every governance commitment but differing on role-specific values scored substantially below parity under Jaccard, not because they conflict, but because the denominator counted every unique value as potential disagreement. The effect is reproducible on the @mnemom/team-coherence baseline fixtures.
- Mean-of-pairs obscured structure. There was no asymmetry between a universal conscience floor (which must be shared) and role extensions (which should diverge). No surfacing of the weakest pair, the conflict surface, or the specialization structure.
A single blended percentage is a lossy compression, and the specific compression that naive coherence scorers apply distorts legitimate fleets.
7.2 Dimensional coherence
Mnemom’s production scorer — @mnemom/team-coherence/v2 — refuses to emit a single headline score and instead returns a vector with narrative helpers. The policy decision: any single blended number is a lie by compression; refusing to emit one forces every consumer to grapple honestly with the vector.
The pairwise computation is evidence-based:
shared = A.values.declared ∩ B.values.declared
conflicts = (A.values.declared ∩ B.values.conflicts_with)
          ∪ (B.values.declared ∩ A.values.conflicts_with)
agreement_evidence = |shared|
conflict_evidence = |conflicts|
if agreement_evidence + conflict_evidence < MIN_EVIDENCE:
    governance_score = null  # insufficient evidence
else:
    governance_score = agreement_evidence
                     / (agreement_evidence + conflict_evidence)
Key properties: silence is neutral (values declared by only one agent do not enter the denominator; they contribute to a separate diversity signal); only explicit conflicts count (a value in one card’s conflicts_with that the other card declares); insufficient evidence returns null, not a fabricated zero; every non-null score is bounded in [0, 1].
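A runnable rendering of the pairwise computation, written in Python for illustration rather than taken from the npm package. The value of `MIN_EVIDENCE` and the flat card shape (`declared`, `conflicts_with` as sets) are assumptions.

```python
from typing import Dict, Optional, Set

MIN_EVIDENCE = 2  # hypothetical; the source does not state the value

Card = Dict[str, Set[str]]


def governance_score(a: Card, b: Card, min_evidence: int = MIN_EVIDENCE) -> Optional[float]:
    """Agreement share among shared and explicitly conflicting values."""
    shared = a["declared"] & b["declared"]
    conflicts = (a["declared"] & b["conflicts_with"]) | (b["declared"] & a["conflicts_with"])
    evidence = len(shared) + len(conflicts)
    # Values declared by only one card never reach this point: silence is neutral.
    if evidence == 0 or evidence < min_evidence:
        return None  # insufficient evidence, not a fabricated zero
    return len(shared) / evidence


monitor: Card = {
    "declared": {"transparency", "harm_prevention", "alerting"},
    "conflicts_with": set(),
}
remediator: Card = {
    "declared": {"transparency", "harm_prevention", "auto_rollback"},
    "conflicts_with": set(),
}
```

The two example cards differ on their role-specific values but share both governance commitments, so they score 1.0 here, where a Jaccard-style score over the four distinct values would report 0.5.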
7.3 The fleet vector
computeTeamCoherence(cards) returns structural information rather than a single number:
- Pair-level aggregates. pairwise_governance_floor (the weakest pair; if the fleet has one bad pair, this number tells you directly), pairwise_governance_median (typical-case health), insufficient_evidence_pairs, conflict_edge_count (how many pairs have at least one explicit conflict).
- Diversity. diversity_rate_median: role specialization reported separately, not blended into the governance number.
- Structural invariants. conscience_universal (boolean, true if every agent shares the same conscience commitment set); integrity_uniform (boolean, true if every agent is in the same integrity.enforcement_mode); conscience_divergence and integrity_divergence expose the minority. These are first-order signals: a fleet without a universal conscience floor has a governance problem that no continuous score can express. The invariants require unified cards; they return null when cards lack the conscience or integrity sections.
- Outliers. Per-agent mean_pair_governance and deviation_sigma. An agent whose mean pairwise score is more than 1σ below the fleet mean is surfaced as an outlier.
- Narrative helpers. Pre-computed answers to common human questions, so every UI surface tells the same story: weakest_pair (answers “where should I look first?”), most_conflicted_agent, specializations (per-agent unique values), conflict_surface (flat list of every explicit conflict with evidence).
The absence of a single fleet score is intentional. Consumers that require a scalar must derive one from the vector themselves and take responsibility for that compression. See Fleet Coherence.
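A few of the fields above can be sketched from a table of pairwise scores. This is an illustration, not the output of computeTeamCoherence: the input shape, the use of a population standard deviation over per-agent means, and the `fleet_vector` name are assumptions.

```python
from statistics import mean, median, pstdev
from typing import Dict, List, Optional, Tuple

Pair = Tuple[str, str]


def fleet_vector(pair_scores: Dict[Pair, Optional[float]]) -> dict:
    """Summarize pairwise governance scores without blending them into one number."""
    scored = {p: s for p, s in pair_scores.items() if s is not None}
    if not scored:
        raise ValueError("no pair has sufficient evidence")

    # Collect each agent's scores across the pairs it participates in.
    per_agent: Dict[str, List[float]] = {}
    for (a, b), s in scored.items():
        per_agent.setdefault(a, []).append(s)
        per_agent.setdefault(b, []).append(s)
    agent_means = {agent: mean(v) for agent, v in per_agent.items()}

    fleet_mean = mean(list(agent_means.values()))
    sigma = pstdev(list(agent_means.values()))
    return {
        "pairwise_governance_floor": min(scored.values()),
        "pairwise_governance_median": median(scored.values()),
        "insufficient_evidence_pairs": len(pair_scores) - len(scored),
        "weakest_pair": min(scored, key=scored.get),
        # Outlier: mean pairwise score more than one sigma below the fleet mean.
        "outliers": sorted(a for a, m in agent_means.items() if m < fleet_mean - sigma),
    }
```

For a four-agent fleet where three agents agree fully with each other and the fourth scores 0.5 against each of them, the floor is 0.5, the median is 0.75, and only the fourth agent is surfaced as an outlier.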
7.4 Package structure
The scorer is published as a public npm package with three subpaths:
- @mnemom/team-coherence (top-level): convenience re-export of the v2 scorer.
- @mnemom/team-coherence/v2: the dimensional scorer, input type TeamCoherenceInput, property-based-tested for symmetry, role-specialization invariance, conflict monotonicity, insufficient-evidence handling, self-pair idempotence, and boundedness.
- @mnemom/team-coherence/baseline: the AAP 1.0 Jaccard-flavored handshake, re-exported with explicit “baseline” naming so consumers that want side-by-side pedagogical comparison can import both.
The public package is narrow by design: it is the scorer, not the full fault-line analysis layer. Fault-line extraction — the classification of divergences into resolvable / priority_mismatch / incompatible / complementary buckets and the associated recommendation layer — is an IP-sensitive product surface and remains proprietary to the Mnemom API. The public package documents the scorer’s interface; the internals of fault-line extraction are intentionally not described here.
7.5 Application
Team coherence v2 is a primitive; it becomes governance through use. Production applications:
- Fleet dashboards render the vector — structural-invariant banners lead, the weakest pair is the first-order triage content, agent detail and specialization follow, pairwise matrices sit behind links.
- Alert rules fire on invariant flips (e.g., conscience_universal: true → false) and on governance-floor thresholds.
- Team cards — fleets of agents can share a team card that declares the governance floor the fleet promises to the principal; see Team Reputation.
- Cross-org coordination uses pairwise coherence as a handshake before multi-agent workflows; insufficient-evidence pairs are surfaced for review rather than silently proceeding.
8. Reputation, risk, and accountability
The oversight layers produce evidence; fleet governance aggregates that evidence across agents; reputation and risk scores summarize it into decision-useful signals.
8.1 Mnemom Trust Rating
The Mnemom Trust Rating is a composite metric derived from independently verified behavioral evidence — not self-reported claims. Scores range from 0 to 1000 and map to letter grades inspired by bond credit ratings:
| Grade | Range | Tier |
|---|---|---|
| AAA | 900–1000 | Exemplary |
| AA | 800–899 | Established |
| A | 700–799 | Reliable |
| BBB | 600–699 | Developing |
| BB | 500–599 | Emerging |
| B | 400–499 | Concerning |
| CCC | 200–399 | Critical |
| NR | — | Not Rated (< 50 analyzed checkpoints) |
The score is a weighted composite over integrity ratio (clear-verdict proportion), compliance (session-capped violation history with a one-week half-life), drift stability, trace completeness, and coherence compatibility. Minimum-evidence thresholds prevent gaming through selective checkpoint submission. See Mnemom Trust Rating for the full component breakdown and formulas.
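The grade bands in the table can be sketched as a lookup. The function name is hypothetical, and the sketch deliberately returns `None` for scores below 200, because the table above does not assign a grade to that range.

```python
from typing import Optional

# (lower bound, grade), highest band first, copied from the table above.
GRADE_BANDS = [
    (900, "AAA"),
    (800, "AA"),
    (700, "A"),
    (600, "BBB"),
    (500, "BB"),
    (400, "B"),
    (200, "CCC"),
]
MIN_ANALYZED_CHECKPOINTS = 50  # below this, the agent is Not Rated


def letter_grade(score: int, analyzed_checkpoints: int) -> Optional[str]:
    """Map a 0-1000 trust score to its letter grade."""
    if analyzed_checkpoints < MIN_ANALYZED_CHECKPOINTS:
        return "NR"  # minimum-evidence threshold takes precedence
    for lower_bound, grade in GRADE_BANDS:
        if score >= lower_bound:
            return grade
    return None  # the table defines no band below 200
```

The evidence check runs before the band lookup, so a high score backed by too few checkpoints is still reported as NR.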
8.2 Risk assessment
Where reputation describes an agent’s behavioral track record, risk describes the expected consequences of relying on it for a particular task. Individual risk scores incorporate reputation components and violation history with context weighting and recency decay; team risk uses the three-pillar model (Aggregate Quality, Coherence Quality, Structural Risk) with LOO Shapley attribution. Both are ZK-provable (Section 6.8); both are consumable by downstream agents for trust-gated interactions. See Risk Assessment.
8.3 Accountability surfaces
Beyond scores, the platform exposes accountability primitives that make agent and team behavior first-class material for governance processes:
- Reclassification. When a flagged violation was caused by card drift rather than agent misbehavior, operators can reclassify it as a card_gap after publishing a corrective card amendment. card_gap violations are excluded from the compliance calculation, enabling trust recovery without rewriting history. See Trust Recovery.
- Disagreement reviews. When an integrity verdict is disputed, the disagreement is recorded with the associated checkpoint, reviewed, and — if sustained — applied as a card amendment.
- Reputation badges. Public badges expose trust ratings to third-party surfaces (dashboards, marketplaces, inter-agent handshakes) with the underlying cryptographic evidence one hop away.
- CI/CD policy gates. Agents under development consume reputation thresholds as CI checks via the reputation-check GitHub Action, so regressions show up in pull requests rather than production. See CI/CD Policy Gates.
9. Security model
9.1 Threat model
The platform assumes agents may be:
- Mistaken (unintentionally misaligned). All three oversight layers detect and surface these effectively.
- Poorly configured (wrong values, over-broad autonomy, weak protection thresholds). The layers detect and surface; the composition layer limits blast radius at the org and platform scope.
- Under attack (contextual adversary). Safe House is the primary mitigation; AIP detects downstream cognitive effects; AAP records the outcome.
- Adversarial (intentionally deceptive). All three layers provide meaningful friction but no guarantee. The attestation layer specifically raises the bar for oversight service adversaries: fabricating verdicts without detection becomes cryptographically infeasible.
9.2 Attack surface
| Attack | AAP | AIP | Safe House | Attestation |
|---|---|---|---|---|
| Card forgery | Card signing, registry | Same | Same | Card hash bound in every certificate |
| Selective logging | Third-party monitoring | Pre-action analysis | Pre-action screening | — |
| Value-washing | Behavioral audit, drift | Real-time reasoning analysis | — | Values hash in certificates |
| Prompt injection — direct | Post-hoc visibility | Detection in reasoning | L1+L2 inbound screening, canaries | — |
| Prompt injection — indirect | Post-hoc visibility | Detection in reasoning | L1+L2 on tool results + retrieved content | — |
| Social engineering / BEC | Post-hoc visibility | Detection in reasoning | Dedicated threat category | — |
| Outbound data leakage | Post-hoc trace of outcome | — | Outbound screening, PII + secret-format detection | — |
| Thinking-block gaming | — | Raises the bar (Section 10.5) | — | Covers verdict rules, not LLM judgment |
| Meta-injection of analysis LLM | — | Conscience prompt warning, separate model instance | — | — |
| Verdict fabrication | — | — | — | Ed25519 prevents unauthorized verdicts |
| History rewriting | — | — | — | Hash chains + Merkle roots detect retroactive changes |
| Selective checkpoint deletion | — | — | — | Merkle tree inconsistency detectable by any verifier |
| Signing key compromise | — | — | — | Key rotation, public key listing, layered verification |
9.3 Defense in depth
No single layer provides complete protection. The layering is load-bearing: each layer narrows the failure modes the next layer has to handle. Ordered from fastest to slowest:
- Safe House (inbound contextual defense, canaries, outbound screening).
- AIP (real-time cognitive analysis between turns).
- Gateway-level enforcement (applying card-declared integrity.enforcement_mode to AIP verdicts).
- AAP (post-hoc behavioral verification, drift, coherence).
- Cryptographic attestation (signatures, chains, Merkle accumulator, STARK proofs, on-chain anchoring).
- Human oversight (dashboards, alerts, disagreement reviews).
- Organizational controls (role-based access, passkey authentication, per-user session revocation, new-device email notifications, SSO, MFA). See Security & Trust.
- External accountability (third-party audits, regulatory compliance, independent verification of on-chain anchors).
9.4 Non-goals
The platform is honest about what it does not attempt:
- Not full AI safety. The layers cover observable substrate. They do not make the agent safe; they make its behavior observable and its oversight verifiable.
- Not proof of model inference. Proving that an LLM produced a given output remains a research problem at frontier scale; the platform proves verdict derivation, not model execution (Section 6.12).
- Not mind reading. Thinking-block analysis operates on what the model emits, not on latent reasoning the model does not surface. Models that emit no reasoning receive synthetic clear verdicts.
- Not deception-proofing. A sufficiently sophisticated adversarial agent can produce thinking and actions that consistently match a declared posture while being misaligned in ways none of the layers observe.
- Not a replacement for principal oversight. Escalation still routes to the principal; humans remain the ultimate judges of consequential decisions.
These non-goals are load-bearing. A trust plane that overclaims erodes the legibility it is trying to produce.
10. Limitations
These limitations are non-negotiable. They define what the platform is.
10.1 AAP does NOT ensure alignment — it provides visibility
An agent can declare values it does not hold, produce perfect traces while acting against its principal’s interests, and pass every verification check while being fundamentally misaligned. The word “alignment” in “Agent Alignment Protocol” refers to the declared alignment posture, that is, what the agent claims about itself. It does not refer to actual alignment between behavior and values.
10.2 Verified does NOT equal safe
When verification returns verified: true, the AP-Trace is internally consistent and matches declared structure. It does NOT mean the decision was correct, the outcome was beneficial, or the agent is safe to trust. Verification checks consistency, not correctness.
10.3 AP-Traces are sampled, not complete
Traces capture decision points the agent chooses to log. An agent could implement selective logging — logging decisions that look good, executing others silently. The absence of a trace means nothing was logged, not that nothing happened.
10.4 AIP depends on thinking block availability
AIP can only analyze what the provider exposes. If a provider does not expose thinking blocks, AIP falls back to response-text inference with significantly reduced confidence (0.3 vs. 1.0). Providers may truncate, summarize, or omit portions of internal reasoning. Models that emit no reasoning produce synthetic clear verdicts — the agent proceeds without cognitive analysis.
10.5 AIP is gameable by sophisticated adversaries
An agent that knows it is monitored by AIP could produce thinking blocks designed to pass integrity analysis while harboring undisclosed intent. AIP raises the bar — the adversary must maintain consistent, card-aligned reasoning across every thinking block — but it does not eliminate the possibility of deception.
10.6 The conscience prompt is not infallible
The analysis LLM may produce false positives (flagging benign thinking as violations) or false negatives (missing genuine violations). Minimum-evidence thresholds, priority ordering, and typed conscience injection are calibrated to minimize these errors but cannot eliminate them.
10.7 Safe House depends on pattern coverage and adversary sophistication
Safe House’s L1 heuristics cover documented attack patterns and multilingual lexicons; L2 semantic analysis catches obfuscation and novel attacks at higher latency. Attacks outside both pattern coverage and semantic detection capability may pass undetected. Sophisticated context-boundary attacks — semantically benign on the surface, weaponized only in combination with specific agent capabilities — may reach the agent despite screening. Canaries remain a strong fallback: they convert silent compromise into loud evidence.
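The canary fallback can be illustrated with a minimal sketch. It assumes a canary is a unique decoy string planted where only a compromised flow would read it, and that detection is a plain substring check on outbound text. The helper names and the token format are hypothetical and are not taken from Safe House itself.

```python
import secrets


def make_canary(label: str) -> str:
    """Create a unique decoy value that no legitimate flow should ever emit."""
    return f"canary-{label}-{secrets.token_hex(8)}"


def tripped_canaries(outbound_text: str, planted: set[str]) -> set[str]:
    """Return every planted canary that appears in the outbound text."""
    return {c for c in planted if c in outbound_text}


# Two decoys planted in places an injected instruction might try to read.
planted = {make_canary("aws"), make_canary("db")}
leaked = next(iter(planted))

clean_hits = tripped_canaries("Here is the summary you asked for.", planted)
leak_hits = tripped_canaries(f"Sure, the key is {leaked}", planted)
```

A non-empty result is the "loud evidence": the decoy has no legitimate reason to appear in output, so a hit does not depend on the attack matching any known pattern.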
10.8 Value coherence is relative to declared values
The handshake and the dimensional fleet scorer check declared compatibility, not actual alignment. Two misaligned agents with coherent declared values can coordinate on outcomes that harm their principals. The dimensional scorer surfaces more structure than the baseline handshake (insufficient evidence, outliers, weakest pairs, conscience universality, integrity uniformity) — but it still cannot verify that declarations correspond to behavior.
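A small sketch shows why a declared-value score cannot see behavior. The glossary describes the baseline handshake score as Jaccard-flavored; the exact formula is not given here, so plain Jaccard similarity over sets of declared value names is used as a stand-in, and the value names are invented for illustration.

```python
def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity |A & B| / |A | B|; defined as 1.0 for two empty sets."""
    union = a | b
    if not union:
        return 1.0
    return len(a & b) / len(union)


# Hypothetical declared values for two agents.
declared_a = {"honesty", "user_privacy", "harm_avoidance"}
declared_b = {"honesty", "user_privacy", "transparency"}

partial = jaccard(declared_a, declared_b)    # 2 shared out of 4 distinct values
identical = jaccard(declared_a, declared_a)  # same declarations on both sides
```

The score takes only declarations as input. Two agents that publish the same values score 1.0 whether or not either of them acts on those values.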
10.9 Cryptographic proof proves computation, not judgment
This limitation is fundamental to the attestation layer and must not be understated.
A clear verdict with a valid STARK proof means: the LLM produced an analysis with no medium-or-higher severity concerns, and the deterministic rules correctly mapped that analysis to clear. It does not mean the thinking block was genuinely safe. The LLM could have missed a subtle prompt injection. The conscience prompt could have been insufficient for the specific attack vector. The thinking block could have been too short for reliable analysis.
Symmetrically, a boundary_violation with a valid proof means: the LLM identified at least one concern meeting the boundary criteria, and the rules correctly escalated. It does not mean the agent was actually misaligned. The LLM could have produced a false positive.
The proof makes the judgment process verifiable. It does not make the judgment itself correct. The LLM analysis stage remains a trust assumption — clearly delineated and honestly stated.
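The split between the proven and the unproven stage can be sketched as a pure function. Only two facts come from the text: clear means no medium-or-higher severity concerns, and boundary_violation means at least one concern meets the boundary criteria. The severity scale, the placeholder boundary rule, and the intermediate verdict name are assumptions made for this illustration.

```python
from dataclasses import dataclass

SEVERITY_ORDER = ["low", "medium", "high", "critical"]  # assumed scale


@dataclass(frozen=True)
class Concern:
    category: str
    severity: str  # one of SEVERITY_ORDER


def meets_boundary_criteria(concern: Concern) -> bool:
    # Assumption: placeholder rule. The real criteria are not stated here.
    return concern.severity == "critical"


def derive_verdict(concerns: list[Concern]) -> str:
    """Deterministic mapping: the same concerns always yield the same verdict."""
    if any(meets_boundary_criteria(c) for c in concerns):
        return "boundary_violation"
    medium = SEVERITY_ORDER.index("medium")
    if any(SEVERITY_ORDER.index(c.severity) >= medium for c in concerns):
        return "review_needed"  # assumed name for the intermediate verdict
    return "clear"
```

A proof over a function like this shows that the mapping was applied correctly to the concerns it was given. It says nothing about whether the analysis LLM produced the right concerns: an empty list yields clear even if the thinking block contained an injection the LLM missed.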
10.10 Attestation does not prevent all tampering
Ed25519 signing-key compromise would enable verdict forgery. Chain rewriting is detectable via Merkle root inconsistency, but the verifier must have cached a previous root to detect retroactive changes. Merkle inclusion verification requires an online lookup of the current root. On-chain anchoring reduces the trust surface further — anchors are immutable once included in a Base L2 block — but it does not change the fundamental posture: compromise of the signing key within an anchoring window remains possible.
These are standard limitations of public-key cryptography and Merkle-based systems, mitigated by key rotation, root pinning, layered design, and the audit expectation that verifiers cache roots.
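The root-pinning expectation can be sketched as follows. This is an illustration under stated assumptions, not the platform's tree construction: the leaf and interior prefixes, the promotion of an unpaired node, and the function names are all choices made for the example. It also recomputes the root over the whole pinned prefix, where a production verifier would more likely check a logarithmic-size consistency proof.

```python
import hashlib


def _h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def merkle_root(leaves: list[bytes]) -> bytes:
    """Root of a binary SHA-256 tree; an unpaired node is promoted unchanged."""
    if not leaves:
        return _h(b"")
    level = [_h(b"\x00" + leaf) for leaf in leaves]  # leaf prefix (assumed)
    while len(level) > 1:
        nxt = [_h(b"\x01" + level[i] + level[i + 1])  # interior prefix (assumed)
               for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:
            nxt.append(level[-1])
        level = nxt
    return level[0]


def history_matches_pin(served: list[bytes], pinned_root: bytes, pinned_size: int) -> bool:
    """True if the first `pinned_size` served leaves still hash to the cached root."""
    if len(served) < pinned_size:
        return False  # entries were removed since the pin was taken
    return merkle_root(served[:pinned_size]) == pinned_root


checkpoints = [b"cp-1", b"cp-2", b"cp-3"]
pin = merkle_root(checkpoints)                   # cached by the verifier
appended = checkpoints + [b"cp-4"]               # honest append
rewritten = [b"cp-1", b"cp-X", b"cp-3", b"cp-4"]  # retroactive edit
```

Here `history_matches_pin(appended, pin, 3)` holds and `history_matches_pin(rewritten, pin, 3)` does not. A verifier that never cached `pin` has nothing to compare against, which is the dependency the limitation describes.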
10.11 Calibration bounds
All empirically derived thresholds (similarity, sustained turns, minimum evidence, Safe House score cutoffs) were calibrated on transformer-to-transformer dialogue. Non-transformer architectures, non-deliberative tasks, and adversarial agents may require recalibration.
11. Regulatory and standards alignment
11.1 EU AI Act Article 50
The EU AI Act’s transparency obligations (enforcement August 2026) require AI systems to:
- Inform users of AI interaction → alignment card principal + disclosure fields in extensions.eu_ai_act.
- Provide machine-readable AI content marking → AP-Trace structured format + checkpoint metadata.
- Maintain transparency of decision-making → AP-Trace decision.selection_reasoning and values_applied.
- Retain audit trails → audit.retention_days (configurable; default 90).
- Provide verifiable evidence of compliance → integrity certificates with cryptographic proofs, independently verifiable via public endpoints.
The attestation layer strengthens the compliance story from transparency to verifiable transparency. See EU AI Act Compliance.
11.2 NIST agent identity and authorization
NIST’s NCCoE concept paper on AI agent identity and authorization (February 2026) identifies four focus areas — identification, authorization, access delegation, and logging/transparency. The trust plane maps to all four: alignment cards with discoverable endpoints; autonomy envelopes with escalation triggers; principal-agent delegation chains with typed relationships; AP-Traces, integrity checkpoints, and cryptographically attested certificates.
11.3 Content provenance standards
The integrity certificate format draws on two established standards:
- C2PA (Coalition for Content Provenance and Authenticity). Certificate structure mirrors C2PA manifests — claims about a subject, input commitments analogous to ingredient hashes, and multiple proof types in a single document. The domain differs — AI behavioral analysis provenance rather than media provenance — but the structural patterns enable future interoperability.
- W3C Verifiable Credentials Data Model (v2.0, 2024). The issuer-subject-claims-proof structure follows the VC conceptual model. The verification section provides the VC-equivalent verification methods.
11.4 On-chain reputation standards
The on-chain anchoring contracts are aligned with ERC-8004, a standard interface for on-chain reputation registries. Any contract or dApp that speaks ERC-8004 can consume Mnemom reputation scores without bespoke integration.
11.5 Additional standards
- ISO/IEC 42001:2023 — AI management systems.
- ISO/IEC 42005:2025 — AI impact assessment.
- IEEE 7001-2021 — Transparency of autonomous systems.
- IEEE 3152-2024 — Transparent human and machine agency identification.
- Singapore IMDA Model AI Governance Framework for Agentic AI (January 2026).
- NIST AI RMF 1.0 — The GOVERN and MAP functions align with documented risk management; attestation chains provide the cryptographic backbone.
- WEF AI Governance Alliance / Presidio AI Framework — Trust verification rather than trust assumption.
Detailed mappings are published in the protocol specifications and the certificate format specification.
12. Data handling and compliance surfaces
The platform is explicit about what is and is not stored, and what customer-controllable surfaces govern retention.
Stored (in managed Supabase Postgres with encryption at rest): AP-Traces, integrity checkpoints (including the SHA-256 thinking-block hash), alignment cards, protection cards, canonical cards, drift alerts, enforcement records, reputation and risk scores, signing keys, Merkle trees, certificates, and verdict proofs.
Not stored: user prompts (pass transiently through the gateway); LLM responses (only structured traces and checkpoints retained); full thinking-block content (only the SHA-256 hash); raw API keys (only SHA-256 hashes).
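The hash-only storage pattern can be sketched with the standard library. The record shape and function names are hypothetical; only the use of SHA-256 for thinking-block content and API keys comes from the text.

```python
import hashlib
import hmac


def sha256_hex(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def checkpoint_record(thinking_block: str, verdict: str) -> dict:
    """Keep the digest and the verdict; the content itself is not retained."""
    return {"thinking_block_hash": sha256_hex(thinking_block), "verdict": verdict}


def api_key_matches(presented_key: str, stored_hash: str) -> bool:
    """Compare digests in constant time; the raw key is never stored."""
    return hmac.compare_digest(sha256_hex(presented_key), stored_hash)


record = checkpoint_record("Let me check the user's request first.", "clear")
stored = sha256_hex("example-api-key")  # what would sit in the database
```

Anyone who holds the original thinking block can recompute the digest and confirm it matches the checkpoint, while the stored record alone does not reveal the text.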
Right-to-deletion. DELETE /v1/agents/{id} returns 202 Accepted and initiates a two-phase asynchronous cascade: immediate tombstoning (the agent is inaccessible to all read paths from that moment), followed by a phase-by-phase deletion across Supabase tables, Cloudflare KV namespaces, and the off-chain tables that map agents to on-chain anchors. Progress is queryable via GET /v1/agents/{id}/deletion-status. On-chain Merkle roots are not deleted (immutable by design); they were constructed to contain no agent-identifiable material in leaf pre-images, so deleting the off-chain mapping severs all linkage between the agent and any anchored root. See GDPR Data Subject Rights.
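A client-side view of the asynchronous cascade might look like the sketch below. The two paths and the 202 status come from the text. Everything else is an assumption: the `send` transport callable, the `state` field, and its `complete` value are hypothetical, since the response schema is not described here.

```python
import time
from typing import Callable

# Hypothetical transport: takes (method, path), returns (status_code, json_body).
Send = Callable[[str, str], tuple[int, dict]]


def wait_for_deletion(agent_id: str, send: Send, poll_seconds: float = 2.0,
                      max_polls: int = 30,
                      sleep: Callable[[float], None] = time.sleep) -> dict:
    """Request deletion, then poll the status endpoint until it reports completion."""
    status_code, _ = send("DELETE", f"/v1/agents/{agent_id}")
    if status_code != 202:
        raise RuntimeError(f"expected 202 Accepted, got {status_code}")
    for _ in range(max_polls):
        _, body = send("GET", f"/v1/agents/{agent_id}/deletion-status")
        if body.get("state") == "complete":  # assumed field name and value
            return body
        sleep(poll_seconds)
    raise TimeoutError("deletion did not complete within the polling budget")
```

Injecting `send` and `sleep` keeps the sketch independent of any HTTP library and lets the loop be exercised without network access.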
Authentication. Customer authentication supports session cookies (encrypted AES-256-GCM HttpOnly cookies with key-id rotation), bearer JWTs (for CLI and programmatic flows), and API keys (SHA-256 hashed at rest). Passkey (WebAuthn) primary authentication is available alongside password and SSO, with monotonic counter verification and resident-credential discoverability. Per-user session revocation and new-device email notifications surface compromise discovery without requiring users to visit a settings panel. See Passkeys and Security & Trust.
API versioning. The platform uses URL major versioning (/v1/) plus date-based minor versioning (X-Mnemom-Version: YYYY-MM-DD) following the Stripe model. Breaking changes get new dates; the standard deprecation support window is 18 months from supersedence, reflecting the reality that AI-agent API consumers may be hardcoded in contexts that cannot respond to deprecation signals. See API Versioning.
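The date-pinned header and the 18-month window can be sketched as two helpers. The header name and the window length come from the text. The helper names are hypothetical, and clamping the day to the end of a shorter month is an assumption about how the window would be computed.

```python
import calendar
from datetime import date


def version_headers(pinned: date) -> dict[str, str]:
    """Pin a client to one dated minor version via the request header."""
    return {"X-Mnemom-Version": pinned.isoformat()}


def add_months(d: date, months: int) -> date:
    """Add calendar months, clamping the day to the target month's length."""
    index = d.year * 12 + (d.month - 1) + months
    year, month0 = divmod(index, 12)
    month = month0 + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def support_ends(superseded_on: date, window_months: int = 18) -> date:
    """Last supported day for a version superseded on the given date."""
    return add_months(superseded_on, window_months)


headers = version_headers(date(2026, 4, 1))
sunset = support_ends(date(2026, 4, 15))
```

Under these assumptions, a version superseded on 2026-04-15 would remain supported until 2027-10-15.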
13. Related work
Alignment research and frameworks. The alignment card concept draws on constitutional AI, value-aligned design, and the broader alignment literature, but the card itself is not an alignment technique; it is a declaration substrate that alignment techniques can populate and auditors can evaluate against.
Agent coordination protocols. A2A, MCP, and AP2 cover coordination, tool integration, and payment authorization. The platform extends rather than replaces — alignment cards extend the A2A Agent Card; AP-Traces cover MCP tool invocations; the integrity layer attaches to any transport.
Content provenance. C2PA (media provenance) and W3C Verifiable Credentials (general credentials) provide the structural patterns the certificate format follows. The domain is different (AI behavioral analysis rather than media), but the verification semantics map cleanly.
Zero-knowledge AI. Modulus Labs and Giza work on full-inference proofs; EQTY Lab on hardware attestation; zkML research on optimized neural-network proving. The platform occupies a complementary position: proving deterministic judgment rather than model inference, at costs that make per-checkpoint proving practical today.
LLM-as-judge. The conscience prompt is a specific instance of LLM-as-judge evaluation with explicit meta-injection guards and a daimonion framing. Limitations of LLM-as-judge (bias, adversarial robustness) carry over and are honestly acknowledged (Section 10.6).
On-chain reputation. ERC-8004 defines a standard interface for on-chain reputation registries; the platform’s Base L2 contracts implement this interface. Prior work in decentralized identity (DIDs, VCs) and Sybil resistance informs the design of agent-ID–to-on-chain-score mappings.
Prompt injection defense. Safe House draws on the active research literature on prompt injection detection — direct, indirect, and multimodal — and on operator practice in the space (regex heuristics, small-model classifiers, canary tokens). The multilingual L1 patterns and the typed threat taxonomy reflect this literature.
14. Future work
The following directions are under design or internal exploration; no dates are committed.
- OpenTelemetry GenAI SIG standardization. Propose alignment, integrity, and attestation semantic conventions to the OpenTelemetry GenAI Semantic Convention Special Interest Group. GenAI agent spans already exist (create_agent, invoke_agent); alignment, integrity, and attestation spans are the natural next layer.
- N-way collective coherence. Extend dimensional coherence from pairwise to n-agent collective structure — coalition dynamics, emergent group behaviors, and multi-role specialization ontologies.
- Cross-substrate calibration. Extend calibration to non-transformer architectures.
- TEE integration for the analysis stage. Run the AIP analysis engine inside an Intel TDX or ARM CCA trusted execution environment. Combined with zkVM proofs for verdict derivation, this would close the LLM-analysis trust assumption, leaving only TEE hardware manufacturer trust.
- Recursive ZK proofs. SP1 supports proof composition; a single proof attesting that all N checkpoints in a session were derived correctly would reduce storage costs and enable session-level certificates.
- Threshold signatures. Multi-party computation threshold signatures for signing-key management, so no single party holds the complete key and collusion of t-of-n parties is required to forge.
- Adversarial robustness characterization. Formalize the boundary between agents that the trust plane can detect and those it cannot, via red-team exercises with adversarial agent deployments.
15. Conclusion
The three oversight layers (AAP for behavior, AIP for cognition, Safe House for context) address the three failure modes defined in Section 2. Unified alignment cards give the layers a common contract. The managed gateway gives customers a single integration point that activates them together. Each layer’s output flows into a shared attestation chain rather than living in a product silo, so correlation across layers is structural rather than operational.
Cryptographic attestation makes the oversight verifiable. Ed25519 signatures, per-session hash chains, and per-agent Merkle accumulators make verdicts non-fabricatable; SP1 STARK proofs show that the deterministic verdict-derivation rules were applied honestly; aggregated Merkle roots anchored on Base L2 let any third party confirm the chain without trusting Mnemom’s infrastructure. The conceptual move that makes this practical at per-checkpoint cost is the auditor-model distinction — proving deterministic judgment rather than model inference, and naming the residual trust assumption (the LLM analysis stage) explicitly rather than eliding it.
What remains is the work of building alignment infrastructure that scales with the agents we are deploying — infrastructure where the response to “why should I trust this?” is not “because we said so” but “because the math checks out, and the things the math does not reach are named explicitly.”
References
- A2A Protocol Specification. Google / Linux Foundation, 2025.
- Model Context Protocol (MCP). Anthropic / Linux Foundation, 2025.
- Agent Payments Protocol (AP2). Google, 2026.
- EU AI Act. Regulation (EU) 2024/1689. European Parliament and Council, 2024.
- NIST NCCoE Concept Paper: “Accelerating the Adoption of Software and AI Agent Identity and Authorization.” February 2026.
- ISO/IEC 42001:2023. Artificial Intelligence — Management System.
- ISO/IEC 42005:2025. Artificial Intelligence — AI Impact Assessment.
- IEEE 7001-2021. Transparency of Autonomous Systems.
- IEEE 3152-2024. Transparent Human and Machine Agency Identification.
- Singapore IMDA Model AI Governance Framework for Agentic AI. January 2026.
- OpenTelemetry GenAI Semantic Conventions. CNCF, 2025.
- NIST SP 800-207. Zero Trust Architecture. August 2020.
- NIST SP 800-63-4. Digital Identity Guidelines. 2024.
- Succinct Labs. “SP1 zkVM Documentation.” 2025.
- C2PA (Coalition for Content Provenance and Authenticity). “C2PA Technical Specification.” 2025.
- W3C. “Verifiable Credentials Data Model v2.0.” W3C Recommendation, 2024.
- NIST. “Artificial Intelligence Risk Management Framework (AI RMF 1.0).” NIST AI 100-1. January 2023.
- World Economic Forum. “Presidio AI Framework: Towards Safe Generative AI Models.” 2024.
- World Economic Forum. “Navigating the AI Frontier: Agent Governance.” AI Governance Alliance. January 2026.
- ERC-8004. On-Chain Reputation Registry Standard. Ethereum Improvement Proposals.
- Merkle, R. C. “A Certified Digital Signature.” Advances in Cryptology — CRYPTO ’89. Springer, 1989.
- Bernstein, D. J., Duif, N., Lange, T., Schwabe, P., and Yang, B.-Y. “High-speed high-security signatures.” Journal of Cryptographic Engineering, 2(2):77–89, 2012.
- FIPS 180-4. “Secure Hash Standard (SHS).” NIST, August 2015.
- Ben-Sasson, E., Bentov, I., Horesh, Y., and Riabzev, M. “Scalable, transparent, and post-quantum secure computational integrity.” IACR Cryptology ePrint Archive, 2018.
- Goldwasser, S., Micali, S., and Rackoff, C. “The Knowledge Complexity of Interactive Proof Systems.” SIAM Journal on Computing, 18(1):186–208, 1989.
Glossary
Agent. An autonomous software entity capable of taking actions on behalf of a principal.
Alignment card. The YAML card declaring an agent’s identity, principal, values, conscience, autonomy, capabilities, enforcement, audit commitment, and extensions. Composed from platform, org, and agent scopes into a canonical card at storage time.
AP-Trace. An audit log entry recording an agent’s decision process — action, decision (alternatives, selection, reasoning, values applied), and escalation evaluation. The primary AAP primitive.
Attestation. The process of creating cryptographic evidence binding an oversight verdict to its inputs and history: input commitments, Ed25519 signatures, hash chains, Merkle proofs, STARK proofs, and on-chain anchors.
Autonomy envelope. The set of actions an agent may take without escalation, the set it must never take, and the conditions that trigger escalation.
Base L2. Coinbase’s Ethereum Layer 2 network, where MnemoReputationRegistry and MnemoMerkleAnchor are deployed.
Canonical card. The fully-composed alignment or protection card served to the gateway, observer, and other read paths. Computed at storage time, cached in KV with short TTL, and invalidated on recompose events.
Chain hash. A SHA-256 hash linking each checkpoint to its predecessor within a session, creating a tamper-evident sequence.
Composer. The system component that resolves raw authored cards plus org templates plus platform defaults plus exemptions into a canonical card, guaranteeing schema invariants by construction.
Conscience prompt. The structured prompt sent to the AIP analysis LLM, instructing it to evaluate a thinking block against the canonical alignment card. Five-layer structure: role, card summary, conscience values, session window, thinking block.
Conscience value. A typed declaration (BOUNDARY, FEAR, COMMITMENT, BELIEF, HOPE) that augments alignment-card evaluation. BOUNDARY and FEAR are injected into AIP’s conscience prompt.
Daimonion. The AIP analysis model’s role — a conscience voice that is silent when aligned and speaks when outside boundaries.
Drift. Behavioral deviation from declared alignment posture over time. AAP detects cross-session drift; AIP detects session-scale integrity drift; both write to a unified alert storage with type-prefixed categories.
Escalation. Deferring a decision to a principal or higher-authority agent; triggered by declared conditions in the alignment card’s autonomy envelope.
Exemption. A granular, auditable waiver of a specific scope’s contribution to a canonical card for a specific agent, with a reason, a granter, and an expiry. Replaces the pre-unified boolean org_card_exempt.
Fleet coherence. Dimensional scoring of a team of agents against their declared cards. Returns a vector with narrative helpers rather than a single blended number; checks structural invariants (universal conscience, uniform integrity mode).
Input commitment. A SHA-256 hash over all inputs to an integrity analysis, binding a verdict to specific inputs.
Integrity certificate. A self-describing JSON document packaging all cryptographic evidence for a checkpoint: signature, chain, Merkle inclusion, input commitments, and (when available) a verdict-derivation STARK proof.
Integrity checkpoint. A structured verdict on an agent’s thinking block; the primary AIP primitive.
Managed gateway. Mnemom’s hosted LLM proxy at gateway.mnemom.ai, serving as the single integration point that activates AAP, AIP, Safe House, and cryptographic attestation.
Merkle tree. A per-agent append-only binary hash tree enabling O(log N) inclusion proofs over checkpoint history. Roots are periodically anchored on Base L2.
Mnemom Trust Rating. A composite 0–1000 reputation score with AAA–CCC letter grades, derived from independently verified behavioral evidence, requiring a minimum of 50 analyzed checkpoints.
Principal. The human, organization, or agent whose interests an agent serves. Declared in the alignment card’s principal section with a typed relationship (delegated authority, advisory, autonomous).
Protection card. The YAML card declaring how Safe House defends an agent — mode, thresholds, screen surfaces, trusted sources. Composed across the same three scopes as the alignment card.
Safe House. The contextual oversight layer: inbound and outbound screening, three-layer detection (L1 heuristics, L2 semantic, L3 session escalation), canary credentials, source-trust multipliers.
STARK proof. A Scalable Transparent Argument of Knowledge generated by the SP1 zkVM; used to prove verdict derivation, individual risk computation, and team risk computation.
Thinking block. The internal reasoning content produced by a model before or alongside its visible response; the AIP input.
Trust plane. The unified governance substrate comprising AAP + AIP + Safe House + cryptographic attestation + unified cards + the managed gateway.
Unified cards. The two-YAML-card model (alignment + protection) that replaces the pre-unified state of disjoint AAP cards, CLPI policy files, and Safe House JSON configurations.
Value coherence. The degree to which two agents’ declared values are compatible; the AAP 1.0 handshake computes a baseline Jaccard-flavored score, while @mnemom/team-coherence/v2 computes a dimensional evidence-based score.
Verdict derivation. The deterministic mapping from analysis concerns to a verdict and recommended action — the specific computation proven by the SP1 zkVM in the verdict-derivation proof.
Verification gap. The distance between trusting an oversight service’s output and independently confirming that the service applied its stated rules honestly. Closed by the cryptographic attestation chain.
This document is released under CC BY 4.0. Copyright 2026 Mnemom LLC.