The protection card is the YAML document that tells Safe House how to defend an agent at runtime. It is one of the two cards every Mnemom agent has — the alignment card declares who the agent is; the protection card declares how the agent is defended. The protection card elevates what used to be an ad-hoc Safe House JSON config into a first-class YAML card with three-scope composition (platform > org > agent), granular exemptions, and the same audit + amendment semantics as the alignment card.

Structure

A protection card has four top-level sections. The complete schema is at /specifications/protection-card-schema; this page describes each section’s purpose.

mode

Top-level action policy for Safe House on this agent. Mirrors the alignment card’s integrity.enforcement_mode enum plus an off value for explicit opt-out.
  • off: detection is skipped entirely (for cost, latency, or non-applicability reasons). No telemetry is emitted.
  • observe: all detectors run and signals are logged asynchronously to the trace + reputation pipeline; no action is taken on the agent’s request.
  • nudge: detectors run synchronously; matches attach an advisory annotation to the agent’s prompt context (and an X-Safe-House-Advisory response header), but the request proceeds. The model sees the advisory as part of its context.
  • enforce: detectors run synchronously; matches block the request (quarantine when the score is ≥ the quarantine threshold, hard block when it is ≥ the block threshold).
enforce is implicitly synchronous — to block a request, the gateway must wait for the verdict. There is no separate enforce_sync mode. Composition: strictest wins across enforce > nudge > observe > off. An agent cannot drop below the platform/org floor.
mode: enforce
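As a minimal sketch of the "strictest wins" rule, the snippet below ranks the four modes in the order given above and picks the highest-ranked one across scopes. The function name compose_mode and the assumption that every scope declares a mode are illustrative, not part of the Safe House API.

```python
# Rank order taken from the text: enforce > nudge > observe > off.
MODE_RANK = {"off": 0, "observe": 1, "nudge": 2, "enforce": 3}


def compose_mode(platform: str, org: str, agent: str) -> str:
    """Return the strictest mode declared across the three scopes.

    Hypothetical helper: assumes each scope declares exactly one mode.
    An unknown mode name raises KeyError.
    """
    return max((platform, org, agent), key=lambda m: MODE_RANK[m])


# The agent asks for "off", but the org floor is "nudge", so "nudge" wins.
print(compose_mode("observe", "nudge", "off"))  # nudge
```

Because the result is a maximum over ranks, an agent-scope value can only raise the effective mode, never lower it below the platform/org floor.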

thresholds

Three-band escalation ladder for Safe House detector scores. All values are floats in [0, 1].
thresholds:
  warn: 0.60         # informational annotation in observe/nudge; soft annotation in enforce
  quarantine: 0.80   # quarantine in enforce; informational in observe/nudge
  block: 0.95        # hard block in enforce
Three bands map cleanly onto the SOC severity ladder familiar to most operators. Per-detector tuning is an internal calibration concern and not exposed in the schema. Validation enforces warn ≤ quarantine ≤ block.
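The range and ordering rules above can be sketched as follows. This is an illustration under assumptions, not the actual validator: the function names are hypothetical, and treating a score exactly equal to a threshold as falling into that band is an assumption extended from the "≥" wording used for enforce.

```python
from typing import Optional

BANDS = ("warn", "quarantine", "block")


def validate_thresholds(thresholds: dict) -> None:
    """Raise ValueError unless all values are in [0, 1] and correctly ordered."""
    for name in BANDS:
        value = thresholds[name]
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name}={value} is outside [0, 1]")
    if not thresholds["warn"] <= thresholds["quarantine"] <= thresholds["block"]:
        raise ValueError("expected warn <= quarantine <= block")


def band_for(score: float, thresholds: dict) -> Optional[str]:
    """Return the highest band the score reaches, or None if below warn."""
    for name in reversed(BANDS):  # check block first, then quarantine, then warn
        if score >= thresholds[name]:
            return name
    return None


card_thresholds = {"warn": 0.60, "quarantine": 0.80, "block": 0.95}
validate_thresholds(card_thresholds)
print(band_for(0.85, card_thresholds))  # quarantine
```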

screen_surfaces

Which request surfaces Safe House inspects:
  • incoming — the user/principal prompt entering the agent
  • outgoing — the agent’s response leaving the agent
  • tool_calls — tool-use invocations the agent makes
  • tool_responses — responses to those tool calls
screen_surfaces:
  incoming: true
  outgoing: true
  tool_calls: true
  tool_responses: false      # don't inspect tool return values
Default is all-true (scan everything). Turning off a surface is a deliberate performance/coverage trade-off and is logged in every detection event so auditors can see what was not inspected.

trusted_sources

Per-bucket allowlist of upstream sources whose content Safe House skips detection for. Buckets are typed so the validator can apply per-bucket deny-lists and the composer can apply per-bucket intersection rules.
trusted_sources:
  domains:
    - internal.mnemom.ai
    - vendor-api.example.com:8080
  agent_ids:
    - mnm-aabbccdd-eeff-0011    # agent-to-agent pass-through
  ip_ranges:
    - 10.0.0.0/8                # RFC1918 internal space
The validator deny-lists public LLM endpoints (api.openai.com, etc.), public DNS resolvers (8.8.8.0/24, 1.1.1.0/24), and any-host CIDRs (0.0.0.0/0, ::/0). Adding a publicly-routable IP range or a customer-controllable domain is a critical misconfiguration even if it passes the deny-list. Trusted sources cause Safe House to skip detection (no detector cycles spent), and every match emits an sh_trusted_source_skip audit trace so reviewers can see what was waved through.
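To make the deny-list checks concrete, here is a rough sketch using Python's standard ipaddress module. It only covers the entries named above (the text's "etc." is not expanded), the function names are hypothetical, and the real validator may apply further rules, for example to agent_ids, that are not modelled here.

```python
import ipaddress

# Only the entries named in the text; the real deny-list is longer.
DENIED_DOMAINS = {"api.openai.com"}
DENIED_NETWORKS = [ipaddress.ip_network(c) for c in ("8.8.8.0/24", "1.1.1.0/24")]


def check_domain(entry: str) -> list:
    """Return a list of problems for one domains entry (host or host:port)."""
    host = entry.partition(":")[0].lower()  # drop an optional :port suffix
    return [f"{entry}: deny-listed domain"] if host in DENIED_DOMAINS else []


def check_ip_range(cidr: str) -> list:
    """Return a list of problems for one ip_ranges entry."""
    net = ipaddress.ip_network(cidr, strict=False)
    if net.prefixlen == 0:  # 0.0.0.0/0 or ::/0 matches every host
        return [f"{cidr}: any-host CIDR"]
    return [
        f"{cidr}: overlaps deny-listed range {denied}"
        for denied in DENIED_NETWORKS
        if net.version == denied.version and net.overlaps(denied)
    ]


def validate_trusted_sources(trusted: dict) -> list:
    """Collect deny-list violations across the domains and ip_ranges buckets."""
    problems = []
    for entry in trusted.get("domains", []):
        problems += check_domain(entry)
    for cidr in trusted.get("ip_ranges", []):
        problems += check_ip_range(cidr)
    return problems


print(validate_trusted_sources({"ip_ranges": ["10.0.0.0/8", "0.0.0.0/0"]}))
```

A clean result from a check like this does not make a list safe: as noted above, a publicly-routable range or a customer-controllable domain can pass the deny-list and still be a critical misconfiguration.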

Composition across scopes

Like the alignment card, the protection card composes from platform > org > agent:
| Section | Composition rule |
| --- | --- |
| mode | Strictest wins (enforce > nudge > observe > off). An agent cannot drop below the platform/org floor. |
| thresholds.* | Min across scopes: the lowest value is the strictest and wins. An agent can tighten further than the platform/org but not loosen. |
| screen_surfaces.* | OR per field: true wins. If any scope requires scanning a surface, it’s scanned. |
| trusted_sources | Platform → agent intersection (compliance ceiling); org + agent union within that ceiling. An agent cannot widen trust beyond what the platform allows. |
See Card Composition for the full rules + worked examples.
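The thresholds, screen_surfaces, and trusted_sources rules can be sketched roughly as below. This is an illustration only: the function names are hypothetical, every scope is assumed to declare every field, and trusted-source entries are compared as exact strings, whereas a real composer might treat a narrower CIDR inside a platform range as allowed. The domain rogue.example.net is made up for the example.

```python
BANDS = ("warn", "quarantine", "block")
SURFACES = ("incoming", "outgoing", "tool_calls", "tool_responses")
BUCKETS = ("domains", "agent_ids", "ip_ranges")


def compose_thresholds(*scopes: dict) -> dict:
    """Min across scopes: the lowest (strictest) value wins per band."""
    return {band: min(s[band] for s in scopes) for band in BANDS}


def compose_surfaces(*scopes: dict) -> dict:
    """OR per field: a surface is scanned if any scope requires it."""
    return {name: any(s[name] for s in scopes) for name in SURFACES}


def compose_trusted(platform: dict, org: dict, agent: dict) -> dict:
    """Union of org + agent entries, capped by the platform ceiling."""
    composed = {}
    for bucket in BUCKETS:
        requested = set(org.get(bucket, [])) | set(agent.get(bucket, []))
        ceiling = set(platform.get(bucket, []))
        composed[bucket] = sorted(requested & ceiling)
    return composed


platform = {"domains": ["internal.mnemom.ai", "vendor-api.example.com:8080"]}
org = {"domains": ["internal.mnemom.ai"]}
agent = {"domains": ["vendor-api.example.com:8080", "rogue.example.net"]}
# rogue.example.net is dropped because the platform does not allow it.
print(compose_trusted(platform, org, agent)["domains"])
```

If each scope individually satisfies warn ≤ quarantine ≤ block, the per-band minimum satisfies it too, so the composed thresholds need no reordering.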

Exemptions

A protection-card exemption waives a specific threshold or surface for a specific agent with a stated reason, expiry, and audit trail.
# Granted via:
#   POST /v1/agents/{agent_id}/exemptions
# Body:
#   exempt_section: "protection.thresholds.canary_match"
#   reason: "Agent requires canary in prompts for debugging red-team scenarios"
#   granted_by: "org-admin@example.com"
#   expires_at: "2026-07-17T00:00:00Z"
Exemptions replace the pre-UC boolean org_card_exempt flag — which waived the whole card at once — with section-specific, audit-logged, time-bounded grants.
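As a hedged illustration of the time-bounded nature of a grant, the sketch below builds a request body with the four fields listed above and checks whether it is still in force. Serialising the body as JSON, the helper names, and the expiry comparison are assumptions for illustration; the snippet does not call the API.

```python
import json
from datetime import datetime, timezone

TIMESTAMP_FORMAT = "%Y-%m-%dT%H:%M:%SZ"  # matches the expires_at example above


def build_exemption_body(exempt_section: str, reason: str,
                         granted_by: str, expires_at: datetime) -> str:
    """Serialise an exemption request body (JSON encoding is an assumption)."""
    if expires_at.tzinfo is None:
        raise ValueError("expires_at must be timezone-aware")
    return json.dumps({
        "exempt_section": exempt_section,
        "reason": reason,
        "granted_by": granted_by,
        "expires_at": expires_at.astimezone(timezone.utc).strftime(TIMESTAMP_FORMAT),
    })


def is_active(exemption: dict, now: datetime) -> bool:
    """True while the exemption has not yet reached its expiry time."""
    expires = datetime.strptime(exemption["expires_at"], TIMESTAMP_FORMAT)
    return now < expires.replace(tzinfo=timezone.utc)


body = build_exemption_body(
    "protection.thresholds.canary_match",
    "Agent requires canary in prompts for debugging red-team scenarios",
    "org-admin@example.com",
    datetime(2026, 7, 17, tzinfo=timezone.utc),
)
print(json.loads(body)["expires_at"])  # 2026-07-17T00:00:00Z
```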

How the protection card is used

Gateway (request path)

Every request hits:
  1. canonical_protection_cards read (KV-cached, 5-min TTL) to get the composed protection card.
  2. Detectors inspect enabled screen_surfaces with the composed thresholds.
  3. mode determines the action: off (detection skipped), observe (log only), nudge (advisory), enforce (block).
  4. trusted_sources short-circuits detectors for allowlisted upstreams (with trace entry).
The protection card is never fetched from the agent-scope row on the request path — it’s always the canonical (pre-composed) version.
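The steps above can be sketched as a single decision function over an already-composed card. This is a simplified model, not the gateway's code: the function name and the returned action labels are hypothetical, the detector score is passed in rather than computed, the trusted-source check is placed before detection because it short-circuits detectors, and any score at or above warn is assumed to count as a match.

```python
def decide(card: dict, surface: str, source_trusted: bool, score: float) -> str:
    """Map one detector score to an action label under the composed card."""
    mode = card["mode"]
    if mode == "off":
        return "skip"            # detection skipped entirely
    if source_trusted:
        return "skip_trusted"    # the gateway would emit sh_trusted_source_skip
    if not card["screen_surfaces"].get(surface, True):
        return "not_screened"    # surface turned off; default is all-true
    thresholds = card["thresholds"]
    if score < thresholds["warn"]:
        return "allow"
    if mode == "observe":
        return "log"             # signal recorded, request untouched
    if mode == "nudge":
        return "advise"          # advisory attached, request proceeds
    # mode == "enforce": escalate through the three bands
    if score >= thresholds["block"]:
        return "block"
    if score >= thresholds["quarantine"]:
        return "quarantine"
    return "annotate"            # warn band: soft annotation only


card = {
    "mode": "enforce",
    "thresholds": {"warn": 0.60, "quarantine": 0.80, "block": 0.95},
    "screen_surfaces": {"incoming": True, "outgoing": True,
                        "tool_calls": True, "tool_responses": False},
}
print(decide(card, "incoming", False, 0.85))  # quarantine
```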

Observer (trace analysis)

The observer pipeline does not read the protection card. Protection is inline at the gateway; the observer’s role is to reconcile and enrich traces after the fact. Detector signals produced at the gateway are written into the trace itself.

Website + CLI

mnemom protection show                         # canonical protection card (YAML)
mnemom protection edit                         # open in $EDITOR
mnemom protection publish protection.card.yaml # validate + publish + recompose
mnemom protection validate protection.card.yaml
The website agent detail → Security tab shows both raw agent-scope and canonical-composed views.

Modes across both cards

Both the alignment card (via integrity.enforcement_mode) and the protection card (via mode) share the observe / nudge / enforce vocabulary; the protection card also adds off for explicit opt-out:
| Value | Alignment (integrity.enforcement_mode) | Protection (mode) |
| --- | --- | --- |
| off | n/a (alignment integrity always runs) | Detection skipped entirely |
| observe | Integrity checkpoints run; violations logged | Safe House detectors run; signals logged |
| nudge | Violations nudge the agent (soft) | Detectors attach advisory; request proceeds |
| enforce | Violations hard-block | Detectors may block (quarantine / block) |
A well-run fleet has both dimensions aligned across all agents. The fleet coherence v2 scorer checks integrity uniformity as a first-order structural invariant.

Migration from legacy Safe House config

Pre-UC, Safe House read a JSON sh_configs row keyed by agent_id. The UC-3 migration script composed every existing sh_configs row into a protection-card YAML and wrote the canonical version.
  • Legacy endpoint GET/PUT /v1/agents/:id/cfd/config is removed. Use GET/PUT /v1/agents/:id/protection-card.
  • The internal SafeHouseConfig TypeScript type that Safe House detectors consume is stable (per ADR-008 Edge 4); the UC work swapped the fetch source, not the type. If you’re embedding Safe House detectors, your integration did not need to change.

See also