Protection Card

The protection card is the YAML document that tells Safe House how to defend an agent at runtime. It is one of the two cards every Mnemom agent has — the alignment card declares who the agent is; the protection card declares how the agent is defended. The protection card elevates what used to be an ad-hoc Safe House JSON config into a first-class YAML card with three-scope composition (platform > org > agent), granular exemptions, and the same audit + amendment semantics as the alignment card.

Structure

A protection card has four top-level sections. The complete schema is at /specifications/protection-card-schema; this page describes each section’s purpose.

`mode`

Top-level action policy for Safe House on this agent. Shares the four-mode enum (off | observe | nudge | enforce) with the alignment card’s two master switches (autonomy_mode and integrity_mode) — same words, same semantics, same UI picker.

off — detection skipped entirely (cost / latency / non-applicability cases). No telemetry.
observe — all detectors run, signals are logged asynchronously to the trace + reputation pipeline, no action is taken on the agent’s request.
nudge — detectors run synchronously; matches attach an advisory annotation to the agent’s prompt context (and an X-Mnemom-Advisory response header — see the Headers reference) but the request proceeds. The model sees the advisory as part of its context.
enforce — detectors run synchronously; matches block the request (quarantine ≥ quarantine threshold, hard block ≥ block threshold).

enforce is implicitly synchronous — to block a request, the gateway must wait for the verdict. There is no separate enforce_sync mode. Composition: strictest wins across enforce > nudge > observe > off. An agent cannot drop below the platform/org floor.

mode: enforce
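The strictest-wins rule can be sketched in a few lines. This is a minimal illustration, not the composer's actual code: the `Mode` enum and `compose_mode` function are hypothetical names, and only the ordering `enforce > nudge > observe > off` comes from the text above.

```python
from enum import IntEnum


class Mode(IntEnum):
    # Ordered so that a larger value is stricter: enforce > nudge > observe > off.
    OFF = 0
    OBSERVE = 1
    NUDGE = 2
    ENFORCE = 3


def compose_mode(platform: str, org: str, agent: str) -> str:
    """Return the strictest mode declared at any of the three scopes."""
    strictest = max(Mode[m.upper()] for m in (platform, org, agent))
    return strictest.name.lower()


# An agent asking for "off" cannot drop below the org's "nudge" floor.
print(compose_mode("observe", "nudge", "off"))  # nudge
```

Because the result is a maximum over all scopes, an agent can raise its own mode above the platform or org value but can never lower the composed result.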

AEGIS Managed Rules tune protection-card behavior across the network. When a new Managed Rule is promoted to platform scope (Ed25519-signed; tier-3 rules are auto-promoted with a 24h observe soak; tier-1 and tier-2 rules require dual control), the gateway loads it via the tiered KV → R2 → isolate read substrate and adds the rule’s detection thresholds to the screening pipeline that your protection card configures. Your card’s mode (off / observe / nudge / enforce) determines what action the gateway takes when a Managed Rule matches.

`thresholds`

Three-band escalation ladder for Safe House detector scores. All values are floats in [0, 1].

thresholds:
  warn: 0.60         # informational annotation in observe/nudge; soft annotation in enforce
  quarantine: 0.80   # quarantine in enforce; informational in observe/nudge
  block: 0.95        # hard block in enforce

Three bands map cleanly onto the SOC severity ladder familiar to most operators. Per-detector tuning is an internal calibration concern and not exposed in the schema. Validation enforces warn ≤ quarantine ≤ block.
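The two validation rules stated here (each value in [0, 1], and warn ≤ quarantine ≤ block) can be checked as follows. This is a sketch under assumptions: `validate_thresholds` is a hypothetical helper, and the real validator's error format is not documented on this page.

```python
def validate_thresholds(thresholds: dict) -> list[str]:
    """Return a list of human-readable problems; an empty list means valid."""
    errors = []
    for name in ("warn", "quarantine", "block"):
        value = thresholds.get(name)
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            errors.append(f"{name}: must be a number")
        elif not 0.0 <= value <= 1.0:
            errors.append(f"{name}: {value} is outside [0, 1]")
    # Only check ordering once every band is individually valid.
    if not errors:
        if not (thresholds["warn"] <= thresholds["quarantine"] <= thresholds["block"]):
            errors.append("expected warn <= quarantine <= block")
    return errors


print(validate_thresholds({"warn": 0.60, "quarantine": 0.80, "block": 0.95}))  # []
```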

`screen_surfaces`

Which request surfaces Safe House inspects:

incoming — the user/principal prompt entering the agent
outgoing — the agent’s response leaving the agent
tool_calls — tool-use invocations the agent makes
tool_responses — responses to those tool calls

screen_surfaces:
  incoming: true
  outgoing: true
  tool_calls: true
  tool_responses: false      # don't inspect tool return values

Default is all-true (scan everything). Turning off a surface is a deliberate performance/coverage trade-off and is logged in every detection event so auditors can see what was not inspected.

`trusted_sources`

Per-bucket allowlist of upstream sources whose content Safe House skips detection for. Buckets are typed so the validator can apply per-bucket deny-lists and the composer can apply per-bucket intersection rules.

trusted_sources:
  domains:
    - internal.mnemom.ai
    - vendor-api.example.com:8080
  agent_ids:
    - mnm-aabbccdd-eeff-0011    # agent-to-agent pass-through
  ip_ranges:
    - 10.0.0.0/8                # RFC1918 internal space

The validator deny-lists public LLM endpoints (api.openai.com, etc.), public DNS resolvers (8.8.8.0/24, 1.1.1.0/24), and any-host CIDRs (0.0.0.0/0, ::/0). Adding a publicly-routable IP range or a customer-controllable domain is a critical misconfiguration even if it passes the deny-list. Trusted sources cause Safe House to skip detection (no detector cycles spent), and every match emits an sh_trusted_source_skip audit trace so reviewers can see what was waved through.
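A pre-publish check along these lines can catch the obvious cases before the validator does. The sketch below is an assumption-laden illustration: `check_ip_range` and `check_domain` are hypothetical helpers, the deny-lists contain only the entries quoted above (the real lists are presumably longer), and Python's `is_private` flag stands in for "not publicly routable".

```python
import ipaddress

# Only the entries named in the text above; the real deny-list is not published here.
DENIED_DOMAINS = {"api.openai.com"}
DENIED_NETWORKS = [ipaddress.ip_network(c) for c in ("8.8.8.0/24", "1.1.1.0/24")]


def check_ip_range(cidr: str) -> str:
    """Classify an ip_ranges entry as 'deny', 'review', or 'ok'."""
    net = ipaddress.ip_network(cidr, strict=False)
    if net.prefixlen == 0:
        return "deny"  # any-host CIDR: 0.0.0.0/0 or ::/0
    for denied in DENIED_NETWORKS:
        if net.version == denied.version and net.overlaps(denied):
            return "deny"
    if not net.is_private:
        # Passes the deny-list but is publicly routable: flag for human review.
        return "review"
    return "ok"


def check_domain(entry: str) -> str:
    """Classify a domains entry, ignoring an optional :port suffix."""
    host = entry.split(":", 1)[0].lower()
    return "deny" if host in DENIED_DOMAINS else "ok"


print(check_ip_range("10.0.0.0/8"))   # ok
print(check_ip_range("0.0.0.0/0"))    # deny
print(check_domain("api.openai.com")) # deny
```

The `review` result reflects the warning above: a publicly routable range can pass the deny-list and still be a critical misconfiguration. Whether a domain is customer-controllable cannot be determined mechanically, so this sketch does not attempt it.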

Composition across scopes

Like the alignment card, the protection card composes from platform > org > agent:

Section	Composition rule
`mode`	Strictest wins (`enforce > nudge > observe > off`). An agent cannot drop below the platform/org floor.
`thresholds.*`	Min across scopes — lowest = strictest wins. An agent can tighten further than the platform/org but not loosen.
`screen_surfaces.*`	OR per field — true wins. If any scope requires scanning a surface, it’s scanned.
`trusted_sources`	Platform → agent intersection (compliance ceiling); org + agent union within that ceiling. An agent cannot widen trust beyond what the platform allows.

See Card Composition for the full rules + worked examples.
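The rules for `thresholds`, `screen_surfaces`, and `trusted_sources` can be sketched as three small functions. The function names are hypothetical, entries are compared as exact strings (no CIDR containment), and the trusted-source logic reflects one reading of the table: the union of org and agent entries, intersected with the platform list as the ceiling.

```python
def compose_thresholds(*scopes: dict) -> dict:
    """Lowest value per band wins, because a lower threshold is stricter."""
    bands = ("warn", "quarantine", "block")
    return {band: min(s[band] for s in scopes) for band in bands}


def compose_surfaces(*scopes: dict) -> dict:
    """A surface is scanned if any scope sets it to true (default is true)."""
    surfaces = ("incoming", "outgoing", "tool_calls", "tool_responses")
    return {name: any(s.get(name, True) for s in scopes) for name in surfaces}


def compose_trusted(platform: dict, org: dict, agent: dict) -> dict:
    """Union org + agent entries, then clip to what the platform allows."""
    composed = {}
    for bucket in ("domains", "agent_ids", "ip_ranges"):
        ceiling = set(platform.get(bucket, []))
        requested = set(org.get(bucket, [])) | set(agent.get(bucket, []))
        composed[bucket] = sorted(requested & ceiling)
    return composed


platform = {"warn": 0.60, "quarantine": 0.80, "block": 0.95}
agent = {"warn": 0.50, "quarantine": 0.85, "block": 0.90}
# The agent tightens warn and block; its looser quarantine value is ignored.
print(compose_thresholds(platform, agent))
```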

Exemptions

A protection-card exemption waives a specific threshold or surface for a specific agent with a stated reason, expiry, and audit trail.

# Granted via:
#   POST /v1/agents/{agent_id}/exemptions
# Body:
#   exempt_section: "protection.thresholds.canary_match"
#   reason: "Agent requires canary in prompts for debugging red-team scenarios"
#   granted_by: "[email protected]"
#   expires_at: "2026-07-17T00:00:00Z"

Exemptions replace the legacy boolean org_card_exempt flag — which waived the whole card at once — with section-specific, audit-logged, time-bounded grants.
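As an illustration of the grant call, the sketch below builds (but does not send) a request for the endpoint shown above. Several details are assumptions not confirmed by this page: the `API_BASE` host is a placeholder, the body is assumed to be JSON, authentication headers are omitted, and `build_exemption_request` is a hypothetical helper.

```python
import json
from datetime import datetime, timezone
from urllib.request import Request

API_BASE = "https://api.example.invalid"  # placeholder host; not from the source


def build_exemption_request(agent_id, exempt_section, reason, granted_by, expires_at):
    """Build a POST request for /v1/agents/{agent_id}/exemptions."""
    if expires_at.tzinfo is None:
        raise ValueError("expires_at must be timezone-aware")
    body = {
        "exempt_section": exempt_section,
        "reason": reason,
        "granted_by": granted_by,
        # Normalize to UTC in the same format as the example above.
        "expires_at": expires_at.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
    }
    return Request(
        url=f"{API_BASE}/v1/agents/{agent_id}/exemptions",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_exemption_request(
    "mnm-aabbccdd-eeff-0011",
    "protection.thresholds.canary_match",
    "Agent requires canary in prompts for debugging red-team scenarios",
    "ops@example.com",  # illustrative grantor address
    datetime(2026, 7, 17, tzinfo=timezone.utc),
)
print(req.get_method(), req.full_url)
```

Requiring a timezone-aware expiry keeps the time bound unambiguous, which matters because the grant is audit-logged and expires automatically.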

How the protection card is used

Gateway (request path)

Every request passes through four steps:

1. A canonical_protection_cards read (KV-cached, 5-min TTL) returns the composed protection card.
2. Detectors inspect the enabled screen_surfaces using the composed thresholds.
3. mode determines the action: observe (log only), nudge (advisory), enforce (block).
4. trusted_sources short-circuits detectors for allowlisted upstreams (with a trace entry).

The protection card is never fetched from the agent-scope row on the request path — it’s always the canonical (pre-composed) version.
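The way mode and thresholds combine into an action on the request path can be sketched as a single decision function. This is a simplified model, not the gateway's code: `decide_action` and its return labels are hypothetical, a single detector score is assumed, and the trusted-source match is passed in as a precomputed flag.

```python
def decide_action(mode: str, score: float, thresholds: dict, trusted: bool = False) -> str:
    """Map a detector score to an action label for the composed card."""
    if mode == "off":
        return "skip"            # detection skipped entirely, no telemetry
    if trusted:
        return "skip_trusted"    # detectors short-circuited; audit trace emitted

    # Find the highest band the score reaches.
    if score >= thresholds["block"]:
        band = "block"
    elif score >= thresholds["quarantine"]:
        band = "quarantine"
    elif score >= thresholds["warn"]:
        band = "warn"
    else:
        return "allow"

    if mode == "observe":
        return "log"             # signal logged, request untouched
    if mode == "nudge":
        return "advise"          # advisory attached, request proceeds
    # enforce: warn is a soft annotation, higher bands stop the request.
    return {"warn": "annotate", "quarantine": "quarantine", "block": "block"}[band]


card = {"warn": 0.60, "quarantine": 0.80, "block": 0.95}
print(decide_action("enforce", 0.85, card))  # quarantine
print(decide_action("nudge", 0.97, card))    # advise
```

Only `enforce` ever stops a request; in `observe` and `nudge` the same score produces a log entry or an advisory, regardless of band.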

Observer (trace analysis)

The observer pipeline does not read the protection card. Protection is inline at the gateway; the observer’s role is to reconcile and enrich traces after the fact. Detector signals produced at the gateway are written into the trace itself.

Website + CLI

mnemom protection show                         # canonical protection card (YAML)
mnemom protection edit                         # open in $EDITOR
mnemom protection publish protection.card.yaml # validate + publish + recompose
mnemom protection validate protection.card.yaml

On the website, the agent detail → Security tab shows both the raw agent-scope card and the canonical composed card.

Modes across both cards

All three master switches — the alignment card’s autonomy_mode and integrity_mode, and the protection card’s mode — share the same four-mode enum and the same UI picker:

Value	Alignment (`integrity_mode`)	Protection (`mode`)
`off`	Skip AIP + drift detection	Detection skipped entirely
`observe`	Integrity checkpoints run; violations logged	Safe House detectors run; signals logged
`nudge`	Violations inject advisory on next request	Detectors attach advisory; request proceeds
`enforce`	Violations hard-block (auto-pause on boundary)	Detectors may block (quarantine / block)

A well-run fleet has both dimensions aligned across all agents. The fleet coherence v2 scorer checks integrity uniformity as a first-order structural invariant.
