Normative reference for the protection card: the YAML document that configures Safe House for a specific agent, and one half of every Mnemom agent’s two cards. This page specifies every section and field, with its required/optional status, type, and composition semantics. Conceptual overview: /concepts/protection-card. Alignment-card spec: /specifications/alignment-card-schema. Card composition rules across platform/org/agent scopes: /concepts/card-composition.

Top-level structure

card_version: protection/2026-04-26   # required; string; schema version
card_id: pc-<uuid>                    # required on canonical output; assigned by composer
agent_id: mnm-<uuid>                  # required; string
issued_at: 2026-04-26T12:00:00Z       # required on canonical output
expires_at: null                      # optional; protection cards rarely expire

mode: enforce                         # required; see §mode
thresholds: { ... }                   # optional; see §thresholds
screen_surfaces: { ... }              # optional; see §screen_surfaces
trusted_sources: { ... }              # optional; see §trusted_sources
extensions: { ... }                   # optional; see §extensions

_composition: { ... }                 # canonical-only; see §_composition
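For readers who generate or validate cards in code, the top-level structure can be modeled as TypeScript types. This is a sketch under assumptions: the type names and the missingRequiredFields helper are hypothetical (not a published Mnemom SDK), timestamps are typed as plain strings, and the helper checks only the fields marked required above.

```typescript
// Hypothetical TypeScript view of the protection card. Type and function
// names are illustrative; they are not a published Mnemom SDK.
type Mode = "off" | "observe" | "nudge" | "enforce";

interface Thresholds {
  warn: number;       // [0, 1]
  quarantine: number; // [0, 1]
  block: number;      // [0, 1]
}

interface ScreenSurfaces {
  incoming?: boolean;
  outgoing?: boolean;
  tool_calls?: boolean;
  tool_responses?: boolean;
}

interface TrustedSources {
  domains?: string[];
  agent_ids?: string[];
  ip_ranges?: string[]; // CIDR strings
}

interface CompositionMetadata {
  composed_at: string;
  scopes_applied: string[];
  exemptions_applied: unknown[];
  source_card_id: string;
  canonical_id: string;
}

interface ProtectionCard {
  card_version: string;             // e.g. "protection/2026-04-26"
  card_id?: string;                 // required on canonical output
  agent_id: string;
  issued_at?: string;               // required on canonical output
  expires_at?: string | null;
  mode: Mode;
  thresholds?: Thresholds;
  screen_surfaces?: ScreenSurfaces;
  trusted_sources?: TrustedSources;
  extensions?: Record<string, unknown>;
  _composition?: CompositionMetadata; // canonical-only, read-only
}

// Returns the names of required fields that are absent. Pass canonical = true
// to also require the fields the composer adds to canonical output.
function missingRequiredFields(
  card: Partial<ProtectionCard>,
  canonical = false,
): string[] {
  const required: (keyof ProtectionCard)[] = ["card_version", "agent_id", "mode"];
  if (canonical) required.push("card_id", "issued_at");
  return required.filter((field) => card[field] === undefined || card[field] === null);
}
```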

§mode

Top-level action policy for Safe House on this agent. Mirrors the alignment card’s integrity.enforcement_mode enum plus an off value for explicit opt-out.
mode: enforce   # "off" | "observe" | "nudge" | "enforce"
| Value | Behavior |
|---|---|
| off | Detection skipped entirely. No telemetry. For cost, latency, or non-applicability cases. |
| observe | Detectors run, signals are logged asynchronously, and no request-path action is taken. |
| nudge | Detectors run synchronously; matches attach an advisory annotation to the agent’s prompt context (and an X-Safe-House-Advisory response header), but the request proceeds. |
| enforce | Detectors run synchronously; matches block the request (quarantine at or above the quarantine threshold, hard block at or above the block threshold). |
enforce implies a synchronous verdict: to block a request, the gateway must wait for the verdict before delivering the message. There is no separate enforce_sync mode.

Composition: strictest wins, in the order enforce > nudge > observe > off. An agent cannot drop below the platform/org floor.

nudge is the load-bearing middle ground: the model receives the advisory as part of its prompt context, so the security signal reaches the model without blocking the request. Customers running detectors with long-tail confidence distributions typically run nudge rather than enforce until thresholds settle.
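The strictest-wins rule reduces to taking a maximum over a fixed ordering. A minimal sketch, assuming each scope's mode arrives as an optional value (a scope that does not set mode is skipped) and that the hypothetical composeMode helper falls back to off only when no scope sets a mode at all:

```typescript
// Sketch of mode composition: strictest wins across scopes.
// composeMode is a hypothetical helper, not a Mnemom API.
type Mode = "off" | "observe" | "nudge" | "enforce";

// Higher number = stricter.
const STRICTNESS: Record<Mode, number> = { off: 0, observe: 1, nudge: 2, enforce: 3 };

// scopeModes holds the mode set at each scope (e.g. platform, org, agent);
// undefined means that scope did not set a mode and is skipped.
function composeMode(scopeModes: (Mode | undefined)[]): Mode {
  let result: Mode = "off"; // fallback only if no scope sets a mode
  for (const mode of scopeModes) {
    if (mode !== undefined && STRICTNESS[mode] > STRICTNESS[result]) result = mode;
  }
  return result;
}

// Platform observe, org nudge, agent observe: the org floor holds.
console.log(composeMode(["observe", "nudge", "observe"])); // nudge
```

Because the agent's value is just one more input to the maximum, it can raise the result above the platform/org floor but never lower it.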

§thresholds

Three-band escalation ladder for Safe House detector scores. All values are floats in [0, 1].
thresholds:
  warn: 0.60         # required when thresholds is present
  quarantine: 0.80   # required when thresholds is present
  block: 0.95        # required when thresholds is present
| Field | Range | Meaning |
|---|---|---|
| warn | [0, 1] | Score at or above triggers a warn-level annotation in observe/nudge mode (and a soft annotation in enforce). |
| quarantine | [0, 1] | Score at or above triggers a quarantine in enforce mode (message held for review); informational in observe/nudge. |
| block | [0, 1] | Score at or above triggers a hard block in enforce mode. |
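The at-or-above rule in the table amounts to checking the bands from the top down. A minimal sketch with a hypothetical bandFor helper; what each band then does depends on mode, as described in §mode, and is not modeled here:

```typescript
// Sketch: report the highest band a detector score reaches. Thresholds are
// inclusive lower bounds ("at-or-above"). bandFor is a hypothetical helper.
type Band = "none" | "warn" | "quarantine" | "block";

interface Thresholds {
  warn: number;
  quarantine: number;
  block: number;
}

function bandFor(score: number, t: Thresholds): Band {
  if (score >= t.block) return "block";
  if (score >= t.quarantine) return "quarantine";
  if (score >= t.warn) return "warn";
  return "none";
}

const defaults: Thresholds = { warn: 0.6, quarantine: 0.8, block: 0.95 };
console.log(bandFor(0.85, defaults)); // quarantine
```

Checking from the top down relies on the ordering constraint warn ≤ quarantine ≤ block holding for the thresholds passed in.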
Validation: warn ≤ quarantine ≤ block. The validator rejects any out-of-order combination at write time.

Composition: min across scopes. The lowest threshold wins, since lower is stricter (it matches sooner). An agent cannot loosen a stricter platform/org threshold; it can only tighten it further.

Three bands map cleanly onto the SOC severity ladder familiar to most operators. Per-detector tuning is an internal calibration concern and is not exposed in the schema.
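Both rules are small enough to sketch directly. The helper names are hypothetical, and applying min field by field (rather than picking one scope's whole thresholds block) is an assumption about how "min across scopes" is evaluated:

```typescript
// Sketch of threshold validation and composition. Helper names are
// hypothetical, not Mnemom APIs.
interface Thresholds {
  warn: number;
  quarantine: number;
  block: number;
}

const FIELDS: (keyof Thresholds)[] = ["warn", "quarantine", "block"];

// Write-time validation: each value in [0, 1], and warn <= quarantine <= block.
function validateThresholds(t: Thresholds): string[] {
  const errors: string[] = [];
  for (const name of FIELDS) {
    const value = t[name];
    // The negated range test also rejects NaN.
    if (!(value >= 0 && value <= 1)) errors.push(`${name} must be in [0, 1], got ${value}`);
  }
  if (!(t.warn <= t.quarantine && t.quarantine <= t.block)) {
    errors.push("expected warn <= quarantine <= block");
  }
  return errors;
}

// Composition: min across scopes, field by field. Scopes without a
// thresholds section are skipped.
function composeThresholds(scopes: (Thresholds | undefined)[]): Thresholds | undefined {
  const present = scopes.filter((t): t is Thresholds => t !== undefined);
  if (present.length === 0) return undefined;
  return {
    warn: Math.min(...present.map((t) => t.warn)),
    quarantine: Math.min(...present.map((t) => t.quarantine)),
    block: Math.min(...present.map((t) => t.block)),
  };
}
```

Taking the minimum field by field preserves the ordering constraint: if every scope satisfies warn ≤ quarantine ≤ block, the smallest warn cannot exceed the smallest quarantine (and likewise for quarantine and block), so the composed result is itself valid.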

§screen_surfaces

Which request surfaces Safe House inspects, named by direction (incoming/outgoing) and tool relationship.
screen_surfaces:
  incoming: true         # the prompt/message reaching the agent
  outgoing: true         # the agent's generated response
  tool_calls: true       # arguments to tool invocations
  tool_responses: true   # values returned by tools
| Field | Default | Meaning |
|---|---|---|
| incoming | true | Inbound prompts: user messages, webhook triggers, queue messages, API calls, anything entering the agent. |
| outgoing | true | The agent’s generated response leaving the agent. |
| tool_calls | true | Arguments the agent sends to tool invocations (outbound tool side). |
| tool_responses | true | Return values from tool calls reaching the agent (inbound tool side). |
Validation: Only the four named keys are accepted. Unknown keys are rejected at write time.

Composition: OR per field, so true wins. If any scope sets a surface to true, it is scanned, and agents cannot disable scanning that the org or platform requires. In alignment-card vocabulary, this is strictest wins, with true (scan) as the more restrictive choice.

Direction-based naming is durable across transport changes: a webhook trigger reaching an agent is “incoming” whether it carries a user message, an API event, or a queue payload. Separating tool_calls from tool_responses reflects their different threat models: outgoing tool arguments may exfiltrate data, while incoming tool responses may carry injections.

Turning off a surface emits a low-priority audit trace so reviewers can see what was not scanned. To disable a surface for a specific agent, the recommended path is an exemption with a documented reason rather than a raw false in the agent card.
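A minimal sketch of the key check and the OR composition, using hypothetical helper names. It assumes a scope that omits a surface does not vote at all, and that the default of true applies only when no scope sets the field; how unset fields combine is not spelled out above.

```typescript
// Sketch of screen_surfaces validation and composition. Helper names are
// hypothetical; unset-field handling is an assumption.
type Surface = "incoming" | "outgoing" | "tool_calls" | "tool_responses";
type ScreenSurfaces = Record<Surface, boolean>;

const SURFACES: readonly Surface[] = ["incoming", "outgoing", "tool_calls", "tool_responses"];

// Write-time check: returns any keys other than the four named surfaces.
function unknownSurfaceKeys(raw: Record<string, unknown>): string[] {
  const allowed: readonly string[] = SURFACES;
  return Object.keys(raw).filter((key) => !allowed.includes(key));
}

// OR per field across scopes (platform, org, agent). A field that no scope
// sets falls back to the documented default of true.
function composeSurfaces(scopes: Partial<ScreenSurfaces>[]): ScreenSurfaces {
  const result = {} as ScreenSurfaces;
  for (const surface of SURFACES) {
    const votes = scopes
      .map((scope) => scope[surface])
      .filter((v): v is boolean => v !== undefined);
    result[surface] = votes.length === 0 ? true : votes.some((v) => v);
  }
  return result;
}
```

Under this reading, an agent's false takes effect only when no other scope sets that surface to true, which matches "agents cannot disable scanning that org or platform requires".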

§trusted_sources

Per-bucket allowlist of upstream sources whose content Safe House skips detection for. The buckets are typed so the validator can apply per-bucket deny-lists and the composer can apply per-bucket intersection rules.
trusted_sources:
  domains:
    - internal.acme.com
    - vendor-api.example.com:8080
  agent_ids:
    - mnm-aabbccdd-eeff-0011         # agent-to-agent pass-through
  ip_ranges:
    - 10.0.0.0/8                      # RFC1918 internal space
    - 172.16.0.0/12
| Field | Type | Validation |
|---|---|---|
| domains | string[] | DNS name (or host:port); deny-listed against public LLM endpoints (api.openai.com, api.anthropic.com, etc.) and public DNS-over-HTTPS providers. |
| agent_ids | string[] | Mnemom agent IDs (mnm-* format). No wildcards. |
| ip_ranges | string[] (CIDR) | IPv4 or IPv6 CIDR; deny-listed against 0.0.0.0/0, ::/0, and public DNS resolver ranges (8.8.8.0/24, 1.1.1.0/24, 9.9.9.0/24). |
Composition:
  • Platform → agent: intersection. The platform list is the compliance ceiling — downstream scopes (org, agent) cannot widen trust beyond what the platform allows. If the platform sets ip_ranges: [10.0.0.0/8], an agent cannot add 192.168.0.0/16 to its own list and have it take effect.
  • Org + agent: union within the ceiling. Either scope can add trust within the platform-imposed ceiling.
  • Empty platform list = unconstrained ceiling. When the platform doesn’t specify a bucket, downstream entries pass through without intersection.
Trusted sources cause Safe House to skip detection for matching content (no detector cycles spent), but every match emits a low-priority sh_trusted_source_skip audit trace so reviewers can see what was waved through.

Security note: the validator’s deny-list is non-exhaustive. Adding a publicly routable IP range or a customer-controllable domain is a critical misconfiguration even if it passes the deny-list. Treat trusted_sources as a sharp tool.
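The ceiling-then-union rule can be sketched as follows. Every helper name is hypothetical, and several behaviors are assumptions rather than documented composer logic: the platform list acts only as a ceiling (its entries are not themselves granted), domains and agent_ids match exactly, an ip_ranges entry is kept when it falls entirely inside a platform range, and only IPv4 CIDRs are parsed, so IPv6 entries are dropped whenever the platform constrains ip_ranges.

```typescript
// Sketch of trusted_sources composition under the assumptions above.
type Bucket = "domains" | "agent_ids" | "ip_ranges";
type TrustedSources = Partial<Record<Bucket, string[]>>;

// Parse an IPv4 CIDR such as "10.0.0.0/8" into a 32-bit network and prefix.
function parseIPv4Cidr(cidr: string): { net: number; prefix: number } | null {
  const m = /^(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})\/(\d{1,2})$/.exec(cidr);
  if (!m) return null;
  const octets = m.slice(1, 5).map(Number);
  const prefix = Number(m[5]);
  if (octets.some((o) => o > 255) || prefix > 32) return null;
  const net = ((octets[0] << 24) | (octets[1] << 16) | (octets[2] << 8) | octets[3]) >>> 0;
  return { net, prefix };
}

function maskFor(prefix: number): number {
  // JS shifts are mod 32, so a /0 mask must be special-cased.
  return prefix === 0 ? 0 : (0xffffffff << (32 - prefix)) >>> 0;
}

// True if `inner` lies entirely inside `outer` (IPv4 only in this sketch).
function cidrWithin(inner: string, outer: string): boolean {
  const a = parseIPv4Cidr(inner);
  const b = parseIPv4Cidr(outer);
  if (!a || !b || a.prefix < b.prefix) return false;
  const mask = maskFor(b.prefix);
  return ((a.net & mask) >>> 0) === ((b.net & mask) >>> 0);
}

function withinCeiling(bucket: Bucket, entry: string, ceiling: string[]): boolean {
  if (bucket === "ip_ranges") return ceiling.some((outer) => cidrWithin(entry, outer));
  return ceiling.includes(entry); // exact match for domains and agent IDs
}

// Union org + agent entries, then keep only what the platform ceiling allows.
// An absent or empty platform bucket means "unconstrained".
function composeTrustedSources(
  platform: TrustedSources,
  org: TrustedSources,
  agent: TrustedSources,
): TrustedSources {
  const buckets: Bucket[] = ["domains", "agent_ids", "ip_ranges"];
  const result: TrustedSources = {};
  for (const bucket of buckets) {
    const ceiling = platform[bucket] ?? [];
    const combined = [...(org[bucket] ?? []), ...(agent[bucket] ?? [])];
    const requested = combined.filter((entry, i) => combined.indexOf(entry) === i);
    result[bucket] = ceiling.length === 0
      ? requested
      : requested.filter((entry) => withinCeiling(bucket, entry, ceiling));
  }
  return result;
}
```

With a platform ceiling of 10.0.0.0/8, an agent's 192.168.0.0/16 is dropped, matching the example in the composition rules, while a narrower 10.1.0.0/16 survives under the containment assumption.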

§extensions

Free-form Record<string, unknown> for protocol-specific or user-defined additions. Mnemom reserves mnemom.*.
extensions:
  mnemom:
    alert_webhook: https://ops.acme.example/safe-house-alerts
    team_channel: "#safehouse-alerts"
Extensions are agent-scoped and not composed across scopes by default.

§_composition (canonical-only)

Present on the canonical protection card produced by the composer; absent on raw agent-scope cards written by PUT /v1/agents/:id/protection-card.
_composition:
  composed_at: 2026-04-26T18:23:41Z
  scopes_applied: [platform, "org:acme", "agent:mnm-patch-001"]
  exemptions_applied: []
  source_card_id: pc-88ccdd11
  canonical_id: cp-44ee22bb
_composition is read-only on the wire.

YAML safe schema

All yaml.load() calls use { schema: yaml.CORE_SCHEMA } — Node-specific tags are rejected. Plain scalars, maps, and sequences only.

Body-size limits

  • Full protection card payload: 64 KB max (enforced via Content-Length + body-length double-check).
  • thresholds, screen_surfaces, trusted_sources are bounded by the 64 KB envelope.
Oversize bodies are rejected with 413 Payload Too Large.
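The double-check guards against a client whose declared Content-Length understates the bytes it actually sends. A minimal sketch with a hypothetical checkBodySize helper, assuming 64 KB means 65,536 bytes and that a malformed Content-Length header simply falls through to the body-length check:

```typescript
// Sketch of the 64 KB body-size double-check. checkBodySize is hypothetical;
// 64 KB is assumed to mean 65,536 bytes.
const MAX_CARD_BYTES = 64 * 1024;

// Returns 413 when either the declared Content-Length or the number of bytes
// actually received exceeds the limit; otherwise null (accept).
function checkBodySize(contentLength: string | undefined, body: Uint8Array): 413 | null {
  if (contentLength !== undefined) {
    const declared = Number(contentLength);
    // A non-numeric header yields NaN, which fails this comparison and
    // leaves the decision to the body-length check below.
    if (declared > MAX_CARD_BYTES) return 413;
  }
  if (body.byteLength > MAX_CARD_BYTES) return 413;
  return null;
}
```

In a real handler the header check would run before the body is read, so obviously oversize requests can be refused without buffering them; the second check catches bodies that exceed what the header promised.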

Versioning

Current card_version values:
  • protection/2026-04-26: current, the ADR-037 canonical form. All canonical cards emit this version.
Older protection/2026-04-15 cards stored before ADR-037 were transformed in place by migration 140, and their schema version was rolled forward at the same time.

See also