How to work with the protection card — the YAML document that tells Safe House how to defend an agent at runtime. Same CRUD shape as the alignment card, different semantics. If you’re coming from the pre-UC sh_configs JSON blob, start with the migration guide; the automatic UC-3 migration already composed your old configs into protection cards.

Viewing the current card

Via CLI

mnemom protection show            # canonical protection card (YAML)
mnemom protection show --raw      # agent-scope raw card, pre-composition

Via API

curl -H "X-Mnemom-Api-Key: $MNEMOM_API_KEY" \
  https://api.mnemom.ai/v1/agents/{agent_id}/protection-card
Or visit the Security tab on the agent detail page in the dashboard — it shows both raw (agent-scope) and canonical (composed) side by side.

Authoring a protection card

Start from the protection card schema. A minimal card:
# protection.card.yaml
card_version: protection/2026-04-26
agent_id: mnm-xxxxxxxx
issued_at: 2026-04-26T00:00:00Z

mode: enforce            # "off" | "observe" | "nudge" | "enforce"

thresholds:
  warn: 0.60             # informational threshold
  quarantine: 0.80       # quarantine in enforce mode
  block: 0.95            # hard block in enforce mode

screen_surfaces:
  incoming: true
  outgoing: true
  tool_calls: true
  tool_responses: true

trusted_sources:
  domains: ["internal.acme.com"]
  agent_ids: []
  ip_ranges: ["10.0.0.0/8"]
Most fields are optional. Omitted fields inherit from the org template (if any), which inherits from the platform default.

Publishing

mnemom protection publish protection.card.yaml
Or via API:
curl -X PUT https://api.mnemom.ai/v1/agents/{agent_id}/protection-card \
  -H "X-Mnemom-Api-Key: $MNEMOM_API_KEY" \
  -H "Content-Type: text/yaml" \
  -H "Idempotency-Key: <uuid>" \
  --data-binary @protection.card.yaml
Publishing triggers compose_protection_card(agent_id), which regenerates the canonical card within a second.

Validating without publishing

mnemom protection validate protection.card.yaml
Runs the full schema validator and applies an inline composition against the current platform and org templates, so you can see the canonical output without writing anything.

Understanding composition

Protection-card composition follows the three-scope model (platform > org > agent). The per-field rules:
| Field | Composition |
| --- | --- |
| mode | Strictest wins (enforce > nudge > observe > off). An agent can go stricter, not looser. |
| thresholds.* | Min across scopes: the lowest value is the strictest and wins. An agent can tighten further than the platform/org but not loosen. |
| screen_surfaces.* | OR per field: true wins. Any scope can require scanning a surface; agents cannot turn off scanning that the org or platform requires. |
| trusted_sources.{domains,agent_ids,ip_ranges} | Platform intersection, org+agent union: the platform allowlist is the compliance ceiling; downstream scopes can only add from inside that ceiling. |
Publishing an org protection template propagates to all agents in the org via mark_agents_for_recompose — the same mechanism as alignment templates. See Managing Card Composition for the full flow.
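The per-field rules in the table above can be sketched in a few lines of Python. This is a minimal illustration, not the actual compose_protection_card implementation: the function names are hypothetical, and trusted sources are treated as exact-match string sets (real IP-range handling would need subnet containment rather than set intersection).

```python
from typing import Dict, Set

# Strictness order taken from the table: enforce > nudge > observe > off.
MODE_RANK = {"off": 0, "observe": 1, "nudge": 2, "enforce": 3}


def compose_mode(*modes: str) -> str:
    """Strictest mode across scopes wins."""
    return max(modes, key=MODE_RANK.__getitem__)


def compose_thresholds(*scopes: Dict[str, float]) -> Dict[str, float]:
    """Per-key minimum across scopes: the lowest value is the strictest."""
    keys = set().union(*scopes)
    return {k: min(s[k] for s in scopes if k in s) for k in sorted(keys)}


def compose_surfaces(*scopes: Dict[str, bool]) -> Dict[str, bool]:
    """Per-key OR across scopes: any scope can require a surface."""
    keys = set().union(*scopes)
    return {k: any(s.get(k, False) for s in scopes) for k in sorted(keys)}


def compose_trusted(platform: Set[str], org: Set[str], agent: Set[str]) -> Set[str]:
    """Union of org and agent entries, capped by the platform allowlist."""
    return (org | agent) & platform


# An agent asking for a looser mode or a disabled surface does not get it.
print(compose_mode("observe", "enforce", "off"))  # enforce
print(compose_surfaces({"tool_responses": True}, {"tool_responses": False}))
```

An agent-scope entry in trusted_sources that is absent from the platform allowlist simply drops out of the composed result under this model.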

Common tuning patterns

Production-grade strictness

For high-stakes agents (financial, health, compliance):
mode: enforce

thresholds:
  warn: 0.50           # tighter than the 0.60 platform default
  quarantine: 0.70
  block: 0.85

screen_surfaces:
  incoming: true
  outgoing: true
  tool_calls: true
  tool_responses: true

Observe-first for a new agent

Before committing to enforcement, run in observe mode to gather a baseline:
mode: observe          # all detectors run, nothing is blocked

thresholds:
  warn: 0.50           # lower = more sensitive (more events logged)
  quarantine: 0.70
  block: 0.90
Review the event stream for 7-14 days. Adjust thresholds based on false-positive rate. Promote to nudge or enforce when stable. nudge is a useful intermediate stage — the model receives an advisory annotation but the request still proceeds, so you can validate the security signal reaches the agent before committing to hard blocks.
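To make the difference between the modes concrete, here is a rough sketch of how a detector score might map to an outcome under each mode. It is inferred from the comments in the cards above and is not Safe House's actual decision logic: the function name, the outcome labels ("allow", "log", "annotate") and the inclusive comparisons are all assumptions.

```python
def decide_action(mode: str, score: float, thresholds: dict) -> str:
    """Map a detector score to an outcome for the given mode (illustrative only)."""
    if mode not in {"off", "observe", "nudge", "enforce"}:
        raise ValueError(f"unknown mode: {mode}")
    # Below the warn threshold (or with protection off) nothing happens.
    if mode == "off" or score < thresholds["warn"]:
        return "allow"
    if mode == "observe":
        return "log"        # detectors run, nothing is blocked
    if mode == "nudge":
        return "annotate"   # advisory annotation, request still proceeds
    # enforce: quarantine and block thresholds take effect
    if score >= thresholds["block"]:
        return "block"
    if score >= thresholds["quarantine"]:
        return "quarantine"
    return "log"            # between warn and quarantine: informational


baseline = {"warn": 0.50, "quarantine": 0.70, "block": 0.90}
for mode in ("observe", "nudge", "enforce"):
    print(mode, decide_action(mode, 0.75, baseline))
```

Running the same score through all three modes shows what promoting an agent from observe to nudge to enforce would change for a given event.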

Performance-sensitive agent (tight tool-response window)

If an agent’s tool responses contain large payloads and per-request latency matters:
screen_surfaces:
  incoming: true
  outgoing: true
  tool_calls: true
  tool_responses: false             # skip tool-response scanning

mode: enforce
Every detection event logs which surfaces were inspected, so auditors can see what was not scanned. Document the reason in your internal runbook.
If your org requires tool_responses: true, you cannot turn it off at agent scope (strictest wins). You’ll need a section-specific exemption with a documented reason.

Trusted internal backend

If your agent pulls from a known-safe internal API, add the domain to trusted_sources so Safe House skips detector runs on content from that source:
trusted_sources:
  domains:
    - internal-kb.acme.example
    - vendor-api.example.com
Trusted content still emits a low-priority trace entry. If your internal KB ever gets compromised, the trusted-source entry in the trace makes the blast radius auditable.
Security reminder: never add a public DNS resolver, a user-controllable domain, or a public LLM API to trusted_sources. The API validates against a static deny-list and rejects obvious mistakes, but the risk model is on you.

Alerting

Protection-card violations emit webhook events if your org has webhooks configured:
POST https://your-webhook.example/safe-house
{
  "event_type": "safe_house.violation",
  "agent_id": "mnm-xxxxxxxx",
  "detector": "injection_score",
  "score": 0.83,
  "threshold": 0.70,
  "surface": "incoming",
  "action_taken": "block",
  "trace_id": "trace-...",
  "timestamp": "2026-04-17T18:23:41Z"
}
See Safe House Webhooks for the full event catalog.
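On the receiving side, a handler could turn the payload above into a one-line alert. The sketch below uses only the fields shown in the example; the function name is hypothetical, and it does not cover request authentication or any other event types, which are outside what this page documents.

```python
import json


def summarize_violation(body: str) -> str:
    """Build a one-line alert message from a safe_house.violation payload."""
    event = json.loads(body)
    if event.get("event_type") != "safe_house.violation":
        raise ValueError(f"unexpected event_type: {event.get('event_type')!r}")
    return (
        f"{event['agent_id']}: {event['detector']} on {event['surface']} "
        f"scored {event['score']:.2f} (threshold {event['threshold']:.2f}), "
        f"action={event['action_taken']}, trace={event['trace_id']}"
    )


sample = json.dumps({
    "event_type": "safe_house.violation",
    "agent_id": "mnm-xxxxxxxx",
    "detector": "injection_score",
    "score": 0.83,
    "threshold": 0.70,
    "surface": "incoming",
    "action_taken": "block",
    "trace_id": "trace-...",
    "timestamp": "2026-04-17T18:23:41Z",
})
print(summarize_violation(sample))
```

Including the trace_id in the message lets whoever receives the alert jump straight to the corresponding trace.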

Validating changes before deploy

For CI pipelines that publish card changes, validate the card client-side before the API call:
mnemom protection validate protection.card.yaml --strict
echo $?   # 0 = valid, 1 = validation errors
The --strict flag also checks that the card composes cleanly with the current org template, that is, that no field conflicts with the strictest-wins rules.
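In a Python-based pipeline, the same gate can be wrapped in a small helper. This is a sketch that assumes the mnemom CLI is on PATH and relies only on the exit codes described above; the helper name and the injectable run parameter (there so the check can be exercised without the CLI installed) are illustrative.

```python
import subprocess
from typing import Callable


def card_is_valid(path: str, run: Callable = subprocess.run) -> bool:
    """Return True when `mnemom protection validate --strict` exits with 0."""
    cmd = ["mnemom", "protection", "validate", path, "--strict"]
    result = run(cmd, capture_output=True, text=True)
    # 0 = valid; any other exit code is treated as a failed validation.
    return result.returncode == 0


def fake_run(cmd, **kwargs):
    """Stand-in for subprocess.run that simulates a validation failure."""
    return subprocess.CompletedProcess(cmd, returncode=1, stdout="", stderr="invalid")


print(card_is_valid("protection.card.yaml", run=fake_run))  # False
```

A pipeline step would call card_is_valid with the default runner and stop before the publish call when it returns False.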

Rolling back a change

There’s no first-class rollback endpoint. To revert:
  1. Fetch a historical version:
    curl -H "X-Mnemom-Api-Key: $MNEMOM_API_KEY" \
      "https://api.mnemom.ai/v1/agents/{agent_id}/protection-card/history?limit=10"
    
  2. Pick the version you want and publish it with a new PUT.
All publishes are amendments — the history is preserved. A “rollback” is just another amendment referencing the prior shape.
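The two steps above can be scripted with the standard library. The sketch below only builds the requests, mirroring the curl commands on this page; the helper names are hypothetical, and because the shape of the history response is not documented here, extracting the chosen version's YAML is left to the caller.

```python
import urllib.request
import uuid

API_BASE = "https://api.mnemom.ai/v1"


def history_request(agent_id: str, api_key: str, limit: int = 10) -> urllib.request.Request:
    """GET request for the most recent protection-card versions."""
    url = f"{API_BASE}/agents/{agent_id}/protection-card/history?limit={limit}"
    return urllib.request.Request(url, headers={"X-Mnemom-Api-Key": api_key}, method="GET")


def publish_request(agent_id: str, api_key: str, card_yaml: str) -> urllib.request.Request:
    """PUT request that republishes a prior card body as a new amendment."""
    url = f"{API_BASE}/agents/{agent_id}/protection-card"
    return urllib.request.Request(
        url,
        data=card_yaml.encode("utf-8"),
        headers={
            "X-Mnemom-Api-Key": api_key,
            "Content-Type": "text/yaml",
            "Idempotency-Key": str(uuid.uuid4()),  # fresh key per publish attempt
        },
        method="PUT",
    )


req = publish_request("mnm-xxxxxxxx", "example-key", "mode: enforce\n")
print(req.get_method(), req.full_url)
```

Either request can then be sent with urllib.request.urlopen(req); reuse the same Idempotency-Key value if you retry a failed publish rather than generating a new one.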

See also