Structure
A protection card has four top-level sections. The complete schema is at /specifications/protection-card-schema; this page describes each section’s purpose.
mode
Top-level action policy for Safe House on this agent. Mirrors the alignment card’s integrity.enforcement_mode enum plus an off value for explicit opt-out.
- off: detection is skipped entirely (for cost, latency, or non-applicability cases). No telemetry is emitted.
- observe: all detectors run and signals are logged asynchronously to the trace + reputation pipeline. No action is taken on the agent’s request.
- nudge: detectors run synchronously; matches attach an advisory annotation to the agent’s prompt context (and an X-Safe-House-Advisory response header), but the request proceeds. The model sees the advisory as part of its context.
- enforce: detectors run synchronously; matches block the request (scores ≥ the quarantine threshold are quarantined, scores ≥ the block threshold are hard-blocked).
enforce is implicitly synchronous — to block a request, the gateway must wait for the verdict. There is no separate enforce_sync mode.
Composition: strictest wins across enforce > nudge > observe > off. An agent cannot drop below the platform/org floor.
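The strictest-wins rule can be sketched as a rank comparison over the four documented values. This is a minimal illustration, not the gateway's code: the ProtectionMode type and the composeMode helper are hypothetical, and returning off when no scope sets a mode is an assumption of the sketch.

```typescript
// Hypothetical type mirroring the four documented mode values.
type ProtectionMode = "off" | "observe" | "nudge" | "enforce";

// Strictness order from the docs: enforce > nudge > observe > off.
const MODE_RANK: Record<ProtectionMode, number> = {
  off: 0,
  observe: 1,
  nudge: 2,
  enforce: 3,
};

// Returns the strictest mode among the scopes that set one.
// Scopes that leave mode unset (undefined) do not participate.
// Defaulting to "off" when nothing is set is an assumption of this sketch.
function composeMode(...scopes: (ProtectionMode | undefined)[]): ProtectionMode {
  let result: ProtectionMode = "off";
  for (const mode of scopes) {
    if (mode !== undefined && MODE_RANK[mode] > MODE_RANK[result]) {
      result = mode;
    }
  }
  return result;
}

// platform = observe, org = nudge, agent = observe: the org floor holds.
const effective = composeMode("observe", "nudge", "observe"); // "nudge"
```

Because the result is a maximum over ranks, an agent-level value can only raise the effective mode, which is exactly the "cannot drop below the floor" property.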
thresholds
Three-band escalation ladder for Safe House detector scores: warn, quarantine, and block. All values are floats in [0, 1] and must satisfy warn ≤ quarantine ≤ block.
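A minimal sketch of the two threshold invariants and of mapping a score onto the ladder. The Thresholds interface, the Band type, and both function names are hypothetical; the "≥ threshold" comparison follows the enforce description above, and the sample values are illustrative rather than documented defaults.

```typescript
// Hypothetical shape of the thresholds section, using the three band names.
interface Thresholds {
  warn: number;
  quarantine: number;
  block: number;
}

type Band = "none" | "warn" | "quarantine" | "block";

// Checks the documented invariants: every value is a float in [0, 1],
// and warn <= quarantine <= block. Returns a list of problems (empty = valid).
function validateThresholds(t: Thresholds): string[] {
  const errors: string[] = [];
  for (const [name, value] of Object.entries(t)) {
    if (!Number.isFinite(value) || value < 0 || value > 1) {
      errors.push(`${name} must be in [0, 1], got ${value}`);
    }
  }
  if (!(t.warn <= t.quarantine && t.quarantine <= t.block)) {
    errors.push("expected warn <= quarantine <= block");
  }
  return errors;
}

// Maps a detector score to the highest band whose threshold it reaches.
function classify(score: number, t: Thresholds): Band {
  if (score >= t.block) return "block";
  if (score >= t.quarantine) return "quarantine";
  if (score >= t.warn) return "warn";
  return "none";
}
```

With illustrative thresholds { warn: 0.3, quarantine: 0.6, block: 0.9 }, a score of 0.65 lands in the quarantine band and 0.95 in the block band.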
screen_surfaces
Which request surfaces Safe House inspects:
- incoming: the user/principal prompt entering the agent
- outgoing: the agent’s response leaving the agent
- tool_calls: tool-use invocations the agent makes
- tool_responses: responses to those tool calls
trusted_sources
Per-bucket allowlist of upstream sources whose content Safe House skips detection for. Buckets are typed so the validator can apply per-bucket deny-lists and the composer can apply per-bucket intersection rules.
The deny-lists reject well-known public API hosts (api.openai.com, etc.), public DNS resolvers (8.8.8.0/24, 1.1.1.0/24), and any-host CIDRs (0.0.0.0/0, ::/0). Adding a publicly routable IP range or a customer-controllable domain is a critical misconfiguration even if it passes the deny-list.
Trusted sources cause Safe House to skip detection (no detector cycles spent), and every match emits an sh_trusted_source_skip audit trace so reviewers can see what was waved through.
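The validate-then-skip behavior can be sketched with two hypothetical buckets, domains and IPv4 cidrs. The real schema defines its own typed buckets; the bucket names, helper functions, and exact-match deny-list here are assumptions, and only the deny-list entries come from the text above. IPv6 matching is not handled.

```typescript
// Hypothetical bucket names; the real schema defines its own typed buckets.
interface TrustedSources {
  domains: string[];
  cidrs: string[]; // IPv4 only in this sketch
}

// Deny-list entries taken from the examples in the docs (exact match only).
const DENIED_DOMAINS = new Set(["api.openai.com"]);
const DENIED_CIDRS = new Set(["8.8.8.0/24", "1.1.1.0/24", "0.0.0.0/0", "::/0"]);

// Per-bucket deny-list check; returns a list of problems (empty = valid).
function validateTrustedSources(ts: TrustedSources): string[] {
  const errors: string[] = [];
  for (const d of ts.domains) {
    if (DENIED_DOMAINS.has(d.toLowerCase())) errors.push(`domain ${d} is deny-listed`);
  }
  for (const c of ts.cidrs) {
    if (DENIED_CIDRS.has(c)) errors.push(`CIDR ${c} is deny-listed`);
  }
  return errors;
}

// Parses a dotted-quad IPv4 address into an unsigned integer, or null.
function ipv4ToInt(ip: string): number | null {
  const parts = ip.split(".");
  if (parts.length !== 4) return null;
  let n = 0;
  for (const p of parts) {
    if (!/^\d{1,3}$/.test(p)) return null;
    const octet = Number(p);
    if (octet > 255) return null;
    n = n * 256 + octet;
  }
  return n;
}

// True when `ip` falls inside the IPv4 `cidr` (e.g. "10.0.0.0/8").
function inCidr(ip: string, cidr: string): boolean {
  const slash = cidr.indexOf("/");
  if (slash < 0) return false;
  const lenText = cidr.slice(slash + 1);
  if (!/^\d{1,2}$/.test(lenText)) return false;
  const len = Number(lenText);
  const ipInt = ipv4ToInt(ip);
  const baseInt = ipv4ToInt(cidr.slice(0, slash));
  if (len > 32 || ipInt === null || baseInt === null) return false;
  // Compare the top `len` bits with arithmetic to avoid 32-bit sign issues.
  const blockSize = 2 ** (32 - len);
  return Math.floor(ipInt / blockSize) === Math.floor(baseInt / blockSize);
}

// True when detection should be skipped for this upstream. The caller is
// expected to emit the sh_trusted_source_skip audit trace on every skip.
function isTrusted(host: string, ts: TrustedSources): boolean {
  if (ts.domains.some((d) => d.toLowerCase() === host.toLowerCase())) return true;
  return ts.cidrs.some((c) => inCidr(host, c));
}
```

An exact-match deny-list is deliberately narrow, which is why the text above still treats any publicly routable range as a misconfiguration even when validation passes.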
Composition across scopes
Like the alignment card, the protection card composes from platform > org > agent:

| Section | Composition rule |
|---|---|
| mode | Strictest wins (enforce > nudge > observe > off). An agent cannot drop below the platform/org floor. |
| thresholds.* | Min across scopes: lowest (strictest) wins. An agent can tighten further than the platform/org but not loosen. |
| screen_surfaces.* | OR per field: true wins. If any scope requires scanning a surface, it’s scanned. |
| trusted_sources | Platform → agent intersection (compliance ceiling); org + agent union within that ceiling. An agent cannot widen trust beyond what the platform allows. |
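The four rules in the table can be sketched as one composition function. This assumes every scope supplies a complete card and flattens trusted_sources to a single list; the real composer works on partial cards and typed buckets. The trusted_sources line encodes one reading of the rule: the org + agent union, clipped to the platform list as a ceiling. All type and function names are hypothetical.

```typescript
type ProtectionMode = "off" | "observe" | "nudge" | "enforce";

const MODE_RANK: Record<ProtectionMode, number> = { off: 0, observe: 1, nudge: 2, enforce: 3 };

// Hypothetical, fully populated card shape (real scopes may leave fields unset).
interface ProtectionCard {
  mode: ProtectionMode;
  thresholds: { warn: number; quarantine: number; block: number };
  screen_surfaces: { incoming: boolean; outgoing: boolean; tool_calls: boolean; tool_responses: boolean };
  trusted_sources: string[]; // one flat list here; the real schema is bucketed
}

function composeCards(platform: ProtectionCard, org: ProtectionCard, agent: ProtectionCard): ProtectionCard {
  const scopes = [platform, org, agent];

  // mode: strictest wins.
  const mode = scopes
    .map((s) => s.mode)
    .reduce((a, b) => (MODE_RANK[b] > MODE_RANK[a] ? b : a));

  // thresholds.*: per-field minimum (lowest = strictest).
  const minOf = (pick: (c: ProtectionCard) => number) => Math.min(...scopes.map(pick));
  const thresholds = {
    warn: minOf((c) => c.thresholds.warn),
    quarantine: minOf((c) => c.thresholds.quarantine),
    block: minOf((c) => c.thresholds.block),
  };

  // screen_surfaces.*: per-field OR (true wins).
  const anyOf = (pick: (c: ProtectionCard) => boolean) => scopes.some(pick);
  const screen_surfaces = {
    incoming: anyOf((c) => c.screen_surfaces.incoming),
    outgoing: anyOf((c) => c.screen_surfaces.outgoing),
    tool_calls: anyOf((c) => c.screen_surfaces.tool_calls),
    tool_responses: anyOf((c) => c.screen_surfaces.tool_responses),
  };

  // trusted_sources: org + agent union, clipped to the platform ceiling.
  const ceiling = new Set(platform.trusted_sources);
  const requested = new Set([...org.trusted_sources, ...agent.trusted_sources]);
  const trusted_sources = Array.from(requested).filter((s) => ceiling.has(s));

  return { mode, thresholds, screen_surfaces, trusted_sources };
}
```

A useful side effect of the per-field minimum: if every scope individually satisfies warn ≤ quarantine ≤ block, the composed thresholds do too, because each composed value is bounded by the same scope's next-higher threshold.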
Exemptions
A protection-card exemption waives a specific threshold or surface for a specific agent with a stated reason, expiry, and audit trail. This replaces the legacy org_card_exempt flag, which waived the whole card at once, with section-specific, audit-logged, time-bounded grants.
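The scoping and expiry rules can be sketched as a filter over exemption records. The Exemption shape and the activeExemptions helper are hypothetical; the sketch covers only the per-agent, stated-reason, and time-bound checks, and leaves audit logging to the caller.

```typescript
// Hypothetical exemption record; every field name here is illustrative.
interface Exemption {
  agentId: string;
  target: string; // the waived threshold or surface, e.g. "screen_surfaces.tool_responses"
  reason: string;
  expiresAt: Date;
}

// Returns the exemptions that currently apply to one agent: same agent,
// non-empty stated reason, and not yet expired. Callers would audit-log use.
function activeExemptions(all: Exemption[], agentId: string, now: Date): Exemption[] {
  return all.filter(
    (e) =>
      e.agentId === agentId &&
      e.reason.trim().length > 0 &&
      now.getTime() < e.expiresAt.getTime(),
  );
}
```

Evaluating expiry at read time means an exemption lapses on its own, with no cleanup job needed for correctness.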
How the protection card is used
Gateway (request path)
Every request hits:
- A canonical_protection_cards read (KV-cached, 5-min TTL) to get the composed protection card.
- Detectors inspect the enabled screen_surfaces with the composed thresholds.
- mode determines the action: observe (log only), nudge (advisory), enforce (block).
- trusted_sources short-circuits detectors for allowlisted upstreams (with a trace entry).
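The cached read and the mode-driven action can be sketched as below. The in-memory map stands in for the KV cache; makeCardCache, decide, and the Action names are hypothetical. Treating a score ≥ warn as a nudge "match" is an assumption, since the docs do not say which band triggers an advisory. The trusted_sources short-circuit is not modeled here.

```typescript
type ProtectionMode = "off" | "observe" | "nudge" | "enforce";
type Action = "skip" | "log" | "advise" | "allow" | "quarantine" | "block";

interface ComposedCard {
  mode: ProtectionMode;
  thresholds: { warn: number; quarantine: number; block: number };
}

const CARD_TTL_MS = 5 * 60 * 1000; // the documented 5-minute TTL

// Hypothetical stand-in for the KV cache in front of canonical_protection_cards.
// `load` fetches the composed card; `now` is injectable for testing.
function makeCardCache(
  load: (agentId: string) => ComposedCard,
  now: () => number = Date.now,
): (agentId: string) => ComposedCard {
  const entries = new Map<string, { card: ComposedCard; fetchedAt: number }>();
  return (agentId) => {
    const hit = entries.get(agentId);
    if (hit && now() - hit.fetchedAt < CARD_TTL_MS) return hit.card;
    const card = load(agentId);
    entries.set(agentId, { card, fetchedAt: now() });
    return card;
  };
}

// Maps the composed mode and the highest detector score to an action.
// Treating "score >= warn" as a nudge match is an assumption of this sketch.
function decide(card: ComposedCard, maxScore: number): Action {
  const { mode, thresholds } = card;
  if (mode === "off") return "skip";
  if (mode === "observe") return "log";
  if (mode === "nudge") return maxScore >= thresholds.warn ? "advise" : "allow";
  // enforce: block at or above the block threshold, quarantine at or above quarantine.
  if (maxScore >= thresholds.block) return "block";
  if (maxScore >= thresholds.quarantine) return "quarantine";
  return "allow";
}
```

The TTL bounds how stale a composed card can be: a tightened org floor reaches a given gateway within about five minutes, at the cost of one backing-store read per agent per window.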
Observer (trace analysis)
The observer pipeline does not read the protection card. Protection is inline at the gateway; the observer’s role is to reconcile and enrich traces after the fact. Detector signals produced at the gateway are written into the trace itself.
Website + CLI
Modes across both cards
Both the alignment card (via integrity.enforcement_mode) and the protection card (via mode) share the observe / nudge / enforce vocabulary; the protection card also adds off for explicit opt-out:

| Value | Alignment (integrity.enforcement_mode) | Protection (mode) |
|---|---|---|
| off | (n/a: alignment integrity always runs) | Detection skipped entirely |
| observe | Integrity checkpoints run; violations logged | Safe House detectors run; signals logged |
| nudge | Violations nudge the agent (soft) | Detectors attach advisory; request proceeds |
| enforce | Violations hard-block | Detectors may block (quarantine / block) |
Migration from legacy Safe House config
Pre-UC, Safe House read a JSON sh_configs row keyed by agent_id. The UC-3 migration script composed every existing sh_configs row into a protection-card YAML and wrote the canonical version.
- The legacy endpoint GET/PUT /v1/agents/:id/cfd/config is removed. Use GET/PUT /v1/agents/:id/protection-card instead.
- The internal SafeHouseConfig TypeScript type that Safe House detectors consume is stable (per ADR-008 Edge 4); the UC work swapped the fetch source, not the type. If you’re embedding Safe House detectors, your integration did not need to change.
See also
- Agent Cards — the two-card model (alignment + protection)
- Alignment Card (protocol surface) — AAP 1.0 alignment card
- Protection Card Schema — normative YAML schema
- Card Composition — three-scope composition rules
- Safe House — the runtime detection pipeline this card configures
- Safe House Threat Model — what Safe House defends against