OWASP Agentic Top 10 mapping
Safe House enforces eight named threat patterns at the Mnemom gateway. Buyers evaluating agentic security posture against the OWASP Top 10 for Agentic Applications (OWASP Gen AI Security Project, released 2025-12-09; ASI01–ASI10) need a published cross-reference to make the shipped enforcement legible. This page provides that cross-reference. The discipline is the same one applied throughout this documentation: where Safe House covers an OWASP threat class, the coverage mechanism is named. Where coverage is partial or absent, the gap is stated rather than papered over.
Source of truth. The ASI identifiers and titles below are pinned to the official OWASP Gen AI Security Project release (announcement, resource page). The full taxonomy: ASI01 Agent Goal Hijack, ASI02 Tool Misuse, ASI03 Identity & Privilege Abuse, ASI04 Agentic Supply Chain Vulnerabilities, ASI05 Unexpected Code Execution, ASI06 Memory & Context Poisoning, ASI07 Insecure Inter-Agent Communication, ASI08 Cascading Failures, ASI09 Human-Agent Trust Exploitation, ASI10 Rogue Agents.
Safe House threat patterns
Safe House detects eight threat categories via its L1 pattern library, L2 LLM analysis layer, and L3 session model. Each pattern has a stable threat_type identifier used in API responses, webhook events, and threshold configuration:
| Pattern | threat_type | Checkpoint |
|---|---|---|
| Business Email Compromise | bec_fraud | Front door |
| Prompt injection | prompt_injection | Front door |
| Indirect injection | indirect_injection | Front door |
| Social engineering | social_engineering | Front door |
| Agent spoofing | agent_spoofing | Front door |
| Multi-turn hijack | hijack_attempt | Front door (L3 session model) |
| Data exfiltration | data_exfiltration | Back door |
| Privilege escalation | privilege_escalation | Front door + back door |
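The table can be read as a lookup from each threat_type to the checkpoint or checkpoints that evaluate it. The sketch below encodes it as data so that, for example, a webhook consumer could check which patterns apply to inbound versus outbound content. It assumes the front door screens inbound messages and tool results and the back door screens outbound content. The Checkpoint enum and the patterns_at helper are hypothetical illustrations, not part of a Mnemom SDK.

```python
from enum import Enum


class Checkpoint(Enum):
    FRONT_DOOR = "front_door"  # assumed: inbound messages and tool results
    BACK_DOOR = "back_door"    # assumed: outbound content leaving the agent


# The threat-pattern table above, encoded as data (illustrative structure only).
PATTERN_CHECKPOINTS: dict[str, frozenset[Checkpoint]] = {
    "bec_fraud": frozenset({Checkpoint.FRONT_DOOR}),
    "prompt_injection": frozenset({Checkpoint.FRONT_DOOR}),
    "indirect_injection": frozenset({Checkpoint.FRONT_DOOR}),
    "social_engineering": frozenset({Checkpoint.FRONT_DOOR}),
    "agent_spoofing": frozenset({Checkpoint.FRONT_DOOR}),
    "hijack_attempt": frozenset({Checkpoint.FRONT_DOOR}),  # L3 session model
    "data_exfiltration": frozenset({Checkpoint.BACK_DOOR}),
    "privilege_escalation": frozenset({Checkpoint.FRONT_DOOR, Checkpoint.BACK_DOOR}),
}


def patterns_at(checkpoint: Checkpoint) -> list[str]:
    """Return the threat_type identifiers evaluated at a checkpoint, sorted by name."""
    return sorted(t for t, cps in PATTERN_CHECKPOINTS.items() if checkpoint in cps)
```

With this encoding, patterns_at(Checkpoint.BACK_DOOR) returns ["data_exfiltration", "privilege_escalation"], the two rows that list the back door.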
Pattern-to-OWASP mapping
Each Safe House pattern maps to one or more OWASP ASI entries. A pattern can map to multiple entries because OWASP defines threat classes by attacker goal, while Safe House detects by observable signal.
| Safe House pattern | OWASP entry | Coverage level | Enforcement mechanism |
|---|---|---|---|
| prompt_injection | ASI01 Agent Goal Hijack | Full — direct variant | L1 regex families (override phrases, role reassignment, jailbreak openers, authority spoofing); L2 compound confidence scoring; L3 session model. Direct injection is the canonical goal-hijack vector |
| indirect_injection | ASI01 Agent Goal Hijack | Partial — indirect variant | L1 instruction-delimiter detection; MinHash similarity against known payloads; L2 instruction-structure detection in tool results. Novel payloads with no similarity to known patterns score lower — see limits |
| hijack_attempt | ASI01 Agent Goal Hijack | Partial — multi-turn variant | L3 session model: topic coherence tracking, escalating action scope, identity drift, pivot detection after a trust-building sequence. Substantial multi-turn coverage, with a known residual on novel multi-turn and multi-vector sequences (active recall work). Threshold: ≥ 0.7 confidence (routes to human review; see Gaps and limits) |
| bec_fraud | ASI09 Human-Agent Trust Exploitation, ASI01 Agent Goal Hijack | Full — conjunction detection | Requires co-occurring: financial action term + authority claim + urgency marker + secrecy instruction. The authority/urgency/secrecy manipulation is trust exploitation; the intent to redirect the agent into an unauthorized action is goal hijack |
| social_engineering | ASI09 Human-Agent Trust Exploitation | Full | Authority + urgency signal pair, absent the financial-action component. Regulatory threat framing, role-authority claims, urgency escalation — manipulating the agent’s trust to change its behavior |
| agent_spoofing | ASI07 Insecure Inter-Agent Communication, ASI09 Human-Agent Trust Exploitation | Full — inbound signal | Claims of override authority or elevated permissions arriving as runtime messages; fake admin agent identifiers; credential-presentation patterns in message content. Detects unauthenticated identity/authority claims in the inter-agent message stream |
| data_exfiltration | ASI02 Tool Misuse | Full — outbound | Back-door checkpoint: external destination patterns vs. declared bounded_actions, bulk-data request language, covert-channel patterns. Exfiltration is misuse of connected tools/connectors to route data out. Also enforced independently by Policy Engine |
| privilege_escalation | ASI03 Identity & Privilege Abuse | Full | Runtime permission claims; requests outside declared bounded_actions; attempts to disable safety mechanisms mid-session. Front-door detection + Policy Engine independent check |
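The bec_fraud and social_engineering rows differ by one signal, which is easier to see as code. The sketch below is a hypothetical reduction of that conjunction logic to four boolean flags. The Signals type and the classify function are illustrative names, and real L1/L2 scoring produces confidences rather than booleans. A message carrying a financial-action term, an authority claim, and urgency but no secrecy instruction matches neither rule here; the table does not state how Safe House scores that case.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Signals:
    """Hypothetical boolean signal flags extracted from one inbound message."""
    financial_action: bool
    authority_claim: bool
    urgency_marker: bool
    secrecy_instruction: bool


def classify(s: Signals) -> Optional[str]:
    """Return the matching threat_type, or None if neither rule fires."""
    # bec_fraud is a conjunction: all four signals must co-occur.
    if (s.financial_action and s.authority_claim
            and s.urgency_marker and s.secrecy_instruction):
        return "bec_fraud"
    # social_engineering: authority + urgency, absent any financial-action term.
    if s.authority_claim and s.urgency_marker and not s.financial_action:
        return "social_engineering"
    return None
```

Requiring the full conjunction for bec_fraud is what keeps routine payment requests (financial term alone) or ordinary urgent messages (urgency alone) from matching.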
OWASP-to-coverage mapping
The reverse view starts from each OWASP entry and traces what ships in enforcement. Coverage is asserted only where there is a concrete shipped mechanism to point to; everything else is stated as a gap.
| OWASP entry | Safe House pattern | Additional coverage | Status |
|---|---|---|---|
| ASI01 Agent Goal Hijack | prompt_injection, indirect_injection, hijack_attempt, bec_fraud | — | Shipped (direct); partial (multi-turn / indirect). Direct injection fully covered. Multi-turn goal redirection is substantially covered by the L3 session model, with a known residual on novel multi-turn and multi-vector sequences (active Safe House recall work). Indirect injection partial — novel payloads with no payload-library similarity may score below block threshold
| ASI02 Tool Misuse | data_exfiltration | Policy Engine: bounded_actions enforcement, forbidden-rule Managed Rules, tool-capability mappings | Partial — policy layer + outbound screen. Tool execution is constrained by the Policy Engine before it reaches Safe House; Safe House’s back-door checkpoint catches data-exfiltration-via-tool. Mnemom does not intercept every unsafe tool invocation at the gateway — declared-scope enforcement is the primary control |
| ASI03 Identity & Privilege Abuse | privilege_escalation | AAP alignment cards declare the autonomy envelope; CLPI policy engine enforces | Shipped. Runtime privilege claims blocked at the front door. Declared envelope enforced at the policy layer |
| ASI04 Agentic Supply Chain Vulnerabilities | — | AEGIS substrate fingerprinting + L1 cross-tenant aggregator | Covered at the AEGIS layer, not a Safe House pattern. Runtime-behavior deviation consistent with a compromised dependency/substrate is detected cross-tenant. Does not replace build-time package provenance — see supply-chain trust |
| ASI05 Unexpected Code Execution | — | Policy Engine: bounded_actions constrains which tools/actions an agent may invoke | Gap (front-door). Safe House has no pattern that intercepts code-execution payloads. The Policy Engine limits the action surface (an agent can only invoke declared tools), which reduces blast radius, but Mnemom does not sandbox or screen executed code. Pair with an application-layer execution sandbox |
| ASI06 Memory & Context Poisoning | — | — | Gap. Persistent memory/context store attacks are not in the shipped Safe House pattern library. AIP thinking-block analysis and AAP drift detection give partial observability of downstream effects, but not upstream interception — see limits |
| ASI07 Insecure Inter-Agent Communication | agent_spoofing | — | Partial. Safe House treats unauthenticated authority/identity claims arriving as inbound runtime messages as suspicious by design. Legitimate agent-to-agent authority must be encoded in alignment cards at configuration time. This screens the content of inter-agent messages; it is not a transport-authentication scheme |
| ASI08 Cascading Failures | — | AAP drift detection + CLPI lifecycle governance (observability of degraded agent behavior) | Gap. No shipped Safe House pattern targets multi-agent cascading failure. AAP/CLPI provide some observability of an individual agent drifting, but Mnemom does not model or circuit-break failure propagation across an agent fleet. This is an application-architecture concern (timeouts, bulkheads, circuit breakers) |
| ASI09 Human-Agent Trust Exploitation | bec_fraud, social_engineering, agent_spoofing | — | Shipped. Authority/urgency/secrecy manipulation and impersonated-authority claims from inbound messages are treated as suspicious by design |
| ASI10 Rogue Agents | — | AAP alignment cards declare the autonomy envelope; CLPI lifecycle governance; AEGIS reputation | Covered at the governance layer, not a Safe House pattern. A “rogue” agent operating outside its declared envelope is constrained by Policy Engine bounded_actions enforcement and surfaced by CLPI lifecycle + reputation signals. Safe House screens inbound/outbound message content; it does not by itself decommission a misbehaving agent |
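Because the reverse view should follow from the forward mapping, the two tables can be cross-checked mechanically. This sketch encodes the Pattern-to-OWASP table as data, inverts it, and lists the ASI entries that no Safe House pattern maps to. The data structures and helper names are illustrative, not a Mnemom API. An entry without a pattern is not necessarily uncovered: per the table above, ASI04 and ASI10 are handled at the AEGIS and governance layers. Inverting the forward table also places bec_fraud under ASI01 as well as ASI09, as its forward-table row states.

```python
from typing import Mapping, Sequence

# The ten ASI entries, as listed in the OWASP release this page pins to.
ASI_ENTRIES: dict[str, str] = {
    "ASI01": "Agent Goal Hijack",
    "ASI02": "Tool Misuse",
    "ASI03": "Identity & Privilege Abuse",
    "ASI04": "Agentic Supply Chain Vulnerabilities",
    "ASI05": "Unexpected Code Execution",
    "ASI06": "Memory & Context Poisoning",
    "ASI07": "Insecure Inter-Agent Communication",
    "ASI08": "Cascading Failures",
    "ASI09": "Human-Agent Trust Exploitation",
    "ASI10": "Rogue Agents",
}

# Forward view: the Pattern-to-OWASP table, one row per threat_type.
PATTERN_TO_ASI: dict[str, tuple[str, ...]] = {
    "prompt_injection": ("ASI01",),
    "indirect_injection": ("ASI01",),
    "hijack_attempt": ("ASI01",),
    "bec_fraud": ("ASI09", "ASI01"),
    "social_engineering": ("ASI09",),
    "agent_spoofing": ("ASI07", "ASI09"),
    "data_exfiltration": ("ASI02",),
    "privilege_escalation": ("ASI03",),
}


def invert(forward: Mapping[str, Sequence[str]]) -> dict[str, list[str]]:
    """Derive the OWASP-to-pattern view; every ASI entry appears, even with no patterns."""
    reverse: dict[str, list[str]] = {asi: [] for asi in ASI_ENTRIES}
    for pattern, entries in forward.items():
        for asi in entries:
            reverse[asi].append(pattern)  # KeyError here flags an unknown ASI id
    return {asi: sorted(patterns) for asi, patterns in reverse.items()}


def entries_without_pattern(forward: Mapping[str, Sequence[str]]) -> list[str]:
    """ASI entries that no Safe House pattern maps to (not the same as 'no coverage')."""
    return [asi for asi, patterns in invert(forward).items() if not patterns]


if __name__ == "__main__":
    for asi in entries_without_pattern(PATTERN_TO_ASI):
        print(asi, ASI_ENTRIES[asi])
```

Run as a script, this prints ASI04, ASI05, ASI06, ASI08, and ASI10: the entries the table marks either as gaps or as covered outside Safe House.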
Gaps and limits
ASI06 — Memory & Context Poisoning
No shipped Safe House pattern targets memory/context store attacks. An attacker who can write to an agent’s persistent memory — conversation history, vector store, retrieved documents — can influence future turns without a detectable inbound signal. The AIP thinking-block analysis layer provides partial mitigation: if the poisoned memory causes the agent to reason in ways inconsistent with its alignment card, AIP may flag it before the action lands. That is detection of downstream effect, not upstream interception. Recommended defense-in-depth: treat memory stores as an untrusted input boundary, apply the same L1/L2 screening to memory-retrieved content as to external tool results, and enable AIP to catch the reasoning anomalies that poisoned memory produces.
ASI05 — Unexpected Code Execution
Safe House does not intercept code-execution payloads at the gateway. The Policy Engine’s bounded_actions enforcement limits which tools an agent may invoke (reducing the surface that can reach an executor), but Mnemom neither sandboxes nor statically screens executed code. For agents that can run code, pair Mnemom with an application-layer execution sandbox and treat the executor as an untrusted boundary.
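The bounded_actions control described here is a deny-by-default allowlist: an action is reachable only if the agent declared it. A minimal sketch, assuming the declared set is available to the caller as plain strings; BoundedActions, ActionNotDeclared, and the action names are hypothetical. The gate decides only whether the executor can be reached at all, not whether the code passed to it is safe, which is why the sandbox recommendation still applies.

```python
from dataclasses import dataclass


class ActionNotDeclared(PermissionError):
    """Raised when an agent requests an action outside its declared envelope."""


@dataclass(frozen=True)
class BoundedActions:
    """Hypothetical stand-in for an agent's declared bounded_actions."""
    declared: frozenset[str]

    def authorize(self, action: str) -> None:
        # Deny by default: anything not explicitly declared is rejected.
        if action not in self.declared:
            raise ActionNotDeclared(f"{action!r} is not in bounded_actions")


if __name__ == "__main__":
    # A research agent that never needs to run code simply omits the executor.
    research_agent = BoundedActions(declared=frozenset({"web_search", "summarize"}))
    research_agent.authorize("web_search")  # permitted, returns None
    try:
        research_agent.authorize("run_python")
    except ActionNotDeclared as exc:
        print(f"blocked: {exc}")
```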
ASI08 — Cascading Failures
Mnemom screens per-agent message content and surfaces individual-agent drift (AAP/CLPI), but it does not model failure propagation across a multi-agent fleet. Cascading-failure resilience is an application-architecture responsibility: apply per-agent timeouts, bulkheads, and circuit breakers between agents so one degraded agent cannot fan out.
ASI01 — indirect injection, novel payloads
Indirect injection via tool results is partially covered. MinHash similarity matching compares tool results against a library of known injection payloads. Sufficiently novel payloads that bear no structural or semantic similarity to known patterns will score below L1 thresholds. L2 analysis provides a second pass, but detection accuracy is bounded by the analysis model’s capability. The arena flywheel closes this gap over time: canary credentials and cross-agent campaign detection surface novel attack infrastructure, and confirmed patterns promote to active detection. For high-sensitivity environments, consider adding application-layer validation of tool results before they re-enter the agent context.
ASI01 — multi-turn hijack, human escalation at default threshold
The hijack_attempt pattern routes to human review (not autonomous block) at the default 0.7 confidence threshold. This is intentional — legitimate multi-topic conversations produce similar L3 signals. If your use case can tolerate more aggressive autonomous blocking, lower the hijack_attempt threshold in the protection card.
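The routing rule above is a single comparison of the L3 confidence against a per-threat threshold. This sketch models it with the documented defaults (a 0.7 threshold, escalation to human review). The HijackPolicy type is hypothetical, and the on_match field, including its BLOCK option, is an assumption about how a stricter protection card might be expressed, not a documented setting.

```python
from dataclasses import dataclass
from enum import Enum


class Disposition(Enum):
    ALLOW = "allow"
    HUMAN_REVIEW = "human_review"
    BLOCK = "block"  # hypothetical stricter setting, not a documented default


@dataclass(frozen=True)
class HijackPolicy:
    """Hypothetical per-agent routing rule for the hijack_attempt threat_type."""
    threshold: float = 0.7  # documented default confidence threshold
    on_match: Disposition = Disposition.HUMAN_REVIEW  # documented default: escalate, not block

    def route(self, confidence: float) -> Disposition:
        if not 0.0 <= confidence <= 1.0:
            raise ValueError(f"confidence must be in [0, 1], got {confidence}")
        # At or above the threshold the session is flagged; below it, it proceeds.
        return self.on_match if confidence >= self.threshold else Disposition.ALLOW
```

Lowering the threshold, for example HijackPolicy(threshold=0.5), flags more sessions at the cost of more false positives on legitimate multi-topic conversations, which is the trade-off the default is calibrated against.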
Pairing Safe House with application-layer controls
OWASP guidance recommends pairing runtime substrate controls with application-layer controls. For the gaps above:
- ASI06 (Memory & Context Poisoning): Apply Safe House’s inbound screening to memory-retrieved content as well as direct inbound messages. This is not the default — configure the SDK to route memory fetches through the Safe House evaluation layer.
- ASI05 (Unexpected Code Execution): Run agent-executed code in an application-layer sandbox; scope bounded_actions so the agent can only reach the executor when its function genuinely requires it.
- ASI04 (Agentic Supply Chain): AEGIS covers the runtime-behavior dimension. Pair with SLSA/Sigstore package provenance for the build-time dimension. See supply-chain trust.
- ASI08 (Cascading Failures): Add per-agent timeouts, bulkheads, and circuit breakers at the application/orchestration layer so a degraded agent cannot cascade across the fleet.
- ASI10 (Rogue Agents): Card design is the primary defense. Scope bounded_actions as narrowly as the agent’s function permits; CLPI enforces declared scope and surfaces lifecycle/reputation signals for an agent operating outside its envelope.
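For the ASI08 item, a circuit breaker between agents is the piece that stops one degraded agent from absorbing every caller's retries. The sketch below is a generic single-threaded breaker, not part of any Mnemom SDK; CircuitBreaker and CircuitOpenError are hypothetical names, and the thresholds are illustrative. After failure_threshold consecutive failures calling a downstream agent, it rejects calls for reset_timeout seconds, then lets one probe call through: success closes the circuit, failure reopens it. Per-agent timeouts would be enforced inside the wrapped call.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitOpenError(RuntimeError):
    """Raised instead of calling a downstream agent whose circuit is open."""


class CircuitBreaker:
    """Hypothetical breaker guarding calls to one downstream agent (single-threaded)."""

    def __init__(self, failure_threshold: int = 3, reset_timeout: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self._clock = clock
        self._failures = 0                       # consecutive failures seen
        self._opened_at: Optional[float] = None  # None means the circuit is closed

    def call(self, fn: Callable[[], T]) -> T:
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self.reset_timeout:
                # Fail fast instead of piling more load onto a degraded agent.
                raise CircuitOpenError("circuit open; downstream agent call skipped")
            # Cool-down elapsed: half-open, so this call goes through as a probe.
        try:
            result = fn()
        except Exception:
            self._failures += 1
            # A failed probe reopens immediately; otherwise open once the threshold is hit.
            if self._opened_at is not None or self._failures >= self.failure_threshold:
                self._opened_at = self._clock()
            raise
        # Any success closes the circuit and clears the failure count.
        self._failures = 0
        self._opened_at = None
        return result
```

Keeping one breaker per downstream agent means a single degraded agent trips only its own circuit, while calls to healthy agents continue normally.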
See also
- Safe House threat model — Detection mechanisms, confidence scoring, and known limits for each pattern
- Protection card — Per-agent threshold configuration for each threat type
- AEGIS — Cross-tenant defensive network and the ASI04 supply-chain detection layer
- Substrate fingerprint — ASI04 runtime-behavior detection in detail
- Supply-chain trust — Composing AEGIS runtime detection with package-layer provenance
- AIP specification — Thinking-block integrity analysis that covers reasoning anomalies downstream of ASI06
- OWASP Top 10 for Agentic Applications — The authoritative source taxonomy this page maps against
- Whitepaper — AEGIS-level OWASP mapping table alongside the EU AI Act mapping