OWASP Agentic Top 10 mapping

Safe House enforces eight named threat patterns at the Mnemom gateway. Buyers evaluating agentic security posture against the OWASP Top 10 for Agentic Applications (OWASP Gen AI Security Project, released 2025-12-09; ASI01–ASI10) need a published cross-reference to make the shipped enforcement legible. This page provides that cross-reference. The discipline is the same one applied throughout this documentation: where Safe House covers an OWASP threat class, the coverage mechanism is named. Where coverage is partial or absent, the gap is stated rather than papered over.
Source of truth. The ASI identifiers and titles below are pinned to the official OWASP Gen AI Security Project release (announcement, resource page). The full taxonomy: ASI01 Agent Goal Hijack, ASI02 Tool Misuse, ASI03 Identity & Privilege Abuse, ASI04 Agentic Supply Chain Vulnerabilities, ASI05 Unexpected Code Execution, ASI06 Memory & Context Poisoning, ASI07 Insecure Inter-Agent Communication, ASI08 Cascading Failures, ASI09 Human-Agent Trust Exploitation, ASI10 Rogue Agents.

Safe House threat patterns

Safe House detects eight threat categories via its L1 pattern library, L2 LLM analysis layer, and L3 session model. Each pattern has a stable threat_type identifier used in API responses, webhook events, and threshold configuration:
| Pattern | threat_type | Checkpoint |
|---|---|---|
| Business Email Compromise | bec_fraud | Front door |
| Prompt injection | prompt_injection | Front door |
| Indirect injection | indirect_injection | Front door |
| Social engineering | social_engineering | Front door |
| Agent spoofing | agent_spoofing | Front door |
| Multi-turn hijack | hijack_attempt | Front door (L3 session model) |
| Data exfiltration | data_exfiltration | Back door |
| Privilege escalation | privilege_escalation | Front door + back door |
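
Client code that handles webhook events or threshold configuration often needs to know which checkpoint a given threat_type belongs to. The following is a minimal sketch that encodes only the identifiers and checkpoints listed above; the `Checkpoint` enum, the `THREAT_CHECKPOINTS` mapping, and the `patterns_at` helper are hypothetical names for illustration, not part of any Mnemom SDK.

```python
from enum import Flag, auto
from typing import Dict, List


class Checkpoint(Flag):
    """Where a pattern is evaluated (hypothetical client-side enum)."""
    FRONT_DOOR = auto()
    BACK_DOOR = auto()


# threat_type identifiers and checkpoints, copied from the table above.
THREAT_CHECKPOINTS: Dict[str, Checkpoint] = {
    "bec_fraud": Checkpoint.FRONT_DOOR,
    "prompt_injection": Checkpoint.FRONT_DOOR,
    "indirect_injection": Checkpoint.FRONT_DOOR,
    "social_engineering": Checkpoint.FRONT_DOOR,
    "agent_spoofing": Checkpoint.FRONT_DOOR,
    "hijack_attempt": Checkpoint.FRONT_DOOR,  # evaluated by the L3 session model
    "data_exfiltration": Checkpoint.BACK_DOOR,
    "privilege_escalation": Checkpoint.FRONT_DOOR | Checkpoint.BACK_DOOR,
}


def patterns_at(checkpoint: Checkpoint) -> List[str]:
    """Return the threat_type identifiers evaluated at the given checkpoint."""
    return sorted(t for t, c in THREAT_CHECKPOINTS.items() if checkpoint in c)
```

Using a `Flag` rather than a plain string lets privilege_escalation belong to both checkpoints without a special case.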

Pattern-to-OWASP mapping

Each Safe House pattern maps to one or more OWASP ASI entries. A pattern can map to multiple entries because OWASP threat classes are defined by attacker goal; Safe House detects by observable signal.
| Safe House pattern | OWASP entry | Coverage level | Enforcement mechanism |
|---|---|---|---|
| prompt_injection | ASI01 Agent Goal Hijack | Full (direct variant) | L1 regex families (override phrases, role reassignment, jailbreak openers, authority spoofing); L2 compound confidence scoring; L3 session model. Direct injection is the canonical goal-hijack vector |
| indirect_injection | ASI01 Agent Goal Hijack | Partial (indirect variant) | L1 instruction-delimiter detection; MinHash similarity against known payloads; L2 instruction-structure detection in tool results. Novel payloads with no similarity to known patterns score lower; see limits |
| hijack_attempt | ASI01 Agent Goal Hijack | Partial (multi-turn variant) | L3 session model: topic coherence tracking, escalating action scope, identity drift, pivot detection after a trust-building sequence. Substantial multi-turn coverage, with a known residual on novel multi-turn and multi-vector sequences (active recall work). Threshold: ≥ 0.7 confidence (routes to human review; see calibration note) |
| bec_fraud | ASI09 Human-Agent Trust Exploitation, ASI01 Agent Goal Hijack | Full (conjunction detection) | Requires co-occurring: financial action term + authority claim + urgency marker + secrecy instruction. The authority/urgency/secrecy manipulation is trust exploitation; the intent to redirect the agent into an unauthorized action is goal hijack |
| social_engineering | ASI09 Human-Agent Trust Exploitation | Full | Authority + urgency signal pair, absent the financial-action component. Regulatory threat framing, role-authority claims, urgency escalation: manipulating the agent’s trust to change its behavior |
| agent_spoofing | ASI07 Insecure Inter-Agent Communication, ASI09 Human-Agent Trust Exploitation | Full (inbound signal) | Claims of override authority or elevated permissions arriving as runtime messages; fake admin agent identifiers; credential-presentation patterns in message content. Detects unauthenticated identity/authority claims in the inter-agent message stream |
| data_exfiltration | ASI02 Tool Misuse | Full (outbound) | Back-door checkpoint: external destination patterns vs. declared bounded_actions, bulk-data request language, covert-channel patterns. Exfiltration is misuse of connected tools/connectors to route data out. Also enforced independently by Policy Engine |
| privilege_escalation | ASI03 Identity & Privilege Abuse | Full | Runtime permission claims; requests outside declared bounded_actions; attempts to disable safety mechanisms mid-session. Front-door detection + Policy Engine independent check |
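
The conjunction rule that separates bec_fraud from social_engineering can be illustrated with a small sketch. It assumes four signal families matched by simple regular expressions; the vocabularies below are illustrative placeholders, not the shipped L1 pattern library, and the `classify` function is a hypothetical name.

```python
import re
from typing import Optional

# Illustrative signal vocabularies (assumptions, not the shipped L1 regex families).
SIGNALS = {
    "financial": re.compile(r"\b(wire|transfer|invoice|payment|bank account)\b", re.I),
    "authority": re.compile(r"\b(ceo|cfo|director|compliance officer|regulator)\b", re.I),
    "urgency": re.compile(r"\b(urgent|immediately|right away|within the hour|asap)\b", re.I),
    "secrecy": re.compile(r"\b(confidential|do not tell|keep this between us)\b", re.I),
}


def classify(message: str) -> Optional[str]:
    """Return a threat_type when the required signal conjunction is present."""
    hits = {name for name, rx in SIGNALS.items() if rx.search(message)}
    # bec_fraud: all four signals must co-occur.
    if hits == set(SIGNALS):
        return "bec_fraud"
    # social_engineering: authority + urgency, with no financial-action term.
    if {"authority", "urgency"} <= hits and "financial" not in hits:
        return "social_engineering"
    return None
```

Requiring the full conjunction keeps ordinary finance messages (an invoice reminder, for example) from matching on a single term.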

OWASP-to-coverage mapping

The reverse view — starting from each OWASP entry and tracing what ships in enforcement. Coverage is asserted only where there is a concrete shipped mechanism to point to; everything else is stated as a gap.
| OWASP entry | Safe House pattern | Additional coverage | Status |
|---|---|---|---|
| ASI01 Agent Goal Hijack | prompt_injection, indirect_injection, hijack_attempt | | Shipped (direct); partial (multi-turn / indirect). Direct injection fully covered. Multi-turn goal redirection is substantially covered by the L3 session model, with a known residual on novel multi-turn and multi-vector sequences (active Safe House recall work). Indirect injection is partial: novel payloads with no payload-library similarity may score below block threshold |
| ASI02 Tool Misuse | data_exfiltration | Policy Engine: bounded_actions enforcement, forbidden-rule Managed Rules, tool-capability mappings | Partial (policy layer + outbound screen). Tool execution is constrained by the Policy Engine before it reaches Safe House; Safe House’s back-door checkpoint catches data-exfiltration-via-tool. Mnemom does not intercept every unsafe tool invocation at the gateway; declared-scope enforcement is the primary control |
| ASI03 Identity & Privilege Abuse | privilege_escalation | AAP alignment cards declare the autonomy envelope; CLPI policy engine enforces | Shipped. Runtime privilege claims blocked at the front door. Declared envelope enforced at the policy layer |
| ASI04 Agentic Supply Chain Vulnerabilities | | AEGIS substrate fingerprinting + L1 cross-tenant aggregator | Covered at the AEGIS layer, not a Safe House pattern. Runtime-behavior deviation consistent with a compromised dependency/substrate is detected cross-tenant. Does not replace build-time package provenance; see supply-chain trust |
| ASI05 Unexpected Code Execution | | Policy Engine: bounded_actions constrains which tools/actions an agent may invoke | Gap (front-door). Safe House has no pattern that intercepts code-execution payloads. The Policy Engine limits the action surface (an agent can only invoke declared tools), which reduces blast radius, but Mnemom does not sandbox or screen executed code. Pair with an application-layer execution sandbox |
| ASI06 Memory & Context Poisoning | | | Gap. Persistent memory/context store attacks are not in the shipped Safe House pattern library. AIP thinking-block analysis and AAP drift detection give partial observability of downstream effects, but not upstream interception; see limits |
| ASI07 Insecure Inter-Agent Communication | agent_spoofing | | Partial. Safe House treats unauthenticated authority/identity claims arriving as inbound runtime messages as suspicious by design. Legitimate agent-to-agent authority must be encoded in alignment cards at configuration time. This screens the content of inter-agent messages; it is not a transport-authentication scheme |
| ASI08 Cascading Failures | | AAP drift detection + CLPI lifecycle governance (observability of degraded agent behavior) | Gap. No shipped Safe House pattern targets multi-agent cascading failure. AAP/CLPI provide some observability of an individual agent drifting, but Mnemom does not model or circuit-break failure propagation across an agent fleet. This is an application-architecture concern (timeouts, bulkheads, circuit breakers) |
| ASI09 Human-Agent Trust Exploitation | bec_fraud, social_engineering, agent_spoofing | | Shipped. Authority/urgency/secrecy manipulation and impersonated-authority claims from inbound messages are treated as suspicious by design |
| ASI10 Rogue Agents | | AAP alignment cards declare the autonomy envelope; CLPI lifecycle governance; AEGIS reputation | Covered at the governance layer, not a Safe House pattern. A “rogue” agent operating outside its declared envelope is constrained by Policy Engine bounded_actions enforcement and surfaced by CLPI lifecycle + reputation signals. Safe House screens inbound/outbound message content; it does not by itself decommission a misbehaving agent |
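
For teams that track this cross-reference in their own tooling, the reverse view can be held as data. This is a minimal sketch that mirrors the table above; the `Coverage` type, the coarse status labels (`shipped`, `partial`, `other_layer`, `gap`), and the `entries_with_status` helper are assumptions made for illustration. ASI01 is collapsed to `partial` here because its indirect and multi-turn variants are partial even though the direct variant is shipped.

```python
from typing import Dict, List, NamedTuple, Tuple


class Coverage(NamedTuple):
    title: str
    patterns: Tuple[str, ...]  # Safe House threat_type identifiers, if any
    status: str                # coarse label: shipped | partial | other_layer | gap


OWASP_COVERAGE: Dict[str, Coverage] = {
    "ASI01": Coverage("Agent Goal Hijack",
                      ("prompt_injection", "indirect_injection", "hijack_attempt"), "partial"),
    "ASI02": Coverage("Tool Misuse", ("data_exfiltration",), "partial"),
    "ASI03": Coverage("Identity & Privilege Abuse", ("privilege_escalation",), "shipped"),
    "ASI04": Coverage("Agentic Supply Chain Vulnerabilities", (), "other_layer"),
    "ASI05": Coverage("Unexpected Code Execution", (), "gap"),
    "ASI06": Coverage("Memory & Context Poisoning", (), "gap"),
    "ASI07": Coverage("Insecure Inter-Agent Communication", ("agent_spoofing",), "partial"),
    "ASI08": Coverage("Cascading Failures", (), "gap"),
    "ASI09": Coverage("Human-Agent Trust Exploitation",
                      ("bec_fraud", "social_engineering", "agent_spoofing"), "shipped"),
    "ASI10": Coverage("Rogue Agents", (), "other_layer"),
}


def entries_with_status(status: str) -> List[str]:
    """List ASI identifiers whose coarse status matches."""
    return sorted(k for k, v in OWASP_COVERAGE.items() if v.status == status)
```

Calling `entries_with_status("gap")` yields the entries that need application-layer controls.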

Gaps and limits

ASI06 — Memory & Context Poisoning

No shipped Safe House pattern targets memory/context store attacks. An attacker who can write to an agent’s persistent memory — conversation history, vector store, retrieved documents — can influence future turns without a detectable inbound signal. The AIP thinking-block analysis layer provides partial mitigation: if the poisoned memory causes the agent to reason in ways inconsistent with its alignment card, AIP may flag it before the action lands. That is detection of downstream effect, not upstream interception. Recommended defense-in-depth: treat memory stores as an untrusted input boundary, apply the same L1/L2 screening to memory-retrieved content as to external tool results, and enable AIP to catch the reasoning anomalies that poisoned memory produces.
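
Treating the memory store as an untrusted input boundary can be sketched as a filter applied to retrieved chunks before they enter the agent context. Everything named here is hypothetical: `Screen` stands in for whatever screening call your integration exposes, `toy_screen` is a placeholder detector, and the `block_at` value is chosen by the caller rather than taken from any Safe House default.

```python
from typing import Callable, List, Tuple

# Hypothetical screening interface: returns a threat score in [0, 1].
Screen = Callable[[str], float]


def toy_screen(text: str) -> float:
    """Placeholder for real L1/L2 screening; flags one well-known override phrase."""
    return 1.0 if "ignore previous instructions" in text.lower() else 0.0


def filter_memory(chunks: List[str], screen: Screen,
                  block_at: float) -> Tuple[List[str], List[str]]:
    """Split memory-retrieved chunks into (kept, quarantined) by screening score."""
    kept: List[str] = []
    quarantined: List[str] = []
    for chunk in chunks:
        if screen(chunk) >= block_at:
            quarantined.append(chunk)  # never reaches the agent context
        else:
            kept.append(chunk)
    return kept, quarantined
```

Quarantining rather than silently dropping keeps the flagged chunk available for review, which helps when tracing how a poisoned entry was written in the first place.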

ASI05 — Unexpected Code Execution

Safe House does not intercept code-execution payloads at the gateway. The Policy Engine’s bounded_actions enforcement limits which tools an agent may invoke (reducing the surface that can reach an executor), but Mnemom neither sandboxes nor statically screens executed code. For agents that can run code, pair Mnemom with an application-layer execution sandbox and treat the executor as an untrusted boundary.
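
The surface-reduction idea behind bounded_actions can be illustrated with an allowlist check placed in front of the executor. This is a sketch under stated assumptions: the `Card` shape and the `authorize` function are hypothetical and do not describe the real alignment-card schema or the Policy Engine, which enforces declared scope on the Mnemom side rather than in client code.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class Card:
    """Hypothetical, simplified stand-in for an alignment card."""
    agent_id: str
    bounded_actions: FrozenSet[str]


def authorize(card: Card, tool: str) -> None:
    """Raise PermissionError unless the tool is inside the declared envelope."""
    if tool not in card.bounded_actions:
        raise PermissionError(f"{card.agent_id} may not invoke {tool!r}")
```

An agent whose card omits the code-execution tool can never reach the executor, so the sandbox only has to contain agents whose function genuinely requires running code.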

ASI08 — Cascading Failures

Mnemom screens per-agent message content and surfaces individual-agent drift (AAP/CLPI), but it does not model failure propagation across a multi-agent fleet. Cascading-failure resilience is an application-architecture responsibility: apply per-agent timeouts, bulkheads, and circuit breakers between agents so one degraded agent cannot fan out.
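
A circuit breaker between agents is one of the application-layer controls named above. The following is a generic, minimal sketch of the pattern, not a Mnemom feature; the class name, the defaults for `max_failures` and `reset_after`, and the injectable `clock` are all assumptions for illustration.

```python
import time
from typing import Any, Callable, Optional


class CircuitBreaker:
    """Stop calling a downstream agent after repeated failures."""

    def __init__(self, max_failures: int = 3, reset_after: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_failures = max_failures
        self.reset_after = reset_after
        self.clock = clock
        self.failures = 0
        self.opened_at: Optional[float] = None

    def call(self, fn: Callable[..., Any], *args: Any, **kwargs: Any) -> Any:
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                raise RuntimeError("circuit open: downstream agent is isolated")
            # Cool-down elapsed: allow one trial call (half-open state).
            self.opened_at = None
            self.failures = self.max_failures - 1
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = self.clock()  # open the circuit
            raise
        self.failures = 0  # a success closes the circuit fully
        return result
```

Wrapping each agent-to-agent call in its own breaker means a degraded agent fails fast for its callers instead of tying up their work while they wait on it.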

ASI01 — indirect injection, novel payloads

Indirect injection via tool results is partially covered. MinHash similarity matching compares tool results against a library of known injection payloads. Sufficiently novel payloads that bear no structural or semantic similarity to known patterns will score below L1 thresholds. L2 analysis provides a second pass, but detection accuracy is bounded by the analysis model’s capability. The arena flywheel closes this gap over time: canary credentials and cross-agent campaign detection surface novel attack infrastructure, and confirmed patterns promote to active detection. For high-sensitivity environments, consider adding application-layer validation of tool results before they re-enter the agent context.
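
To see why novel payloads score low under similarity matching, it helps to look at how MinHash estimates overlap. The sketch below is a generic MinHash over word shingles, not Safe House's implementation; the shingle size `k`, the signature length `num_perm`, and the use of salted BLAKE2b as the hash family are all assumptions.

```python
import hashlib
from typing import List, Set


def shingles(text: str, k: int = 3) -> Set[str]:
    """Overlapping k-word windows of the lowercased text."""
    words = text.lower().split()
    return {" ".join(words[i:i + k]) for i in range(max(1, len(words) - k + 1))}


def minhash(items: Set[str], num_perm: int = 64) -> List[int]:
    """Signature: the minimum hash of the set under each salted hash function."""
    signature = []
    for seed in range(num_perm):
        salt = seed.to_bytes(4, "big")
        signature.append(min(
            int.from_bytes(
                hashlib.blake2b(s.encode(), digest_size=8, salt=salt).digest(), "big")
            for s in items))
    return signature


def similarity(a: str, b: str) -> float:
    """Estimated Jaccard similarity: fraction of matching signature positions."""
    sig_a, sig_b = minhash(shingles(a)), minhash(shingles(b))
    return sum(x == y for x, y in zip(sig_a, sig_b)) / len(sig_a)
```

A lightly edited copy of a known payload shares most of its shingles and scores high, while a payload written from scratch shares none and scores near zero, which is the residual the paragraph above describes.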

ASI01 — multi-turn hijack, human escalation at default threshold

The hijack_attempt pattern routes to human review (not autonomous block) at the default 0.7 confidence threshold. This is intentional — legitimate multi-topic conversations produce similar L3 signals. If your use case can tolerate more aggressive autonomous blocking, lower the hijack_attempt threshold in the protection card.

Pairing Safe House with application-layer controls

OWASP guidance recommends pairing runtime substrate controls with application-layer controls. For the gaps above:
  • ASI06 (Memory & Context Poisoning): Apply Safe House’s inbound screening to memory-retrieved content as well as direct inbound messages. This is not the default — configure the SDK to route memory fetches through the Safe House evaluation layer.
  • ASI05 (Unexpected Code Execution): Run agent-executed code in an application-layer sandbox; scope bounded_actions so the agent can only reach the executor when its function genuinely requires it.
  • ASI04 (Agentic Supply Chain): AEGIS covers the runtime-behavior dimension. Pair with SLSA/Sigstore package provenance for the build-time dimension. See supply-chain trust.
  • ASI08 (Cascading Failures): Add per-agent timeouts, bulkheads, and circuit breakers at the application/orchestration layer so a degraded agent cannot cascade across the fleet.
  • ASI10 (Rogue Agents): Card design is the primary defense. Scope bounded_actions as narrowly as the agent’s function permits; CLPI enforces declared scope and surfaces lifecycle/reputation signals for an agent operating outside its envelope.

See also