The Mnemom Gateway supports three enforcement modes that control how the system responds when alignment or integrity violations are detected. You can choose the level of intervention appropriate for your use case — from passive observation to active blocking.
In observe mode, the gateway detects and records all violations but takes no action to modify agent behavior. This is the default mode for all new agents.
Behavior:
All API calls pass through unchanged
Violations are detected and recorded in the trace database
Integrity checkpoints are created for every interaction
Drift alerts are generated when behavioral patterns shift
No modification to agent requests or responses
When to use:
During initial deployment to establish behavioral baselines
When you want to monitor without affecting agent behavior
For compliance auditing where you need a record but not intervention
When evaluating whether to enable more active enforcement
In nudge mode, when a violation is detected, the gateway creates a pending nudge record. On the agent’s next request, the gateway injects an integrity notice into the system prompt. The agent sees the notice, can review its approach, and self-correct.
Behavior:
Violations are detected and recorded
A pending nudge record is created for each violation
On the next API request, an integrity notice is injected into the system prompt
The agent receives the nudge and can self-correct
Nudge delivery is tracked in the conscience timeline
The original request passes through (not blocked)
How nudges work:
The gateway detects a violation in real time by analyzing the streaming response (tee() plus a background AIP check, ~3s latency)
A pending nudge record is created immediately — ready before the agent’s next turn
On the agent’s next API call, the gateway injects a system prompt addition:
[INTEGRITY NOTICE] Your previous action was flagged for:
UNBOUNDED_ACTION (HIGH) — action "delete_all_records" is not
in your declared bounded_actions. Please review your approach.
The agent processes the notice alongside its normal instructions
The nudge is marked as delivered in the conscience timeline
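The steps above can be sketched as a small in-memory model. This is an illustration only, not the gateway's implementation: `PendingNudge`, `NudgeStore`, and their methods are hypothetical names, and the real gateway persists nudges and tracks delivery in the conscience timeline rather than in a Python dict.

```python
from dataclasses import dataclass, field


@dataclass
class PendingNudge:
    violation_type: str      # e.g. "UNBOUNDED_ACTION"
    severity: str            # e.g. "HIGH"
    detail: str              # human-readable explanation of the violation
    delivered: bool = False  # flipped once the notice has been injected


@dataclass
class NudgeStore:
    """Hypothetical in-memory stand-in for the pending nudge records."""
    pending: dict = field(default_factory=dict)

    def record_violation(self, agent_id: str, nudge: PendingNudge) -> None:
        # Step 2: the record is created as soon as the violation is detected.
        self.pending.setdefault(agent_id, []).append(nudge)

    def inject(self, agent_id: str, system_prompt: str) -> str:
        # Step 3: on the next request, append a notice for each undelivered nudge.
        waiting = [n for n in self.pending.get(agent_id, []) if not n.delivered]
        if not waiting:
            return system_prompt
        notices = []
        for n in waiting:
            notices.append(
                "[INTEGRITY NOTICE] Your previous action was flagged for: "
                f"{n.violation_type} ({n.severity}) \u2014 {n.detail} "
                "Please review your approach."
            )
            n.delivered = True  # Step 5: mark the nudge as delivered.
        return system_prompt + "\n\n" + "\n".join(notices)


store = NudgeStore()
store.record_violation(
    "agent-1",
    PendingNudge(
        "UNBOUNDED_ACTION",
        "HIGH",
        'action "delete_all_records" is not in your declared bounded_actions.',
    ),
)
print(store.inject("agent-1", "You are a helpful agent."))
```

In this sketch each nudge is delivered once: a second call to `inject` for the same agent returns the system prompt unchanged.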
When to use:
When you want the agent to self-correct without hard blocking
For agents in production where availability is critical
When building trust in the enforcement system before enabling hard blocks
For advisory or delegated_authority principal relationships
In enforce mode, the gateway actively blocks requests when critical violations are detected. The behavior depends on whether the request is streaming or non-streaming.
Behavior:
Non-streaming requests: Hard block with HTTP 403 response. The request does not reach the AI provider.
Streaming requests: Falls back to nudge mode (injects integrity notice into system prompt). This is because streaming connections cannot be cleanly terminated mid-stream.
When to use:
For high-security environments where violations must be prevented
When operating under fail_closed AIP failure policy
For agents handling sensitive data or financial transactions
When regulatory requirements demand active intervention
Enforce mode will block non-streaming API calls when violations are detected. Ensure your application handles 403 responses gracefully before enabling this mode.
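One way to handle a 403 gracefully on the client side is to treat it as a distinct, non-retryable outcome. The sketch below is an assumption-laden illustration: `EnforcementBlocked`, `check_gateway_response`, and `call_with_fallback` are hypothetical helpers, and the 403 response body is treated as opaque text because its format is not described here.

```python
class EnforcementBlocked(Exception):
    """Raised when the gateway rejects a request with HTTP 403."""


def check_gateway_response(status_code: int, body: str) -> str:
    # A 403 means the gateway blocked the request before it reached the provider.
    if status_code == 403:
        raise EnforcementBlocked(body)
    if status_code >= 400:
        raise RuntimeError(f"gateway returned HTTP {status_code}")
    return body


def call_with_fallback(send, fallback: str) -> str:
    """Run `send` (a zero-argument callable returning (status_code, body)).

    On an enforcement block, return `fallback` instead of retrying: resending
    the same request unchanged would likely be blocked again.
    """
    try:
        status_code, body = send()
        return check_gateway_response(status_code, body)
    except EnforcementBlocked:
        return fallback


# Usage with stubbed transports in place of a real HTTP client:
print(call_with_fallback(lambda: (200, "model output"), "request was blocked"))
print(call_with_fallback(lambda: (403, "blocked"), "request was blocked"))
```

Other error statuses still raise, so enforcement blocks stay distinguishable from ordinary failures.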
Enforce mode does not block every violation immediately. By default, the policy engine grants a 24-hour grace period to any tool the agent has not been seen using before — during that window, what would otherwise be a fail verdict is silently downgraded to a warn, the gateway forwards the request, and the model gets to call the tool. This is intentional ergonomics for production rollouts (operators have a day to amend the alignment card after adding a new MCP server) but it has a real security trade-off you must understand.
Brand-new tools, including ones an attacker introduces, get a 24h pass on enforce by default.
If an attacker injects an unmapped tool through prompt injection, an upstream MCP server compromise, or a tool-name overlap, the gateway will let the call through for the first 24 hours after it’s first observed for that agent. Mature agents with stable tool inventories aren’t exposed; agents that add tools dynamically, run untrusted MCP servers, or accept tool definitions from user input absolutely are.
If your threat model includes adversarial tool introduction, set grace_period_hours: 0 on the agent’s alignment card to disable the grace path entirely.
The setting lives in enforcement.grace_period_hours on the alignment card and accepts any non-negative number. 0 disables grace and makes enforce strict from the moment a card is published.
# Strict enforce: every unmapped/forbidden tool blocks immediately,
# from the first time it's seen.
enforcement:
  grace_period_hours: 0

# Default enforce: 24h grace per new tool. Operationally forgiving;
# weakest under adversarial tool introduction.
enforcement:
  grace_period_hours: 24

# Long grace: deploy-on-weekly-cycles teams that need extra time
# to update cards. Wider exposure window; only appropriate for
# trusted environments.
enforcement:
  grace_period_hours: 168  # 7 days
| Use case | Recommended grace_period_hours |
| --- | --- |
| Adversarial threat model (untrusted user input, untrusted MCP servers, agents accepting tool defs from users) | 0 |
| Test harness / CI matrix | 0 |
| Production with stable tool inventory + mature card | 0 (no operational benefit when the tool list is stable) |
| Production with frequent tool additions, trusted operators | 24 (default) |
| Slow-cadence operational teams | 48–168 |
The grace decision is per-(agent, tool) and is tracked in the platform’s tool_first_seen table — first observation timestamps the row, subsequent calls within the window pass, calls after the window enforce. There is no API to back-date a tool_first_seen row, so disabling the grace period is the only way to make enforce strict from the moment a card is published. See Policy Engine § Grace period for the full mechanism.
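The per-(agent, tool) grace decision can be modelled roughly as follows. This is a sketch under stated assumptions, not the policy engine's code: a plain dict stands in for the tool_first_seen table, `grace_verdict` is a hypothetical name, and treating a call made exactly at the end of the window as outside it is a guess.

```python
from datetime import datetime, timedelta, timezone


def grace_verdict(first_seen, agent_id, tool, now, grace_period_hours=24):
    """Verdict for a tool call that would otherwise be a fail.

    `first_seen` maps (agent_id, tool) -> datetime and stands in for the
    tool_first_seen table. Returns "warn" inside the grace window, else "fail".
    """
    # The first observation timestamps the row; later calls reuse it.
    seen_at = first_seen.setdefault((agent_id, tool), now)
    if now - seen_at < timedelta(hours=grace_period_hours):
        return "warn"  # downgraded: the gateway forwards the request
    return "fail"      # window elapsed (or grace disabled): enforce


t0 = datetime(2025, 1, 1, tzinfo=timezone.utc)
table = {}
print(grace_verdict(table, "agent-1", "new_tool", t0))                        # first sighting
print(grace_verdict(table, "agent-1", "new_tool", t0 + timedelta(hours=25)))  # after the window
print(grace_verdict({}, "agent-1", "new_tool", t0, grace_period_hours=0))     # grace disabled
```

With grace_period_hours set to 0 the window has zero length, so even the first observation falls outside it and the call is enforced immediately.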
Use threshold mode to avoid alert fatigue. The agent only receives a nudge after repeated violations in the same session, giving it a chance to self-correct naturally first.
| Setting | Type | Default | Description |
| --- | --- | --- | --- |
|  |  |  | % of non-violation checkpoints that get full attestation |
| proof_boundary_cap | integer | 100 | Max % of boundary violations to prove (when card-gap detection is unavailable). Boundary violations with detected card gaps are automatically deferred. |
| nudge_strategy | string | always | When to create nudges (always, sampling, threshold, off) |
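The four nudge_strategy values could be interpreted along these lines. Only the strategy names come from the settings above; `should_nudge`, the `threshold` count, and the `sample_rate` are hypothetical parameters chosen for illustration, and the gateway's actual thresholds and sampling behavior may differ.

```python
import random


def should_nudge(strategy, violations_in_session,
                 threshold=3, sample_rate=0.5, rng=random.random):
    """Decide whether a detected violation should create a nudge.

    `threshold` and `sample_rate` are illustrative values, not documented defaults.
    """
    if strategy == "off":
        return False
    if strategy == "always":
        return True
    if strategy == "threshold":
        # Nudge only after repeated violations in the same session.
        return violations_in_session >= threshold
    if strategy == "sampling":
        # Nudge for a random fraction of violations.
        return rng() < sample_rate
    raise ValueError(f"unknown nudge_strategy: {strategy!r}")


for count in (1, 2, 3):
    print(count, should_nudge("threshold", violations_in_session=count))
```

Passing `rng` explicitly keeps the sampling branch deterministic in tests.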
Configuration:
# Disable AIP for an agent (AAP traces still flow)
curl -X PUT https://api.mnemom.ai/v1/agents/:id/settings \
  -H "Content-Type: application/json" \
  -d '{"aip_enabled": false}'

# Set proof sampling to 50%
curl -X PUT https://api.mnemom.ai/v1/agents/:id/settings \
  -H "Content-Type: application/json" \
  -d '{"proof_rate": 50}'
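For scripted use, the same requests can be built with Python's standard library. This is a minimal sketch that mirrors the curl commands: `build_settings_request` is a hypothetical helper, and like the curl examples it omits any authentication headers your deployment may require. It only constructs the request; sending it is left to the caller.

```python
import json
import urllib.request

API_BASE = "https://api.mnemom.ai/v1"


def build_settings_request(agent_id: str, settings: dict) -> urllib.request.Request:
    """Build (but do not send) a PUT request to the agent settings endpoint."""
    return urllib.request.Request(
        url=f"{API_BASE}/agents/{agent_id}/settings",
        data=json.dumps(settings).encode("utf-8"),
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


# "my-agent-id" is a placeholder; substitute a real agent ID.
req = build_settings_request("my-agent-id", {"proof_rate": 50})
print(req.get_method(), req.full_url)
# To send: urllib.request.urlopen(req)
```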
These settings are also available in the Agent Settings panel on the web dashboard for claimed agents.
Policy enforcement operates as a parallel system alongside alignment enforcement. While alignment enforcement checks agent behavior against card values, policy enforcement checks tool usage against governance rules.
| Violation Type | Severity | Enforcement Behavior |
| --- | --- | --- |
| POLICY_VIOLATION | HIGH | Blocked when policy enforcement mode is enforce; logged when warn |
| UNMAPPED_TOOL | MEDIUM | Logged as warning; behavior depends on defaults.unmapped_tool_action |
| CAPABILITY_MISMATCH | HIGH | Blocked when policy enforcement mode is enforce; logged when warn |
Policy enforcement is controlled independently via the enforcement_mode field in the Policy DSL:
The X-Policy-Verdict response header is always present when a policy is active:
| Header Value | Meaning |
| --- | --- |
| pass | All tools mapped and permitted |
| warn | Violations detected but not blocking |
| fail | Violations detected and request blocked (enforce mode only) |
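A client can read the verdict from the response headers along these lines. The helper name `interpret_policy_verdict` is hypothetical, and treating a missing header as "no policy active" is an inference from the statement that the header is present whenever a policy is active.

```python
def interpret_policy_verdict(headers: dict):
    """Return (verdict, blocked), or None when X-Policy-Verdict is absent."""
    # HTTP header names are case-insensitive, so normalize before lookup.
    lookup = {name.lower(): value for name, value in headers.items()}
    raw = lookup.get("x-policy-verdict")
    if raw is None:
        return None  # assumption: no policy is active for this agent
    verdict = raw.strip().lower()
    if verdict not in {"pass", "warn", "fail"}:
        raise ValueError(f"unexpected X-Policy-Verdict value: {raw!r}")
    # Only "fail" means the request was blocked.
    return verdict, verdict == "fail"


print(interpret_policy_verdict({"X-Policy-Verdict": "warn"}))
print(interpret_policy_verdict({"Content-Type": "application/json"}))
```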
Alignment enforcement (observe/nudge/enforce) and policy enforcement (off/warn/enforce) can be configured independently. For example, you might use nudge for alignment violations while using enforce for policy violations, or vice versa.
In enforce mode, only CRITICAL and HIGH severity violations trigger hard blocks on non-streaming requests. MEDIUM severity violations are always handled via nudge, even in enforce mode. This applies to both alignment and policy violations.
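Putting the mode, severity, and streaming rules together gives a small decision table, sketched here as a function. `enforcement_action` and its return labels are hypothetical. The sketch covers only the severities named in this section, and it does not model the grace period, which can downgrade a block for newly seen tools.

```python
BLOCKING_SEVERITIES = {"CRITICAL", "HIGH"}
KNOWN_SEVERITIES = BLOCKING_SEVERITIES | {"MEDIUM"}


def enforcement_action(mode: str, severity: str, streaming: bool) -> str:
    """Return "record", "nudge", or "block" for a detected violation."""
    if severity not in KNOWN_SEVERITIES:
        raise ValueError(f"severity not covered by this sketch: {severity!r}")
    if mode == "observe":
        return "record"  # detect and record only
    if mode == "nudge":
        return "nudge"   # inject an integrity notice on the next request
    if mode == "enforce":
        # Hard block (HTTP 403) only for CRITICAL/HIGH on non-streaming requests;
        # streaming requests and MEDIUM violations fall back to a nudge.
        if severity in BLOCKING_SEVERITIES and not streaming:
            return "block"
        return "nudge"
    raise ValueError(f"unknown enforcement mode: {mode!r}")


print(enforcement_action("enforce", "HIGH", streaming=False))
print(enforcement_action("enforce", "HIGH", streaming=True))
print(enforcement_action("enforce", "MEDIUM", streaming=False))
```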
Agent containment is a separate enforcement layer that operates above the per-request enforcement modes. While enforcement modes (observe/nudge/enforce) control individual request handling, containment controls whether the agent can make requests at all.
Paused agents can be resumed by an org owner or admin. Killed agents require explicit reactivation by an owner only. The distinction matters for audit: pause means “we need to investigate,” kill means “this agent is compromised.”
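The reactivation rule can be expressed as a simple permission check. This is a sketch only: `can_reactivate` and the string labels for states and roles are hypothetical, and the platform's actual role model may be richer.

```python
def can_reactivate(state: str, role: str) -> bool:
    """Whether a user with `role` may bring an agent in `state` back online."""
    if state == "paused":
        # Paused agents can be resumed by an org owner or admin.
        return role in {"owner", "admin"}
    if state == "killed":
        # Killed agents require explicit reactivation by an owner.
        return role == "owner"
    raise ValueError(f"agent is not contained: {state!r}")


print(can_reactivate("paused", "admin"))
print(can_reactivate("killed", "admin"))
```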