Enforcement modes

The Mnemom Gateway supports three enforcement modes that control how the system responds when alignment or integrity violations are detected. You can choose the level of intervention appropriate for your use case — from passive observation to active blocking.

Modes overview

Observe

Detect violations, record them, take no action. This is the default mode. Ideal for initial deployment and monitoring.

Nudge

Detect violations and inject feedback into the agent’s next request via system prompt. The agent sees it and can self-correct.

Enforce

Detect violations and block the violating response in the same turn, on both streaming and non-streaming requests. Non-streaming requests receive HTTP 403; streaming requests incur added latency because the response is evaluated before it is delivered.

Mode details

Observe mode (default)

In observe mode, the gateway detects and records all violations but takes no action to modify agent behavior. This is the default mode for all new agents.

Behavior:

All API calls pass through unchanged
Violations are detected and recorded in the trace database
Integrity checkpoints are created for every interaction
Drift alerts are generated when behavioral patterns shift
No modification to agent requests or responses

When to use:

During initial deployment to establish behavioral baselines
When you want to monitor without affecting agent behavior
For compliance auditing where you need a record but not intervention
When evaluating whether to enable more active enforcement

Configuration: set integrity_mode: observe (and/or autonomy_mode: observe) on the agent’s alignment card. See Setting enforcement mode below for the three paths (dashboard / CLI / API).

Nudge mode

In nudge mode, when a violation is detected, the gateway creates a pending nudge record. On the agent’s next request, the gateway injects an integrity notice into the system prompt. The agent sees the notice, can review its approach, and self-correct.

Behavior:

Violations are detected and recorded
A pending nudge record is created for each violation
On the next API request, an integrity notice is injected into the system prompt
The agent receives the nudge and can self-correct
Nudge delivery is tracked in the conscience timeline
The original request passes through (not blocked)

How nudges work:

The gateway detects a violation via background streaming analysis (~3s latency)
A pending nudge record is created immediately — ready before the agent’s next turn

On the agent’s next API call, the gateway injects a system prompt addition:

[INTEGRITY NOTICE] Your previous action was flagged for:
UNBOUNDED_ACTION (HIGH) — action "delete_all_records" is not
in your declared bounded_actions. Please review your approach.

The agent processes the notice alongside its normal instructions
The nudge is marked as delivered in the conscience timeline
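
The delivery step above can be sketched in a few lines. This is an illustration only, not the gateway's implementation: `PendingNudge`, `build_integrity_notice`, and `inject_pending_nudges` are hypothetical names, and appending the notice to the end of the system prompt is an assumption (this page does not say where in the prompt it is placed).

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class PendingNudge:
    """Hypothetical record of a flagged violation awaiting delivery."""
    violation_type: str  # e.g. "UNBOUNDED_ACTION"
    severity: str        # e.g. "HIGH"
    detail: str          # human-readable explanation of the violation


def build_integrity_notice(nudge: PendingNudge) -> str:
    # Mirrors the notice format shown above (\u2014 is the dash in that notice).
    return (
        "[INTEGRITY NOTICE] Your previous action was flagged for:\n"
        f"{nudge.violation_type} ({nudge.severity}) \u2014 {nudge.detail} "
        "Please review your approach."
    )


def inject_pending_nudges(
    system_prompt: str, pending: list[PendingNudge]
) -> tuple[str, list[PendingNudge]]:
    """Return the augmented system prompt and the nudges that were delivered."""
    if not pending:
        return system_prompt, []
    notices = "\n\n".join(build_integrity_notice(n) for n in pending)
    separator = "\n\n" if system_prompt else ""
    # Assumption: notices are appended after the agent's own instructions.
    return f"{system_prompt}{separator}{notices}", list(pending)
```

The second return value stands in for the "marked as delivered" step: a caller would record those nudges as delivered once the request has been forwarded.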

When to use:

When you want the agent to self-correct without hard blocking
For agents in production where availability is critical
When building trust in the enforcement system before enabling hard blocks
For advisory or delegated_authority principal relationships

Configuration: set integrity_mode: nudge (and/or autonomy_mode: nudge) on the agent’s alignment card. See Setting enforcement mode below.

Enforce mode

In enforce mode, the gateway actively blocks responses when CRITICAL or HIGH severity violations are detected. It gates the same turn on both streaming and non-streaming requests: the guarantee is identical, and only the latency differs.

Behavior:

Non-streaming requests: Hard block with HTTP 403 response. The violating response is not delivered.
Streaming requests: The response is evaluated before it is delivered, so a violating response is gated the same turn. This adds latency on streaming, but preserves the same-turn guarantee — enforce does not fall back to advisory nudging.

When to use:

For high-security environments where violations must be prevented
When operating under fail_closed AIP failure policy
For agents handling sensitive data or financial transactions
When regulatory requirements demand active intervention

Configuration: set integrity_mode: enforce (and/or autonomy_mode: enforce) on the agent’s alignment card. See Setting enforcement mode below.

Enforce mode gates both streaming and non-streaming API calls when violations are detected — HTTP 403 for non-streaming, and same-turn gating before delivery for streaming (which adds latency). Ensure your application handles 403 responses gracefully before enabling this mode.
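
One way an application might handle a 403 gracefully is to route it to an explicit fallback instead of treating it as a transport failure. The sketch below is a hypothetical client-side helper; `send` and `on_blocked` are names invented for this example, and the body of an enforce-mode 403 is treated as opaque because its shape is not specified on this page.

```python
from typing import Callable, Tuple

# (HTTP status code, parsed JSON body) as returned by a hypothetical transport.
Response = Tuple[int, dict]


def call_with_block_handling(
    send: Callable[[], Response],
    on_blocked: Callable[[dict], dict],
) -> dict:
    """Send one request through the gateway and handle an enforce-mode block."""
    status, body = send()
    if status == 403:
        # The gateway refused to deliver the response. Retrying the same
        # request unchanged is unlikely to help, so hand off to a fallback.
        return on_blocked(body)
    if status >= 400:
        raise RuntimeError(f"gateway request failed with HTTP {status}")
    return body
```

A typical `on_blocked` callback might log the block, surface a neutral message to the end user, or escalate to a human operator.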

Grace period (read this before enabling enforce)

Enforce mode does not block every violation immediately. By default, the policy engine grants a 24-hour grace period to any tool the agent has not been seen using before — during that window, what would otherwise be a fail verdict is silently downgraded to a warn, the gateway forwards the request, and the model gets to call the tool. This is intentional ergonomics for production rollouts (operators have a day to amend the alignment card after adding a new MCP server) but it has a real security trade-off you must understand.

Brand-new tools, including ones an attacker introduces, get a 24h pass on enforce by default.

If an attacker injects an unmapped tool through prompt injection, an upstream MCP server compromise, or a tool-name overlap, the gateway will let the call through for the first 24 hours after it’s first observed for that agent. Mature agents with stable tool inventories aren’t exposed; agents that add tools dynamically, run untrusted MCP servers, or accept tool definitions from user input absolutely are.

If your threat model includes adversarial tool introduction, set grace_period_hours: 0 on the agent’s alignment card to disable the grace path entirely.

The setting lives in enforcement.grace_period_hours on the alignment card and accepts any non-negative number. 0 disables grace and makes enforce strict from the moment a card is published.

# Strict enforce — every unmapped/forbidden tool blocks immediately,
# from the first time it's seen.
enforcement:
  grace_period_hours: 0

# Default enforce — 24h grace per new tool. Operationally forgiving;
# weakest under adversarial tool introduction.
enforcement:
  grace_period_hours: 24

# Long grace — deploy-on-weekly-cycles teams that need extra time
# to update cards. Wider exposure window; only appropriate for
# trusted environments.
enforcement:
  grace_period_hours: 168   # 7 days

Use case	Recommended `grace_period_hours`
Adversarial threat model (untrusted user input, untrusted MCP servers, agents accepting tool defs from users)	`0`
Test harness / CI matrix	`0`
Production with stable tool inventory + mature card	`0` (no operational benefit when the tool list is stable)
Production with frequent tool additions, trusted operators	`24` (default)
Slow-cadence operational teams	`48`–`168`

The grace decision is per-(agent, tool) and is tracked in the platform’s tool_first_seen table — first observation timestamps the row, subsequent calls within the window pass, calls after the window enforce. There is no API to back-date a tool_first_seen row, so disabling the grace period is the only way to make enforce strict from the moment a card is published. See Policy Engine § Grace period for the full mechanism.
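
The per-(agent, tool) decision described above can be modeled as follows. This is a simplified sketch, not the policy engine itself: `GraceTracker` is a hypothetical in-memory stand-in for the tool_first_seen table, and treating the window as half-open (a call exactly at the boundary enforces) is an assumption.

```python
from __future__ import annotations

from datetime import datetime, timedelta, timezone


class GraceTracker:
    """In-memory stand-in for the tool_first_seen table (illustrative only)."""

    def __init__(self, grace_period_hours: float = 24):
        if grace_period_hours < 0:
            raise ValueError("grace_period_hours must be non-negative")
        self.grace = timedelta(hours=grace_period_hours)
        self.first_seen: dict[tuple[str, str], datetime] = {}

    def resolve(self, agent_id: str, tool: str, verdict: str, now: datetime) -> str:
        """Return the effective verdict after applying the grace period."""
        # The first observation timestamps the row; later calls reuse it.
        first = self.first_seen.setdefault((agent_id, tool), now)
        if verdict == "fail" and now - first < self.grace:
            return "warn"  # downgraded: the request is forwarded
        return verdict


t0 = datetime(2025, 1, 1, tzinfo=timezone.utc)
default_grace = GraceTracker(grace_period_hours=24)
strict = GraceTracker(grace_period_hours=0)
```

With `grace_period_hours=0` the comparison can never be true, so a fail verdict is enforced from the very first observation.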

Setting enforcement mode

Enforcement mode is a top-level field on the alignment card: autonomy_mode for the action-policing pipeline (CLPI tool-use policy + bounded actions) and integrity_mode for the values/conscience pipeline (AIP + drift). The legacy PUT /v1/agents/{id}/enforcement endpoint was retired 2026-05-14.

Three paths, pick the one that fits your workflow:

Dashboard. Open https://mnemom.ai/dashboard/agents/{your-agent-id}/card, toggle autonomy_mode and/or integrity_mode, save. Easiest path for one-off changes.

CLI.

mnemom card edit    # opens current alignment-card YAML in $EDITOR
# change autonomy_mode and/or integrity_mode (off | observe | nudge | enforce)
# save and exit; CLI publishes + triggers recompose

Programmatic. PUT /v1/alignment/agent/{agent_id} with the full canonical card. See Card Management for the read-modify-write flow and the alignment-card schema for the field requirements.
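
As a rough illustration of the modify-and-write half of that flow, the sketch below validates the mode values and builds the PUT request without sending it. Several things here are assumptions rather than documented behavior: the host api.mnemom.ai is borrowed from the curl examples elsewhere on this page, the Bearer token header is assumed to apply to this endpoint too, and the card is treated as a plain dict whose full schema is described in Card Management.

```python
import copy
import json
import urllib.request
from typing import Optional

VALID_MODES = {"off", "observe", "nudge", "enforce"}


def with_modes(
    card: dict,
    autonomy_mode: Optional[str] = None,
    integrity_mode: Optional[str] = None,
) -> dict:
    """Return a copy of the card with the requested modes set."""
    updated = copy.deepcopy(card)
    for field, value in (
        ("autonomy_mode", autonomy_mode),
        ("integrity_mode", integrity_mode),
    ):
        if value is None:
            continue  # leave this pipeline's mode untouched
        if value not in VALID_MODES:
            raise ValueError(f"{field} must be one of {sorted(VALID_MODES)}")
        updated[field] = value
    return updated


def build_card_put(agent_id: str, card: dict, token: str) -> urllib.request.Request:
    """Build (but do not send) the PUT carrying the full card."""
    return urllib.request.Request(
        url=f"https://api.mnemom.ai/v1/alignment/agent/{agent_id}",  # host assumed
        data=json.dumps(card).encode("utf-8"),
        method="PUT",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )
```

Passing the result to `urllib.request.urlopen` would send it. The read step (fetching the current card first) is omitted because its endpoint is not given on this page.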

Nudge strategy

When enforcement mode is nudge or enforce, you can further control when nudges are created using the nudge strategy setting:

Strategy	Behavior
`always`	Every boundary violation creates a nudge (default)
`sampling`	Nudge on a percentage of violations (uses `proof_rate` or dedicated `nudge_rate`)
`threshold`	Only nudge after N violations in the current session
`off`	No nudging — violations are recorded but no correction is injected

Configuration:

curl -X PUT https://api.mnemom.ai/v1/agents/:id/settings \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"nudge_strategy": "always"}'

Use threshold mode to avoid alert fatigue. The agent only receives a nudge after repeated violations in the same session, giving it a chance to self-correct naturally first.
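
The four strategies can be read as a single decision function. The sketch below is an interpretation, not the gateway's code: `should_nudge` and its parameters are hypothetical, the threshold value N has no documented setting name on this page, and "after N violations" is taken to mean the Nth violation in the session triggers the nudge.

```python
import random
from typing import Callable


def should_nudge(
    strategy: str,
    session_violation_count: int,
    nudge_rate: int = 100,   # percentage, used by "sampling"
    threshold: int = 3,      # hypothetical N for "threshold"
    rng: Callable[[], float] = random.random,
) -> bool:
    """Decide whether a detected violation should create a nudge."""
    if strategy == "always":
        return True
    if strategy == "off":
        return False  # the violation is still recorded elsewhere
    if strategy == "sampling":
        # rng() is uniform in [0, 1), so this fires nudge_rate% of the time.
        return rng() * 100 < nudge_rate
    if strategy == "threshold":
        return session_violation_count >= threshold
    raise ValueError(f"unknown nudge strategy: {strategy!r}")
```

Injecting `rng` keeps the sampling branch deterministic under test.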

Per-agent feature toggles

Team operations require the team_reputation feature flag, available on Team and Enterprise plans. See Pricing for plan details.

Each agent has independent controls for the transparency and integrity pipeline:

Setting	Type	Default	Description
`aap_enabled`	boolean	`true`	Enable AAP action traces
`aip_enabled`	boolean	`true`	Enable AIP integrity analysis
`proof_enabled`	boolean	`true`	Enable cryptographic attestation (Ed25519 + Merkle)
`proof_rate`	integer	`100`	% of non-violation checkpoints that get full attestation
`proof_boundary_cap`	integer	`100`	Max % of boundary violations to prove (when card-gap detection is unavailable). Boundary violations with detected card gaps are automatically deferred.
`nudge_strategy`	string	`always`	When to create nudges (`always`, `sampling`, `threshold`, `off`)

Configuration:

# Disable AIP for an agent (AAP traces still flow)
curl -X PUT https://api.mnemom.ai/v1/agents/:id/settings \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"aip_enabled": false}'

# Set proof sampling to 50%
curl -X PUT https://api.mnemom.ai/v1/agents/:id/settings \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"proof_rate": 50}'

These settings are also available in the Agent Settings panel on the web dashboard for claimed agents.

Violation types and enforcement

Enforcement applies to all violation types detected by the AAP verification engine, AIP integrity checks, and the policy engine:

Alignment violations

Violation Type	Severity	Enforcement Behavior
`FORBIDDEN_ACTION`	CRITICAL	Blocked in enforce mode; nudged in nudge mode
`CARD_MISMATCH`	CRITICAL	Blocked in enforce mode; nudged in nudge mode
`UNBOUNDED_ACTION`	HIGH	Blocked in enforce mode; nudged in nudge mode
`MISSED_ESCALATION`	HIGH	Blocked in enforce mode; nudged in nudge mode
`CARD_EXPIRED`	HIGH	Blocked in enforce mode; nudged in nudge mode
`UNDECLARED_VALUE`	MEDIUM	Nudged in nudge/enforce mode (not blocked)

Policy violations

Policy enforcement operates as a parallel system alongside alignment enforcement. While alignment enforcement checks agent behavior against card values, policy enforcement checks tool usage against governance rules.

Violation Type	Severity	Enforcement Behavior
`POLICY_VIOLATION`	HIGH	Blocked when policy enforcement mode is `enforce`; logged when `warn`
`UNMAPPED_TOOL`	MEDIUM	Logged as warning; behavior depends on `defaults.unmapped_tool_action`
`CAPABILITY_MISMATCH`	HIGH	Blocked when policy enforcement mode is `enforce`; logged when `warn`

Policy enforcement is controlled independently via the enforcement_mode field in the Policy DSL:

warn — Log violations, return X-Policy-Verdict: warn header, allow request to proceed
enforce — Block requests with violations (HTTP 403), return X-Policy-Verdict: fail header
off — Skip policy evaluation entirely

The X-Policy-Verdict response header is always present when a policy is active:

Header Value	Meaning
`pass`	All tools mapped and permitted
`warn`	Violations detected but not blocking
`fail`	Violations detected and request blocked (enforce mode only)
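
A client can branch on this header directly. The helper below is a hypothetical sketch; the returned labels are invented for this example, and reading a missing header as "no active policy" is an inference from the statement that the header is present whenever a policy is active.

```python
from typing import Mapping


def interpret_policy_verdict(headers: Mapping[str, str]) -> str:
    """Map the X-Policy-Verdict response header to a client-side outcome."""
    # HTTP header names are case-insensitive, so normalize before lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    verdict = lowered.get("x-policy-verdict")
    if verdict is None:
        return "no_active_policy"
    verdict = verdict.strip().lower()
    if verdict == "pass":
        return "proceed"
    if verdict == "warn":
        return "proceed_and_log"
    if verdict == "fail":
        return "blocked"
    raise ValueError(f"unexpected X-Policy-Verdict value: {verdict!r}")
```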

Alignment enforcement (observe/nudge/enforce) and policy enforcement (off/warn/enforce) can be configured independently. For example, you might use nudge for alignment violations while using enforce for policy violations, or vice versa.

See Policy Engine for full details on how policies are evaluated, and Policy Management for setup instructions.

In enforce mode, only CRITICAL and HIGH severity violations are blocked (HTTP 403 on non-streaming requests, same-turn gating before delivery on streaming requests). MEDIUM severity violations are always handled via nudge, even in enforce mode. This applies to both alignment and policy violations.
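
The mode and severity rules for alignment violations reduce to a small lookup. The function below is an illustrative summary of the table above, using the action names recorded in the conscience timeline (observed, nudged, blocked); `enforcement_action` is a hypothetical name, and the sketch ignores the nudge strategy setting, which can further suppress nudges.

```python
BLOCKING_SEVERITIES = {"CRITICAL", "HIGH"}
KNOWN_SEVERITIES = BLOCKING_SEVERITIES | {"MEDIUM"}


def enforcement_action(mode: str, severity: str) -> str:
    """Return the action taken for an alignment violation."""
    if severity not in KNOWN_SEVERITIES:
        raise ValueError(f"unknown severity: {severity!r}")
    if mode == "observe":
        return "observed"  # recorded only
    if mode == "nudge":
        return "nudged"    # feedback injected on the next request
    if mode == "enforce":
        # MEDIUM violations are nudged rather than blocked, even here.
        return "blocked" if severity in BLOCKING_SEVERITIES else "nudged"
    raise ValueError(f"unknown enforcement mode: {mode!r}")
```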

Conscience timeline

All enforcement actions are tracked in the conscience timeline, accessible via the API and the web dashboard at mnemom.ai. The timeline records:

When a violation was detected
What type and severity
What enforcement action was taken (observed, nudged, blocked)
Whether the agent self-corrected after a nudge
Drift patterns across enforcement events

Provider compatibility

Enforcement works across all providers where AIP is supported:

Provider	Observe	Nudge	Enforce
Anthropic	Yes	Yes	Yes
OpenAI	Yes	Yes	Yes
Gemini	Yes	Yes	Yes

Enforce gates same-turn on both streaming and non-streaming across all three providers; on streaming this adds latency because the response is evaluated before it is delivered.

Agent containment

Agent containment is a separate enforcement layer that operates above the per-request enforcement modes. While enforcement modes (observe/nudge/enforce) control individual request handling, containment controls whether the agent can make requests at all.

Containment states

State	Meaning	Gateway Behavior
`active`	Normal operation (default)	Requests proceed normally
`paused`	Temporarily stopped	All requests blocked with HTTP 403
`killed`	Permanently stopped	All requests blocked with HTTP 403

Paused agents can be resumed by an org owner or admin. Killed agents require explicit reactivation by an owner only. The distinction matters for audit: pause means “we need to investigate,” kill means “this agent is compromised.”

Containment API

# Pause an agent
curl -X POST https://api.mnemom.ai/v1/orgs/{org_id}/agents/{agent_id}/pause \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"reason": "Investigating boundary violations"}'

# Resume a paused agent
curl -X POST https://api.mnemom.ai/v1/orgs/{org_id}/agents/{agent_id}/resume \
  -H "Authorization: Bearer $TOKEN"

# Kill an agent (owner only)
curl -X POST https://api.mnemom.ai/v1/orgs/{org_id}/agents/{agent_id}/kill \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"reason": "Agent compromised"}'

# Reactivate a killed agent (owner only)
curl -X POST https://api.mnemom.ai/v1/orgs/{org_id}/agents/{agent_id}/reactivate \
  -H "Authorization: Bearer $TOKEN"

# Get containment status and audit log
curl https://api.mnemom.ai/v1/orgs/{org_id}/agents/{agent_id}/containment \
  -H "Authorization: Bearer $TOKEN"

Error response

When a contained agent attempts an API request through the gateway, it receives:

{
  "error": "Agent contained",
  "type": "containment_error",
  "reason": "agent_paused"
}

The HTTP status code is 403 Forbidden (distinct from 402 Payment Required used for billing enforcement).
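
Because enforce-mode blocks and containment both return 403, a client may want to tell them apart by inspecting the body. The classifier below is a sketch under assumptions: it relies only on the "type" and "reason" fields shown in the example above, and it treats any other 403 as an enforcement or policy block, which this page does not state explicitly.

```python
def classify_gateway_error(status: int, body: dict) -> str:
    """Classify a gateway error response into a coarse category."""
    if status == 402:
        return "billing"
    if status == 403:
        if body.get("type") == "containment_error":
            # e.g. reason "agent_paused": retrying will not help until the
            # agent is resumed or reactivated.
            return "contained:" + str(body.get("reason", "unknown"))
        return "blocked"
    return "other"
```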

Auto-containment

Agents can be configured to automatically pause after consecutive boundary violations:

# Enable auto-containment after 3 consecutive violations
curl -X PUT https://api.mnemom.ai/v1/orgs/{org_id}/agents/{agent_id}/containment-policy \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"auto_containment_threshold": 3}'

# Disable auto-containment
curl -X PUT https://api.mnemom.ai/v1/orgs/{org_id}/agents/{agent_id}/containment-policy \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"auto_containment_threshold": null}'

When auto-containment triggers, it:

Sets the agent status to paused with actor system
Logs the action in the containment audit log
Emits an agent.paused webhook event
Purges the gateway cache so the block takes effect immediately
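
The trigger logic can be pictured as a counter of consecutive violations. The class below is a toy model, not the platform's implementation: `AutoContainment` is a hypothetical name, and resetting the counter on any non-violating request is an assumption about what "consecutive" means here. The returned dict reuses field names from the webhook payload on this page.

```python
from typing import Optional


class AutoContainment:
    """Toy model of the consecutive-violation trigger."""

    def __init__(self, threshold: Optional[int]):
        self.threshold = threshold  # None disables auto-containment
        self.consecutive = 0
        self.status = "active"

    def record(self, violated: bool) -> Optional[dict]:
        """Record one request outcome; return a pause event if triggered."""
        if self.status != "active":
            return None  # already contained; nothing more to trigger
        if not violated:
            self.consecutive = 0  # assumption: a clean request resets the streak
            return None
        self.consecutive += 1
        if self.threshold is not None and self.consecutive >= self.threshold:
            self.status = "paused"
            return {
                "event": "agent.paused",
                "actor": "system",
                "previous_status": "active",
                "new_status": "paused",
            }
        return None
```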

RBAC requirements

Action	Required Role
Pause	`owner` or `admin`
Resume	`owner` or `admin`
Kill	`owner` only
Reactivate	`owner` only
View status	Any org role
Set auto-containment policy	`owner` or `admin`
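
The role table combines naturally with the containment states into one authorization check. In the sketch below, `REQUIRED_ROLES` follows the table above, while `TRANSITIONS` is partly an assumption: this page does not say whether a paused agent can be killed directly, or which state reactivation restores, so those entries are guesses labeled as such.

```python
REQUIRED_ROLES = {
    "pause": {"owner", "admin"},
    "resume": {"owner", "admin"},
    "kill": {"owner"},
    "reactivate": {"owner"},
}

# (current status, action) -> new status
TRANSITIONS = {
    ("active", "pause"): "paused",
    ("paused", "resume"): "active",
    ("active", "kill"): "killed",
    ("paused", "kill"): "killed",       # assumption: not stated on this page
    ("killed", "reactivate"): "active",  # assumption: reactivation restores active
}


def apply_containment_action(status: str, action: str, role: str) -> str:
    """Return the new containment status, or raise if the action is not allowed."""
    if action not in REQUIRED_ROLES:
        raise ValueError(f"unknown action: {action!r}")
    if role not in REQUIRED_ROLES[action]:
        raise PermissionError(f"role {role!r} may not {action} an agent")
    try:
        return TRANSITIONS[(status, action)]
    except KeyError:
        raise ValueError(f"cannot {action} an agent that is {status}") from None
```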

Webhook events

Three webhook events are emitted for containment actions:

agent.paused — Agent was paused (manually or automatically)
agent.resumed — Agent was resumed or reactivated
agent.killed — Agent was killed

Each event includes:

{
  "agent_id": "mnm-550e8400-e29b-41d4-a716-446655440000",
  "org_id": "org-xxx",
  "action": "pause",
  "actor": "user-xxx",
  "reason": "Investigating boundary violations",
  "previous_status": "active",
  "new_status": "paused"
}
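
A receiver might parse that payload into a typed record before acting on it. The sketch below uses only the fields shown above; `ContainmentEvent` and `parse_containment_event` are hypothetical names, and treating "reason" as optional is an assumption based on the resume and reactivate calls, which send no reason.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ContainmentEvent:
    agent_id: str
    org_id: str
    action: str
    actor: str
    previous_status: str
    new_status: str
    reason: Optional[str] = None

    @property
    def is_automatic(self) -> bool:
        # Auto-containment records "system" as the actor.
        return self.actor == "system"


def parse_containment_event(raw: str) -> ContainmentEvent:
    """Parse a webhook body; raises KeyError if a required field is missing."""
    data = json.loads(raw)
    return ContainmentEvent(
        agent_id=data["agent_id"],
        org_id=data["org_id"],
        action=data["action"],
        actor=data["actor"],
        previous_status=data["previous_status"],
        new_status=data["new_status"],
        reason=data.get("reason"),
    )
```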

Containment audit log

Every containment action is recorded in a tamper-evident audit log. Each entry includes:

The action taken (pause, resume, kill, reactivate, auto_pause)
Who triggered it (user ID or system for auto-containment)
The reason provided
Previous and new containment states
Timestamp
