The simulate endpoint runs the gateway and observer policy evaluators against an agent’s current composed spec, without crossing into the real gateway or observer runtime. You describe a hypothetical input or tool call; the platform tells you whether it would be allowed.
When to simulate
- Before a manifest change goes live — verify the new spec accepts the calls you want and rejects the ones you don’t.
- Before an agent prompt change — check whether a new tool the agent might call would clear policy.
- During post-incident review — replay a problematic tool call to understand why the gateway flagged it.
- In CI — wire simulate into your test suite to gate spec changes on hypothetical-call coverage.
The request shape
The request body takes two fields, either or both: candidate_input and candidate_tool_call. If candidate_tool_call is set without candidate_input, the platform synthesizes an assistant message carrying the toolUses block. If both are set, the explicit candidate_input.messages wins. If neither is set, simulate runs against the empty conversation (useful for checking “does this agent’s spec compose cleanly?”).
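The precedence rules for the two body fields can be mirrored client-side, for example to preview which conversation a given request will be evaluated against. This is a minimal sketch: only the names candidate_input, candidate_tool_call, messages, and toolUses come from this page; the `ToolCall` and `Message` shapes (name, input, role, content) are hypothetical, not the documented wire format.

```typescript
// Hypothetical shapes: only candidate_input, candidate_tool_call,
// messages and toolUses are names taken from the docs.
interface ToolCall {
  name: string;                   // hypothetical: tool identifier
  input: Record<string, unknown>; // hypothetical: tool arguments
}

interface Message {
  role: "user" | "assistant";     // hypothetical
  content?: string;
  toolUses?: ToolCall[];
}

interface SimulateRequest {
  candidate_input?: { messages: Message[] };
  candidate_tool_call?: ToolCall;
}

// Mirrors the documented precedence:
// 1. explicit candidate_input.messages wins when present;
// 2. otherwise a lone candidate_tool_call becomes one assistant message with toolUses;
// 3. with neither field set, the conversation is empty.
function effectiveMessages(req: SimulateRequest): Message[] {
  if (req.candidate_input) {
    return req.candidate_input.messages;
  }
  if (req.candidate_tool_call) {
    return [{ role: "assistant", toolUses: [req.candidate_tool_call] }];
  }
  return [];
}
```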
The response shape
The allowed verdict
allowed is one of three values:
| Value | Meaning |
|---|---|
| "true" | Both gateway and observer passed cleanly. The call would clear policy. |
| "false" | At least one evaluator returned verdict: "fail". The call would be rejected. |
| "conditional" | One or both evaluators returned verdict: "warn", OR a gateway_hook_missing_receipt violation surfaced. conditions[] enumerates what’s needed. |
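| Gateway | Observer | Missing receipts? | allowed |
|---|---|---|---|
| pass | pass | no | "true" |
| fail | * | * | "false" |
| * | fail | * | "false" |
| warn | * | * | "conditional" |
| * | warn | * | "conditional" |
| pass | pass | yes | "conditional" |

An asterisk (*) matches any value. A fail from either evaluator takes precedence over a warn, so a gateway warn paired with an observer fail yields "false".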
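The decision table can be read as a small pure function in which a fail on either side dominates, then a warn or a missing receipt makes the result conditional. This is an illustrative sketch of the documented mapping, not the platform’s code; `Verdict`, `Allowed`, and `computeAllowed` are hypothetical names.

```typescript
type Verdict = "pass" | "warn" | "fail";
type Allowed = "true" | "false" | "conditional";

// Folds the two evaluator verdicts, plus whether any
// gateway_hook_missing_receipt violation surfaced, into the allowed value.
function computeAllowed(
  gateway: Verdict,
  observer: Verdict,
  missingReceipts: boolean,
): Allowed {
  // Any fail rejects the call, regardless of the other evaluator.
  if (gateway === "fail" || observer === "fail") return "false";
  // A warn on either side, or a missing receipt, makes the call conditional.
  if (gateway === "warn" || observer === "warn" || missingReceipts) return "conditional";
  // Both passed cleanly and no receipts are missing.
  return "true";
}
```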
Common patterns
Test a manifest before committing
When iterating on a new manifest, PUT it to a sandbox agent, simulate the calls you expect the agent to make, then promote to production once they all pass.
Discover required receipts
The gateway hooks bind on catalog values like policy_attentiveness and deliberation_before_action. They require specific consultation receipts (typically think) before invoking certain tool classes. Simulate surfaces the requirement: the verdict comes back "conditional" and conditions[] names the missing receipt.
Add a think consultation to the agent’s prompt scaffolding, then re-simulate to confirm the receipt closes the requirement.
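To act on a conditional verdict programmatically, you can pull the receipt names out of the response. A minimal sketch, assuming a hypothetical shape for each conditions[] entry: the `kind`, `receipt`, and `hook` field names are guesses, and only allowed, conditions[], and the gateway_hook_missing_receipt label come from this page.

```typescript
// Assumed shape of one conditions[] entry; the field names are guesses.
interface Condition {
  kind: string;     // assumed: e.g. "gateway_hook_missing_receipt"
  receipt?: string; // assumed: e.g. "think"
  hook?: string;    // assumed: e.g. "deliberation_before_action"
}

interface SimulateResponse {
  allowed: "true" | "false" | "conditional";
  conditions?: Condition[];
}

// Returns the distinct receipts a "conditional" verdict is waiting on,
// in first-seen order; any other verdict yields an empty list.
function requiredReceipts(res: SimulateResponse): string[] {
  if (res.allowed !== "conditional") return [];
  const receipts: string[] = [];
  for (const c of res.conditions ?? []) {
    if (
      c.kind === "gateway_hook_missing_receipt" &&
      c.receipt !== undefined &&
      receipts.indexOf(c.receipt) === -1
    ) {
      receipts.push(c.receipt);
    }
  }
  return receipts;
}
```

After adding the reported consultations to the prompt scaffolding, a re-simulation should return an empty list here if the receipts close every requirement.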
Wire simulate into CI
The mnemom/cards-action (a Wave 5 deliverable) will drive simulate per PR: every spec change runs a configurable set of hypothetical calls and posts the verdicts as a PR comment. Until that ships, you can wire it up manually from your own CI job.
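A manual CI gate might combine the manifest-testing pattern above with a table of expected verdicts. This sketch assumes a hypothetical `SimulateClient` that you implement over your own HTTP calls (a PUT of the manifest to a sandbox agent, a POST to its simulate endpoint); `SimulateClient`, `SimCase`, and `runSimulateGate` are illustrative names, not a published SDK.

```typescript
type Allowed = "true" | "false" | "conditional";

// Hypothetical client: wrap your own HTTP calls. Method names are illustrative.
interface SimulateClient {
  putManifest(agentId: string, manifest: unknown): Promise<void>;
  simulate(agentId: string, body: unknown): Promise<{ allowed: Allowed }>;
}

// One hypothetical call plus the verdict the PR author expects for it.
interface SimCase {
  name: string;
  body: unknown; // simulate request body: candidate_input and/or candidate_tool_call
  expect: Allowed;
}

// Pushes the PR's manifest to a sandbox agent, simulates every case, and
// returns one line per mismatch. An empty array means the gate passes.
async function runSimulateGate(
  client: SimulateClient,
  sandboxAgentId: string,
  manifest: unknown,
  cases: SimCase[],
): Promise<string[]> {
  await client.putManifest(sandboxAgentId, manifest);
  const failures: string[] = [];
  for (const c of cases) {
    const { allowed } = await client.simulate(sandboxAgentId, c.body);
    if (allowed !== c.expect) {
      failures.push(`${c.name}: expected ${c.expect}, got ${allowed}`);
    }
  }
  return failures;
}
```

In a Node-based CI step you could print the returned lines (or post them as a PR comment) and set `process.exitCode = 1` when the array is non-empty. Listing expected "false" cases alongside "true" ones checks that the spec rejects the calls you don’t want, not only that it accepts the ones you do.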
Pure-sync only
Simulate calls @mnemom/policy-engine directly, twice (once with context: "gateway", once with context: "observer"), using dryRun: true. It never crosses worker boundaries: no real gateway state change, no observer ledger entry. The evaluation is deterministic for a given (spec, candidate) pair.
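To make the two-pass structure concrete, here is a sketch with the evaluator injected as a function. The `Evaluate` type, its argument object, and `simulateBoth` are hypothetical stand-ins, not the @mnemom/policy-engine API; only the context values, dryRun, and the gateway/observer split come from this page.

```typescript
type Verdict = "pass" | "warn" | "fail";
type EvalContext = "gateway" | "observer";

// Hypothetical evaluator signature; the real @mnemom/policy-engine exports
// and argument shapes may differ.
type Evaluate = (args: {
  spec: unknown;
  candidate: unknown;
  context: EvalContext;
  dryRun: true;
}) => { verdict: Verdict };

// Runs the same (spec, candidate) pair through both contexts with dryRun: true.
// No network or storage access happens here, so with a pure evaluator the
// result depends only on the inputs.
function simulateBoth(
  evaluate: Evaluate,
  spec: unknown,
  candidate: unknown,
): { gateway: Verdict; observer: Verdict } {
  return {
    gateway: evaluate({ spec, candidate, context: "gateway", dryRun: true }).verdict,
    observer: evaluate({ spec, candidate, context: "observer", dryRun: true }).verdict,
  };
}
```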
Protection scope
POST /v1/protection/agent/<id>/simulate returns a deferred (Phase 5) response shape, because the dedicated protection-policy evaluator hasn’t shipped yet. For V1, the endpoint exists so the URL surface is symmetric; the protection card itself still composes through the same scope cascade as alignment, and you can inspect it via GET /v1/protection/agent/<id>/effective.
Rate limits + cost
Simulate is pure-sync: no LLM call, no rate-limit budget consumed. You can simulate freely in CI loops, test suites, and pre-commit hooks.
Related reading
- AI helpers — overview of the four AI-forward verbs.
- Explain and remediate — for understanding violations on the current spec.
- Sub-resource verbs — for applying the fix once simulate identifies what’s needed.
- Policy engine — the evaluator simulate drives.