# Verifiable Verdicts
Verifiable Verdicts extend the AIP attestation layer with zero-knowledge proofs that cryptographically demonstrate a verdict was derived correctly from its inputs. While signatures prove who signed a verdict and hash chains prove when it was signed, ZK proofs prove how it was computed — closing the remaining trust gap.
## What Gets Proven
The ZK proof covers the verdict derivation rules: the deterministic mapping from analysis outputs to the final verdict. Specifically, the proof demonstrates:
- **Category-to-severity mapping**: Each concern category (e.g., `boundary_violation`, `prompt_injection`, `value_misalignment`) is mapped to the correct severity level (`low`, `medium`, `high`, `critical`) according to the protocol rules.
- **Severity-to-verdict escalation**: The most severe concern determines the checkpoint verdict:
  - No concerns, or all concerns at `low` severity, results in `clear`
  - `medium` severity results in `review_needed`
  - `high` or `critical` severity results in `boundary_violation`
- **Concerns hash integrity**: The hash of the concerns array in the proof journal matches the hash in the certificate, proving the verdict was derived from the exact concerns that were reported.
- **Card constraint evaluation**: The verdict accounts for the agent's declared autonomy envelope (bounded actions, forbidden actions, and escalation triggers) as specified in the Alignment Card.
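The escalation rule above is a pure function of concern severities. A minimal sketch, assuming hypothetical `Severity` and `derive_verdict` names (the actual guest program types may differ):

```rust
// Hypothetical sketch of the severity-to-verdict escalation rule; the type
// and function names are illustrative, not the actual guest program API.
#[derive(Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum Severity {
    Low,
    Medium,
    High,
    Critical,
}

// The most severe concern determines the checkpoint verdict.
fn derive_verdict(severities: &[Severity]) -> &'static str {
    match severities.iter().copied().max() {
        None | Some(Severity::Low) => "clear",
        Some(Severity::Medium) => "review_needed",
        Some(Severity::High) | Some(Severity::Critical) => "boundary_violation",
    }
}
```

Because the mapping is total and deterministic, the same severities always produce the same verdict, which is what makes the rule provable inside a zkVM.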
## What Does NOT Get Proven
The ZK proof intentionally excludes:
- **LLM inference**: The analysis LLM's reasoning process is not proven. LLM inference is non-deterministic and computationally infeasible to prove inside a zkVM. The proof covers only the deterministic post-processing of the LLM's structured output.
- **Thinking block content**: The raw thinking block is never included in the proof. Only its SHA-256 hash (`thinking_block_hash`) is referenced, preserving the agent's reasoning privacy.
- **Analysis model selection**: The choice of which LLM to use for analysis is an operational decision outside the proof scope.
The ZK proof guarantees that the verdict was correctly derived from the analysis output, not that the analysis itself was correct. The analysis quality remains bounded by the analysis LLM’s capability. See Limitations.
## Guest Program
The verdict derivation logic runs inside an SP1 zkVM guest program. The guest program is a deterministic Rust binary that takes structured inputs, applies the verdict rules, and produces public outputs.
### Inputs (Private)

These inputs are provided to the guest program but are not revealed in the proof:

| Input | Type | Description |
|---|---|---|
| `concerns` | `Vec<Concern>` | The array of concerns from the analysis LLM, each with category, severity, and description. |
| `card` | `CardSummary` | Compressed Alignment Card: values, bounded actions, forbidden actions, escalation triggers. |
| `model` | `String` | Analysis model identifier. |
### Computation

The guest program executes the following deterministic steps:

1. **Validate input structure**: Verify the concerns array and card summary are well-formed.
2. **Apply category mapping**: Map each concern's category to the protocol-defined severity rules.
3. **Compute verdict**: Determine the final verdict from the highest severity across all concerns.
4. **Compute concerns hash**: SHA-256 hash of the serialized concerns array (deterministic key ordering).
5. **Write journal**: Output the verdict, concerns hash, and card ID as public commitments.
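Step 4 hinges on deterministic serialization: the same concerns must always hash to the same value regardless of when or where they are hashed. The real guest program uses SHA-256 over a canonical serialization; the sketch below illustrates only the determinism property, substituting the standard library hasher for SHA-256, with each concern modeled as a `(category, severity, description)` tuple (both are assumptions for illustration):

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Deterministic digest over a concerns array. The actual guest program
// uses SHA-256 over canonical JSON; DefaultHasher stands in for it here.
fn concerns_digest(concerns: &[(String, String, String)]) -> u64 {
    let mut hasher = DefaultHasher::new();
    for (category, severity, description) in concerns {
        // Fixed field order yields a deterministic byte stream.
        category.hash(&mut hasher);
        severity.hash(&mut hasher);
        description.hash(&mut hasher);
    }
    hasher.finish()
}
```

The fixed field order plays the same role as the deterministic key ordering mentioned in step 4: it removes serialization ambiguity so verifiers can recompute the hash independently.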
### Outputs (Public Journal)

These are committed to the proof's public journal and are visible to verifiers:

| Output | Type | Description |
|---|---|---|
| `verdict` | `String` | The computed verdict (`clear`, `review_needed`, `boundary_violation`). |
| `concerns_hash` | `String` | SHA-256 of the serialized concerns array. Verifiers compare this to the certificate's concerns to confirm consistency. |
| `card_id` | `String` | The Alignment Card ID, proving the verdict was derived against a specific card. |
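The journal can be modeled as a plain struct whose fields mirror the table above; the struct name and the use of `String` for every field are assumptions for illustration:

```rust
// Illustrative shape of the public journal committed by the guest program.
// Field names follow the output table; the struct name is an assumption.
#[derive(Debug, Clone, PartialEq)]
struct VerdictJournal {
    verdict: String,       // "clear", "review_needed", or "boundary_violation"
    concerns_hash: String, // SHA-256 of the serialized concerns array
    card_id: String,       // the Alignment Card the verdict was derived against
}
```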
The proof is an SP1 STARK (Scalable Transparent ARgument of Knowledge), stored and transmitted in three components:
| Component | Description |
|---|---|
| `receipt` | The STARK proof receipt. Contains the cryptographic proof data. Base64-encoded. |
| `journal` | The public journal output. Contains the committed public values (`verdict`, `concerns_hash`, `card_id`). Base64-encoded. |
| `image_id` | The hash of the guest program binary. Uniquely identifies which program was executed. Used to verify the proof was generated by the correct verdict derivation program. |
The `image_id` is a content hash of the compiled guest program. It acts as a commitment to the exact computation that was proven. A verifier checks:

- The `image_id` matches the expected verdict derivation program (published in the Mnemom GitHub repository).
- The `receipt` is a valid STARK proof for the given `image_id` and `journal`.
- The `journal` outputs are consistent with the certificate's claims.
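The three checks combine into a single accept/reject predicate. A hedged sketch: the STARK validity check itself is delegated to the SP1 SDK and passed in here as a boolean, and all names are illustrative rather than the actual verifier API:

```rust
// Combines the three verifier checks described above. `stark_valid` stands in
// for the SP1 SDK verification result; the other arguments come from the
// certificate and the published guest program hash.
fn verifier_accepts(
    image_id: &str,
    expected_image_id: &str,
    stark_valid: bool,
    journal_verdict: &str,
    certificate_verdict: &str,
) -> bool {
    image_id == expected_image_id                 // correct guest program
        && stark_valid                            // receipt proves the journal
        && journal_verdict == certificate_verdict // journal matches the claims
}
```

All three conditions must hold; failing any one of them means the certificate's verdict cannot be trusted as proven.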
## Optimistic Proving Strategy
Not every checkpoint receives a ZK proof. Proving has non-trivial cost and latency, so AIP uses an optimistic proving strategy:
| Verdict | Proving Policy | Rationale |
|---|---|---|
| `boundary_violation` | Always proven | Highest-consequence verdicts must be independently verifiable. A false `boundary_violation` can block an agent; a missed one can allow harm. |
| `review_needed` | 10% stochastic sampling | Sampled proofs provide statistical confidence without proving every checkpoint. |
| `clear` | 10% stochastic sampling | Same stochastic rate. Sampling `clear` verdicts catches systematic errors in the derivation logic. |
The 10% sampling rate means that over a session of 100 checkpoints, approximately 10 will have ZK proofs. This provides strong probabilistic assurance that the derivation logic is functioning correctly, while keeping proving costs proportional to usage.
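The probabilistic assurance can be quantified: if the derivation logic were systematically wrong, the probability that none of `n` checkpoints draws a proof at a 10% sampling rate is 0.9^n. A quick check of that arithmetic:

```rust
// Probability that a systematic derivation error escapes every sampled proof
// over `n` checkpoints at a 10% sampling rate.
fn miss_probability(n: u32) -> f64 {
    0.9_f64.powi(n as i32)
}
```

Over a 100-checkpoint session this works out to roughly 2.7e-5, so a systematic derivation bug is overwhelmingly likely to be caught by at least one sampled proof.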
### On-Demand Proving

Any checkpoint can be proven on demand via `POST /v1/checkpoints/{checkpoint_id}/prove`. This is useful when:

- An operator wants to verify a specific `review_needed` verdict
- An auditor requires proof for a specific checkpoint
- The stochastic sample missed a checkpoint of interest
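A request might look like the following; the endpoint path is from the text, while the bearer-token authorization header is an assumption about the API's authentication scheme:

```shell
# Trigger on-demand proving for a specific checkpoint (substitute a real ID).
curl -X POST "https://api.mnemom.ai/v1/checkpoints/{checkpoint_id}/prove" \
  -H "Authorization: Bearer $MNEMOM_API_KEY"
```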
## Cost and Latency

| Metric | Value | Notes |
|---|---|---|
| Proving cost | ~$0.001/proof | Modal H100 GPU, per-second billing. Scales to zero when idle. |
| Proving latency | ~700 ms | Wall-clock time from proof request to receipt on H100 GPU. Previously 3-15 minutes on CPU. |
| Verification latency | <100 ms | STARK verification is fast. Local verification avoids network round-trips. |
| Proof size | ~2.7 MB | SP1 compressed STARK receipt. |
Proving is asynchronous. The checkpoint verdict is returned immediately; the proof is generated in the background (~700ms on GPU) and attached to the certificate when complete. With GPU acceleration, proof generation completes before the next integrity checkpoint in most conversations.
When the gateway creates a proof request, it stores the full proving inputs alongside the `verdict_proofs` row:

| Column | Description |
|---|---|
| `analysis_json` | The full analysis response text from the AIP observer |
| `thinking_hash` | SHA-256 hash of the thinking block |
| `card_hash` | SHA-256 hash of the Alignment Card at the time of analysis |
| `values_hash` | SHA-256 hash of the conscience values at the time of analysis |
| `model` | The analysis model identifier |
This ensures that proof inputs survive prover restarts, network failures, and machine stops. If the initial proving attempt fails, the retry loop can reconstruct the full proof request from the database without needing the original HTTP payload.
## Self-Healing Retry Loop

The prover service runs a background retry loop every 30 seconds that:

- Queries `get_pending_proofs()` for proofs with status `pending` or `failed` (up to 5 retries)
- Filters out proofs without stored input data (legacy orphans)
- Sets status to `proving` and spawns `prove_verdict_derivation()` with the stored inputs
- On success, calls `complete_proof()` to persist the receipt, journal, and verification result
- On failure, calls `fail_proof()` to increment the retry counter
This makes the proving pipeline self-healing: transient failures (prover unavailable, machine restarts, OOM) are automatically recovered without operator intervention.
If a proof fails to generate after 5 retries, it remains in `failed` status. The checkpoint is still valid — it just lacks the additional computational integrity guarantee. Proofs without stored inputs (created before input persistence was added) are skipped by the retry loop and marked as permanently failed.
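The lifecycle described above amounts to a small state machine. A sketch, assuming the retry cap of 5 from the text; the enum and function are illustrative, not the service's actual types:

```rust
// Proof status transitions driven by the retry loop. A failed attempt returns
// the proof to the retryable pool until MAX_RETRIES attempts are exhausted.
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq)]
enum ProofStatus {
    Pending,
    Proving,
    Completed,
    Failed,
}

const MAX_RETRIES: u32 = 5;

fn after_attempt(succeeded: bool, attempts_so_far: u32) -> ProofStatus {
    if succeeded {
        ProofStatus::Completed
    } else if attempts_so_far + 1 < MAX_RETRIES {
        ProofStatus::Pending // picked up again by the next retry pass
    } else {
        ProofStatus::Failed // permanent after the retry cap is reached
    }
}
```

Because failed attempts simply re-enter the pending pool, transient outages recover without any operator action, matching the self-healing behavior described above.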
## Verification

### Server-Side

Submit the full certificate to `POST /v1/verify`. The API delegates STARK verification to the prover service and returns a structured result including `checks.verdict_derivation.valid`.
### Local Verification

Use the SP1 verifier SDK to verify locally:

```rust
use sp1_sdk::{ProverClient, SP1ProofWithPublicValues};

// ELF is the compiled verdict derivation guest program binary.
let client = ProverClient::from_env();
let (_, vk) = client.setup(ELF);

// Deserialize the base64-decoded receipt from the certificate.
let mut proof: SP1ProofWithPublicValues = deserialize(&cert.proofs.verdict_derivation.receipt);

// Verify the STARK proof against the verifying key.
client.verify(&proof, &vk).expect("Proof verification failed");

// Read the committed public values and compare them to the certificate's claims.
let journal = proof.public_values.read::<VerdictJournal>();
assert_eq!(journal.verdict, cert.claims.verdict);
```
## Checking Proof Status

For checkpoints where proving is in progress, query the status:

```shell
curl https://api.mnemom.ai/v1/checkpoints/{checkpoint_id}/proof
```

The `status` field progresses through `pending` -> `proving` -> `completed` (or `failed`).
## Further Reading