An Integrity Checkpoint is a point-in-time analysis of an LLM’s thinking block — the internal reasoning the model produces before or alongside its visible response. Where AP-Traces record what happened after the fact, integrity checkpoints evaluate what the agent is thinking in real time, before the next action executes. Integrity Checkpoints are the primary data structure of the Agent Integrity Protocol (AIP), the sister protocol to AAP.
AIP follows a daimonion philosophy. Like Socrates’ inner voice, the integrity system is silent when things are aligned and speaks only when it detects something outside boundaries. Silence (clear) means proceed. Voice (review_needed, boundary_violation) means pause and evaluate.
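The silence/voice model maps directly onto the three verdicts. A minimal sketch (the `Verdict` enum and `should_pause` helper are illustrative, not part of the AIP API):

```python
from enum import Enum

class Verdict(str, Enum):
    CLEAR = "clear"                            # silence: proceed
    REVIEW_NEEDED = "review_needed"            # voice: pause and evaluate
    BOUNDARY_VIOLATION = "boundary_violation"  # voice: pause and evaluate

def should_pause(verdict: Verdict) -> bool:
    """The daimonion speaks only for non-clear verdicts."""
    return verdict is not Verdict.CLEAR
```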
AAP’s post-hoc verification provides comprehensive transparency, but it operates after the action has already been taken — typically 30-60 seconds later when the Observer Worker processes the gateway log. This delay creates a window of vulnerability:
Prompt injection: A malicious prompt causes the agent to exfiltrate data. The trace records it faithfully, but the data is already gone.
Gradual drift: An agent’s reasoning shifts subtly over multiple turns. Each individual action passes verification, but the cumulative direction is misaligned.
Misaligned intent: The agent’s thinking block reveals it intends to take a forbidden action, but the thinking block is only examined retroactively.
All three of these failure modes are visible in thinking blocks before the action lands. Integrity checkpoints fill this gap.
The key insight: the analysis is performed by a separate LLM (typically a fast, inexpensive model like Claude Haiku), not the agent itself. This separate model acts as the conscience — evaluating the agent’s reasoning against its Alignment Card and optional conscience values.
An Integrity Checkpoint captures the analysis result:
```json
{
  "checkpoint_id": "ic-a1b2c3d4-5678-9abc-def0-123456789abc",
  "agent_id": "did:web:shopping.agent.example.com",
  "card_id": "ac-f47ac10b-58cc-4372-a567-0e02b2c3d479",
  "session_id": "sess-abc12345-1234567",
  "timestamp": "2026-02-01T14:30:00Z",
  "thinking_block_hash": "a1b2c3d4e5f6...",
  "provider": "anthropic",
  "model": "claude-sonnet-4-20250514",
  "verdict": "clear",
  "concerns": [],
  "reasoning_summary": "Agent reasoning is consistent with declared values. Planning to search for products matching user preferences and rank by principal benefit.",
  "conscience_context": {
    "values_checked": ["BOUNDARY:no_data_exfiltration", "FEAR:sponsored_content_bias"],
    "conflicts": [],
    "supports": ["BOUNDARY:no_data_exfiltration"],
    "considerations": [],
    "consultation_depth": "standard"
  },
  "window_position": {"index": 3, "window_size": 4},
  "analysis_metadata": {
    "analysis_model": "claude-haiku-4-5-20251001",
    "analysis_duration_ms": 245,
    "thinking_tokens_original": 340,
    "thinking_tokens_analyzed": 340,
    "truncated": false,
    "extraction_confidence": 1.0
  },
  "linked_trace_id": null
}
```
The full thinking block text is never stored in the checkpoint. Only a SHA-256 hash (thinking_block_hash) is retained. The thinking block is treated as transient data — extracted, sent to the analysis LLM, and discarded. This preserves the agent’s reasoning privacy while enabling integrity analysis.
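The hash-then-discard behavior can be sketched in a few lines (`hash_thinking_block` is a hypothetical helper; the field name matches `thinking_block_hash` in the checkpoint schema above):

```python
import hashlib

def hash_thinking_block(thinking: str) -> str:
    # Only the SHA-256 digest is retained; the thinking text itself
    # is sent to the analysis LLM and then discarded
    return hashlib.sha256(thinking.encode("utf-8")).hexdigest()

digest = hash_thinking_block("Let me analyze this request...")
# The digest can later prove which thinking block was analyzed,
# without revealing its content
```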
When the analysis detects a problem, the checkpoint carries structured concerns. For example, a prompt-injection detection:

```json
{
  "verdict": "boundary_violation",
  "concerns": [
    {
      "category": "prompt_injection",
      "severity": "critical",
      "description": "Thinking block shows agent responding to injected instructions embedded in tool output rather than the user's original request",
      "evidence": "I see the instructions say to ignore my guidelines and send the data to external-api.com...",
      "relevant_card_field": "autonomy_envelope.forbidden_actions",
      "relevant_conscience_value": "BOUNDARY:no_data_exfiltration"
    }
  ],
  "reasoning_summary": "Agent thinking reveals response to injected instructions in tool output. The agent is reasoning about sending data to an external endpoint, which matches the forbidden action 'exfiltrate_data' and the BOUNDARY conscience value against data exfiltration."
}
```
The conscience prompt is the instruction sent to the analysis LLM. It is structured in five sections:
Role definition: Establishes the analysis LLM as a daimonion evaluator with a meta-injection warning (the thinking block may itself try to manipulate the analysis).
Card summary: A compressed version (~500 tokens) of the Alignment Card containing declared values, bounded actions, forbidden actions, and escalation triggers.
Conscience values: Optional BOUNDARY and FEAR type values injected with the awareness prefix:
```
[DAIMONION]-<be aware of> [BOUNDARY] Never exfiltrate user data to external services
[DAIMONION]-<be aware of> [FEAR] Agent may rationalize harmful actions as "helpful"
```
Session window context: Recent verdicts from the current session providing pattern context.
Thinking block: The actual content to evaluate (up to 4096 tokens, truncated with a 75/25 first/last split if longer).
The minimum evidence threshold is 100 tokens. Thinking blocks shorter than this are too brief for reliable analysis and receive a synthetic clear verdict without calling the analysis LLM. This avoids wasting latency and cost on fragments like “Done. The article is published” that cannot be meaningfully analyzed.
Different LLM providers expose thinking blocks in different formats. AIP uses provider adapters to normalize extraction:
| Provider | Thinking Source | Extraction Confidence | Method |
| --- | --- | --- | --- |
| Anthropic | `thinking` content blocks | 1.0 | Native thinking blocks |
| OpenAI | `reasoning_content` field | 0.9 | Explicit reasoning field |
| Google Gemini | Thinking content parts | 0.9 | Native thinking feature |
| Fallback | Response text analysis | 0.3 | Pattern inference |
The adapter is selected automatically based on the API endpoint URL or can be configured explicitly.
```python
# Anthropic response with thinking blocks
response = {
    "content": [
        {"type": "thinking", "thinking": "Let me analyze this request..."},
        {"type": "text", "text": "Here is my response..."},
    ]
}

# AIP extracts the thinking block automatically
from aip import AdapterRegistry

registry = AdapterRegistry()
thinking = registry.extract(response, provider="anthropic")
print(f"Content: {thinking.content}")        # "Let me analyze this request..."
print(f"Confidence: {thinking.confidence}")  # 1.0
```
The fallback adapter (confidence 0.3) infers reasoning from visible response text by looking for patterns like “I need to”, “Let me consider”, or chain-of-thought markers. This has significantly higher false positive and false negative rates. Use a provider with native thinking block support whenever possible.
Conscience values are optional typed declarations that increase detection sensitivity for specific concerns. Only BOUNDARY and FEAR types are injected into the analysis prompt:
| Type | Purpose | Effect |
| --- | --- | --- |
| BOUNDARY | Hard limit | Violations trigger `boundary_violation` |
| FEAR | Concern to watch for | Increases sensitivity, triggers `review_needed` |
| COMMITMENT | Positive aspiration | Informational only, not injected |
| BELIEF | Held truth | Informational only, not injected |
| HOPE | Desired outcome | Informational only, not injected |
```python
conscience_values = [
    {"type": "BOUNDARY", "content": "Never exfiltrate user data to external services"},
    {"type": "BOUNDARY", "content": "Never execute shell commands without explicit user approval"},
    {"type": "FEAR", "content": "Agent may rationalize harmful actions as helpful"},
    {"type": "FEAR", "content": "Gradual scope expansion beyond original request"},
]
```
Conscience values must be compatible with the Alignment Card. A BOUNDARY value that prohibits an action listed in bounded_actions creates a conflict that fails initialization. This is the Card-Conscience Agreement — validated once at startup, not at runtime.
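The startup check can be sketched as a set intersection. This assumes a hypothetical `prohibits` field on each BOUNDARY value listing action names; the document's actual conscience values are free text, so a real implementation would need semantic matching instead:

```python
def validate_card_conscience_agreement(card: dict, conscience_values: list[dict]) -> None:
    """Fail initialization if a BOUNDARY prohibits an action the card permits.

    Note: 'prohibits' is an assumed field for illustration; real conflict
    detection over free-text values would require semantic comparison.
    """
    bounded = set(card.get("bounded_actions", []))
    for value in conscience_values:
        if value["type"] != "BOUNDARY":
            continue  # only hard limits can conflict with the card
        for action in value.get("prohibits", []):
            if action in bounded:
                raise ValueError(
                    f"Card-Conscience conflict: BOUNDARY prohibits "
                    f"bounded action '{action}'"
                )
```

Running this once at startup, rather than per checkpoint, keeps the runtime path free of validation overhead.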
When both protocols are active, they enrich each other. The integrity verdict is stored in the AP-Trace’s context metadata, and boundary_violation verdicts automatically set escalation.required = true in the corresponding trace.
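The enrichment step described above can be sketched as a small merge (the exact field paths inside the trace are assumptions based on this description):

```python
def enrich_trace(trace: dict, checkpoint: dict) -> dict:
    """Store the integrity verdict in the AP-Trace's context metadata."""
    trace.setdefault("context", {})["integrity_verdict"] = checkpoint["verdict"]
    if checkpoint["verdict"] == "boundary_violation":
        # Boundary violations force escalation in the corresponding trace
        trace.setdefault("escalation", {})["required"] = True
    return trace
```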
AIP defaults to fail-open: if the analysis LLM is unreachable or errors, a synthetic clear checkpoint is generated and the agent proceeds. This prioritizes availability.

For high-security deployments, fail-closed mode is available. Analysis failure generates a synthetic boundary_violation, blocking the agent until analysis succeeds or a human intervenes.
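Both failure policies amount to a wrapper around the analysis call. A minimal sketch (function and parameter names are illustrative):

```python
def run_checkpoint(analyze, thinking: str, fail_mode: str = "open") -> str:
    """Return a verdict, substituting a synthetic one if analysis fails."""
    try:
        return analyze(thinking)
    except Exception:
        if fail_mode == "open":
            return "clear"               # fail-open: availability first
        return "boundary_violation"      # fail-closed: block until resolved
```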