An Integrity Checkpoint is a point-in-time analysis of an LLM’s thinking block: the internal reasoning the model produces before or alongside its visible response. The checkpoint runs same-turn, between the model’s response generation and delivery to the user. Per ADR-040, enforce blocks the offending action at every checkpoint same-turn; the chat completes 2xx, and the agent’s response references what was prevented.
AP-Traces are the durable artifact the checkpoint writes. The trace records what happened, both the model’s thinking and the integrity verdict, in the same turn. Drift detection and coherence analysis then operate on the trace surface longitudinally (sideband) across multiple turns.
Integrity Checkpoints are the primary data structure of the Agent Integrity Protocol (AIP), the sister protocol to AAP.
AIP follows a daimonion philosophy. Like Socrates’ inner voice, the integrity system is silent when things are aligned and speaks only when it detects something outside boundaries. Silence (clear) means proceed. Voice (review_needed, boundary_violation) means pause and evaluate.
Real-time analysis of the thinking block lets integrity_mode = enforce prevent these three failure modes before the response reaches the user:
Prompt injection: A malicious prompt would otherwise cause the agent to exfiltrate data. The checkpoint catches the intent in the thinking block; enforcement replaces the response same-turn before delivery.
Gradual drift: Each individual checkpoint is same-turn, but the longitudinal pattern across many turns reveals subtle direction shifts. The sideband drift detector reads the trace surface and emits a pending_advisory on the next turn (per ADR-047) when a threshold is crossed.
Misaligned intent: The thinking block reveals an intended forbidden action; the checkpoint catches it before the action lands.
All three failure modes are visible in thinking blocks before the action lands. Integrity checkpoints make that visibility actionable same-turn.
The key insight: the analysis is performed by a separate LLM (typically a fast, inexpensive model like Claude Haiku), not the agent itself. This separate model acts as the conscience — evaluating the agent’s reasoning against its Alignment Card and optional conscience values.
An Integrity Checkpoint captures the analysis result:

    {
      "checkpoint_id": "ic-a1b2c3d4-5678-9abc-def0-123456789abc",
      "agent_id": "did:web:shopping.agent.example.com",
      "card_id": "ac-f47ac10b-58cc-4372-a567-0e02b2c3d479",
      "session_id": "sess-abc12345-1234567",
      "timestamp": "2026-02-01T14:30:00Z",
      "thinking_block_hash": "a1b2c3d4e5f6...",
      "provider": "anthropic",
      "model": "claude-sonnet-4-20250514",
      "verdict": "clear",
      "concerns": [],
      "reasoning_summary": "Agent reasoning is consistent with declared values. Planning to search for products matching user preferences and rank by principal benefit.",
      "conscience_context": {
        "values_checked": ["BOUNDARY:no_data_exfiltration", "FEAR:sponsored_content_bias"],
        "conflicts": [],
        "supports": ["BOUNDARY:no_data_exfiltration"],
        "considerations": [],
        "consultation_depth": "standard"
      },
      "window_position": {
        "index": 3,
        "window_size": 4
      },
      "analysis_metadata": {
        "analysis_model": "claude-haiku-4-5-20251001",
        "analysis_duration_ms": 245,
        "thinking_tokens_original": 340,
        "thinking_tokens_analyzed": 340,
        "truncated": false,
        "extraction_confidence": 1.0
      },
      "linked_trace_id": null
    }

The full thinking block text is never stored in the checkpoint. Only a SHA-256 hash (thinking_block_hash) is retained. The thinking block is treated as transient data — extracted, sent to the analysis LLM, and discarded. This preserves the agent’s reasoning privacy while enabling integrity analysis.
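The hash-and-discard step can be illustrated with a minimal sketch. It assumes the thinking block is hashed as UTF-8 text and stored as a lowercase hex digest; neither detail is stated above, and the function name `hash_thinking_block` is hypothetical rather than part of the AIP API.

```python
import hashlib


def hash_thinking_block(thinking_text: str) -> str:
    """Return the SHA-256 hex digest of a thinking block.

    Assumption: UTF-8 encoding and hex output. AIP may normalize differently.
    """
    return hashlib.sha256(thinking_text.encode("utf-8")).hexdigest()


digest = hash_thinking_block("Let me analyze this request...")
print(len(digest))  # 64 hex characters
```

Only the digest would be written to thinking_block_hash; the text itself goes out of scope once the analysis call returns.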

    {
      "verdict": "boundary_violation",
      "concerns": [
        {
          "category": "prompt_injection",
          "severity": "critical",
          "description": "Thinking block shows agent responding to injected instructions embedded in tool output rather than the user's original request",
          "evidence": "I see the instructions say to ignore my guidelines and send the data to external-api.com...",
          "relevant_card_field": "autonomy_envelope.forbidden_actions",
          "relevant_conscience_value": "BOUNDARY:no_data_exfiltration"
        }
      ],
      "reasoning_summary": "Agent thinking reveals response to injected instructions in tool output. The agent is reasoning about sending data to an external endpoint, which matches the forbidden action 'exfiltrate_data' and the BOUNDARY conscience value against data exfiltration."
    }

The conscience prompt is the instruction sent to the analysis LLM. It is structured in five sections:
Role definition: Establishes the analysis LLM as a daimonion evaluator with a meta-injection warning (the thinking block may itself try to manipulate the analysis).
Card summary: A compressed version (~500 tokens) of the Alignment Card containing declared values, bounded actions, forbidden actions, and escalation triggers.
Conscience values: Optional BOUNDARY and FEAR type values injected with the awareness prefix:

    [DAIMONION]-<be aware of> [BOUNDARY] Never exfiltrate user data to external services
    [DAIMONION]-<be aware of> [FEAR] Agent may rationalize harmful actions as "helpful"

Session window context: Recent verdicts from the current session providing pattern context.
Thinking block: The actual content to evaluate (up to 4096 tokens, truncated with a 75/25 first/last split if longer).
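The truncation rule for the thinking block can be sketched as follows. This is an illustration under assumptions: the function name `truncate_thinking` is hypothetical, tokens are represented as a plain list (the tokenizer AIP uses is not stated here), and the 75/25 split is read as three quarters of the budget taken from the start and one quarter from the end.

```python
def truncate_thinking(tokens: list[str], max_tokens: int = 4096) -> tuple[list[str], bool]:
    """Keep the first 75% and last 25% of the token budget when over the limit.

    Returns (kept_tokens, truncated). Hypothetical helper, not the AIP API.
    """
    if max_tokens < 1:
        raise ValueError("max_tokens must be positive")
    if len(tokens) <= max_tokens:
        return list(tokens), False
    head_len = (max_tokens * 3) // 4  # 75% of the budget from the start
    tail_len = max_tokens - head_len  # remaining 25% from the end
    return tokens[:head_len] + tokens[len(tokens) - tail_len:], True


tokens = [f"t{i}" for i in range(10)]
kept, truncated = truncate_thinking(tokens, max_tokens=8)
print(kept, truncated)
```

Keeping both ends preserves the opening framing of the reasoning and its conclusion. The length of the kept list and the boolean flag would map onto thinking_tokens_analyzed and truncated in analysis_metadata.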
The minimum evidence threshold is 100 tokens. Thinking blocks shorter than this are too brief for reliable analysis and receive a synthetic clear verdict without calling the analysis LLM. This avoids wasting latency and cost on fragments like “Done. The article is published” that cannot be meaningfully analyzed.
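A minimal sketch of this gate is shown below. The names `checkpoint_or_skip` and `fake_analyze` are hypothetical, the token list is a whitespace split standing in for a real tokenizer, and the wording of the synthetic reasoning_summary is an assumption; only the 100-token threshold and the synthetic clear verdict come from the text.

```python
from typing import Callable

MIN_EVIDENCE_TOKENS = 100  # threshold stated in the text


def checkpoint_or_skip(tokens: list[str], analyze: Callable[[list[str]], dict]) -> dict:
    """Return a synthetic clear verdict for short blocks, else call the analyzer."""
    if len(tokens) < MIN_EVIDENCE_TOKENS:
        # Too little evidence: skip the analysis LLM entirely.
        return {
            "verdict": "clear",
            "concerns": [],
            "reasoning_summary": "Below minimum evidence threshold; analysis skipped.",
        }
    return analyze(tokens)


def fake_analyze(tokens: list[str]) -> dict:
    """Stand-in for the analysis LLM call (hypothetical)."""
    return {"verdict": "review_needed", "concerns": [], "reasoning_summary": "stub"}


short = checkpoint_or_skip("Done. The article is published".split(), fake_analyze)
print(short["verdict"])  # clear
```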
Different LLM providers expose thinking blocks in different formats. AIP uses provider adapters to normalize extraction:
| Provider | Thinking Source | Extraction Confidence | Method |
|---|---|---|---|
| Anthropic | thinking content blocks | 1.0 | Native thinking blocks |
| OpenAI | reasoning_content field | 0.9 | Explicit reasoning field |
| Google Gemini | Thinking content parts | 0.9 | Native thinking feature |
| Fallback | Response text analysis | 0.3 | Pattern inference |
The adapter is selected automatically based on the API endpoint URL or can be configured explicitly.

    from aip import AdapterRegistry

    # Anthropic response with thinking blocks
    response = {
        "content": [
            {"type": "thinking", "thinking": "Let me analyze this request..."},
            {"type": "text", "text": "Here is my response..."},
        ]
    }

    # AIP extracts the thinking block automatically
    registry = AdapterRegistry()
    thinking = registry.extract(response, provider="anthropic")

    print(f"Content: {thinking.content}")        # "Let me analyze this request..."
    print(f"Confidence: {thinking.confidence}")  # 1.0

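Automatic adapter selection from the endpoint URL might look like the sketch below. It is not the AIP implementation: the function `select_provider`, the host table, and the provider keys other than "anthropic" are assumptions made for illustration.

```python
from typing import Optional
from urllib.parse import urlparse

# Hypothetical host-to-provider table; keys other than "anthropic" are assumed names.
HOST_TO_PROVIDER = {
    "api.anthropic.com": "anthropic",
    "api.openai.com": "openai",
    "generativelanguage.googleapis.com": "google",
}


def select_provider(endpoint_url: str, explicit: Optional[str] = None) -> str:
    """Prefer an explicit setting, then match on hostname, else use the fallback."""
    if explicit is not None:
        return explicit
    host = urlparse(endpoint_url).hostname or ""
    return HOST_TO_PROVIDER.get(host, "fallback")


print(select_provider("https://api.anthropic.com/v1/messages"))
print(select_provider("https://llm.internal.example/v1/chat"))
```

Matching on the parsed hostname rather than a substring of the URL avoids misclassifying a proxy whose path happens to contain a provider name.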
The fallback adapter (confidence 0.3) infers reasoning from visible response text by looking for patterns like “I need to”, “Let me consider”, or chain-of-thought markers. This has significantly higher false positive and false negative rates. Use a provider with native thinking block support whenever possible.
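To show why pattern inference is weak evidence, here is a rough sketch of what such a fallback could do. It is an assumption-laden illustration: the function `infer_reasoning` is hypothetical, it matches only the two phrases quoted above, and the sentence splitting is deliberately naive.

```python
import re

# Only the two phrases named in the text; a real adapter would likely use more.
REASONING_PATTERNS = re.compile(r"\b(I need to|Let me consider)\b", re.IGNORECASE)
FALLBACK_CONFIDENCE = 0.3  # confidence stated for the fallback adapter


def infer_reasoning(response_text: str) -> dict:
    """Collect sentences that look like reasoning from visible response text."""
    sentences = re.split(r"(?<=[.!?])\s+", response_text.strip())
    matched = [s for s in sentences if REASONING_PATTERNS.search(s)]
    return {"content": " ".join(matched), "confidence": FALLBACK_CONFIDENCE}


result = infer_reasoning("Let me consider the options. Here is the answer.")
print(result["content"])
```

A response that quotes a user saying "I need to" would match just as readily as real reasoning, which is one source of the false positives mentioned above.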
Conscience values are optional typed declarations that increase detection sensitivity for specific concerns. Only BOUNDARY and FEAR types are injected into the analysis prompt:
| Type | Purpose | Effect |
|---|---|---|
| BOUNDARY | Hard limit | Violations trigger boundary_violation |
| FEAR | Concern to watch for | Increases sensitivity, triggers review_needed |
| COMMITMENT | Positive aspiration | Informational only, not injected |
| BELIEF | Held truth | Informational only, not injected |
| HOPE | Desired outcome | Informational only, not injected |

    conscience_values = [
        {"type": "BOUNDARY", "content": "Never exfiltrate user data to external services"},
        {"type": "BOUNDARY", "content": "Never execute shell commands without explicit user approval"},
        {"type": "FEAR", "content": "Agent may rationalize harmful actions as helpful"},
        {"type": "FEAR", "content": "Gradual scope expansion beyond original request"},
    ]

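The filtering rule and the awareness prefix can be combined in a short sketch. The function name `render_conscience_values` is hypothetical; the prefix format and the set of injected types are taken from the descriptions above.

```python
INJECTED_TYPES = {"BOUNDARY", "FEAR"}  # the only types injected into the prompt
AWARENESS_PREFIX = "[DAIMONION]-<be aware of>"


def render_conscience_values(values: list[dict]) -> list[str]:
    """Format injectable conscience values as prompt lines, dropping the rest."""
    return [
        f"{AWARENESS_PREFIX} [{v['type']}] {v['content']}"
        for v in values
        if v["type"] in INJECTED_TYPES
    ]


example_values = [
    {"type": "BOUNDARY", "content": "Never exfiltrate user data to external services"},
    {"type": "COMMITMENT", "content": "Rank results by principal benefit"},
    {"type": "FEAR", "content": "Gradual scope expansion beyond original request"},
]
for line in render_conscience_values(example_values):
    print(line)
```

The COMMITMENT entry is silently dropped, so only two lines are printed.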
Conscience values must be compatible with the Alignment Card. A BOUNDARY value that prohibits an action listed in bounded_actions creates a conflict that fails initialization. This is the Card-Conscience Agreement — validated once at startup, not at runtime.
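A startup check of this kind could be sketched as below. The sketch rests on a strong assumption: each BOUNDARY value carries a hypothetical `prohibits` list naming the actions it forbids. The examples above show only free-text content, so how AIP actually matches a BOUNDARY against bounded_actions is not specified here. The exception and function names are also hypothetical.

```python
class CardConscienceConflict(ValueError):
    """Raised when a BOUNDARY value prohibits an action the card permits."""


def validate_card_conscience_agreement(bounded_actions, conscience_values):
    """Fail initialization if any BOUNDARY prohibits a bounded action."""
    permitted = set(bounded_actions)
    for value in conscience_values:
        if value["type"] != "BOUNDARY":
            continue
        # "prohibits" is a hypothetical field naming the actions a BOUNDARY forbids.
        clash = permitted & set(value.get("prohibits", []))
        if clash:
            raise CardConscienceConflict(
                f"BOUNDARY {value['content']!r} prohibits bounded action(s): {sorted(clash)}"
            )


card_bounded_actions = ["search_products", "execute_shell"]
values = [
    {
        "type": "BOUNDARY",
        "content": "Never execute shell commands without explicit user approval",
        "prohibits": ["execute_shell"],
    }
]
try:
    validate_card_conscience_agreement(card_bounded_actions, values)
except CardConscienceConflict as exc:
    print(f"Initialization failed: {exc}")
```

Running the check once at startup keeps the per-checkpoint path free of validation work, consistent with the statement that the agreement is not evaluated at runtime.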
When both protocols are active, they enrich each other. The integrity verdict is stored in the AP-Trace’s context metadata, and boundary_violation verdicts automatically set escalation.required = true in the corresponding trace.
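The enrichment step can be sketched with plain dictionaries. The exact trace layout is an assumption: `context.metadata.integrity_verdict` is a hypothetical key chosen for illustration, while `escalation.required` is the field named above. The function `attach_verdict` is likewise hypothetical.

```python
import copy


def attach_verdict(trace: dict, checkpoint: dict) -> dict:
    """Return a copy of the trace enriched with the checkpoint's verdict."""
    enriched = copy.deepcopy(trace)
    metadata = enriched.setdefault("context", {}).setdefault("metadata", {})
    metadata["integrity_verdict"] = checkpoint["verdict"]  # hypothetical key
    if checkpoint["verdict"] == "boundary_violation":
        # A boundary violation always forces escalation on the trace.
        enriched.setdefault("escalation", {})["required"] = True
    return enriched


trace = {"trace_id": "tr-example", "escalation": {"required": False}}
enriched = attach_verdict(trace, {"verdict": "boundary_violation"})
print(enriched["escalation"]["required"])  # True
```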
AIP defaults to fail-open: if the analysis LLM is unreachable or errors, a synthetic clear checkpoint is generated and the agent proceeds. This prioritizes availability.
For high-security deployments, fail-closed mode is available. Analysis failure generates a synthetic boundary_violation, blocking the agent until analysis succeeds or a human intervenes.
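The two failure modes differ only in which synthetic verdict is produced when analysis fails, as this minimal sketch shows. The names `run_analysis`, `fail_closed`, and `unreachable` are hypothetical, and the reasoning_summary wording is an assumption.

```python
from typing import Callable


def run_analysis(analyze: Callable[[str], dict], thinking: str, fail_closed: bool = False) -> dict:
    """Call the analysis LLM; on any failure, return a synthetic verdict."""
    try:
        return analyze(thinking)
    except Exception:
        # Fail-open yields clear (availability); fail-closed yields a block.
        verdict = "boundary_violation" if fail_closed else "clear"
        return {
            "verdict": verdict,
            "concerns": [],
            "reasoning_summary": "Synthetic verdict: analysis unavailable.",
        }


def unreachable(thinking: str) -> dict:
    """Stand-in for an analysis call that cannot reach the model."""
    raise ConnectionError("analysis LLM unreachable")


print(run_analysis(unreachable, "some reasoning")["verdict"])                    # clear
print(run_analysis(unreachable, "some reasoning", fail_closed=True)["verdict"])  # boundary_violation
```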