Alignment Cards
An Alignment Card is a structured, machine-readable document that declares an AI agent’s alignment posture: its values, the boundaries of its autonomous behavior, and its commitments around audit and transparency. Think of it as a passport for agent intent: it states who the agent serves, what it believes, what it will and will not do, and how it logs its decisions. Alignment Cards are the foundational data structure of the Agent Alignment Protocol (AAP). Every other AAP operation (AP-Traces, verification, value coherence, and drift detection) references an Alignment Card as its source of truth.
Alignment Cards declare intent; they do not guarantee behavior. An agent can publish a card claiming any set of values. The card becomes meaningful only when paired with AP-Traces that can be verified against it and integrity checkpoints that analyze the agent’s reasoning in real time.
Why Alignment Cards Exist
Current agent protocols solve capability discovery (A2A Agent Cards), tool integration (MCP), and payment authorization. None of them address a fundamental question: is this agent serving its principal’s interests? Alignment Cards fill this gap by making the answer to that question observable. They give principals, auditors, and other agents a structured declaration against which to verify behavior.
Structure
An Alignment Card contains five required blocks and one optional block:
| Block | Purpose | Required |
|---|---|---|
| Identity | Agent ID, card ID, version, timestamps | Yes |
| Principal | Who the agent serves and how | Yes |
| Values | What the agent prioritizes | Yes |
| Autonomy Envelope | What the agent can do independently | Yes |
| Audit Commitment | How the agent logs decisions | Yes |
| Extensions | Protocol-specific additions (A2A, MCP) | No |
Identity Fields
Every card begins with identity and versioning metadata:
- `card_id`: a unique identifier (UUID or URI) for this specific version of the card.
- `agent_id`: identifies the agent itself, using a DID, URL, or UUID.
- `issued_at` and `expires_at`: establish the card’s validity window.
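A verifier typically needs to know whether a card is currently in force. The sketch below parses the identity fields into a small typed structure and checks the validity window. It assumes the four fields sit at the top level of the card's JSON, that timestamps are ISO 8601, and that the window is half-open (valid from `issued_at` up to, but not including, `expires_at`); none of these details is fixed by this page.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


def parse_timestamp(ts: str) -> datetime:
    """Parse an ISO 8601 timestamp; a missing UTC offset is assumed to mean UTC."""
    dt = datetime.fromisoformat(ts.replace("Z", "+00:00"))
    return dt if dt.tzinfo is not None else dt.replace(tzinfo=timezone.utc)


@dataclass(frozen=True)
class CardIdentity:
    card_id: str          # UUID or URI for this specific version of the card
    agent_id: str         # DID, URL, or UUID identifying the agent
    issued_at: datetime   # start of the validity window
    expires_at: datetime  # end of the validity window

    @classmethod
    def from_json(cls, data: dict) -> "CardIdentity":
        # Assumption: identity fields are top-level keys of the card document.
        return cls(
            card_id=data["card_id"],
            agent_id=data["agent_id"],
            issued_at=parse_timestamp(data["issued_at"]),
            expires_at=parse_timestamp(data["expires_at"]),
        )

    def is_valid_at(self, moment: datetime) -> bool:
        # Treated here as a half-open window: [issued_at, expires_at).
        return self.issued_at <= moment < self.expires_at
```

For example, `CardIdentity.from_json(card).is_valid_at(datetime.now(timezone.utc))` tells a consumer whether the card it fetched is inside its declared window.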
Principal Block
The principal block declares who the agent serves and the nature of that relationship:
| Relationship | Meaning |
|---|---|
| `delegated_authority` | Agent acts within bounds set by the principal |
| `advisory` | Agent recommends; principal decides |
| `autonomous` | Agent operates independently within declared values |
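Because the relationship is one of a closed set of strings, a consumer can reject unknown values early rather than guess at their meaning. A minimal sketch, assuming the relationship arrives as one of the three identifiers in the table (the JSON key that carries it is not given on this page, so the parser works on the bare string):

```python
from enum import Enum


class Relationship(str, Enum):
    """The three principal relationships listed in the table above."""
    DELEGATED_AUTHORITY = "delegated_authority"  # acts within bounds set by the principal
    ADVISORY = "advisory"                        # recommends; principal decides
    AUTONOMOUS = "autonomous"                    # independent within declared values


def parse_relationship(raw: str) -> Relationship:
    """Reject unknown relationship strings instead of silently accepting them."""
    try:
        return Relationship(raw)
    except ValueError:
        allowed = ", ".join(r.value for r in Relationship)
        raise ValueError(f"unknown relationship {raw!r}; expected one of: {allowed}") from None


def principal_decides(rel: Relationship) -> bool:
    # Per the table, only the advisory relationship leaves the final decision to the principal.
    return rel is Relationship.ADVISORY
```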
Values Block
The values block declares the agent’s operational priorities:
| Identifier | Description |
|---|---|
| `principal_benefit` | Prioritize the principal’s interests |
| `transparency` | Disclose reasoning and limitations |
| `minimal_data` | Collect only necessary information |
| `harm_prevention` | Avoid actions causing harm |
| `honesty` | Do not deceive or mislead |
| `user_control` | Respect user autonomy and consent |
| `privacy` | Protect personal information |
| `fairness` | Avoid discriminatory outcomes |
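Since the standard identifiers form a fixed set, a card consumer can separate the values it can interpret directly from custom ones. A minimal sketch, assuming the declared values arrive as a plain list of strings:

```python
# The standard value identifiers from the table above.
STANDARD_VALUES = frozenset({
    "principal_benefit", "transparency", "minimal_data", "harm_prevention",
    "honesty", "user_control", "privacy", "fairness",
})


def split_values(declared: list[str]) -> tuple[list[str], list[str]]:
    """Partition declared values into standard identifiers and custom ones, preserving order."""
    standard = [v for v in declared if v in STANDARD_VALUES]
    custom = [v for v in declared if v not in STANDARD_VALUES]
    return standard, custom
```

Custom values are the ones another agent cannot understand from the standard table alone, which is why the standard identifiers are preferred for interoperability.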
Values outside the standard set can be described in a `definitions` block.
The `conflicts_with` array lists values the agent refuses to coordinate with during value coherence checks.
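One way to apply `conflicts_with` in a coherence check is to flag any value that one card declares and the other card lists as a conflict. This is a simplified sketch of that single rule, not the full coherence algorithm in the AAP Specification; it assumes both arrays live in a `values` object, and the `declared` key name is hypothetical.

```python
def _values_block(card: dict) -> dict:
    # Hypothetical layout: {"values": {"declared": [...], "conflicts_with": [...]}}
    return card.get("values") or {}


def coherence_conflicts(card_a: dict, card_b: dict) -> set[str]:
    """Values declared by one card that the other card lists in conflicts_with."""
    declared_a = set(_values_block(card_a).get("declared", []))
    declared_b = set(_values_block(card_b).get("declared", []))
    conflicts_a = set(_values_block(card_a).get("conflicts_with", []))
    conflicts_b = set(_values_block(card_b).get("conflicts_with", []))
    # The check is symmetric: either side can object to the other's values.
    return (conflicts_a & declared_b) | (conflicts_b & declared_a)


def is_coherent(card_a: dict, card_b: dict) -> bool:
    return not coherence_conflicts(card_a, card_b)
```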
Autonomy Envelope Block
The autonomy envelope defines the boundaries of independent action:
- `bounded_actions`: Actions the agent may take without escalation.
- `escalation_triggers`: Conditions that require the agent to pause and seek approval. Each trigger specifies a condition, an action (`escalate`, `deny`, or `log`), and a human-readable reason.
- `max_autonomous_value`: Optional financial ceiling for independent decisions.
- `forbidden_actions`: Actions the agent must never take, regardless of context.
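The envelope fields combine into a simple gate for a proposed action. The sketch below leaves escalation triggers aside and assumes a precedence order of forbidden first, then bounded actions, then the financial ceiling; it also assumes that an action outside `bounded_actions` escalates rather than being denied outright. Both choices are illustrative, not normative.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AutonomyEnvelope:
    bounded_actions: set[str]
    forbidden_actions: set[str]
    max_autonomous_value: Optional[float] = None  # optional financial ceiling


def check_envelope(envelope: AutonomyEnvelope, action: str, value: float = 0.0) -> str:
    """Return "deny", "escalate", or "allow" for a proposed action."""
    # Forbidden actions are checked first: no context makes them acceptable.
    if action in envelope.forbidden_actions:
        return "deny"
    # Anything outside the bounded set needs the principal's approval (assumed policy).
    if action not in envelope.bounded_actions:
        return "escalate"
    # A bounded action still escalates if it exceeds the financial ceiling.
    if envelope.max_autonomous_value is not None and value > envelope.max_autonomous_value:
        return "escalate"
    return "allow"
```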
Audit Commitment Block
The audit commitment declares how the agent logs and exposes its decisions:
- `queryable`: Whether external parties can query the agent’s traces.
- `tamper_evidence`: Mechanism for ensuring trace integrity (`append_only`, `signed`, or `merkle`).
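To illustrate what tamper evidence buys an auditor, the sketch below uses a hash chain: each log record stores the hash of the previous record, so editing any earlier entry breaks every later link. This is a generic technique shown for intuition only; the exact formats behind `append_only`, `signed`, and `merkle` are defined by the AAP Specification and may differ (a `signed` log would add signatures, a `merkle` log a Merkle tree).

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first record


def entry_hash(prev_hash: str, entry: dict) -> str:
    # Canonical JSON (sorted keys, compact separators) keeps the hash reproducible.
    payload = json.dumps(entry, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()


def append(log: list[dict], entry: dict) -> None:
    """Append an entry, linking it to the hash of the previous record."""
    prev = log[-1]["hash"] if log else GENESIS
    log.append({"entry": entry, "prev": prev, "hash": entry_hash(prev, entry)})


def verify_chain(log: list[dict]) -> bool:
    """Recompute every link; any edited or reordered record fails the check."""
    prev = GENESIS
    for record in log:
        if record["prev"] != prev or record["hash"] != entry_hash(prev, record["entry"]):
            return False
        prev = record["hash"]
    return True
```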
Extensions Block
Extensions allow protocol-specific metadata (for example, A2A or MCP) without modifying the core schema.
Complete Example
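The sketch below assembles a complete card for a shopping assistant as a Python dictionary and serializes it to JSON, with a helper that reports missing required parts. It is illustrative only: the identity field names and the block contents follow this page, but the top-level block keys (`principal`, `values`, `autonomy_envelope`, `audit_commitment`, `extensions`), the `relationship` and `declared` keys, the trigger condition syntax, and every identifier and URL are assumptions rather than the normative AAP schema.

```python
import json

IDENTITY_FIELDS = ("card_id", "agent_id", "issued_at", "expires_at")
# Hypothetical key names for the required blocks; extensions stay optional.
REQUIRED_BLOCKS = ("principal", "values", "autonomy_envelope", "audit_commitment")

shopping_card = {
    "card_id": "ac-shopping-2025-01",
    "agent_id": "did:web:shop-assistant.example.com",
    "issued_at": "2025-01-01T00:00:00Z",
    "expires_at": "2026-01-01T00:00:00Z",
    "principal": {"relationship": "delegated_authority"},
    "values": {
        "declared": ["principal_benefit", "transparency", "minimal_data", "honesty"],
        "conflicts_with": ["deceptive_marketing"],
    },
    "autonomy_envelope": {
        "bounded_actions": ["search", "compare", "add_to_cart"],
        "escalation_triggers": [
            # Hypothetical condition syntax; the real encoding is defined by the spec.
            {"condition": "purchase_value > 100", "action": "escalate",
             "reason": "Purchases above the autonomous ceiling need principal approval"},
        ],
        "max_autonomous_value": 100,
        "forbidden_actions": ["share_credentials", "exfiltrate_data"],
    },
    "audit_commitment": {"queryable": True, "tamper_evidence": "append_only"},
    "extensions": {
        "a2a": {"agent_card_url": "https://shop-assistant.example.com/a2a-agent-card.json"},
    },
}


def missing_parts(card: dict) -> list[str]:
    """List required identity fields and blocks that the card lacks."""
    return [key for key in IDENTITY_FIELDS + REQUIRED_BLOCKS if key not in card]


card_json = json.dumps(shopping_card, indent=2)
print(card_json)
```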
A full Alignment Card for a shopping assistant agent combines the identity fields and every block described above in a single document.
How Cards Are Used
Alignment Cards serve as the reference point across the entire Mnemom trust infrastructure:
- Publication: Agents publish their card at `/.well-known/alignment-card.json`, where other agents and auditors can fetch it.
- Trace Verification: Every AP-Trace references a `card_id`. Verification checks that the trace’s actions, values, and escalation behavior are consistent with the card.
- Integrity Analysis: The Agent Integrity Protocol (AIP) compresses the card into a ~500-token summary and includes it in the conscience prompt that evaluates the agent’s thinking blocks.
- Value Coherence: Before two agents collaborate, they exchange cards and run a coherence check to confirm their values are compatible.
- Drift Detection: Traces are compared against the card over time to identify behavioral divergence.
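The publication and trace-verification steps can be sketched as two small functions: one computes where a card lives, the other checks one trace against a card. Fetching is left out to keep the sketch offline (any HTTP client can retrieve the URL). The trace fields `card_id`, `action`, `escalated`, and `values_applied`, and the card keys `autonomy_envelope` and `values.declared`, are hypothetical, and the checks are a simplified subset of what verification covers.

```python
from urllib.parse import urljoin

WELL_KNOWN_PATH = "/.well-known/alignment-card.json"


def card_url(agent_origin: str) -> str:
    """Location of an agent's published card; an absolute path resets to the origin root."""
    return urljoin(agent_origin, WELL_KNOWN_PATH)


def verify_trace(card: dict, trace: dict) -> list[str]:
    """Return consistency problems between one AP-Trace and a card (empty list = consistent)."""
    problems = []
    if trace.get("card_id") != card["card_id"]:
        problems.append("trace references a different card_id")
    envelope = card["autonomy_envelope"]
    action = trace.get("action")
    if action in envelope.get("forbidden_actions", []):
        problems.append(f"forbidden action taken: {action}")
    elif action not in envelope.get("bounded_actions", []) and not trace.get("escalated", False):
        problems.append(f"unbounded action {action} taken without escalation")
    undeclared = set(trace.get("values_applied", [])) - set(card["values"].get("declared", []))
    if undeclared:
        problems.append(f"values applied but not declared: {sorted(undeclared)}")
    return problems
```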
Card Versioning and Updates
Cards are versioned through their `card_id` and `issued_at`/`expires_at` timestamps. When an agent’s alignment posture changes:
- Issue a new card with a new `card_id` and an updated `issued_at`.
- The old card remains valid until its `expires_at` or until it is explicitly revoked via `/.well-known/alignment-card-revocations.json`.
- AP-Traces generated during the old card’s validity period reference the old `card_id`. Traces generated after the update reference the new one.
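These rules can be sketched as a validity check that honors both expiry and revocation, plus a lookup from a trace to the card it references. The layout of `alignment-card-revocations.json` is not described on this page, so the sketch assumes it has already been parsed into a hypothetical mapping from revoked `card_id` to revocation time; a card then stays in force until whichever comes first.

```python
from datetime import datetime, timezone
from typing import Optional


def parse_ts(ts: str) -> datetime:
    # ISO 8601; a missing offset is treated as UTC.
    dt = datetime.fromisoformat(ts.replace("Z", "+00:00"))
    return dt if dt.tzinfo is not None else dt.replace(tzinfo=timezone.utc)


def valid_until(card: dict, revocations: dict[str, datetime]) -> datetime:
    """End of a card's validity: expires_at, or the revocation time if that is earlier."""
    end = parse_ts(card["expires_at"])
    revoked_at = revocations.get(card["card_id"])
    return min(end, revoked_at) if revoked_at is not None else end


def card_in_force(card: dict, revocations: dict[str, datetime], at: datetime) -> bool:
    return parse_ts(card["issued_at"]) <= at < valid_until(card, revocations)


def card_for_trace(cards: list[dict], trace: dict) -> Optional[dict]:
    """Traces keep the card_id that was current when they were generated."""
    return next((c for c in cards if c["card_id"] == trace["card_id"]), None)
```

Old and new cards can overlap in time, which is why the lookup goes through the trace's `card_id` rather than the timestamp alone.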
Relationship to A2A Agent Cards
If you use Google’s A2A protocol for agent discovery, the Alignment Card extends the A2A Agent Card rather than replacing it. The A2A Agent Card describes capabilities (what the agent can do). The Alignment Card describes alignment (what the agent will and will not do, and why). The `extensions.a2a.agent_card_url` field links the two.
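A consumer that starts from an Alignment Card can follow this link to the capability description. A minimal sketch, assuming the card is parsed JSON and tolerating cards that have no extensions at all:

```python
from typing import Optional


def a2a_agent_card_url(alignment_card: dict) -> Optional[str]:
    """Follow extensions.a2a.agent_card_url; return None if the card has no A2A link."""
    # "or {}" also covers keys that are present but null in the JSON.
    extensions = alignment_card.get("extensions") or {}
    a2a = extensions.get("a2a") or {}
    url = a2a.get("agent_card_url")
    return url if isinstance(url, str) and url else None
```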
Best Practices
Be specific about boundaries
Vague forbidden actions like “harmful behavior” are unverifiable. Use concrete actions: `delete_without_confirmation`, `share_credentials`, `exfiltrate_data`.
Declare values you actually apply
Only list values that appear in your AP-Traces. Declaring `fairness` but never applying it in decisions produces verification warnings.
Use standard identifiers
Prefer the standard value identifiers (`principal_benefit`, `transparency`, etc.) for interoperability. Use custom values only when the standard set does not cover your needs.
Set meaningful escalation triggers
Escalation triggers are the card’s most actionable component. Define clear conditions, not aspirational ones.
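The difference between a clear and an aspirational condition is whether a machine can evaluate it. The sketch below models each trigger as a condition, an action (`escalate`, `deny`, or `log`), and a reason, with conditions written as Python predicates over a proposed action. The predicate representation, the `amount` and `new_merchant` fields, and first-match ordering are all assumptions for illustration; a condition like “when something seems risky” could not be written this way at all.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class EscalationTrigger:
    condition: Callable[[dict], bool]  # hypothetical: predicate over the proposed action
    action: str                        # "escalate", "deny", or "log"
    reason: str                        # human-readable explanation


def first_trigger(triggers: list[EscalationTrigger], proposal: dict) -> Optional[EscalationTrigger]:
    """Return the first trigger whose condition matches the proposal, if any."""
    for trigger in triggers:
        if trigger.condition(proposal):
            return trigger
    return None


triggers = [
    # Concrete and checkable: a numeric threshold, a named action, an explicit flag.
    EscalationTrigger(lambda p: p.get("amount", 0) > 100, "escalate",
                      "Purchases over 100 need principal approval"),
    EscalationTrigger(lambda p: p.get("action") == "delete_without_confirmation", "deny",
                      "Deletion always requires confirmation"),
    EscalationTrigger(lambda p: p.get("new_merchant", False), "log",
                      "First purchase from an unfamiliar merchant"),
]
```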
Limitations
Further Reading
- AP-Traces — How agent actions are recorded for verification against cards
- Integrity Checkpoints — Real-time analysis of agent thinking against cards
- Value Coherence — Checking card compatibility between agents
- AAP Specification — Full normative specification