WEF agent governance framework

How AAP operationalizes the World Economic Forum’s agent governance framework

In November 2025, the World Economic Forum and Capgemini published AI Agents in Action: Foundations for Evaluation and Governance, introducing a structured framework for classifying and evaluating AI agents, assessing their risks, and governing them. The report’s central artifact is the agent card: a structured description of an agent’s capabilities, behavior, and operational context, inspired by Model Cards for Model Reporting (Mitchell et al., 2019). The report proposes seven classification dimensions, a multi-metric evaluation methodology, a five-step risk assessment lifecycle, nine baseline governance mechanisms, and a progressive governance model that scales oversight with agent capability.

The Agent Alignment Protocol (AAP) and Agent Integrity Protocol (AIP) implement what the WEF report recommends. AAP’s Alignment Card is a machine-readable, protocol-level artifact that maps to all seven WEF classification dimensions and extends them with enforceable behavioral contracts, auditable decision trails, and multi-agent compatibility verification. AIP provides the continuous monitoring infrastructure the WEF calls for at every governance level.

Key distinction: The WEF agent card describes an agent. The AAP Alignment Card binds it. The WEF tells organizations what to ask about their agents. AAP provides the machine-readable, verifiable answers. AIP provides the continuous assurance that those answers remain true at runtime.

1. The WEF framework architecture

The WEF report structures responsible agent deployment around three major sections and four foundational pillars.

1.1 Report structure

WEF Section	Content	AAP/AIP Relevance
Section 1: Technical Foundations	3-layer agent architecture (Application, Orchestration, Reasoning), protocols (MCP, A2A, AP2), cybersecurity	AAP extends A2A agent cards; AIP addresses prompt injection and zero-trust
Section 2: Evaluation and Governance	Classification dimensions, evaluation criteria, risk assessment lifecycle, progressive governance	Alignment Card (classification), AP-Traces (evaluation), violation typing (risk), autonomy envelope (governance)
Section 3: Multi-Agent Ecosystems	Emerging risks, failure modes, governor agents, trust frameworks	Value Coherence Handshake, Braid grounding, AIP daimonion

1.2 Four foundational pillars

Pillar	WEF Purpose	AAP/AIP Implementation
Classification	Establish agent characteristics and operational context	Alignment Card — JSON-schema-validated, well-known endpoint, versionable, expirable
Evaluation	Generate evidence of performance and limitations	AP-Trace verification, AIP Integrity Checkpoints, drift detection
Risk Assessment	Analyse potential harm using classification and evaluation	Typed violations (`FORBIDDEN_ACTION` through `CARD_MISMATCH`), each with an assigned severity, plus concern categories
Progressive Governance	Scale oversight proportionally to capability and context	Autonomy envelope + `principal.relationship` + AIP monitoring intensity + fail-open/fail-closed

1.3 Provider vs. adopter perspectives

The WEF report distinguishes two stakeholder perspectives that shape how the framework is applied. AAP addresses both:

WEF Perspective	WEF Responsibility	AAP/AIP Role
Provider	Build responsibly, supply documentation, ensure ethical guidelines	The Alignment Card is the provider’s documentation artifact — structured, versioned, served at `/.well-known/alignment-card.json`
Adopter	Procure responsibly, deploy safely, ensure organizational compliance	AP-Trace verification and AIP monitoring give adopters independent assurance that provider claims hold in production

2. Classification: dimension-by-dimension mapping

The WEF’s classification pillar introduces seven dimensions, organized into Agent Characteristics (dimensions 1-5) and Operational Context (dimensions 6-7). The agent card is the primary artifact.

2.1 Function

WEF definition: What task is the agent designed to perform?

The Alignment Card’s `bounded_actions` array declares the agent’s permitted functions as an explicit, machine-parseable list. Where the WEF asks organizations to describe function in natural language, AAP requires it as structured data that can be verified against observed behavior.

WEF Concept	AAP Field	Type
Agent function/task	`autonomy_envelope.bounded_actions`	String array
Function constraints	`autonomy_envelope.forbidden_actions`	String array

The WEF describes function; AAP also describes anti-function: what the agent must never do, regardless of context. The `forbidden_actions` field has no WEF equivalent. A violation of `forbidden_actions` generates a `FORBIDDEN_ACTION` violation at `CRITICAL` severity.
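The two lists reduce to a membership check. The sketch below is a minimal illustration under stated assumptions, not the AAP reference implementation: `classify_action` is a hypothetical helper and the action names are invented. Only the field names and the violation type names are taken from this document.

```python
from typing import Optional

# Hypothetical autonomy envelope; the action names are illustrative only.
autonomy_envelope = {
    "bounded_actions": ["search_catalog", "draft_quote"],
    "forbidden_actions": ["delete_customer_record"],
}


def classify_action(action: str, envelope: dict) -> Optional[str]:
    """Return a violation type for `action`, or None if it is permitted."""
    # Forbidden actions are checked first: they apply regardless of context.
    if action in envelope["forbidden_actions"]:
        return "FORBIDDEN_ACTION"
    # Anything not explicitly declared falls outside the declared function.
    if action not in envelope["bounded_actions"]:
        return "UNBOUNDED_ACTION"
    return None


print(classify_action("draft_quote", autonomy_envelope))             # None
print(classify_action("delete_customer_record", autonomy_envelope))  # FORBIDDEN_ACTION
print(classify_action("send_invoice", autonomy_envelope))            # UNBOUNDED_ACTION
```

Checking the forbidden list before the bounded list means an action that appears in both by mistake is still reported at the higher severity.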

2.2 Role

WEF definition: Is the agent specialized (narrow task) or generalist (broad capabilities)?

WEF Concept	AAP Field	Values
Specialist vs. generalist	`bounded_actions` array length	Narrow (few actions) vs. broad (many)
Operational role	`principal.relationship`	`delegated_authority`, `advisory`, `autonomous`

The WEF’s role dimension is descriptive. AAP’s `principal.relationship` field is prescriptive: it determines how the agent should behave when it encounters uncertainty. An `advisory` agent recommends and waits. A `delegated_authority` agent acts within bounds. An `autonomous` agent operates within declared values.
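One way to read the three relationship values is as a dispatch rule. The following sketch assumes a hypothetical `resolve_next_step` function and invented return strings (`recommend_and_wait`, `act`, `escalate`); the document only specifies the three `principal.relationship` values and their intent, so treat the escalation branches as an interpretation.

```python
from enum import Enum


class Relationship(str, Enum):
    ADVISORY = "advisory"
    DELEGATED_AUTHORITY = "delegated_authority"
    AUTONOMOUS = "autonomous"


def resolve_next_step(
    relationship: Relationship,
    action_in_bounds: bool,
    consistent_with_values: bool,
) -> str:
    """Hypothetical dispatch on principal.relationship."""
    if relationship is Relationship.ADVISORY:
        # Advisory agents never act on their own: recommend and wait.
        return "recommend_and_wait"
    if relationship is Relationship.DELEGATED_AUTHORITY:
        # Delegated agents act only inside their declared bounds.
        return "act" if action_in_bounds else "escalate"
    # Autonomous agents are constrained by their declared values.
    return "act" if consistent_with_values else "escalate"


print(resolve_next_step(Relationship("advisory"), True, True))
print(resolve_next_step(Relationship("delegated_authority"), False, True))
```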

2.3 Predictability

WEF definition: Is the agent deterministic or non-deterministic? The WEF explicitly identifies “behavioural drift” as a novel risk that traditional governance cannot manage.

WEF Concept	AAP/AIP Field	Function
Behavioral predictability	`audit_commitment.trace_format`	Structured logging of non-deterministic decisions
Non-deterministic monitoring	AIP Integrity Checkpoints	Real-time analysis of thinking blocks between turns
Behavioral change over time	AIP `IntegrityDriftAlert`	Cross-session behavioral divergence detection
Tamper evidence	`audit_commitment.tamper_evidence`	`append_only`, `signed`, or `merkle` trail integrity

AAP and AIP assume non-determinism is the default and provide infrastructure to observe it. The question shifts from “is it predictable?” to “is its unpredictability observable and characterized?”

2.4 Autonomy

WEF definition: The degree of independent planning, decision-making, and action.

This is the most direct mapping. AAP’s autonomy envelope is a formal, machine-readable specification of exactly what the WEF means by “autonomy level.”

WEF Concept	AAP Field	Function
Autonomy level	`autonomy_envelope` (composite)	Complete autonomy specification
What agent can do independently	`autonomy_envelope.bounded_actions`	Permitted autonomous actions
When agent must stop and ask	`autonomy_envelope.escalation_triggers`	Condition-based escalation rules
Financial limits on autonomy	`autonomy_envelope.max_autonomous_value`	Currency-denominated ceiling
Who to escalate to	`principal.escalation_contact`	Endpoint for escalation notifications
Real-time calibration	AIP `recommended_action`	`continue`, `log_and_continue`, `pause_for_review`, `deny_and_escalate`
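To make the envelope concrete, here is a minimal sketch of how a financial ceiling could gate autonomous action. The shape of `max_autonomous_value` (an amount plus a currency code), the `must_escalate` helper, and the example URL are assumptions made for illustration. `escalation_triggers` are left out because their rule syntax is not described here.

```python
from decimal import Decimal

# Hypothetical card fragment; field names follow the table above,
# but the value shapes are assumed for this example.
envelope = {
    "bounded_actions": ["issue_refund"],
    "max_autonomous_value": {"amount": Decimal("100.00"), "currency": "USD"},
}
principal = {"escalation_contact": "https://example.com/escalations"}


def must_escalate(action: str, amount: Decimal, currency: str, envelope: dict) -> bool:
    """Return True when the agent should stop and ask its principal."""
    ceiling = envelope["max_autonomous_value"]
    if action not in envelope["bounded_actions"]:
        return True
    if currency != ceiling["currency"]:
        # Amounts in different currencies cannot be compared; be conservative.
        return True
    return amount > ceiling["amount"]


if must_escalate("issue_refund", Decimal("150.00"), "USD", envelope):
    print("escalate to", principal["escalation_contact"])
```

Using `Decimal` rather than `float` avoids rounding surprises when comparing monetary amounts against the ceiling.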

2.5 Authority

WEF definition: The actions an agent is permitted to take, from read-only access to full administrative control.

WEF Concept	AAP Field	Function
System permissions	`autonomy_envelope.bounded_actions`	What the agent is permitted to do
Permission boundaries	`autonomy_envelope.forbidden_actions`	Hard limits regardless of context
Data access scope	`autonomy_envelope.escalation_triggers`	Conditions that constrain data access
Delegation chain	`principal.type` + `principal.relationship`	Who delegated authority and how
Permission expiry	`expires_at`	Authority has a time limit
Authority verification	`verify_trace` returns `UNBOUNDED_ACTION`	Detects actions outside granted authority

AAP adds verifiable delegation chains. When `principal.type` is `"agent"`, the card records that authority was delegated from another agent, enabling accountability tracing through multi-agent workflows.
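Time-limited authority and agent-to-agent delegation can both be read directly off the card. This sketch assumes `expires_at` is an ISO 8601 timestamp with an explicit UTC offset; `check_authority` and the example `card_id` are hypothetical.

```python
from datetime import datetime, timezone

# Hypothetical card fragment for illustration.
card = {
    "card_id": "card-example-001",
    "expires_at": "2026-01-01T00:00:00+00:00",
    "principal": {"type": "agent", "relationship": "delegated_authority"},
}


def check_authority(card: dict, now: datetime) -> dict:
    """Report expiry violations and whether authority came from another agent."""
    expires_at = datetime.fromisoformat(card["expires_at"])
    return {
        # An expired card means the granted authority has lapsed.
        "violations": ["CARD_EXPIRED"] if now >= expires_at else [],
        # principal.type == "agent" marks an agent-to-agent delegation.
        "delegated_by_agent": card["principal"]["type"] == "agent",
    }


print(check_authority(card, datetime(2026, 6, 1, tzinfo=timezone.utc)))
```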

2.6 Use case

WEF definition: The specific application domain and environment where the agent performs its function.

WEF Concept	AAP Field	Function
Application domain	`values.declared`	Domain-specific values
Domain constraints	`values.conflicts_with`	Values the agent explicitly rejects
Value definitions	`values.definitions`	Maps each value ID to `name`, `description`, `priority`
Value hierarchy	`values.hierarchy`	`lexicographic`, `weighted`, or `contextual` resolution
Domain-specific extensions	`extensions`	Protocol-specific or domain-specific metadata
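As an illustration of `lexicographic` resolution, the sketch below ranks candidate actions by comparing how well they satisfy each declared value, in priority order, so that a lower-priority value only matters when all higher-priority values tie. The value names, the scores, and the convention that a smaller `priority` number means a more important value are assumptions for this example.

```python
# Hypothetical values block; only the field names come from the table above.
values = {
    "declared": ["patient_safety", "privacy", "efficiency"],
    "definitions": {
        "patient_safety": {"name": "Patient safety", "description": "Avoid clinical harm", "priority": 1},
        "privacy": {"name": "Privacy", "description": "Minimise data disclosure", "priority": 2},
        "efficiency": {"name": "Efficiency", "description": "Reduce turnaround time", "priority": 3},
    },
    "hierarchy": "lexicographic",
}


def lexicographic_key(scores: dict, values: dict) -> tuple:
    """Order scores by value priority so tuple comparison is lexicographic."""
    ordered = sorted(
        values["declared"],
        key=lambda value_id: values["definitions"][value_id]["priority"],
    )
    return tuple(scores.get(value_id, 0) for value_id in ordered)


# How well each candidate action satisfies each value (1 = satisfied).
options = {
    "share_full_record": {"patient_safety": 1, "privacy": 0, "efficiency": 1},
    "share_summary": {"patient_safety": 1, "privacy": 1, "efficiency": 0},
}

best = max(options, key=lambda name: lexicographic_key(options[name], values))
print(best)  # share_summary
```

Both options tie on patient safety, so privacy decides the outcome and efficiency is never consulted.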

2.7 Environment

WEF definition: Operational environment complexity — simple, complex, or multi-system.

WEF Concept	AAP Field	Function
Single-system vs. multi-system	A2A Agent Card `alignment` block	AAP extends A2A for cross-system use
External system interactions	`/.well-known/alignment-card.json`	Discoverable card for any system to retrieve
Zero-trust assumptions	AIP fail-closed mode	Block agent on analysis failure in high-security environments
Cross-agent coordination	Value Coherence Handshake	Pre-coordination compatibility check
Environment observability	AIP `window_summary`	Rolling integrity statistics

3. Evaluation: metrics and evidence

The WEF’s Evaluation pillar establishes four evaluation principles and specific performance metrics.

3.1 Evaluation principles

WEF Evaluation Principle	WEF Description	AAP/AIP Implementation
Contextualization	Reflect the tools, workflows, and edge cases the agent will encounter	AP-Traces record `context` for each decision — actual operational conditions
Multidimensional assessment	Define success across accuracy, robustness, latency, compliance, trust	`verify_trace` produces multi-dimensional results: violation counts by type and severity
Temporal and behavioural monitoring	Track performance over time to detect regressions	AIP `IntegrityDriftAlert` with `integrity_similarity` ratio and `sustained_checks` count
Provider-deployer collaboration	Transparent documentation enables deployers to validate reliability	Alignment Card at `/.well-known/` is the transparent documentation
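The temporal monitoring row can be pictured as a sustained-threshold rule. This is a sketch only: the document names the `integrity_similarity` and `sustained_checks` fields but not the detection rule, so the threshold of 0.7, the minimum of three consecutive checks, and the `detect_drift` function are all assumptions.

```python
from typing import Optional


def detect_drift(
    similarities: list,
    threshold: float = 0.7,
    min_sustained: int = 3,
) -> Optional[dict]:
    """Return an alert once similarity stays below `threshold` long enough."""
    sustained = 0
    for index, similarity in enumerate(similarities):
        # A single healthy checkpoint resets the streak.
        sustained = sustained + 1 if similarity < threshold else 0
        if sustained >= min_sustained:
            return {
                "integrity_similarity": similarity,
                "sustained_checks": sustained,
                "checkpoint_index": index,
            }
    return None


print(detect_drift([0.9, 0.6, 0.65, 0.5]))
print(detect_drift([0.9, 0.6, 0.9, 0.6]))
```

Requiring several consecutive low readings keeps one noisy checkpoint from raising an alert.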

3.2 Evaluation metrics

WEF Metric	AAP/AIP Evidence Source
Task success rate	AP-Trace `verify_trace` — ratio of traces with zero violations to total traces
Task completion time	AP-Trace timestamps enable latency analysis
Error types	Typed violations: `FORBIDDEN_ACTION`, `UNBOUNDED_ACTION`, `MISSED_ESCALATION`, `UNDECLARED_VALUE`, `CARD_EXPIRED`, `CARD_MISMATCH`
Tool call success	AP-Trace `action` field logs tool invocations; verification flags `UNBOUNDED_ACTION` for unauthorized tool use
Edge case robustness	AIP concern categories — `reasoning_corruption` and `autonomy_violation` surface edge case failures
Trust indicators	AIP `integrity_ratio` — a quantitative trust metric
Capabilities	Alignment Card `bounded_actions` declares capabilities; AP-Traces verify they match observed behavior
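The task success rate in the first row is a plain ratio: traces with zero violations divided by total traces. A minimal sketch, assuming each verification result is a dictionary with a `violations` list (the result shape and the helper names are hypothetical):

```python
from collections import Counter

# Hypothetical verification results, one per AP-Trace.
verification_results = [
    {"trace_id": "t1", "violations": []},
    {"trace_id": "t2", "violations": ["UNBOUNDED_ACTION"]},
    {"trace_id": "t3", "violations": []},
    {"trace_id": "t4", "violations": ["MISSED_ESCALATION", "UNDECLARED_VALUE"]},
]


def task_success_rate(results: list) -> float:
    """Ratio of traces with zero violations to total traces."""
    if not results:
        raise ValueError("at least one verification result is required")
    clean = sum(1 for result in results if not result["violations"])
    return clean / len(results)


def violations_by_type(results: list) -> Counter:
    """Count violations by type across all traces."""
    return Counter(v for result in results for v in result["violations"])


print(task_success_rate(verification_results))  # 0.5
print(violations_by_type(verification_results))
```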

3.3 Audit logs

WEF Audit Requirement	AAP Field	Implementation
Structured records	`audit_commitment.trace_format`	`"ap-trace-v1"` — standardized, schema-validated
Retention policy	`audit_commitment.retention_days`	Explicit retention period
Queryable logs	`audit_commitment.queryable` + `query_endpoint`	API-accessible trace history
Tamper resistance	`audit_commitment.tamper_evidence`	`append_only`, `signed`, or `merkle`
Rationale capture	AP-Trace `alternatives_considered` + `selection_reasoning`	Why the agent chose what it chose
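To show what tamper evidence buys an auditor, the sketch below builds a simple SHA-256 hash chain over trace entries. It is a generic technique chosen for illustration, not a description of how AAP implements `append_only`, `signed`, or `merkle`; the entry fields and function names are hypothetical.

```python
import hashlib
import json

GENESIS_HASH = "0" * 64


def _digest(prev_hash: str, entry: dict) -> str:
    # Canonical JSON (sorted keys) so the same entry always hashes the same way.
    payload = json.dumps(entry, sort_keys=True)
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()


def append_entry(trail: list, entry: dict) -> None:
    """Append an entry whose hash commits to everything before it."""
    prev_hash = trail[-1]["hash"] if trail else GENESIS_HASH
    trail.append({"entry": entry, "prev_hash": prev_hash, "hash": _digest(prev_hash, entry)})


def verify_trail(trail: list) -> bool:
    """Recompute the chain; any edited or reordered entry breaks it."""
    prev_hash = GENESIS_HASH
    for record in trail:
        if record["prev_hash"] != prev_hash or record["hash"] != _digest(prev_hash, record["entry"]):
            return False
        prev_hash = record["hash"]
    return True


trail = []
append_entry(trail, {"action": "draft_quote", "selection_reasoning": "lowest cost"})
append_entry(trail, {"action": "send_quote", "selection_reasoning": "customer approved"})
print(verify_trail(trail))  # True

trail[0]["entry"]["action"] = "delete_customer_record"  # simulate tampering
print(verify_trail(trail))  # False
```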

4. Risk assessment: lifecycle mapping

The WEF’s Risk Assessment pillar proposes a five-step lifecycle. AAP/AIP provides tooling at each step:

WEF Step	WEF Objective	AAP/AIP Tooling
1. Define context	Establish scope, boundaries, criteria	Alignment Card defines identity, values, autonomy bounds
2. Identify risks	Identify potential risks	`forbidden_actions` pre-declares known risks; `values.conflicts_with` declares value-level risks
3. Analyse risks	Assess probability and impact	`verify_trace` produces violation counts by type and severity; AIP surfaces `drift_direction`
4. Evaluate risks	Rank and prioritize risks	AAP violation severities (`CRITICAL`, `HIGH`, `MEDIUM`, `LOW`) provide built-in risk ranking
5. Manage risks	Implement response actions	AIP `recommended_action` implements graduated response

4.1 Violation severity as risk taxonomy

Violation Type	Severity	WEF Risk Category	WEF Governance Area
`FORBIDDEN_ACTION`	CRITICAL	Authority violation, safety hazard	Access control
`CARD_MISMATCH`	CRITICAL	Identity/integrity failure	Traceability and identity
`UNBOUNDED_ACTION`	HIGH	Autonomy overreach	Access control, Human oversight
`MISSED_ESCALATION`	HIGH	Governance failure	Human oversight
`CARD_EXPIRED`	HIGH	Lifecycle management failure	Long-term management
`UNDECLARED_VALUE`	MEDIUM	Value misalignment	Trustworthiness and explainability
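The table above can be encoded as a lookup and used to rank findings for triage. The type-to-severity pairs are taken from the table; the `rank_violations` helper and the numeric ranks are assumptions for this sketch.

```python
# Severity per violation type, as listed in the table above.
SEVERITY_BY_TYPE = {
    "FORBIDDEN_ACTION": "CRITICAL",
    "CARD_MISMATCH": "CRITICAL",
    "UNBOUNDED_ACTION": "HIGH",
    "MISSED_ESCALATION": "HIGH",
    "CARD_EXPIRED": "HIGH",
    "UNDECLARED_VALUE": "MEDIUM",
}

# Lower rank sorts first; the numbers are an illustrative choice.
SEVERITY_RANK = {"CRITICAL": 0, "HIGH": 1, "MEDIUM": 2, "LOW": 3}


def rank_violations(violation_types: list) -> list:
    """Sort violation types from most to least severe (stable for ties)."""
    return sorted(violation_types, key=lambda v: SEVERITY_RANK[SEVERITY_BY_TYPE[v]])


print(rank_violations(["UNDECLARED_VALUE", "MISSED_ESCALATION", "FORBIDDEN_ACTION"]))
# ['FORBIDDEN_ACTION', 'MISSED_ESCALATION', 'UNDECLARED_VALUE']
```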

5. Progressive governance: mechanisms mapping

The WEF’s Progressive Governance pillar proposes that governance scales with agent capability. It identifies nine baseline governance mechanisms and two human oversight models (HITL/HOTL).

5.1 Human oversight models

WEF Model	WEF Definition	AAP Field	AAP Behavior
HITL (Human-in-the-loop)	Agent suggests or prepares actions; final decisions require explicit human approval	`principal.relationship: "advisory"`	Agent recommends and waits; escalation is the default
HOTL (Human-on-the-loop)	Agent acts within defined boundaries; humans monitor and retain ability to intervene	`principal.relationship: "delegated_authority"`	Agent acts within `bounded_actions`; escalation on trigger conditions
(Implied high-autonomy)	Agent operates independently within declared value framework	`principal.relationship: "autonomous"`	Agent operates within values; AIP heightened monitoring

AIP implements progressive monitoring intensity based on this mapping: `advisory` agents receive standard monitoring, `delegated_authority` agents receive standard monitoring with escalation enforcement, and `autonomous` agents receive heightened monitoring with deeper conscience evaluation.

5.2 Baseline governance mechanisms

WEF Governance Area	WEF Mechanism	AAP/AIP Implementation
Access control	Enforce least-privilege access; define task boundaries	`bounded_actions` (permitted), `forbidden_actions` (denied), `max_autonomous_value` (financial ceiling)
Legal and compliance	Data protection impact assessments; privacy and regulation compliance	`values.declared` encodes compliance values; `extensions` namespace for regulatory metadata; `audit_commitment` enables DPIA evidence
Testing and validation	Sandbox runs, controlled pilots, third-party audits	`verify_trace` against Alignment Card is the validation engine; AIP input analysis acts as input filter
Monitoring and logging	Logging for all agent actions; anomaly alerts and dashboards	AP-Traces, AIP Integrity Checkpoints, `IntegrityDriftAlert`, OTel export via aip-otel-exporter
Human oversight	Define HITL/HOTL models; set supervisory triggers	`principal.relationship`, `escalation_triggers`, `principal.escalation_contact`
Traceability and identity	Assign unique agent identifiers; tag outputs to responsible agent	`card_id` + `agent_id`, AP-Trace entries linked to `card_id`, AIP checkpoints linked to `agent_id` + `session_id`
Long-term management	Protocols for ongoing monitoring, updates, decommissioning	`expires_at` (card expiry enforces lifecycle review), `CARD_EXPIRED` violation triggers re-evaluation
Trustworthiness and explainability	Explainability tools; trust metrics	AIP `reasoning_summary`, AP-Trace `alternatives_considered` + `selection_reasoning`, AIP `integrity_ratio`
Manual redundancy	Procedures for human takeover of critical cases	`escalation_triggers`, `principal.escalation_contact`, AIP `recommended_action: "deny_and_escalate"`

6. Technical foundations: protocol alignment

6.1 Communication protocols

WEF Protocol	AAP/AIP Relationship
MCP	AAP `extensions` namespace supports MCP-specific metadata
A2A	AAP extends A2A Agent Cards with the `alignment` block
AP2	AAP’s `max_autonomous_value` maps to AP2’s auditable transaction limits
Agent Cards (A2A)	AAP Alignment Card is the A2A agent card plus enforceable alignment posture

6.2 Cybersecurity

WEF Security Concern	AIP Implementation
Prompt injection	AIP concern category: `prompt_injection` — dedicated detection in every Integrity Checkpoint
Agent misuse	AIP concern category: `deceptive_reasoning` + `undeclared_intent`
Zero-trust model	AIP `FailurePolicy.mode: "fail_closed"` — blocks agent on any analysis failure
Audit trails for attribution	AP-Traces + Integrity Checkpoints provide complete forensic record
Identity verification	`/.well-known/alignment-card.json` enables any party to verify agent identity
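The difference between fail-closed and fail-open shows up only when the integrity analysis itself fails. The sketch below wraps a hypothetical analyzer callable. The `"fail_open"` mode string, the `guarded_checkpoint` helper, and the choice of which `recommended_action` value each mode returns are assumptions; the document states only that fail-closed blocks the agent on any analysis failure.

```python
from typing import Callable


def guarded_checkpoint(analyze: Callable[[str], str], thinking_block: str, mode: str) -> str:
    """Run an integrity analysis and apply the failure policy if it errors."""
    try:
        return analyze(thinking_block)
    except Exception:
        if mode == "fail_closed":
            # Zero-trust posture: no verdict means the agent is blocked.
            return "deny_and_escalate"
        # Assumed fail-open behaviour: record the failure and let the agent proceed.
        return "log_and_continue"


def failing_analyzer(_thinking_block: str) -> str:
    """Stand-in for an analysis service that is unavailable."""
    raise RuntimeError("analysis service unavailable")


print(guarded_checkpoint(failing_analyzer, "plan next step", "fail_closed"))
print(guarded_checkpoint(failing_analyzer, "plan next step", "fail_open"))
```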

7. Multi-agent ecosystem risks

The WEF identifies five emerging failure modes in multi-agent ecosystems. AAP/AIP addresses all five:

WEF Risk	AAP/AIP Solution
Orchestration drift	Value Coherence Handshake: Before coordination, agents exchange Alignment Cards and compute a compatibility score. If `coherence.compatible` is false, coordination pauses.
Semantic misalignment	Braid grounding protocol: Agents detect semantic divergence via SSM analysis. `values.conflicts_with` pre-declares known semantic conflicts.
Security and trust gaps	Well-known endpoint discovery (zero-trust), AIP `prompt_injection` concern category, AIP fail-closed mode.
Interconnectedness and cascading effects	AIP `IntegrityDriftAlert` with `drift_direction` typing enables early detection. `CARD_MISMATCH` immediately flags identity inconsistencies.
Systemic complexity	AP-Traces with `linked_trace_id` enable cross-agent forensics. AIP provides per-agent integrity windows aggregatable for system-level health.
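A rough picture of the Value Coherence Handshake from the first row: two agents exchange cards, and coordination pauses if either agent declares a value the other explicitly rejects. The scoring rule below (Jaccard overlap of declared values) and the `value_coherence` function are assumptions for illustration; the document does not specify how the compatibility score is computed.

```python
def value_coherence(card_a: dict, card_b: dict) -> dict:
    """Compare two Alignment Cards' values blocks (illustrative rule only)."""
    declared_a = set(card_a["values"]["declared"])
    declared_b = set(card_b["values"]["declared"])
    rejected_by_a = set(card_a["values"].get("conflicts_with", []))
    rejected_by_b = set(card_b["values"].get("conflicts_with", []))

    # A conflict exists when one agent declares a value the other rejects.
    conflicts = (declared_a & rejected_by_b) | (declared_b & rejected_by_a)

    # Assumed score: share of declared values the two agents have in common.
    union = declared_a | declared_b
    score = len(declared_a & declared_b) / len(union) if union else 1.0

    return {"compatible": not conflicts, "score": score, "conflicts": sorted(conflicts)}


# Hypothetical cards for two agents about to coordinate.
buyer = {"values": {"declared": ["transparency", "cost_minimisation"], "conflicts_with": ["dark_patterns"]}}
seller = {"values": {"declared": ["transparency", "dark_patterns"], "conflicts_with": []}}

coherence = value_coherence(buyer, seller)
if not coherence["compatible"]:
    print("pause coordination:", coherence["conflicts"])
```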

7.1 Governor agents and the AIP daimonion

The WEF envisions “dedicated ‘governor’ or ‘auditor’ agents” that monitor, audit, or regulate other agents. AIP’s daimonion (integrity analysis service) implements this concept at the protocol level:

WEF Governor Agent Function	AIP Daimonion Implementation
Monitor agent actions	Integrity Checkpoints analyze every thinking block between turns
Validate transactions	Verdicts (`clear`, `review_needed`, `boundary_violation`) per checkpoint
Detect anomalies	Six concern categories: `prompt_injection`, `value_misalignment`, `autonomy_violation`, `reasoning_corruption`, `deceptive_reasoning`, `undeclared_intent`
Correct unsafe behavior	`recommended_action: "deny_and_escalate"` blocks the agent on critical concerns
Scalable oversight	Protocol-level service, not a separate agent — scales with infrastructure

8. Summary mapping tables

8.1 Classification dimensions

WEF Dimension	WEF Agent Card	AAP Alignment Card	Extension
Function	Natural language description	`bounded_actions` + `forbidden_actions`	Machine-parseable, verifiable, includes anti-function
Role	Specialist-Generalist scale	`principal.relationship` + action scope	Prescriptive — affects runtime behavior
Predictability	Deterministic-Non-deterministic scale	AP-Traces + AIP Checkpoints + drift detection	Observable unpredictability with typed drift directions
Autonomy	Low-High scale	Autonomy envelope (actions, triggers, limits)	Decomposed, auditable, enforceable
Authority	Low-High scale	Delegation chain + autonomy envelope + expiry	Verifiable delegation chains
Use Case	Free-text application domain	`values` (declared, definitions, hierarchy, conflicts) + `extensions`	Evaluable values with consistency verification
Environment	Simple-Complex scale	Well-known endpoints + Value Coherence + fail-closed	Zero-trust discoverable, multi-agent compatible

8.2 Pillars and governance

WEF Pillar	WEF Recommendation	AAP/AIP Implementation
Classification	Agent card with 7 dimensions	Alignment Card — JSON schema, well-known endpoint, versioned, expirable
Evaluation	Contextualized, multidimensional, temporal, collaborative	AP-Trace verification + AIP integrity checks + drift detection + OTel export
Risk Assessment	5-step lifecycle	Typed violations with severity + concern categories + drift alerts + graduated response
Progressive Governance	9 baseline mechanisms + HITL/HOTL + proportional scaling	Autonomy envelope + `principal.relationship` + AIP monitoring intensity + fail-open/closed

References

World Economic Forum & Capgemini. AI Agents in Action: Foundations for Evaluation and Governance. November 2025.
AAP Specification
AIP Specification
Mitchell, M., Wu, S., Zaldivar, A., et al. Model Cards for Model Reporting. FAT* ’19, 2019.
