
WEF Agent Governance Framework

How AAP Operationalizes the World Economic Forum’s Agent Governance Framework

In November 2025, the World Economic Forum and Capgemini published AI Agents in Action: Foundations for Evaluation and Governance. The report introduces a structured framework for classifying AI agents, evaluating them, assessing their risks, and governing them. Its central artifact is the agent card: a structured description of an agent’s capabilities, behavior, and operational context, inspired by Model Cards for Model Reporting (Mitchell et al., 2019). The report proposes seven classification dimensions, a multi-metric evaluation methodology, a five-step risk assessment lifecycle, nine baseline governance mechanisms, and a progressive governance model that scales oversight with agent capability.

The Agent Alignment Protocol (AAP) and Agent Integrity Protocol (AIP) implement what the WEF report recommends. AAP’s Alignment Card is a machine-readable, protocol-level artifact that maps to all seven WEF classification dimensions and extends them with enforceable behavioral contracts, auditable decision trails, and multi-agent compatibility verification. AIP provides the continuous monitoring infrastructure that the WEF calls for at every governance level.
Key distinction: The WEF agent card describes an agent. The AAP Alignment Card binds it. The WEF tells organizations what to ask about their agents. AAP provides the machine-readable, verifiable answers. AIP provides the continuous assurance that those answers remain true at runtime.

1. The WEF Framework Architecture

The WEF report structures responsible agent deployment around three major sections and four foundational pillars.

1.1 Report Structure

| WEF Section | Content | AAP/AIP Relevance |
|---|---|---|
| Section 1: Technical Foundations | 3-layer agent architecture (Application, Orchestration, Reasoning), protocols (MCP, A2A, AP2), cybersecurity | AAP extends A2A agent cards; AIP addresses prompt injection and zero-trust |
| Section 2: Evaluation and Governance | Classification dimensions, evaluation criteria, risk assessment lifecycle, progressive governance | Alignment Card (classification), AP-Traces (evaluation), violation typing (risk), autonomy envelope (governance) |
| Section 3: Multi-Agent Ecosystems | Emerging risks, failure modes, governor agents, trust frameworks | Value Coherence Handshake, Braid grounding, AIP daimonion |

1.2 Four Foundational Pillars

| Pillar | WEF Purpose | AAP/AIP Implementation |
|---|---|---|
| Classification | Establish agent characteristics and operational context | Alignment Card: JSON-schema-validated, served at a well-known endpoint, versionable, expirable |
| Evaluation | Generate evidence of performance and limitations | AP-Trace verification, AIP Integrity Checkpoints, drift detection |
| Risk Assessment | Analyse potential harm using classification and evaluation | Typed violations, each with an assigned severity (FORBIDDEN_ACTION, UNBOUNDED_ACTION, MISSED_ESCALATION, UNDECLARED_VALUE, CARD_EXPIRED, CARD_MISMATCH), plus concern categories |
| Progressive Governance | Scale oversight proportionally to capability and context | Autonomy envelope + principal.relationship + AIP monitoring intensity + fail-open/fail-closed |

1.3 Provider vs. Adopter Perspectives

The WEF report distinguishes two stakeholder perspectives that shape how the framework is applied. AAP addresses both:

| WEF Perspective | WEF Responsibility | AAP/AIP Role |
|---|---|---|
| Provider | Build responsibly, supply documentation, ensure ethical guidelines | The Alignment Card is the provider’s documentation artifact: structured, versioned, served at /.well-known/alignment-card.json |
| Adopter | Procure responsibly, deploy safely, ensure organizational compliance | AP-Trace verification and AIP monitoring give adopters independent assurance that provider claims hold in production |

2. Classification: Dimension-by-Dimension Mapping

The WEF’s classification pillar introduces seven dimensions, organized into Agent Characteristics (dimensions 1-5) and Operational Context (dimensions 6-7). The agent card is the primary artifact.

2.1 Function

WEF definition: What task is the agent designed to perform? The Alignment Card’s bounded_actions array declares the agent’s permitted functions as an explicit, machine-parseable list. Where the WEF asks organizations to describe function in natural language, AAP requires it to be expressed as structured data that can be verified against observed behavior.

| WEF Concept | AAP Field | Type |
|---|---|---|
| Agent function/task | autonomy_envelope.bounded_actions | String array |
| Function constraints | autonomy_envelope.forbidden_actions | String array |

The WEF describes function; AAP also describes anti-function: what the agent must never do, regardless of context. The forbidden_actions field has no WEF equivalent. Taking a forbidden action generates a FORBIDDEN_ACTION violation at CRITICAL severity.
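The function/anti-function split can be illustrated with a minimal sketch that classifies one observed action. The envelope contents and the check_action helper are hypothetical, and checking forbidden_actions before bounded_actions is an assumed precedence rule; the violation types and severities are the ones listed in Section 4.1.

```python
from typing import Optional, Tuple

# Hypothetical envelope; field names follow the Alignment Card fields above,
# but the authoritative shape is defined by the AAP JSON schema.
ENVELOPE = {
    "bounded_actions": ["search_catalog", "draft_reply", "issue_refund"],
    "forbidden_actions": ["delete_customer_record", "change_pricing"],
}


def check_action(action: str, envelope: dict) -> Optional[Tuple[str, str]]:
    """Return (violation_type, severity) for an observed action, or None if it is permitted."""
    # Anti-function first: a forbidden action is CRITICAL even if it also appears in bounded_actions.
    if action in envelope.get("forbidden_actions", []):
        return ("FORBIDDEN_ACTION", "CRITICAL")
    # Anything not explicitly permitted falls outside the granted function.
    if action not in envelope.get("bounded_actions", []):
        return ("UNBOUNDED_ACTION", "HIGH")
    return None


print(check_action("draft_reply", ENVELOPE))      # None
print(check_action("change_pricing", ENVELOPE))   # ('FORBIDDEN_ACTION', 'CRITICAL')
print(check_action("export_database", ENVELOPE))  # ('UNBOUNDED_ACTION', 'HIGH')
```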

2.2 Role

WEF definition: Is the agent specialized (narrow task) or generalist (broad capabilities)?

| WEF Concept | AAP Field | Values |
|---|---|---|
| Specialist vs. generalist | bounded_actions array length | Narrow (few actions) vs. broad (many actions) |
| Operational role | principal.relationship | delegated_authority, advisory, autonomous |

The WEF’s role dimension is descriptive. AAP’s principal.relationship field is prescriptive: it determines how the agent should behave when it encounters uncertainty. An advisory agent recommends and waits. A delegated_authority agent acts within bounds. An autonomous agent operates within declared values.
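To make the prescriptive reading concrete, here is a hypothetical sketch of how an agent runtime might branch on principal.relationship when it is uncertain. The on_uncertainty function and its return labels are illustrative assumptions, not AAP-defined behavior codes.

```python
from enum import Enum


class Relationship(Enum):
    ADVISORY = "advisory"
    DELEGATED_AUTHORITY = "delegated_authority"
    AUTONOMOUS = "autonomous"


def on_uncertainty(relationship: str, within_bounds: bool) -> str:
    """Hypothetical policy: what an agent does when it is unsure how to proceed."""
    rel = Relationship(relationship)  # raises ValueError for an undeclared relationship
    if rel is Relationship.ADVISORY:
        return "recommend_and_wait"  # a human makes the final decision
    if rel is Relationship.DELEGATED_AUTHORITY:
        # Act only inside the granted bounds; otherwise hand the decision back.
        return "act" if within_bounds else "escalate"
    return "act_within_declared_values"  # autonomous: constrained by values, not by a human


for rel in ("advisory", "delegated_authority", "autonomous"):
    print(rel, on_uncertainty(rel, within_bounds=False))
```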

2.3 Predictability

WEF definition: Is the agent deterministic or non-deterministic? The WEF explicitly identifies “behavioural drift” as a novel risk that traditional governance cannot manage.

| WEF Concept | AAP/AIP Field | Function |
|---|---|---|
| Behavioral predictability | audit_commitment.trace_format | Structured logging of non-deterministic decisions |
| Non-deterministic monitoring | AIP Integrity Checkpoints | Real-time analysis of thinking blocks between turns |
| Behavioral change over time | AIP IntegrityDriftAlert | Cross-session behavioral divergence detection |
| Tamper evidence | audit_commitment.tamper_evidence | append_only, signed, or merkle trail integrity |

AAP and AIP assume non-determinism is the default and provide infrastructure to observe it. The question shifts from “is it predictable?” to “is its unpredictability observable and characterized?”
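One way to characterize unpredictability, assuming AP-Trace entries can be grouped by decision context, is to measure how spread out the chosen actions are. The sketch below uses Shannon entropy over chosen actions; TraceEntry, context_key, and the entropy measure itself are illustrative choices, not part of the ap-trace-v1 format.

```python
import math
from collections import Counter
from dataclasses import dataclass, field
from typing import List


@dataclass
class TraceEntry:
    # Hypothetical subset of an AP-Trace entry; the real ap-trace-v1 schema is set by the AAP spec.
    context_key: str  # assumption: a label that groups decisions made in the "same" situation
    action: str
    alternatives_considered: List[str] = field(default_factory=list)
    selection_reasoning: str = ""


def selection_entropy(traces: List[TraceEntry], context_key: str) -> float:
    """Shannon entropy, in bits, of the actions chosen in one decision context."""
    chosen = Counter(t.action for t in traces if t.context_key == context_key)
    total = sum(chosen.values())
    if total == 0:
        return 0.0
    return -sum((n / total) * math.log2(n / total) for n in chosen.values())


options = ["issue_refund", "escalate"]
traces = [
    TraceEntry("late_delivery", "issue_refund", options, "order value under limit"),
    TraceEntry("late_delivery", "escalate", options, "customer flagged as VIP"),
    TraceEntry("late_delivery", "issue_refund", options, "order value under limit"),
    TraceEntry("late_delivery", "escalate", options, "policy ambiguous"),
]
print(selection_entropy(traces, "late_delivery"))  # 1.0: two actions chosen equally often
```

An entropy of 0 means the agent always picks the same action in that context; higher values mean more variation, which the recorded selection_reasoning can then explain.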

2.4 Autonomy

WEF definition: The degree of independent planning, decision-making, and action. This is the most direct mapping. AAP’s autonomy envelope is a formal, machine-readable specification of exactly what the WEF means by “autonomy level.”

| WEF Concept | AAP Field | Function |
|---|---|---|
| Autonomy level | autonomy_envelope (composite) | Complete autonomy specification |
| What agent can do independently | autonomy_envelope.bounded_actions | Permitted autonomous actions |
| When agent must stop and ask | autonomy_envelope.escalation_triggers | Condition-based escalation rules |
| Financial limits on autonomy | autonomy_envelope.max_autonomous_value | Currency-denominated ceiling |
| Who to escalate to | principal.escalation_contact | Endpoint for escalation notifications |
| Real-time calibration | AIP recommended_action | continue, log_and_continue, pause_for_review, deny_and_escalate |
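The envelope fields above can be read as an ordered pre-action check. This is a minimal sketch under stated assumptions: escalation_triggers are modeled as named Python predicates (the real card encodes them declaratively), currency handling is omitted, and AutonomyEnvelope and escalation_reason are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class AutonomyEnvelope:
    bounded_actions: List[str]
    max_autonomous_value: float  # ceiling in the card's currency; currency conversion omitted
    # Hypothetical encoding: named predicates over the proposed action's context.
    escalation_triggers: Dict[str, Callable[[dict], bool]] = field(default_factory=dict)


def escalation_reason(env: AutonomyEnvelope, action: str, ctx: dict) -> Optional[str]:
    """Return why the agent must stop and ask (via principal.escalation_contact), or None."""
    if action not in env.bounded_actions:
        return "action outside bounded_actions"
    if ctx.get("value", 0) > env.max_autonomous_value:
        return "value exceeds max_autonomous_value"
    for name, fires in env.escalation_triggers.items():
        if fires(ctx):
            return f"escalation trigger fired: {name}"
    return None


env = AutonomyEnvelope(
    bounded_actions=["issue_refund"],
    max_autonomous_value=100.0,
    # A missing account age defaults to 0, so unknown accounts also escalate.
    escalation_triggers={"new_account": lambda c: c.get("account_age_days", 0) < 30},
)
print(escalation_reason(env, "issue_refund", {"value": 40, "account_age_days": 400}))  # None
print(escalation_reason(env, "issue_refund", {"value": 250, "account_age_days": 400}))
print(escalation_reason(env, "issue_refund", {"value": 40, "account_age_days": 3}))
```

Defaulting toward escalation when context is missing is a deliberately conservative choice in this sketch.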

2.5 Authority

WEF definition: The actions an agent is permitted to take, from read-only access to full administrative control.

| WEF Concept | AAP Field | Function |
|---|---|---|
| System permissions | autonomy_envelope.bounded_actions | What the agent is permitted to do |
| Permission boundaries | autonomy_envelope.forbidden_actions | Hard limits regardless of context |
| Data access scope | autonomy_envelope.escalation_triggers | Conditions that constrain data access |
| Delegation chain | principal.type + principal.relationship | Who delegated authority and how |
| Permission expiry | expires_at | Authority has a time limit |
| Authority verification | verify_trace returns UNBOUNDED_ACTION | Detects actions outside granted authority |

AAP adds verifiable delegation chains. When principal.type is "agent", the card records that authority was delegated from another agent, enabling accountability tracing through multi-agent workflows.
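Accountability tracing through such a chain can be sketched as a walk over principal links. The registry, the principal "id" key, and the "human" principal type are assumptions made for illustration; only principal.type being "agent" comes from the text above.

```python
from typing import Dict, List

# Hypothetical registry of Alignment Cards keyed by agent_id. The "id" key naming the
# delegating party is an assumption for illustration, not a documented AAP field.
CARDS: Dict[str, dict] = {
    "billing-agent": {"principal": {"type": "agent", "id": "orchestrator",
                                    "relationship": "delegated_authority"}},
    "orchestrator": {"principal": {"type": "human", "id": "ops-lead@example.com",
                                   "relationship": "delegated_authority"}},
}


def delegation_chain(agent_id: str, cards: Dict[str, dict]) -> List[str]:
    """Follow principal links from an agent until a non-agent principal is reached."""
    chain = [agent_id]
    current = agent_id
    while True:
        principal = cards[current]["principal"]
        chain.append(principal["id"])
        if principal["type"] != "agent":
            return chain  # the accountable non-agent principal ends the chain
        if principal["id"] in chain[:-1]:
            raise ValueError(f"delegation cycle through {principal['id']}")
        current = principal["id"]


print(delegation_chain("billing-agent", CARDS))
# ['billing-agent', 'orchestrator', 'ops-lead@example.com']
```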

2.6 Use Case

WEF definition: The specific application domain and environment where the agent performs its function.

| WEF Concept | AAP Field | Function |
|---|---|---|
| Application domain | values.declared | Domain-specific values |
| Domain constraints | values.conflicts_with | Values the agent explicitly rejects |
| Value definitions | values.definitions | Maps each value ID to name, description, priority |
| Value hierarchy | values.hierarchy | lexicographic, weighted, or contextual resolution |
| Domain-specific extensions | extensions | Protocol-specific or domain-specific metadata |
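The lexicographic and weighted hierarchy modes differ in whether lower-priority values can ever outweigh higher-priority ones. The sketch below contrasts the two on made-up per-option value scores; the scoring scheme and function names are hypothetical, and contextual resolution is not shown.

```python
from typing import Dict, List

# Hypothetical per-option scores (0..1) for how well each candidate action serves each value.
Options = Dict[str, Dict[str, float]]


def resolve_lexicographic(options: Options, ranked_values: List[str]) -> str:
    """Highest-priority value decides; lower-priority values only break ties."""
    candidates = list(options)
    for value in ranked_values:
        best = max(options[o].get(value, 0.0) for o in candidates)
        candidates = [o for o in candidates if options[o].get(value, 0.0) == best]
        if len(candidates) == 1:
            break
    return candidates[0]


def resolve_weighted(options: Options, weights: Dict[str, float]) -> str:
    """Every value contributes in proportion to its weight."""
    return max(options, key=lambda o: sum(w * options[o].get(v, 0.0) for v, w in weights.items()))


options = {
    "share_full_report": {"transparency": 0.9, "privacy": 0.2},
    "share_summary": {"transparency": 0.6, "privacy": 0.9},
}
print(resolve_lexicographic(options, ["privacy", "transparency"]))       # share_summary
print(resolve_weighted(options, {"transparency": 0.9, "privacy": 0.1}))  # share_full_report
```

With privacy ranked first, the lexicographic resolver never trades privacy away, whereas a weighted resolver that leans heavily on transparency picks the full report.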

2.7 Environment

WEF definition: Operational environment complexity (simple, complex, or multi-system).

| WEF Concept | AAP Field | Function |
|---|---|---|
| Single-system vs. multi-system | A2A Agent Card alignment block | AAP extends A2A for cross-system use |
| External system interactions | /.well-known/alignment-card.json | Discoverable card for any system to retrieve |
| Zero-trust assumptions | AIP fail-closed mode | Block agent on analysis failure in high-security environments |
| Cross-agent coordination | Value Coherence Handshake | Pre-coordination compatibility check |
| Environment observability | AIP window_summary | Rolling integrity statistics |
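The fail-closed row describes what happens when integrity analysis itself fails. A minimal sketch, assuming a "fail_open" counterpart to the documented "fail_closed" mode and assuming fail_open maps to log_and_continue; the AIP specification defines the actual behavior, and gate_turn is a hypothetical name.

```python
from typing import Callable


def gate_turn(analyze: Callable[[str], str], thinking_block: str, mode: str) -> str:
    """Run integrity analysis on one turn and apply a failure policy if the analysis itself fails."""
    if mode not in ("fail_open", "fail_closed"):
        raise ValueError(f"unknown failure mode: {mode}")
    try:
        return analyze(thinking_block)  # normally returns an AIP recommended_action
    except Exception:
        # fail_closed: an unanalyzable turn is treated as unsafe (zero-trust posture).
        # fail_open: the agent keeps running, but the gap is logged.
        return "deny_and_escalate" if mode == "fail_closed" else "log_and_continue"


def flaky_analyzer(block: str) -> str:
    raise TimeoutError("analysis service unavailable")


print(gate_turn(flaky_analyzer, "<thinking block>", "fail_closed"))  # deny_and_escalate
print(gate_turn(flaky_analyzer, "<thinking block>", "fail_open"))    # log_and_continue
```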

3. Evaluation: Metrics and Evidence

The WEF’s Evaluation pillar establishes four evaluation principles and specific performance metrics.

3.1 Evaluation Principles

| WEF Evaluation Principle | WEF Description | AAP/AIP Implementation |
|---|---|---|
| Contextualization | Reflect the tools, workflows, and edge cases the agent will encounter | AP-Traces record the context of each decision, i.e., actual operational conditions |
| Multidimensional assessment | Define success across accuracy, robustness, latency, compliance, trust | verify_trace produces multi-dimensional results: violation counts by type and severity |
| Temporal and behavioural monitoring | Track performance over time to detect regressions | AIP IntegrityDriftAlert with integrity_similarity ratio and sustained_checks count |
| Provider-deployer collaboration | Transparent documentation enables deployers to validate reliability | The Alignment Card at /.well-known/ is the transparent documentation |
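Temporal monitoring of this kind can be sketched as a rolling window over checkpoint verdicts. The similarity formula below (recent clear rate divided by a baseline clear rate) and the window, threshold, and sustained_required defaults are assumptions; AIP’s own definition of integrity_similarity may differ.

```python
from collections import deque
from typing import Iterable, Optional


def detect_drift(verdicts: Iterable[str], baseline_clear_rate: float, window: int = 5,
                 threshold: float = 0.8, sustained_required: int = 3) -> Optional[dict]:
    """Return an alert-like dict once similarity stays below threshold for enough checks in a row."""
    if not 0 < baseline_clear_rate <= 1:
        raise ValueError("baseline_clear_rate must be in (0, 1]")
    recent: deque = deque(maxlen=window)  # rolling window of "was this checkpoint clear?"
    sustained = 0
    for index, verdict in enumerate(verdicts):
        recent.append(verdict == "clear")
        if len(recent) < window:
            continue  # judge only full windows
        similarity = (sum(recent) / window) / baseline_clear_rate
        sustained = sustained + 1 if similarity < threshold else 0
        if sustained >= sustained_required:
            return {"checkpoint_index": index,
                    "integrity_similarity": round(similarity, 3),
                    "sustained_checks": sustained}
    return None


history = ["clear"] * 5 + ["review_needed"] * 5
print(detect_drift(history, baseline_clear_rate=1.0))
# {'checkpoint_index': 8, 'integrity_similarity': 0.2, 'sustained_checks': 3}
```

Requiring several consecutive low readings before alerting trades a little detection latency for fewer alerts on one-off noisy checkpoints.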

3.2 Evaluation Metrics

| WEF Metric | AAP/AIP Evidence Source |
|---|---|
| Task success rate | AP-Trace verify_trace: ratio of traces with zero violations to total traces |
| Task completion time | AP-Trace timestamps enable latency analysis |
| Error types | Typed violations: FORBIDDEN_ACTION, UNBOUNDED_ACTION, MISSED_ESCALATION, UNDECLARED_VALUE, CARD_EXPIRED, CARD_MISMATCH |
| Tool call success | The AP-Trace action field logs tool invocations; verification flags UNBOUNDED_ACTION for unauthorized tool use |
| Edge case robustness | AIP concern categories: reasoning_corruption and autonomy_violation surface edge-case failures |
| Trust indicators | AIP integrity_ratio, a quantitative trust metric |
| Capabilities | Alignment Card bounded_actions declares capabilities; AP-Traces verify they match observed behavior |
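The first two metrics in the table follow directly from their definitions. In this sketch each trace is represented only by its list of violation types, and integrity_ratio is assumed to be the share of checkpoints with a clear verdict, since the table does not define it.

```python
from collections import Counter
from typing import Dict, List


def task_success_rate(trace_violations: List[List[str]]) -> float:
    """Traces with zero violations divided by total traces (one violation list per trace)."""
    if not trace_violations:
        raise ValueError("no traces to evaluate")
    return sum(1 for violations in trace_violations if not violations) / len(trace_violations)


def error_type_counts(trace_violations: List[List[str]]) -> Dict[str, int]:
    """Tally typed violations across all traces."""
    counts: Counter = Counter()
    for violations in trace_violations:
        counts.update(violations)
    return dict(counts)


def integrity_ratio(verdicts: List[str]) -> float:
    """Hypothetical definition: share of Integrity Checkpoints whose verdict is 'clear'."""
    if not verdicts:
        raise ValueError("no checkpoints to evaluate")
    return verdicts.count("clear") / len(verdicts)


results = [[], [], ["UNBOUNDED_ACTION"], [], ["UNBOUNDED_ACTION", "MISSED_ESCALATION"]]
print(task_success_rate(results))  # 0.6
print(error_type_counts(results))  # {'UNBOUNDED_ACTION': 2, 'MISSED_ESCALATION': 1}
print(integrity_ratio(["clear", "clear", "review_needed", "boundary_violation"]))  # 0.5
```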

3.3 Audit Logs

| WEF Audit Requirement | AAP Field | Implementation |
|---|---|---|
| Structured records | audit_commitment.trace_format | "ap-trace-v1": standardized, schema-validated |
| Retention policy | audit_commitment.retention_days | Explicit retention period |
| Queryable logs | audit_commitment.queryable + query_endpoint | API-accessible trace history |
| Tamper resistance | audit_commitment.tamper_evidence | append_only, signed, or merkle |
| Rationale capture | AP-Trace alternatives_considered + selection_reasoning | Why the agent chose what it chose |
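Tamper evidence for an append-only trail can be illustrated with a hash chain, where each entry includes the hash of its predecessor. This is a sketch of the general technique, not the mechanism AAP mandates; the signed and merkle options need keys or a tree structure and are not shown. A bare hash chain cannot detect truncation of its newest entries unless the latest hash is also published somewhere the agent cannot rewrite.

```python
import hashlib
import json
from typing import List


class HashChainedTraceLog:
    """Append-only log in which every entry commits to the hash of the entry before it."""

    GENESIS = "0" * 64

    def __init__(self) -> None:
        self.entries: List[dict] = []

    @staticmethod
    def _digest(prev_hash: str, record: dict) -> str:
        payload = json.dumps(record, sort_keys=True, separators=(",", ":"))
        return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()

    def append(self, record: dict) -> str:
        prev_hash = self.entries[-1]["hash"] if self.entries else self.GENESIS
        stored = json.loads(json.dumps(record))  # deep copy so later caller edits don't leak in
        digest = self._digest(prev_hash, stored)
        self.entries.append({"record": stored, "prev": prev_hash, "hash": digest})
        return digest

    def verify(self) -> bool:
        """Recompute the chain; an edited or reordered entry, or one removed before the end, breaks it."""
        prev_hash = self.GENESIS
        for entry in self.entries:
            if entry["prev"] != prev_hash or entry["hash"] != self._digest(prev_hash, entry["record"]):
                return False
            prev_hash = entry["hash"]
        return True


log = HashChainedTraceLog()
log.append({"trace_format": "ap-trace-v1", "action": "search_catalog"})
log.append({"trace_format": "ap-trace-v1", "action": "draft_reply"})
print(log.verify())                                  # True
log.entries[0]["record"]["action"] = "issue_refund"  # simulate tampering
print(log.verify())                                  # False
```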

4. Risk Assessment: Lifecycle Mapping

The WEF’s Risk Assessment pillar proposes a five-step lifecycle. AAP/AIP provides tooling at each step:

| WEF Step | WEF Objective | AAP/AIP Tooling |
|---|---|---|
| 1. Define context | Establish scope, boundaries, criteria | Alignment Card defines identity, values, autonomy bounds |
| 2. Identify risks | Identify potential risks | forbidden_actions pre-declares known risks; values.conflicts_with declares value-level risks |
| 3. Analyse risks | Assess probability and impact | verify_trace produces violation counts by type and severity; AIP surfaces drift_direction |
| 4. Evaluate risks | Rank and prioritize risks | AAP violation severities (CRITICAL, HIGH, MEDIUM, LOW) provide built-in risk ranking |
| 5. Manage risks | Implement response actions | AIP recommended_action implements graduated response |

4.1 Violation Severity as Risk Taxonomy

| Violation Type | Severity | WEF Risk Category | WEF Governance Area |
|---|---|---|---|
| FORBIDDEN_ACTION | CRITICAL | Authority violation, safety hazard | Access control |
| CARD_MISMATCH | CRITICAL | Identity/integrity failure | Traceability and identity |
| UNBOUNDED_ACTION | HIGH | Autonomy overreach | Access control, Human oversight |
| MISSED_ESCALATION | HIGH | Governance failure | Human oversight |
| CARD_EXPIRED | HIGH | Lifecycle management failure | Long-term management |
| UNDECLARED_VALUE | MEDIUM | Value misalignment | Trustworthiness and explainability |
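Steps 3 and 4 of the lifecycle can be combined in a few lines: count violations by type, then rank them. The severity map is copied from the table above; breaking ties within a severity by frequency is an assumption, not something the taxonomy prescribes.

```python
from collections import Counter
from typing import List, Tuple

SEVERITY = {  # from the violation taxonomy table
    "FORBIDDEN_ACTION": "CRITICAL",
    "CARD_MISMATCH": "CRITICAL",
    "UNBOUNDED_ACTION": "HIGH",
    "MISSED_ESCALATION": "HIGH",
    "CARD_EXPIRED": "HIGH",
    "UNDECLARED_VALUE": "MEDIUM",
}
RANK = {"CRITICAL": 0, "HIGH": 1, "MEDIUM": 2, "LOW": 3}


def prioritize(violations: List[str]) -> List[Tuple[str, str, int]]:
    """Count violations per type, then order by severity, frequency (desc), and name."""
    counts = Counter(violations)
    rows = [(vtype, SEVERITY[vtype], n) for vtype, n in counts.items()]
    return sorted(rows, key=lambda row: (RANK[row[1]], -row[2], row[0]))


observed = ["UNDECLARED_VALUE", "UNBOUNDED_ACTION", "UNDECLARED_VALUE", "FORBIDDEN_ACTION",
            "UNBOUNDED_ACTION", "UNBOUNDED_ACTION", "MISSED_ESCALATION"]
for row in prioritize(observed):
    print(row)
```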

5. Progressive Governance: Mechanisms Mapping

The WEF’s Progressive Governance pillar proposes that governance scales with agent capability. It identifies nine baseline governance mechanisms and two human oversight models (HITL/HOTL).

5.1 Human Oversight Models

| WEF Model | WEF Definition | AAP Field | AAP Behavior |
|---|---|---|---|
| HITL (Human-in-the-loop) | Agent suggests or prepares actions; final decisions require explicit human approval | principal.relationship: "advisory" | Agent recommends and waits; escalation is the default |
| HOTL (Human-on-the-loop) | Agent acts within defined boundaries; humans monitor and retain the ability to intervene | principal.relationship: "delegated_authority" | Agent acts within bounded_actions; escalates on trigger conditions |
| (Implied high-autonomy) | Agent operates independently within a declared value framework | principal.relationship: "autonomous" | Agent operates within its declared values; AIP applies heightened monitoring |

AIP implements progressive monitoring intensity based on this mapping: advisory agents receive standard monitoring, delegated_authority agents receive standard monitoring with escalation enforcement, and autonomous agents receive heightened monitoring with deeper conscience evaluation.
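The mapping in the preceding paragraph is essentially configuration. A hypothetical encoding follows; the profile keys are invented for illustration, and falling back to the strictest profile for an unrecognized relationship is an assumed safety default.

```python
from typing import Dict, List, TypedDict


class MonitoringProfile(TypedDict):
    intensity: str
    extra_checks: List[str]


# Mirrors the relationship-to-monitoring mapping described above; key names are hypothetical.
PROFILES: Dict[str, MonitoringProfile] = {
    "advisory": {"intensity": "standard", "extra_checks": []},
    "delegated_authority": {"intensity": "standard", "extra_checks": ["escalation_enforcement"]},
    "autonomous": {"intensity": "heightened", "extra_checks": ["deep_conscience_evaluation"]},
}


def monitoring_profile(relationship: str) -> MonitoringProfile:
    # Assumption: an unrecognized relationship gets the strictest profile, not the weakest.
    profile = PROFILES.get(relationship, PROFILES["autonomous"])
    # Return a copy so callers cannot mutate the shared table.
    return {"intensity": profile["intensity"], "extra_checks": list(profile["extra_checks"])}


print(monitoring_profile("delegated_authority"))
```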

5.2 Baseline Governance Mechanisms

| WEF Governance Area | WEF Mechanism | AAP/AIP Implementation |
|---|---|---|
| Access control | Enforce least-privilege access; define task boundaries | bounded_actions (permitted), forbidden_actions (denied), max_autonomous_value (financial ceiling) |
| Legal and compliance | Data protection impact assessments; privacy and regulation compliance | values.declared encodes compliance values; extensions namespace for regulatory metadata; audit_commitment enables DPIA evidence |
| Testing and validation | Sandbox runs, controlled pilots, third-party audits | verify_trace against the Alignment Card is the validation engine; AIP input analysis acts as an input filter |
| Monitoring and logging | Logging for all agent actions; anomaly alerts and dashboards | AP-Traces, AIP Integrity Checkpoints, IntegrityDriftAlert, OTel export via aip-otel-exporter |
| Human oversight | Define HITL/HOTL models; set supervisory triggers | principal.relationship, escalation_triggers, principal.escalation_contact |
| Traceability and identity | Assign unique agent identifiers; tag outputs to responsible agent | card_id + agent_id, AP-Trace entries linked to card_id, AIP checkpoints linked to agent_id + session_id |
| Long-term management | Protocols for ongoing monitoring, updates, decommissioning | expires_at (card expiry enforces lifecycle review), CARD_EXPIRED violation triggers re-evaluation |
| Trustworthiness and explainability | Explainability tools; trust metrics | AIP reasoning_summary, AP-Trace alternatives_considered + selection_reasoning, AIP integrity_ratio |
| Manual redundancy | Procedures for human takeover of critical cases | escalation_triggers, principal.escalation_contact, AIP recommended_action: "deny_and_escalate" |

6. Technical Foundations: Protocol Alignment

6.1 Communication Protocols

| WEF Protocol | AAP/AIP Relationship |
|---|---|
| MCP | AAP extensions namespace supports MCP-specific metadata |
| A2A | AAP extends A2A Agent Cards with the alignment block |
| AP2 | AAP’s max_autonomous_value maps to AP2’s auditable transaction limits |
| Agent Cards (A2A) | The AAP Alignment Card is the A2A agent card plus an enforceable alignment posture |

6.2 Cybersecurity

| WEF Security Concern | AAP/AIP Implementation |
|---|---|
| Prompt injection | AIP concern category prompt_injection: dedicated detection in every Integrity Checkpoint |
| Agent misuse | AIP concern categories deceptive_reasoning and undeclared_intent |
| Zero-trust model | AIP FailurePolicy.mode: "fail_closed" blocks the agent on any analysis failure |
| Audit trails for attribution | AP-Traces + Integrity Checkpoints provide a complete forensic record |
| Identity verification | /.well-known/alignment-card.json enables any party to verify agent identity |
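Identity verification via the well-known endpoint can be sketched as building the discovery URL and checking the fetched document. The network fetch is left out so the example runs offline (urllib.request.urlopen on card_url(host) would be one option). The required-field list is a hypothetical subset, and treating a differing agent_id as CARD_MISMATCH is a simplification of what the protocol checks.

```python
import json
from datetime import datetime, timezone
from typing import List, Optional
from urllib.parse import urlunsplit

WELL_KNOWN_PATH = "/.well-known/alignment-card.json"
# Assumption: a small subset of fields; the AAP JSON schema is the authority on what is required.
REQUIRED_FIELDS = ("card_id", "agent_id", "principal", "autonomy_envelope", "expires_at")


def card_url(host: str) -> str:
    """Discovery URL for an agent's Alignment Card."""
    return urlunsplit(("https", host, WELL_KNOWN_PATH, "", ""))


def check_card(raw_json: str, expected_agent_id: str, now: Optional[datetime] = None) -> List[str]:
    """Return a list of problems with a fetched card; an empty list means it passed these checks."""
    card = json.loads(raw_json)
    problems = [f"missing field: {name}" for name in REQUIRED_FIELDS if name not in card]
    if card.get("agent_id") != expected_agent_id:
        problems.append("CARD_MISMATCH")  # simplified: the served card names a different agent
    expires_at = card.get("expires_at")
    if expires_at:
        now = now or datetime.now(timezone.utc)
        # fromisoformat() before Python 3.11 rejects a trailing "Z", so normalize it to +00:00.
        if datetime.fromisoformat(expires_at.replace("Z", "+00:00")) <= now:
            problems.append("CARD_EXPIRED")
    return problems


print(card_url("agent.example.com"))
# https://agent.example.com/.well-known/alignment-card.json
```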

7. Multi-Agent Ecosystem Risks

The WEF identifies five emerging failure modes in multi-agent ecosystems. AAP/AIP addresses all five:

| WEF Risk | AAP/AIP Solution |
|---|---|
| Orchestration drift | Value Coherence Handshake: before coordination, agents exchange Alignment Cards and compute a compatibility score. If coherence.compatible is false, coordination pauses. |
| Semantic misalignment | Braid grounding protocol: agents detect semantic divergence via SSM analysis. values.conflicts_with pre-declares known semantic conflicts. |
| Security and trust gaps | Well-known endpoint discovery (zero-trust), AIP prompt_injection concern category, AIP fail-closed mode. |
| Interconnectedness and cascading effects | AIP IntegrityDriftAlert with drift_direction typing enables early detection. CARD_MISMATCH immediately flags identity inconsistencies. |
| Systemic complexity | AP-Traces with linked_trace_id enable cross-agent forensics. AIP provides per-agent integrity windows that can be aggregated into system-level health. |
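The handshake’s compatibility score could be computed in many ways. This sketch assumes a simple rule: any value one agent declares that the other lists in conflicts_with makes the pair incompatible, and otherwise the score is the Jaccard overlap of declared values compared against a hypothetical min_score of 0.5. Neither the formula nor the threshold is taken from the AAP specification.

```python
from typing import Dict


def value_coherence(card_a: dict, card_b: dict, min_score: float = 0.5) -> Dict[str, object]:
    """Hypothetical compatibility check between two Alignment Cards' value blocks."""
    a_declared = set(card_a["values"]["declared"])
    b_declared = set(card_b["values"]["declared"])
    a_rejects = set(card_a["values"].get("conflicts_with", []))
    b_rejects = set(card_b["values"].get("conflicts_with", []))
    # A hard conflict: one agent declares a value the other explicitly rejects.
    conflicts = sorted((a_declared & b_rejects) | (b_declared & a_rejects))
    union = a_declared | b_declared
    score = len(a_declared & b_declared) / len(union) if union else 1.0  # Jaccard overlap
    return {"score": round(score, 3), "conflicts": conflicts,
            "compatible": not conflicts and score >= min_score}


planner = {"values": {"declared": ["transparency", "privacy", "accuracy"],
                      "conflicts_with": ["engagement_maximization"]}}
researcher = {"values": {"declared": ["privacy", "accuracy", "speed"]}}
marketer = {"values": {"declared": ["engagement_maximization", "accuracy"]}}

print(value_coherence(planner, researcher))  # no conflicts, score 0.5, compatible
print(value_coherence(planner, marketer))    # conflict on engagement_maximization
```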

7.1 Governor Agents and the AIP Daimonion

The WEF envisions “dedicated ‘governor’ or ‘auditor’ agents” that monitor, audit, or regulate other agents. AIP’s daimonion (integrity analysis service) implements this concept at the protocol level:

| WEF Governor Agent Function | AIP Daimonion Implementation |
|---|---|
| Monitor agent actions | Integrity Checkpoints analyze every thinking block between turns |
| Validate transactions | Verdicts (clear, review_needed, boundary_violation) per checkpoint |
| Detect anomalies | Six concern categories: prompt_injection, value_misalignment, autonomy_violation, reasoning_corruption, deceptive_reasoning, undeclared_intent |
| Correct unsafe behavior | recommended_action: "deny_and_escalate" blocks the agent on critical concerns |
| Scalable oversight | Protocol-level service, not a separate agent; scales with infrastructure |
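How verdicts and concerns turn into a recommended_action is not spelled out above, so the following is a hypothetical mapping. The verdict names, concern categories, and actions come from this document; the per-concern severity labels and the decision rules are assumptions.

```python
from typing import List, Tuple

VERDICTS = {"clear", "review_needed", "boundary_violation"}
CONCERN_CATEGORIES = {"prompt_injection", "value_misalignment", "autonomy_violation",
                      "reasoning_corruption", "deceptive_reasoning", "undeclared_intent"}


def recommend(verdict: str, concerns: List[Tuple[str, str]]) -> str:
    """Map a checkpoint verdict plus (category, severity) concerns to a recommended_action."""
    if verdict not in VERDICTS:
        raise ValueError(f"unknown verdict: {verdict}")
    for category, _severity in concerns:
        if category not in CONCERN_CATEGORIES:
            raise ValueError(f"unknown concern category: {category}")
    severities = {severity for _category, severity in concerns}
    if verdict == "boundary_violation" or "critical" in severities:
        return "deny_and_escalate"  # block the agent and hand off to a human
    if verdict == "review_needed":
        # Assumption: only low-severity concerns allow the agent to continue with a log entry.
        return "log_and_continue" if severities <= {"low"} else "pause_for_review"
    return "continue"


print(recommend("review_needed", [("undeclared_intent", "medium")]))  # pause_for_review
```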

8. Summary Mapping Tables

8.1 Classification Dimensions

| WEF Dimension | WEF Agent Card | AAP Alignment Card | Extension |
|---|---|---|---|
| Function | Natural language description | bounded_actions + forbidden_actions | Machine-parseable, verifiable, includes anti-function |
| Role | Specialist-Generalist scale | principal.relationship + action scope | Prescriptive: affects runtime behavior |
| Predictability | Deterministic-Non-deterministic scale | AP-Traces + AIP Checkpoints + drift detection | Observable unpredictability with typed drift directions |
| Autonomy | Low-High scale | Autonomy envelope (actions, triggers, limits) | Decomposed, auditable, enforceable |
| Authority | Low-High scale | Delegation chain + autonomy envelope + expiry | Verifiable delegation chains |
| Use Case | Free-text application domain | values (declared, definitions, hierarchy, conflicts) + extensions | Evaluable values with consistency verification |
| Environment | Simple-Complex scale | Well-known endpoints + Value Coherence + fail-closed | Zero-trust discoverable, multi-agent compatible |

8.2 Pillars and Governance

| WEF Pillar | WEF Recommendation | AAP/AIP Implementation |
|---|---|---|
| Classification | Agent card with 7 dimensions | Alignment Card: JSON schema, well-known endpoint, versioned, expirable |
| Evaluation | Contextualized, multidimensional, temporal, collaborative | AP-Trace verification + AIP integrity checks + drift detection + OTel export |
| Risk Assessment | 5-step lifecycle | Typed violations with severity + concern categories + drift alerts + graduated response |
| Progressive Governance | 9 baseline mechanisms + HITL/HOTL + proportional scaling | Autonomy envelope + principal.relationship + AIP monitoring intensity + fail-open/closed |

References

  1. World Economic Forum & Capgemini. AI Agents in Action: Foundations for Evaluation and Governance. November 2025.
  2. AAP Specification
  3. AIP Specification
  4. Mitchell, M., Wu, S., Zaldivar, A., et al. Model Cards for Model Reporting. FAT* ’19, 2019.