Overview
The Mnemom Trust Rating™ is a composite trust metric for AI agents — the equivalent of a credit score, but for autonomous software. It answers a question no other system answers: Based on independently verified behavioral evidence, how trustworthy is this agent?
Unlike self-reported trust claims or capability benchmarks, the Mnemom Trust Rating is:
- Independently verified — Scores are computed from AIP integrity checkpoints, not self-assessments
- Continuous — Updated hourly from live behavioral data, not point-in-time audits
- Transparent — Every component, weight, and data source is published and inspectable
- Cryptographically provable — The underlying checkpoints are signed and Merkle-attested
- Multi-dimensional — Five weighted components capture different aspects of trustworthiness
Mnemom Trust Ratings power trust decisions across the AI agent ecosystem: pre-interaction trust checks, fleet governance policies, compliance reporting, marketplace listings, and inter-agent delegation via A2A.
The Mnemom Trust Rating requires a minimum of 50 analyzed integrity checkpoints before a public score is published. This minimum prevents gaming through selective checkpoint submission and ensures statistical significance.
Checkpoints where the thinking block contains fewer than 100 tokens receive a synthetic clear verdict and are excluded from the analyzed checkpoint count. This means your total checkpoint count may differ from your analyzed checkpoint count — only substantive thinking analysis counts toward the 50-checkpoint eligibility minimum and the Integrity Ratio calculation.
Score Range and Grades
Scores range from 0 to 1000 and map to letter grades inspired by bond credit ratings. Higher scores indicate stronger demonstrated trustworthiness.
| Grade | Score Range | Tier | Meaning |
|---|---|---|---|
| AAA | 900–1000 | Exemplary | Consistently demonstrates exceptional alignment. Highest tier of independently verified trust. |
| AA | 800–899 | Established | Strong track record with minimal violations. Trusted for high-stakes autonomous operations. |
| A | 700–799 | Reliable | Solid behavioral record with occasional minor concerns. Suitable for standard autonomous tasks. |
| BBB | 600–699 | Developing | Building a track record. Some violations or drift events, but trending positively. |
| BB | 500–599 | Emerging | Limited or mixed track record. More data needed for confident assessment. |
| B | 400–499 | Concerning | Elevated violation rate or significant drift. Active remediation recommended. |
| CCC | 200–399 | Critical | Serious integrity concerns. Human oversight strongly recommended for all operations. |
| NR | — | Not Rated | Fewer than 50 analyzed checkpoints. Score is being built. |
The grade system is deliberately conservative. An “AAA” rating is rare and requires sustained exemplary behavior across all five components. Most well-maintained agents stabilize in the A to AA range.
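Applied in code, the grade table reduces to a threshold lookup. The sketch below is illustrative only (the SDK ships `scoreToGrade` as the authoritative helper); thresholds come from the table above, and the handling of scores below 200, which the table leaves unspecified, is an assumption:

```python
def score_to_grade(score: int, analyzed_checkpoints: int) -> tuple[str, str]:
    """Map a 0-1000 score to its letter grade and tier, per the published table."""
    if analyzed_checkpoints < 50:
        return ("NR", "Not Rated")  # hard gate: score is not published
    bands = [
        (900, "AAA", "Exemplary"),
        (800, "AA", "Established"),
        (700, "A", "Reliable"),
        (600, "BBB", "Developing"),
        (500, "BB", "Emerging"),
        (400, "B", "Concerning"),
        (200, "CCC", "Critical"),
    ]
    for floor, grade, tier in bands:
        if score >= floor:
            return (grade, tier)
    # Assumption: scores below 200 stay in the lowest published tier.
    return ("CCC", "Critical")

print(score_to_grade(782, 347))  # ('A', 'Reliable')
print(score_to_grade(950, 30))   # ('NR', 'Not Rated') -- below the 50-checkpoint gate
```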
Score Components
The composite score is a weighted sum of five normalized components, each scored 0–1000:

S = w_1 * integrity_ratio + w_2 * compliance + w_3 * drift_stability
  + w_4 * trace_completeness + w_5 * coherence_compatibility
Integrity Ratio (40%)
The dominant component. Measures the proportion of integrity checkpoints that returned a clear verdict.
integrity_ratio = (clear_count / analyzed_count) * 1000
Only protocol-verified AIP checkpoints are counted. Unanalyzed checkpoints (those below the 100-token minimum evidence threshold) are excluded from both numerator and denominator.
- Data source: AIP checkpoints
- Update frequency: Every new checkpoint
- Improvement lever: Fix alignment card violations, resolve review_needed checkpoints
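A minimal sketch of the ratio, including the 100-token evidence threshold from the eligibility rules. The checkpoint field names (`verdict`, `thinking_tokens`) are illustrative, not the actual API schema:

```python
def integrity_ratio(checkpoints: list[dict]) -> float:
    """Proportion of analyzed checkpoints with a clear verdict, scaled to 0-1000.

    Checkpoints whose thinking block is under 100 tokens are excluded
    entirely: they do not count as analyzed in numerator or denominator.
    """
    analyzed = [c for c in checkpoints if c["thinking_tokens"] >= 100]
    if not analyzed:
        return 0.0
    clear = sum(1 for c in analyzed if c["verdict"] == "clear")
    return (clear / len(analyzed)) * 1000

checkpoints = [
    {"verdict": "clear", "thinking_tokens": 450},
    {"verdict": "boundary_violation", "thinking_tokens": 300},
    {"verdict": "clear", "thinking_tokens": 40},   # below threshold: excluded
    {"verdict": "clear", "thinking_tokens": 220},
]
print(integrity_ratio(checkpoints))  # 2 clear of 3 analyzed -> 666.66...
```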
Compliance (20%)
Measures boundary adherence using a session-capped power curve. Violations are grouped by session, and only the worst violation per session counts; each session's impact is then decayed with a 1-week half-life. This prevents a single bad session (e.g., a misconfigured alignment card producing multiple false-positive violations) from disproportionately dragging down the score.
score = 1000 / (1 + Σ max_impact_per_session)^1.5
The power curve penalizes moderate violation counts meaningfully without collapsing the score to zero, preserving signal differentiation across the full range.
After reclassification, violations classified as card_gap are excluded from the compliance calculation. Only behavior_gap violations (genuine behavioral errors) count. This enables trust recovery when violations were caused by configuration drift rather than agent misbehavior.
- Data source: AIP checkpoints with boundary_violation verdict, grouped by session_id
- Update frequency: Hourly
- Improvement lever: Time without new violations; resolve root causes to prevent recurrence; reclassify card-gap violations
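Under the stated rules (worst violation per session, 1-week half-life, exponent 1.5, card_gap exclusion), the compliance score can be sketched as follows. Field names and the impact scale are illustrative assumptions:

```python
from collections import defaultdict

def compliance_score(violations: list[dict], now_days: float) -> float:
    """Session-capped power curve with a 1-week half-life on violation impact.

    Only the worst (max-impact) violation per session counts; each impact
    decays by half every 7 days, and card_gap violations are excluded.
    """
    worst_per_session: dict[str, float] = defaultdict(float)
    for v in violations:
        if v.get("classification") == "card_gap":
            continue  # reclassified card gaps do not count
        age_days = now_days - v["day"]
        decayed = v["impact"] * 0.5 ** (age_days / 7.0)
        key = v["session_id"]
        worst_per_session[key] = max(worst_per_session[key], decayed)
    total = sum(worst_per_session.values())
    return 1000.0 / (1.0 + total) ** 1.5

violations = [
    {"session_id": "s1", "impact": 0.8, "day": 0, "classification": "behavior_gap"},
    {"session_id": "s1", "impact": 0.3, "day": 0, "classification": "behavior_gap"},  # capped by the 0.8
    {"session_id": "s2", "impact": 0.5, "day": 7, "classification": "card_gap"},      # excluded
]
print(round(compliance_score(violations, now_days=7.0)))  # 604
```

Note how the session cap means the second s1 violation adds nothing: only the session's worst decayed impact (0.4 after one half-life) enters the sum.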
Drift Stability (20%)
Measures the proportion of sessions where no sustained drift alert was triggered.
drift_stability = (stable_sessions / total_sessions) * 1000
A session is “stable” if the agent maintained consistent behavioral similarity to its baseline throughout the session. Sessions where 3+ consecutive checkpoints fell below the drift threshold are counted as unstable.
- Data source: AIP drift detection
- Update frequency: End of each session
- Improvement lever: Consistent values application, avoid aggressive behavioral pattern changes
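The session-stability rule (3+ consecutive below-threshold checkpoints mark a session unstable) can be sketched as below. The 0.7 drift threshold is an illustrative assumption, not the published value:

```python
def is_stable_session(similarities: list[float], threshold: float = 0.7) -> bool:
    """A session is unstable if 3+ consecutive checkpoints fall below threshold."""
    streak = 0
    for s in similarities:
        streak = streak + 1 if s < threshold else 0
        if streak >= 3:
            return False
    return True

def drift_stability(sessions: list[list[float]]) -> float:
    """Proportion of stable sessions, scaled to 0-1000."""
    if not sessions:
        return 0.0
    stable = sum(1 for s in sessions if is_stable_session(s))
    return (stable / len(sessions)) * 1000

sessions = [
    [0.9, 0.85, 0.88],          # stable throughout
    [0.9, 0.6, 0.6, 0.6, 0.9],  # 3 consecutive dips -> unstable
]
print(drift_stability(sessions))  # 500.0
```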
Trace Completeness (10%)
Measures audit trail quality: the ratio of decisions logged as AAP traces to the number of expected decisions.
trace_completeness = (logged_decisions / expected_decisions) * 1000
Agents that log every decision create a complete audit trail. Gaps in trace coverage reduce this component.
- Data source: AAP traces
- Update frequency: Hourly
- Improvement lever: Ensure all agent decisions are logged via AAP
Coherence Compatibility (10%)
The mean coherence score across fleet interactions, normalized to the 0–1000 scale.
coherence_compatibility = mean_coherence_score * 1000
Defaults to 750 if the agent has no fleet interaction data. This default ensures new agents are not penalized for operating solo.
- Data source: Fleet coherence analysis
- Update frequency: After each coherence check
- Improvement lever: Align values with fleet peers, resolve value conflicts
Confidence Levels
The number of analyzed checkpoints determines the confidence level displayed alongside the score:
| Confidence | Checkpoint Count | Display |
|---|---|---|
| Insufficient | < 50 | Score not published (NR grade) |
| Low | ≥ 50 and < 200 | "Low Confidence" indicator |
| Medium | ≥ 200 and < 1,000 | "Medium Confidence" indicator |
| High | ≥ 1,000 | "High Confidence" indicator |
Confidence communicates how much behavioral evidence supports the score. A score of 800 with “Low Confidence” (60 checkpoints) carries less weight than a score of 780 with “High Confidence” (2,000 checkpoints).
The 50-checkpoint minimum is a hard gate for score publication. Agents below this threshold display an “NR” (Not Rated) badge with a progress indicator showing how many checkpoints remain until eligibility.
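The tiers above reduce to a simple threshold function. This sketch mirrors what the SDK's `confidenceLevel` helper returns conceptually; the string values are illustrative:

```python
def confidence_level(analyzed_checkpoints: int) -> str:
    """Map analyzed checkpoint count to the published confidence tiers."""
    if analyzed_checkpoints < 50:
        return "insufficient"  # NR: score not published
    if analyzed_checkpoints < 200:
        return "low"
    if analyzed_checkpoints < 1000:
        return "medium"
    return "high"

print(confidence_level(347))  # 'medium' -- matches the API example's 347 checkpoints
```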
Score Computation
Frequency
- Hourly recomputation: Scores are recalculated every hour from the latest behavioral data
- Weekly snapshots: A frozen snapshot is saved each week for historical trend tracking. Snapshots are created every Monday at 00:00 UTC.
- On-demand: The API always returns the most recently computed score
Anti-Gaming Measures
The scoring methodology includes several protections against manipulation:
- Minimum checkpoint count — 50 analyzed checkpoints required before score publication prevents agents from cherry-picking favorable interactions
- Synthetic checkpoint detection — Checkpoints with abnormally uniform patterns or machine-generated timing signatures are flagged and excluded
- Rate limiting — Checkpoint submission rates are monitored per agent; anomalous bursts are throttled
- Independent verification — All checkpoints are analyzed by a separate LLM (the daimonion), not the agent itself
Trend Tracking
Every ReputationScore includes a trend_30d field — a signed delta comparing the current score to 30 days ago. This enables at-a-glance trajectory assessment:
- Positive trend (+): Score is improving
- Negative trend (-): Score is declining
- Flat trend (0): Score is stable
SDK Usage
TypeScript
```typescript
import { fetchReputation, fetchReputationHistory } from '@mnemom/reputation';
import { scoreToGrade, confidenceLevel, confidenceLabel } from '@mnemom/reputation';

// Get current reputation for any agent
const reputation = await fetchReputation('agent-xyz');
if (reputation) {
  console.log(`Score: ${reputation.score}`);
  console.log(`Grade: ${reputation.grade} (${reputation.tier})`);
  console.log(`Confidence: ${confidenceLabel(reputation.confidence)}`);
  console.log(`30-day trend: ${reputation.trend_30d > 0 ? '+' : ''}${reputation.trend_30d}`);

  // Inspect components
  for (const component of reputation.components) {
    console.log(`  ${component.label}: ${component.score}/1000 (weight: ${component.weight})`);
  }
}

// Get weekly history for trend analysis
const history = await fetchReputationHistory('agent-xyz');
for (const snapshot of history) {
  console.log(`${snapshot.week_start}: ${snapshot.score} (${snapshot.grade})`);
}
```
Python
```python
import httpx

API_BASE = "https://api.mnemom.ai"

# Get current reputation for any agent (public endpoint)
response = httpx.get(f"{API_BASE}/v1/reputation/agent-xyz")
reputation = response.json()

print(f"Score: {reputation['score']}")
print(f"Grade: {reputation['grade']} ({reputation['tier']})")
print(f"Confidence: {reputation['confidence']}")
print(f"30-day trend: {reputation['trend_30d']:+d}")

for component in reputation["components"]:
    print(f"  {component['label']}: {component['score']}/1000 (weight: {component['weight']})")

# Get weekly history
history_response = httpx.get(f"{API_BASE}/v1/reputation/agent-xyz/history")
for snapshot in history_response.json()["snapshots"]:
    print(f"{snapshot['week_start']}: {snapshot['score']} ({snapshot['grade']})")
```
API Reference
The primary endpoint for fetching reputation data:
GET /v1/reputation/{agent_id}
No authentication required. Returns the full score with all components.
Response:
```json
{
  "agent_id": "agent-xyz",
  "score": 782,
  "grade": "A",
  "tier": "Reliable",
  "is_eligible": true,
  "checkpoint_count": 347,
  "confidence": "medium",
  "components": [
    {
      "key": "integrity_ratio",
      "label": "Integrity Ratio",
      "score": 920,
      "weight": 0.40,
      "weighted_score": 368,
      "factors": ["97.2% clear verdict rate across 347 checkpoints"]
    },
    {
      "key": "compliance",
      "label": "Compliance",
      "score": 850,
      "weight": 0.20,
      "weighted_score": 170,
      "factors": ["3 violations across 2 sessions (session-capped), effective impact: 0.52"]
    },
    {
      "key": "drift_stability",
      "label": "Drift Stability",
      "score": 700,
      "weight": 0.20,
      "weighted_score": 140,
      "factors": ["2 drift events across 28 sessions"]
    },
    {
      "key": "trace_completeness",
      "label": "Trace Completeness",
      "score": 650,
      "weight": 0.10,
      "weighted_score": 65,
      "factors": ["65% of expected decisions logged"]
    },
    {
      "key": "coherence_compatibility",
      "label": "Coherence Compatibility",
      "score": 390,
      "weight": 0.10,
      "weighted_score": 39,
      "factors": ["Mean coherence 0.39 across 3 fleet interactions"]
    }
  ],
  "computed_at": "2026-02-21T14:00:00.000Z",
  "trend_30d": 12,
  "visibility": "public"
}
```
For the complete API reference including batch, search, compare, and benchmark endpoints, see the Reputation API Overview.
Use Cases
Pre-Interaction Trust Checks
Before delegating a task to another agent via A2A, check their reputation:
```typescript
const reputation = await fetchReputation(theirAgentId);

if (!reputation || reputation.grade === 'CCC' || reputation.grade === 'NR') {
  return escalateToPrincipal({
    reason: `Agent ${theirAgentId} has insufficient trust rating (${reputation?.grade ?? 'NR'})`,
  });
}

// Proceed with delegation
return executeDelegation(theirCard, task);
```
Fleet Governance
Set minimum reputation requirements for agents in your organization:
```python
# Enforce minimum A grade for production agents
for agent in org_agents:
    rep = httpx.get(f"{API_BASE}/v1/reputation/{agent['id']}").json()
    if rep["score"] < 700:
        print(f"WARNING: {agent['id']} below production threshold ({rep['score']})")
        pause_agent(agent["id"])
```
Compliance Reporting
Export weekly snapshots for audit trails:
```typescript
const history = await fetchReputationHistory(agentId);

// Generate compliance report
const report = history.map(snapshot => ({
  week: snapshot.week_start,
  score: snapshot.score,
  grade: snapshot.grade,
  checkpoints: snapshot.checkpoint_count,
  components: snapshot.components,
}));
```
Marketplace Listings
Display reputation badges on agent directories, npm packages, and documentation sites. See Embeddable Badges for embed code.
Cryptographic Verification
Every Mnemom Trust Rating is backed by a cryptographic proof chain that enables independent verification without trusting Mnemom’s infrastructure. The verification endpoint at GET /v1/reputation/{agent_id}/verify returns the full proof chain.
Proof Chain Structure
The proof chain consists of three layers that independently attest to score integrity:
```
┌────────────────────────────────────────────┐
│ Certificate Hash                           │
│ SHA-256 of the latest IntegrityCertificate │
│ covering this agent's checkpoints          │
├────────────────────────────────────────────┤
│ Merkle Root                                │
│ Root hash of the Merkle tree over all      │
│ analyzed checkpoints                       │
├────────────────────────────────────────────┤
│ Hash Chain Validation                      │
│ Consecutive checkpoint hashes linked       │
│ in a tamper-evident chain                  │
└────────────────────────────────────────────┘
```
Certificate hash — The SHA-256 hash of the IntegrityCertificate that covers the agent’s checkpoint history. This certificate is Ed25519-signed by the Mnemom attestation key and can be independently verified using the public key from the key registry.
Merkle root — The root of a Merkle tree constructed over all analyzed checkpoints. Any individual checkpoint can be verified for inclusion using a Merkle inclusion proof without revealing other checkpoints in the tree.
Hash chain validation — Each checkpoint includes a hash of the previous checkpoint, forming a tamper-evident chain. If any checkpoint is modified or removed, the chain breaks. The hash_chain_valid field confirms the entire chain is intact.
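A minimal sketch of hash-chain validation, assuming each checkpoint record carries a canonical SHA-256 hash of its payload and the hash of its predecessor. The field names (`payload`, `prev_hash`) and the canonicalization scheme are illustrative assumptions, not the actual wire format:

```python
import hashlib
import json

def checkpoint_hash(payload: dict) -> str:
    """SHA-256 over a canonical JSON serialization of the checkpoint payload."""
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode()).hexdigest()

def hash_chain_valid(checkpoints: list[dict]) -> bool:
    """Each checkpoint must reference the hash of its predecessor."""
    prev_hash = None
    for cp in checkpoints:
        if cp["prev_hash"] != prev_hash:
            return False
        prev_hash = checkpoint_hash(cp["payload"])
    return True

# Build a small valid chain, then tamper with it
chain = []
prev = None
for verdict in ["clear", "clear", "boundary_violation"]:
    cp = {"payload": {"verdict": verdict}, "prev_hash": prev}
    prev = checkpoint_hash(cp["payload"])
    chain.append(cp)

print(hash_chain_valid(chain))  # True
chain[1]["payload"]["verdict"] = "tampered"
print(hash_chain_valid(chain))  # False -- modifying any checkpoint breaks the chain
```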
Verifying a Score
```shell
# 1. Fetch the verification proof
curl https://api.mnemom.ai/v1/reputation/agent-xyz/verify

# 2. Cross-reference the certificate hash
curl https://api.mnemom.ai/v1/checkpoints/{latest_checkpoint_id}/certificate

# 3. Verify the Merkle root against the agent's checkpoint tree
curl https://api.mnemom.ai/v1/agents/agent-xyz/merkle-root
```
The verification endpoint is public and requires no authentication. Third parties (auditors, compliance officers, delegating agents) can independently confirm that a reputation score is genuine without any privileged access.
Verification confirms that the score was computed from authentic, tamper-evident data. It does not guarantee the scoring algorithm itself is correct — for that, see the published scoring methodology.
A2A Trust Extension
The reputation API includes a pre-built trust block for inter-agent reputation sharing via A2A. When you fetch an agent’s reputation, the response includes an a2a_trust_extension field that can be directly embedded in an A2A Agent Card.
How It Works
```
Agent A ──[fetches reputation]──→ Mnemom API
                                      │
                                      ▼
                            a2a_trust_extension
                                      │
Agent A ──[embeds trust block]──→ A2A Agent Card
                                      │
Agent B ──[reads trust block]──→ Delegation decision
                                      │
Agent B ──[verifies via verify_url]──→ Independent confirmation
```
When Agent A fetches its own reputation from GET /v1/reputation/{agent_id}, the response includes:
```json
{
  "a2a_trust_extension": {
    "provider": "mnemom",
    "score": 782,
    "grade": "A",
    "verified_url": "https://api.mnemom.ai/v1/reputation/agent-xyz",
    "badge_url": "https://api.mnemom.ai/v1/reputation/agent-xyz/badge.svg",
    "verify_url": "https://api.mnemom.ai/v1/reputation/agent-xyz/verify"
  }
}
```
Agent A embeds this as the trust block in its A2A Agent Card. When Agent B discovers Agent A through A2A, it can:
- Read the static score and grade for a quick trust check
- Fetch verified_url for the latest real-time score
- Fetch verify_url for cryptographic proof that the score is genuine
SDK Helpers
Both SDKs provide helpers to fetch and embed the trust extension:
TypeScript:
```typescript
import { getA2AReputationExtension } from '@mnemom/reputation';

const trustBlock = await getA2AReputationExtension('my-agent-id');
agentCard.trust = trustBlock;
```
Python:
```python
from mnemom_reputation import get_a2a_reputation_extension

trust_block = get_a2a_reputation_extension(agent_id="my-agent-id")
agent_card["trust"] = trust_block
```
The trust block is a snapshot. Agent B SHOULD verify via verified_url or verify_url before making high-stakes delegation decisions. The badge_url always returns the current score for display purposes.
For the full A2A integration guide including value coherence handshakes and reputation gates, see A2A Integration.
Public Trust Surfaces
Trust Ratings are not just internal metrics — they power public-facing trust signals across the ecosystem.
Public Reputation Pages
Every agent with a published score gets a public page at:
https://mnemom.ai/agents/{agent_id}/reputation
The page includes full component breakdown, trend chart, grade badge, and a cryptographic verification link. No authentication required — anyone can inspect an agent’s trust history.
Trust Directory
A searchable directory of all publicly rated agents at:
https://mnemom.ai/trust-directory
Filter by grade, confidence level, and trend direction. Sort by score or checkpoint count. The directory enables discovery of trusted agents across the ecosystem.
Embeddable Badges
Dynamic SVG badges that display an agent’s current Trust Rating anywhere — GitHub READMEs, websites, documentation, A2A Agent Cards, and package registries. Four variants available: score, score_tier, score_trend, and compact. See Embeddable Badges for embed code.
GitHub Action
The mnemom/reputation-check@v1 GitHub Action gates CI/CD pipelines on minimum reputation scores. Fail builds when agent scores drop below a configurable threshold — integrating trust checks directly into deployment workflows.
A2A Trust Extension
Pre-built trust blocks for inter-agent reputation sharing via Google’s A2A protocol. Embed live score, grade, and verification URLs directly in your agent’s A2A Agent Card. See A2A Integration.
How Scores Differ from Alternatives
| Dimension | Mnemom Reputation | Self-Reported Trust | Capability Benchmarks |
|---|
| Evidence source | Independently verified behavioral data | Agent’s own claims | Synthetic test suites |
| Update frequency | Continuous (hourly) | Manual updates | Periodic re-evaluation |
| Verifiability | Cryptographically provable via Merkle proofs | Unverifiable | Reproducible but narrow |
| Scope | Alignment, drift, completeness, coherence | Whatever the agent declares | Task-specific accuracy |
| Gaming resistance | Minimum thresholds, synthetic detection, independent analysis | Trivially gameable | Benchmark contamination |
| Trend visibility | 30-day delta, weekly snapshots | None | Version-to-version only |
Team Reputation
Teams have their own parallel reputation scoring system — the Team Trust Rating. While individual Trust Ratings measure a single agent’s trustworthiness, the Team Trust Rating evaluates whether a group of agents operates reliably together.
Key differences from individual scoring:
- Different components: 5-component model optimized for team dynamics (coherence history, member quality, operational record, structural stability, assessment density)
- Same grade scale: Teams use the same AAA–NR grades and 0–1000 score range, enabling direct comparison
- Lower eligibility bar: 10 team risk assessments (vs. 50 integrity checkpoints for individuals)
- One-way dependency: The team’s Member Quality component reads individual Trust Ratings (read-only) — it never modifies individual scores
Trust Recovery
When violations are caused by card gaps (configuration errors) rather than genuine agent misbehavior, scores can be recovered through reclassification. The workflow:
1. Identify violations caused by missing card capabilities (e.g., the agent used a tool correctly but the card didn't declare it)
2. Submit a reclassification request marking the violation as card_gap
3. Amend the alignment card to include the missing capability
4. Trigger score recomputation: card_gap violations are excluded from the Compliance and Drift Stability components
5. Score recovers on the next hourly recomputation cycle
Trust graph propagation ensures related agents (team members, fleet peers) also benefit from the correction, with a 0.85 decay factor per hop up to 3 hops deep.
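The propagation rule (0.85 decay per hop, up to 3 hops) can be sketched as a breadth-first walk over the trust graph. The adjacency-list graph shape and function names here are illustrative assumptions:

```python
from collections import deque

def propagate_correction(graph: dict[str, list[str]], origin: str,
                         delta: float, decay: float = 0.85,
                         max_hops: int = 3) -> dict[str, float]:
    """Apply a score correction to related agents, decayed 0.85x per hop."""
    adjustments = {origin: delta}
    queue = deque([(origin, 0)])
    seen = {origin}
    while queue:
        agent, hops = queue.popleft()
        if hops == max_hops:
            continue  # stop expanding beyond 3 hops
        for peer in graph.get(agent, []):
            if peer not in seen:
                seen.add(peer)
                adjustments[peer] = delta * decay ** (hops + 1)
                queue.append((peer, hops + 1))
    return adjustments

graph = {"a": ["b"], "b": ["c"], "c": ["d"], "d": ["e"]}
print(propagate_correction(graph, "a", delta=20.0))
# a: 20.0, b: 17.0, c: ~14.45, d: ~12.28; e is 4 hops away, unaffected
```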
See the Trust Recovery Guide for step-by-step instructions and the Reclassification API for endpoint details.
On-Chain Verification
Reputation scores can be anchored on-chain via the MnemoReputationRegistry smart contract on Base L2. On-chain anchoring provides:
- Immutability — Published scores cannot be altered after anchoring
- Independent verification — Anyone can query the contract directly without trusting Mnemom infrastructure
- Tamper evidence — Merkle roots anchored on-chain prove the integrity of the full checkpoint tree
See On-Chain Verification for architecture details and the On-Chain API for publishing endpoints.
See Also