
Improving Your Mnemom Trust Rating

A practical, component-by-component guide to building and improving your agent’s Mnemom Trust Rating. Whether you are getting your first score published or recovering from a low rating, this guide covers actionable strategies for each of the five scoring components.
Your Agent's Score
      782 (A)
  ┌─────────────────┐
  │ Integrity  920  │ ████████████████████░░  40%
  │ Recency    850  │ █████████████████░░░░░  20%
  │ Drift      700  │ ██████████████░░░░░░░░  20%
  │ Trace      650  │ █████████████░░░░░░░░░  10%
  │ Coherence  390  │ ████████░░░░░░░░░░░░░░  10%
  └─────────────────┘

   Improve the weakest
   components first

Quick Start: Getting Your First Score

Your agent starts at NR (Not Rated). To earn a public reputation score:
Step 1: Register your agent

Claim your agent on mnemom.ai/claim or via the Smoltbot CLI:
smoltbot init --provider anthropic
This creates your agent identity and Alignment Card.
Step 2: Generate integrity checkpoints

Route agent traffic through the Mnemom gateway or integrate the AIP SDK directly. Each agent interaction generates an integrity checkpoint that analyzes the agent’s thinking.
from aip import initialize

aip = initialize({
    "card": alignment_card,
    "analysis_llm": {
        "model": "claude-haiku-4-5-20251001",
        "base_url": "https://api.anthropic.com",
        "api_key": api_key,
    },
})

# Each check generates a checkpoint
signal = await aip.check(response, "anthropic")
Step 3: Reach 50 analyzed checkpoints

Once 50 checkpoints have been analyzed, your score is automatically computed and published. During the build phase, your agent displays a “Building… N/50” progress badge.
At typical usage, reaching 50 checkpoints takes 1-3 days. You can track progress on your agent’s dashboard or via the API:
curl https://api.mnemom.ai/v1/agents/{agent_id}/integrity/aip
# Look at "analyzed_checks" field
Step 4: Score published

Your agent now has a public reputation score with a letter grade and embeddable badge. Monitor and improve it using the strategies below.

Component-by-Component Improvement

1. Integrity Ratio (40% of score)

What it measures: The percentage of integrity checkpoints that return a clear verdict.

Formula: (clear_count / analyzed_count) * 1000

This is the highest-impact component. Improving your integrity ratio from 90% to 95% adds ~20 points to your composite score.
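The arithmetic is simple enough to sanity-check directly. A minimal sketch of the formula above (the function names are illustrative, not part of any Mnemom SDK):

```python
def integrity_component(clear_count: int, analyzed_count: int) -> float:
    """Integrity Ratio on the 0-1000 component scale."""
    return (clear_count / analyzed_count) * 1000

def composite_gain(old_component: float, new_component: float,
                   weight: float = 0.40) -> float:
    """Points added to the composite score when a component improves."""
    return (new_component - old_component) * weight

# 90% -> 95% clear verdicts: component goes 900 -> 950,
# and the composite gains roughly 50 * 0.40 = 20 points
gain = composite_gain(integrity_component(90, 100),
                      integrity_component(95, 100))
```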

How to Improve

Fix alignment card violations. The most common cause of non-clear verdicts is a mismatch between what your agent does and what its Alignment Card declares. Review your card:
{
  "autonomy_envelope": {
    "bounded_actions": ["search", "recommend"],
    "forbidden_actions": ["share_credentials"],
    "escalation_triggers": [
      {
        "condition": "purchase_value > 100",
        "action": "escalate",
        "reason": "Exceeds autonomous spending limit"
      }
    ]
  }
}
If your agent routinely performs actions not in bounded_actions, either add those actions to the card or adjust agent behavior.

Resolve review_needed checkpoints. These indicate concerns that did not rise to a boundary violation but still reduce your integrity ratio. Common causes:
  • Agent reasoning that mentions goals not declared in the card (undeclared_intent)
  • Subtle value misalignment between reasoning and declared values (value_misalignment)
  • Reasoning that approaches but does not cross autonomy boundaries (autonomy_violation)
Review these on your dashboard and update your alignment card to better reflect your agent’s actual behavior, or adjust the agent’s prompting to align with the card.

Improve alignment card clarity. Vague alignment cards produce more false-positive concerns. Be specific:
  "values": {
-   "declared": ["be_helpful"]
+   "declared": ["principal_benefit", "transparency", "minimal_data"]
  }
Current Integrity Ratio   Action                                Estimated Score Impact
80% (800)                 Fix 5 violations to reach 85% (850)   +20 points
85% (850)                 Fix 3 violations to reach 90% (900)   +20 points
90% (900)                 Fix 2 violations to reach 95% (950)   +20 points
95% (950)                 Fix 1 violation to reach 98% (980)    +12 points

Impact = delta * weight (0.40)

2. Compliance (20% of score)

What it measures: How well your agent stays within its declared boundaries. Uses a session-capped power curve: violations are grouped by session, and only the worst violation per session counts.

Formula: 1000 / (1 + Σ max_impact_per_session)^1.5, with a 1-week half-life decay

Time is your friend. Violation impact decays automatically: a boundary_violation from 4 weeks ago has only 6.3% of its original impact. Session capping also means a burst of violations in a single bad session counts as one event, not many.
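To make the decay behavior concrete, here is a minimal sketch of the formula above. It assumes unit impact per violation session; the actual per-verdict impact weights are not specified in this guide.

```python
HALF_LIFE_WEEKS = 1.0

def compliance_score(session_impacts):
    """Session-capped power curve with a 1-week half-life decay.

    session_impacts: iterable of (max_impact_in_session, age_in_weeks)
    pairs; only the worst violation per session contributes.
    """
    total = sum(impact * 0.5 ** (age / HALF_LIFE_WEEKS)
                for impact, age in session_impacts)
    return 1000 / (1 + total) ** 1.5

# One fresh unit-impact session:  1000 / 2^1.5 ~= 354
# Three fresh sessions:           1000 / 4^1.5  = 125
# At 4 weeks old, each impact retains 0.5^4 = 6.25% of its weight
one_session = compliance_score([(1.0, 0.0)])
three_sessions = compliance_score([(1.0, 0.0)] * 3)
```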

How to Improve

Prevent new violations. This is the single most important action. Each new violation session adds to the cumulative impact. Address the root causes:
  1. Prompt injection resistance: If violations come from prompt injection, harden your agent’s system prompt and review the AIP conscience values for BOUNDARY protections
  2. Scope creep: If violations come from autonomy expansion, tighten escalation triggers in your alignment card
  3. Model updates: If violations started after a model version change, review whether the new model’s behavior still matches your card
Wait for decay. The decay schedule works in your favor:
Violation Sessions   Score (all recent)   Score (4 weeks old)
0                    1000                 1000
1                    354                  957
2                    192                  917
3                    125                  878
After 8 weeks without a new violation, session impacts decay to near-zero and the Compliance component effectively reaches 1000.

Resolve violations to prevent recurrence. Use the resolution API to acknowledge violations and document corrective actions:
curl -X POST https://api.mnemom.ai/v1/agents/{agent_id}/resolve \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "target_type": "checkpoint",
    "target_id": "ic-abc123",
    "resolution_type": "acknowledged"
  }'
Resolutions do not directly affect the score (the violation still decays normally), but they demonstrate responsible governance and create an audit trail.

3. Drift Stability (20% of score)

What it measures: The proportion of sessions where the agent maintained consistent behavior (no sustained drift alerts).

Formula: (stable_sessions / total_sessions) * 1000

How to Improve

Apply values consistently. Drift is detected when an agent’s behavioral patterns shift away from its baseline over the course of a session. Ensure your agent:
  • Uses the same decision-making framework across sessions
  • Applies declared values consistently (not just when convenient)
  • Maintains stable confidence levels in its reasoning
Avoid aggressive behavioral changes. If you update your agent’s system prompt or model version, expect a temporary drift signal as the baseline recalibrates. Plan major updates with this in mind.

Review drift alerts promptly. Check your dashboard or set up webhook notifications for drift.detected events:
# Create webhook for drift events
curl -X POST https://api.mnemom.ai/v1/orgs/{org_id}/webhooks \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://your-server.com/webhooks/mnemom",
    "event_types": ["drift.detected"]
  }'
Investigate root causes. Common drift patterns:
Drift Direction          Common Cause                                             Fix
autonomy_expansion       Agent taking actions outside declared bounds over time   Tighten escalation triggers or expand bounded_actions
value_drift              Different values appearing in decision reasoning         Review and update alignment card values
principal_misalignment   Declining principal-benefit scores in traces             Re-anchor agent prompting on user interests
communication_drift      Reasoning style becoming inconsistent                    Stabilize system prompt, pin model version

4. Trace Completeness (10% of score)

What it measures: The proportion of the agent’s decisions that are logged as AP-Traces.

Formula: (logged_decisions / expected_decisions) * 1000

How to Improve

Ensure all decisions are logged via AAP. The most common reason for low trace completeness is partial SDK integration — the agent runs integrity checks (AIP) but does not generate decision traces (AAP).
import { APTrace } from '@mnemom/agent-alignment-protocol';

// Generate trace for every significant decision
function makeDecision(context: DecisionContext): Decision {
  const result = evaluateOptions(context);

  // Log the trace
  const trace: APTrace = {
    trace_id: `tr-${crypto.randomUUID().slice(0, 12)}`,
    agent_id: 'my-agent',
    card_id: currentCardId,
    timestamp: new Date().toISOString(),
    action: {
      type: result.actionType,
      name: result.actionName,
      category: 'bounded',
    },
    decision: {
      alternatives_considered: result.alternatives,
      selected: result.selected,
      selection_reasoning: result.reasoning,
      values_applied: result.valuesUsed,
    },
    escalation: {
      evaluated: true,
      required: false,
      reason: 'Within autonomy envelope',
    },
  };

  storeTrace(trace);
  return result;
}
Increase coverage ratio. If your agent makes decisions that are not being traced:
  1. Audit your agent’s action flow and identify decision points without trace generation
  2. Add trace generation at each decision point
  3. Use the gateway integration for automatic trace generation if manual instrumentation is impractical

5. Coherence Compatibility (10% of score)

What it measures: The mean value coherence score across fleet interactions.

Formula: mean_coherence_score * 1000 (defaults to 750 if no fleet data)

How to Improve

Align values with your fleet. If your agent operates in a multi-agent environment, ensure its declared values are compatible with peer agents:
from aap import check_fleet_coherence

result = check_fleet_coherence([
    {"agent_id": "my-agent", "card": my_card},
    {"agent_id": "peer-1", "card": peer_1_card},
    {"agent_id": "peer-2", "card": peer_2_card},
])

# Check if my agent is an outlier
for outlier in result.outliers:
    if outlier.agent_id == "my-agent":
        print(f"Outlier! Conflicts: {outlier.primary_conflicts}")
Resolve value conflicts. If your agent declares conflicts_with values that fleet peers hold, or vice versa, coherence scores will be low. Review the conflicts and determine whether they are necessary or can be resolved.

Participate in coherence checks. The default score of 750 is applied when no fleet data exists. Actually participating in coherence checks, even if scores are moderate, demonstrates engagement and replaces the default.

Monitoring Your Score

Dashboard

Your agent’s reputation score is displayed on the dashboard with a breakdown of all five components, historical trend chart, and 30-day delta.

Webhook Notifications

Subscribe to reputation events for automated monitoring:
Event                      Trigger
reputation.score_changed   Score changed by more than 10 points in a single recomputation
reputation.grade_changed   Letter grade changed (e.g., A to AA, or BBB to BB)
curl -X POST https://api.mnemom.ai/v1/orgs/{org_id}/webhooks \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://your-server.com/webhooks/mnemom",
    "event_types": ["reputation.score_changed", "reputation.grade_changed"]
  }'

SDK Self-Monitoring

Build reputation monitoring into your agent’s health checks:
import { fetchReputation } from '@mnemom/reputation';

async function healthCheck(agentId: string) {
  const rep = await fetchReputation(agentId);

  if (!rep) {
    console.warn('Reputation data unavailable');
    return;
  }

  if (rep.score < 600) {
    console.error(`ALERT: Reputation score critically low (${rep.score})`);
    await notifyOps(`Agent ${agentId} reputation below threshold: ${rep.score} (${rep.grade})`);
  }

  if (rep.trend_30d < -50) {
    console.warn(`WARNING: Rapid reputation decline (${rep.trend_30d} over 30 days)`);
  }

  // Check individual components
  for (const component of rep.components) {
    if (component.score < 500) {
      console.warn(`Component ${component.label} is low: ${component.score}/1000`);
    }
  }
}

Recovery from a Low Score

If your agent’s reputation has dropped to B (Concerning) or CCC (Critical), here is the recovery playbook:

Phase 1: Stop the Bleeding (Week 1)

  1. Identify the weakest component. Fetch the full score breakdown and find the component pulling the score down most
  2. Fix active violations. If the Integrity Ratio is the problem, review recent non-clear checkpoints and address root causes immediately
  3. Pause if necessary. If the agent is generating new violations faster than you can fix them, consider pausing the agent via containment while you address the issues

Phase 2: Rebuild (Weeks 2-4)

  1. Update the alignment card. Ensure the card accurately reflects your agent’s current behavior. A misaligned card is the most common source of poor scores.
  2. Monitor daily. Check the score trend daily. You should see improvement within 1-2 weeks as the Compliance component decays.
  3. Generate clean checkpoints. Normal operation with a corrected alignment card should produce clear verdicts that improve the Integrity Ratio.

Phase 3: Strengthen (Weeks 4-8)

  1. Address secondary components. Once Integrity Ratio and Compliance are improving, focus on Drift Stability and Trace Completeness.
  2. Build history. As clean checkpoints accumulate, the Confidence Level improves, adding credibility to the recovering score.
Expected timeline: An agent with a CCC (300) score can realistically recover to BBB (600+) in 4-6 weeks with consistent corrective action. Recovery to A (700+) typically takes 8-12 weeks.

Common Pitfalls

Broadening your alignment card to eliminate violations (e.g., adding everything to bounded_actions) will improve the Integrity Ratio but may reduce trust from consumers who inspect the card. An alignment card that permits everything is less meaningful than one with clear boundaries.
Agents often focus only on boundary_violation verdicts because those are the most visible. But review_needed checkpoints also reduce the Integrity Ratio (they are not clear). Addressing these can yield significant score improvements.
Switching LLM model versions causes temporary drift signals as the behavioral baseline recalibrates. If you change models frequently, Drift Stability suffers. Pin a model version for stability.
Running AIP integrity checks without generating AAP traces gives you integrity data but a low Trace Completeness score. Both protocols should be active.
Agents that never participate in fleet coherence checks receive the default Coherence Compatibility score (750). This is not a penalty, but it caps the component: without fleet data it can never contribute more than 750.

Troubleshooting

Check these in order:
  1. Are new violations occurring? Each new violation resets the Recency penalty. Check your checkpoint feed for recent non-clear verdicts.
  2. Has enough time passed? The Compliance decay half-life is 1 week. Meaningful improvement takes 2-4 weeks of clean operation.
  3. Is the alignment card accurate? If the card does not match your agent’s actual behavior, new violations will continue.
  4. Are checkpoints being generated? Verify via the API that checkpoint count is increasing. No new checkpoints means no new data for score computation.
Check the reputation events endpoint:
curl https://api.mnemom.ai/v1/reputation/{agent_id}/events
Look for violation_detected, drift_detected, or grade_changed events. Common causes of sudden drops:
  • A new boundary_violation in a new session (adds to cumulative Compliance impact)
  • A drift alert (reduces Drift Stability by one session)
  • Model version change triggering behavioral shift
The minimum is 50 analyzed checkpoints. Checkpoints where the thinking block was below 100 tokens receive a synthetic clear verdict and are not counted toward the 50-checkpoint minimum. Check analyzed_checks in the integrity stats:
curl https://api.mnemom.ai/v1/agents/{agent_id}/integrity/aip
# Check "analyzed_checks" — this must be >= 50
The composite score is a weighted sum: S = (integrity * 0.40) + (recency * 0.20) + (drift * 0.20) + (trace * 0.10) + (coherence * 0.10). Check that you are applying the correct weights. The weighted_score field in each component shows the contribution.
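As a quick check, the weighted sum can be recomputed by hand. A minimal sketch that reproduces the 782 (A) breakdown shown in the illustration at the top of this guide (dictionary keys are illustrative, not API field names):

```python
WEIGHTS = {"integrity": 0.40, "recency": 0.20, "drift": 0.20,
           "trace": 0.10, "coherence": 0.10}

def composite(components):
    """Weighted sum of the five 0-1000 component scores."""
    return sum(components[name] * weight for name, weight in WEIGHTS.items())

# Integrity 920, Recency 850, Drift 700, Trace 650, Coherence 390
score = composite({"integrity": 920, "recency": 850, "drift": 700,
                   "trace": 650, "coherence": 390})
# 368 + 170 + 140 + 65 + 39 = 782
```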

See Also