Overview
Fleet coherence answers the question “is this team of agents pulling in the same direction?” — but without collapsing that question into a single number. Mnemom’s production coherence scorer (per ADR-025) is dimensional: it reports a vector with narrative helpers rather than a blended percentage. The rationale is honesty. Any single blended score is a lossy compression, and the specific compression the classical “Jaccard-style” scorer uses actively distorts legitimate fleets:

- Silence counts as disagreement. A value that one agent declares but another doesn’t mention deflates the score — even though absence from a role-specialist card isn’t disagreement, it’s specialization.
- Role specialization is punished. A monitor agent and a remediator agent that share all 7 core governance values but differ on 5 role-specific values land at roughly 58% under Jaccard — not because they conflict, but because the denominator counts every unique value as a potential disagreement.
- Fleet = mean-of-pairs loses structure. No asymmetry between a universal conscience floor (which must be shared) and role extensions (which should diverge). No surfacing of the weakest pair, the conflict surface, or the specialization structure.
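To make the specialization penalty concrete, here is a minimal baseline sketch. This is illustrative only, not the shipped `/baseline` implementation, and the value names are hypothetical:

```typescript
// Baseline "Jaccard-style" scoring sketch: every value declared by
// either agent lands in the denominator, so silence is
// indistinguishable from disagreement.
function jaccard(a: Set<string>, b: Set<string>): number {
  const shared = [...a].filter((v) => b.has(v)).length;
  const union = new Set([...a, ...b]).size;
  return union === 0 ? 0 : shared / union;
}

// Hypothetical role specialists: 7 shared governance values plus
// 3 and 2 role-specific values respectively.
const core = ["v1", "v2", "v3", "v4", "v5", "v6", "v7"];
const monitor = new Set([...core, "watch_1", "watch_2", "watch_3"]);
const remediator = new Set([...core, "fix_1", "fix_2"]);

console.log(jaccard(monitor, remediator).toFixed(2)); // 7 / 12 ≈ 0.58
```

The two agents never disagree, yet the score lands at roughly 0.58 because the 5 role-specific values inflate the denominator.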
The v2 scorer, published as `@mnemom/team-coherence/v2`, reports the dimensions separately and exposes pre-computed narrative helpers.
The v2 output shape
The result deliberately has no blended `fleet_score` field. UI surfaces that need a single number must derive one from this vector and take responsibility for that compression. The Mnemom product does not — every coherence surface in the dashboard reads the vector.
Pairwise scoring
The v2 pairwise scorer is evidence-based:

- Silence is neutral. Values declared by only one agent don’t enter the denominator. They contribute to the `diversity_rate` side channel as positive specialization signal.
- Only explicit conflicts count. A value in one card’s `conflicts_with` that the other card declares is real disagreement. Everything else is tolerated specialization.
- Insufficient evidence returns `null`, not a fabricated zero. Pairs with no overlap and no conflicts surface honestly as “not enough data to score.”
- Bounded in [0, 1]. `governance_score = 1` when there are only shared values; `governance_score = 0` when every evidence item is a conflict.
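A minimal sketch of these rules, assuming a simplified card shape with `values` and `conflicts_with` arrays (the real v2 cards carry more structure):

```typescript
// Assumed simplified card shape for illustration.
interface CardLike {
  values: string[];
  conflicts_with: string[];
}

// governance_score = shared / (shared + conflicts).
// Silence never enters the denominator; zero evidence returns null.
function governanceScore(a: CardLike, b: CardLike): number | null {
  const aVals = new Set(a.values);
  const bVals = new Set(b.values);
  const shared = [...aVals].filter((v) => bVals.has(v)).length;
  // Explicit conflicts: a value one card declares that the other
  // card lists in conflicts_with.
  const conflicts =
    a.values.filter((v) => b.conflicts_with.includes(v)).length +
    b.values.filter((v) => a.conflicts_with.includes(v)).length;
  const evidence = shared + conflicts;
  return evidence === 0 ? null : shared / evidence;
}
```

Applied to the Triage ↔ Patch scenario in the table below (7 shared values, 1 explicit conflict), this rule yields 7 / 8 = 0.875.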
Concrete scenarios
Four showcase agents from the Mnemom incident-response demo:

| Pair | Shared | Conflicts | Silent | Baseline (Jaccard) | v2 governance |
|---|---|---|---|---|---|
| Sentinel ↔ Sentinel (self-pair) | 9 | 0 | 0 | 1.00 | 1.00 |
| Sentinel ↔ Patch (role specialists) | 7 | 0 | 5 | 0.58 | 1.00 |
| Triage ↔ Patch (explicit conflict on `move_fast_break_things`) | 7 | 1 | 4 | 0.50 | 0.875 |
| Two agents, no shared values, no conflicts | 0 | 0 | 8 | 0.00 | null |
Fleet scoring: a vector, not a mean
`computeTeamCoherence(cards)` returns structural information, not a single number:
Aggregates
- `pairwise_governance_floor` — the weakest pair’s governance score. If a fleet has one bad pair, this number tells you that directly. More actionable than a mean.
- `pairwise_governance_median` — median across scored pairs. Typical-case health.
- `conflict_edge_count` — how many pairs have at least one explicit conflict. If this is zero, the fleet has no hard disagreements even if some pairs have low overlap.
- `insufficient_evidence_pairs` — pairs where scoring returned `null`. Signals sparse cards more than bad alignment.
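A sketch of how these aggregates can be derived from pairwise results. The field names come from this page; the `PairResult` input shape is an assumption:

```typescript
// Assumed per-pair result shape for illustration.
interface PairResult {
  governance_score: number | null;
  conflict_count: number;
}

function aggregate(pairs: PairResult[]) {
  // Keep only pairs that produced a score, sorted ascending.
  const scored = pairs
    .map((p) => p.governance_score)
    .filter((s): s is number => s !== null)
    .sort((x, y) => x - y);
  const n = scored.length;
  const median =
    n === 0 ? null : n % 2 === 1 ? scored[(n - 1) / 2] : (scored[n / 2 - 1] + scored[n / 2]) / 2;
  return {
    pairwise_governance_floor: n > 0 ? scored[0] : null,
    pairwise_governance_median: median,
    conflict_edge_count: pairs.filter((p) => p.conflict_count > 0).length,
    insufficient_evidence_pairs: pairs.filter((p) => p.governance_score === null).length,
  };
}
```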
Structural invariants
When the fleet uses unified cards with `conscience` and `integrity` sections, the scorer checks two binary invariants:
- `conscience_universal` — `true` if every agent shares the exact same conscience commitment set; `false` if any agent’s conscience set differs from the modal set (the minority diverges, not the majority); `null` if any card lacks a conscience section.
- `integrity_uniform` — `true` if all agents are in the same `integrity.enforcement_mode` (`observe`, `nudge`, or `enforce`); `false` if modes differ.
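A sketch of the two invariants, assuming a simplified unified-card shape. Returning `null` from `integrityUniform` when a card lacks an integrity section is also an assumption, mirroring the documented `conscience_universal` behavior:

```typescript
// Assumed simplified unified-card shape for illustration.
interface UnifiedCardLike {
  conscience?: { commitments: string[] };
  integrity?: { enforcement_mode: "observe" | "nudge" | "enforce" };
}

function conscienceUniversal(cards: UnifiedCardLike[]): boolean | null {
  if (cards.some((c) => !c.conscience)) return null; // missing section
  // Compare order-insensitive commitment sets via a sorted key.
  const keys = cards.map((c) => [...c.conscience!.commitments].sort().join("|"));
  return keys.every((k) => k === keys[0]);
}

function integrityUniform(cards: UnifiedCardLike[]): boolean | null {
  if (cards.some((c) => !c.integrity)) return null; // assumption, see lead-in
  const mode = cards[0].integrity!.enforcement_mode;
  return cards.every((c) => c.integrity!.enforcement_mode === mode);
}
```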
Outlier analysis
An agent is an outlier if its mean pairwise governance score is more than 1σ below the fleet mean. Outliers surface with their `deviation_sigma` so you can tell a mild outlier (1.1σ) from a severe one (2.7σ).
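The outlier rule can be sketched as follows. Using the population standard deviation is an assumption; the production scorer may compute σ differently:

```typescript
// Given each agent's mean pairwise governance score, flag agents
// sitting more than 1 standard deviation below the fleet mean.
function outliers(
  meanScores: Record<string, number>,
): { agent: string; deviation_sigma: number }[] {
  const vals = Object.values(meanScores);
  const mean = vals.reduce((s, v) => s + v, 0) / vals.length;
  // Population standard deviation (assumption, see lead-in).
  const sigma = Math.sqrt(vals.reduce((s, v) => s + (v - mean) ** 2, 0) / vals.length);
  if (sigma === 0) return []; // perfectly uniform fleet: no outliers
  return Object.entries(meanScores)
    .map(([agent, score]) => ({ agent, deviation_sigma: (mean - score) / sigma }))
    .filter((o) => o.deviation_sigma > 1);
}
```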
Narrative helpers
The scorer pre-computes the answers to the most common human questions so every UI surface tells the same story:

- `weakest_pair` — the pair with the lowest governance score, with full conflict evidence attached. Answers “where should I look first?”
- `most_conflicted_agent` — the agent involved in the most conflict pairs. Answers “who needs attention?”
- `specializations` — per-agent values that only that agent declares. Answers “what does each agent uniquely bring?”
- `conflict_surface` — flat list of every explicit conflict, with evidence (which agent declares the value, which agent lists it as a conflict). Answers “what are all the actual disagreements?”
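A sketch of how the `conflict_surface` evidence can be assembled, again using a simplified hypothetical card shape:

```typescript
// Assumed simplified per-agent card shape for illustration.
interface AgentCard {
  agent: string;
  values: string[];
  conflicts_with: string[];
}

interface ConflictEdge {
  value: string;
  declared_by: string;   // agent that declares the value
  conflicted_by: string; // agent that lists it in conflicts_with
}

// Flat list of every explicit conflict across the fleet, with
// evidence of who declared and who objected.
function conflictSurface(cards: AgentCard[]): ConflictEdge[] {
  const edges: ConflictEdge[] = [];
  for (const a of cards) {
    for (const b of cards) {
      if (a.agent === b.agent) continue;
      for (const v of a.values) {
        if (b.conflicts_with.includes(v)) {
          edges.push({ value: v, declared_by: a.agent, conflicted_by: b.agent });
        }
      }
    }
  }
  return edges;
}
```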
SDK usage
`@mnemom/team-coherence/v2` is a public npm package. It accepts a structural subset interface — both unified cards (full fidelity with conscience + integrity) and AAP 1.0 `AlignmentCard` (reduced fidelity, invariants return `null`) satisfy it.
Fault-line analysis
Fault-line classification — grouping divergences into `resolvable` / `priority_mismatch` / `incompatible` / `complementary` buckets and surfacing structural fault lines — is a separate layer built on top of coherence scoring. It continues to be emitted by the mnemom-api `/v1/teams/fault-lines` endpoint alongside the v2 coherence vector.
See the Fault Line Analysis guide for the classification model, and the Intelligence API reference for the endpoint shape.
API
The coherence endpoints return the `TeamCoherenceResult` shape. The legacy AAP-shaped `FleetCoherenceResult` is retired from the product surface; consumers that want it can import `@mnemom/team-coherence/baseline` and compute it client-side.
Results are cached for 5 minutes. The org-level endpoint requires the `nway_coherence` feature flag (Enterprise plan).
Use cases
- Fleet management — monitor shared governance commitments across all agents; detect conscience drift before it becomes a coordination failure.
- Compliance — surface explicit value conflicts for audit. Role specialization is reported as a positive signal, not as a compliance red flag.
- Incident response — verify that a response team’s cards actually agree on the core governance commitments before handing coordination authority to the fleet.
- Onboarding — compute coherence including a new agent; see whether it lands inside or outside the existing specialization structure.
- Algorithm honesty — pair with the `/baseline` re-exports on the showcase page to demonstrate the concrete delta between naive and honest scoring on your own cards.
See also
- Agent Cards — The two-card model (alignment + protection)
- Alignment Card — Full unified alignment card schema
- Card Composition — Platform > Org > Agent scope
- Fault Line Analysis — Classifying divergences for action
- Value Coherence — The pairwise AAP handshake (baseline scorer)
- Team Trust Rating — Team-level reputation scoring