Substrate fingerprint

The substrate fingerprint is the identity of the AI substrate an agent ran on. Every evaluation, integrity checkpoint, arena attempt, and sideband analysis is stamped server-side with a substrate_id field that names the provider, the model, and (when the customer opts in via two headers) the SDK identifier and a lockfile hash. The fingerprint is the L0 stamp of the Protection Network: without it, the L1 cross-tenant aggregator has no substrate axis to aggregate against. This page covers the concept; the supply-chain trust guide is the customer-facing how-to for package-level provenance and the runtime layer.

The threat model

The supply-chain attack class AEGIS is positioned against is the compromised AI substrate: a bad SDK patch, a fine-tuned model whose alignment has been silently regressed, a vendored prompt template with a hidden injection. The May 2026 Mini Shai-Hulud worm compromised more than 170 npm packages — including Mistral AI’s SDK on npm and PyPI and Guardrails AI’s package on PyPI — and shipped valid SLSA Build Level 3 attestations on the malicious packages. It was the first documented case of legitimate signed provenance for malicious packages: the attacker controlled the build pipeline, so the provenance system did exactly what it was supposed to do — sign what was built — and the result was a signed worm. A runtime layer that watches behavior at the agent transaction level, across every customer running on the same substrate, is the layer the attacker cannot also control. Where SLSA verifies the package at build time, the substrate fingerprint identifies the runtime substrate so cross-tenant behavioral aggregation can flag what no single customer would catch.

What a substrate fingerprint is

The substrate identity is stamped automatically on every observation row as it is written — no customer action required. The base derivation (<provider>:<model>) is always present. The SDK and lockfile-hash enrichments are opt-in via two request headers (covered below); when present, they compose into the four-component substrate_id.

Production row shape

| Table | substrate_id format | Example |
| --- | --- | --- |
| integrity_checkpoints | <provider>:<model>[:<sdk@ver>[:<lockfile-hash>]] (collapse forms described below) | anthropic:claude-sonnet-4-6:@anthropic-ai/<package>@<version>:9e8a… |
| arena_attempts | arena:<model> (or arena:unknown for pre-AEGIS rows) | arena:claude-haiku-4-5 |
| sideband_analyses | sideband:<analyzer_model> | sideband:claude-haiku-4-5 |

The three per-table formats let the L1 aggregator bucket production, adversarial, and analyst signal independently: a behavioral deviation observed only in integrity_checkpoints rows (production traffic) is a different signal from the same deviation in arena_attempts rows (probe traffic).

The four collapse forms

The production trigger (migration 252) composes substrate_id from up to four components: <provider>, <model>, the SDK identifier <sdk@ver>, and the customer-set lockfile hash. The trigger collapses based on which optional inputs are present:

| Inputs present | Components | Resulting substrate_id shape |
| --- | --- | --- |
| Provider + model only | 2 | <provider>:<model> |
| SDK only | 3 | <provider>:<model>:<sdk@ver> |
| Lockfile only | 4 | <provider>:<model>::<lockfile-hash> (empty third slot) |
| Both | 4 | <provider>:<model>:<sdk@ver>:<lockfile-hash> |

The separator is : throughout. The @ inside <sdk@ver> is internal to the SDK identifier (for example, @anthropic-ai/<package>@<version>); it is not a top-level separator. The four collapse forms are covered by the aegis23_substrate_trigger pgTAP suite.
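
The collapse rules can be sketched in a few lines. This is an illustrative Python sketch, not the production trigger (which lives in migration 252); the function names compose_substrate_id and split_substrate_id are hypothetical, and the sketch assumes that no component contains a : character.

```python
from typing import Optional


def compose_substrate_id(
    provider: str,
    model: str,
    sdk: Optional[str] = None,
    lockfile_hash: Optional[str] = None,
) -> str:
    """Compose a substrate_id following the four collapse forms (illustrative)."""
    parts = [provider, model]
    if lockfile_hash is not None:
        # The lockfile hash always sits in the fourth slot, so the third
        # slot is emitted as an empty string when the SDK is unknown.
        parts += [sdk or "", lockfile_hash]
    elif sdk is not None:
        parts.append(sdk)
    return ":".join(parts)


def split_substrate_id(substrate_id: str) -> list:
    """Split into [provider, model, sdk, lockfile_hash], padding with None."""
    fields = substrate_id.split(":", 3)
    fields += [None] * (4 - len(fields))
    # An empty third slot means "lockfile hash present, SDK unknown".
    return [field if field else None for field in fields]


# Lockfile-only form: the third slot stays empty.
print(compose_substrate_id("anthropic", "claude-sonnet-4-6", lockfile_hash="ab" * 32))
```

Splitting with a maximum of three separators keeps the @ characters inside the SDK identifier intact, since only : is treated as a top-level separator.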

The two customer-set headers

The fingerprint enrichments are set through two opt-in request headers. Header names are matched case-insensitively (standard Workers Headers handling); the canonical names below use the casing of the gateway’s LOCKFILE_HASH_HEADER and SDK_VERSION_HEADER constants in mnemom-platform/gateway/src/substrate-fingerprint.ts.

| Header | Effect | Validation |
| --- | --- | --- |
| X-Mnemom-Lockfile-Hash | Customer-computed digest of the resolved manifest. Adds the lockfile-hash dimension to the substrate-id. | SHA-256, 64 hex chars. Mixed case accepted, normalized to lowercase. Malformed → 400 + X-Mnemom-Error: invalid-lockfile-hash. |
| X-Mnemom-Sdk-Version | Customer-set override of the SDK identifier (preferred over User-Agent dispatch). | Free-form string in the SDK-canonical <package>@<version> form. |

If X-Mnemom-Sdk-Version is not sent, the gateway falls back to matching the request’s User-Agent against seven known SDK families to derive <sdk@ver>; if that also fails, the SDK component is null and the fingerprint collapses per the table above. The lockfile hash is never inferred: it is only ever the digest the customer sends. Both opt-ins are header-only; there is no admin-UI toggle. See the lockfile-hash opt-in guide for the customer-facing how-to.
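
On the client side, setting the two headers might look like the sketch below. It is a minimal illustration under stated assumptions: the helper names normalize_lockfile_hash and fingerprint_headers are hypothetical, and hashing the raw bytes of the lockfile is an assumption about what "resolved manifest" means (the lockfile-hash opt-in guide is the authority on which bytes to hash). The header names, the 64-hex-character rule, the lowercase normalization, and the invalid-lockfile-hash error token come from the table above.

```python
import hashlib
import re
from typing import Optional

LOCKFILE_HASH_HEADER = "X-Mnemom-Lockfile-Hash"
SDK_VERSION_HEADER = "X-Mnemom-Sdk-Version"
_SHA256_HEX = re.compile(r"[0-9a-fA-F]{64}")


def normalize_lockfile_hash(value: str) -> str:
    """Mirror the documented validation: 64 hex chars, lowercased."""
    if not _SHA256_HEX.fullmatch(value):
        # The gateway answers a malformed hash with a 400 and this error token.
        raise ValueError("invalid-lockfile-hash")
    return value.lower()


def fingerprint_headers(lockfile_bytes: bytes, sdk: Optional[str] = None) -> dict:
    """Build the opt-in headers from lockfile contents (assumption: raw bytes)."""
    digest = hashlib.sha256(lockfile_bytes).hexdigest()
    headers = {LOCKFILE_HASH_HEADER: normalize_lockfile_hash(digest)}
    if sdk is not None:
        # Expected in <package>@<version> form; omit it to let the gateway
        # fall back to User-Agent matching.
        headers[SDK_VERSION_HEADER] = sdk
    return headers
```

Validating locally before sending avoids a round trip that would end in a 400; the returned dictionary can be merged into the headers of whatever HTTP client the agent already uses.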

The four axis-identity fields

The substrate fingerprint is one of four L0 axis-identity fields stamped on every row. The full set:

| Field | What it identifies | GA derivation |
| --- | --- | --- |
| substrate_id | The AI substrate the agent ran on | <provider>:<model> from request context |
| vertical_id | The customer industry vertical | Defaults to unspecified at GA; Phase 2 enrichment from agent.metadata.vertical |
| pattern_fingerprint | The canonicalized hash of the detection pattern (or the verdict, for integrity rows) | Per-table derivation: ic:<verdict> for integrity, <category>:<technique> for arena, prefix-based for sideband |
| source_fingerprint | The canonicalized hash of the request source identity | agent_id for direct rows; attempt_id for sideband |

Together the four fields define the L1 aggregator bucket dimension. A campaign that spans every customer running on anthropic:claude-sonnet-4-6 with a specific BEC-style pattern across financial-services agents shows up as a per-bucket signal even when no single customer can see the cross-tenant pattern.
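
To make the bucketing idea concrete, the sketch below counts observations per (axis, bucket) pair across rows from different tenants. It is only an illustration: the real aggregator is platform-internal, the function name bucket_counts and the sample rows are hypothetical, and the only things taken from this page are the four field names.

```python
from collections import Counter
from typing import Iterable, Mapping

AXES = ("substrate_id", "vertical_id", "pattern_fingerprint", "source_fingerprint")


def bucket_counts(rows: Iterable[Mapping[str, str]]) -> Counter:
    """Count observations per (axis, bucket) pair, ignoring tenant boundaries."""
    counts: Counter = Counter()
    for row in rows:
        for axis in AXES:
            counts[(axis, row[axis])] += 1
    return counts


# Hypothetical rows from two different tenants on the same substrate.
rows = [
    {"substrate_id": "anthropic:claude-sonnet-4-6", "vertical_id": "unspecified",
     "pattern_fingerprint": "ic:example-verdict", "source_fingerprint": "agent-a"},
    {"substrate_id": "anthropic:claude-sonnet-4-6", "vertical_id": "unspecified",
     "pattern_fingerprint": "ic:example-verdict", "source_fingerprint": "agent-b"},
]
counts = bucket_counts(rows)
```

Each tenant sees one row, but the substrate bucket accumulates both, which is the property the cross-tenant aggregation relies on.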

How fingerprints flow to the aggregator

The full path from row stamp to cross-tenant signal:

1.  Agent transaction hits the gateway
2.  Integrity checkpoint (or arena probe, or sideband analysis) row created
    BEFORE INSERT trigger stamps:
        substrate_id, vertical_id, pattern_fingerprint, source_fingerprint
3.  L1 cross-tenant aggregator (network_campaign_state) rolls up per-axis stats
    per (axis, bucket, window), e.g., per substrate × 24h
4.  When a bucket's rolling delta crosses threshold:
        threat_level transitions calm → elevated → high → under_attack
5.  L2 under-attack overlay engages (Phase 4) / L3 Managed Rule promotion candidate
    L4 threat thermometer surface updates / L5 IoC entry candidate

Customers read the aggregator through GET /v1/network/threat-state?axis=substrate, which returns per-axis bucket states. The aggregator’s table schema is platform-internal; the wire format documented in the spec is the interface to build against.
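
Step 4 of the flow, the threshold ladder, can be pictured as a simple lookup. This is a sketch under loud assumptions: the threshold values below are invented for illustration (this page does not document the real ones), the function name threat_level is hypothetical, and the real aggregator may apply hysteresis or step through levels one at a time rather than mapping a delta directly to a level. Only the four level names come from this page.

```python
from bisect import bisect_right

LEVELS = ("calm", "elevated", "high", "under_attack")
# Hypothetical thresholds on a bucket's rolling delta; NOT documented values.
THRESHOLDS = (2.0, 4.0, 8.0)


def threat_level(rolling_delta: float) -> str:
    """Map a rolling delta to a threat level by counting thresholds crossed."""
    return LEVELS[bisect_right(THRESHOLDS, rolling_delta)]
```

With these placeholder thresholds, a delta below 2.0 stays calm, and each threshold reached moves the bucket one level up the ladder.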

Why this catches what package-layer provenance cannot

SLSA / Sigstore / package provenance verifies the package against the build pipeline. The cryptographic claim is “this package was built by that build pipeline.” That claim is true even when the build pipeline is compromised — the system is doing exactly what it was designed to do, and the result is a valid signature on malicious code. The runtime layer is different. AEGIS’s substrate fingerprint plus the L1 aggregator observes behavior at the agent transaction layer, cross-tenant, in aggregate. Behavioral deviation that appears at every customer running on the same substrate simultaneously is the signature the attacker does not control — because the attacker controls the build pipeline, not every customer’s traffic.
Honest claim. AEGIS detects behavioral signatures consistent with supply-chain compromise. It does not replace package-level provenance verification: SLSA and Sigstore are the package layer, AEGIS is the runtime layer, and the two compose.

OWASP ASI06 mapping

Substrate fingerprinting maps to ASI06 (Agentic Supply Chain Compromise) in the OWASP Top 10 for Agentic Applications, published December 2025. This is a mapping, not a coverage guarantee: AEGIS covers the runtime-behavior dimension of ASI06, while package-layer provenance and dependency-graph hygiene remain the customer’s responsibility.

See also

  • AEGIS — the protection-layer framing
  • Protection Network — the L0-L5 model the fingerprint feeds (the substrate axis at L0)
  • Supply-chain trust guide — package-level provenance + the runtime AEGIS layer
  • IoCs — substrate fingerprints surface as STIX 2.1 indicators when AEGIS publishes them
  • Threat state response schema — wire format for the L1 read; substrate is one of four axes
  • Managed Rules — the signed rule plane that substrate-attributed deviation patterns flow into