Substrate Fingerprint

The substrate fingerprint is the identity of the AI substrate an agent ran on. Every evaluation, integrity checkpoint, arena attempt, and sideband analysis is stamped server-side with a substrate_id field. That field names the provider and the model and, when the customer opts in via two headers, the SDK identifier and a lockfile hash. The fingerprint is the L0 stamp of the Protection Network: without it, the L1 cross-tenant aggregator has no substrate axis to aggregate against. This page covers the concepts. The supply-chain trust guide is the customer-facing how-to for both package-level provenance and the runtime layer described here.

The threat model

The supply-chain attack class AEGIS is positioned against is the compromised AI substrate: a bad SDK patch, a fine-tuned model whose alignment has been silently regressed, a vendored prompt template with a hidden injection. The May 2026 Mini Shai-Hulud worm compromised more than 170 npm packages — including Mistral AI’s SDK on npm and PyPI and Guardrails AI’s package on PyPI — and shipped valid SLSA Build Level 3 attestations on the malicious packages. It was the first documented case of legitimate signed provenance for malicious packages: the attacker controlled the build pipeline, so the provenance system did exactly what it was supposed to do — sign what was built — and the result was a signed worm. A runtime layer that watches behavior at the agent transaction level, across every customer running on the same substrate, is the layer the attacker cannot also control. Where SLSA verifies the package at build time, the substrate fingerprint identifies the runtime substrate so cross-tenant behavioral aggregation can flag what no single customer would catch.

What a substrate fingerprint is

The substrate identity is stamped automatically on every observation row as it is written — no customer action required. The base derivation (<provider>:<model>) is always present. The SDK and lockfile-hash enrichments are opt-in via two request headers (covered below); when present, they compose into the four-component substrate_id.

Production row shape

| Table | `substrate_id` format | Example |
|---|---|---|
| `integrity_checkpoints` | `<provider>:<model>[:<sdk@ver>[:<lockfile-hash>]]` (see the collapse forms below) | `anthropic:claude-sonnet-4-6:@anthropic-ai/sdk@<ver>:9e8a…` |
| `arena_attempts` | `arena:<model>` (or `arena:unknown` for pre-AEGIS rows) | `arena:claude-haiku-4-5` |
| `sideband_analyses` | `sideband:<analyzer_model>` | `sideband:claude-haiku-4-5` |

The three per-table forms (a provider name for production rows, arena: for probe rows, sideband: for analyzer rows) let the L1 aggregator bucket production, adversarial, and analyst signal independently. A behavioral deviation observed only in integrity_checkpoints rows (production traffic) is a different signal from the same deviation in arena_attempts rows (probe traffic).
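A minimal TypeScript sketch of that three-way split, assuming the text before the first : is enough to tell the tables apart (that is, no production provider is itself named arena or sideband). classifySubstrate and countByClass are hypothetical names, not platform functions.

```typescript
type SignalClass = "production" | "adversarial" | "analyst";

// Hypothetical: map a substrate_id to the signal class implied by its prefix.
function classifySubstrate(substrateId: string): SignalClass {
  const prefix = substrateId.split(":", 1)[0];
  if (prefix === "arena") return "adversarial"; // arena_attempts (probe traffic)
  if (prefix === "sideband") return "analyst";  // sideband_analyses (analyzer rows)
  return "production";                          // integrity_checkpoints: <provider>:<model>...
}

// Count rows per class so each class can be tracked independently.
function countByClass(ids: string[]): Record<SignalClass, number> {
  const counts: Record<SignalClass, number> = { production: 0, adversarial: 0, analyst: 0 };
  for (const id of ids) counts[classifySubstrate(id)] += 1;
  return counts;
}
```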

The four collapse forms

The production trigger (migration 252) composes substrate_id from up to four components: <provider>, <model>, the SDK identifier <sdk@ver>, and the customer-set lockfile hash. The resulting shape depends on which of the two optional inputs are present:

| Optional inputs present | Components | Resulting `substrate_id` shape |
|---|---|---|
| Neither (provider + model only) | 2 | `<provider>:<model>` |
| SDK only | 3 | `<provider>:<model>:<sdk@ver>` |
| Lockfile hash only | 4 | `<provider>:<model>::<lockfile-hash>` (empty third slot) |
| Both | 4 | `<provider>:<model>:<sdk@ver>:<lockfile-hash>` |

The separator is : throughout. The @ inside <sdk@ver> belongs to the SDK identifier (for example @anthropic-ai/sdk@<ver>) and is not a top-level separator. The aegis23_substrate_trigger pgTAP suite covers all four collapse forms.
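The collapse rules can be mirrored in TypeScript, for example to predict the stamped value in client-side tests. This is a hypothetical sketch, not the migration 252 trigger (which runs in Postgres); it assumes an absent optional input is either omitted or passed as null.

```typescript
interface SubstrateInputs {
  provider: string;
  model: string;
  sdk?: string | null;          // "<package>@<version>"
  lockfileHash?: string | null; // 64 lowercase hex chars
}

// Hypothetical mirror of the four collapse forms.
function composeSubstrateId(i: SubstrateInputs): string {
  const base = `${i.provider}:${i.model}`;
  const sdk = i.sdk ?? null;
  const lock = i.lockfileHash ?? null;
  if (sdk === null && lock === null) return base; // 2 components
  if (lock === null) return `${base}:${sdk}`;     // 3 components
  // 4 components; the third slot stays empty when only the lockfile hash is present
  return `${base}:${sdk ?? ""}:${lock}`;
}
```

Keeping the empty third slot means a lockfile-only ID cannot be mistaken for an SDK-only ID, since a well-formed SDK identifier never starts with :.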

The two customer-set headers

The fingerprint enrichments are exposed through two opt-in request headers. The Workers Headers API matches header names case-insensitively; the canonical names below use the casing of the LOCKFILE_HASH_HEADER and SDK_VERSION_HEADER constants in mnemom-platform/gateway/src/substrate-fingerprint.ts.

| Header | Effect | Validation |
|---|---|---|
| `X-Mnemom-Lockfile-Hash` | Customer-computed digest of the resolved manifest. Adds the lockfile-hash dimension to the `substrate_id`. | SHA-256, 64 hex chars. Mixed case is accepted and normalized to lowercase. Malformed → 400 + `X-Mnemom-Error: invalid-lockfile-hash`. |
| `X-Mnemom-Sdk-Version` | Customer-set override of the SDK identifier (preferred over User-Agent dispatch). | Free-form string in the SDK-canonical `<package>@<version>` form. |
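A sketch of both ends of the X-Mnemom-Lockfile-Hash opt-in. On the client it assumes the digest is SHA-256 over the raw bytes of the lockfile; the page says only "resolved manifest", so which file to hash is the customer's choice. On the gateway side, readLockfileHash is a hypothetical stand-in for the validation rules in the table, not the code in substrate-fingerprint.ts. HeaderLookup is any object with a case-insensitive get(), such as a Fetch Headers instance.

```typescript
import { createHash } from "node:crypto";

const LOCKFILE_HASH_HEADER = "X-Mnemom-Lockfile-Hash";

// Client side (hypothetical helper): hex SHA-256 of the lockfile bytes as a request header.
function lockfileHashHeader(lockfileBytes: Uint8Array): Record<string, string> {
  const digest = createHash("sha256").update(lockfileBytes).digest("hex"); // 64 lowercase hex chars
  return { [LOCKFILE_HASH_HEADER]: digest };
}

// Gateway side: anything with a case-insensitive get(), e.g. a Fetch `Headers` object.
interface HeaderLookup {
  get(name: string): string | null;
}

type LockfileResult =
  | { ok: true; hash: string | null }
  | { ok: false; status: 400; errorCode: "invalid-lockfile-hash" };

// Hypothetical validator following the documented rules.
function readLockfileHash(headers: HeaderLookup): LockfileResult {
  const raw = headers.get(LOCKFILE_HASH_HEADER);
  if (raw === null) return { ok: true, hash: null }; // header absent: opt-in not used
  if (!/^[0-9a-fA-F]{64}$/.test(raw)) {
    // The caller would answer 400 with X-Mnemom-Error: invalid-lockfile-hash
    return { ok: false, status: 400, errorCode: "invalid-lockfile-hash" };
  }
  return { ok: true, hash: raw.toLowerCase() }; // mixed case normalized to lowercase
}
```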

If X-Mnemom-Sdk-Version is not sent, the gateway falls back to matching the request's User-Agent against seven known SDK families to derive <sdk@ver>; if that also fails, the SDK component is null and the fingerprint collapses per the table above. The lockfile hash is never inferred; it is only ever the digest the customer sends. Both opt-ins are header-only; there is no admin-UI toggle. See the lockfile-hash opt-in guide for the customer-facing how-to.
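The fallback order can be sketched as follows. The two User-Agent patterns are invented placeholders: the seven SDK families and their matching rules are not listed on this page, so UA_FAMILIES and resolveSdkComponent are hypothetical names rather than gateway code.

```typescript
interface HeaderLookup {
  get(name: string): string | null;
}

const SDK_VERSION_HEADER = "X-Mnemom-Sdk-Version";

// Hypothetical UA families; the gateway's real seven families are not reproduced here.
const UA_FAMILIES: { pattern: RegExp; pkg: string }[] = [
  { pattern: /\bexample-sdk\/(\d+\.\d+\.\d+)\b/, pkg: "example-sdk" },
  { pattern: /\bother-sdk-js\/(\d+\.\d+\.\d+)\b/, pkg: "@other/sdk" },
];

function resolveSdkComponent(headers: HeaderLookup): string | null {
  // 1. An explicit customer override wins.
  const explicit = headers.get(SDK_VERSION_HEADER);
  if (explicit) return explicit;
  // 2. Otherwise try User-Agent dispatch.
  const ua = headers.get("User-Agent") ?? "";
  for (const { pattern, pkg } of UA_FAMILIES) {
    const m = pattern.exec(ua);
    if (m) return `${pkg}@${m[1]}`;
  }
  // 3. No match: the SDK slot is null and substrate_id collapses.
  return null;
}
```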

The four axis-identity fields

The substrate fingerprint is one of four L0 axis-identity fields stamped on every row. The full set:

| Field | What it identifies | GA derivation |
|---|---|---|
| `substrate_id` | The AI substrate the agent ran on | `<provider>:<model>` from request context |
| `vertical_id` | The customer's industry vertical | Defaults to `unspecified` at GA; Phase 2 enrichment from `agent.metadata.vertical` |
| `pattern_fingerprint` | The canonicalized hash of the detection pattern (or of the verdict, for integrity rows) | Per-table derivation: `ic:<verdict>` for integrity, `<category>:<technique>` for arena, prefix-based for sideband |
| `source_fingerprint` | The canonicalized hash of the request source identity | `agent_id` for direct rows; `attempt_id` for sideband |

Together the four fields define the dimensions of the L1 aggregator's buckets. A campaign that spans every customer running on anthropic:claude-sonnet-4-6 with a specific BEC-style pattern across financial-services agents shows up as a per-bucket signal even when no single customer can see the cross-tenant pattern.
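A hypothetical sketch of how the GA derivations stamp an integrity row and an arena row. The row types and every field name other than the four axis fields are assumptions. Mapping a missing arena model to arena:unknown follows the pre-AEGIS note earlier on this page, and treating arena rows as direct rows (source = agent_id) is an assumption. The canonicalization or hashing implied by "canonicalized hash" is not shown, and sideband rows are omitted because their prefix-based derivation is not specified here.

```typescript
interface AxisIdentity {
  substrate_id: string;
  vertical_id: string;
  pattern_fingerprint: string;
  source_fingerprint: string;
}

// Hypothetical row shapes for illustration only.
interface IntegrityRow { agent_id: string; provider: string; model: string; verdict: string; }
interface ArenaRow { agent_id: string; model: string | null; category: string; technique: string; }

function integrityAxes(r: IntegrityRow): AxisIdentity {
  return {
    substrate_id: `${r.provider}:${r.model}`, // base form; SDK/lockfile enrichments omitted
    vertical_id: "unspecified",                // GA default
    pattern_fingerprint: `ic:${r.verdict}`,
    source_fingerprint: r.agent_id,
  };
}

function arenaAxes(r: ArenaRow): AxisIdentity {
  return {
    substrate_id: `arena:${r.model ?? "unknown"}`, // assumed handling of pre-AEGIS rows
    vertical_id: "unspecified",
    pattern_fingerprint: `${r.category}:${r.technique}`,
    source_fingerprint: r.agent_id,                // assumed: arena rows are direct rows
  };
}
```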

How fingerprints flow to the aggregator

The full path from row stamp to cross-tenant signal:

1.  Agent transaction hits the gateway
        │
        ▼
2.  Integrity checkpoint (or arena probe, or sideband analysis) row created
    BEFORE INSERT trigger stamps:
        substrate_id, vertical_id, pattern_fingerprint, source_fingerprint
        │
        ▼
3.  L1 cross-tenant aggregator (network_campaign_state) rolls up per-axis stats
    per (axis, bucket, window) — e.g., per substrate × 24h
        │
        ▼
4.  When a bucket's rolling delta crosses threshold:
        threat_level transitions calm → elevated → high → under_attack
        │
        ▼
5.  L2 under-attack overlay engages (Phase 4) / L3 Managed Rule promotion candidate
    L4 threat thermometer surface updates / L5 IoC entry candidate
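Steps 3 and 4 can be approximated with a per-window rollup and a threshold ladder. Everything numeric here is hypothetical: the page documents neither the threshold values nor how the rolling delta is computed. THRESHOLDS, threatLevelFor, bucketKey, and rollUp are illustrative names only, and the axis names other than substrate are assumptions. Windows are aligned to UTC day boundaries for simplicity.

```typescript
type ThreatLevel = "calm" | "elevated" | "high" | "under_attack";
type Axis = "substrate" | "vertical" | "pattern" | "source"; // assumed axis names

// Hypothetical thresholds, checked from most to least severe.
const THRESHOLDS: [ThreatLevel, number][] = [
  ["under_attack", 10],
  ["high", 5],
  ["elevated", 2],
];

function threatLevelFor(rollingDelta: number): ThreatLevel {
  for (const [level, min] of THRESHOLDS) {
    if (rollingDelta >= min) return level;
  }
  return "calm";
}

const DAY_MS = 24 * 60 * 60 * 1000;

// One key per (axis, bucket, window); "|" avoids clashing with ":" inside substrate_id.
function bucketKey(axis: Axis, bucket: string, tsMs: number): string {
  const windowStart = Math.floor(tsMs / DAY_MS) * DAY_MS;
  return `${axis}|${bucket}|${windowStart}`;
}

// Count rows per substrate x 24h window; a rolling delta would compare these to a baseline.
function rollUp(rows: { substrate_id: string; ts: number }[]): Map<string, number> {
  const counts = new Map<string, number>();
  for (const r of rows) {
    const key = bucketKey("substrate", r.substrate_id, r.ts);
    counts.set(key, (counts.get(key) ?? 0) + 1);
  }
  return counts;
}
```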

The customer-facing view of the aggregator is GET /v1/network/threat-state?axis=substrate, which returns per-axis bucket states. The aggregator's internal table schema is platform-internal and not a public contract; only the wire format documented in the spec is.
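A hypothetical client for the threat-state endpoint. Only the path and the axis query parameter come from this page. The base URL, any authentication headers (supplied by the caller), and the shape of the returned bucket states are unspecified, so the body is returned as unknown. The fetch implementation is injected so the sketch runs against the global fetch in Node 18+ or Workers, or against a test double.

```typescript
interface JsonResponse {
  ok: boolean;
  status: number;
  json(): Promise<unknown>;
}
type FetchLike = (url: string, init: { headers: Record<string, string> }) => Promise<JsonResponse>;

// Hypothetical helper; getThreatState is not an official SDK function.
async function getThreatState(
  baseUrl: string,
  axis: string,
  headers: Record<string, string>, // e.g. whatever auth the API requires
  fetchImpl: FetchLike,
): Promise<unknown> {
  const url = new URL("/v1/network/threat-state", baseUrl);
  url.searchParams.set("axis", axis);
  const res = await fetchImpl(url.toString(), { headers });
  if (!res.ok) throw new Error(`threat-state request failed: HTTP ${res.status}`);
  return res.json(); // per-axis bucket states; shape not specified on this page
}
```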

Why this catches what package-layer provenance cannot

SLSA / Sigstore / package provenance verifies the package against the build pipeline. The cryptographic claim is “this package was built by that build pipeline.” That claim is true even when the build pipeline is compromised — the system is doing exactly what it was designed to do, and the result is a valid signature on malicious code. The runtime layer is different. AEGIS’s substrate fingerprint plus the L1 aggregator observes behavior at the agent transaction layer, cross-tenant, in aggregate. Behavioral deviation that appears at every customer running on the same substrate simultaneously is the signature the attacker does not control — because the attacker controls the build pipeline, not every customer’s traffic.

Honest claim. AEGIS detects behavioral signatures consistent with supply-chain compromise. It does not replace package-level provenance verification. SLSA and Sigstore operate at the package layer and AEGIS at the runtime layer; the two compose, and neither replaces the other.

OWASP ASI06 mapping

Substrate fingerprinting maps to ASI06 (Agentic Supply Chain Compromise) in the OWASP Top 10 for Agentic Applications, published December 2025. The mapping is a mapping, not a coverage guarantee: AEGIS covers the runtime-behavior dimension of ASI06; package-layer provenance and dependency-graph hygiene remain the customer’s responsibility.
