Skip to main content

CFD Monitoring

Once CFD is running, the Security Observatory gives you a live view of the threat landscape across your agents. This guide covers what each panel shows, which metrics matter, and how to respond to the alerts you will encounter.

The Security Observatory

The Security Observatory is the CFD section of the Mnemom dashboard at app.mnemom.com/cfd. It has five main panels.

Threat Feed

A real-time stream of evaluation events across all agents in your org, newest first. Each row shows:
  • Agent ID and session ID — which agent, which conversation
  • Verdictwarn, quarantine, or block (color-coded amber/orange/red)
  • Top threat type — the highest-confidence threat category
  • Confidence score — L2 confidence percentage
  • Detection layer — whether the verdict came from L1 (pattern match), L2 (LLM analysis), or L3 (session model)
  • Time — seconds or minutes ago
Click any row to open the full evaluation detail: the turn content, all threat signals, the full L2 reasoning, and the quarantine management actions. What to look for: A sudden spike in prompt_injection or indirect_injection events on a specific agent often means its data sources (search results, email, API responses) have been poisoned. Investigate the source, not just the individual turn.

Metrics Overview

Four KPI cards with 7-day and 30-day comparisons:
  • Block Rateblock_count / total_evaluations — healthy baseline is below 0.5% for most agents. A spike above 2% warrants immediate investigation.
  • Warn Ratewarn_count / total_evaluations — 1–5% is typical. Sustained elevation above 10% suggests either a real threat campaign or thresholds that need calibration.
  • Quarantine Queue Depth — count of items awaiting human review. If this grows faster than your review capacity, consider adjusting thresholds or enabling automatic release for low-confidence quarantines.
  • False Positive Rate — computed over resolved quarantine items. Target below 15%. Above 20% means thresholds need adjustment for the dominant false-positive threat type.

Session Risk Panel

Lists all active sessions with elevated risk (medium or high). A session enters medium risk when any single turn scores above the warn threshold. It escalates to high when:
  • Two or more turns in the session have scored above warn, or
  • A hijack_attempt pattern has been detected (topic/scope pivot after benign history), or
  • A session.escalated event has fired
What to watch: A session that reaches high but has not been blocked means CFD is in quarantine or warn enforcement mode for that agent. The session is still live. Check whether human review is proceeding fast enough — the session may be waiting on you.

Campaign Tracker

Displays detected cross-agent attack campaigns — groups of agents that have received structurally similar attacks from what appears to be the same infrastructure within a rolling time window. Campaign detection fires when:
  • 3 or more agents in the same org receive turns that score ≥ 0.85 on the same threat type, and
  • MinHash similarity between those turns is ≥ 0.88
What it means: A campaign detection is strong evidence of a coordinated attack rather than a coincidental spike. The attacker is probing multiple agents simultaneously, possibly to find the one with the weakest enforcement configuration. Respond by checking whether any of the targeted agents have higher thresholds than the others.

Pattern Activity

Shows recently promoted and demoted patterns in the threat library, and the current backlog of candidate patterns awaiting evaluation. Pattern states:
  • candidate — submitted by user or generated by the arena, not yet evaluated
  • active — passed precision/recall thresholds and is in use
  • deprecated — retired due to excessive false positive rate or superseded by a better pattern
Auto-promotion conditions: A candidate pattern is promoted to active when it achieves precision ≥ 0.90 and recall ≥ 0.85 on the evaluation set of 200+ labeled messages. Arena evaluations run continuously; most candidates are evaluated within 24 hours.

Key Metrics to Watch

Block Rate by Agent

curl "https://api.mnemom.ai/v1/cfd/metrics/timeseries?metric=block_rate&bucket=hour&agent_id=smolt-18a228f3" \
  -H "Authorization: Bearer $TOKEN"
A block rate above 2% on a single agent almost always means one of three things:
  1. The agent is under active attack
  2. A data source the agent reads has been compromised
  3. The threshold is too low for this agent’s legitimate workload
Compare the threat type breakdown to distinguish attack from false positive.

Warn Rate by Confidence Band

Break down warn events by confidence band to understand whether your threshold is set correctly:
BandConfidenceInterpretation
Low0.40–0.55High false positive probability — consider raising warn threshold
Medium0.55–0.70Mixed signal — review representative samples
High0.70–0.85Strong signal — these are approaching block territory
Very High0.85+Near-certain threat — if these are in warn mode, consider raising to block

False Positive Rate by Confidence Band

This is the most useful metric for threshold calibration. If your false positive rate is high in the 0.55–0.70 band but low in the 0.70–0.85 band, you should raise your warn threshold to 0.70 and leave your block threshold where it is.

Session Threat Escalation

The X-CFD-Session-Risk response header on every gateway response reflects the current session risk level for that session. Your application code can read this header and take action:
const response = await fetch(SMOLTBOT_GATEWAY_URL, {
  method: 'POST',
  headers: { /* ... */ },
  body: JSON.stringify(requestBody),
});

const sessionRisk = response.headers.get('X-CFD-Session-Risk');
// 'low' | 'medium' | 'high'

if (sessionRisk === 'high') {
  // Pause the automated workflow and route to human review
  await workflowQueue.escalate(sessionId, { reason: 'CFD session risk elevated' });
}
Risk level progression:
LevelMeaningRecommended action
lowNo threats detected, clean session historyContinue normally
mediumOne or more warn-level events in sessionMonitor; consider slowing automated action rate
highBlock-level event, hijack pattern, or repeated warn eventsRoute to human review before further agent actions
Sessions reset to low at the start of each new session. They do not carry over across sessions.

Campaign Detection

When the campaign tracker shows a new campaign, your response workflow should be:
  1. Identify the weakest enforcement link — which of the affected agents has the lowest block threshold for the campaign’s threat type? Tighten it first.
  2. Check for a common data source — if the affected agents all read from the same email inbox, API feed, or shared data store, that source may be compromised.
  3. Review the similarity evidence — the campaign detail view shows representative turns from each affected agent. The structural similarities are often informative about the attacker’s tooling.
  4. File a pattern candidate — if the campaign’s payload structure is novel, submit it as a candidate pattern so it enters the arena evaluation pipeline.
Query all active campaigns:
curl "https://api.mnemom.ai/v1/cfd/campaigns?status=active" \
  -H "Authorization: Bearer $TOKEN"

Adaptive Threshold Suggestions

The calibration engine runs nightly and computes threshold suggestions for any agent whose false positive or miss rate deviates from targets. Retrieve them:
curl https://api.mnemom.ai/v1/cfd/threshold-suggestions \
  -H "Authorization: Bearer $TOKEN"
How to interpret each suggestion:
  • rationale — the specific metric that triggered the suggestion (false positive rate, miss rate, or both)
  • confidence field on the suggestion — high means the pattern is consistent over 30 days; low means there is not yet enough data to be certain
  • scopeagent means the suggestion applies only to a specific agent; org means it applies org-wide
Apply a suggestion:
curl -X PUT https://api.mnemom.ai/v1/agents/smolt-18a228f3/cfd/config \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "thresholds": {
      "bec_fraud": { "warn": 0.55, "block": 0.88 }
    }
  }'
When the calibration engine produces suggestions for multiple threat types on the same agent, apply them one at a time over successive days rather than all at once. This lets you observe the effect of each change in isolation.

Querying Evaluation History

The evaluation log supports rich filtering for forensic investigation and compliance review. All blocks in the past 7 days:
curl "https://api.mnemom.ai/v1/cfd/evaluations?verdict=block&from=2026-03-23T00:00:00Z&limit=100" \
  -H "Authorization: Bearer $TOKEN"
All BEC fraud signals above 0.7 confidence for a specific agent:
curl "https://api.mnemom.ai/v1/cfd/evaluations?agent_id=smolt-18a228f3&threat_type=bec_fraud&min_risk=0.7&limit=50" \
  -H "Authorization: Bearer $TOKEN"
Timeseries for a specific threat type:
curl "https://api.mnemom.ai/v1/cfd/metrics/timeseries?threat_type=prompt_injection&bucket=day&from=2026-03-01T00:00:00Z" \
  -H "Authorization: Bearer $TOKEN"
Available filter parameters:
ParameterTypeDescription
agent_idstringFilter to a specific agent
verdictwarn | quarantine | blockFilter by verdict
threat_typestringFilter by threat category
min_riskfloatMinimum overall risk score (0–1)
detection_layerl1 | l2 | l3Which detection layer produced the verdict
fromISO 8601Start of time range
toISO 8601End of time range
limitintegerResults per page (max 200)
cursorstringPagination cursor from previous response

The Audit Log

Every configuration change, quarantine decision, and pattern activation is recorded in the CFD audit log. The audit log is append-only and tamper-evident. What the audit log contains:
Event ClassExamples
Configuration changesThreshold update, enforcement mode change, bulk-apply
Quarantine decisionsReleased (with is_false_positive), confirmed as threat, deleted
Pattern eventsCandidate submitted, promoted to active, deprecated
Canary eventsCreated, triggered, acknowledged
Campaign eventsDetected, closed, false-positive dismissed
Query the audit log:
curl "https://api.mnemom.ai/v1/cfd/audit?from=2026-03-01T00:00:00Z&limit=50" \
  -H "Authorization: Bearer $TOKEN"
Audit log entries are included in the EU AI Act compliance export (GET /v1/compliance/cfd-report) and support the 90-day minimum retention required for Article 50 compliance.

Further Reading