
CI/CD policy gates

Policy gates prevent misconfigurations from reaching production. By embedding policy checks directly into your CI/CD pipeline, you catch violations before they affect live agent traffic — not after. There are two types of gates:
  1. Static validation: mnemom card validate checks schema correctness. It is fast, offline, and makes no API calls.
  2. Card evaluation: mnemom card evaluate checks the card’s policy against a set of tools. It can run locally or against live agent data.
Both commands return CI-friendly exit codes: 0 on pass, 1 on failure. This means any CI system that interprets exit codes (GitHub Actions, GitLab CI, Jenkins, CircleCI, etc.) will correctly pass or fail the pipeline step.

Static validation gate

Static validation checks that your alignment card conforms to the unified schema (ADR-008) without making any API calls. This makes it fast, safe to run on every pull request, and suitable for environments without API credentials.
Static validation catches structural errors — missing required fields, invalid enum values, malformed glob patterns, and schema version mismatches. It does not verify that capability mappings reference real tools or that coverage is adequate. Use card evaluation for that.
# .github/workflows/card-validate.yml
name: Card Validation
on:
  pull_request:
    paths:
      - 'card.yaml'
      - 'cards/**'

jobs:
  validate-card:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: '20'
      - run: npm install -g @mnemom/mnemom
      - name: Validate card
        run: mnemom card validate card.yaml
This workflow triggers only when card.yaml or files under cards/ change, keeping CI fast for unrelated pull requests.
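Note that while the trigger above watches cards/**, the validate step checks only the top-level card.yaml. If you keep multiple cards, a loop step can validate each one. A sketch, assuming your cards live at cards/*.yaml (adjust the glob to your layout):

```yaml
      # Hypothetical extra step: validate every card under cards/ too.
      # Assumes cards are named cards/*.yaml.
      - name: Validate all cards
        run: |
          for f in cards/*.yaml; do
            echo "Validating $f"
            mnemom card validate "$f"
          done
```

Because each mnemom card validate run exits non-zero on failure, the first invalid card fails the step.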
Run static validation first in your pipeline. It completes in under a second and catches the most common errors before the slower evaluation step runs.
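Since static validation is offline and sub-second, it also fits a local pre-commit hook, catching schema errors before they ever reach CI. A sketch using the pre-commit framework, assuming the mnemom CLI is installed locally:

```yaml
# .pre-commit-config.yaml (sketch; assumes mnemom is on PATH)
repos:
  - repo: local
    hooks:
      - id: card-validate
        name: Validate alignment card
        entry: mnemom card validate card.yaml
        language: system
        files: ^(card\.yaml|cards/.*\.ya?ml)$
        pass_filenames: false
```

The files pattern mirrors the CI trigger, so the hook only fires on commits that touch card files.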

Card evaluation gate

Card evaluation goes beyond schema validation. It evaluates your card’s policy against the agent’s tools — checking capability mapping coverage, verifying tool permissions, and scoring the card against tool usage. This can run locally or with an API key for live agent data.
The evaluation gate may make API calls and read live agent data when using --agent. Only run it in trusted CI environments where your API key is stored as a secret. Never log the full evaluation response in public build logs if it contains agent identifiers or tool names you consider sensitive.
# .github/workflows/card-evaluate.yml
name: Card Evaluation
on:
  push:
    branches: [main]

jobs:
  evaluate-card:
    runs-on: ubuntu-latest
    env:
      MNEMOM_API_KEY: ${{ secrets.MNEMOM_API_KEY }}
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: '20'
      - run: npm install -g @mnemom/mnemom
      - name: Evaluate card
        run: mnemom card evaluate card.yaml --tools mcp__browser__navigate,mcp__slack__post_message --agent my-agent --format json
This workflow runs on pushes to main, making it a pre-deploy gate. The MNEMOM_API_KEY is read from GitHub Secrets and exposed as an environment variable.
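Because only the exit code matters, the same gate translates directly to other CI systems. A GitLab CI sketch, assuming MNEMOM_API_KEY is stored as a masked CI/CD variable:

```yaml
# .gitlab-ci.yml (sketch of the same evaluation gate on GitLab)
evaluate-card:
  image: node:20
  rules:
    - if: $CI_COMMIT_BRANCH == "main"
  script:
    - npm install -g @mnemom/mnemom
    # MNEMOM_API_KEY is read from a masked GitLab CI/CD variable
    - mnemom card evaluate card.yaml --tools mcp__browser__navigate,mcp__slack__post_message --agent my-agent --format json
```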

Exit codes and error handling

Both card validate and card evaluate use standard exit codes that CI systems interpret automatically:
| Exit Code | Meaning | CI Behavior |
| --- | --- | --- |
| 0 | All checks pass | Pipeline continues |
| 1 | Validation or evaluation failures | Pipeline fails |
No special configuration is needed. If the command exits with 1, the pipeline step fails, and downstream steps (like deployment) are skipped.
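When first rolling out a card, you may prefer evaluation failures to warn rather than block. In GitHub Actions, continue-on-error keeps the job green while still marking the step as failed. A sketch:

```yaml
      - name: Evaluate card (advisory only)
        continue-on-error: true   # step is marked failed, but the job still passes
        run: mnemom card evaluate card.yaml --tools mcp__browser__navigate --agent my-agent
```

Remove continue-on-error once the card is stable to restore the hard gate.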

JSON output for CI

Use --format json to get machine-readable output for programmatic parsing in complex pipelines:
mnemom card evaluate card.yaml --tools mcp__browser__navigate,mcp__filesystem__delete --agent my-agent --format json
The JSON output includes the verdict, any violations, warnings, and coverage metrics:
{
  "verdict": "fail",
  "violations": [
    {
      "tool": "mcp__filesystem__delete",
      "rule": "forbidden",
      "reason": "File deletion not permitted for support agents",
      "severity": "critical"
    }
  ],
  "warnings": [
    {
      "tool": "mcp__custom__export",
      "rule": "unmapped",
      "reason": "Tool not covered by any capability mapping"
    }
  ],
  "coverage": {
    "coverage_pct": 85,
    "mapped_actions": 5,
    "total_actions": 6,
    "unmapped": ["data_export"]
  }
}
Pipe the JSON output to jq for extracting specific fields in downstream pipeline steps. For example, mnemom card evaluate card.yaml --tools mcp__browser__navigate --agent my-agent --format json | jq '.coverage.coverage_pct' extracts just the coverage percentage.
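One common use of the JSON output is enforcing a minimum coverage percentage as its own gate. A sketch of a workflow step, assuming a 90% threshold (the number is illustrative, not a product default):

```yaml
      - name: Enforce coverage threshold
        run: |
          pct=$(mnemom card evaluate card.yaml \
            --tools mcp__browser__navigate --agent my-agent --format json \
            | jq '.coverage.coverage_pct')
          echo "Capability coverage: ${pct}%"
          # Fail the step (and therefore the pipeline) below the threshold
          if [ "$pct" -lt 90 ]; then
            echo "Coverage ${pct}% is below the 90% threshold"
            exit 1
          fi
```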

Combining with reputation gates

For comprehensive pre-deploy checks, combine policy gates with reputation gates. Policy gates verify that your governance configuration is correct. Reputation gates verify that your agent’s trust score meets your organization’s threshold. Together, they ensure both policy correctness and operational trustworthiness before code reaches production.
# .github/workflows/pre-deploy.yml
name: Pre-Deploy Gates
on:
  push:
    branches: [main]

jobs:
  card-gate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: '20'
      - run: npm install -g @mnemom/mnemom
      - run: mnemom card validate card.yaml
      - run: mnemom card evaluate card.yaml --tools mcp__browser__navigate,mcp__slack__post_message --agent my-agent
        env:
          MNEMOM_API_KEY: ${{ secrets.MNEMOM_API_KEY }}

  reputation-gate:
    runs-on: ubuntu-latest
    steps:
      - uses: mnemom/reputation-check@v1
        with:
          agent-id: mnm-550e8400-e29b-41d4-a716-446655440000
          min-score: 600
          api-key: ${{ secrets.MNEMOM_API_KEY }}
Both jobs run in parallel. If either gate fails, the workflow fails and deployment is blocked.
The reputation-gate job uses the mnemom/reputation-check@v1 GitHub Action, which is a standalone action that checks the agent’s reputation score against a minimum threshold. See Embeddable Trust Badges for full configuration options.
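To make the gating explicit, a deploy job can depend on both gates with needs:, so it runs only when both succeed. A sketch (the deploy step itself is a placeholder):

```yaml
  deploy:
    needs: [card-gate, reputation-gate]   # runs only if both gate jobs pass
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: echo "Both gates passed"   # placeholder: replace with your deploy step
```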

Setting up the full pipeline

Here is a complete end-to-end workflow that validates on every PR and evaluates on merge to main:
  1. Add your API key as a secret. In GitHub, go to Settings > Secrets and variables > Actions, then add MNEMOM_API_KEY with your API key. In GitLab, go to Settings > CI/CD > Variables and add it there with the “Masked” option enabled.
  2. Create the validation workflow. Add a workflow file that runs static validation on every pull request that modifies card files. This catches schema errors before code review.
  3. Create the evaluation workflow. Add a second workflow file that runs card evaluation on pushes to main. This confirms the card’s policy is valid against the agent’s tools before deployment proceeds.
  4. Add the reputation gate (optional). If your organization enforces minimum trust scores, add the mnemom/reputation-check action as a parallel job in your deploy workflow.
  5. Monitor and iterate. Review pipeline failures in your CI dashboard. Use --format json output to integrate with alerting tools like Slack, PagerDuty, or Datadog.
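As one concrete alerting hookup, a failure-only step can post to a Slack incoming webhook. A sketch, assuming you store the webhook URL as a secret named SLACK_WEBHOOK_URL (a name chosen here, not part of mnemom):

```yaml
      - name: Notify Slack on gate failure
        if: failure()   # runs only when an earlier step in this job failed
        env:
          SLACK_WEBHOOK_URL: ${{ secrets.SLACK_WEBHOOK_URL }}
        run: |
          curl -sS -X POST -H 'Content-Type: application/json' \
            --data "{\"text\":\"Card gate failed on ${{ github.repository }}@${{ github.ref_name }}\"}" \
            "$SLACK_WEBHOOK_URL"
```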

Best practices

  • Run validation on every PR that touches card files. Static validation is fast and catches the most common mistakes before human review begins.
  • Run evaluation on main branch merges (pre-deploy). Evaluation confirms the card’s policy works against your agent’s tools, not just schema correctness.
  • Store card.yaml in version control alongside application code. This gives you diff visibility, rollback capability, and a clear audit trail for every policy change.
  • Use --format json for programmatic parsing in complex pipelines. JSON output integrates cleanly with jq, custom scripts, and downstream CI steps.
  • Set up notifications for evaluation failures. Route CI failures to Slack, email, or your incident management tool so the team responds quickly.
  • Keep validation fast by running it first. Since static validation needs no API call, it should always be the first gate. If it fails, there is no reason to run the slower evaluation step.

See also