Verifiable agent accountability

Magenta Canon — a Verifiable MCP Gateway for AI-Agent Tool Calls

Allow authorized tool calls, block unauthorized ones, record both, and verify the receipt yourself.

Magenta is the verification and control layer that sits in front of an AI agent's tool use; it is not an AI assistant itself. It is open source under the Apache-2.0 license.

AI agents are moving from chat to action

For two years, agents mostly produced text. That era is ending. Through the Model Context Protocol (MCP) and similar interfaces, agents now act: they call tools and APIs, write and merge code, touch databases, and reach into GitHub, Stripe, Slack, and internal systems.

The moment an agent can take an action with real-world consequences, one question becomes unavoidable: was that action authorized — and can you prove what actually happened?

The trust gap

The reflexive answer is "we log everything." But a log is only as trustworthy as the system that wrote it. Application logs can be edited, truncated, replayed, or silently dropped — by the same service that took the action, the exact party with the motive and means to rewrite history.

So "check the logs" really means "trust the operator's word." For money, deploys, and customer data — anything you would have to defend in an incident review or an audit — that is not enough.

The Magenta answer

Magenta Canon puts a gateway in front of every tool call and makes every decision independently witnessed. It sits transparently between the MCP host and the downstream tools. For each tools/call, before anything reaches the real tool, it:

  • Gates the call against an operator-delegated capability (e.g. "may refund up to $100") — default deny.
  • Witnesses the decision, whether allowed or refused, as a hash-chained, signed execution receipt in an append-only RFC 6962 Merkle transparency log.
  • Forwards allowed calls to the real tool; blocks the rest. A blocked call never reaches the tool at all.
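
The gate step above can be sketched as a default-deny check. This is a minimal illustration under stated assumptions, not Magenta's actual implementation: the `Capability` shape, the `gate` function, the tool name `refund`, and the `amount_cents` argument are hypothetical names chosen to match the "may refund up to $100" example.

```typescript
// Hypothetical capability shape: which tool may be called, and up to what amount.
interface Capability {
  tool: string;
  maxAmountCents: number;
}

interface ToolCall {
  tool: string;
  args: Record<string, unknown>;
}

interface Decision {
  allowed: boolean;
  reason: string;
}

// Default deny: a call passes only if a delegated capability explicitly covers it.
function gate(call: ToolCall, capabilities: Capability[]): Decision {
  const cap = capabilities.find((c) => c.tool === call.tool);
  if (cap === undefined) {
    return { allowed: false, reason: "no capability delegated for this tool" };
  }
  const amount = call.args["amount_cents"];
  if (typeof amount !== "number" || !Number.isInteger(amount) || amount <= 0) {
    return { allowed: false, reason: "amount missing or malformed" };
  }
  if (amount > cap.maxAmountCents) {
    return { allowed: false, reason: "amount exceeds delegated ceiling" };
  }
  return { allowed: true, reason: "within delegated capability" };
}

// "May refund up to $100", expressed in cents.
const delegated: Capability[] = [{ tool: "refund", maxAmountCents: 10000 }];

const ok = gate({ tool: "refund", args: { amount_cents: 8900 } }, delegated);
const blocked = gate({ tool: "refund", args: { amount_cents: 25000 } }, delegated);
const unknownTool = gate({ tool: "delete_db", args: {} }, delegated);
```

Every branch except the last returns a refusal, so a missing capability or a malformed argument fails closed rather than open.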

The refusal is evidence too: "the agent tried to refund $250 and was stopped" is a fact on the same signed record as "the agent refunded $89."
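
To illustrate how an allowed call and a refusal can share one tamper-evident record, here is a minimal hash-chain sketch. The `Receipt` fields, the JSON serialization, and the function names are assumptions made for illustration; the real receipt format is defined by the project's verification spec, and the signing step is omitted here.

```typescript
import { createHash } from "node:crypto";

// Hypothetical receipt shape (signature omitted in this sketch).
interface Receipt {
  seq: number;
  decision: "ALLOWED" | "BLOCKED";
  tool: string;
  amountCents: number;
  prevHash: string; // hex hash of the previous receipt, or 64 zeros for the first
  hash: string;     // hex SHA-256 over the other fields
}

const GENESIS = "0".repeat(64);

function receiptHash(r: Omit<Receipt, "hash">): string {
  // Fixed field order so the same receipt always serializes to the same bytes.
  const payload = JSON.stringify([r.seq, r.decision, r.tool, r.amountCents, r.prevHash]);
  return createHash("sha256").update(payload).digest("hex");
}

function appendReceipt(
  chain: Receipt[],
  decision: Receipt["decision"],
  tool: string,
  amountCents: number,
): Receipt[] {
  const prevHash = chain.length === 0 ? GENESIS : chain[chain.length - 1].hash;
  const body = { seq: chain.length, decision, tool, amountCents, prevHash };
  return [...chain, { ...body, hash: receiptHash(body) }];
}

// Recompute every hash and link; an edit to any receipt breaks the chain.
function verifyChain(chain: Receipt[]): boolean {
  let prev = GENESIS;
  for (let i = 0; i < chain.length; i++) {
    const { hash, ...body } = chain[i];
    if (body.seq !== i || body.prevHash !== prev || receiptHash(body) !== hash) return false;
    prev = hash;
  }
  return true;
}

let chain: Receipt[] = [];
chain = appendReceipt(chain, "ALLOWED", "refund", 8900);
chain = appendReceipt(chain, "BLOCKED", "refund", 25000);

// Rewriting the refusal into an approval no longer matches the recorded hash.
const tampered: Receipt[] = chain.map((r) =>
  r.seq === 1 ? { ...r, decision: "ALLOWED" as const } : r,
);
```

In this sketch `verifyChain(chain)` holds while `verifyChain(tampered)` does not, because the refusal is hashed into the record the same way the approval is.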

We don't rely on claims

Live proof — one command

After dependencies are installed, a single command runs the entire wedge locally, end to end, against a real control plane, a real stdio MCP gateway, and a real downstream tool:

npm install
npm run demo
The run shows one allowed call, one blocked call, independent verification, and tamper detection:
  • ALLOWED   refund of 8900 cents ($89) — under the $100 ceiling — gated, witnessed, and forwarded.
  • BLOCKED   refund of 25000 cents ($250) — over the ceiling — gated, witnessed, and refused.
  • GROUND TRUTH   the downstream tool's own log shows exactly one call; the blocked $250 refund is absent — it never arrived.
  • VERIFIED   the standalone verifier checks the math over the published evidence and prints RESULT: VERIFIED.
  • TAMPER FAILED   flip a single byte of the evidence and the verifier prints RESULT: VERIFICATION FAILED. Tampering is caught.

You do not have to blindly trust the server. You can verify the receipts yourself and, with independent mirroring of signed tree heads (STHs), detect server-side history rewrites. The verifier imports nothing from the server; it re-derives every step from the public spec, so anyone can re-implement it. It proves the integrity of the evidence bundle (what was decided and recorded), not that a downstream action succeeded.
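
The tamper check rests on the Merkle tree hash defined in RFC 6962, which anyone can recompute without server code. The sketch below computes that root over in-memory leaves and shows that flipping a single bit changes it. The leaf contents are made up, `merkleRoot` is an illustrative name rather than anything from magenta-verify, and the evidence bundle layout, signature checks, and STH handling are not shown.

```typescript
import { createHash } from "node:crypto";

function sha256(...parts: Buffer[]): Buffer {
  const h = createHash("sha256");
  for (const p of parts) h.update(p);
  return h.digest();
}

// RFC 6962, section 2.1: leaves are hashed with a 0x00 prefix, interior nodes with 0x01.
function merkleRoot(leaves: Buffer[]): Buffer {
  if (leaves.length === 0) return sha256();
  if (leaves.length === 1) return sha256(Buffer.from([0x00]), leaves[0]);
  // Split at the largest power of two strictly less than the number of leaves.
  let k = 1;
  while (k * 2 < leaves.length) k *= 2;
  return sha256(
    Buffer.from([0x01]),
    merkleRoot(leaves.slice(0, k)),
    merkleRoot(leaves.slice(k)),
  );
}

// Made-up leaf contents standing in for recorded receipts.
const leaves: Buffer[] = [
  "ALLOWED refund 8900",
  "BLOCKED refund 25000",
  "ALLOWED refund 1200",
].map((s) => Buffer.from(s, "utf8"));
const publishedRoot = merkleRoot(leaves);

// Copy the leaves, flip one bit in the second leaf, and recompute the root.
const tamperedLeaves = leaves.map((l) => Buffer.from(l));
tamperedLeaves[1][0] ^= 0x01;
const recomputed = merkleRoot(tamperedLeaves);

const rootsMatch = recomputed.equals(publishedRoot); // false after the bit flip
```

The distinct prefixes for leaves and interior nodes keep a leaf from being passed off as an interior node, which is why the split rule and prefixes have to be followed exactly when re-implementing a verifier.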

See the recorded runs: gateway proof · trust-anchor proof.

How it works

  AI agent / MCP host  (Claude Code, Desktop, Cursor)
              │  tools/call
              ▼
     ┌─────────────────┐    gate + witness     ┌───────────────────────┐
     │ Magenta Gateway │ ────────────────────▶ │ Control plane         │
     │  (stdio proxy)  │ ◀──── allow / deny ── │  • capability gate    │
     └────────┬────────┘                       │  • transparency log   │
              │ forward IF allowed             │  • /api/trust/evidence│
              ▼                                └───────────┬───────────┘
       downstream tool                                     │ signed receipts
       (refund, API, DB)                                   ▼
                                                    evidence bundle ─▶ magenta-verify = VERIFIED
Allowed → forwarded. Blocked → stopped before the tool. Both → witnessed receipt → evidence bundle → independently verified.
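
The flow in the diagram can be sketched as a single request handler. This is an in-memory illustration only: `handleToolsCall`, the `refund` tool, its `amount_cents` argument, the 10000-cent ceiling, and the error code are assumptions, not Magenta's API. The JSON-RPC envelope with method `tools/call` and `params.name` / `params.arguments` follows the MCP specification.

```typescript
// Shape of an MCP tools/call request (JSON-RPC 2.0).
interface ToolsCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

type Witnessed = { id: number; decision: "ALLOWED" | "BLOCKED" };

const witnessLog: Witnessed[] = []; // stands in for the transparency log
const downstreamLog: number[] = []; // stands in for the tool's own ground-truth log

// Hypothetical policy: refunds up to 10000 cents; everything else is denied.
function isAllowed(req: ToolsCallRequest): boolean {
  const amount = req.params.arguments["amount_cents"];
  return (
    req.params.name === "refund" &&
    typeof amount === "number" &&
    amount > 0 &&
    amount <= 10000
  );
}

function handleToolsCall(req: ToolsCallRequest): object {
  const allowed = isAllowed(req);
  // Witness first, so both outcomes are recorded before anything is forwarded.
  witnessLog.push({ id: req.id, decision: allowed ? "ALLOWED" : "BLOCKED" });
  if (!allowed) {
    // A blocked call returns here; the downstream tool is never invoked.
    return {
      jsonrpc: "2.0",
      id: req.id,
      error: { code: -32000, message: "blocked by capability gate" },
    };
  }
  downstreamLog.push(req.id); // stub for forwarding to the real tool
  return {
    jsonrpc: "2.0",
    id: req.id,
    result: { content: [{ type: "text", text: "refund issued" }] },
  };
}

const refund = (id: number, cents: number): ToolsCallRequest => ({
  jsonrpc: "2.0",
  id,
  method: "tools/call",
  params: { name: "refund", arguments: { amount_cents: cents } },
});

handleToolsCall(refund(1, 8900));  // allowed: witnessed and forwarded
handleToolsCall(refund(2, 25000)); // blocked: witnessed, never forwarded
```

After both calls the witness log holds two entries while the downstream log holds only the first request, mirroring the ground-truth check in the demo.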

See it at a glance

Understand it visually

The same flow, in pictures. Authorize what's allowed, block what isn't, record proof of both — then verify the receipt yourself.

  • What it does: verifiable accountability for AI-agent tool calls. Magenta Canon sits between an AI agent and a tool server, authorizing allowed calls, blocking unauthorized ones, and recording proof of both.
  • Allowed vs blocked: an $89 refund is authorized and forwarded to the tool server; a $250 refund stops at the gate and is never forwarded.
  • Never reached downstream: the tool's own ground-truth log contains the allowed call but not the blocked one, which shows that the blocked call never arrived.
  • MCP architecture: the agent calls the Magenta Gateway capability gate, decisions are witnessed in an append-only signed Merkle transparency log, allowed calls go on to the downstream MCP tool server, and an independent verifier checks the evidence bundle.

Explainer video

A narrated walkthrough of the authorize · block · record · verify loop.

Full visual set on GitHub: Visual Guide.

Honest status

Magenta Canon today is a credible, reproducible reference implementation — a proven wedge, not a finished product. A project about verifiable claims cannot make unverifiable ones.

  • Proven & tested now — capability gate, witness, signed receipts, standalone verifier, stdio MCP gateway; exercised live by npm run demo and covered by the test suite.
  • Local/dev proof — runs against file-backed persistence in a fresh local universe.
  • stdio transport only — no Streamable HTTP, no hosted or multi-tenant gateway yet.
  • Future lanes — production durability (single-writer / Postgres) and a hosted, externally-mirrored witness are explicitly future work.

This is a wedge you can run and verify today — not a claim of production readiness. Full threat model: SECURITY_MODEL.md.

Design partners

If you are building or deploying agents that take real actions — refunds, deploys, database writes, API calls with consequences — we want to talk.

Run npm run demo, read the verification spec, and tell us where it breaks for your use case.