Verifiable agent accountability
Magenta Canon — a Verifiable MCP Gateway for AI-Agent Tool Calls
Allow authorized tool calls, block unauthorized ones, record both, and verify the receipt yourself.
Magenta is the verification and control layer that sits in front of an AI agent's tool use; it is not an AI assistant itself. It is open source under the Apache-2.0 license.
AI agents are moving from chat to action
For two years, agents mostly produced text. That era is ending. Through the Model Context Protocol (MCP) and similar interfaces, agents now act: they call tools and APIs, write and merge code, touch databases, and reach into GitHub, Stripe, Slack, and internal systems.
The moment an agent can take an action with real-world consequences, one question becomes unavoidable: was that action authorized — and can you prove what actually happened?
The trust gap
The reflexive answer is "we log everything." But a log is only as trustworthy as the system that wrote it. Application logs can be edited, truncated, replayed, or silently dropped — by the same service that took the action, the exact party with the motive and means to rewrite history.
So "check the logs" really means "trust the operator's word." For money, deploys, and customer data — anything you would have to defend in an incident review or an audit — that is not enough.
The Magenta answer
Magenta Canon puts a gateway in front of every tool call and makes every decision independently witnessed. It sits transparently between the MCP host and the downstream tools. For each tools/call, before anything reaches the real tool, it:
- Gates the call against an operator-delegated capability (e.g. "may refund up to $100") — default deny.
- Witnesses the decision, whether allowed or refused, as a hash-chained, signed execution receipt in an append-only RFC 6962 Merkle transparency log.
- Forwards allowed calls to the real tool; blocks the rest. A blocked call never reaches the tool at all.
The refusal is evidence too: "the agent tried to refund $250 and was stopped" is a fact on the same signed record as "the agent refunded $89."
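The three steps above can be sketched in a few lines. This is a minimal illustration, not Magenta's actual API: the `Capability`, `ToolCall`, and `Receipt` shapes and the `gate` and `handle` functions are hypothetical names, and the witness here is a plain in-memory array rather than a signed transparency log.

```typescript
// Hypothetical shapes for illustration only; not Magenta's real types.
type Capability = { tool: string; maxAmountCents: number };
type ToolCall = { tool: string; amountCents: number };
type Receipt = { call: ToolCall; decision: "ALLOWED" | "REFUSED"; reason: string };

// Default deny: a call passes only if a delegated capability explicitly covers it.
function gate(call: ToolCall, caps: Capability[]): Receipt {
  const cap = caps.find((c) => c.tool === call.tool);
  if (!cap) return { call, decision: "REFUSED", reason: "no capability for tool" };
  if (!Number.isInteger(call.amountCents) || call.amountCents <= 0) {
    return { call, decision: "REFUSED", reason: "invalid amount" };
  }
  if (call.amountCents > cap.maxAmountCents) {
    return { call, decision: "REFUSED", reason: "amount exceeds ceiling" };
  }
  return { call, decision: "ALLOWED", reason: "within ceiling" };
}

// Witness every decision first, then forward only the allowed calls.
function handle(
  call: ToolCall,
  caps: Capability[],
  witnessLog: Receipt[],
  downstream: (c: ToolCall) => void,
): Receipt {
  const receipt = gate(call, caps);
  witnessLog.push(receipt); // allowed and refused decisions are both recorded
  if (receipt.decision === "ALLOWED") downstream(call); // a refused call never reaches the tool
  return receipt;
}

// "May refund up to $100", expressed in cents.
const caps: Capability[] = [{ tool: "refund", maxAmountCents: 10_000 }];
const witnessLog: Receipt[] = [];
const downstreamLog: ToolCall[] = [];
const forward = (c: ToolCall): void => {
  downstreamLog.push(c);
};

handle({ tool: "refund", amountCents: 8_900 }, caps, witnessLog, forward);
handle({ tool: "refund", amountCents: 25_000 }, caps, witnessLog, forward);

console.log(witnessLog.map((r) => r.decision)); // [ 'ALLOWED', 'REFUSED' ]
console.log(downstreamLog.length); // 1
```

The witness log ends up with two entries while the downstream tool sees one call, which is the asymmetry the demo relies on: the refusal is recorded, but it never arrives.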
We don't rely on claims
Live proof — one command
A single command runs the entire wedge locally against a real control plane, a real stdio MCP gateway, and a real downstream tool:
npm install
npm run demo
- ALLOWED: refund of 8900 cents ($89), under the $100 ceiling; gated, witnessed, and forwarded.
- BLOCKED: refund of 25000 cents ($250), over the ceiling; gated, witnessed, and refused.
- GROUND TRUTH: the downstream tool's own log shows exactly one call; the blocked $250 refund is absent because it never arrived.
- VERIFIED: the standalone verifier checks the math over the published evidence and prints RESULT: VERIFIED.
- TAMPER FAILED: flip a single byte of the evidence and the verifier prints RESULT: VERIFICATION FAILED. Tampering is caught.
You do not have to blindly trust the server. You can verify the receipts yourself and, with independent mirroring of signed tree heads (STHs), detect server-side history rewrites. The verifier imports nothing from the server; it re-derives every step from the public spec, so anyone can re-implement it. It proves the integrity of the evidence bundle (what was decided and recorded), not that a downstream action succeeded.
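To make the "re-derive every step" idea concrete, here is a minimal sketch of a hash-chain check in the spirit of the verifier. It is an assumption-laden toy, not the project's evidence format: the `Entry` shape, the `GENESIS` value, and the `entryHash` encoding are all hypothetical, and signature and Merkle checks are left out.

```typescript
import { createHash } from "node:crypto";

// Hypothetical receipt entry: each one commits to the hash of its predecessor.
type Entry = { payload: string; prevHash: string; hash: string };

// Assumed starting value for the chain (illustrative only).
const GENESIS = "0".repeat(64);

// prevHash is always 64 hex characters, so the "\n" separator is unambiguous.
function entryHash(prevHash: string, payload: string): string {
  return createHash("sha256").update(prevHash).update("\n").update(payload).digest("hex");
}

function append(chain: Entry[], payload: string): Entry[] {
  const prevHash = chain.length > 0 ? chain[chain.length - 1].hash : GENESIS;
  return [...chain, { payload, prevHash, hash: entryHash(prevHash, payload) }];
}

// Recompute every link from the payloads alone; trust no stored hash.
function verifyChain(chain: Entry[]): "VERIFIED" | "VERIFICATION FAILED" {
  let prev = GENESIS;
  for (const e of chain) {
    if (e.prevHash !== prev || e.hash !== entryHash(prev, e.payload)) {
      return "VERIFICATION FAILED";
    }
    prev = e.hash;
  }
  return "VERIFIED";
}

let chain: Entry[] = [];
chain = append(chain, JSON.stringify({ decision: "ALLOWED", amountCents: 8900 }));
chain = append(chain, JSON.stringify({ decision: "REFUSED", amountCents: 25000 }));
console.log(verifyChain(chain)); // VERIFIED

// Change one character of the first payload and leave the stored hashes alone.
const tampered = chain.map((e, i) =>
  i === 0 ? { ...e, payload: e.payload.replace("8900", "8901") } : e,
);
console.log(verifyChain(tampered)); // VERIFICATION FAILED
```

A bare hash chain only catches edits that leave the stored hashes untouched. Someone who controls the log could rewrite a payload and recompute every later hash, which is why the head needs to be signed and mirrored by an independent party.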
See the recorded runs: gateway proof · trust-anchor proof.
How it works
AI agent / MCP host (Claude Code, Desktop, Cursor)
         │ tools/call
         ▼
┌─────────────────┐    gate + witness     ┌──────────────────────┐
│ Magenta Gateway │ ────────────────────▶ │ Control plane        │
│ (stdio proxy)   │ ◀──── allow / deny ── │ • capability gate    │
└────────┬────────┘                       │ • transparency log   │
         │ forward IF allowed             │ • /api/trust/evidence│
         ▼                                └──────────┬───────────┘
  downstream tool                                    │ signed receipts
 (refund, API, DB)                                   ▼
                                              evidence bundle ─▶ magenta-verify = VERIFIED
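The transparency log in the diagram is described as an RFC 6962 Merkle log. As a rough sketch of what a verifier would recompute, the block below implements the Merkle Tree Hash as defined in RFC 6962, section 2.1. The receipt strings are placeholders, and nothing here is confirmed to match how Magenta serializes its leaves.

```typescript
import { createHash } from "node:crypto";

// SHA-256 over the concatenation of the given byte strings.
function sha256(...parts: Buffer[]): Buffer {
  const h = createHash("sha256");
  for (const p of parts) h.update(p);
  return h.digest();
}

// RFC 6962 domain separation: 0x00 prefixes leaves, 0x01 prefixes interior nodes.
const leafHash = (leaf: Buffer): Buffer => sha256(Buffer.from([0x00]), leaf);
const nodeHash = (left: Buffer, right: Buffer): Buffer =>
  sha256(Buffer.from([0x01]), left, right);

// Merkle Tree Hash: split at k, the largest power of two strictly less than n.
function merkleRoot(leaves: Buffer[]): Buffer {
  if (leaves.length === 0) return sha256(); // hash of the empty string
  if (leaves.length === 1) return leafHash(leaves[0]);
  let k = 1;
  while (k * 2 < leaves.length) k *= 2;
  return nodeHash(merkleRoot(leaves.slice(0, k)), merkleRoot(leaves.slice(k)));
}

// Placeholder receipts; the real leaf encoding is an assumption left open here.
const receipts = ["receipt-1", "receipt-2", "receipt-3"].map((s) => Buffer.from(s));
const root = merkleRoot(receipts);
console.log(root.toString("hex"));

// Rewriting any earlier leaf produces a different root.
const rewritten = [Buffer.from("receipt-1-edited"), receipts[1], receipts[2]];
console.log(merkleRoot(rewritten).equals(root)); // false
```

Because the root commits to every leaf, a party that mirrors signed roots over time can notice when a later tree no longer agrees with an earlier one.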
See it at a glance
Understand it visually
The same flow, in pictures. Authorize what's allowed, block what isn't, record proof of both — then verify the receipt yourself.
Explainer video
Full visual set on GitHub: Visual Guide.
Honest status
Magenta Canon today is a credible, reproducible reference implementation — a proven wedge, not a finished product. A project about verifiable claims cannot make unverifiable ones.
- Proven & tested now: capability gate, witness, signed receipts, standalone verifier, stdio MCP gateway; exercised live by npm run demo and covered by the test suite.
- Local/dev proof: runs against file-backed persistence in a fresh local universe.
- stdio transport only: no Streamable HTTP, no hosted or multi-tenant gateway yet.
- Future lanes: production durability (single-writer / Postgres) and a hosted, externally-mirrored witness are explicitly future work.
This is a wedge you can run and verify today — not a claim of production readiness. Full threat model: SECURITY_MODEL.md.
Design partners
If you are building or deploying agents that take real actions — refunds, deploys, database writes, API calls with consequences — we want to talk.
Run npm run demo, read the verification spec, and tell us where it breaks for your use case.