Timeline update: the Digital Omnibus (agreed 7 May 2026, pending adoption) defers Annex III obligations to 2 Dec 2027; current law still says 2 Aug 2026.
Mechanistic interpretability for the EU AI Act

Prove what your model actually did.

Glassbox traces the circuit behind a decision, measures how faithful that explanation is, and writes the Annex IV documentation for you. One function call, on open-weight models.

Live demo
Open source · MIT core · built for GPT-2 → Llama-3, Mistral, Phi-3
Grounded in
Built on PyTorch & TransformerLens · Method anchored to ACDC (Conmy 2023) & IOI (Wang 2022) · 932 tests passing in CI · arXiv 2603.09988
The method

Proof from the model's own wiring,
not a story told afterward

Most explainability tools describe a model from the outside. Glassbox reads the inside. It ranks every attention head by the effect it has on the output, then keeps only the heads the decision depends on.

01

Discover

Attribution patching scores each attention head by its causal effect on the prediction. Three forward passes, no correlation guesswork.

15–37× faster than ACDC · 1.8s on a laptop
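As a rough illustration of the scoring step, the sketch below applies the standard first-order attribution patching estimate: a head's score is the dot product of (clean activation minus corrupted activation) with the gradient of the metric taken at the corrupted activation. The function names, the dictionary layout and the toy numbers are assumptions made for this example; they are not Glassbox internals.

```python
from typing import Dict, List, Tuple

Head = Tuple[int, int]  # (layer, head index)


def attribution_scores(
    clean: Dict[Head, List[float]],
    corrupt: Dict[Head, List[float]],
    grad: Dict[Head, List[float]],
) -> Dict[Head, float]:
    """First-order estimate of how much the metric would change if each
    head's clean activation were patched into the corrupted run."""
    scores = {}
    for head, g in grad.items():
        delta = [c - k for c, k in zip(clean[head], corrupt[head])]
        scores[head] = sum(d * gi for d, gi in zip(delta, g))
    return scores


def rank_heads(scores: Dict[Head, float]) -> List[Head]:
    """Order heads by the magnitude of their estimated effect."""
    return sorted(scores, key=lambda h: abs(scores[h]), reverse=True)


# Toy activations for two heads (illustrative values only).
clean = {(0, 0): [1.0, 2.0], (0, 1): [0.5, 0.5]}
corrupt = {(0, 0): [0.0, 0.0], (0, 1): [0.5, 0.4]}
grad = {(0, 0): [0.5, 0.25], (0, 1): [1.0, 1.0]}

scores = attribution_scores(clean, corrupt, grad)
ranking = rank_heads(scores)  # head (0, 0) ranks first on this toy data
```

In a real run the activations and gradients would come from cached forward and backward passes over the clean and corrupted prompts rather than hand-written lists.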
02

Verify

Sufficiency and comprehensiveness isolate the smallest circuit that still reproduces the behaviour, scored against full ablation. The explanation is measured, not asserted.

credit task (gpt2) · F1 0.00 · Grade C · NON-COMPLIANT — no faithful circuit
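One common way to turn the two measurements into a single score is sketched below. It assumes sufficiency is the share of the full-model metric that the circuit reproduces on its own, comprehensiveness is the share that disappears when the circuit is ablated, and F1 is their harmonic mean. These definitions and function names are assumptions for illustration; the exact formulas Glassbox uses are in its own documentation.

```python
def clamp01(x: float) -> float:
    """Keep a ratio inside [0, 1]."""
    return max(0.0, min(1.0, x))


def sufficiency(full_metric: float, circuit_only_metric: float) -> float:
    """Share of the full-model metric kept when only the circuit runs."""
    if full_metric == 0:
        return 0.0
    return clamp01(circuit_only_metric / full_metric)


def comprehensiveness(full_metric: float, circuit_ablated_metric: float) -> float:
    """Share of the full-model metric lost when the circuit is ablated."""
    if full_metric == 0:
        return 0.0
    return clamp01(1.0 - circuit_ablated_metric / full_metric)


def faithfulness_f1(suff: float, comp: float) -> float:
    """Harmonic mean: zero whenever either component is zero."""
    if suff + comp == 0:
        return 0.0
    return 2 * suff * comp / (suff + comp)


# Toy logit differences (illustrative values only).
suff = sufficiency(full_metric=4.0, circuit_only_metric=3.0)
comp = comprehensiveness(full_metric=4.0, circuit_ablated_metric=1.0)
f1 = faithfulness_f1(suff, comp)
```

Under the harmonic mean a circuit cannot score well on one component alone, which fits a result like F1 0.00 being read as "no faithful circuit".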
03

Document

Eight of the nine Annex IV sections are written straight from the model and content-hashed, so any later edit is detectable. The remaining one, the Declaration of Conformity (§8), stays in your hands.

9 sections structured · 8 auto-filled · hashed
Measured results

Numbers you can reproduce

Up to 37× faster than ACDC
15–37× across the GPT-2 family

0.704 faithfulness F1, Grade B
IOI task (gpt2), vs full ablation

8/9 Annex IV sections auto-filled
§8 declaration stays human

r = 0.009 confidence ↔ faithfulness correlation
orthogonal

Wall-clock timings from gb.analyze() on Apple M1 Pro, weights pre-loaded; ACDC baseline per Conmy et al. 2023. Full methodology, hardware specs and raw data in BENCHMARKS.md — reproduce with scripts/benchmark.py. Every approximation is disclosed in the result dict.

From prompt to evidence

One call. Any decision.

The same call works whether you're auditing a loan model, a triage classifier, or a CV screener. It picks its own counterfactual when you let it.

audit.py
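# any prompt, any domain
from transformer_lens import HookedTransformer
from glassbox import GlassboxV2, build_annex_iv_vault

# Assumption: GlassboxV2 wraps a TransformerLens HookedTransformer
model = HookedTransformer.from_pretrained("gpt2")
gb = GlassboxV2(model)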
result = gb.analyze(
    prompt="Credit score 620. Decision:",
    correct=[" approved", " approve", " yes"],
    incorrect=[" denied", " deny", " no"],
    certify="hessian",
)
result["faithfulness"]["f1"]   # 0.00 — gpt2 has no faithful credit circuit
result["evidence_tier"]["tier"]  # self-graded
vault = build_annex_iv_vault(result)
1

Point it at a model

Load any open-weight transformer. No fine-tuning, no labels, no setup.

2

Run one call

It finds the circuit, verifies the counterfactual actually moved the decision, and scores how faithful the explanation is — then grades its own evidence tier.

3

Get the vault

A regulator-ready Annex IV record, content-hashed and ready to hand over.

EU AI Act · Annex IV

The documentation, written
from the model itself

Every high-risk AI system placed on the EU market must keep Annex IV technical documentation (Article 11). Under current law that applies from 2 August 2026; the Digital Omnibus agreement of 7 May 2026 — pending formal adoption — defers Annex III systems to 2 December 2027. Either way the work is months, not weeks: Glassbox fills eight of the nine sections directly from the model's measured behaviour.

€15M or 3% of global annual turnover, whichever is higher: the penalty ceiling for documentation obligations under Article 99(4).

Countdown to 2 December 2027, the expected Annex III enforcement date under the Digital Omnibus (pending formal adoption; 2 Aug 2026 under current law).

8 of 9 Annex IV sections generated automatically; the Declaration of Conformity stays a human sign-off.
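The penalty ceiling and the two dates quoted above reduce to simple arithmetic. The sketch below is only a convenience calculation with hypothetical function names; it is not legal advice and not part of Glassbox.

```python
from datetime import date

CURRENT_LAW_DATE = date(2026, 8, 2)   # application date under current law
OMNIBUS_DATE = date(2027, 12, 2)      # proposed Annex III date, pending adoption


def documentation_penalty_ceiling(global_turnover_eur: float) -> float:
    """EUR 15M or 3% of global annual turnover, whichever is higher."""
    return max(15_000_000.0, global_turnover_eur * 3 / 100)


def days_until(deadline: date, today: date) -> int:
    """Whole days from `today` to `deadline` (negative once it has passed)."""
    return (deadline - today).days


# A company with EUR 1bn turnover: 3% exceeds the EUR 15M floor.
ceiling = documentation_penalty_ceiling(1_000_000_000)

# Extra time the deferral would add on top of the current-law date.
deferral_days = days_until(OMNIBUS_DATE, CURRENT_LAW_DATE)  # 487
```

For a live countdown, pass `date.today()` as the second argument to `days_until`.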

What you actually hand over

Every run produces a structured Annex IV record. The nine sections map to the regulation, each one grounded in the measured circuit, with a SHA-256 hash so any edit made after the fact is detectable.

  • Reproducible: the same input gives the same vault and hash
  • Faithfulness scores carried into §4, not just asserted
  • Exportable as JSON or PDF for your technical file
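The reproducibility property can be sketched with nothing but the standard library: serialise the record in a canonical form, then hash the bytes. The field names in the toy record are hypothetical and the real vault schema may differ; the point is only that identical content yields an identical SHA-256 digest and any change yields a different one.

```python
import hashlib
import json


def content_hash(record: dict) -> str:
    """SHA-256 over a canonical JSON encoding (sorted keys, fixed separators)."""
    canonical = json.dumps(
        record, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def verify(record: dict, expected_hash: str) -> bool:
    """True only if the record still matches the hash taken at creation."""
    return content_hash(record) == expected_hash


# Hypothetical record layout, for illustration only.
record = {"model": "gpt2", "section_4": {"f1": 0.0, "grade": "C"}}
digest = content_hash(record)

# Same content with a different key order hashes identically.
reordered = {"section_4": {"grade": "C", "f1": 0.0}, "model": "gpt2"}
same = content_hash(reordered) == digest

# Any edit changes the digest.
tampered = {"model": "gpt2", "section_4": {"f1": 0.9, "grade": "A"}}
detected = not verify(tampered, digest)
```

A hash on its own shows that a record has changed, not who changed it; that is the gap signed vaults and a chained audit log are meant to close.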
Art. 9 risk · Art. 13 transparency · Art. 15 accuracy · Art. 72 post-market
annex_iv_vault.json · example output
sha256 3f9a0c7e…1d4be7c1 · model gpt2-small · 1.8s
§1 · General description of the system · filled
§2 · Development process and design · filled
§3 · Monitoring, functioning and control · filled
§4 · Performance & faithfulness metrics · filled
§5 · Risk management (Article 9) · filled
§8 · Declaration of conformity · human sign-off
F1 0.00 · grade C · NON-COMPLIANT · no faithful circuit
Pricing

The core is free, forever.
Pay when compliance is on the line.

Circuit discovery and faithfulness metrics stay MIT-licensed and open. Commercial plans cover the compliance engine, hosted infrastructure, and the guarantees an audit demands.

Open source

Community

€0 forever

For researchers and engineers exploring circuits.

  • Full circuit discovery — all 21 frameworks
  • Faithfulness metrics & CLI
  • MIT core · community support on GitHub
pip install glassbox-mech-interp
For teams shipping to the EU

Pro

€499 / month · launch pricing

Hosted audits and signed evidence, without running infrastructure.

  • Hosted audit API — white-box and black-box
  • Signed Annex IV vaults (PDF + JSON)
  • CircuitDiff CI gate for every deploy
  • Email support, 2-business-day SLA
Join the waitlist
Launching before August 2026 · early members lock launch pricing
Regulated industries

Enterprise

Custom

For banks, insurers, health and HR platforms under Annex III.

  • On-prem / air-gapped Docker deployment
  • Tamper-evident audit log, SHA-256 chained
  • Onboarding with your compliance team
  • Priority support & custom SLA
Book a walkthrough

Pro launch pricing is introductory and may change at general availability; early waitlist members keep it. The open-source core (MIT) is never feature-gated — commercial plans add packaging, hosting, signing, and support, not capability.

How Glassbox compares

Faithfulness is the line
most tools can't cross

Black-box explainers describe a model from the outside and can't tell you whether the explanation is true. Circuit tools can, but they're slow. Glassbox keeps the rigour and drops the cost.

Approach | Faithfulness measured? | Needs open weights? | Speed (GPT-2)
Glassbox attribution patching | Yes (suff / comp / F1) | Yes | 1.8s
ACDC (Conmy 2023) | Yes | Yes | ~65s · ~37× slower
SHAP / LIME (black-box) | No guarantee | No (works on APIs) | varies
Confidence scores | No (r = 0.009) | No | instant

Each row is a different trade-off, not a verdict. Black-box methods are the right tool when you only have API access — Glassbox ships a black-box auditor for exactly that case. Where the weights are available, it gives you ACDC-grade circuits at a fraction of the runtime, and unlike a confidence score, an explanation whose faithfulness is actually measured.

Model coverage

Validated GPT-2 → 12B.
Architected for more.

Ten model series across nine distinct architectures are validated end-to-end (82M → 12B), with grouped-query attention and RMSNorm handled correctly: GPT-2, Pythia/GPT-NeoX, GPT-Neo, OPT, Llama-3, Mistral, Gemma-2, Qwen2/2.5, Yi (Llama architecture, so not counted as a separate family) and Phi-3. Beyond ~13B, gradient-based attribution needs multi-GPU memory; that path is implemented but not yet validated live.

Class | Parameter range | Strategy | Status
small / medium | 82M to 12B | standard | validated (9 families)
large | 13B to 70B | checkpoint | needs multi-GPU · pending validation
xlarge / xxlarge | 70B to 200B+ | checkpoint + offload | implemented · unverified (cluster)
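Read as a lookup, the table above maps a parameter count to a memory strategy. The function below is a hypothetical helper written for this page, not a Glassbox API. Its boundary handling is an assumption: anything under 13B is treated as standard, and a model of exactly 70B is placed in the checkpoint tier.

```python
def pick_strategy(n_params: float) -> str:
    """Map a parameter count to the strategy named in the coverage table.

    Boundary handling is an assumption made for this sketch.
    """
    if n_params <= 0:
        raise ValueError("n_params must be positive")
    if n_params < 13e9:
        return "standard"
    if n_params <= 70e9:
        return "checkpoint"
    return "checkpoint + offload"


strategies = {
    "gpt2 (124M)": pick_strategy(124e6),
    "8B model": pick_strategy(8e9),
    "70B model": pick_strategy(70e9),
    "180B model": pick_strategy(180e9),
}
```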

Reproducible in VALIDATION_LOG.md; comprehensiveness is specificity-checked against a random same-size circuit at every scale. Circuit-level analysis requires open weights — closed API models (Claude, GPT, Gemini) can only receive black-box documentation support, because faithfulness can't be measured without access to the weights.
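A specificity check of this kind can be sketched as follows: score the discovered circuit, score several random circuits of the same size, and compare. The function name, the sample count and the toy scoring function are assumptions for illustration; in practice `comp_fn` would ablate the given heads in the model and measure the drop in the metric.

```python
import random
from typing import Callable, Dict, Iterable, List, Tuple

Head = Tuple[int, int]  # (layer, head index)


def specificity_check(
    circuit: List[Head],
    all_heads: List[Head],
    comp_fn: Callable[[Iterable[Head]], float],
    n_samples: int = 20,
    seed: int = 0,
) -> Dict[str, float]:
    """Compare the circuit's comprehensiveness with random same-size circuits."""
    rng = random.Random(seed)
    circuit_score = comp_fn(circuit)
    baseline = [
        comp_fn(rng.sample(all_heads, len(circuit))) for _ in range(n_samples)
    ]
    random_mean = sum(baseline) / len(baseline)
    return {
        "circuit": circuit_score,
        "random_mean": random_mean,
        "margin": circuit_score - random_mean,
    }


# Toy model: two heads carry most of the effect (illustrative weights only).
all_heads = [(layer, head) for layer in range(2) for head in range(4)]
circuit = [(0, 1), (1, 2)]
weight = {h: (0.4 if h in circuit else 0.2 / 6) for h in all_heads}


def toy_comp(ablated: Iterable[Head]) -> float:
    """Pretend comprehensiveness: total weight of the ablated heads."""
    return sum(weight[h] for h in ablated)


result = specificity_check(circuit, all_heads, toy_comp)
```

A margin near zero would mean that ablating any heads at all hurts about as much, so the discovered circuit would not be specific to the behaviour.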

Any prompt, any domain

Not just toy tasks

Early circuit tools only handled name-swap puzzles. Glassbox picks a counterfactual that fits the prompt in front of it, and every strategy traces to a published method.

Credit

Loan affordability and limit decisions

Annex III §5(b)

Medical triage

Priority and severity classification

high-risk

Employment

CV screening and candidate ranking

Annex III §4

Corruption strategies

name-swap, antonym, semantic negation, random-token fallback

auto
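To make the "auto" mode concrete, here is a minimal sketch that tries the four strategies in the order listed and falls back to a random token when nothing else applies. The lookup tables, the function names and the negation rule are all hypothetical and far simpler than a production implementation would need to be.

```python
import random
import re
from typing import Dict, Optional, Tuple

# Hypothetical lookup tables, for illustration only.
NAME_SWAPS = {"Mary": "John", "John": "Mary"}
ANTONYMS = {"high": "low", "low": "high", "approved": "denied", "denied": "approved"}
FALLBACK_TOKENS = ("the", "of", "and")


def _swap_words(prompt: str, table: Dict[str, str]) -> Optional[str]:
    """Replace every whole-word match in one pass, or return None if no match."""
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, table)) + r")\b")
    if not pattern.search(prompt):
        return None
    return pattern.sub(lambda m: table[m.group(0)], prompt)


def pick_counterfactual(prompt: str, seed: int = 0) -> Tuple[str, str]:
    """Return (corrupted prompt, strategy name), trying strategies in order."""
    swapped = _swap_words(prompt, NAME_SWAPS)
    if swapped is not None:
        return swapped, "name-swap"
    swapped = _swap_words(prompt, ANTONYMS)
    if swapped is not None:
        return swapped, "antonym"
    if " is " in prompt:
        return prompt.replace(" is ", " is not ", 1), "semantic negation"
    words = prompt.split()
    if not words:
        raise ValueError("prompt must contain at least one token")
    rng = random.Random(seed)
    i = rng.randrange(len(words))
    words[i] = rng.choice([t for t in FALLBACK_TOKENS if t != words[i]])
    return " ".join(words), "random-token"


ioi = pick_counterfactual("When Mary and John went out, John gave a drink to")
income = pick_counterfactual("The applicant's income is high.")
triage = pick_counterfactual("The patient is stable.")
credit = pick_counterfactual("Credit score 620. Decision:")
```

Whichever strategy is chosen, the counterfactual still has to be checked against the model: if it does not move the decision, the circuit found from it says little.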

A confident answer is not the same as a correct one.

Across our runs, model confidence and circuit faithfulness barely correlate, at r = 0.009. That's why a compliance auditor needs the circuit, not the score.
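For readers who want to run the same check on their own results, Pearson's r between per-prompt confidence and per-prompt faithfulness can be computed as below. The toy lists are illustrative only and are not the benchmark data behind the r = 0.009 figure.

```python
from math import sqrt
from typing import Sequence


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two samples of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / sqrt(var_x * var_y)


# Toy data: the second series rises and falls independently of the first.
confidence = [1.0, 2.0, 3.0, 4.0]
faithfulness = [1.0, -1.0, -1.0, 1.0]
r_uncorrelated = pearson_r(confidence, faithfulness)

# Sanity check: a perfectly linear relationship gives r close to 1.
r_linear = pearson_r([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
```

An r near zero means confidence carries almost no linear information about faithfulness, which is the point the figure above makes.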
Research & releases

The work behind the tool

Be audit-ready before everyone else.

The deadline may move; the work doesn't shrink. The open-source core is on PyPI today. Pro — hosted audits, signed vaults, and the CI gate — opens before enforcement. Waitlist members get onboarded first and keep launch pricing.

Prefer to talk first? Book a 20-minute walkthrough — we'll classify your system against Annex III and show a live audit on a comparable model.

Join the Pro waitlist

Privacy notice

The waitlist form collects your email address (and anything you choose to add) so we can contact you about Glassbox access and onboarding — that's the only purpose. Data controller: Ajay Pravin Mahale (contact: mahale.ajay01@gmail.com). Processing happens on Vercel (hosting) and Resend (email delivery) acting as processors. We don't sell or share your address, don't use it for third-party marketing, and delete it on request or once it's no longer needed. You can ask for access, correction, or deletion at any time by emailing the address above (GDPR Arts. 15–17). This site sets no tracking cookies.