Glassbox traces the circuit behind a decision, measures how faithful that explanation is, and writes the Annex IV documentation for you. One function call, on open-weight models.
Attribution patching, ACDC, causal scrubbing, DAS, Hessian bounds — 21 frameworks behind one Python API. pip install and audit in minutes.
I own compliance

Eight of nine sections are drafted from the model's measured behaviour and content-hashed against quiet edits. No notebook required: your engineers run it, you review it.
Most explainability tools describe a model from the outside. Glassbox reads the inside. It ranks every attention head by the effect it has on the output, then keeps only the heads the decision depends on.
Attribution patching scores each attention head by its causal effect on the prediction. Three forward passes, no correlation guesswork.
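As a rough illustration of the idea behind attribution patching, the sketch below uses a toy model in which each "head" emits a single scalar and the decision metric is a nonlinear readout. It shows the usual first-order formulation: the effect of patching a clean activation into the corrupted run is estimated as (clean minus corrupted activation) times the gradient of the metric at the corrupted run. The weights, activations, and function names are all made up for the example. This is not Glassbox's implementation, and it does not model how Glassbox counts its passes.

```python
import math

# Toy stand-in for a model: each "head" emits one scalar activation and the
# metric is a nonlinear readout of their weighted sum. All values are made up.
WEIGHTS = [0.9, -0.4, 0.05, 0.6]

def metric(acts):
    """Scalar decision metric (think of a squashed logit difference)."""
    return math.tanh(sum(w * a for w, a in zip(WEIGHTS, acts)))

def metric_grad(acts):
    """Analytic gradient of the metric with respect to each head activation."""
    z = sum(w * a for w, a in zip(WEIGHTS, acts))
    return [(1.0 - math.tanh(z) ** 2) * w for w in WEIGHTS]

def attribution_scores(clean, corrupt):
    """First-order estimate of patching each clean activation into the
    corrupted run: (a_clean - a_corrupt) * dM/da evaluated at the corrupted run."""
    grad = metric_grad(corrupt)
    return [(c - k) * g for c, k, g in zip(clean, corrupt, grad)]

def exact_patching_scores(clean, corrupt):
    """Reference for comparison: patch one head at a time and re-run the metric."""
    base = metric(corrupt)
    scores = []
    for h in range(len(corrupt)):
        patched = list(corrupt)
        patched[h] = clean[h]
        scores.append(metric(patched) - base)
    return scores

clean_acts = [1.0, 0.5, 0.8, 0.2]
corrupt_acts = [0.1, 0.4, 0.7, 0.1]

approx = attribution_scores(clean_acts, corrupt_acts)
exact = exact_patching_scores(clean_acts, corrupt_acts)

# Rank heads by the magnitude of their estimated effect, largest first.
ranking = sorted(range(len(approx)), key=lambda h: abs(approx[h]), reverse=True)
exact_ranking = sorted(range(len(exact)), key=lambda h: abs(exact[h]), reverse=True)
```

In this toy the estimate and the one-head-at-a-time reference agree on the ranking, while the estimate for the largest head overshoots because the metric saturates. That gap is why a gradient-based estimate counts as an approximation.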
Sufficiency and comprehensiveness isolate the smallest circuit that still reproduces the behaviour, scored against full ablation. The explanation is measured, not asserted.
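One common way to turn these two quantities into numbers is sketched below. The definitions are assumptions made for illustration: sufficiency is the share of the full-model metric that the circuit alone recovers, comprehensiveness is the share lost when only the circuit is ablated, and F1 is their harmonic mean. The function name and arguments are hypothetical, and Glassbox's exact formulas may differ.

```python
def faithfulness(full, circuit_only, circuit_ablated):
    """Score a circuit from three measurements of the same decision metric
    (for example a logit difference).

    full:            metric on the intact model
    circuit_only:    metric with everything outside the circuit ablated
    circuit_ablated: metric with only the circuit ablated
    """
    if full == 0:
        raise ValueError("full-model metric is zero; the ratios are undefined")

    def clamp(x):
        # Keep scores in [0, 1] so the harmonic mean stays well defined.
        return max(0.0, min(1.0, x))

    sufficiency = clamp(circuit_only / full)
    comprehensiveness = clamp(1.0 - circuit_ablated / full)
    total = sufficiency + comprehensiveness
    f1 = 0.0 if total == 0 else 2 * sufficiency * comprehensiveness / total
    return {
        "sufficiency": sufficiency,
        "comprehensiveness": comprehensiveness,
        "f1": f1,
    }

# Illustrative numbers only, not measurements.
scores = faithfulness(full=4.0, circuit_only=3.2, circuit_ablated=1.0)
```

Under these definitions a circuit only scores well when it both reproduces the behaviour on its own and breaks the behaviour when removed. A circuit that does neither gets an F1 of zero.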
Eight of the nine Annex IV sections are written straight from the model and content-hashed, so nothing can be edited quietly. The ninth, the Declaration of Conformity, stays in your hands.
Wall-clock timings from gb.analyze() on Apple M1 Pro, weights pre-loaded; ACDC baseline per Conmy et al. 2023. Full methodology, hardware specs and raw data in BENCHMARKS.md — reproduce with scripts/benchmark.py. Every approximation is disclosed in the result dict.
The same call works whether you're auditing a loan model, a triage classifier, or a CV screener. It picks its own counterfactual when you let it.
```python
# any prompt, any domain
from glassbox import GlassboxV2, build_annex_iv_vault

# `model` is an already-loaded open-weight transformer (loading not shown)
gb = GlassboxV2(model)
result = gb.analyze(
    prompt="Credit score 620. Decision:",
    correct=[" approved", " approve", " yes"],
    incorrect=[" denied", " deny", " no"],
    certify="hessian",
)

result["faithfulness"]["f1"]     # 0.00: gpt2 has no faithful credit circuit
result["evidence_tier"]["tier"]  # self-graded

vault = build_annex_iv_vault(result)
```
Load any open-weight transformer. No fine-tuning, no labels, no setup.
It finds the circuit, verifies the counterfactual actually moved the decision, and scores how faithful the explanation is — then grades its own evidence tier.
A regulator-ready Annex IV record, content-hashed and ready to hand over.
Every high-risk AI system placed on the EU market must keep Annex IV technical documentation (Article 11). Under current law that applies from 2 August 2026; the Digital Omnibus agreement of 7 May 2026 — pending formal adoption — defers Annex III systems to 2 December 2027. Either way the work is months, not weeks: Glassbox fills eight of the nine sections directly from the model's measured behaviour.
Every run produces a structured Annex IV record. The nine sections map to the regulation; the eight that Glassbox drafts are grounded in the measured circuit and carry a SHA-256 hash, so any later edit to the evidence is detectable.
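To make the content-hashing idea concrete, here is a minimal sketch using Python's standard `hashlib` and `json` modules. The section names, fields, and the `seal` / `tampered` helpers are hypothetical and do not reflect the actual vault format. The sketch assumes each section body is JSON-serialisable and hashes a canonical encoding so that key order does not affect the digest.

```python
import hashlib
import json

def content_hash(body):
    """SHA-256 over a canonical JSON encoding, so key order does not matter."""
    canonical = json.dumps(body, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

def seal(sections):
    """Attach a per-section hash to each named section body."""
    return {
        name: {"body": body, "sha256": content_hash(body)}
        for name, body in sections.items()
    }

def tampered(sealed):
    """Names of sections whose body no longer matches its stored hash."""
    return [
        name
        for name, entry in sealed.items()
        if content_hash(entry["body"]) != entry["sha256"]
    ]

# Hypothetical section contents, for illustration only.
record = seal({
    "section_2": {"circuit_heads": [[9, 6], [10, 0]], "f1": 0.64},
    "section_3": {"counterfactual": "name-swap"},
})
untouched = tampered(record)

record["section_2"]["body"]["f1"] = 0.99  # a quiet edit after sealing
flagged = tampered(record)
```

A hash stored next to the content only detects accidental or careless edits, because anyone who can edit the body can also recompute the digest. Tamper evidence in practice depends on keeping or signing the digests somewhere the editor cannot reach.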
Circuit discovery and faithfulness metrics stay MIT-licensed and open. Commercial plans cover the compliance engine, hosted infrastructure, and the guarantees an audit demands.
For researchers and engineers exploring circuits.
Hosted audits and signed evidence, without running infrastructure.
For banks, insurers, health and HR platforms under Annex III.
Pro launch pricing is introductory and may change at general availability; early waitlist members keep it. The open-source core (MIT) is never feature-gated — commercial plans add packaging, hosting, signing, and support, not capability.
Black-box explainers describe a model from the outside and can't tell you whether the explanation is true. Circuit tools can, but they're slow. Glassbox keeps the rigour and drops the cost.
| Approach | Faithfulness measured? | Needs open weights? | Speed (GPT-2) |
|---|---|---|---|
| Glassbox attribution patching | Yes (suff / comp / F1) | Yes | 1.8s |
| ACDC (Conmy 2023) | Yes | Yes | ~65s · ~37× slower |
| SHAP / LIME (black-box) | No guarantee | No (works on APIs) | varies |
| Confidence scores | No (r = 0.009) | No | instant |
Each row is a different trade-off, not a verdict. Black-box methods are the right tool when you only have API access — Glassbox ships a black-box auditor for exactly that case. Where the weights are available, it gives you ACDC-grade circuits at a fraction of the runtime, and unlike a confidence score, an explanation whose faithfulness is actually measured.
Nine architecture families are validated end-to-end (82M to 12B parameters), with grouped-query attention and RMSNorm handled correctly: GPT-2, Pythia/GPT-NeoX, GPT-Neo, OPT, Llama-3, Mistral, Gemma-2, Qwen2/2.5, Yi (Llama-architecture) and Phi-3. That is ten model series across nine distinct architectures. Beyond ~13B, gradient-based attribution needs multi-GPU memory; that path is implemented but not yet validated live.
| Class | Parameter range | Strategy | Status |
|---|---|---|---|
| small / medium | 82M to 12B | standard | validated — 9 families |
| large | 13B to 70B | checkpoint | needs multi-GPU · pending validation |
| xlarge / xxlarge | 70B to 200B+ | checkpoint + offload | implemented · unverified (cluster) |
Reproducible in VALIDATION_LOG.md; comprehensiveness is specificity-checked against a random same-size circuit at every scale. Circuit-level analysis requires open weights — closed API models (Claude, GPT, Gemini) can only receive black-box documentation support, because faithfulness can't be measured without access to the weights.
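The size classes in the table above could be routed with something like the sketch below. The function name is hypothetical, and the exact cut-offs are assumptions: the table leaves a gap between 12B and 13B and lists 70B in two rows, so this sketch treats anything under 13B as small / medium and puts 70B itself in the large class.

```python
def select_strategy(n_params):
    """Map a parameter count to a (size class, strategy) pair from the table.

    The boundary handling is an assumption made for this sketch.
    """
    if n_params <= 0:
        raise ValueError("parameter count must be positive")
    if n_params < 13e9:
        return ("small / medium", "standard")
    if n_params <= 70e9:
        return ("large", "checkpoint")
    return ("xlarge / xxlarge", "checkpoint + offload")

gpt2_class = select_strategy(124_000_000)
llama_70b_class = select_strategy(70_000_000_000)
cluster_class = select_strategy(180_000_000_000)
```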
Early circuit tools only handled name-swap puzzles. Glassbox picks a counterfactual that fits the prompt in front of it, and every strategy traces to a published method.
Loan affordability and limit decisions (Annex III §5(b))

Priority and severity classification (high-risk)

CV screening and candidate ranking (Annex III §4)

name-swap, antonym, semantic negation, random-token fallback (auto)

A confident answer is not the same as a correct one.
The deadline may move; the work doesn't shrink. The open-source core is on PyPI today. Pro — hosted audits, signed vaults, and the CI gate — opens before enforcement. Waitlist members get onboarded first and keep launch pricing.
The waitlist form collects your email address (and anything you choose to add) so we can contact you about Glassbox access and onboarding — that's the only purpose. Data controller: Ajay Pravin Mahale (contact: mahale.ajay01@gmail.com). Processing happens on Vercel (hosting) and Resend (email delivery) acting as processors. We don't sell or share your address, don't use it for third-party marketing, and delete it on request or once it's no longer needed. You can ask for access, correction, or deletion at any time by emailing the address above (GDPR Arts. 15–17). This site sets no tracking cookies.