v3.4.0  ·  MIT License  ·  Open Source

The compliance
audit your model
cannot fake.

Glassbox traces the exact attention heads causally responsible for any transformer prediction. Structured output for EU AI Act Annex IV — in 1.2 seconds, 3 forward passes.

$ pip install glassbox-mech-interp
glassbox — compliance analysis
Faster than ACDC baseline  ·  1.2s circuit discovery on CPU  ·  3 forward passes total  ·  9/9 Annex IV sections covered  ·  EU AI Act enforcement Aug '26
Core Research Finding
r = 0.009

Correlation between a model's output confidence and its internal faithfulness.

A model can output "Mary" with 94% certainty while the attention circuit actually responsible for that answer has nothing to do with genuine reasoning. Confidence scores cannot catch this. Glassbox can.
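One way to see what an r near zero means here is to compute a Pearson correlation on toy numbers. A minimal sketch with illustrative data (not the study's dataset), where output confidence is uniformly high while circuit faithfulness varies freely:

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Illustrative only: confidence clusters near 0.94 regardless of whether
# the underlying circuit is faithful, so the correlation comes out ~0.
confidence   = [0.94, 0.91, 0.96, 0.93, 0.95, 0.92]
faithfulness = [0.10, 0.90, 0.90, 0.10, 0.50, 0.50]
r = pearson_r(confidence, faithfulness)
```

For this toy sample r is essentially zero: confidence carries no signal about internal faithfulness, which is the point of the finding.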

How it works

Three steps from prompt
to compliance report.

01
Install and point at your model.

Works with any TransformerLens-compatible model — GPT-2, GPT-Neo, Llama, Mistral, Pythia. No proprietary dependencies, no cloud required.

$ pip install glassbox-mech-interp
02
Run one function call.

Pass a prompt and a contrastive token pair. Glassbox runs attribution patching and greedy circuit discovery in O(3 + 2p) forward passes.

analyze.py
from glassbox import GlassboxV2
from transformer_lens import HookedTransformer

gb = GlassboxV2(HookedTransformer.from_pretrained("gpt2"))

result = gb.analyze(
    prompt    = "When Mary and John went to the store, John gave a drink to",
    correct   = " Mary",
    incorrect = " John",
)
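Attribution patching scores every candidate head from two cached runs and one backward pass: the causal effect of restoring a head's clean activation is approximated by the first-order term (clean − corrupt) × gradient. A toy sketch of that scoring step plus the greedy ranking, with made-up activation values (not the Glassbox internals):

```python
# Toy per-head activations, keyed by (layer, head). In practice, clean
# and corrupt come from cached forward passes on the prompt and a
# corrupted variant; grad comes from one backward pass on the metric.
clean   = {(9, 9): 0.82, (9, 6): 0.55, (10, 0): 0.31, (4, 2): 0.40}
corrupt = {(9, 9): 0.05, (9, 6): 0.10, (10, 0): 0.12, (4, 2): 0.39}
grad    = {(9, 9): 1.10, (9, 6): 0.90, (10, 0): 1.30, (4, 2): 0.20}

# First-order attribution score per head: (clean - corrupt) * grad.
scores = {h: (clean[h] - corrupt[h]) * grad[h] for h in clean}

# Greedy circuit discovery: rank heads by |score| and keep the ones
# above a threshold (the 0.1 cutoff here is illustrative).
ranked  = sorted(scores, key=lambda h: -abs(scores[h]))
circuit = [h for h in ranked if abs(scores[h]) > 0.1]
```

With these numbers the three surviving heads are (9, 9), (9, 6), and (10, 0), mirroring the circuit shown in the output panel below; head (4, 2) barely moves between runs and is pruned.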
03
Get a structured compliance package.

Minimum faithful circuit, three faithfulness metrics with 95% CIs, and a complete Annex IV evidence draft ready for regulatory submission.

output
# Circuit: causal heads identified
result["circuit"]     → [(9,9), (9,6), (10,0)]

# Faithfulness metrics + 95% CIs
result["faithfulness"] → { sufficiency: 1.00,
                            comprehensiveness: 0.47,
                            f1: 0.64, grade: "B" }

# Annex IV evidence package
result["annex_iv"]     → 9-section structured dict
                            mapped to EU AI Act articles
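The reported F1 is consistent with taking the harmonic mean of sufficiency and comprehensiveness. A one-line sketch under that assumption (the formula is inferred from the numbers shown, not from documented internals):

```python
def faithfulness_f1(sufficiency: float, comprehensiveness: float) -> float:
    """Harmonic mean of the two faithfulness metrics."""
    return 2 * sufficiency * comprehensiveness / (sufficiency + comprehensiveness)

# 2 * 1.00 * 0.47 / (1.00 + 0.47) = 0.639..., reported as 0.64
f1 = faithfulness_f1(1.00, 0.47)
```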
EU AI Act

Annex IV documentation,
automatically generated.

Enforcement starts August 2026. Article 11 requires technical documentation for every high-risk AI system. Glassbox produces all nine sections.

  • Identifies which specific components (layer, head) causally drove a prediction — maps to Annex IV §7 explainability
  • Sufficiency score quantifies how much of the prediction the circuit explains — direct evidence for Article 13(1)
  • Comprehensiveness score measures causal necessity — distinguishes genuine explanations from post-hoc correlation
  • Structured JSON output suitable for direct import into GRC systems and audit documentation
  • Bootstrap CIs provide statistical grounding — regulators assess confidence, not just point estimates
  • Every approximation is explicitly disclosed — meets the EU AI Act's transparency requirements without ambiguity
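A bootstrap 95% CI of the kind listed above can be sketched as percentile bounds over resampled per-example scores. A generic sketch, not the Glassbox implementation, with illustrative sufficiency scores:

```python
import random

def bootstrap_ci(scores, n_boot=2000, alpha=0.05, seed=0):
    """Percentile-bootstrap confidence interval for the mean of scores."""
    rng = random.Random(seed)
    n = len(scores)
    means = sorted(
        sum(rng.choice(scores) for _ in range(n)) / n
        for _ in range(n_boot)
    )
    lo = means[int((alpha / 2) * n_boot)]          # 2.5th percentile
    hi = means[int((1 - alpha / 2) * n_boot) - 1]  # 97.5th percentile
    return lo, hi

# Per-example sufficiency scores from repeated analyses (illustrative).
lo, hi = bootstrap_ci([0.98, 1.00, 0.95, 1.00, 0.97, 0.99, 1.00, 0.96])
```

Reporting the interval rather than the point estimate is what lets a regulator assess confidence, as the last bullets note.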
EU AI Act enforcement, August 2026: high-risk systems in finance, healthcare, HR, and legal must comply with Articles 11–15 and Annex IV.
Annex IV Report  ·  Generated

Compliance Grade: B (Conditionally Compliant)
Faithfulness F1: 0.64
Sufficiency: 1.00
Comprehensiveness: 0.47
Circuit heads: (9,9) (9,6) (10,0)
Annex IV sections: 9 / 9

Ready to audit your model?

No registration. No API key. Open source, runs locally. The live demo generates a full compliance report in 60 seconds.