Glassbox traces the exact attention heads causally responsible for any transformer prediction. Structured output for EU AI Act Annex IV — in 1.2 seconds, 3 forward passes.
Correlation between a model's output confidence and its internal faithfulness.
A model can output " Mary" with 94% confidence while the attention circuit that actually produced the answer has nothing to do with genuine reasoning. Confidence scores cannot catch this. Glassbox can.
Works with any TransformerLens-compatible model — GPT-2, GPT-Neo, Llama, Mistral, Pythia. No proprietary dependencies, no cloud required.
Pass a prompt and a contrastive token pair. Glassbox runs attribution patching and greedy circuit discovery in 3 + 2p forward passes.
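The core idea can be sketched in a few lines. This is a minimal illustration with synthetic tensors, not Glassbox's implementation: attribution patching estimates each head's causal effect to first order from one clean pass, one corrupted pass, and one backward pass, and the greedy step is shown here as a simple top-k ranking. All names (`clean_acts`, `grads`, `circuit`) are hypothetical.

```python
import numpy as np

# Synthetic data: one scalar activation per attention head (layers x heads).
rng = np.random.default_rng(0)
n_layers, n_heads = 12, 12
clean_acts = rng.normal(size=(n_layers, n_heads))    # from the clean prompt
corrupt_acts = rng.normal(size=(n_layers, n_heads))  # from the corrupted prompt
grads = rng.normal(size=(n_layers, n_heads))         # d(logit diff)/d(activation) on the clean run

# Attribution patching: first-order estimate of each head's effect on the
# logit difference between the correct and incorrect tokens.
attribution = (clean_acts - corrupt_acts) * grads

# Greedy circuit discovery, reduced here to ranking heads by |effect|
# and keeping the strongest candidates.
ranked = sorted(
    ((layer, head) for layer in range(n_layers) for head in range(n_heads)),
    key=lambda lh: -abs(attribution[lh]),
)
circuit = ranked[:3]  # top-3 candidate heads
```

The real pipeline then verifies each candidate causally, which is where the extra forward passes per head come from.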
from glassbox import GlassboxV2
from transformer_lens import HookedTransformer

gb = GlassboxV2(HookedTransformer.from_pretrained("gpt2"))
result = gb.analyze(
    prompt="When Mary and John went to the store, John gave a drink to",
    correct=" Mary",
    incorrect=" John",
)
A minimal faithful circuit, three faithfulness metrics with 95% confidence intervals, and a complete Annex IV evidence draft ready for regulatory submission.
# Circuit: causal heads identified
result["circuit"]
# → [(9, 9), (9, 6), (10, 0)]

# Faithfulness metrics + 95% CIs
result["faithfulness"]
# → {"sufficiency": 1.00, "comprehensiveness": 0.47, "f1": 0.64, "grade": "B"}

# Annex IV evidence package
result["annex_iv"]
# → nine-section structured dict mapped to EU AI Act articles
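As a sanity check on how metrics like these compose: if f1 is the harmonic mean of sufficiency and comprehensiveness (an assumption; the source does not state the formula), the numbers are consistent.

```python
# Assumption: f1 is the harmonic mean of sufficiency and comprehensiveness.
def f1(sufficiency: float, comprehensiveness: float) -> float:
    return 2 * sufficiency * comprehensiveness / (sufficiency + comprehensiveness)

print(round(f1(1.00, 0.47), 2))  # 0.64, matching the example values
```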
Enforcement starts August 2026. Article 11 requires technical documentation for every high-risk AI system. Glassbox produces all nine sections.
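For orientation, the nine Annex IV documentation points roughly cover the areas below. These headings are paraphrased summaries, not legal text, and the mapping is illustrative; consult the regulation itself for the authoritative wording.

```python
# Paraphrased summary of the nine Annex IV documentation points (EU AI Act).
# Not legal text; keys and wording are illustrative only.
ANNEX_IV_SECTIONS = {
    1: "General description of the AI system",
    2: "Detailed description of system elements and development process",
    3: "Information on monitoring, functioning and control of the system",
    4: "Appropriateness of the performance metrics",
    5: "Description of the risk management system",
    6: "Relevant changes made through the system's lifecycle",
    7: "Harmonised standards and other specifications applied",
    8: "Copy of the EU declaration of conformity",
    9: "Post-market monitoring plan",
}
```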
No registration. No API key. Open source, runs locally. The live demo generates a full compliance report in 60 seconds.