Glassbox traces the exact attention heads causally responsible for any transformer prediction. Structured output for EU AI Act Annex IV — in 1.2 seconds, 3 forward passes.
Correlation between a model's output confidence and its internal faithfulness.
A model can output " Mary" with 94% confidence while the attention circuit that actually produced the answer has nothing to do with genuine reasoning. Confidence scores cannot catch this. Glassbox can.
Works with any TransformerLens-compatible model — GPT-2, GPT-Neo, Llama, Mistral, Pythia. No proprietary dependencies, no cloud required.
Pass a prompt and a contrastive token pair. Glassbox runs attribution patching and greedy circuit discovery in 3 + 2p forward passes.
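The core idea can be sketched in a few lines. This is a minimal illustration with synthetic tensors, not Glassbox's implementation: attribution patching estimates each head's causal effect to first order from one clean pass, one corrupted pass, and one backward pass, and the greedy step is shown here as a simple top-k ranking. All names (`clean_acts`, `grads`, `circuit`) are hypothetical.

```python
import numpy as np

# Synthetic data: one scalar activation per attention head (layers x heads).
rng = np.random.default_rng(0)
n_layers, n_heads = 12, 12
clean_acts = rng.normal(size=(n_layers, n_heads))    # from the clean prompt
corrupt_acts = rng.normal(size=(n_layers, n_heads))  # from the corrupted prompt
grads = rng.normal(size=(n_layers, n_heads))         # d(logit diff)/d(activation) on the clean run

# Attribution patching: first-order estimate of each head's effect on the
# logit difference between the correct and incorrect tokens.
attribution = (clean_acts - corrupt_acts) * grads

# Greedy circuit discovery, reduced here to ranking heads by |effect|
# and keeping the strongest candidates.
ranked = sorted(
    ((layer, head) for layer in range(n_layers) for head in range(n_heads)),
    key=lambda lh: -abs(attribution[lh]),
)
circuit = ranked[:3]  # top-3 candidate heads
```

The real pipeline then verifies each candidate causally, which is where the extra forward passes per head come from.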
from glassbox import GlassboxV2
from transformer_lens import HookedTransformer

gb = GlassboxV2(HookedTransformer.from_pretrained("gpt2"))
result = gb.analyze(
    prompt="When Mary and John went to the store, John gave a drink to",
    correct=" Mary",
    incorrect=" John",
)
A minimal faithful circuit, three faithfulness metrics with 95% confidence intervals, and a complete Annex IV evidence draft ready for regulatory submission.
# Circuit: causal heads identified
result["circuit"]
# → [(9, 9), (9, 6), (10, 0)]

# Faithfulness metrics + 95% CIs
result["faithfulness"]
# → {"sufficiency": 1.00, "comprehensiveness": 0.47, "f1": 0.64, "grade": "B"}

# Annex IV evidence package
result["annex_iv"]
# → nine-section structured dict mapped to EU AI Act articles
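As a sanity check on how metrics like these compose: if f1 is the harmonic mean of sufficiency and comprehensiveness (an assumption; the source does not state the formula), the numbers are consistent.

```python
# Assumption: f1 is the harmonic mean of sufficiency and comprehensiveness.
def f1(sufficiency: float, comprehensiveness: float) -> float:
    return 2 * sufficiency * comprehensiveness / (sufficiency + comprehensiveness)

print(round(f1(1.00, 0.47), 2))  # 0.64, matching the example values
```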
Enforcement starts August 2026. Article 11 requires technical documentation for every high-risk AI system. Glassbox produces all nine sections.
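For orientation, the nine Annex IV documentation points roughly cover the areas below. These headings are paraphrased summaries, not legal text, and the mapping is illustrative; consult the regulation itself for the authoritative wording.

```python
# Paraphrased summary of the nine Annex IV documentation points (EU AI Act).
# Not legal text; keys and wording are illustrative only.
ANNEX_IV_SECTIONS = {
    1: "General description of the AI system",
    2: "Detailed description of system elements and development process",
    3: "Information on monitoring, functioning and control of the system",
    4: "Appropriateness of the performance metrics",
    5: "Description of the risk management system",
    6: "Relevant changes made through the system's lifecycle",
    7: "Harmonised standards and other specifications applied",
    8: "Copy of the EU declaration of conformity",
    9: "Post-market monitoring plan",
}
```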
No registration. No API key. Open source, runs locally. The live demo generates a full compliance report in 60 seconds.