#early-exit

1 experiment

EXP-003 · 2026-04-07

Transformer "Noise Layers" Contain Massive Hidden Information — 92.8% Probe Accuracy Where Output Head Gets 2.8%

When a transformer's output head (lm_head) gets near-zero accuracy at intermediate layers, is next-token information genuinely absent, or is it present in a different geometric basis that the output head can't read?

CORRECTION: The 92.8% probe accuracy was an artifact of overfitting — a 1536-dim linear probe on only 356 tokens will me…
"Noise layers" still contain more information than the output head can read. Even with corrected methodology, the traine…

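The overfitting issue named in the correction can be illustrated with a minimal sketch. It is not the experiment's code: it uses synthetic Gaussian "activations" and purely random labels, and it fits the probe by least squares rather than by whatever training procedure the experiment used. The function name and the `n_classes` value are hypothetical; only the 1536-dim and 356-token figures come from the summary above. With more probe dimensions than tokens, the linear system is underdetermined, so the probe can memorize its training tokens even though the labels carry no information at all.

```python
import numpy as np


def probe_accuracy_on_random_labels(n_tokens=356, dim=1536, n_classes=50, seed=0):
    """Fit a linear probe to random labels and report (train_acc, test_acc).

    Assumptions (not from the source): Gaussian features, uniformly random
    labels, a least-squares probe on one-hot targets, n_classes=50.
    """
    rng = np.random.default_rng(seed)

    # Synthetic stand-ins for hidden states; labels are independent of them.
    X_train = rng.standard_normal((n_tokens, dim))
    y_train = rng.integers(0, n_classes, size=n_tokens)
    X_test = rng.standard_normal((n_tokens, dim))
    y_test = rng.integers(0, n_classes, size=n_tokens)

    # Least-squares linear probe onto one-hot targets. Because dim > n_tokens,
    # lstsq returns the minimum-norm solution that interpolates the train set.
    Y_onehot = np.eye(n_classes)[y_train]
    W, *_ = np.linalg.lstsq(X_train, Y_onehot, rcond=None)

    train_acc = np.mean((X_train @ W).argmax(axis=1) == y_train)
    test_acc = np.mean((X_test @ W).argmax(axis=1) == y_test)
    return float(train_acc), float(test_acc)


if __name__ == "__main__":
    train_acc, test_acc = probe_accuracy_on_random_labels()
    print(f"train accuracy: {train_acc:.3f}, held-out accuracy: {test_acc:.3f}")
```

A large gap between train and held-out accuracy on random labels is the signature of memorization. A probe result is only meaningful when it is measured on held-out tokens and compared against a control of this kind.
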
#probing #transformers #interpretability #linear-probes