#probing

2 experiments

EXP-006 2026-04-08

Speech Act Classification from LLM Hidden States (Austin/Searle)

Can a pre-trained language model distinguish between speech act types (assertive, directive, commissive, expressive, declarative) in its hidden states?

Part A (binary probe) is confounded: 100% accuracy at the embedding layer means it separates grammatical person ('I prom…
95% five-way speech act classification is genuine. The 5-way task forces the probe to distinguish WITHIN the same gramma…
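
One way to check for the kind of confound described above is a surface-cue baseline that never looks at hidden states. The sketch below is an illustration, not the experiment's code: the function `first_token_baseline`, the toy sentences, and the two labels are all hypothetical. If a rule that uses only the first token already scores near 100%, a probe scoring 100% at the embedding layer shows little about speech-act knowledge.

```python
from collections import Counter, defaultdict

def first_token_baseline(train, test):
    """Predict each test label from the majority label of its first token in train."""
    votes = defaultdict(Counter)
    for text, label in train:
        votes[text.split()[0].lower()][label] += 1
    # Fall back to the overall majority label for unseen first tokens.
    fallback = Counter(label for _, label in train).most_common(1)[0][0]
    correct = 0
    for text, label in test:
        tok = text.split()[0].lower()
        pred = votes[tok].most_common(1)[0][0] if tok in votes else fallback
        correct += pred == label
    return correct / len(test)

# Hypothetical toy data in which the label lines up with grammatical person.
train = [
    ("I promise to call you", "commissive"),
    ("I swear to return it", "commissive"),
    ("You should close the door", "directive"),
    ("You must leave now", "directive"),
]
test = [
    ("I vow to finish this", "commissive"),
    ("You need to sit down", "directive"),
]
baseline_accuracy = first_token_baseline(train, test)
```

On this toy data the baseline separates the classes from the pronoun alone, which is the pattern that would make a binary probe result uninformative. A task whose classes share the same grammatical person removes that shortcut.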

#probing #speech-acts #pragmatics #llm-internals

EXP-003 2026-04-07

Transformer "Noise Layers" Contain Massive Hidden Information — 92.8% Probe Accuracy Where Output Head Gets 2.8%

When a transformer's output head (lm_head) gets near-zero accuracy at intermediate layers, is next-token information genuinely absent, or is it present in a different geometric basis that the output head can't read?

CORRECTION: The 92.8% probe accuracy was an artifact of overfitting — a 1536-dim linear probe on only 356 tokens will me…
"Noise layers" still contain more information than the output head can read. Even with corrected methodology, the traine…

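The overfitting issue in the correction can be reproduced with pure noise. The sketch below is a minimal illustration, not the experiment's code: it assumes a least-squares probe (the original probe may have been trained differently), and the class count of 50 is an arbitrary choice. Only the sizes 1536 and 356 come from the entry above. With more feature dimensions than training tokens, a linear probe can fit random labels exactly while held-out accuracy stays near chance.

```python
import numpy as np

rng = np.random.default_rng(0)
n_tokens, hidden_dim = 356, 1536   # sizes quoted in the correction
n_classes = 50                     # assumption: arbitrary label-set size

def fit_linear_probe(X, y, n_classes):
    """Least-squares linear probe onto one-hot targets (minimum-norm solution)."""
    targets = np.eye(n_classes)[y]
    W, *_ = np.linalg.lstsq(X, targets, rcond=None)
    return W

def accuracy(W, X, y):
    """Fraction of rows whose highest-scoring class matches the label."""
    return float(np.mean((X @ W).argmax(axis=1) == y))

# Pure noise: the features carry no information about the labels.
X_train = rng.standard_normal((n_tokens, hidden_dim))
y_train = rng.integers(0, n_classes, size=n_tokens)
X_test = rng.standard_normal((n_tokens, hidden_dim))
y_test = rng.integers(0, n_classes, size=n_tokens)

W = fit_linear_probe(X_train, y_train, n_classes)
train_acc = accuracy(W, X_train, y_train)   # fits the noise exactly
test_acc = accuracy(W, X_test, y_test)      # stays near 1 / n_classes
```

The gap between `train_acc` and `test_acc` is the reason probe accuracy needs to be reported on held-out tokens, ideally alongside a shuffled-label control of the same size.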
#probing #transformers #interpretability #linear-probes