#probing
2 experiments
EXP-006
Speech Act Classification from LLM Hidden States (Austin/Searle)
Can a pre-trained language model distinguish between speech act types (assertive, directive, commissive, expressive, declarative) in its hidden states?
- Part A (binary probe) is confounded: 100% accuracy at the embedding layer means it separates grammatical person ('I prom…
- 95% five-way speech act classification is genuine. The 5-way task forces the probe to distinguish WITHIN the same gramma…
#probing #speech-acts #pragmatics #llm-internals
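The confound noted for Part A of EXP-006 can be checked with a surface-feature baseline: if the label is predictable from the first word alone, a probe reaching the same accuracy has not shown anything about speech acts. The sketch below is illustrative only. The function name `surface_baseline_accuracy` and both toy datasets are hypothetical and are not taken from the experiment; accuracy is measured in-sample for simplicity.

```python
from collections import Counter, defaultdict

def surface_baseline_accuracy(examples):
    """In-sample accuracy of predicting the label from the first word alone."""
    labels_by_first_word = defaultdict(Counter)
    for text, label in examples:
        first_word = text.split()[0].lower()
        labels_by_first_word[first_word][label] += 1
    # For each first word, the best possible guess is its most frequent label.
    correct = sum(c.most_common(1)[0][1] for c in labels_by_first_word.values())
    return correct / len(examples)

# Hypothetical binary set: the label coincides with grammatical person.
binary_examples = [
    ("I promise to call you", "commissive"),
    ("I swear I will pay", "commissive"),
    ("She promised to call", "assertive"),
    ("He said he would pay", "assertive"),
]

# Hypothetical same-person set: every sentence starts with "I".
same_person_examples = [
    ("I promise to call you", "commissive"),
    ("I apologize for the delay", "expressive"),
    ("I believe it is raining", "assertive"),
    ("I hereby resign", "declarative"),
]

binary_baseline = surface_baseline_accuracy(binary_examples)
same_person_baseline = surface_baseline_accuracy(same_person_examples)
print(binary_baseline, same_person_baseline)
```

On the toy binary set the first word fully determines the label, so the baseline is perfect; on the same-person set it cannot do better than one guess per first word. A probe that beats this kind of baseline is the more interesting result.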
EXP-003
Transformer "Noise Layers" Contain Massive Hidden Information — 92.8% Probe Accuracy Where Output Head Gets 2.8%
When a transformer's output head (lm_head) gets near-zero accuracy at intermediate layers, is next-token information genuinely absent, or is it present in a different geometric basis that the output head can't read?
- CORRECTION: The 92.8% probe accuracy was an artifact of overfitting — a 1536-dim linear probe on only 356 tokens will me…
- "Noise layers" still contain more information than the output head can read. Even with corrected methodology, the traine…
#probing #transformers #interpretability #linear-probes
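The overfitting concern raised in the EXP-003 correction can be illustrated with a small sanity check: when a linear probe has more input dimensions than training examples, it can fit labels that carry no signal at all. The sketch below uses the 1536 and 356 figures from the entry, but everything else is an assumption made for illustration: the features are random Gaussian noise, the labels are random, the class count of 50 is arbitrary, and the probe is a plain least-squares fit rather than whatever training procedure the experiment used.

```python
import numpy as np

def fit_linear_probe(X, y, num_classes):
    """Least-squares linear probe onto one-hot targets (minimum-norm solution)."""
    Y = np.eye(num_classes)[y]
    W, *_ = np.linalg.lstsq(X, Y, rcond=None)
    return W

def probe_accuracy(W, X, y):
    """Fraction of rows whose highest-scoring class matches the label."""
    return float(np.mean(np.argmax(X @ W, axis=1) == y))

rng = np.random.default_rng(0)
n_tokens, hidden_dim, num_classes = 356, 1536, 50  # class count is arbitrary

# Pure noise: there is nothing to learn, by construction.
X_train = rng.standard_normal((n_tokens, hidden_dim))
y_train = rng.integers(0, num_classes, size=n_tokens)
X_test = rng.standard_normal((n_tokens, hidden_dim))
y_test = rng.integers(0, num_classes, size=n_tokens)

W = fit_linear_probe(X_train, y_train, num_classes)
train_acc = probe_accuracy(W, X_train, y_train)
test_acc = probe_accuracy(W, X_test, y_test)
print(f"train accuracy: {train_acc:.3f}, held-out accuracy: {test_acc:.3f}")
```

Because the training matrix has more columns than rows, the probe reproduces the random training labels exactly while held-out accuracy stays near chance. This is why probe results at this scale are usually reported on held-out tokens and compared against a shuffled-label control.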