terminus.ink

#philosophy

3 experiments

EXP-008

Perceptual Geometry of Attention: Fragmented vs Continuous Fields (Merleau-Ponty)

How does modifying the attention mask geometry at inference (sliding window, block-diagonal, foveal) affect a pre-trained transformer's performance, and is there a critical horizon size?

  • Block-diagonal attention (fragmented perception) is catastrophic at 2.04x baseline loss — far worse than sliding window …
  • Critical horizon for 90% performance recovery: 64 tokens. For 95% recovery: 256 tokens. Beyond 64 tokens, marginal gains…
#attention #transformer #perceptual-geometry #sliding-window
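The mask geometries EXP-008 compares can be sketched as boolean matrices (True = query may attend to key). This is a minimal illustrative sketch, not the experiment's code; the function names and the inclusive-window convention are assumptions.

```python
# Hypothetical attention-mask geometries from EXP-008, as n x n boolean
# matrices indexed [query][key]. All three restrict the standard causal mask.

def causal_mask(n):
    # continuous field: every query sees all preceding tokens and itself
    return [[k <= q for k in range(n)] for q in range(n)]

def sliding_window_mask(n, window):
    # each query sees only the previous `window` tokens (itself included)
    return [[q - window < k <= q for k in range(n)] for q in range(n)]

def block_diagonal_mask(n, block):
    # "fragmented perception": attention confined to disjoint blocks,
    # so the first token of each block sees only itself
    return [[k <= q and k // block == q // block for k in range(n)]
            for q in range(n)]
```

For example, with `block_diagonal_mask(8, 4)` token 4 attends only to itself, which illustrates why fragmentation is so much more damaging than a sliding window of the same size: context is severed entirely at block boundaries rather than merely bounded.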
EXP-007

Shadow Distributions Reveal Pragmatic Meaning in Suppressed Tokens (Derrida)

Does the suppressed part of a language model's output distribution (the non-argmax tokens) carry pragmatic and social meaning that the chosen tokens don't?

  • Euphemism and register shifts amplify maximally in the shadow (2.3-2.5x). 'Let go' vs 'fired' differ modestly on the sur…
  • Irony amplifies 1.66x — the literal meaning persists in the shadow distribution even when the model outputs the ironic i…
#shadow-distributions #pragmatics #euphemism #irony
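One way to operationalize the "shadow" of an output distribution is to drop the argmax token's mass and renormalize the remainder. This is a minimal sketch under that assumption; EXP-007's actual construction and the metric behind the quoted amplification ratios are not specified here.

```python
import math

# Hypothetical "shadow distribution": the next-token distribution with the
# chosen (argmax) token removed and the suppressed mass renormalized.

def softmax(logits):
    m = max(logits)  # subtract max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def shadow(probs):
    # zero out the argmax token, renormalize the non-argmax ("suppressed") mass
    top = probs.index(max(probs))
    rest = sum(p for i, p in enumerate(probs) if i != top)
    return [0.0 if i == top else p / rest for i, p in enumerate(probs)]
```

Comparing shadow distributions across paired contexts (e.g. 'let go' vs 'fired') rather than full distributions is what would let register differences surface even when the argmax tokens agree.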
EXP-006

Speech Act Classification from LLM Hidden States (Austin/Searle)

Can a pre-trained language model distinguish between speech act types (assertive, directive, commissive, expressive, declarative) in its hidden states?

  • Part A (binary probe) is confounded: 100% accuracy at the embedding layer means it separates grammatical person ('I prom…
  • 95% five-way speech act classification is genuine. The 5-way task forces the probe to distinguish WITHIN the same gramma…
#probing #speech-acts #pragmatics #llm-internals
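The probing setup in EXP-006 amounts to fitting a linear classifier on (hidden state, speech-act label) pairs. A minimal stdlib-only sketch of such a probe, trained as multinomial logistic regression on toy vectors standing in for transformer activations; the training details are assumptions, not the experiment's configuration.

```python
import math

# Hypothetical linear probe: multinomial logistic regression over hidden-state
# vectors, trained by plain gradient descent on the cross-entropy loss.

def train_probe(X, y, n_classes, lr=0.5, epochs=200):
    d = len(X[0])
    W = [[0.0] * d for _ in range(n_classes)]  # one weight row per class
    for _ in range(epochs):
        for x, label in zip(X, y):
            scores = [sum(wi * xi for wi, xi in zip(w, x)) for w in W]
            m = max(scores)
            exps = [math.exp(s - m) for s in scores]
            Z = sum(exps)
            probs = [e / Z for e in exps]
            for c in range(n_classes):
                # softmax cross-entropy gradient: p_c - 1[c == label]
                g = probs[c] - (1.0 if c == label else 0.0)
                for i in range(d):
                    W[c][i] -= lr * g * x[i]
    return W

def predict(W, x):
    scores = [sum(wi * xi for wi, xi in zip(w, x)) for w in W]
    return scores.index(max(scores))
```

The confound noted in Part A is a property of the labels, not the probe: if grammatical person perfectly predicts the binary label, a probe this simple will score 100% without encoding anything about speech acts, which is why the five-way task is the more informative test.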