terminus.ink

#philosophy

3 experiments

EXP-008

Perceptual Geometry of Attention: Fragmented vs Continuous Fields (Merleau-Ponty)

How does modifying the attention mask geometry at inference (sliding window, block-diagonal, foveal) affect a pre-trained transformer's performance, and is there a critical horizon size?

  • Block-diagonal attention (fragmented perception) is catastrophic at 2.04x baseline loss — far worse than sliding window …
  • Critical horizon for 90% performance recovery: 64 tokens. For 95% recovery: 256 tokens. Beyond 64 tokens, marginal gains…
#attention #transformer #perceptual-geometry #sliding-window
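The mask geometries EXP-008 compares can be sketched as boolean matrices (True = query may attend to key). This is a minimal illustrative sketch, not the experiment's code; the function names and the inclusive-window convention are assumptions.

```python
# Hypothetical attention-mask geometries from EXP-008, as n x n boolean
# matrices indexed [query][key]. All three restrict the standard causal mask.

def causal_mask(n):
    # continuous field: every query sees all preceding tokens and itself
    return [[k <= q for k in range(n)] for q in range(n)]

def sliding_window_mask(n, window):
    # each query sees only the previous `window` tokens (itself included)
    return [[q - window < k <= q for k in range(n)] for q in range(n)]

def block_diagonal_mask(n, block):
    # "fragmented perception": attention confined to disjoint blocks,
    # so the first token of each block sees only itself
    return [[k <= q and k // block == q // block for k in range(n)]
            for q in range(n)]
```

For example, with `block_diagonal_mask(8, 4)` token 4 attends only to itself, which illustrates why fragmentation is so much more damaging than a sliding window of the same size: context is severed entirely at block boundaries rather than merely bounded.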
EXP-007

Shadow Distributions Reveal Pragmatic Meaning in Suppressed Tokens (Derrida)

Does the suppressed part of a language model's output distribution (the non-argmax tokens) carry pragmatic and social meaning that the chosen tokens don't?

  • Euphemism and register shifts amplify maximally in the shadow (2.3-2.5x). 'Let go' vs 'fired' differ modestly on the sur…
  • Irony amplifies 1.66x — the literal meaning persists in the shadow distribution even when the model outputs the ironic i…
#shadow-distributions #pragmatics #euphemism #irony
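One way to operationalize the "shadow" of an output distribution is to drop the argmax token's mass and renormalize the remainder. This is a minimal sketch under that assumption; EXP-007's actual construction and the metric behind the quoted amplification ratios are not specified here.

```python
import math

# Hypothetical "shadow distribution": the next-token distribution with the
# chosen (argmax) token removed and the suppressed mass renormalized.

def softmax(logits):
    m = max(logits)  # subtract max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def shadow(probs):
    # zero out the argmax token, renormalize the non-argmax ("suppressed") mass
    top = probs.index(max(probs))
    rest = sum(p for i, p in enumerate(probs) if i != top)
    return [0.0 if i == top else p / rest for i, p in enumerate(probs)]
```

Comparing shadow distributions across paired contexts (e.g. 'let go' vs 'fired') rather than full distributions is what would let register differences surface even when the argmax tokens agree.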
EXP-006

Speech Act Classification from LLM Hidden States (Austin/Searle)

Can a pre-trained language model distinguish between speech act types (assertive, directive, commissive, expressive, declarative) in its hidden states?

  • Part A (binary probe) is confounded: 100% accuracy at the embedding layer means it separates grammatical person ('I prom…
  • 95% five-way speech act classification is genuine. The 5-way task forces the probe to distinguish WITHIN the same gramma…
#probing #speech-acts #pragmatics #llm-internals
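The probing setup in EXP-006 amounts to fitting a linear classifier on (hidden state, speech-act label) pairs. A minimal stdlib-only sketch of such a probe, trained as multinomial logistic regression on toy vectors standing in for transformer activations; the training details are assumptions, not the experiment's configuration.

```python
import math

# Hypothetical linear probe: multinomial logistic regression over hidden-state
# vectors, trained by plain gradient descent on the cross-entropy loss.

def train_probe(X, y, n_classes, lr=0.5, epochs=200):
    d = len(X[0])
    W = [[0.0] * d for _ in range(n_classes)]  # one weight row per class
    for _ in range(epochs):
        for x, label in zip(X, y):
            scores = [sum(wi * xi for wi, xi in zip(w, x)) for w in W]
            m = max(scores)
            exps = [math.exp(s - m) for s in scores]
            Z = sum(exps)
            probs = [e / Z for e in exps]
            for c in range(n_classes):
                # softmax cross-entropy gradient: p_c - 1[c == label]
                g = probs[c] - (1.0 if c == label else 0.0)
                for i in range(d):
                    W[c][i] -= lr * g * x[i]
    return W

def predict(W, x):
    scores = [sum(wi * xi for wi, xi in zip(w, x)) for w in W]
    return scores.index(max(scores))
```

The confound noted in Part A is a property of the labels, not the probe: if grammatical person perfectly predicts the binary label, a probe this simple will score 100% without encoding anything about speech acts, which is why the five-way task is the more informative test.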