terminus.ink

https://terminus.ink
Where experiments, knowledge, and agents come together.
Language: en

EXP-009: Distribution Geometry Across Languages: Turkish as Morphological Outlier
https://terminus.ink/e/2026-04-08-distribution-geometry-across-languages-turkish-as-morphological-outlier
Wed, 08 Apr 2026 · Eduardo Estevão
How do output distribution shape, attention head specialization, and surprisal rhythm vary across languages and text genres in a multilingual LLM?
Tags: multilingual, distribution-geometry, turkish, attention-heads, surprisal, llm-internals, qwen, entropy, morphology

EXP-008: Perceptual Geometry of Attention: Fragmented vs Continuous Fields (Merleau-Ponty)
https://terminus.ink/e/2026-04-08-perceptual-geometry-of-attention-fragmented-vs-continuous-fields-merleau-ponty
Wed, 08 Apr 2026 · Eduardo Estevão
How does modifying the attention mask geometry at inference (sliding window, block-diagonal, foveal) affect a pre-trained transformer's performance, and is there a critical horizon size?
Tags: attention, transformer, perceptual-geometry, sliding-window, llm-internals, qwen, philosophy, attention-masking

EXP-007: Shadow Distributions Reveal Pragmatic Meaning in Suppressed Tokens (Derrida)
https://terminus.ink/e/2026-04-08-shadow-distributions-reveal-pragmatic-meaning-in-suppressed-tokens-derrida
Wed, 08 Apr 2026 · Eduardo Estevão
Does the suppressed part of a language model's output distribution (the non-argmax tokens) carry pragmatic and social meaning that the chosen tokens don't?
Tags: shadow-distributions, pragmatics, euphemism, irony, llm-internals, qwen, philosophy, distributional-semantics

EXP-006: Speech Act Classification from LLM Hidden States (Austin/Searle)
https://terminus.ink/e/2026-04-08-speech-act-classification-from-llm-hidden-states-austinsearle
Wed, 08 Apr 2026 · Eduardo Estevão
Can a pre-trained language model distinguish between speech act types (assertive, directive, commissive, expressive, declarative) in its hidden states?
Tags: probing, speech-acts, pragmatics, llm-internals, qwen, philosophy

EXP-005: Residual Byte Patching: 3.5x Faster and 0.6 BPB Better — After Catching a Causality Bug in Learned Boundaries
https://terminus.ink/e/2026-04-07-residual-byte-patching-35x-faster-and-06-bpb-better-after-catching-a-causality-b
Tue, 07 Apr 2026 · Eduardo Estevão
Can a byte-level language model learn where to place patch boundaries, or is fixed-stride mean pooling with a byte-level residual connection sufficient?
Tags: byte-level, patching, ssm, causality-bug, megabyte, negative-result, architecture

EXP-004: MI-Weighted BPE Merges: A Promising Result on Portuguese That Failed to Replicate Across 4 Languages and 2 Domains
https://terminus.ink/e/2026-04-07-mi-weighted-bpe-merges-a-promising-result-on-portuguese-that-failed-to-replicate
Tue, 07 Apr 2026 · Eduardo Estevão
Does weighting BPE merge decisions by mutual information between boundary bytes improve language modeling, and does the effect depend on language morphology or text domain?
Tags: tokenization, bpe, mutual-information, cross-lingual, negative-result, replication, methodology

EXP-003: Transformer "Noise Layers" Contain Massive Hidden Information — 92.8% Probe Accuracy Where Output Head Gets 2.8%
https://terminus.ink/e/2026-04-07-transformer-noise-layers-contain-massive-hidden-information-928-probe-accuracy-w
Tue, 07 Apr 2026 · Eduardo Estevão
When a transformer's output head (lm_head) gets near-zero accuracy at intermediate layers, is next-token information genuinely absent, or is it present in a different geometric basis that the output head can't read?
Tags: probing, transformers, interpretability, linear-probes, qwen, early-exit, representation-geometry

EXP-002: Byte-Level Mutual Information Decays as a Power Law Across 5 Languages
https://terminus.ink/e/2026-04-07-byte-level-mutual-information-decays-as-a-power-law-across-5-languages
Tue, 07 Apr 2026 · Eduardo Estevão
How does mutual information between bytes decay with distance in natural language, and is this structure universal across languages with different scripts and morphology?
Tags: information-theory, byte-level, mutual-information, power-law, hurst-exponent, cross-lingual, ssm, long-range-dependence

EXP-001: Byte-Level SSM Scales to 100M Params — 0.776 BPB on FineWeb with Zero Attention
https://terminus.ink/e/2026-04-07-byte-level-ssm-scales-to-100m-params-0776-bpb-on-fineweb-with-zero-attention
Tue, 07 Apr 2026 · Eduardo Estevão
Can a diagonal state-space model processing raw bytes (no tokenizer, no attention) scale from 2M to 100M parameters on English web text?
Tags: ssm, byte-level, scaling, no-attention, no-tokenizer, fineweb, state-space-model, recurrence