#mutual-information
3 experiments
EXP-012
Information Topology of Natural Language
How does mutual information between tokens decay with distance across typologically diverse languages, and does language structure (morphology, word order) shape the information topology?
- Power-law MI decay is universal (5/5 languages, R² > 0.96). The exponential model fails catastrophically on log-linear R² (< -10). SSM expon…
- Beta exponent descriptively splits by morphology: analytic (en/pt) 1.1-1.2 vs agglutinative (tr/fi/ar) 0.87-0.98. CIs ov…
#information-theory #mutual-information #power-law #typology
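The power-law versus exponential comparison in this entry can be illustrated with a small sketch. It assumes the MI curve is already available as paired lists of distances and MI values, and it fits both models by ordinary least squares in log space. The function names (`linfit_r2`, `compare_decay_models`) are hypothetical, and the listing does not say how EXP-012 computes its R², so this is only one plausible procedure. In particular, a least-squares line with an intercept cannot give a negative R² in its own fit space, so the reported values below -10 presumably come from a different evaluation setup.

```python
import math

def linfit_r2(xs, ys):
    """Ordinary least-squares line y = intercept + slope * x, plus its R^2."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return slope, intercept, 1.0 - ss_res / ss_tot

def compare_decay_models(distances, mi_values):
    """Fit I(d) ~ d^(-beta) and I(d) ~ exp(-rate * d), both against log I(d)."""
    log_d = [math.log(d) for d in distances]
    log_i = [math.log(v) for v in mi_values]  # requires strictly positive MI
    # Power law: log I is linear in log d, with slope -beta.
    p_slope, _, p_r2 = linfit_r2(log_d, log_i)
    # Exponential: log I is linear in d, with slope -rate.
    e_slope, _, e_r2 = linfit_r2([float(d) for d in distances], log_i)
    return {
        "power_law": {"beta": -p_slope, "r2": p_r2},
        "exponential": {"rate": -e_slope, "r2": e_r2},
    }

if __name__ == "__main__":
    # Synthetic curve for illustration only, not data from the experiment.
    ds = [2 ** k for k in range(10)]
    curve = [0.5 * d ** -1.1 for d in ds]
    print(compare_decay_models(ds, curve))
```

On a synthetic curve that is an exact power law, the first fit recovers the exponent and the exponential fit scores a visibly lower R². Real MI estimates are noisy at large distances, so the choice of distance range matters.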
EXP-004
MI-Weighted BPE Merges: A Promising Result on Portuguese That Failed to Replicate Across 4 Languages and 2 Domains
Does weighting BPE merge decisions by mutual information between boundary bytes improve language modeling, and does the effect depend on language morphology or text domain?
- Only 1 of 7 direct comparisons shows an improvement. MI-weighted BPE lowered BPB by 2.90% on the Portuguese Carolina …
- The morphological complexity hypothesis is falsified. Turkish — the most morphologically complex language tested, with p…
#negative-result #tokenization #bpe #mutual-information #cross-lingual
EXP-002
Byte-Level Mutual Information Decays as a Power Law Across 5 Languages
How does mutual information between bytes decay with distance in natural language, and is this structure universal across languages with different scripts and morphology?
- Mutual information between bytes decays as a power law I(d) ~ d^(-alpha) in all 5 languages tested (0 out of 5 exponenti…
- 82-96% of prediction gain comes from the first 8 bytes of context. Conditional entropy drops from ~5 bits (unigram) to ~…
#information-theory #byte-level #mutual-information #power-law
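As a rough illustration of the quantity I(d) in this entry, the sketch below computes a plug-in (empirical-frequency) estimate of the mutual information between bytes that are d positions apart. The function name `byte_mi_at_distance` is hypothetical, and the listing does not state which estimator EXP-002 uses. Plug-in estimates are biased upward on finite samples, so the experiment may well apply a correction that is not shown here.

```python
import math
from collections import Counter

def byte_mi_at_distance(data: bytes, d: int) -> float:
    """Plug-in estimate, in bits, of I(X_i; X_{i+d}) over all byte pairs d apart."""
    n = len(data) - d  # number of (left, right) pairs
    if d < 1 or n < 1:
        raise ValueError("need d >= 1 and len(data) > d")
    left = Counter(data[:n])                   # marginal counts of the first byte
    right = Counter(data[d:])                  # marginal counts of the second byte
    joint = Counter(zip(data[:n], data[d:]))   # joint counts of the pair
    mi = 0.0
    for (a, b), c in joint.items():
        # p(a,b) * log2(p(a,b) / (p(a) * p(b))), written with raw counts
        mi += (c / n) * math.log2(c * n / (left[a] * right[b]))
    return mi

if __name__ == "__main__":
    sample = ("the quick brown fox jumps over the lazy dog " * 200).encode("utf-8")
    for dist in (1, 2, 4, 8):
        print(dist, round(byte_mi_at_distance(sample, dist), 4))
```

Evaluating the function at a range of distances gives an empirical I(d) curve. The repeated sample sentence is only a placeholder, and a real run would need a corpus large enough that the 256 × 256 joint table is well populated.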