#mutual-information
2 experiments
EXP-004
MI-Weighted BPE Merges: A Promising Result on Portuguese That Failed to Replicate Across 4 Languages and 2 Domains
Does weighting BPE merge decisions by mutual information between boundary bytes improve language modeling, and does the effect depend on language morphology or text domain?
- Only 1 of 7 direct comparisons showed improvement. MI-weighted BPE achieved a 2.90% BPB reduction on the Portuguese Carolina …
- The morphological complexity hypothesis is falsified. Turkish — the most morphologically complex language tested, with p…
#negative-result #tokenization #bpe #mutual-information #cross-lingual
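For context on what EXP-004 tested: standard BPE greedily merges the most frequent adjacent pair, while the MI-weighted variant scores candidate merges by the mutual information between the symbols at the merge boundary. The sketch below is a minimal, hypothetical rendering of that criterion (the function name and the frequency-times-PMI score are illustrative assumptions, not the experiment's actual code):

```python
from collections import Counter
import math

def mi_weighted_merge(corpus_tokens):
    """Pick the next BPE merge by weighting pair frequency with the
    pointwise mutual information (PMI) between adjacent symbols,
    instead of using raw frequency alone.

    Hypothetical sketch of the MI-weighted criterion; not the
    experiment's actual implementation.
    """
    unigrams = Counter()
    pairs = Counter()
    total = 0
    for seq in corpus_tokens:
        unigrams.update(seq)
        total += len(seq)
        pairs.update(zip(seq, seq[1:]))
    n_pairs = sum(pairs.values())

    def pmi(pair):
        a, b = pair
        p_ab = pairs[pair] / n_pairs
        p_a = unigrams[a] / total
        p_b = unigrams[b] / total
        return math.log2(p_ab / (p_a * p_b))

    # Weight frequency by PMI so that rare-but-predictive boundary
    # pairs can outrank frequent-but-uninformative ones.
    return max(pairs, key=lambda p: pairs[p] * pmi(p))
```

On a toy corpus where "a" is always followed by "b", the PMI term rewards that pair over incidental neighbors, so `mi_weighted_merge([["a","b","a","b"], ["a","b"]])` selects `("a", "b")`.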
EXP-002
Byte-Level Mutual Information Decays as a Power Law Across 5 Languages
How does mutual information between bytes decay with distance in natural language, and is this structure universal across languages with different scripts and morphology?
- Mutual information between bytes decays as a power law I(d) ~ d^(-alpha) in all 5 languages tested (0 out of 5 exponenti…
- 82-96% of prediction gain comes from the first 8 bytes of context. Conditional entropy drops from ~5 bits (unigram) to ~…
#information-theory #byte-level #mutual-information #power-law
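The quantity EXP-002 measures, I(d) between bytes at distance d, can be estimated with a plug-in estimator over empirical byte-pair counts, and a power law I(d) ~ d^(-alpha) appears as a straight line in log-log space whose slope gives -alpha. A minimal sketch under those assumptions (function names and the lag grid are illustrative, not the experiment's code):

```python
import math
from collections import Counter

def byte_mi(data: bytes, d: int) -> float:
    """Plug-in estimate of I(X_t; X_{t+d}) in bits between bytes at lag d."""
    joint = Counter(zip(data, data[d:]))
    left = Counter(data[:-d])
    right = Counter(data[d:])
    n = len(data) - d
    mi = 0.0
    for (a, b), c in joint.items():
        p_ab = c / n
        # p_ab / (p_a * p_b) simplifies to c * n / (count_a * count_b)
        mi += p_ab * math.log2(c * n / (left[a] * right[b]))
    return mi

def fit_alpha(data: bytes, dists=(1, 2, 4, 8, 16)) -> float:
    """Least-squares slope of log I(d) vs log d; under a power law
    I(d) ~ d^(-alpha), the fitted slope is -alpha."""
    xs = [math.log(d) for d in dists]
    ys = [math.log(byte_mi(data, d)) for d in dists]
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) \
        / sum((x - mx) ** 2 for x in xs)
    return -slope
```

Note the plug-in estimator is biased upward on small samples, so a real replication would need bias correction or large corpora before trusting the fitted exponent.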