#morphology
2 experiments
EXP-012
Information Topology of Natural Language
How does mutual information between tokens decay with distance across typologically diverse languages, and does language structure (morphology, word order) shape the information topology?
- Power-law MI decay is universal (5/5 languages, R² > 0.96). The exponential model fails catastrophically on log-linear R² (< -10). SSM expon…
- Beta exponent descriptively splits by morphology: analytic (en/pt) 1.1-1.2 vs agglutinative (tr/fi/ar) 0.87-0.98. CIs ov…
#information-theory #mutual-information #power-law #typology
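The power-law versus exponential comparison in EXP-012 can be illustrated with a minimal sketch. It assumes the decay exponent comes from an ordinary least-squares fit of log MI against log distance, which this entry does not confirm. The data are synthetic (a noise-free curve with exponent 1.1), and the names `fit_power_law`, `fit_exponential`, and `r_squared` are hypothetical helpers, not the experiment's code. A least-squares line scored on its own fitting data cannot produce a negative R², so the reported value below -10 presumably comes from a different evaluation protocol that is not shown here.

```python
import numpy as np

def r_squared(y, y_hat):
    """Coefficient of determination between observations and predictions."""
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    return 1.0 - ss_res / ss_tot

def fit_power_law(d, mi):
    """Fit MI(d) = A * d**(-beta) as a straight line in log-log space."""
    slope, intercept = np.polyfit(np.log(d), np.log(mi), 1)
    pred = intercept + slope * np.log(d)
    return -slope, r_squared(np.log(mi), pred)

def fit_exponential(d, mi):
    """Fit MI(d) = A * exp(-d / lam) as a straight line of log MI against d."""
    slope, intercept = np.polyfit(d, np.log(mi), 1)
    pred = intercept + slope * d
    return -1.0 / slope, r_squared(np.log(mi), pred)

# Synthetic, noise-free MI curve (an assumption, not data from EXP-012).
d = np.arange(1, 513, dtype=float)
mi = 0.5 * d ** -1.1

beta, r2_pow = fit_power_law(d, mi)
lam, r2_exp = fit_exponential(d, mi)
print(f"power law:   beta={beta:.3f}, R2={r2_pow:.4f}")
print(f"exponential: lambda={lam:.1f}, R2={r2_exp:.4f}")
```

Both R² values are computed on log MI so that the two models are scored in the same space; on a true power-law curve the exponential fit scores visibly lower.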
EXP-009
Distribution Geometry Across Languages: Turkish as Morphological Outlier
How do output distribution shape, attention head specialization, and surprisal rhythm vary across languages and text genres in a multilingual LLM?
- Turkish is a distribution outlier across every metric: lowest top-1 accuracy (37%), highest entropy (3.69), lowest kurto…
- Zero global attention heads exist out of 784 total. Head type distribution: 49% mixed, 29% sparse, 22% local, <1% diagon…
#multilingual #distribution-geometry #turkish #attention-heads
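Two of the per-language metrics named in EXP-009, top-1 accuracy and entropy of the output distribution, can be sketched as follows. This is an illustration under stated assumptions: the logits are toy values rather than model outputs, entropy is taken in nats (the entry does not give the unit), and `softmax`, `entropy_nats`, and `top1_accuracy` are hypothetical helper names. Kurtosis is left out because the entry does not say which quantity it is computed over.

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax over the last axis."""
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def entropy_nats(probs, eps=1e-12):
    """Shannon entropy per position, in nats (unit is an assumption)."""
    return -np.sum(probs * np.log(probs + eps), axis=-1)

def top1_accuracy(probs, targets):
    """Fraction of positions where the argmax token equals the target."""
    return float(np.mean(probs.argmax(axis=-1) == targets))

# Toy logits for 3 positions over a 4-token vocabulary (not real model output).
logits = np.array([
    [4.0, 1.0, 0.0, 0.0],   # peaked distribution, low entropy
    [0.0, 0.0, 0.0, 0.0],   # uniform distribution, maximum entropy
    [0.0, 3.0, 0.5, 0.0],
])
targets = np.array([0, 2, 1])

probs = softmax(logits)
H = entropy_nats(probs)
acc = top1_accuracy(probs, targets)
print("entropy per position:", np.round(H, 3))
print("top-1 accuracy:", round(acc, 3))
```

Averaging `H` over all positions of a language's evaluation text would give one entropy figure per language, comparable in kind to those listed in the entry.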