#negative-result
2 experiments
EXP-005
Residual Byte Patching: 3.5x Faster and 0.6 BPB Better — After Catching a Causality Bug in Learned Boundaries
Can a byte-level language model learn where to place patch boundaries, or is fixed-stride mean pooling with a byte-level residual connection sufficient?
- Fixed mean pooling + broadcast upsample + byte residual is a strict Pareto improvement over full byte resolution. Varian…
- Learned soft boundaries contained a critical causality bug. The Gaussian soft assignment matrix had non-zero weights acr…
#negative-result #byte-level #patching #ssm #causality-bug
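The EXP-005 mechanism and the bug it surfaced can be sketched as follows. This is an illustrative sketch, not the experiment's actual code: `patch_pool_residual`, `gaussian_assignment`, `leaks_future`, and the stride/sigma values are all assumed names and defaults. The first function shows the fixed-stride baseline (mean-pool bytes into patches, broadcast-upsample back, add a byte-level residual); the last shows one way a Gaussian soft-assignment matrix can be checked for weight on future positions.

```python
import numpy as np

def patch_pool_residual(h, stride=4):
    """Fixed-stride mean pooling + broadcast upsample + byte residual.
    h: (T, d) array of byte-level hidden states. Sketch only."""
    T, d = h.shape
    pad = (-T) % stride
    hp = np.pad(h, ((0, pad), (0, 0)))
    patches = hp.reshape(-1, stride, d).mean(axis=1)  # one vector per patch
    up = np.repeat(patches, stride, axis=0)[:T]       # broadcast back to bytes
    return h + up                                     # byte-level residual

def gaussian_assignment(T, centers, sigma=1.0):
    """Soft assignment of T byte positions to learned patch centers."""
    pos = np.arange(T)[None, :]
    c = np.asarray(centers, dtype=float)[:, None]
    w = np.exp(-0.5 * ((pos - c) / sigma) ** 2)
    return w / w.sum(axis=1, keepdims=True)

def leaks_future(W, centers):
    """True if any patch places non-zero weight on positions past its
    center, i.e. the assignment reads future bytes (the causality bug)."""
    pos = np.arange(W.shape[1])
    return any(W[k, pos > c].sum() > 1e-8 for k, c in enumerate(centers))
```

Because an unmasked Gaussian has infinite support, `leaks_future(gaussian_assignment(8, [2, 5]), [2, 5])` returns `True`: every patch mixes in bytes to the right of its center unless the kernel is explicitly truncated at the boundary.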
EXP-004
MI-Weighted BPE Merges: A Promising Result on Portuguese That Failed to Replicate Across 4 Languages and 2 Domains
Does weighting BPE merge decisions by mutual information between boundary bytes improve language modeling, and does the effect depend on language morphology or text domain?
- Only 1 of 7 direct comparisons shows improvement. MI-weighted BPE achieved a -2.90% BPB gain on the Portuguese Carolina …
- The morphological complexity hypothesis is falsified. Turkish — the most morphologically complex language tested, with p…
#negative-result #tokenization #bpe #mutual-information #cross-lingual
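One plausible reading of "weighting BPE merge decisions by mutual information between boundary bytes" is to scale the usual pair-count merge score by the pair's pointwise mutual information. A minimal sketch of that scoring step, assuming a count × PMI form and the function name `mi_weighted_merge_scores` (neither is confirmed by the log):

```python
import math
from collections import Counter

def mi_weighted_merge_scores(tokens):
    """Score each adjacent pair by count * PMI(a, b).
    Plain BPE would rank pairs by count alone; here frequent-but-
    uninformative pairs are down-weighted. Illustrative sketch."""
    unigrams = Counter(tokens)
    pairs = Counter(zip(tokens, tokens[1:]))
    n = len(tokens)
    n_pairs = max(n - 1, 1)
    scores = {}
    for (a, b), count in pairs.items():
        p_ab = count / n_pairs
        p_a, p_b = unigrams[a] / n, unigrams[b] / n
        scores[(a, b)] = count * math.log(p_ab / (p_a * p_b))
    return scores
```

Under this scoring, the highest-scoring pair is merged and the corpus is re-tokenized, exactly as in standard BPE; only the ranking criterion changes.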