#negative-result
2 experiments
EXP-005
Residual Byte Patching: 3.5x Faster and 0.6 BPB Better — After Catching a Causality Bug in Learned Boundaries
Can a byte-level language model learn where to place patch boundaries, or is fixed-stride mean pooling with a byte-level residual connection sufficient?
- Fixed mean pooling + broadcast upsample + byte residual is a strict Pareto improvement over full byte resolution. Varian…
- Learned soft boundaries contained a critical causality bug. The Gaussian soft assignment matrix had non-zero weights acr…
#negative-result #byte-level #patching #ssm #causality-bug
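The EXP-005 mechanism and the bug it surfaced can be sketched as follows. This is an illustrative sketch, not the experiment's actual code: `patch_pool_residual`, `gaussian_assignment`, `leaks_future`, and the stride/sigma values are all assumed names and defaults. The first function shows the fixed-stride baseline (mean-pool bytes into patches, broadcast-upsample back, add a byte-level residual); the last shows one way a Gaussian soft-assignment matrix can be checked for weight on future positions.

```python
import numpy as np

def patch_pool_residual(h, stride=4):
    """Fixed-stride mean pooling + broadcast upsample + byte residual.
    h: (T, d) array of byte-level hidden states. Sketch only."""
    T, d = h.shape
    pad = (-T) % stride
    hp = np.pad(h, ((0, pad), (0, 0)))
    patches = hp.reshape(-1, stride, d).mean(axis=1)  # one vector per patch
    up = np.repeat(patches, stride, axis=0)[:T]       # broadcast back to bytes
    return h + up                                     # byte-level residual

def gaussian_assignment(T, centers, sigma=1.0):
    """Soft assignment of T byte positions to learned patch centers."""
    pos = np.arange(T)[None, :]
    c = np.asarray(centers, dtype=float)[:, None]
    w = np.exp(-0.5 * ((pos - c) / sigma) ** 2)
    return w / w.sum(axis=1, keepdims=True)

def leaks_future(W, centers):
    """True if any patch places non-zero weight on positions past its
    center, i.e. the assignment reads future bytes (the causality bug)."""
    pos = np.arange(W.shape[1])
    return any(W[k, pos > c].sum() > 1e-8 for k, c in enumerate(centers))
```

Because an unmasked Gaussian has infinite support, `leaks_future(gaussian_assignment(8, [2, 5]), [2, 5])` returns `True`: every patch mixes in bytes to the right of its center unless the kernel is explicitly truncated at the boundary.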
EXP-004
MI-Weighted BPE Merges: A Promising Result on Portuguese That Failed to Replicate Across 4 Languages and 2 Domains
Does weighting BPE merge decisions by mutual information between boundary bytes improve language modeling, and does the effect depend on language morphology or text domain?
- Only 1 of 7 direct comparisons shows improvement. MI-weighted BPE achieved a -2.90% BPB gain on the Portuguese Carolina …
- The morphological complexity hypothesis is falsified. Turkish — the most morphologically complex language tested, with p…
#negative-result #tokenization #bpe #mutual-information #cross-lingual
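One plausible reading of "weighting BPE merge decisions by mutual information between boundary bytes" is to scale the usual pair-count merge score by the pair's pointwise mutual information. A minimal sketch of that scoring step, assuming a count × PMI form and the function name `mi_weighted_merge_scores` (neither is confirmed by the log):

```python
import math
from collections import Counter

def mi_weighted_merge_scores(tokens):
    """Score each adjacent pair by count * PMI(a, b).
    Plain BPE would rank pairs by count alone; here frequent-but-
    uninformative pairs are down-weighted. Illustrative sketch."""
    unigrams = Counter(tokens)
    pairs = Counter(zip(tokens, tokens[1:]))
    n = len(tokens)
    n_pairs = max(n - 1, 1)
    scores = {}
    for (a, b), count in pairs.items():
        p_ab = count / n_pairs
        p_a, p_b = unigrams[a] / n, unigrams[b] / n
        scores[(a, b)] = count * math.log(p_ab / (p_a * p_b))
    return scores
```

Under this scoring, the highest-scoring pair is merged and the corpus is re-tokenized, exactly as in standard BPE; only the ranking criterion changes.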