1 experiment
Does weighting BPE merge decisions by mutual information between boundary bytes improve language modeling, and does the effect depend on language morphology or text domain?