cs.CL

Why do language models favor Latin script internally?

Daniil Gurgurov, Alan Saji, Katharina Trinley, Josef van Genabith, Simon Ostermann

May 29, 2026

Language models handle multiple writing systems (Arabic, Cyrillic, Latin) for the same language, but how? This work finds that they maintain a shared internal representation and systematically favor Latin script. A small set of attention heads causally controls which script the model outputs, these heads transfer across unrelated languages, and linear steering can flip the output script while preserving meaning. Crucially, the model generates non-Latin text through compact mechanisms but relies on diffuse, network-wide contributions for Latin, which suggests an architectural bias baked into modern LLMs.
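To make the idea of linear steering concrete, here is a minimal sketch on toy data. It uses a difference-of-means direction, which is one common way to build a steering vector and is an assumption here, not necessarily the paper's procedure. The toy activations, the function names (`steering_vector`, `steer`), and the choice to put "script" in coordinate 0 and "meaning" in the remaining coordinates are all hypothetical, chosen only for illustration.

```python
import random

def mean_vector(rows):
    """Column-wise mean of a list of equal-length vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def steering_vector(source_acts, target_acts):
    """Difference of class means: points from the source script to the target script."""
    mu_source = mean_vector(source_acts)
    mu_target = mean_vector(target_acts)
    return [t - s for s, t in zip(mu_source, mu_target)]

def steer(hidden, direction, alpha=1.0):
    """Shift one hidden state by alpha times the steering direction."""
    return [h + alpha * d for h, d in zip(hidden, direction)]

# Toy activations (hypothetical): coordinate 0 encodes script,
# coordinates 1-3 stand in for script-independent meaning.
rng = random.Random(0)

def toy_activation(script_value):
    script = script_value + rng.gauss(0, 0.05)
    meaning = [rng.gauss(0, 1) for _ in range(3)]
    return [script] + meaning

latin_acts = [toy_activation(+1.0) for _ in range(200)]
cyrillic_acts = [toy_activation(-1.0) for _ in range(200)]

# Direction that moves a Latin-script state toward Cyrillic script.
v = steering_vector(latin_acts, cyrillic_acts)

h = latin_acts[0]
h_steered = steer(h, v)

print("script coordinate before:", round(h[0], 2))
print("script coordinate after: ", round(h_steered[0], 2))
```

In this toy setup the script coordinate changes sign after steering, while the meaning coordinates move only by the small sampling noise in the two class means. In a real model the same addition would be applied to hidden states at a chosen layer during generation, with the layer and the scale `alpha` selected empirically.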
Published as "The Latin Substrate: How Language Models Represent and Mediate Script Choice" (arXiv:2605.31363).