Do neurons specialize more as models grow larger?

Scaling laws describe how loss improves with model size, but what happens inside the model? This work tracks Rosetta Neurons (neurons that activate similarly across independently trained models) in models ranging from 30B-parameter language models to 5B-parameter vision models. The authors find that these interpretable neurons follow a sublinear scaling law: their absolute count grows with model size, but they make up a shrinking percentage of all neurons. Critically, Rosetta Neurons become increasingly selective and monosemantic (each responding to a single concept) as models scale, while non-Rosetta neurons remain scattered in what they respond to. An analytical model explains this polarization as competition for limited neuron capacity. The findings indicate that interpretability and specialization improve with scale rather than degrade.
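The two core ideas above, matching neurons across independently trained models and a count that grows while its share shrinks, can be illustrated with a small sketch. The summary does not specify the exact matching procedure or the fitted law, so this code makes two loud assumptions: a neuron pair counts as "Rosetta" if the two neurons are each other's most-correlated partner (Pearson correlation over the same inputs) and exceed a hypothetical threshold, and the count follows a hypothetical power law c * N^alpha with alpha < 1. Under that law the Rosetta fraction is c * N^(alpha - 1), which necessarily falls as N grows. The names `rosetta_pairs`, `rosetta_count`, `c`, `alpha`, and `threshold` are illustrative, not taken from the work.

```python
import math
import statistics


def pearson(x, y):
    """Pearson correlation of two equal-length activation vectors (0.0 if either is constant)."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return 0.0 if sx == 0 or sy == 0 else cov / (sx * sy)


def rosetta_pairs(acts_a, acts_b, threshold=0.8):
    """Hypothetical matcher: mutual best-correlated neuron pairs across two models.

    acts_a[i] and acts_b[j] hold neuron i's (model A) and neuron j's (model B)
    activations on the SAME list of inputs. Assumed criterion, not the paper's.
    """
    corr = [[pearson(a, b) for b in acts_b] for a in acts_a]
    # Best partner in model B for each neuron of model A, and vice versa.
    best_b = [max(range(len(acts_b)), key=lambda j: row[j]) for row in corr]
    best_a = [max(range(len(acts_a)), key=lambda i: corr[i][j]) for j in range(len(acts_b))]
    return [
        (i, j)
        for i, j in enumerate(best_b)
        if best_a[j] == i and corr[i][j] > threshold
    ]


def rosetta_count(n_neurons, c=2.0, alpha=0.7):
    """Hypothetical sublinear law: count = c * N**alpha with alpha < 1 (constants are made up)."""
    return c * n_neurons ** alpha


if __name__ == "__main__":
    # Toy activations on four shared inputs; model B's neurons are in swapped order.
    acts_a = [[1, 2, 3, 4], [4, 1, 3, 2]]
    acts_b = [[3.9, 1.2, 2.8, 2.1], [2, 4, 6, 8.5]]
    print("matched pairs:", rosetta_pairs(acts_a, acts_b))

    # Absolute count rises with model size while the fraction c * N**(alpha - 1) falls.
    for n in (10_000, 100_000, 1_000_000):
        k = rosetta_count(n)
        print(f"N={n:>9,}  rosetta~{k:>8.0f}  fraction={k / n:.2%}")
```

The mutual-best-match rule is one simple way to avoid many neurons in one model all claiming the same partner; any exponent below 1 reproduces the "more in count, less in share" pattern, whatever its actual fitted value.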