Why mixture of experts works better on your phone than dense models

Yanbei Chen, Hanxian Huang, Ernie Chang, Jacob Szwejbka, Digant Desai, Zechun Liu, Vikas Chandra, Raghuraman Krishnamoorthi

Mixture-of-Experts (MoE) has proven effective for very large language models, but its benefits for on-device deployment, where memory and compute are scarce, have been unclear. MobileMoE identifies the sweet spot under mobile constraints: moderate sparsity with fine-grained shared experts. Across 14 benchmarks, MobileMoE models with 0.3–0.9B active parameters match leading dense baselines while using far fewer operations, and they deliver 1.8–3.8× faster text generation on real smartphones. Code and models are released.
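To make "active parameters" and "sparsity" concrete, the sketch below counts total versus per-token active feed-forward weights in a single MoE layer that has both routed and shared experts. It is a minimal illustration under stated assumptions, not MobileMoE's architecture. It assumes SwiGLU-style experts with no biases and a plain linear router, and every layer size, along with the names `MoEFFNConfig`, `expert_params`, and `ffn_param_counts`, is hypothetical. Per-token compute scales roughly with the active count, since each active weight costs about one multiply-add per token.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MoEFFNConfig:
    """Hypothetical MoE feed-forward layer; sizes are illustrative, not MobileMoE's."""
    d_model: int    # transformer hidden size
    d_expert: int   # intermediate size of each expert
    n_routed: int   # routed experts the router chooses from
    top_k: int      # routed experts activated per token
    n_shared: int   # shared experts applied to every token


def expert_params(d_model: int, d_ff: int) -> int:
    """Weights of one SwiGLU-style expert (assumed: gate, up, down projections, no biases)."""
    return 3 * d_model * d_ff


def ffn_param_counts(cfg: MoEFFNConfig) -> tuple[int, int]:
    """Return (total, active-per-token) parameter counts for one MoE FFN layer."""
    per_expert = expert_params(cfg.d_model, cfg.d_expert)
    router = cfg.d_model * cfg.n_routed  # assumed linear gate scoring every routed expert
    total = (cfg.n_routed + cfg.n_shared) * per_expert + router
    # Each token runs the router, its top_k routed experts, and all shared experts.
    active = (cfg.top_k + cfg.n_shared) * per_expert + router
    return total, active


if __name__ == "__main__":
    coarse = MoEFFNConfig(d_model=1024, d_expert=512, n_routed=32, top_k=4, n_shared=2)
    # Finer-grained variant: halve each expert's size, double expert counts and top_k.
    fine = MoEFFNConfig(d_model=1024, d_expert=256, n_routed=64, top_k=8, n_shared=4)
    for name, cfg in [("coarse", coarse), ("fine", fine)]:
        total, active = ffn_param_counts(cfg)
        print(f"{name}: total={total:,} active={active:,} ratio={active / total:.3f}")
    # A dense FFN whose per-token compute equals the coarse layer's active experts.
    dense_ff = (coarse.top_k + coarse.n_shared) * coarse.d_expert
    print(f"dense: {expert_params(coarse.d_model, dense_ff):,}")
```

In this toy layer each token touches under a fifth of the FFN weights, so the layer stores roughly 5.7 times the parameters of a dense FFN with the same per-token cost. The finer-grained variant leaves both the expert weights and the per-token expert compute unchanged while only the router grows. The sketch counts parameters only; it does not model memory bandwidth or kernel efficiency, which also shape on-device speed.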