Why mixture of experts works better on your phone than dense models
Yanbei Chen, Hanxian Huang, Ernie Chang, Jacob Szwejbka, Digant Desai, Zechun Liu, Vikas Chandra, Raghuraman Krishnamoorthi
May 26, 2026
Mixture-of-Experts (MoE) has proven effective for very large language models, but its benefits for on-device deployment, where memory and compute are scarce, have been unclear. MobileMoE identifies the sweet spot: moderate sparsity with fine-grained shared experts, a configuration that is optimal under mobile constraints. Across 14 benchmarks, models with 0.3–0.9B active parameters match leading dense baselines with far fewer operations and deliver 1.8–3.8× faster text generation on real smartphones. Code and models are released.
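To make the "active parameter" figure concrete, here is a minimal sketch of how active and total parameters diverge in one MoE feed-forward layer that mixes shared and routed experts. Everything in it is an illustrative assumption rather than the paper's configuration: the `MoEFFNConfig` class, the layer sizes, the gated three-projection expert, and the choice to give shared and routed experts the same size are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class MoEFFNConfig:
    """Hypothetical MoE feed-forward layer; no value here comes from the paper."""
    d_model: int   # hidden size
    d_expert: int  # intermediate size of each fine-grained expert
    n_routed: int  # routed experts stored in the layer
    n_shared: int  # shared experts applied to every token
    top_k: int     # routed experts selected per token

    def expert_params(self) -> int:
        # Assumes a gated FFN: up, gate, and down projections, no biases.
        return 3 * self.d_model * self.d_expert

    def router_params(self) -> int:
        # A single linear layer scoring each routed expert.
        return self.d_model * self.n_routed

    def total_params(self) -> int:
        # Everything that must be stored in memory.
        return (self.n_routed + self.n_shared) * self.expert_params() + self.router_params()

    def active_params(self) -> int:
        # Only the shared experts and the top-k routed experts run per token.
        return (self.top_k + self.n_shared) * self.expert_params() + self.router_params()


# Illustrative numbers only: 16 routed experts, 1 shared expert, 3 routed experts per token.
cfg = MoEFFNConfig(d_model=1024, d_expert=512, n_routed=16, n_shared=1, top_k=3)
print("total params: ", cfg.total_params())
print("active params:", cfg.active_params())
print(f"active fraction: {cfg.active_params() / cfg.total_params():.2f}")
```

Under these assumed sizes, raising `top_k` or `n_shared` increases per-token compute without changing the stored model size, while raising `n_routed` does the opposite. That trade-off between memory and compute is what a sparsity setting controls.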