cs.CL

Why mixture of experts works better on your phone than dense models

Yanbei Chen, Hanxian Huang, Ernie Chang, Jacob Szwejbka, Digant Desai, Zechun Liu, Vikas Chandra, Raghuraman Krishnamoorthi

May 26, 2026

Mixture-of-Experts (MoE) has proven effective for very large language models, but its benefits for on-device deployment, where memory and compute are scarce, have been unclear. MobileMoE identifies the sweet spot under mobile constraints: moderate sparsity combined with fine-grained shared experts. Across 14 benchmarks, models with 0.3–0.9B active parameters match leading dense baselines while using far fewer operations, and they deliver 1.8–3.8× faster text generation on real smartphones. Code and models are released.
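The terms "active parameters" and "sparsity with shared experts" can be made concrete with a toy layer. The sketch below is a generic shared-expert MoE feed-forward pass in plain Python, not MobileMoE's implementation. The function names, the linear top-k softmax router, the bias-free two-matrix expert shape, and all sizes are hypothetical assumptions chosen for illustration. Shared experts run on every token, while only the top-k routed experts run, so active parameters are a fraction of the total.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = List[float]
Expert = Callable[[Vector], Vector]


def top_k_route(logits: Sequence[float], k: int) -> Tuple[List[int], List[float]]:
    """Pick the k highest-scoring routed experts and softmax-normalize their scores."""
    idx = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    m = max(logits[i] for i in idx)  # subtract the max for numerical stability
    exps = [math.exp(logits[i] - m) for i in idx]
    total = sum(exps)
    return idx, [e / total for e in exps]


def moe_forward(x: Vector, shared: Sequence[Expert], routed: Sequence[Expert],
                router: Sequence[Vector], k: int) -> Vector:
    """Hypothetical shared-expert MoE layer for a single token vector x."""
    out = [0.0] * len(x)
    # Shared experts are always active, regardless of routing.
    for expert in shared:
        out = [o + y for o, y in zip(out, expert(x))]
    # Linear router (assumption): one weight row per routed expert.
    logits = [sum(w * xi for w, xi in zip(row, x)) for row in router]
    idx, weights = top_k_route(logits, k)
    # Only the k selected routed experts are evaluated; their outputs are gate-weighted.
    for i, g in zip(idx, weights):
        out = [o + g * y for o, y in zip(out, routed[i](x))]
    return out


def ffn_param_counts(d_model: int, d_expert: int, n_routed: int,
                     n_shared: int, k: int) -> Tuple[int, int]:
    """Return (total, active) parameters for one MoE FFN layer.

    Assumes each expert is a bias-free d_model -> d_expert -> d_model FFN and
    the router is a d_model x n_routed matrix (hypothetical shapes).
    """
    per_expert = 2 * d_model * d_expert
    router = d_model * n_routed
    total = (n_shared + n_routed) * per_expert + router
    active = (n_shared + k) * per_expert + router
    return total, active


if __name__ == "__main__":
    total, active = ffn_param_counts(d_model=512, d_expert=256, n_routed=16, n_shared=1, k=2)
    print(total, active)  # 4464640 794624
```

With these made-up sizes, one shared expert plus 2 of 16 routed experts are active per token, so roughly 18% of the layer's parameters do work for any given token. How MobileMoE actually sets expert granularity, the number of shared experts, and k is specified in the paper, not in this sketch.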
Published as MobileMoE: Scaling On-Device Mixture of Experts (arXiv:2605.27358).