Why does Muon optimizer work? A physics perspective
Aratrika Mustafi, Soumya Mukherjee, Bharath K. Sriperumbudur
May 22, 2026
Muon, an optimizer that orthogonalizes its momentum-based updates, works well in practice, but why? This work interprets it as a damped Hamiltonian flow on probability measures, connecting optimizer dynamics to classical mechanics. The authors prove monotonic energy dissipation and exponential convergence rates under reasonable assumptions, then extend the framework to mixture-of-experts transformers, showing how particle interactions in mean-field training can be analyzed through this lens.
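To make the idea of energy dissipation in a damped Hamiltonian flow concrete, here is a minimal toy sketch. It is not the paper's construction: it assumes a single particle in one dimension, a quadratic potential f(x) = x²/2, and a Euclidean kinetic energy, whereas the paper works with flows on probability measures. The names `simulate`, `gamma`, `dt`, and `steps` are hypothetical and chosen for illustration only. In continuous time, the energy H(x, p) = f(x) + p²/2 satisfies dH/dt = -gamma · p² ≤ 0 along the flow; the discrete integrator below only approximates that behavior.

```python
def simulate(gamma=1.0, dt=0.01, steps=2000):
    """Integrate the damped Hamiltonian system
        x' = p,   p' = -f'(x) - gamma * p,
    for the toy potential f(x) = 0.5 * x**2 (an assumption, not from the paper).

    Returns the final position, final momentum, and the energy recorded
    before every step plus once at the end.
    """
    x, p = 1.0, 0.0  # start displaced from the minimizer, at rest
    energies = []
    for _ in range(steps):
        # Total energy: potential plus kinetic.
        energies.append(0.5 * x * x + 0.5 * p * p)
        # Semi-implicit Euler: update momentum from the current position,
        # including the friction term -gamma * p ...
        p = p + dt * (-x - gamma * p)
        # ... then update position with the new momentum.
        x = x + dt * p
    energies.append(0.5 * x * x + 0.5 * p * p)
    return x, p, energies


if __name__ == "__main__":
    x, p, energies = simulate()
    print("initial energy:", energies[0])
    print("final energy:  ", energies[-1])
```

With `gamma = 0` the friction term vanishes and this scheme reduces to a standard symplectic integrator for the undamped oscillator, so the energy no longer decays; the damping is what drives the trajectory toward the minimizer.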