stat.ML

Why does the Muon optimizer work? A physics perspective

Aratrika Mustafi, Soumya Mukherjee, Bharath K. Sriperumbudur

May 22, 2026

Muon, an optimizer that orthogonalizes its momentum-based updates, works well in practice, but why? This work interprets it as a damped Hamiltonian flow on probability measures, connecting optimizer dynamics to classical mechanics. The authors prove monotonic energy dissipation and exponential convergence rates under their stated assumptions, then extend the framework to transformer mixture-of-experts models, showing how particle interactions in mean-field training can be analyzed through this lens.
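For readers unfamiliar with Muon, a single update can be sketched roughly as follows. This is a minimal NumPy illustration and is not taken from the paper: it assumes the common description of Muon as a momentum accumulation followed by orthogonalization of the resulting matrix, and it swaps in an exact SVD where practical implementations use an approximate Newton-Schulz iteration. The function names `orthogonalize` and `muon_step`, the hyperparameter values, and the toy quadratic loss are all illustrative assumptions.

```python
import numpy as np


def orthogonalize(m):
    """Return the semi-orthogonal polar factor U @ V^T of a matrix m.

    Assumption: exact SVD stands in for the Newton-Schulz iteration
    that practical Muon implementations use to approximate this factor.
    """
    u, _, vt = np.linalg.svd(m, full_matrices=False)
    return u @ vt


def muon_step(weight, grad, momentum, lr=0.02, beta=0.95):
    """One Muon-style update (hypothetical helper, illustrative defaults)."""
    # Accumulate the gradient into the momentum buffer.
    momentum = beta * momentum + grad
    # Replace the momentum by its orthogonal factor: every singular
    # value of the applied update becomes 1.
    update = orthogonalize(momentum)
    return weight - lr * update, momentum


# Toy problem: minimize 0.5 * ||W - target||_F^2, whose gradient is W - target.
rng = np.random.default_rng(0)
W = rng.standard_normal((4, 3))
target = rng.standard_normal((4, 3))
buf = np.zeros_like(W)

W_before = W.copy()
W, buf = muon_step(W, W - target, buf)
step = W - W_before
```

Because the applied update has orthonormal columns, its size is set by the learning rate and the matrix shape rather than by the gradient magnitude, which is one reason a dynamical-systems description of the method differs from that of plain momentum SGD.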
Published as "Move on Muon: A Hamiltonian probability gradient flow perspective of Muon optimizer", arXiv:2605.23871