stat.ML

Why momentum algorithms fail when data is sparse and uneven

Katie Everett, Elliot Paquette

May 27, 2026

Momentum optimization works well when gradients arrive steadily, but real data, especially with imbalanced classes or sparse architectures, delivers them unevenly. This paper solves the dynamics exactly for least squares and logistic regression with sparse inputs, showing that momentum's behavior depends on two competing timescales: how long the momentum buffer survives and how fast the model learns. When learning outpaces buffer decay, the system oscillates wildly; when the two are balanced, it follows classical heavy-ball motion. This mismatch explains why a single global momentum parameter fails on sparse data whose updates arrive at different frequencies.
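To make the two timescales concrete, here is a minimal toy sketch in Python. It is not the paper's model or analysis: it assumes a single scalar weight, an indicator feature that is active with probability `p`, and a momentum buffer that decays on every step whether or not a gradient arrives. The function names, default hyperparameters, and the `1 / (1 - beta)` and `1 / p` timescale estimates are illustrative assumptions.

```python
import numpy as np


def buffer_lifetime(beta):
    """Rough number of steps the momentum buffer remembers a gradient (assumed 1/(1-beta))."""
    return 1.0 / (1.0 - beta)


def mean_gap(p):
    """Average number of steps between gradients for a feature active with probability p."""
    return 1.0 / p


def sparse_momentum_sgd(p, beta=0.9, lr=0.01, steps=2000, seed=0):
    """Heavy-ball SGD on a 1-D least-squares toy problem with a sparse indicator input.

    Assumption: the buffer m decays every step, even when the input is inactive.
    Returns the squared error (w - w_star)^2 after each step.
    """
    rng = np.random.default_rng(seed)
    w_star = 1.0          # hypothetical target weight
    w, m = 0.0, 0.0       # weight and momentum buffer
    errs = np.empty(steps)
    for t in range(steps):
        active = rng.random() < p
        # Gradient of 0.5 * (w*x - w_star*x)^2 with x = 1 when active, x = 0 otherwise.
        grad = (w - w_star) if active else 0.0
        m = beta * m + grad   # buffer decays regardless of activity
        w = w - lr * m
        errs[t] = (w - w_star) ** 2
    return errs


if __name__ == "__main__":
    beta = 0.9
    for p in (0.5, 0.01):
        errs = sparse_momentum_sgd(p, beta=beta)
        print(f"p={p}: buffer lifetime ~{buffer_lifetime(beta):.0f} steps, "
              f"mean gap between gradients ~{mean_gap(p):.0f} steps, "
              f"final squared error {errs[-1]:.3e}")
```

With `beta = 0.9` the buffer lasts about 10 steps under this estimate, so a feature with `p = 0.5` sees many gradients per buffer lifetime, while one with `p = 0.01` typically waits about 100 steps between gradients, long after the buffer has decayed. A single `beta` cannot match both regimes at once, which is the mismatch the summary describes.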
Published as "Dynamics of Stochastic Momentum with Sparse Updates in High Dimensions", arXiv:2605.28961