stat.ML

Why does AdaGrad work when gradients go haywire?

Zijian Liu

May 18, 2026

Machine learning training often runs into heavy-tailed gradient noise, where occasional extreme stochastic gradients break standard optimizers. This work proves that AdaGrad converges under such noise without gradient clipping or normalization, and that it adapts automatically to the severity of the noise. The analysis is tight enough to show that AdaGrad cannot match the theoretically optimal rate for this setting, a gap between practice and theory worth understanding.
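
To make the summary above concrete, here is a minimal sketch of the textbook coordinate-wise AdaGrad update run on a toy problem with heavy-tailed noise. It is an illustration only, not the paper's experimental setup, and the paper may analyze a different AdaGrad variant. The quadratic objective, the symmetric Pareto noise, the tail index of 1.5, and the helper names `heavy_tailed_noise` and `make_noisy_quadratic_grad` are all assumptions chosen for this example.

```python
import math
import random


def heavy_tailed_noise(rng, tail_index=1.5):
    """Symmetric Pareto noise (illustrative assumption).

    Zero mean by symmetry when tail_index > 1, infinite variance
    when tail_index <= 2.
    """
    magnitude = rng.paretovariate(tail_index)
    return magnitude if rng.random() < 0.5 else -magnitude


def adagrad(grad_fn, x0, steps, lr=1.0, eps=1e-8):
    """Textbook coordinate-wise AdaGrad, with no clipping or normalization."""
    x = list(x0)
    accum = [0.0] * len(x)  # running sum of squared stochastic gradients
    for _ in range(steps):
        g = grad_fn(x)
        for i, g_i in enumerate(g):
            accum[i] += g_i * g_i
            # accum[i] already includes g_i**2, so this step is at most lr in size
            x[i] -= lr * g_i / (math.sqrt(accum[i]) + eps)
    return x


def make_noisy_quadratic_grad(rng):
    """Stochastic gradient of f(x) = 0.5 * ||x||^2 plus heavy-tailed noise."""
    def grad_fn(x):
        return [x_i + heavy_tailed_noise(rng) for x_i in x]
    return grad_fn


rng = random.Random(0)  # fixed seed so the run is repeatable
x_final = adagrad(make_noisy_quadratic_grad(rng), x0=[5.0, -5.0], steps=20000)
print(x_final)
```

One property visible directly in the code: because the accumulator already contains the current squared gradient, no single coordinate update can exceed `lr` in magnitude, however extreme the sampled gradient is. This is only an observation about the update rule and does not stand in for the paper's convergence analysis.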
Published as "Can Adaptive Gradient Methods Converge under Heavy-Tailed Noise? A Case Study of AdaGrad", arXiv:2605.18694