Why transformers go haywire at high learning rates
Krishnakumar Balasubramanian
May 20, 2026
Transformers trained at high learning rates can switch abruptly from normal convergence to periodic oscillation, chaos, or divergence, even when lower rates train without trouble. By analyzing a simplified linear-transformer model with dynamical-systems theory, researchers mapped this instability landscape and found explicit thresholds at which the training attractor shifts from a stable fixed point to periodic cycles and chaos. The findings explain why practitioners see otherwise unexplained training failures at certain learning rates, and they suggest that current mini-batch methods may inadvertently mask these instabilities.
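To make the idea of a learning-rate threshold concrete, here is a minimal Python sketch. It is not the paper's linear-transformer model: it assumes a one-dimensional quadratic loss f(x) = 0.5 * curvature * x**2, chosen only because its gradient-descent update x ← (1 − lr · curvature) · x has a well-known stability boundary at lr = 2 / curvature. The function names, tolerance, and step count are illustrative choices.

```python
def gradient_descent(lr, curvature=1.0, x0=1.0, steps=50):
    """Run gradient descent on the toy loss f(x) = 0.5 * curvature * x**2.

    Returns the list of iterates, starting with x0.
    """
    xs = [x0]
    x = x0
    for _ in range(steps):
        grad = curvature * x   # f'(x) for the assumed quadratic loss
        x = x - lr * grad      # plain full-batch gradient step
        xs.append(x)
    return xs


def classify(xs, tol=1e-6):
    """Label a trajectory by the magnitude of its final iterate."""
    last = abs(xs[-1])
    if last < tol:
        return "converges"
    if last > 1.0 / tol:
        return "diverges"
    return "oscillates"


if __name__ == "__main__":
    # With curvature = 1 the stability boundary sits at lr = 2.
    for lr in (0.5, 2.0, 2.5):
        print(lr, classify(gradient_descent(lr)))
```

Below the boundary the iterates shrink toward zero, exactly at it they flip sign forever with constant magnitude, and above it they grow without bound. This toy map is linear, so it cannot produce chaos; the richer regimes described above require the nonlinear dynamics of the model studied in the paper.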