stat.ML

Why transformers go haywire at high learning rates

Krishnakumar Balasubramanian

May 20, 2026

Transformers trained with large learning rates can abruptly switch from normal convergence to periodic oscillation, chaos, or divergence, even when smaller learning rates train without trouble. By analyzing a simplified two-factor linear-transformer model with tools from dynamical-systems theory, researchers mapped this instability landscape and derived explicit learning-rate thresholds at which the attractors of the training dynamics shift from stable fixed points to periodic cycles and chaotic behavior. The findings help explain the seemingly unpredictable training failures practitioners encounter at certain learning rates, and they suggest that standard mini-batch methods may inadvertently mask these instabilities.
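To make the threshold idea concrete, here is a minimal sketch in Python. It does not reproduce the paper's linear-transformer model. As an assumption, it swaps in the simplest scalar two-factor problem: full-batch gradient descent on the toy loss L(a, b) = 0.5(ab - 1)^2. The helper names (gd_step, sharpness_at_minimum, classify), the starting point, and the tolerances are hypothetical choices for illustration. At any minimizer (ab = 1) the Hessian has eigenvalues 0 and a^2 + b^2, so a minimizer can only be linearly stable when the learning rate is below 2/(a^2 + b^2). Because a^2 + b^2 >= 2ab = 2, no minimizer is stable once the learning rate exceeds 1. With a balanced start (a = b) the update reduces to the cubic map x -> (1 + lr)x - lr*x^3, whose fixed point x = 1 has derivative 1 - 2*lr and therefore loses stability through a period-doubling at lr = 1.

```python
import math


def gd_step(a: float, b: float, lr: float) -> tuple[float, float]:
    """One full-batch GD step on the toy loss L(a, b) = 0.5 * (a*b - 1)**2 (an assumption, not the paper's model)."""
    r = a * b - 1.0  # residual; dL/da = r * b and dL/db = r * a
    # Simultaneous update: both factors are computed from the old (a, b).
    return a - lr * r * b, b - lr * r * a


def sharpness_at_minimum(a: float, b: float) -> float:
    """Nonzero Hessian eigenvalue of the toy loss at a minimizer (a*b == 1), namely a**2 + b**2."""
    return a * a + b * b


def _escaped(a: float, b: float, bound: float) -> bool:
    """True if the iterate is non-finite or has left the box |a|, |b| <= bound."""
    return not (math.isfinite(a) and math.isfinite(b)) or max(abs(a), abs(b)) > bound


def classify(lr: float, a0: float = 1.1, b0: float = 1.1, burn_in: int = 2000,
             tail: int = 64, max_period: int = 16, bound: float = 1e6,
             tol: float = 1e-9) -> str:
    """Hypothetical heuristic label for the long-run behaviour of GD at learning rate lr.

    Returns "diverged", "period-p" (p = 1 means convergence to a stationary
    point), or "aperiodic" if no period up to max_period fits the tail.
    """
    a, b = a0, b0
    for _ in range(burn_in):  # discard the transient
        a, b = gd_step(a, b, lr)
        if _escaped(a, b, bound):
            return "diverged"
    orbit = [(a, b)]
    for _ in range(tail + max_period):  # record enough steps to test every candidate period
        a, b = gd_step(a, b, lr)
        if _escaped(a, b, bound):
            return "diverged"
        orbit.append((a, b))
    for p in range(1, max_period + 1):  # smallest p with orbit[n + p] == orbit[n] on the tail
        if all(abs(orbit[n + p][0] - orbit[n][0]) <= tol
               and abs(orbit[n + p][1] - orbit[n][1]) <= tol
               for n in range(tail)):
            return f"period-{p}"
    return "aperiodic"


if __name__ == "__main__":
    a_star = b_star = 1.0  # balanced minimizer of the toy loss
    print("stability threshold:", 2.0 / sharpness_at_minimum(a_star, b_star))
    for lr in (0.5, 0.9, 1.2, 1.4, 1.5, 1.7, 1.9, 2.5):
        print(f"lr={lr}: {classify(lr)}")
```

In this toy, lr = 0.5 settles at the minimum, lr = 1.2 settles into a two-step oscillation, and lr = 2.5 blows up. An "aperiodic" label only means that no period up to max_period was found within the tolerance; it is a crude heuristic, not a proof of chaos (estimating a Lyapunov exponent would be the usual next check).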
Published as "Large-Step Training Dynamics of a Two-Factor Linear Transformer Model," arXiv:2605.21292.