Why bigger AI models sometimes get worse: a physics perspective
Xu Ouyang, Deyi Liu, Yuhang Cai, Jing Liu, Yuan Yang, Chen Zheng, Thomas Hartvigsen, Yiyuan Ma
May 22, 2026
Large language models sometimes perform worse when scaled up or quantized, a puzzle that existing scaling laws cannot explain. This work reframes LLM training as information transmission through a noisy channel in Shannon's sense, with model size playing the role of bandwidth and training tokens the role of signal power. The theory predicts a fundamental capacity limit: pushing either dimension without maintaining the signal-to-noise ratio amplifies noise and triggers a U-shaped degradation in performance. Tested on Pythia and OLMo2 under quantization, fine-tuning, and noise injection, the resulting Shannon Scaling Law fits better than prior scaling-law approaches and extrapolates accurately to unseen model sizes.
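The channel analogy builds on the textbook Shannon–Hartley formula, C = B · log2(1 + S/N). The sketch below computes that classical capacity under the standard assumption that noise power grows with bandwidth (N = N0 · B). It is only an illustration of the underlying channel model, not the paper's Shannon Scaling Law. The function names and the numeric values (S = 1, N0 = 1) are hypothetical choices for demonstration, and reading B as parameter count and S as training tokens is just the analogy described above.

```python
import math


def shannon_capacity(bandwidth: float, signal_power: float, noise_density: float) -> float:
    """Classical Shannon-Hartley capacity C = B * log2(1 + S / (N0 * B)).

    Assumes an additive white Gaussian noise channel whose noise power
    scales linearly with bandwidth (N = N0 * B). Illustrative only.
    """
    noise_power = noise_density * bandwidth
    snr = signal_power / noise_power
    return bandwidth * math.log2(1.0 + snr)


def wideband_limit(signal_power: float, noise_density: float) -> float:
    """Capacity as bandwidth -> infinity: S / (N0 * ln 2)."""
    return signal_power / (noise_density * math.log(2))


if __name__ == "__main__":
    # Hypothetical fixed signal power and noise density.
    S, N0 = 1.0, 1.0
    for B in (1.0, 10.0, 100.0, 1000.0):
        # Widening bandwidth alone lowers the SNR, so gains shrink.
        print(f"B={B:>7.1f}  C={shannon_capacity(B, S, N0):.4f}")
    print(f"limit     C={wideband_limit(S, N0):.4f}")
```

In this pure textbook model, widening bandwidth at fixed signal power yields diminishing returns that saturate at S / (N0 ln 2) instead of growing without bound. The U-shaped degradation reported in the paper requires additional noise amplification that this sketch does not attempt to model.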