cs.LG

Why bigger AI models sometimes get worse: a physics perspective

Xu Ouyang, Deyi Liu, Yuhang Cai, Jing Liu, Yuan Yang, Chen Zheng, Thomas Hartvigsen, Yiyuan Ma

May 22, 2026

Large language models sometimes perform worse when scaled up or quantized, a puzzle existing scaling laws can't explain. This work reframes LLM training as information transmission through a noisy channel (Shannon's model), where model size is bandwidth and training tokens are signal power. The theory predicts a fundamental capacity limit: pushing either dimension without maintaining signal-to-noise ratio amplifies noise and triggers U-shaped performance degradation. Tested on Pythia and OLMo2 across quantization, fine-tuning, and noise injection, the Shannon Scaling Law outperforms prior approaches and extrapolates to unseen model sizes accurately.
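
The capacity limit mentioned above can be illustrated with the classical Shannon-Hartley formula, C = B log2(1 + S / (N0 B)). The sketch below is only that textbook formula, not the paper's fitted Shannon Scaling Law. Reading B as model size and S as training tokens follows the analogy in the summary; the noise density N0, the function names, and the numeric values are assumptions chosen for illustration. It shows the saturation effect only: with S fixed, widening B yields diminishing returns and capacity approaches S / (N0 ln 2). It does not reproduce the U-shaped degradation the paper reports.

```python
import math


def shannon_capacity(bandwidth: float, signal_power: float, noise_density: float) -> float:
    """Classical Shannon-Hartley capacity: C = B * log2(1 + S / (N0 * B)).

    Total noise power is assumed to grow linearly with bandwidth (N = N0 * B).
    """
    snr = signal_power / (noise_density * bandwidth)
    return bandwidth * math.log2(1.0 + snr)


def capacity_ceiling(signal_power: float, noise_density: float) -> float:
    """Limit of the capacity as bandwidth grows without bound: S / (N0 * ln 2)."""
    return signal_power / (noise_density * math.log(2))


# Illustrative values only (hypothetical, not taken from the paper).
SIGNAL_POWER = 100.0   # stands in for training tokens in the analogy
NOISE_DENSITY = 1.0    # assumed noise power per unit of bandwidth
BANDWIDTHS = [1, 10, 100, 1_000, 10_000]  # stands in for model size

capacities = [shannon_capacity(b, SIGNAL_POWER, NOISE_DENSITY) for b in BANDWIDTHS]
ceiling = capacity_ceiling(SIGNAL_POWER, NOISE_DENSITY)

for b, c in zip(BANDWIDTHS, capacities):
    # Signal-to-noise ratio falls as bandwidth rises with signal power held fixed.
    snr = SIGNAL_POWER / (NOISE_DENSITY * b)
    print(f"B={b:>6}  SNR={snr:10.4f}  C={c:8.3f}  (ceiling {ceiling:.3f})")
```

In this toy setting each tenfold increase in bandwidth buys less additional capacity, because the signal-to-noise ratio drops as bandwidth grows while signal power stays fixed.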
Published as "LLMs as Noisy Channels: A Shannon Perspective on Model Capacity and Scaling Laws", arXiv:2605.23901.