cs.LG

Why does more pre-training data always help? A theory that actually explains it

Kazuto Fukuchi, Ryuichiro Hataya, Kota Matsui

June 1, 2026

Pre-training scales well: more data means better few-shot performance downstream. But why? This work proposes complexity minimization, a meta-learning framework that learns representations by minimizing the worst-case model complexity across source domains. The theory covers the full pipeline, from pre-training through downstream regression, and proves that downstream error decreases predictably as the amount of meta-training data grows. Adding complexity regularization to standard meta-learning methods consistently reduces the number of samples needed on new tasks.
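To make the idea of "worst-case model complexity across source domains" concrete, here is a minimal sketch of what such an objective could look like. It is not the paper's algorithm. It assumes a one-dimensional linear representation `b`, a scalar ridge-regression head per source task, and the squared head weight as the complexity measure. The names `features`, `fit_head`, `task_terms`, `objective`, `lam` and `ridge` are all hypothetical and chosen only for illustration.

```python
from typing import List, Sequence, Tuple

# One source domain: (inputs X, targets y). Toy setup, not the paper's.
Task = Tuple[List[List[float]], List[float]]


def features(b: Sequence[float], X: List[List[float]]) -> List[float]:
    """Shared 1-D linear representation z = <b, x> (illustrative assumption)."""
    return [sum(bi * xi for bi, xi in zip(b, x)) for x in X]


def fit_head(z: List[float], y: List[float], ridge: float = 1e-3) -> float:
    """Closed-form ridge fit of a scalar task head w on top of the features."""
    return sum(zi * yi for zi, yi in zip(z, y)) / (sum(zi * zi for zi in z) + ridge)


def task_terms(b: Sequence[float], task: Task, ridge: float = 1e-3) -> Tuple[float, float]:
    """Return (mean squared error, complexity) for one source task.

    Complexity is taken to be the squared head weight, an assumed stand-in
    for whatever complexity measure a real method would use.
    """
    X, y = task
    z = features(b, X)
    w = fit_head(z, y, ridge)
    mse = sum((w * zi - yi) ** 2 for zi, yi in zip(z, y)) / len(y)
    return mse, w * w


def objective(b: Sequence[float], tasks: List[Task],
              lam: float = 0.1, ridge: float = 1e-3) -> float:
    """Average fit across source tasks plus lam times the worst-case complexity."""
    terms = [task_terms(b, t, ridge) for t in tasks]
    avg_fit = sum(m for m, _ in terms) / len(terms)
    worst_complexity = max(c for _, c in terms)
    return avg_fit + lam * worst_complexity


# Two toy source domains that share the same informative input coordinate.
tasks: List[Task] = [
    ([[1.0, 0.0], [2.0, 0.0], [3.0, 0.0]], [2.0, 4.0, 6.0]),
    ([[1.0, 0.0], [2.0, 0.0], [3.0, 0.0]], [-1.0, -2.0, -3.0]),
]
b = [1.0, 0.0]
print(objective(b, tasks))
```

In this toy version the penalty is driven entirely by the source task whose head is largest, which is the "worst-case" part of the description. Rescaling `b` would shrink every head weight, so a practical variant would need to fix the scale of the representation. That constraint is left out here for brevity.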
Published as "Provable Data Scaling Law for Meta Learning via Complexity Minimization" (arXiv:2606.02008)