Why does more pre-training data always help? A theory that actually explains it
Kazuto Fukuchi, Ryuichiro Hataya, Kota Matsui
June 1, 2026
Pre-training scales well: more data means better few-shot performance downstream. But why? This work proposes complexity minimization, a meta-learning framework that learns representations by minimizing the worst-case model complexity across source domains. The theory traces the full path from pre-training through downstream regression and proves that downstream error improves predictably as the amount of meta-training data grows. Adding complexity regularization to standard meta-learning methods consistently reduces the number of samples needed on new tasks.
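To make the idea of a worst-case complexity penalty concrete, here is a minimal sketch, not the authors' method. It assumes a shared linear representation, a ridge-regression head per source domain, and the squared norm of each head as a stand-in for "model complexity"; the summary does not say how the paper measures complexity. The names `fit_head` and `regularized_meta_objective`, the weight `lam`, and the synthetic data are all hypothetical.

```python
import numpy as np


def fit_head(features, targets, ridge=1e-3):
    """Closed-form ridge regression head on top of fixed features."""
    k = features.shape[1]
    gram = features.T @ features + ridge * np.eye(k)
    return np.linalg.solve(gram, features.T @ targets)


def regularized_meta_objective(rep, tasks, lam=0.1, ridge=1e-3):
    """Mean fit loss over source domains plus a worst-case complexity penalty.

    ASSUMPTION: complexity of a domain is the squared L2 norm of its head.
    This is an illustrative proxy, not the paper's definition.
    """
    losses, complexities = [], []
    for x, y in tasks:
        z = x @ rep                      # shared representation
        w = fit_head(z, y, ridge)        # domain-specific head
        losses.append(np.mean((z @ w - y) ** 2))
        complexities.append(float(w @ w))
    mean_loss = float(np.mean(losses))
    worst = max(complexities)            # worst case across source domains
    return mean_loss + lam * worst, mean_loss, worst


# Synthetic source domains that share one ground-truth representation.
rng = np.random.default_rng(0)
d, k, n = 8, 3, 40
true_rep = rng.normal(size=(d, k))
tasks = []
for _ in range(5):
    x = rng.normal(size=(n, d))
    w_true = rng.normal(size=k)
    y = x @ true_rep @ w_true + 0.1 * rng.normal(size=n)
    tasks.append((x, y))

total, mean_loss, worst = regularized_meta_objective(true_rep, tasks, lam=0.1)
print(f"mean loss={mean_loss:.4f}  worst complexity={worst:.4f}  total={total:.4f}")
```

In a full meta-learning loop, an objective of this shape would be minimized over `rep`. Penalizing the maximum rather than the average complexity pushes the representation toward one under which every source domain admits a simple head, which is the intuition the summary gives for needing fewer samples on a new task.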