Why does more pre-training data always help? A theory that actually explains it

Pre-training scales well: more data means better few-shot performance downstream. But why? This work proposes complexity minimization, a meta-learning framework that learns representations by minimizing worst-case model complexity across source domains. The theory traces the full path from pre-training through downstream regression and proves error improves predictably as meta-training data grows. Adding complexity regularization to standard meta-learning methods consistently cuts sample requirements on new tasks.