stat.ML

How many training samples do classification algorithms actually need?

Meysam Alishahi, Alexander Munteanu, Simon Omlor, Jeff M. Phillips

May 22, 2026

How many data points must you sample to train a classifier reliably? This work settles the question for logistic, hinge, and ReLU losses under various regularizers, proving tight dimension-free sample bounds. The authors show that L₂ regularization needs on the order of k²/ε² samples (where k is the parameter count), while L₁ regularization requires only k/ε². For certain loss functions, the L₂ bound also drops to linear in k. The key insight is that a refined moment analysis avoids the loose over-counting built into standard sensitivity sampling frameworks, improving on prior bounds that were cubic.
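To make the gap between the two rates concrete, here is a minimal Python sketch that evaluates only the leading-order terms k²/ε² and k/ε². The function names are hypothetical, and the constant factor of 1 is an assumption: the summary gives neither the constants nor any lower-order terms, so the numbers are useful only for comparing growth rates, not as actual sample sizes.

```python
import math


def samples_l2(k: float, eps: float) -> int:
    """Leading-order term k^2 / eps^2 (constant factor of 1 is an assumption)."""
    return math.ceil(k**2 / eps**2)


def samples_l1(k: float, eps: float) -> int:
    """Leading-order term k / eps^2 (constant factor of 1 is an assumption)."""
    return math.ceil(k / eps**2)


if __name__ == "__main__":
    # Compare the two rates for a few illustrative (k, eps) pairs.
    for k in (10, 100, 1000):
        for eps in (0.5, 0.25):
            print(f"k={k:5d} eps={eps:.2f}  "
                  f"L2 ~ {samples_l2(k, eps):>10d}  "
                  f"L1 ~ {samples_l1(k, eps):>8d}")
```

Under these assumptions the two rates differ by exactly a factor of k, so the saving from the linear bound grows with the parameter count.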
Published as "Optimal Dimension-Free Sampling for Regularized Classification", arXiv:2605.23726