How many training samples do classification algorithms actually need?

How many data points must you sample to train a classifier reliably? This work settles the question for logistic, hinge, and ReLU losses with various regularizers, proving tight dimension-free bounds on the required sample size. The authors show that L₂ regularization needs on the order of k²/ε² samples (where k is the number of parameters and ε the target accuracy), while L₁ regularization requires only k/ε². For certain loss functions, the bound drops to linear in k. The key insight is a refined moment analysis that avoids the loose over-counting built into standard sensitivity sampling frameworks, improving on prior bounds that were cubic in k.
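The summary mentions sensitivity sampling without spelling it out, so here is a minimal Python sketch of the generic scheme, assuming NumPy. Each point is drawn with probability proportional to an upper bound on its "sensitivity" (its worst-case share of the total loss) and is reweighted by 1/(m·pᵢ), which makes the weighted sample loss an unbiased estimate of the full loss for any fixed parameter vector. All names here (`sample_size`, `sensitivity_sample`, `logistic_losses`) are hypothetical. The sensitivity proxy is an illustrative assumption, not a proven bound and not the authors' method. `sample_size` only reproduces the leading-order k²/ε² and k/ε² scaling, with constants and log factors dropped.

```python
import math
import numpy as np


def sample_size(k: int, eps: float, reg: str) -> int:
    """Leading-order sample size from the quoted bounds.

    Assumption: constants and logarithmic factors are dropped, so this
    only shows how the L2 and L1 regimes scale with k and eps.
    """
    if reg == "l2":
        return math.ceil(k**2 / eps**2)
    if reg == "l1":
        return math.ceil(k / eps**2)
    raise ValueError(f"unknown regularizer: {reg!r}")


def sensitivity_sample(sens: np.ndarray, m: int, rng: np.random.Generator):
    """Generic sensitivity sampling: draw m indices i.i.d. with probability
    proportional to the sensitivity upper bounds, and weight each draw by
    1 / (m * p_i) so the weighted sample loss is unbiased for the full loss."""
    p = sens / sens.sum()
    idx = rng.choice(len(sens), size=m, replace=True, p=p)
    weights = 1.0 / (m * p[idx])
    return idx, weights


def logistic_losses(X: np.ndarray, y: np.ndarray, theta: np.ndarray) -> np.ndarray:
    """Per-point logistic loss log(1 + exp(-y * <x, theta>)), computed stably."""
    return np.logaddexp(0.0, -y * (X @ theta))


rng = np.random.default_rng(0)
n, k = 20_000, 5
X = rng.normal(size=(n, k))
y = np.where(rng.random(n) < 0.5, -1.0, 1.0)

# Hypothetical sensitivity proxy (NOT a proven bound): a uniform term plus a
# squared-norm term, which keeps every sampling probability away from zero.
sq_norms = (X**2).sum(axis=1)
sens = 1.0 / n + sq_norms / sq_norms.sum()

m = sample_size(k, eps=0.1, reg="l2")
idx, w = sensitivity_sample(sens, m, rng)

# Compare the full-data loss with its weighted-sample estimate at one theta.
theta = rng.normal(size=k)
full_loss = logistic_losses(X, y, theta).sum()
coreset_loss = (w * logistic_losses(X[idx], y[idx], theta)).sum()
rel_err = abs(coreset_loss - full_loss) / full_loss
print(f"m={m}, relative error={rel_err:.4f}")
```

The sketch checks a single fixed θ only. The bounds in the summary concern the much stronger guarantee that the estimate holds for all parameter vectors at once, which is where the required sample size comes from. A data-independent regularizer on θ would simply be added to both losses unchanged.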