When do generative models learn versus memorize?

Diffusion models trained on different subsets of the same dataset converge to nearly identical outputs, suggesting they've learned the data distribution rather than memorized it. But what does convergence actually measure? Using linear models and convolutional denoisers, these researchers show that convergence emerges when you have roughly as many training samples as input dimensions—yet this convergence masks a critical gap: models recover the important underlying factors of the data only later, in a sharp separate transition. Memorization and generalization aren't a single spectrum; they're at least two distinct processes captured by different mathematical distances.