Can you train diffusion models on smaller, smarter datasets?
Xiao Cui, Yulei Qin, Mo Zhu, Wengang Zhou, Hongsheng Li, Houqiang Li
June 4, 2026
Training diffusion models requires preserving the geometric structure of the data distribution, a requirement that existing dataset condensation methods ignore. This work reformulates subset selection as a geometry alignment problem using partial optimal transport, so that rare data patterns stay represented even in compact subsets. Combined with feature and semantic consistency checks, the method trains diffusion models on 10–50% less data without quality loss. Code is released.
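To make the term "partial optimal transport" concrete, here is a minimal toy sketch. It is not the paper's algorithm, and how the paper turns partial transport into a selection objective is not specified in this summary. The sketch assumes unit masses and an integer transported mass `k`, in which case the problem has an optimal solution that matches exactly `k` source-target pairs and leaves the rest unmatched. The names `partial_matching`, `source`, `target`, and `cost` are hypothetical, and the brute-force search is only feasible at toy sizes; practical solvers rely on linear programming or entropic regularization.

```python
from itertools import combinations, permutations


def partial_matching(cost, k):
    """Exhaustively find the cheapest way to match k source points to k
    distinct target points (unit masses, integer transported mass k).

    cost[i][j] is the cost of moving source point i to target point j.
    Returns (total_cost, pairs) with pairs sorted by source index.
    """
    n, m = len(cost), len(cost[0])
    if not 0 <= k <= min(n, m):
        raise ValueError("k must lie between 0 and min(n, m)")
    best_cost, best_pairs = float("inf"), []
    for rows in combinations(range(n), k):        # which sources are matched
        for cols in permutations(range(m), k):    # which targets they go to
            total = sum(cost[i][j] for i, j in zip(rows, cols))
            if total < best_cost:
                best_cost, best_pairs = total, list(zip(rows, cols))
    return best_cost, best_pairs


# Toy 1-D data (illustrative values only): three nearby source points
# and one far-away source point.
source = [0, 10, 20, 100]
target = [1, 12, 90]
cost = [[abs(s - t) for t in target] for s in source]

print(partial_matching(cost, 2))  # (3, [(0, 0), (1, 1)])
print(partial_matching(cost, 3))  # (13, [(0, 0), (1, 1), (3, 2)])
```

With `k = 2` only the two cheapest pairs are matched and the far-away point at 100 is left out; raising `k` to 3 forces it to be matched to the target at 90. The transported mass therefore controls how much of the two point sets must be aligned.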