cs.CV

Can you train diffusion models on smaller, smarter datasets?

Xiao Cui, Yulei Qin, Mo Zhu, Wengang Zhou, Hongsheng Li, Houqiang Li

June 4, 2026

Condensing a dataset for diffusion model training requires preserving the geometric structure of the data distribution, a requirement that existing dataset condensation methods ignore. This work reformulates subset selection as a geometry alignment problem solved with partial optimal transport, so that rare data patterns stay represented even in compact subsets. Combined with feature and semantic consistency checks, the method trains diffusion models on 10–50% less data without quality loss. Code is released.
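To make the term "partial optimal transport" concrete, here is a minimal sketch of it on toy 1-D data. It is not the paper's selection procedure: the function name `partial_sinkhorn`, the entropic (Sinkhorn) solver, the dummy-point construction, the toy points, and every parameter value are illustrative assumptions. Partial transport moves only a chosen amount of mass between two point sets, so part of each set may stay unmatched.

```python
import math


def partial_sinkhorn(cost, a, b, mass, reg=0.1, n_iter=2000):
    """Entropic partial OT via a dummy-point construction (illustrative).

    Moves roughly `mass` units between histograms `a` and `b` and returns
    the len(a) x len(b) transport plan with the dummy row/column dropped.
    Requires mass < min(sum(a), sum(b)).
    """
    n, m = len(a), len(b)
    # A large dummy-to-dummy cost keeps mass from hiding in that cell.
    big = 2.0 * max(max(row) for row in cost) + 1.0
    # Extended cost: sending real mass to a dummy point is free.
    ext = [list(row) + [0.0] for row in cost] + [[0.0] * m + [big]]
    a_ext = list(a) + [sum(b) - mass]
    b_ext = list(b) + [sum(a) - mass]
    kernel = [[math.exp(-c / reg) for c in row] for row in ext]
    u = [1.0] * (n + 1)
    v = [1.0] * (m + 1)
    for _ in range(n_iter):
        # Standard Sinkhorn scaling: columns first, then rows.
        v = [b_ext[j] / sum(kernel[i][j] * u[i] for i in range(n + 1))
             for j in range(m + 1)]
        u = [a_ext[i] / sum(kernel[i][j] * v[j] for j in range(m + 1))
             for i in range(n + 1)]
    return [[u[i] * kernel[i][j] * v[j] for j in range(m)] for i in range(n)]


# Toy "full dataset": a dense mode near 0 plus two rare points near 1.
full = [0.01 * i for i in range(18)] + [0.95, 1.0]
# Hypothetical candidate subset that keeps one point from the rare mode.
subset = [0.0, 0.05, 0.1, 0.15, 1.0]

cost = [[(x - y) ** 2 for y in subset] for x in full]
a = [1.0 / len(full)] * len(full)      # uniform mass on the full data
b = [1.0 / len(subset)] * len(subset)  # uniform mass on the subset
mass = 0.8                             # fraction of mass that must be matched

plan = partial_sinkhorn(cost, a, b, mass)
matched_per_point = [sum(row) for row in plan]
print("total transported mass:", sum(matched_per_point))
print("matched mass of the two rare points:", matched_per_point[-2:])
```

Each row sum of `plan` is the mass of one full-dataset point that this candidate subset accounts for; points with little matched mass are the ones the subset leaves uncovered. How the paper turns such a plan into a selection rule is not described in this summary.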
Published as "Geometry-Aware Dataset Condensation for Diffusion Model Training", arXiv:2606.05883