Does predicting clean images work better in compressed space?
Funing Fu, Tenghui Wang, Junyong Cen, Qichao Zhu, Guanyu Zhou
May 26, 2026
Diffusion models can be trained to regress toward clean pixels or toward noise, two targets that are mathematically equivalent. The authors tested whether this choice still matters once images are compressed into learned latent codes. Their 130M JLT model predicts clean latents rather than velocity and achieves an FID-50K of 2.50 on ImageNet 256×256. A local geometric analysis indicates that velocity regression amplifies low-variance directions while clean prediction dampens them, suggesting that the choice of prediction target depends on the representation and is not merely algebraic.
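To make the "mathematically equivalent" claim concrete, here is a minimal sketch of how a clean-latent target and a velocity target convert into one another. It assumes a linear interpolation path z_t = (1 - t)·x + t·ε with velocity v = ε - x, which is a common convention but is not stated in the summary; the function names are hypothetical and are not taken from the paper.

```python
import random

# Assumption: linear interpolation path z_t = (1 - t) * x + t * eps.
# The summary does not state the paper's actual schedule.

def make_noisy(x, eps, t):
    """Mix a clean latent x with noise eps at time t in (0, 1]."""
    return [(1.0 - t) * xi + t * ei for xi, ei in zip(x, eps)]

def velocity_target(x, eps):
    """Velocity target under the assumed path: v = eps - x."""
    return [ei - xi for xi, ei in zip(x, eps)]

def clean_from_velocity(z_t, v, t):
    """Recover the clean latent from a velocity: x = z_t - t * v."""
    return [zi - t * vi for zi, vi in zip(z_t, v)]

def velocity_from_clean(z_t, x_hat, t):
    """Recover the velocity from a clean latent: v = (z_t - x) / t."""
    return [(zi - xi) / t for zi, xi in zip(z_t, x_hat)]

random.seed(0)
x = [random.gauss(0.0, 1.0) for _ in range(8)]    # stand-in for a clean latent
eps = [random.gauss(0.0, 1.0) for _ in range(8)]  # Gaussian noise
t = 0.3

z_t = make_noisy(x, eps, t)
v = velocity_target(x, eps)

x_rec = clean_from_velocity(z_t, v, t)   # clean latent rebuilt from velocity
v_rec = velocity_from_clean(z_t, x, t)   # velocity rebuilt from clean latent

max_x_err = max(abs(a - b) for a, b in zip(x_rec, x))
max_v_err = max(abs(a - b) for a, b in zip(v_rec, v))
```

Under this assumed path, an error in the predicted velocity maps to an error in the implied clean latent scaled by t, so the two squared-error losses agree only up to a time-dependent weight. This is one way in which an algebraically equivalent target can still behave differently in training.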