cs.LG

Does predicting clean images work better in compressed space?

Funing Fu, Tenghui Wang, Junyong Cen, Qichao Zhu, Guanyu Zhou

May 26, 2026

Diffusion models can be trained to regress toward clean pixels or toward noise, two targets that are mathematically equivalent. The authors test whether this choice still matters once images are compressed into learned latent codes. Their 130M JLT model predicts clean latents rather than velocity and achieves an FID-50K of 2.50 on ImageNet 256×256. A local geometric analysis indicates that velocity regression amplifies low-variance directions while clean prediction dampens them, which suggests that the choice of prediction target depends on the representation and is not merely algebraic.
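The algebraic equivalence mentioned above can be illustrated with a minimal sketch. It assumes a linear interpolation path z_t = (1 - t)·x + t·eps with velocity v = eps - x; this parameterization and the helper names (`make_noisy`, `velocity_from_clean`, `clean_from_velocity`) are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def make_noisy(x, eps, t):
    """Assumed linear path: z_t = (1 - t) * x + t * eps."""
    return (1.0 - t) * x + t * eps

def velocity_from_clean(z_t, x_hat, t):
    """Convert a clean-latent estimate to a velocity estimate.

    Under the assumed path, z_t = x + t * v, so v = (z_t - x) / t.
    """
    return (z_t - x_hat) / t

def clean_from_velocity(z_t, v_hat, t):
    """Convert a velocity estimate to a clean-latent estimate: x = z_t - t * v."""
    return z_t - t * v_hat

# Stand-in "latents": random arrays, not real encoder outputs.
rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8))
eps = rng.standard_normal((4, 8))
t = 0.3

z_t = make_noisy(x, eps, t)
v = eps - x

# With exact targets, each parameterization recovers the other.
assert np.allclose(velocity_from_clean(z_t, x, t), v)
assert np.allclose(clean_from_velocity(z_t, v, t), x)
```

With exact targets the two conversions agree, which is the sense in which the choice is algebraic. The sketch does not reproduce the paper's geometric analysis of how the two regression targets behave differently in latent space.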
Published as "JLT: Clean-Latent Prediction in Latent Diffusion Transformers", arXiv:2605.27102