Why discrete diffusion models predict wrong targets during training
Samson Gourevitch, Yazid Janati, Dario Shariatian, Umut Simsekli, Eric Moulines, Eric P. Xing, Alain Durmus
May 21, 2026
Discrete diffusion models generate text by iterative denoising. The authors find that standard training for uniform diffusion optimizes the wrong target: the network learns a leave-one-out prediction, in which each clean token is predicted while ignoring its own noisy observation, rather than the full denoising posterior. This mismatch between the training objective and the model's architecture degrades performance. The authors derive exact conversions between the different parameterizations, propose a new absorbing-state reformulation that simplifies the denoising problem, and introduce inference-time techniques (predictor-corrector sampling and better temperature scaling) that improve generation on language modeling without retraining. The code is released.
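To make the gap between the two targets concrete, here is a minimal sketch of how a leave-one-out prediction could be converted into a full denoising posterior for one token. It is an illustration under stated assumptions, not the paper's derivation or released code: it assumes a per-token uniform kernel that keeps the clean token with probability `alpha` and otherwise resamples uniformly over the vocabulary, and the function names and numbers are hypothetical. Under that assumption, Bayes' rule gives the full posterior as the leave-one-out prediction multiplied by the likelihood of the token's own noisy observation, then renormalized.

```python
def uniform_likelihood(observed, vocab_size, alpha):
    """q(x_t = observed | x_0 = k) for every clean token k.

    Assumed kernel: keep the clean token with probability alpha,
    otherwise resample uniformly over the vocabulary.
    """
    base = (1.0 - alpha) / vocab_size
    return [base + (alpha if k == observed else 0.0) for k in range(vocab_size)]


def loo_to_full_posterior(loo_probs, observed, alpha):
    """Fold a token's own noisy observation back into a leave-one-out prediction.

    loo_probs[k] stands for p(x_0 = k | all *other* noisy tokens).
    The result is proportional to loo_probs[k] * q(observed | x_0 = k).
    """
    lik = uniform_likelihood(observed, len(loo_probs), alpha)
    unnormalized = [p * l for p, l in zip(loo_probs, lik)]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]


# Hypothetical 3-token vocabulary: context favors token 0, but token 1 was observed.
loo = [0.7, 0.2, 0.1]
full = loo_to_full_posterior(loo, observed=1, alpha=0.5)
print(full)
```

In this toy case the observed token's probability rises from 0.2 to 0.5 once its own observation is taken into account, and with `alpha = 0` (pure noise) the two targets coincide.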