Why discrete diffusion models predict wrong targets during training

Samson Gourevitch, Yazid Janati, Dario Shariatian, Umut Simsekli, Eric Moulines, Eric P. Xing, Alain Durmus

Discrete diffusion models use denoising to generate text, but researchers found that standard uniform diffusion training optimizes the wrong posterior: it predicts clean tokens while ignoring their own noisy observations (leave-one-out prediction), not the full denoising posterior. This mismatch between the training objective and the model's architecture degrades performance. The team derives exact conversions between different parameterizations, proposes a new absorbing-state reformulation that simplifies the denoising problem, and introduces inference tricks (predictor-corrector sampling, better temperature scaling) that improve generation on language modeling without retraining. Code released.