cs.LG

One model trained to master multiple image generation goals at once

Quanhao Li, Junqiu Yu, Kaixun Jiang, Yujie Wei, Zhen Xing, Pandeng Li, Ruihang Chu, Shiwei Zhang, Yu Liu, Zuxuan Wu

May 14, 2026

Applying reinforcement learning to text-to-image diffusion models typically targets one objective at a time; optimizing multiple rewards jointly causes task interference and forgetting. DiffusionOPD sidesteps this by training independent task-specific teacher models and then distilling all of them into a single student model along the student's own generation trajectories. The key technical contribution is extending the on-policy distillation (OPD) framework from discrete token spaces to continuous diffusion processes. The authors derive a closed-form per-step KL objective that covers both SDE and ODE samplers through mean-matching, which yields lower-variance gradients than standard PPO. The approach achieves the top results on every benchmark tested while being more training-efficient than existing multi-task RL alternatives.
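To make the mean-matching idea concrete, here is a minimal Python sketch. It is not the authors' implementation. It assumes each sampler step of both student and teacher is a Gaussian with the same isotropic variance sigma^2, as in a standard SDE sampler step. Under that assumption the log-variance and trace terms of the Gaussian KL cancel, leaving KL = ||mu_s - mu_t||^2 / (2 sigma^2), which is also symmetric in student and teacher. The names `mean_matching_kl`, `gaussian_kl_diag`, `on_policy_distill_loss`, `student_mean`, `teacher_mean` and `sigmas` are hypothetical, and the toy linear "denoisers" stand in for real networks. The sketch does not reproduce the paper's ODE case or how prompts are routed among multiple teachers. It only evaluates the loss value; actual training would need an autodiff framework to backpropagate through the student's predicted means.

```python
import numpy as np


def gaussian_kl_diag(mu_p, var_p, mu_q, var_q):
    """General KL(N(mu_p, diag(var_p)) || N(mu_q, diag(var_q))), used as a reference."""
    mu_p, var_p, mu_q, var_q = (np.asarray(a, dtype=float) for a in (mu_p, var_p, mu_q, var_q))
    return float(0.5 * np.sum(np.log(var_q / var_p) + (var_p + (mu_p - mu_q) ** 2) / var_q - 1.0))


def mean_matching_kl(mu_student, mu_teacher, sigma):
    """Per-step KL when student and teacher share the isotropic variance sigma**2.

    The log-variance and trace terms cancel, leaving ||mu_s - mu_t||^2 / (2 sigma^2),
    so matching the teacher's step reduces to matching its predicted mean.
    """
    diff = np.asarray(mu_student, dtype=float) - np.asarray(mu_teacher, dtype=float)
    return float(np.sum(diff ** 2) / (2.0 * sigma ** 2))


def on_policy_distill_loss(x_init, student_mean, teacher_mean, sigmas, rng):
    """Sum of per-step KLs along a trajectory sampled from the *student*.

    student_mean / teacher_mean: hypothetical callables (x, step) -> mean of the next state.
    sigmas: assumed per-step noise scale of the SDE sampler.
    """
    x = np.asarray(x_init, dtype=float)
    total = 0.0
    for step, sigma in enumerate(sigmas):
        mu_s = student_mean(x, step)
        mu_t = teacher_mean(x, step)  # teacher is queried on the student's own state
        total += mean_matching_kl(mu_s, mu_t, sigma)
        # Advance with the student's stochastic step so the rollout stays on-policy.
        x = mu_s + sigma * rng.standard_normal(x.shape)
    return total


if __name__ == "__main__":
    rng = np.random.default_rng(0)

    def toy_student(x, step):
        return 0.9 * x  # hypothetical stand-in for the student's predicted mean

    def toy_teacher(x, step):
        return 0.8 * x  # hypothetical stand-in for one task-specific teacher

    loss = on_policy_distill_loss(rng.standard_normal(4), toy_student, toy_teacher,
                                  sigmas=[1.0, 0.5, 0.25], rng=rng)
    print(f"summed per-step KL: {loss:.4f}")
```

Because each state is produced by the student's own noisy step, the teacher is only ever evaluated on states the student actually visits, which is what makes the distillation on-policy. `gaussian_kl_diag` is included only to show that the shared-variance shortcut agrees with the general Gaussian KL.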
Published as "DiffusionOPD: A Unified Perspective of On-Policy Distillation in Diffusion Models" (arXiv:2605.15055)