One model trained to master multiple image generation goals at once

Quanhao Li, Junqiu Yu, Kaixun Jiang, Yujie Wei, Zhen Xing, Pandeng Li, Ruihang Chu, Shiwei Zhang, Yu Liu, Zuxuan Wu

Reinforcement learning for text-to-image diffusion models typically targets one objective at a time; optimizing several rewards jointly causes task interference and forgetting. DiffusionOPD sidesteps this by training independent task-specific teacher models, then distilling all of them into a single student along the student's own generation trajectories. The key technical contribution is extending the on-policy distillation framework from discrete token spaces to continuous diffusion processes: the authors derive a closed-form per-step KL objective that reduces to mean-matching and covers both SDE and ODE samplers, yielding lower-variance gradients than standard PPO. The approach reaches top results on all tested benchmarks while training more efficiently than existing multi-task RL alternatives.
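The mean-matching idea can be illustrated with a standard Gaussian identity: when two distributions share the covariance sigma^2 I, their KL divergence is the squared distance between their means divided by 2 sigma^2. The sketch below is a minimal illustration, not the paper's implementation. It assumes that one denoising step of the student and of a teacher can each be modelled as an isotropic Gaussian with the same standard deviation `sigma`. The function names are hypothetical. The summary does not say how the deterministic ODE limit (where `sigma` goes to zero) is handled, so that case is not covered here.

```python
import random


def gaussian_kl_shared_sigma(mu_student, mu_teacher, sigma):
    """Closed-form KL( N(mu_student, sigma^2 I) || N(mu_teacher, sigma^2 I) ).

    With a shared covariance the log-normalizers cancel, so the KL
    depends only on the distance between the two means.
    """
    sq_dist = sum((s - t) ** 2 for s, t in zip(mu_student, mu_teacher))
    return sq_dist / (2.0 * sigma ** 2)


def monte_carlo_kl(mu_student, mu_teacher, sigma, n_samples=20000, seed=0):
    """Sampling-based estimate of the same KL, shown only for comparison.

    Samples are drawn from the student distribution; each sample adds
    noise to the estimate, which the closed form avoids.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        x = [rng.gauss(m, sigma) for m in mu_student]
        # Unnormalized log-densities; the shared normalizer cancels.
        log_p = sum(-((xi - m) ** 2) / (2 * sigma ** 2)
                    for xi, m in zip(x, mu_student))
        log_q = sum(-((xi - m) ** 2) / (2 * sigma ** 2)
                    for xi, m in zip(x, mu_teacher))
        total += log_p - log_q
    return total / n_samples


if __name__ == "__main__":
    mu_s = [0.0, 1.0]   # hypothetical student step mean
    mu_t = [0.5, 0.0]   # hypothetical teacher step mean
    sigma = 0.5         # assumed shared per-step noise scale
    print("closed form :", gaussian_kl_shared_sigma(mu_s, mu_t, sigma))
    print("monte carlo :", monte_carlo_kl(mu_s, mu_t, sigma))
```

The closed form is exact for a given pair of means, while the sampled estimate fluctuates around it. This is one intuition for why an analytic per-step objective can give lower-variance training signals than a sample-based estimator.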