cs.CV

Training fast image generators to match your taste without slowing them down

Zhou Jiang, Yandong Wen, Zhen Liu

June 1, 2026

One-step text-to-image models such as SD-Turbo are fast, but they are hard to align with human preferences without sacrificing that speed. DrPO ranks generated images with a reward model, then synthesizes update directions in feature space without backpropagating through the reward, so black-box or non-differentiable scorers can be used. On SD-Turbo and SDXL-Turbo, it outperforms gradient-free baselines while cutting training compute by 3.5×, and inference remains a single forward pass.
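To make the idea of a reward-gradient-free update direction concrete, here is a toy sketch in NumPy. It is not the paper's update rule: the scorer `score_black_box`, the helper `rank_based_direction`, and the "better half minus worse half" construction are all assumptions chosen for illustration. The only property it shares with the description above is that the scorer is called for its outputs and never differentiated.

```python
import numpy as np


def score_black_box(features):
    """Hypothetical opaque scorer (higher is better); only its outputs are used."""
    target = np.full(features.shape[1], 3.0)
    return -np.linalg.norm(features - target, axis=1)


def rank_based_direction(features, scores):
    """Illustrative direction: mean of the better-ranked half minus mean of the worse-ranked half.

    Assumes at least two samples; with an odd count the middle sample is ignored.
    """
    order = np.argsort(scores)            # ascending, so the worst-scored sample comes first
    half = len(scores) // 2
    worse = features[order[:half]]
    better = features[order[-half:]]
    return better.mean(axis=0) - worse.mean(axis=0)


# Four toy "feature vectors" standing in for features of generated images.
features = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 2.0], [3.0, 3.0]])
scores = score_black_box(features)
direction = rank_based_direction(features, scores)

# Nudging the features along the direction raises the average score in this toy case.
shifted = features + 0.1 * direction
print(direction, scores.mean(), score_black_box(shifted).mean())
```

In a real training loop, a direction like this would presumably serve as a fixed regression target for the generator's features, so gradients flow only through the generator and not through the scorer. That wiring is likewise an assumption here, not a detail taken from the paper.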
Published as "Drifting Preference Optimization for One-Step Generative Models", arXiv:2606.02521