Training fast image generators to match your taste without slowing them down

One-step text-to-image models like SD-Turbo are fast, but aligning them with human preferences usually costs them that speed. DrPO ranks generated images with a reward model, then synthesizes update directions in feature space without backpropagating through the reward, so the scorer can be black-box or non-differentiable. On SD-Turbo and SDXL-Turbo, it outperforms gradient-free baselines while cutting training compute by 3.5×, and inference stays at a single forward pass.
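
To make the "rank, then build a direction in feature space" idea concrete, here is a minimal sketch in Python. It is not DrPO's actual algorithm, which this summary does not spell out. It assumes one simple way such a scheme could work: score a handful of candidates with a scorer that is only ever called, never differentiated, turn the scores into zero-mean rank weights, and take a weighted sum of centered feature vectors. The names `rank_weights`, `synthesize_direction`, and `toy_reward` are hypothetical and exist only for illustration.

```python
import numpy as np


def rank_weights(scores):
    """Turn raw scores into zero-mean weights that depend only on rank.

    The worst candidate gets -0.5 and the best gets +0.5. Ties are broken
    arbitrarily by argsort. This is an illustrative choice, not DrPO's.
    """
    scores = np.asarray(scores, dtype=float)
    k = scores.shape[0]
    if k < 2:
        return np.zeros(k)
    ranks = np.argsort(np.argsort(scores))  # 0 = worst, k - 1 = best
    return ranks / (k - 1) - 0.5


def synthesize_direction(features, scores):
    """Combine candidate features into one update direction.

    Only the ordering of the scores is used, so the scorer can be any
    black-box function. No gradient of the reward is ever taken.
    """
    features = np.asarray(features, dtype=float)
    weights = rank_weights(scores)
    centered = features - features.mean(axis=0, keepdims=True)
    return weights @ centered


def toy_reward(feature):
    """Hypothetical stand-in for a black-box scorer (higher is better)."""
    return -abs(float(feature[0]) - 2.0)


# Three candidate feature vectors, scored without any gradient access.
candidates = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]])
candidate_scores = [toy_reward(f) for f in candidates]
direction = synthesize_direction(candidates, candidate_scores)
```

In this toy setup the resulting direction points from the low-ranked candidates toward the high-ranked ones. In a real pipeline, a direction like this would serve as a training signal for the generator, and how that step is done is exactly what the method itself specifies.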