Can rewards instead of fixed rules improve image generation training?
Shentong Mo, Sukmin Yun
May 30, 2026
Diffusion transformers struggle to align their generative features efficiently with pretrained visual encoders during training. VRPO treats representation alignment as a reinforcement process: the model receives adaptive rewards for generation fidelity and semantic coherence instead of being held to fixed similarity constraints. On ImageNet-256, this approach yields an FID improvement of 1.8 and a 2.3× speedup over prior alignment methods, with negligible added cost and compatibility with existing DiT and SiT architectures.
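To make the contrast between a fixed similarity constraint and a reward-driven one concrete, here is a minimal sketch in plain Python. It is not VRPO's actual objective, which this summary does not specify. The function names, the softmax weighting, the `temperature` parameter, and the idea of passing per-sample fidelity and coherence rewards as plain lists are all assumptions made for illustration.

```python
import math


def cosine_similarity(u, v, eps=1e-8):
    """Cosine similarity between two feature vectors given as lists of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + eps)


def fixed_alignment_loss(model_feats, encoder_feats):
    """Fixed constraint: every sample is pulled toward the encoder equally."""
    losses = [1.0 - cosine_similarity(m, e)
              for m, e in zip(model_feats, encoder_feats)]
    return sum(losses) / len(losses)


def reward_weighted_alignment_loss(model_feats, encoder_feats,
                                   fidelity, coherence, temperature=1.0):
    """Hypothetical adaptive variant: per-sample rewards set the weights.

    `fidelity` and `coherence` are assumed per-sample reward scores.
    A softmax over their sum decides how strongly each sample's
    alignment term counts toward the total.
    """
    losses = [1.0 - cosine_similarity(m, e)
              for m, e in zip(model_feats, encoder_feats)]
    rewards = [f + c for f, c in zip(fidelity, coherence)]
    top = max(rewards)  # subtract the max for numerical stability
    exps = [math.exp((r - top) / temperature) for r in rewards]
    total = sum(exps)
    weights = [x / total for x in exps]
    return sum(w * l for w, l in zip(weights, losses))


# Toy batch: sample 0 is already aligned, sample 1 is orthogonal.
model_feats = [[1.0, 0.0], [0.0, 1.0]]
encoder_feats = [[1.0, 0.0], [1.0, 0.0]]

uniform = reward_weighted_alignment_loss(
    model_feats, encoder_feats, fidelity=[0.0, 0.0], coherence=[0.0, 0.0])
skewed = reward_weighted_alignment_loss(
    model_feats, encoder_feats, fidelity=[0.0, 50.0], coherence=[0.0, 0.0])
```

With equal rewards the weights are uniform, so the adaptive loss reduces to the fixed one. Unequal rewards shift the alignment pressure toward the samples the reward signal favors, which is the general kind of adaptivity the summary describes.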