cs.LG

Can rewards instead of fixed rules improve image generation training?

Shentong Mo, Sukmin Yun

May 30, 2026

Diffusion transformers struggle to align their generative features efficiently with pretrained visual encoders during training. VRPO treats representation alignment as a reinforcement process: instead of enforcing fixed similarity constraints, the model receives adaptive rewards for generation fidelity and semantic coherence. On ImageNet-256, this approach yields a 1.8 FID improvement and a 2.3× speedup over prior alignment methods, with negligible added cost, and it is compatible with existing DiT and SiT architectures.
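To make the reward-based framing concrete, here is a minimal Python sketch of the group-relative normalization that GRPO (named in the paper's title) is generally known for: each sample's reward is scored against the mean and standard deviation of its own sampling group. Everything else is an assumption made for illustration and is not taken from the paper: the function names, the use of cosine similarity as a semantic coherence score, the equal reward weights, and the toy numbers.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two feature vectors (hypothetical coherence score)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # define similarity with a zero vector as 0
    return dot / (norm_a * norm_b)


def combined_reward(fidelity, coherence, w_fidelity=0.5, w_coherence=0.5):
    """Weighted sum of two reward terms. The weights are illustrative assumptions."""
    return w_fidelity * fidelity + w_coherence * coherence


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize rewards within one group: (r - mean) / (std + eps)."""
    n = len(rewards)
    mean = sum(rewards) / n
    std = math.sqrt(sum((r - mean) ** 2 for r in rewards) / n)
    return [(r - mean) / (std + eps) for r in rewards]


# Toy group of three samples: (fidelity score, generator features, encoder features).
group = [
    (0.2, [1.0, 0.0], [0.0, 1.0]),
    (0.5, [1.0, 1.0], [1.0, 0.0]),
    (0.9, [1.0, 0.0], [1.0, 0.0]),
]
rewards = [combined_reward(f, cosine_similarity(g, e)) for f, g, e in group]
advantages = group_relative_advantages(rewards)
```

Samples that score above their group's mean get a positive advantage and those below get a negative one, so the training signal adapts to the current quality of the model's outputs rather than to a fixed similarity target.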
Published as "Improving Visual Representation Alignment Generation with GRPO" (arXiv:2606.00583)