How to make diffusion policies learn faster in robotics?
Shutong Ding, Zejia Zhong, Zhongyi Wang, Ke Hu, Bikang Pan, Jingya Wang, Ye Shi
May 28, 2026
Diffusion-based reinforcement learning policies explore well but converge slowly, whereas gradient-based methods converge quickly but explore poorly. CGPO combines the two by guiding the diffusion denoising process toward high-value actions identified by a critic network, which eliminates the need for additional sampling. Experiments on five MuJoCo tasks and real Franka robot arms show faster learning and better final performance than existing diffusion RL methods.
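To make the idea of critic-guided denoising concrete, here is a minimal sketch of a reverse denoising loop in which each step is nudged along the gradient of a critic. It is a generic guidance illustration, not the CGPO algorithm from the paper: the quadratic critic, the shrink-toward-zero stand-in for a learned denoiser, the noise schedule, and all names (`critic_q`, `critic_grad`, `guided_denoise`, `guidance_scale`) are assumptions made for the example.

```python
import numpy as np


def critic_q(action, target):
    """Toy critic (assumed): value is higher the closer the action is to `target`."""
    return -np.sum((action - target) ** 2, axis=-1)


def critic_grad(action, target):
    """Analytic gradient of the toy critic with respect to the action."""
    return -2.0 * (action - target)


def guided_denoise(rng, target, n_steps=50, guidance_scale=0.1, action_dim=2):
    """Toy reverse process that shifts every denoising step toward higher critic value."""
    action = rng.standard_normal(action_dim)  # start from pure Gaussian noise
    for step in range(n_steps, 0, -1):
        noise_scale = 0.1 * step / n_steps  # injected noise shrinks as denoising proceeds
        # Stand-in for a learned denoiser (assumed): shrink toward the origin.
        denoised = 0.95 * action
        # Guidance: move the denoised mean along the critic gradient.
        guided = denoised + guidance_scale * critic_grad(denoised, target)
        # No noise is added on the final step.
        noise = rng.standard_normal(action_dim) if step > 1 else np.zeros(action_dim)
        action = guided + noise_scale * noise
    return action


if __name__ == "__main__":
    target = np.array([1.0, -0.5])
    guided = guided_denoise(np.random.default_rng(0), target)
    unguided = guided_denoise(np.random.default_rng(0), target, guidance_scale=0.0)
    print("guided Q:  ", critic_q(guided, target))
    print("unguided Q:", critic_q(unguided, target))
```

With `guidance_scale=0.0` the loop ignores the critic entirely, so comparing the two runs shows the effect of guidance in isolation. In this toy setup the guided action is produced in a single pass through the denoising loop, without drawing and ranking several candidate actions.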