cs.LG

How to make diffusion policies learn faster in robotics?

Shutong Ding, Zejia Zhong, Zhongyi Wang, Ke Hu, Bikang Pan, Jingya Wang, Ye Shi

May 28, 2026

Diffusion-based reinforcement learning policies excel at exploration but converge slowly; gradient-based policy methods show the opposite trade-off, converging quickly but exploring less. CGPO combines the two by using a critic network to guide the diffusion denoising process toward high-value actions, which removes the need for additional sampling. Experiments on five MuJoCo tasks and on real Franka robot arms show faster learning and better final performance than existing diffusion RL methods.
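The summary describes CGPO only at a high level. As a rough illustration of what "guiding the denoising process toward high-value actions" can look like, here is a minimal NumPy sketch in the style of classifier guidance: at each reverse step, the DDPM mean is shifted along the critic's action gradient. This is not the paper's algorithm. The quadratic critic `critic_q`, the closed-form noise predictor `eps_predictor` (exact only when behaviour actions are distributed N(0, I)), the linear beta schedule, the `target` action and `guidance_scale` are all hypothetical choices made for the example.

```python
import numpy as np


def critic_q(actions, target):
    """Hypothetical critic: Q(s, a) = -||a - target||^2, highest at `target`."""
    return -np.sum((actions - target) ** 2, axis=-1)


def critic_grad(actions, target):
    """Analytic gradient dQ/da of the toy critic (a learned Q-net would use autodiff)."""
    return -2.0 * (actions - target)


def eps_predictor(x_t, alpha_bar_t):
    """Stand-in for a learned noise network (hypothetical).

    Assumes behaviour actions are N(0, I); for that data the optimal noise
    prediction is E[eps | x_t] = sqrt(1 - alpha_bar_t) * x_t.
    """
    return np.sqrt(1.0 - alpha_bar_t) * x_t


def sample_actions(n, target, guidance_scale=0.0, steps=50, seed=0):
    """DDPM-style reverse process with an optional critic-gradient shift."""
    rng = np.random.default_rng(seed)
    betas = np.linspace(1e-4, 0.1, steps)
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)

    x = rng.standard_normal((n, target.shape[0]))  # x_T ~ N(0, I)
    for t in reversed(range(steps)):
        eps = eps_predictor(x, alpha_bars[t])
        # Standard DDPM mean of p(x_{t-1} | x_t).
        mean = (x - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps) / np.sqrt(alphas[t])
        # Critic guidance (assumed form): shift the mean along dQ/da at x_t,
        # scaled by the step variance beta_t.
        mean = mean + guidance_scale * betas[t] * critic_grad(x, target)
        # No noise is added on the final step.
        noise = rng.standard_normal(x.shape) if t > 0 else 0.0
        x = mean + np.sqrt(betas[t]) * noise
    return x


target = np.array([1.5, -1.0])  # hypothetical high-value action
plain = sample_actions(2000, target, guidance_scale=0.0)
guided = sample_actions(2000, target, guidance_scale=2.0)
print("mean Q, unguided:", critic_q(plain, target).mean())
print("mean Q, guided:  ", critic_q(guided, target).mean())
```

Running it should show a clearly higher mean Q for the guided samples, even though each action still comes from a single denoising chain rather than from drawing many candidates and re-ranking them with the critic. In a real agent the analytic gradient would come from differentiating a learned Q-network, and the paper may weight, clip or schedule the guidance term differently.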
Published as "Sample-Efficient Diffusion-based Reinforcement Learning with Critic Guidance" (arXiv:2605.30056).