How can diffusion policies learn faster in robotics?

Diffusion-based reinforcement learning (RL) policies excel at exploration but converge slowly; gradient methods have the opposite profile, converging quickly but exploring less. CGPO combines the two by using a critic network to identify high-value actions and guiding the diffusion denoising process toward them, which eliminates the need for additional sampling. Experiments on five MuJoCo tasks and on real Franka robot arms show faster learning and better final performance than existing diffusion RL methods.
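To make the idea of critic-guided denoising concrete, here is a minimal toy sketch. It is not CGPO's actual algorithm, which the summary above does not spell out. It assumes guidance works by nudging the action along the critic's gradient at each denoising step, and every name in it (`toy_denoiser`, `toy_critic_grad`, `guided_denoise`, `guidance_scale`) is hypothetical. The critic is a hand-written quadratic rather than a learned network, and the denoiser is a fixed contraction rather than a trained model.

```python
import random


def toy_q(action, target):
    """Toy critic: Q(a) = -||a - target||^2, highest when action == target."""
    return -sum((a - g) ** 2 for a, g in zip(action, target))


def toy_critic_grad(action, target):
    """Analytic gradient of the toy critic with respect to the action."""
    return [-2.0 * (a - g) for a, g in zip(action, target)]


def toy_denoiser(action):
    """Hypothetical stand-in for a learned denoising network (contracts toward 0)."""
    return [0.9 * a for a in action]


def guided_denoise(rng, target, steps=20, guidance_scale=0.1, noise_scale=0.05):
    """Run one reverse-diffusion chain, nudging each step along the critic gradient."""
    # Start from pure Gaussian noise, as in a standard diffusion sampler.
    action = [rng.gauss(0.0, 1.0) for _ in target]
    for t in reversed(range(steps)):
        # Ordinary denoising update (assumed form, for illustration only).
        action = toy_denoiser(action)
        # Assumed guidance step: move toward actions the critic scores higher.
        grad = toy_critic_grad(action, target)
        action = [a + guidance_scale * g for a, g in zip(action, grad)]
        # Re-inject a little noise on every step except the last.
        if t > 0:
            action = [a + noise_scale * rng.gauss(0.0, 1.0) for a in action]
    return action


def mean_q(rng, target, n_samples, guidance_scale):
    """Average critic value over independently sampled actions."""
    total = 0.0
    for _ in range(n_samples):
        action = guided_denoise(rng, target, guidance_scale=guidance_scale)
        total += toy_q(action, target)
    return total / n_samples


if __name__ == "__main__":
    rng = random.Random(0)
    target = [0.5, -0.3]  # hypothetical high-value action
    print("guided mean Q:  ", mean_q(rng, target, 300, guidance_scale=0.1))
    print("unguided mean Q:", mean_q(rng, target, 300, guidance_scale=0.0))
```

In this toy setup the guided chains should end up with a higher average critic value than the unguided ones. Each action comes from a single denoising chain, so no extra candidate actions are sampled and ranked afterwards; that is one plausible reading of "eliminating the need for additional sampling", not a confirmed description of the method.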