Robotics (cs.RO)
How to safely fine-tune pretrained AI policies without them collapsing?
Yonghoon Dong, Kyungmin Lee, Changyeon Kim, Jaehyuk Kim, Jinwoo Shin
May 26, 2026
Fine-tuning pretrained AI policies with reinforcement learning often fails because small errors in the critic (the learned evaluator that scores actions) are amplified during training, causing learning to collapse. TRQAM addresses this by dynamically controlling how far the updated policy can drift from the pretrained one, using a formulation that lets you fix the exact acceptable deviation in advance. On 50 robot manipulation benchmarks, it reaches a 68% success rate versus 46% for the previous best method.
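The summary does not spell out TRQAM's actual mechanism, so what follows is only a generic sketch of the underlying idea of a preset deviation budget, not the paper's method. It assumes a diagonal Gaussian policy whose standard deviation is fixed and shared with the pretrained policy, and it measures drift as the KL divergence to the pretrained policy. The names `gaussian_kl_same_std`, `project_to_trust_region`, and the budget `epsilon` are hypothetical. Any proposed update is pulled back toward the pretrained mean so the KL never exceeds `epsilon`.

```python
import math
from typing import List, Sequence


def gaussian_kl_same_std(mu: Sequence[float], mu0: Sequence[float],
                         std: Sequence[float]) -> float:
    """KL(N(mu, diag(std^2)) || N(mu0, diag(std^2))) for diagonal Gaussians
    that share the same standard deviation (hypothetical helper)."""
    return 0.5 * sum(((m - m0) / s) ** 2 for m, m0, s in zip(mu, mu0, std))


def project_to_trust_region(mu: Sequence[float], mu0: Sequence[float],
                            std: Sequence[float], epsilon: float) -> List[float]:
    """Shrink a proposed policy mean toward the pretrained mean mu0 so that
    the KL to the pretrained policy is at most epsilon (hypothetical helper).

    Assumption: this generic hard trust-region projection only illustrates a
    preset deviation budget; it is not TRQAM's actual update rule.
    """
    kl = gaussian_kl_same_std(mu, mu0, std)
    if kl <= epsilon:
        # Already inside the budget: keep the proposed update unchanged.
        return list(mu)
    # KL is quadratic in the offset (mu - mu0), so scaling the offset by
    # sqrt(epsilon / kl) lands exactly on the boundary KL == epsilon.
    scale = math.sqrt(epsilon / kl)
    return [m0 + scale * (m - m0) for m, m0 in zip(mu, mu0)]


if __name__ == "__main__":
    mu0 = [0.0, 0.0]          # pretrained policy mean (illustrative values)
    std = [1.0, 1.0]          # fixed, shared standard deviation
    proposed = [3.0, 4.0]     # an update pushed far by a noisy critic
    projected = project_to_trust_region(proposed, mu0, std, epsilon=0.5)
    print([round(x, 6) for x in projected])                          # [0.6, 0.8]
    print(round(gaussian_kl_same_std(projected, mu0, std), 6))       # 0.5
```

Because the KL here is quadratic in the mean offset, a single rescale hits the budget exactly, which is what makes "set the deviation upfront" concrete in this toy setting. Practical methods often enforce such bounds with adaptive penalties or Lagrange multipliers instead, and TRQAM may differ in both how drift is measured and how the bound is enforced.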