cs.RO

How can pretrained AI policies be fine-tuned safely without collapsing?

Yonghoon Dong, Kyungmin Lee, Changyeon Kim, Jaehyuk Kim, Jinwoo Shin

May 26, 2026

Fine-tuning pretrained AI policies with reinforcement learning often fails because small errors in the critic (the value estimator that scores actions) are amplified, causing learning to collapse. TRQAM addresses this by dynamically controlling how far the updated policy drifts from the pretrained one, using a formulation that lets the acceptable deviation be set exactly in advance. On 50 robot manipulation benchmarks, it reaches 68% success versus 46% for the previous best method.
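To make the idea of a deviation budget fixed in advance concrete, here is a minimal toy sketch. It is not the TRQAM algorithm, whose mechanism this summary does not describe. It assumes a generic KL trust region enforced by dual ascent on a penalty weight, a one-dimensional Gaussian policy with fixed variance, and a quadratic stand-in critic. All names (`fit_with_trust_region`, `best_mean`, `kl_same_var`, `eps`) are hypothetical.

```python
def kl_same_var(mu, mu0, sigma):
    """KL divergence between N(mu, sigma^2) and N(mu0, sigma^2)."""
    return (mu - mu0) ** 2 / (2.0 * sigma ** 2)


def best_mean(lam, mu0, a_star, sigma):
    """Mean that maximizes E[Q(a)] - lam * KL for the toy critic Q(a) = -(a - a_star)^2.

    With a fixed sigma, E[Q] = -(mu - a_star)^2 - sigma^2, so the penalized
    objective is quadratic in mu and has this closed-form maximizer.
    """
    c = lam / (2.0 * sigma ** 2)
    return (a_star + c * mu0) / (1.0 + c)


def fit_with_trust_region(mu0=0.0, a_star=2.0, sigma=1.0, eps=0.5, lr=0.5, steps=2000):
    """Dual ascent: adjust the penalty weight until the KL to the pretrained policy equals eps."""
    lam = 1.0  # assumed initial penalty weight
    for _ in range(steps):
        mu = best_mean(lam, mu0, a_star, sigma)
        kl = kl_same_var(mu, mu0, sigma)
        # Raise the penalty when the policy drifts past the budget, lower it otherwise.
        lam = max(0.0, lam + lr * (kl - eps))
    mu = best_mean(lam, mu0, a_star, sigma)
    return mu, lam, kl_same_var(mu, mu0, sigma)


if __name__ == "__main__":
    mu, lam, kl = fit_with_trust_region()
    print(f"mean={mu:.4f} penalty={lam:.4f} kl={kl:.4f}")
```

In this toy setting the critic prefers a mean of 2.0, but the budget `eps` holds the fine-tuned mean closer to the pretrained mean of 0.0. The penalty weight is adapted automatically, so the user chooses the allowed deviation directly rather than tuning a regularization strength by hand.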
Published as "Trust Region Q Adjoint Matching" (arXiv:2605.27079)