How can pretrained AI policies be fine-tuned safely without collapsing?

Fine-tuning pretrained AI policies with reinforcement learning often fails because small errors in the critic (the learned evaluator that scores actions) get amplified, and learning collapses. TRQAM addresses this by dynamically controlling how far the updated policy drifts from the pretrained one, using a mathematical trick that lets you set the exact acceptable deviation upfront. On 50 robot manipulation benchmarks, it reaches 68% success versus 46% for the previous best method.
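
To make the idea of "setting the acceptable deviation upfront" concrete, here is a minimal sketch. It is not TRQAM's actual mechanism, which this summary does not spell out. It swaps in a generic, well-known construction instead: for a discrete action set, the policy that maximizes the critic's expected score while staying within a KL-divergence budget of the pretrained policy has the form pi(a) ∝ prior(a) · exp(Q(a) / temperature), and the temperature can be found by bisection so that the budget is met. The discrete setting, the KL measure, and every name below (`tilted_policy`, `fit_to_budget`, `kl_budget`) are assumptions made for illustration.

```python
import math


def kl(p, q):
    """KL divergence KL(p || q) between two discrete distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def tilted_policy(prior, q_values, temperature):
    """Reweight the pretrained policy by exp(Q / temperature) and renormalize."""
    m = max(q_values)  # subtract the max so every exponent is <= 0
    weights = [p * math.exp((q - m) / temperature)
               for p, q in zip(prior, q_values)]
    z = sum(weights)
    return [w / z for w in weights]


def fit_to_budget(prior, q_values, kl_budget, lo=1e-3, hi=1e3, iters=100):
    """Bisect on the temperature until KL(policy || prior) meets the budget.

    The KL shrinks monotonically as the temperature grows, so bisection
    (done geometrically here) converges. `hi` always satisfies the budget.
    """
    for _ in range(iters):
        mid = math.sqrt(lo * hi)
        if kl(tilted_policy(prior, q_values, mid), prior) > kl_budget:
            lo = mid  # drifted too far: raise the temperature
        else:
            hi = mid  # within budget: try a lower temperature
    return tilted_policy(prior, q_values, hi), hi


# Hypothetical numbers: a uniform pretrained policy over four actions
# and critic scores that may be noisy.
prior = [0.25, 0.25, 0.25, 0.25]
q_values = [1.0, 0.5, 0.0, -0.5]
kl_budget = 0.05  # the deviation chosen upfront

policy, temperature = fit_to_budget(prior, q_values, kl_budget)
print("policy:", [round(p, 4) for p in policy])
print("temperature:", temperature)
print("KL from pretrained policy:", kl(policy, prior))
```

The point of the sketch is the design choice rather than the particular formula: the user picks the budget, and the strength of the pull toward the critic's preferences is solved for, not hand-tuned. A noisy critic can then shift the policy only as far as the budget allows.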