When should AI agents actually stop and think?

Mingkai Deng, Jinyu Hou, Lara Sá Neves, Varad Pimpalkhute, Taylor W. Killian, Zhengzhong Liu, Eric P. Xing

Large language models waste computation when they plan equally hard on every problem, easy or difficult. This work splits agentic reasoning into three parts: a world model that simulates future states, a learned controller that decides *when* to plan and how deeply, and a reactive executor for immediate actions. The team built SR²AM, which uses reinforcement learning to train a 30B-parameter model to match the reasoning performance of models 20–30× larger while cutting reasoning tokens by 76–95%. The key insight: optimal agents don't deliberate constantly. They learn to plan only when it matters, spending extra thinking time on genuinely hard problems rather than easy ones.
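To make the three-part split concrete, here is a minimal toy sketch in Python. It is not SR²AM's implementation: the number-line environment, the cost values, and every name (`world_model`, `controller`, `reactive_executor`, `plan`, `run_episode`) are assumptions made for illustration. In particular, the controller here is a hand-written rule standing in for the learned, RL-trained one, and the world model is an exact toy function rather than a learned simulator.

```python
from typing import FrozenSet, Tuple

# Hypothetical toy task: walk along a number line from 0 to `goal`.
# Landing on a hazard cell is expensive, every other cell is cheap.
ACTIONS = (1, 2)  # step one cell, or jump two cells


def world_model(state: int, action: int) -> int:
    """Predict the next state (exact in this toy, learned in a real agent)."""
    return state + action


def cost(state: int, hazards: FrozenSet[int]) -> int:
    """Cost of landing on a cell: 10 for a hazard, 1 otherwise."""
    return 10 if state in hazards else 1


def reactive_executor(state: int) -> int:
    """Cheap default policy: always take a single step, no lookahead."""
    return 1


def controller(state: int, hazards: FrozenSet[int]) -> int:
    """Choose a planning depth; 0 means act reactively.

    Hand-written stand-in for a learned controller: plan two steps
    ahead only when a hazard lies within the next two cells.
    """
    return 2 if any(state < h <= state + 2 for h in hazards) else 0


def plan(state: int, depth: int, hazards: FrozenSet[int]) -> Tuple[int, int]:
    """Exhaustive lookahead through the world model.

    Returns (lowest rollout cost, first action of that rollout).
    Ties keep the earlier action in ACTIONS.
    """
    best_cost, best_action = -1, ACTIONS[0]
    for action in ACTIONS:
        nxt = world_model(state, action)
        total = cost(nxt, hazards)
        if depth > 1:
            total += plan(nxt, depth - 1, hazards)[0]
        if best_cost < 0 or total < best_cost:
            best_cost, best_action = total, action
    return best_cost, best_action


def run_episode(goal: int, hazards: FrozenSet[int]) -> Tuple[int, int, int]:
    """Run one episode; return (total cost, planning calls, steps taken)."""
    state = total_cost = plan_calls = steps = 0
    while state < goal:
        depth = controller(state, hazards)
        if depth > 0:
            action = plan(state, depth, hazards)[1]
            plan_calls += 1
        else:
            action = reactive_executor(state)
        state = world_model(state, action)  # toy env matches the model exactly
        total_cost += cost(state, hazards)
        steps += 1
    return total_cost, plan_calls, steps


if __name__ == "__main__":
    print(run_episode(10, frozenset({3, 7})))
```

In this toy, the agent invokes the planner only on the steps where a hazard is close and acts reactively everywhere else, so planning calls stay below the total step count. That mirrors, in a very simplified way, the idea of spending deliberation only where it changes the outcome.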