Learning from past mistakes to make LLMs better at using tools
Renning Pang, Tian Lan, Leyuan Liu, Piao Tong, Sheng Cao, Xiaosong Zhang
May 14, 2026
Language models struggle to use external tools reliably because they must balance reasoning depth against the need for structurally valid outputs. CAST addresses this by analyzing historical execution trajectories to identify two types of patterns: complexity profiles (which reasoning strategies work for which task types) and failure profiles (which structural errors are most likely to occur). During reinforcement learning, the model learns a fine-grained reward function that internalizes these patterns, enabling it to adapt its reasoning depth to each case. On the BFCLv2 and ToolBench benchmarks, CAST improves execution accuracy by up to 5.85 percentage points while reducing average reasoning length by 26%, with particular gains in preventing high-impact structural failures.
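To make the idea of profile-informed rewards concrete, here is a minimal Python sketch. It is not the paper's implementation: the `Trajectory` fields, the `build_profiles` and `reward` helpers, the error labels, and the weighting scheme are all hypothetical, chosen only to illustrate how a complexity profile (typical reasoning length per task type) and a failure profile (relative frequency of each structural error) mined from past trajectories might shape a scalar reward.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass
class Trajectory:
    """One past tool-use attempt (hypothetical schema, not from the paper)."""
    task_type: str
    reasoning_tokens: int
    error: Optional[str] = None  # None means the tool call executed correctly


def build_profiles(history):
    """Derive illustrative complexity and failure profiles from past trajectories."""
    lengths = defaultdict(list)
    errors = Counter()
    for t in history:
        if t.error is None:
            lengths[t.task_type].append(t.reasoning_tokens)
        else:
            errors[t.error] += 1
    # Complexity profile: mean reasoning length of successful runs per task type.
    complexity = {k: sum(v) / len(v) for k, v in lengths.items()}
    # Failure profile: share of all observed failures caused by each error type.
    total = sum(errors.values())
    failure = {k: c / total for k, c in errors.items()} if total else {}
    return complexity, failure


def reward(t, complexity, failure, length_weight=0.5):
    """Assumed reward shape: success bonus, error-weighted penalty, length penalty."""
    # Frequent structural errors are penalized more heavily than rare ones.
    base = 1.0 if t.error is None else -1.0 - failure.get(t.error, 0.0)
    budget = complexity.get(t.task_type)
    if not budget:
        return base  # no profile for this task type, so no length shaping
    # Penalize only reasoning beyond the profiled budget, capped at 100% overshoot.
    overshoot = max(0.0, t.reasoning_tokens / budget - 1.0)
    return base - length_weight * min(overshoot, 1.0)


history = [
    Trajectory("single_call", 100),
    Trajectory("single_call", 200),
    Trajectory("multi_call", 600),
    Trajectory("single_call", 150, "missing_argument"),
    Trajectory("multi_call", 500, "invalid_json"),
    Trajectory("multi_call", 700, "invalid_json"),
]
complexity, failure = build_profiles(history)
on_budget = reward(Trajectory("single_call", 150), complexity, failure)
too_long = reward(Trajectory("single_call", 300), complexity, failure)
malformed = reward(Trajectory("single_call", 120, "invalid_json"), complexity, failure)
```

In this toy setup, a correct call that stays within the profiled length keeps the full reward, a correct but over-long one is discounted, and a call that fails with a historically common structural error is penalized the most. The actual reward design in CAST may differ substantially.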