cs.LG

Building better tool-using AI with fewer, smarter training environments

Minrui Xu, Zilin Wang, Mengyi Deng, Zhiwei Li, Zhicheng Yang, Xiao Zhu, Yinhong Liu, Boyu Zhu, Baiyu Huang, Chao Chen, Heyuan Deng, Fei Mi, Lifeng Shang, Xingshan Zeng, Zhijiang Guo

May 18, 2026

Training LLM agents to use tools reliably hits a wall. You can pay for real APIs, rely on simulations that are prone to hallucination, or use synthetic data that reads like an instruction manual rather than a natural human request. EnvFactory automates both environment construction and realistic trajectory generation. It explores and verifies stateful tool environments built from real sources, then synthesizes multi-turn interactions that capture implicit human intent. From just 85 environments, it generates 2,575 training trajectories that boost Qwen3 models by 15% on BFCLv3 and by 6–8.6% on other agent benchmarks. These results outperform approaches that use five times as many environments.
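The summary does not say how EnvFactory represents or verifies an environment. The Python sketch below is not the paper's code. It is a rough illustration of what an executable, stateful tool environment means in this context. A tiny in-memory calendar exposes tools that mutate real state. A candidate trajectory is kept only if replaying it in a fresh environment reaches the expected end state. `CalendarEnv`, `execute`, `verify`, and the `{"name": ..., "arguments": ...}` tool-call format are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class CalendarEnv:
    """Hypothetical stateful tool environment: a tiny in-memory calendar."""
    events: dict[str, str] = field(default_factory=dict)  # slot -> event title

    def tools(self) -> dict[str, Callable[..., Any]]:
        # Registry mapping the tool names an agent would call to bound methods.
        return {
            "list_events": self.list_events,
            "add_event": self.add_event,
            "cancel_event": self.cancel_event,
        }

    def list_events(self) -> dict[str, str]:
        return dict(self.events)

    def add_event(self, slot: str, title: str) -> str:
        if slot in self.events:
            raise ValueError(f"slot {slot} already booked")
        self.events[slot] = title
        return "ok"

    def cancel_event(self, slot: str) -> str:
        if slot not in self.events:
            raise KeyError(slot)
        del self.events[slot]
        return "ok"


def execute(env: CalendarEnv, call: dict) -> dict:
    """Run one tool call against real state; failures come back as observations."""
    fn = env.tools().get(call["name"])
    if fn is None:
        return {"error": f"unknown tool {call['name']}"}
    try:
        return {"result": fn(**call.get("arguments", {}))}
    except (ValueError, KeyError, TypeError) as exc:
        return {"error": repr(exc)}


def verify(trajectory: list[dict], expected_state: dict[str, str]) -> bool:
    """Replay a trajectory in a fresh environment and compare the final state."""
    env = CalendarEnv()
    for call in trajectory:
        execute(env, call)
    return env.events == expected_state


if __name__ == "__main__":
    trajectory = [
        {"name": "add_event", "arguments": {"slot": "mon-10", "title": "standup"}},
        {"name": "add_event", "arguments": {"slot": "mon-10", "title": "1:1"}},  # conflict -> error
        {"name": "cancel_event", "arguments": {"slot": "mon-10"}},               # recovery step
        {"name": "add_event", "arguments": {"slot": "mon-10", "title": "1:1"}},
    ]
    print(verify(trajectory, {"mon-10": "1:1"}))  # True
```

The sketch returns tool errors as observations instead of raising them. This lets a trajectory contain a failed call followed by a recovery step, as in the usage example. The state check then confirms whether the interaction actually succeeded.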
Published as "EnvFactory: Scaling Tool-Use Agents via Executable Environments Synthesis and Robust RL" (arXiv:2605.18703).