Teaching AI to master the command line at scale

Zihao Cheng, Hongru Wang, Zeming Liu, Xinyi Wang, Xiangrong Zhu, Yuhang Guo, Wei Lin, Jeff Z. Pan, Yunhong Wang

Training language models to execute terminal commands faces a data bottleneck: existing approaches splice together incomplete sources (human-written seeds, GitHub) and produce narrow, misaligned tasks. Terminal-World sidesteps this by using composable agent skills as building blocks that encode both what to do and how to do it, then automatically derives mutually compatible tasks, environments, and trajectories from them. The team synthesized 5,723 environments and trained models of up to 32B parameters, reaching 31.5 Pass@1 on Terminal-Bench 2.0 and beating a larger baseline while using far less data.
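
For readers unfamiliar with the metric, here is a minimal sketch of how a Pass@1 score is commonly computed. The summary does not say how the reported figure was obtained, so this assumes the standard definition (the expected fraction of tasks solved in a single attempt, reported as a percentage). The function names and the toy results are hypothetical and are not taken from the paper.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for one task.

    n: attempts sampled for the task, c: attempts that succeeded.
    For k = 1 this reduces to c / n.
    """
    if n - c < k:
        # Fewer than k failures: every size-k subset contains a success.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_pass_at_1(results: list[tuple[int, int]]) -> float:
    """Mean pass@1 over tasks as a percentage; results holds (n, c) pairs."""
    return 100.0 * sum(pass_at_k(n, c, 1) for n, c in results) / len(results)


# Toy data (assumed): three tasks, four attempts each.
toy_results = [(4, 1), (4, 4), (4, 0)]
score = benchmark_pass_at_1(toy_results)
```

Averaging per-task success rates in this way means a task solved in one of four attempts contributes 0.25 rather than a full point, which is why Pass@1 is a stricter measure than "solved at least once".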