Turning agent work into long-context training data

Qisheng Su, Zhen Fang, Shiting Huang, Yu Zeng, Yiming Zhao, Kou Shi, Ziao Zhang, Lin Chen, Zehui Chen, Lijun Wu, Feng Zhao

Agent trajectories, the sequences of steps an agent takes when solving a problem with tools, contain evidence scattered across many turns. Standard agent training largely ignores this evidence: tool responses are masked out of the loss, so the information they carry provides no supervision signal. ACC converts agent trajectories into long-context QA pairs, each pairing a question with the observations and tool responses gathered across multiple steps. This trains models to reason over distant context without calling tools. On benchmarks requiring long-range reasoning, Qwen3-30B trained with ACC matched the results of a model 7× larger while maintaining its general capabilities.
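The contrast between conventional masking and the conversion can be illustrated with a minimal Python sketch. The `Step` and `Trajectory` types, the `sft_loss_mask` and `to_long_context_qa` functions, and the prompt template are all hypothetical and are not ACC's actual pipeline. In particular, the sketch reuses the trajectory's original question and final answer, and this abstract does not say how ACC actually constructs its questions. It shows two things. First, a conventional per-step loss mask that supervises only the agent's own turns. Second, a conversion that moves every tool observation into a single context the model must answer from without issuing tool calls.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Step:
    role: str      # hypothetical: "assistant" (agent action) or "tool" (observation)
    content: str


@dataclass
class Trajectory:
    question: str
    steps: List[Step]
    final_answer: str


def sft_loss_mask(traj: Trajectory) -> List[bool]:
    """Per-step loss mask under conventional agent SFT (assumed setup):
    agent turns are supervised (True), tool responses are masked out (False)."""
    return [step.role == "assistant" for step in traj.steps]


def to_long_context_qa(traj: Trajectory, sep: str = "\n\n") -> dict:
    """Hypothetical ACC-style conversion: concatenate all tool observations
    into one context and pair it with a question, so answering requires
    combining evidence from several distant steps with no tool calls."""
    observations = [s.content for s in traj.steps if s.role == "tool"]
    context = sep.join(
        f"[Observation {i + 1}] {obs}" for i, obs in enumerate(observations)
    )
    prompt = (
        f"{context}{sep}Question: {traj.question}\n"
        "Answer using only the context above."
    )
    return {"prompt": prompt, "answer": traj.final_answer}


if __name__ == "__main__":
    traj = Trajectory(
        question="In which city is the publisher of the report headquartered?",
        steps=[
            Step("assistant", "search('report publisher')"),
            Step("tool", "The report was published by Acme Press."),
            Step("assistant", "search('Acme Press headquarters')"),
            Step("tool", "Acme Press is headquartered in Lyon."),
        ],
        final_answer="Lyon",
    )
    print(sft_loss_mask(traj))   # [True, False, True, False]
    print(to_long_context_qa(traj)["prompt"])
```

In this toy trajectory, the mask shows that neither observation is supervised under the conventional setup. The converted example, by contrast, needs both observations, which appear in separate turns, to reach the answer.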