Turning agent work into long-context training data
Qisheng Su, Zhen Fang, Shiting Huang, Yu Zeng, Yiming Zhao, Kou Shi, Ziao Zhang, Lin Chen, Zehui Chen, Lijun Wu, Feng Zhao
May 21, 2026
Agent trajectories, the sequences of steps an agent takes when solving a problem with tools, contain evidence scattered across many turns. Standard training masks tool responses and so misses the supervision signal that evidence carries. ACC converts agent trajectories into long-context QA pairs that explicitly combine a question with observations and tool responses from multiple steps, training models to reason over distant context without calling tools. On benchmarks requiring long-range reasoning, Qwen3-30B trained with ACC achieved results matching those of a 7× larger model while maintaining its general capabilities.
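To make the idea concrete, here is a minimal Python sketch of the two views of one trajectory: the mask that a standard setup might apply (only assistant turns supervised) and a long-context QA pair built from the same turns. This summary does not describe the paper's actual construction, so everything here is an assumption for illustration: the `Turn` type, the function names `loss_mask` and `to_long_context_qa`, the prompt layout, and the toy trajectory are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Turn:
    role: str      # "user", "assistant", or "tool" (assumed role names)
    content: str


def loss_mask(trajectory: List[Turn]) -> List[bool]:
    """Assumed standard setup: only assistant turns are supervised,
    so tool responses contribute nothing to the training signal."""
    return [turn.role == "assistant" for turn in trajectory]


def to_long_context_qa(trajectory: List[Turn], question: str, answer: str) -> Dict[str, str]:
    """Hypothetical conversion: gather tool responses from every step into
    one context, then pair that context with the question and answer."""
    observations = [
        f"[step {i}] {turn.content}"
        for i, turn in enumerate(trajectory)
        if turn.role == "tool"
    ]
    context = "\n".join(observations)
    return {"prompt": f"{context}\n\nQuestion: {question}", "answer": answer}


# Toy trajectory (invented for illustration, not data from the paper).
trajectory = [
    Turn("user", "Which city hosts the lab that released the dataset?"),
    Turn("assistant", "search('dataset release')"),
    Turn("tool", "The dataset was released by Lab A."),
    Turn("assistant", "search('Lab A location')"),
    Turn("tool", "Lab A is based in Hefei."),
    Turn("assistant", "Hefei"),
]

mask = loss_mask(trajectory)
qa = to_long_context_qa(trajectory, trajectory[0].content, trajectory[-1].content)
```

In this sketch the answer depends on two tool responses from different steps. Both are excluded by `mask`, but both appear in `qa["prompt"]`, so the model must answer from the combined context alone, with no tool call.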