What makes reasoning data actually work in AI training?
Yaoming Li, Guangxiang Zhao, Qilong Shi, Lin Sun, Xiangzheng Zhang, Tong Yang
June 1, 2026
Large language models improve dramatically with post-training on reasoning data, but the field lacks a unified understanding of what works and why. This primer synthesizes 150+ papers and system reports to answer four questions: what types of reasoning data exist, what makes them useful, how they are created, and how they scale. The resulting framework helps predict which data investments will pay off in future model development.