cs.CL

What makes reasoning data actually work in AI training?

Yaoming Li, Guangxiang Zhao, Qilong Shi, Lin Sun, Xiangzheng Zhang, Tong Yang

June 1, 2026

Large language models improve dramatically with post-training on reasoning data, but the field lacks a unified understanding of what works and why. This primer synthesizes more than 150 papers and system reports to answer four questions: what types of reasoning data exist, what makes them useful, how they are created, and how they scale. The resulting framework aims to help predict which data investments will pay off in future model development.
Published as "A Primer in Post-Training Reasoning Data: What We Know About How It Works" (arXiv:2606.02113).