Training math models faster by focusing on the right problems
Peng Cui, Boyao Yang, Jun Zhu
May 16, 2026
Reinforcement-learning (RL) post-training of LLMs for math reasoning wastes compute on problems the model has already solved or cannot yet handle. Learning-Zone Energy (LZE) scores each training example by combining its difficulty, outcome uncertainty, and pass-rate momentum to locate the model's active learning frontier, then concentrates rollouts on the examples at that frontier. A forward pruner further cuts wall-clock time by skipping problems that are already solved. In experiments with Qwen models (1.5B–8B) on GSM8K, MATH, and DAPO-MATH, LZE keeps only 40% of the data at each step while matching or exceeding the baselines, with especially large gains on harder benchmarks (AIME25 +45.9%). The code has been released.
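The summary does not give LZE's actual formula, so the following is only a hypothetical sketch of the idea under loudly stated assumptions. Each problem is assumed to track a pass rate p over its recent rollouts. Outcome uncertainty is taken to be the Bernoulli variance 4p(1 - p), difficulty is taken to be 1 - p, and momentum is the increase in p since the previous visit. The multiplicative combination means problems with p = 0 or p = 1 get no uncertainty credit. The forward pruner is assumed to drop any problem with p = 1 before rollouts, and the top 40% of the pool is then kept. The names `ProblemStats`, `lze_score`, `select_batch`, the weights, and the solved threshold are all illustrative inventions, not the authors' code.

```python
from dataclasses import dataclass


@dataclass
class ProblemStats:
    # Hypothetical per-problem bookkeeping (not the paper's data structure).
    pid: str
    pass_rate: float       # fraction of recent rollouts judged correct, in [0, 1]
    prev_pass_rate: float  # pass rate recorded at the previous visit


def lze_score(s: ProblemStats, w_diff: float = 1.0, w_mom: float = 1.0) -> float:
    """Illustrative learning-zone score; the functional form is an assumption."""
    p = s.pass_rate
    uncertainty = 4.0 * p * (1.0 - p)              # Bernoulli variance scaled to [0, 1]; 0 at p=0 and p=1
    difficulty = 1.0 - p                           # tilts the frontier toward harder problems
    momentum = max(p - s.prev_pass_rate, 0.0)      # credit only for recent improvement
    return uncertainty * (1.0 + w_diff * difficulty) + w_mom * momentum


def select_batch(stats, keep_frac: float = 0.4, solved_threshold: float = 1.0):
    # Assumed forward-pruning rule: skip problems already solved on every recent rollout.
    candidates = [s for s in stats if s.pass_rate < solved_threshold]
    k = round(keep_frac * len(stats))              # keep a fixed fraction of the full pool
    ranked = sorted(candidates, key=lze_score, reverse=True)
    return [s.pid for s in ranked[:k]]


stats = [
    ProblemStats("solved", 1.0, 1.0),
    ProblemStats("frontier", 0.5, 0.25),
    ProblemStats("hard", 0.125, 0.0),
    ProblemStats("stuck", 0.0, 0.0),
    ProblemStats("easy", 0.875, 0.875),
]
print(select_batch(stats, keep_frac=0.4))  # prints ['frontier', 'hard']
```

With five problems and `keep_frac=0.4`, two are kept. The solved problem is pruned before scoring, and the never-solved "stuck" problem scores zero. This mirrors the summary's point that compute goes neither to mastered problems nor to ones currently out of reach.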