cs.LG

Training math models faster by focusing on the right problems

Peng Cui, Boyao Yang, Jun Zhu

May 16, 2026

RL post-training of LLMs for math reasoning wastes compute on problems the model has already solved or cannot yet handle. Learning-Zone Energy (LZE) scores each training example by combining difficulty, outcome uncertainty, and pass-rate momentum to locate the model's active learning frontier, then concentrates rollouts on those examples. A forward pruner further reduces wall-clock time by skipping problems that are already solved. In experiments with Qwen models (1.5B–8B) on GSM8K, MATH, and DAPO-MATH, LZE keeps only 40% of the data at each step while matching or exceeding baselines, with especially large gains on harder benchmarks (+45.9% on AIME25). The code is publicly released.
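The summary does not give the LZE formula, so the following is only a minimal Python sketch of how a frontier score built from the three named signals, plus a solved-problem pruner and 40% selection, might fit together. Everything specific here is an assumption, not the authors' method: the names ProblemStats, update_stats, learning_zone_energy, and select_batch are hypothetical; difficulty is taken as 1 - p for pass rate p; uncertainty as the scaled Bernoulli variance 4p(1 - p); momentum as an exponential moving average of pass-rate changes; and the weights are arbitrary.

```python
from dataclasses import dataclass


@dataclass
class ProblemStats:
    """Hypothetical per-problem bookkeeping (not from the paper)."""
    pass_rate: float = 0.5  # estimated pass rate p in [0, 1]; unseen problems start at 0.5
    momentum: float = 0.0   # EMA of recent pass-rate changes


def update_stats(stats: ProblemStats, successes: int, rollouts: int, beta: float = 0.9) -> None:
    """Refresh a problem's pass rate and momentum after a batch of rollouts (assumed EMA update)."""
    new_rate = successes / rollouts
    delta = new_rate - stats.pass_rate
    stats.momentum = beta * stats.momentum + (1 - beta) * delta
    stats.pass_rate = new_rate


def learning_zone_energy(s: ProblemStats, w_diff: float = 0.5,
                         w_unc: float = 1.0, w_mom: float = 1.0) -> float:
    """Assumed weighted sum of difficulty, outcome uncertainty, and |momentum|."""
    difficulty = 1.0 - s.pass_rate
    # Bernoulli variance scaled to [0, 1]: zero for always-solved or never-solved problems.
    uncertainty = 4.0 * s.pass_rate * (1.0 - s.pass_rate)
    # A changing pass rate (either direction) is treated as a sign of active learning.
    momentum = abs(s.momentum)
    return w_diff * difficulty + w_unc * uncertainty + w_mom * momentum


def select_batch(stats: dict[str, ProblemStats], keep_frac: float = 0.4,
                 solved_threshold: float = 1.0) -> list[str]:
    """Prune solved problems, then keep the top keep_frac of the pool by energy."""
    # Forward-pruning stand-in: skip problems solved on every recent rollout.
    candidates = [pid for pid, s in stats.items() if s.pass_rate < solved_threshold]
    candidates.sort(key=lambda pid: learning_zone_energy(stats[pid]), reverse=True)
    k = max(1, round(keep_frac * len(stats)))  # budget is relative to the full pool
    return candidates[:k]


pool = {
    "easy": ProblemStats(pass_rate=1.0),
    "frontier": ProblemStats(pass_rate=0.5),
    "hard": ProblemStats(pass_rate=0.0),
    "improving": ProblemStats(pass_rate=0.25, momentum=0.2),
    "mostly_solved": ProblemStats(pass_rate=0.9),
}
print(select_batch(pool))  # ['improving', 'frontier']
```

With these made-up weights, "easy" is pruned before scoring, "hard" ranks low because its uncertainty term is zero, and "improving" edges out "frontier" thanks to its momentum bonus, so 2 of the 5 problems (40%) receive rollouts.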
Published as "Learning-Zone Energy: Online Data Selection for Efficient RL Post-Training" (arXiv:2605.17003).