How to sample better reasoning without extra training?
Felix Zhou, Anay Mehrotra, Quanquan C. Liu
May 28, 2026
Sampling from a sharpened version of a base language model's distribution can elicit strong reasoning without reinforcement learning, but doing so requires an efficient sampler. The authors propose Entropy-Cut Metropolis-Hastings, which uses next-token entropy to locate critical decision points, such as the choice of a proof strategy, and resamples from those points instead of from positions chosen uniformly at random. On math and coding benchmarks, the authors report that this approach mixes faster and outperforms the baselines.
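The idea can be illustrated with a minimal sketch. This is not the authors' implementation. It makes several assumptions that the summary does not confirm: "sharpened" is taken to mean a target proportional to p(x)^alpha, sequences have a fixed length, the cut position is drawn with probability proportional to next-token entropy, and the base model is replaced by a toy three-token Markov chain. All names (`TRANSITIONS`, `cut_weights`, `entropy_cut_mh_step`, `run_chain`) are hypothetical.

```python
import math
import random

# Toy stand-in for a base language model (an assumption for illustration):
# a first-order Markov chain over a 3-token vocabulary.
VOCAB = [0, 1, 2]
TRANSITIONS = {
    None: [0.34, 0.33, 0.33],  # start of sequence: high entropy
    0: [0.90, 0.05, 0.05],     # low entropy
    1: [0.10, 0.80, 0.10],
    2: [0.40, 0.30, 0.30],     # high entropy
}

def next_token_probs(prefix):
    """Next-token distribution given the prefix (depends on the last token only)."""
    prev = prefix[-1] if prefix else None
    return TRANSITIONS[prev]

def entropy(probs):
    """Shannon entropy in nats."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def cut_weights(seq):
    """Probability of cutting at each position, proportional to the entropy
    of the distribution that generated the token at that position."""
    ws = [entropy(next_token_probs(seq[:t])) + 1e-9 for t in range(len(seq))]
    total = sum(ws)
    return [w / total for w in ws]

def suffix_logprob(seq, t):
    """Base-model log-probability of seq[t:] given seq[:t]."""
    return sum(math.log(next_token_probs(seq[:i])[seq[i]])
               for i in range(t, len(seq)))

def sample_suffix(prefix, length, rng):
    """Extend prefix to the given length by sampling from the base model."""
    seq = list(prefix)
    while len(seq) < length:
        seq.append(rng.choices(VOCAB, weights=next_token_probs(seq))[0])
    return seq

def entropy_cut_mh_step(seq, alpha, rng):
    """One Metropolis-Hastings step targeting p(x)^alpha (assumed target)."""
    q_cur = cut_weights(seq)
    t = rng.choices(range(len(seq)), weights=q_cur)[0]
    proposal = sample_suffix(seq[:t], len(seq), rng)
    q_new = cut_weights(proposal)
    # The shared prefix cancels. The suffix is proposed from the base model,
    # so the target/proposal ratio leaves an exponent of (alpha - 1). The
    # cut-position term corrects for the state-dependent choice of t.
    log_ratio = ((alpha - 1.0)
                 * (suffix_logprob(proposal, t) - suffix_logprob(seq, t))
                 + math.log(q_new[t]) - math.log(q_cur[t]))
    if rng.random() < math.exp(min(0.0, log_ratio)):
        return proposal, True
    return seq, False

def run_chain(length=8, alpha=4.0, steps=200, seed=0):
    """Run the chain and return the final sequence and the acceptance rate."""
    rng = random.Random(seed)
    seq = sample_suffix([], length, rng)
    accepted = 0
    for _ in range(steps):
        seq, ok = entropy_cut_mh_step(seq, alpha, rng)
        accepted += ok
    return seq, accepted / steps
```

Calling `run_chain()` returns the final sequence and the fraction of accepted proposals. Replacing `cut_weights` with a uniform distribution over positions would give the uniform-position baseline that the summary contrasts against, which makes the two easy to compare on the same toy model.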