stat.ML

How to sample better reasoning without extra training?

Felix Zhou, Anay Mehrotra, Quanquan C. Liu

May 28, 2026

Sampling from a sharpened version of a base language model can elicit strong reasoning without reinforcement learning, but only if that sharpened distribution can be sampled efficiently. The authors propose Entropy-Cut Metropolis-Hastings, which uses next-token entropy to locate critical decision points, such as the choice of a proof strategy, and resamples from those points rather than from positions chosen uniformly at random. On math and coding benchmarks, the resulting sampler mixes faster and outperforms the baselines.
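To make the idea concrete, here is a minimal Python sketch built on several assumptions that do not come from the paper. The sharpened target is taken to be the power distribution pi(x) proportional to p(x)^alpha over whole sequences. The cut position is drawn with probability proportional to the next-token entropy at each position. The proposal keeps the prefix before the cut and regenerates the rest from the base model. The "base model" is a hypothetical 4-token bigram table. All helper names (`cut_distribution`, `log_accept_ratio`, `entropy_cut_mh_step`) are illustrative only. Because the prefix is shared, the acceptance ratio reduces to (p(new suffix) / p(old suffix))^(alpha - 1) multiplied by r(t | new) / r(t | old), where r is the cut distribution.

```python
import math
import random

# Hypothetical toy "base model": a bigram table p(next | previous) over 4 tokens.
# After token 0 the model is uncertain (a "decision point"); other rows are peaked.
BIGRAM = {
    0: [0.25, 0.25, 0.25, 0.25],
    1: [0.05, 0.85, 0.05, 0.05],
    2: [0.05, 0.05, 0.85, 0.05],
    3: [0.70, 0.10, 0.10, 0.10],
}
BOS = 0  # hypothetical start-of-sequence context


def next_token_probs(prefix):
    """Next-token distribution of the toy model given a prefix."""
    prev = prefix[-1] if prefix else BOS
    return BIGRAM[prev]


def entropy(probs):
    """Shannon entropy in nats."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def sample_suffix(prefix, length, rng):
    """Extend `prefix` to `length` tokens by sampling from the base model."""
    seq = list(prefix)
    while len(seq) < length:
        probs = next_token_probs(seq)
        seq.append(rng.choices(range(len(probs)), weights=probs)[0])
    return seq


def suffix_logprob(seq, t):
    """log p(seq[t:] | seq[:t]) under the base model."""
    return sum(math.log(next_token_probs(seq[:i])[seq[i]]) for i in range(t, len(seq)))


def cut_distribution(seq):
    """Assumed rule: P(cut at i) proportional to the next-token entropy at position i."""
    ents = [entropy(next_token_probs(seq[:i])) for i in range(len(seq))]
    total = sum(ents)
    return [e / total for e in ents]


def log_accept_ratio(old, new, t, alpha):
    """log[pi(new) q(old | new)] - log[pi(old) q(new | old)] for pi(x) proportional to p(x)**alpha.

    The shared prefix cancels, the suffix proposal leaves exponent (alpha - 1),
    and the cut probabilities differ because entropies after t can change.
    """
    suffix_term = (alpha - 1.0) * (suffix_logprob(new, t) - suffix_logprob(old, t))
    cut_term = math.log(cut_distribution(new)[t]) - math.log(cut_distribution(old)[t])
    return suffix_term + cut_term


def entropy_cut_mh_step(seq, alpha, rng):
    """One Metropolis-Hastings step: pick an entropy-weighted cut, regenerate the suffix."""
    r = cut_distribution(seq)
    t = rng.choices(range(len(seq)), weights=r)[0]
    proposal = sample_suffix(seq[:t], len(seq), rng)
    if rng.random() < math.exp(min(0.0, log_accept_ratio(seq, proposal, t, alpha))):
        return proposal, True
    return seq, False


# Usage: run a short chain on length-12 sequences with alpha = 4.
rng = random.Random(0)
x = sample_suffix([], 12, rng)
accepted = 0
for _ in range(500):
    x, was_accepted = entropy_cut_mh_step(x, alpha=4.0, rng=rng)
    accepted += was_accepted
print(x, accepted / 500)
```

The cut-probability ratio is needed because the entropies after the cut depend on the regenerated suffix, so r(t | new) and r(t | old) generally differ; dropping that term would make the chain converge to a different distribution. A uniform-position baseline corresponds to replacing `cut_distribution` with a uniform distribution over positions, in which case the ratio is 1.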
Published as "Reasoning with Sampling: Cutting at Decision Points" (arXiv:2605.30327).