How to sample better reasoning without extra training?

Sampling from a sharpened version of a base language model's distribution can elicit strong reasoning without reinforcement learning, but this only pays off if that distribution can be sampled efficiently. The authors propose Entropy-Cut Metropolis-Hastings, which uses next-token entropy to choose where to resample. Instead of picking positions uniformly at random, it targets critical decision points, such as the choice of a proof strategy. On math and coding benchmarks, the sampler mixes faster and outperforms the baselines.
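A minimal Python sketch of the idea follows. It is not the authors' implementation, and it rests on several assumptions the summary does not state. The sharpened target is taken to be pi(x) ∝ p(x)^alpha, one common way to sharpen a distribution. The "base model" is a hypothetical toy bigram table: tokens 0 and 2 are near-deterministic, and token 1 is a high-entropy branching point. Cut positions are drawn with probability proportional to next-token entropy, though the paper may use a threshold or another rule. Each proposal resamples everything after the cut from the base model. Under these choices the Metropolis-Hastings ratio reduces to the suffix likelihood ratio raised to the power (alpha - 1), times a correction for the fact that the cut probabilities differ between the current and proposed sequences. All function names are hypothetical.

```python
import math
import random

VOCAB = 3    # toy vocabulary size (hypothetical)
SEQ_LEN = 6  # fixed sequence length for the toy example (hypothetical)

# Hypothetical toy "base model": next-token probabilities keyed on the last token.
# Token 1 is followed by a near-uniform choice (a high-entropy "decision point");
# tokens 0 and 2 are followed by a near-deterministic continuation.
BIGRAM = {
    None: [0.4, 0.3, 0.3],
    0: [0.8, 0.1, 0.1],
    1: [0.34, 0.33, 0.33],
    2: [0.1, 0.1, 0.8],
}

def next_token_probs(prefix):
    """Next-token distribution p(. | prefix) under the toy bigram model."""
    return BIGRAM[prefix[-1] if prefix else None]

def entropy(probs):
    """Shannon entropy in nats."""
    return -sum(q * math.log(q) for q in probs if q > 0)

def cut_distribution(seq):
    """Assumed cut rule: P(cut at t) proportional to the entropy of p(. | seq[:t])."""
    ents = [entropy(next_token_probs(seq[:t])) for t in range(len(seq))]
    total = sum(ents)
    return [e / total for e in ents]

def sample_suffix(prefix, length, rng):
    """Extend prefix to the given length by sampling from the base model."""
    seq = list(prefix)
    while len(seq) < length:
        seq.append(rng.choices(range(VOCAB), weights=next_token_probs(seq))[0])
    return seq

def suffix_logprob(seq, start):
    """log p(seq[start:] | seq[:start]) under the base model."""
    return sum(math.log(next_token_probs(seq[:i])[seq[i]])
               for i in range(start, len(seq)))

def mh_step(seq, alpha, rng):
    """One Metropolis-Hastings step targeting pi(x) proportional to p(x)**alpha."""
    cut_fwd = cut_distribution(seq)
    t = rng.choices(range(len(seq)), weights=cut_fwd)[0]
    proposal = sample_suffix(seq[:t], len(seq), rng)
    cut_bwd = cut_distribution(proposal)
    # The shared prefix cancels; the suffix proposal density cancels one power of p.
    log_ratio = (alpha - 1.0) * (suffix_logprob(proposal, t) - suffix_logprob(seq, t))
    # Correct for the cut probability differing between seq and proposal.
    log_ratio += math.log(cut_bwd[t]) - math.log(cut_fwd[t])
    if rng.random() < math.exp(min(0.0, log_ratio)):
        return proposal
    return seq

if __name__ == "__main__":
    rng = random.Random(0)
    seq = sample_suffix([], SEQ_LEN, rng)
    for _ in range(200):
        seq = mh_step(seq, alpha=4.0, rng=rng)
    print(seq)
```

Sampling cuts in proportion to entropy concentrates resampling at branching points such as the tokens after `1` here, while near-deterministic tokens are rarely revisited. A uniform-position baseline corresponds to replacing `cut_distribution` with a constant `1 / len(seq)`, in which case the cut correction term is zero.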