Fixing reasoning: delegate just a few critical tokens

Large reasoning models vastly outperform base LLMs on benchmarks, but the source of this gap is unclear. This work analyzes token-level disagreement between base and reasoning models and finds that the reasoning advantage concentrates on a small set of early, planning-focused decision tokens where base models show high uncertainty. The authors propose disagreement-guided token intervention: at inference time, only high-disagreement tokens are delegated to the reasoning model, and generation then immediately switches back to the base model. On Qwen3-0.6B, this sparse intervention, applied to roughly 8% of tokens, recovers or exceeds the performance of the same-size reasoning model on challenging tasks. Code is released.
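
The delegation loop described above can be illustrated with a minimal sketch. It is not the authors' released code. It assumes that disagreement is measured as the KL divergence between the two models' next-token distributions and that a token is delegated when this divergence exceeds a fixed threshold; the summary specifies neither the metric nor the threshold. The names `decode_with_intervention`, `toy_base`, and `toy_reasoner` are hypothetical, and the two "models" are hand-written toy distributions standing in for real LLMs.

```python
import math
from typing import Callable, Dict, List, Tuple

Dist = Dict[str, float]                    # token -> probability
NextTokenFn = Callable[[List[str]], Dist]  # context -> next-token distribution


def kl_divergence(p: Dist, q: Dist, eps: float = 1e-9) -> float:
    """KL(p || q) over p's support, with eps smoothing to avoid log(0)."""
    return sum(
        pt * math.log((pt + eps) / (q.get(tok, 0.0) + eps))
        for tok, pt in p.items()
        if pt > 0.0
    )


def decode_with_intervention(
    base: NextTokenFn,
    reasoner: NextTokenFn,
    prompt: List[str],
    threshold: float,
    max_new_tokens: int,
    eos: str = "<eos>",
) -> Tuple[List[str], List[bool]]:
    """Greedy decoding that delegates only high-disagreement tokens.

    Assumption: disagreement = KL(reasoner || base) at the current step.
    Returns the generated tokens and a per-token "was delegated" flag.
    """
    context = list(prompt)
    delegated: List[bool] = []
    for _ in range(max_new_tokens):
        p_base = base(context)
        p_reason = reasoner(context)
        use_reasoner = kl_divergence(p_reason, p_base) > threshold
        chosen = p_reason if use_reasoner else p_base
        token = max(chosen, key=chosen.get)  # greedy pick
        context.append(token)                # next step starts from the base model again
        delegated.append(use_reasoner)
        if token == eos:
            break
    return context[len(prompt):], delegated


# Toy stand-ins: the models disagree sharply only on the first (planning) token.
def toy_base(context: List[str]) -> Dist:
    if context[-1] == "Q:":
        return {"guess": 0.40, "plan": 0.35, "<eos>": 0.25}
    if context[-1] in ("plan", "guess"):
        return {"compute": 0.90, "<eos>": 0.10}
    return {"<eos>": 0.95, "compute": 0.05}


def toy_reasoner(context: List[str]) -> Dist:
    if context[-1] == "Q:":
        return {"plan": 0.90, "guess": 0.05, "<eos>": 0.05}
    if context[-1] in ("plan", "guess"):
        return {"compute": 0.85, "<eos>": 0.15}
    return {"<eos>": 0.90, "compute": 0.10}


tokens, delegated = decode_with_intervention(
    toy_base, toy_reasoner, ["Q:"], threshold=0.5, max_new_tokens=8
)
print(tokens, delegated)
```

In this toy run only the first token crosses the threshold, so the reasoning model picks the opening "plan" token and the base model produces the rest. The sketch queries both models at every step, which is the simplest way to expose the disagreement signal; whether the actual method does this or uses a cheaper proxy is not stated in this summary.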