cs.LG

How much of a reasoning chain can you trust before it breaks?

Matt Y. Cheung, Ashok Veeraraghavan, Hanjie Chen, Guha Balakrishnan

May 28, 2026

Language models often produce reasoning traces that are only partly correct: valid intermediate work followed by critical errors. CROP applies conformal prediction to certify the longest contiguous prefix of a reasoning trace that stays below a risk threshold, automatically routing the unreliable remainder for human review or repair. Tested on six reasoning datasets, it outperforms standard uncertainty metrics by measuring what actually matters: how much valid reasoning can be preserved before hallucination.
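To make the idea of certifying a prefix concrete, here is a minimal Python sketch of a split-conformal-style rule. It is an illustration under stated assumptions, not the CROP procedure from the paper: it assumes each reasoning step already has a risk score (higher means more suspect), that calibration traces are labeled with the index of their first erroneous step, and that calibration and test traces are exchangeable. The names calibrate_threshold and certified_prefix_length are hypothetical.

```python
import math


def calibrate_threshold(cal_traces, alpha):
    """Pick a step-score threshold from labeled calibration traces.

    cal_traces: list of (scores, first_error_index) pairs, where
    first_error_index is None for a fully correct trace.
    (Hypothetical data layout, assumed for this sketch.)
    """
    # A certified prefix wrongly swallows the first error exactly when every
    # score up to and including that step is below the threshold, so the
    # quantity to calibrate on is the max score over steps 0..first_error.
    m = []
    for scores, first_err in cal_traces:
        if first_err is None:
            m.append(math.inf)  # an error-free trace can never be over-certified
        else:
            m.append(max(scores[: first_err + 1]))
    m.sort()
    k = math.floor(alpha * (len(m) + 1))
    if k == 0:
        return -math.inf  # too little calibration data: certify nothing
    return m[k - 1]  # k-th smallest calibration value


def certified_prefix_length(scores, threshold):
    """Number of leading steps whose score is strictly below the threshold."""
    length = 0
    for s in scores:
        if s >= threshold:
            break
        length += 1
    return length


# Toy calibration data, invented for illustration only.
cal = [([0.1, 0.9], 1), ([0.2, 0.3, 0.8], 2), ([0.1, 0.2], None)]
t = calibrate_threshold(cal, alpha=0.5)
n_trusted = certified_prefix_length([0.1, 0.5, 0.95, 0.2], t)
```

In this sketch, steps before n_trusted would be kept and the rest handed off for review or repair. Under the exchangeability assumption, the strict comparison against the k-th smallest calibration value is meant to keep the chance that a certified prefix contains the first error at or below alpha, in the usual marginal sense of conformal guarantees.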
Published as "Conformal Certification of Reasoning Trace Prefixes", arXiv:2605.30085