Can language models diagnose their own mistakes before they happen?
Chu Fei Luo, Samuel Dahan, Xiaodan Zhu
May 29, 2026
Language models often flip between correct and incorrect answers to the same problem depending on sampling. This work trains a probe on a model's internal state before and after it generates clarifying questions, and finds that the probe can predict whether the final answer will be correct without seeing that answer. The twist: models can diagnose their own uncertainty through self-questioning, but interventions meant to correct mistakes fail as often as they succeed, suggesting a gap between self-awareness and self-correction.
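To make the idea of a correctness probe concrete, here is a minimal sketch, not the authors' implementation. It assumes the probe is a linear (logistic-regression) classifier, and it replaces real model activations with synthetic vectors in which a single hidden direction encodes eventual correctness. The function names, dimensions, and training settings are all hypothetical choices made for illustration.

```python
import numpy as np

def make_synthetic_states(n=400, dim=16, seed=0):
    """Stand-in for hidden states: Gaussian noise plus a label-dependent shift.

    Assumption: correctness is linearly encoded along one direction.
    Real activations would come from the language model itself.
    """
    rng = np.random.default_rng(seed)
    direction = rng.normal(size=dim)
    direction /= np.linalg.norm(direction)
    labels = rng.integers(0, 2, size=n)          # 1 = final answer correct
    signs = (2 * labels - 1)[:, None]            # map {0, 1} to {-1, +1}
    states = rng.normal(size=(n, dim)) + 2.0 * signs * direction
    return states, labels.astype(float)

def train_probe(states, labels, lr=0.1, steps=500):
    """Fit a logistic-regression probe with plain gradient descent."""
    n, dim = states.shape
    w = np.zeros(dim)
    b = 0.0
    for _ in range(steps):
        probs = 1.0 / (1.0 + np.exp(-(states @ w + b)))
        grad = probs - labels                    # gradient of the log loss w.r.t. logits
        w -= lr * (states.T @ grad) / n
        b -= lr * grad.mean()
    return w, b

def probe_accuracy(w, b, states, labels):
    """Fraction of examples where the probe's prediction matches the label."""
    preds = (states @ w + b) > 0
    return float((preds == labels.astype(bool)).mean())

# The probe only ever sees the states, never the generated answer.
states, labels = make_synthetic_states()
train_x, test_x = states[:300], states[300:]
train_y, test_y = labels[:300], labels[300:]
w, b = train_probe(train_x, train_y)
acc = probe_accuracy(w, b, test_x, test_y)
print(f"held-out probe accuracy: {acc:.2f}")
```

In the setting the summary describes, one would presumably fit such a probe separately on states captured before and after the clarifying questions and compare held-out accuracy; that comparison is an assumption about the setup, not a detail stated here.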