Can language models diagnose their own mistakes before they happen?

When language models solve problems, they often flip between correct and incorrect answers from one sample to the next. This work trains a probe on a model's internal state before and after it generates clarifying questions, and finds that the probe can predict whether the final answer will be correct without seeing that answer. The twist: models can diagnose their own uncertainty through self-questioning, but interventions meant to correct mistakes fail as often as they succeed, suggesting a gap between self-awareness and self-correction.
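
To make the probing idea concrete, here is a minimal sketch of a linear probe that predicts correctness from a vector of internal activations. It is an illustration under stated assumptions, not the setup used in this work: the probe is assumed to be logistic regression, the "hidden states" are synthetic Gaussian vectors with a planted correctness direction, and every name (train_linear_probe, probe_accuracy, make_synthetic_states) is hypothetical. With a real model, the same fit could be run once on states captured before the clarifying questions and once on states captured after, and the two held-out accuracies compared.

```python
import numpy as np

def train_linear_probe(states, labels, lr=0.1, steps=500, l2=1e-3):
    """Fit a logistic-regression probe with plain gradient descent.

    states: (n, d) array standing in for hidden-state vectors.
    labels: (n,) array of 0/1 flags, 1 meaning the final answer was correct.
    """
    n, d = states.shape
    w = np.zeros(d)
    b = 0.0
    for _ in range(steps):
        probs = 1.0 / (1.0 + np.exp(-(states @ w + b)))
        err = probs - labels  # gradient of the log loss w.r.t. the logits
        w -= lr * (states.T @ err / n + l2 * w)
        b -= lr * err.mean()
    return w, b

def probe_accuracy(w, b, states, labels):
    """Fraction of examples where the probe's prediction matches the label."""
    preds = (states @ w + b) > 0
    return float((preds == labels.astype(bool)).mean())

def make_synthetic_states(n, d, rng, direction):
    """Synthetic stand-in for activations (an assumption, not real data):
    unit Gaussian noise shifted along +direction or -direction by label."""
    labels = rng.integers(0, 2, size=n).astype(float)
    states = rng.normal(size=(n, d)) + np.outer(2 * labels - 1, direction)
    return states, labels

rng = np.random.default_rng(0)
direction = rng.normal(size=16)
direction *= 2.0 / np.linalg.norm(direction)  # planted signal of norm 2

train_x, train_y = make_synthetic_states(400, 16, rng, direction)
test_x, test_y = make_synthetic_states(200, 16, rng, direction)

w, b = train_linear_probe(train_x, train_y)
acc = probe_accuracy(w, b, test_x, test_y)
print(f"held-out probe accuracy: {acc:.2f}")
```

The probe never sees an answer, only the state vector, which mirrors the claim that correctness is predictable from internal state alone. Scoring on held-out examples matters here, since a probe with enough dimensions can fit training labels that carry no real signal.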