Can models learn to catch their own mistakes better with hints?

Reasoning models stall when their verifiers cannot reliably catch errors, both in test-time checking loops and during training. This work trains verifiers by showing them reference solutions alongside model outputs, so that they learn to flag what a better-informed version of themselves would catch. At test time, this doubles accuracy on hard math problems and lifts scientific reasoning from 1.5% to 21%. During training, using these verifiers to give the generator feedback yields a further 33% gain, and the generator on its own improves 30% beyond the point where standard training converged.
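
The test-time checking loop mentioned above can be pictured roughly as follows. This is only a minimal sketch under stated assumptions, not the method from this work: `verify_and_revise`, the `generate` and `verify` callables, and the `max_rounds` budget are hypothetical names chosen for illustration, and the toy generator and verifier stand in for real models. Note that the verifier here sees only the problem and the candidate answer, since no reference solution is available at test time.

```python
from typing import Callable, Optional, Tuple

# Hypothetical interfaces (assumptions for illustration, not from the source):
#   generate(problem, feedback) -> candidate answer
#   verify(problem, answer)     -> (accepted, critique)
Generator = Callable[[str, Optional[str]], str]
Verifier = Callable[[str, str], Tuple[bool, str]]


def verify_and_revise(
    problem: str,
    generate: Generator,
    verify: Verifier,
    max_rounds: int = 3,
) -> Tuple[str, bool, int]:
    """Generate an answer, ask the verifier to check it, and retry with the
    critique as feedback until the verifier accepts or the budget runs out.

    Returns (final answer, whether it was accepted, rounds used).
    """
    feedback: Optional[str] = None
    answer = ""
    for round_idx in range(1, max_rounds + 1):
        answer = generate(problem, feedback)
        accepted, critique = verify(problem, answer)
        if accepted:
            return answer, True, round_idx
        # Feed the critique back to the generator for the next attempt.
        feedback = critique
    return answer, False, max_rounds


# Toy stand-ins for real models, purely for demonstration.
def toy_generate(problem: str, feedback: Optional[str]) -> str:
    # The first attempt is wrong; the attempt after feedback is right.
    return "5" if feedback is None else "4"


def toy_verify(problem: str, answer: str) -> Tuple[bool, str]:
    if answer == "4":
        return True, ""
    return False, "Recheck the addition."


result = verify_and_revise("What is 2 + 2?", toy_generate, toy_verify)
print(result)
```

The loop is only as good as the verifier: if `verify` accepts a wrong answer or rejects a correct one, the extra rounds do not help, which is the failure mode the paragraph above describes.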