Making vision-language models confident only when they should be

Vision-language models fine-tuned with reinforcement learning often express high confidence in wrong answers, especially on corrupted or ambiguous images. Ranking-Aware Calibration (RAC) addresses this by adding two comparison-based losses during RL training: one that enforces higher confidence for better reasoning paths over worse ones within the same prompt, and another that reduces confidence when visual evidence degrades. Tested on Qwen2.5-VL and InternVL-3.5 across six multimodal benchmarks, RAC improves task accuracy and reduces calibration error under both clean and corrupted inputs, while requiring no external confidence annotations.
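
To make the two comparison-based losses concrete, here is a minimal sketch of how such terms could look. It is not the RAC implementation: the hinge (margin) form, the function names `pairwise_ranking_loss` and `degradation_loss`, the `margin` parameter, and the use of scalar rewards to decide which reasoning path is "better" are all assumptions made for illustration. The summary above only states that confidence is compared across reasoning paths within a prompt and across clean versus degraded inputs.

```python
from itertools import combinations


def pairwise_ranking_loss(confidences, rewards, margin=0.1):
    """Hypothetical within-prompt ranking term (assumed hinge form).

    confidences[i] is the model's stated confidence for response i,
    rewards[i] is a scalar quality score for that response. Both lists
    describe responses sampled for the SAME prompt.
    """
    losses = []
    for i, j in combinations(range(len(confidences)), 2):
        if rewards[i] == rewards[j]:
            continue  # no preference between equally good responses
        better, worse = (i, j) if rewards[i] > rewards[j] else (j, i)
        gap = confidences[better] - confidences[worse]
        # Penalize when the better response is not more confident by `margin`.
        losses.append(max(0.0, margin - gap))
    return sum(losses) / len(losses) if losses else 0.0


def degradation_loss(conf_clean, conf_corrupted, margin=0.1):
    """Hypothetical visual-degradation term (assumed hinge form).

    Penalizes the model when its confidence on a corrupted image is not
    at least `margin` below its confidence on the clean image.
    """
    return max(0.0, margin - (conf_clean - conf_corrupted))
```

In this sketch both terms are zero once the desired ordering holds by at least the margin, so they only push on confidence values that are mis-ranked. Because the targets come from comparisons between samples, no externally labeled confidence values are needed, which matches the claim above.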