cs.LG

Making vision-language models confident only when they should be

Peng Cui, Boyao Yang, Jun Zhu

May 16, 2026

Vision-language models fine-tuned with reinforcement learning often express high confidence in wrong answers, especially on corrupted or ambiguous images. Ranking-Aware Calibration (RAC) addresses this by adding two comparison-based losses during RL training: one that enforces higher confidence for better reasoning paths over worse ones within the same prompt, and another that reduces confidence when visual evidence degrades. Tested on Qwen2.5-VL and InternVL-3.5 across six multimodal benchmarks, RAC improves task accuracy and reduces calibration error under both clean and corrupted inputs, while requiring no external confidence annotations.
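
The two comparison-based losses described above can be illustrated with a small sketch. This summary does not give the paper's exact formulation, so the hinge form, the `margin` value, and all function names below are assumptions chosen for illustration, not the authors' implementation. The first function penalizes pairs of responses to the same prompt where the better-rewarded response is not more confident than the worse one; the second penalizes a model whose confidence fails to drop when the image is corrupted.

```python
from itertools import combinations


def hinge(higher, lower, margin=0.1):
    """Zero once `higher` exceeds `lower` by at least `margin` (assumed form)."""
    return max(0.0, margin - (higher - lower))


def within_prompt_ranking_loss(rewards, confidences, margin=0.1):
    """Average hinge penalty over response pairs sampled for one prompt.

    rewards[i] scores the quality of response i, and confidences[i] is the
    confidence the model stated for it. Both names are hypothetical.
    """
    penalties = []
    for i, j in combinations(range(len(rewards)), 2):
        if rewards[i] == rewards[j]:
            continue  # no preference between equally rewarded responses
        better, worse = (i, j) if rewards[i] > rewards[j] else (j, i)
        penalties.append(hinge(confidences[better], confidences[worse], margin))
    return sum(penalties) / len(penalties) if penalties else 0.0


def degradation_loss(conf_clean, conf_corrupted, margin=0.1):
    """Penalty when confidence on a corrupted image is not lower than on the clean one."""
    return hinge(conf_clean, conf_corrupted, margin)


# A correct answer with low confidence and a wrong answer with high confidence
# is a mis-ranked pair, so the ranking loss is positive.
misranked = within_prompt_ranking_loss([1.0, 0.0], [0.3, 0.9])

# The same pair with the confidences swapped is ranked correctly and costs nothing.
well_ranked = within_prompt_ranking_loss([1.0, 0.0], [0.9, 0.3])

# Identical confidence on clean and corrupted inputs is penalized by the margin.
unchanged = degradation_loss(0.8, 0.8)
```

Both terms need only comparisons between responses or between input conditions, which is consistent with the claim that no external confidence annotations are required. In an RL setup they would presumably be added to the policy objective with weighting coefficients, but that detail is not stated here.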
Published as "Ranking-Aware Calibration for Reliable Multimodal Reinforcement Learning", arXiv:2605.16999