Teaching video quality models to compare, not score absolutely

Shibei Meng, Binxin Yang, Yuan Liu, Jiexuan Zhang, Zhengyao Lv, Hubery Yin, Qiang Xu

Video quality assessment models trained on absolute scores fail when deployed to new datasets because they learn each dataset's rating habits rather than actual quality differences. VersusQ sidesteps this by having a multimodal model compare pairs of videos, reason about their visual and temporal differences, and output a signed margin, in effect saying "video A is better than video B by this much." The method outperforms absolute-score baselines on public benchmarks and generalizes far better across datasets and evaluation protocols.
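
To make the idea of a signed margin concrete, here is a minimal Python sketch. It is not the VersusQ model, which uses a multimodal model to produce its margins. It only illustrates, on made-up numbers, why a pairwise target can be less sensitive to rating habits than an absolute score. The function names, clip names, and ratings are all hypothetical, and the "rating habit" is assumed to be a simple constant offset between two rater pools.

```python
from itertools import combinations


def signed_margin_pairs(scores):
    """Turn per-video scores from ONE rater pool into (a, b, margin) triples.

    margin > 0 means video a is rated better than video b.
    """
    return [
        (a, b, scores[a] - scores[b])
        for a, b in combinations(sorted(scores), 2)
    ]


def rank_from_margins(pairs):
    """Order videos by their mean signed margin over all comparisons."""
    totals, counts = {}, {}
    for a, b, margin in pairs:
        # Each comparison counts once for a (as +margin) and once for b (as -margin).
        for video, signed in ((a, margin), (b, -margin)):
            totals[video] = totals.get(video, 0.0) + signed
            counts[video] = counts.get(video, 0) + 1
    mean = {video: totals[video] / counts[video] for video in totals}
    return sorted(mean, key=mean.get, reverse=True)


# Hypothetical ratings: the same three clips, but pool B scores 1.5 points lower.
pool_a = {"clip1": 4.0, "clip2": 3.0, "clip3": 2.5}
pool_b = {name: score - 1.5 for name, score in pool_a.items()}

pairs_a = signed_margin_pairs(pool_a)
pairs_b = signed_margin_pairs(pool_b)

print(pool_a == pool_b)    # absolute scores disagree between pools
print(pairs_a == pairs_b)  # signed margins agree
print(rank_from_margins(pairs_a))
```

In this toy setup the constant offset cancels in every difference, so both pools yield the same margins and the same ranking even though their absolute scores differ. Real rating habits can also change the scale or shape of the score distribution, which a plain difference would not remove.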