Can better questions and rubrics improve AI reasoning?

Rongzhi Zhang, Rui Feng, Zhihan Zhang, Jingfeng Yang, Qingyu Yin, Xin Liu, Zixuan Zhang, Priyanka Nigam, Bing Yin, Tuo Zhao, Chao Zhang

Training AI systems with rubric-based reinforcement learning hits a wall: vague queries produce vague rubrics, while overly specific ones demand references the model can't satisfy. QUBRIC addresses this by jointly refining both queries and rubrics. It converts open-ended questions into concrete scenarios, derives rubrics from teacher-policy mismatches, and filters for signal-rich pairs. The result: a 5.5-point gain on ArenaHard, plus strong transfer to reasoning tasks such as legal and moral judgment, all without requiring human verification of every answer.
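
To make the filtering step more concrete, here is a minimal Python sketch of one way a "signal-rich" query-rubric pair could be identified. It is an illustration under stated assumptions, not QUBRIC's actual procedure: it assumes a rubric is a list of boolean checks, that a response's score is the fraction of checks it passes, and that a pair carries training signal when scores vary across sampled responses. The names `rubric_score`, `is_signal_rich`, and the `min_variance` threshold are all hypothetical.

```python
from statistics import pvariance
from typing import Callable, Sequence

# Assumption: a rubric criterion is a function from a response to pass/fail.
Criterion = Callable[[str], bool]


def rubric_score(response: str, rubric: Sequence[Criterion]) -> float:
    """Return the fraction of rubric criteria the response satisfies."""
    if not rubric:
        return 0.0
    return sum(1 for check in rubric if check(response)) / len(rubric)


def is_signal_rich(
    responses: Sequence[str],
    rubric: Sequence[Criterion],
    min_variance: float = 0.01,  # hypothetical threshold, not from the source
) -> bool:
    """Keep a (query, rubric) pair only if the rubric separates the responses.

    If every sampled response gets the same score, the rubric gives the
    learner nothing to prefer, so the pair is treated as uninformative.
    """
    scores = [rubric_score(r, rubric) for r in responses]
    return len(scores) > 1 and pvariance(scores) >= min_variance


# Toy rubric: the answer gives a reason and is at least three words long.
toy_rubric = [
    lambda r: "because" in r,
    lambda r: len(r.split()) >= 3,
]

print(is_signal_rich(["yes", "yes because it is"], toy_rubric))
print(is_signal_rich(["yes", "no"], toy_rubric))
```

In this toy setup the first pair of responses is scored differently by the rubric and would be kept, while the second pair receives identical scores and would be dropped. A real system would likely score responses with a model-based judge rather than string checks.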