cs.CL

Can AI learn what each user actually wants before judging responses?

Yilun Qiu, Xiaoyan Zhao, Yang Zhang, Yuxin Chen, Cilin Yan, Jiayin Cai, Xiaolong Jiang, Yao Hu, Yoko Yamakata, Tat-Seng Chua

May 29, 2026

Evaluating whether an LLM's output actually matches an individual user's preferences remains an open problem, because existing judges and metrics ignore long-term interaction patterns. PARL learns personalized scoring rubrics directly from each user's interaction history through reinforcement learning, then validates those rubrics against that user's own choices. On real text generation tasks, PARL captures stable stylistic preferences and generalizes across users and domains. The code has been released.
Published as "Preference-Aware Rubric Learning for Personalized Evaluation" (arXiv:2605.31545).
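The summary says learned rubrics are validated against the user's own choices. The Python sketch below shows one plausible way to measure that. It scores each response with a weighted rubric and counts how often the rubric prefers the response the user actually chose. The `Criterion` class, the hand-written keyword checks, and the pairwise-agreement metric are hypothetical illustrations, not PARL's published method. In the paper's setting the rubric would be learned through reinforcement learning and scored by a model rather than written by hand.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Criterion:
    """One rubric item: a name, a weight, and a scorer returning a value in [0, 1].

    Hypothetical: a real system would likely ask an LLM judge to score each
    criterion instead of using a hand-written Python function.
    """
    name: str
    weight: float
    score: Callable[[str], float]


Rubric = List[Criterion]


def rubric_score(rubric: Rubric, response: str) -> float:
    """Weighted sum of the criterion scores for a single response."""
    return sum(c.weight * c.score(response) for c in rubric)


def agreement(rubric: Rubric, pairs: List[Tuple[str, str]]) -> float:
    """Fraction of (chosen, rejected) pairs the rubric ranks like the user did.

    A pair counts as a hit only when the chosen response scores strictly higher.
    Assumption: this metric is illustrative, not the paper's validation metric.
    """
    if not pairs:
        return 0.0
    hits = sum(
        rubric_score(rubric, chosen) > rubric_score(rubric, rejected)
        for chosen, rejected in pairs
    )
    return hits / len(pairs)


# Hypothetical rubric for a user who prefers short, calm replies.
concise = Criterion("concise", 0.6, lambda r: 1.0 if len(r.split()) <= 12 else 0.0)
calm = Criterion("no exclamation marks", 0.4, lambda r: 0.0 if "!" in r else 1.0)
rubric = [concise, calm]

# Hypothetical interaction history: (response the user chose, response they rejected).
history = [
    ("Sure, the meeting moved to 3 pm.",
     "Great news!!! The meeting has been moved to 3 pm, see you there!"),
    ("Tests pass; merging now.",
     "All tests passed! I'm going to merge this pull request right away!"),
]

print(agreement(rubric, history))  # → 1.0
```

A score like this is the kind of signal a reinforcement-learning loop could optimize when refining a rubric. How PARL actually defines its reward and validation is not described in this summary, so treat the metric as a stand-in.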