Teaching AI to act human by giving it feedback in words

Weiwei Sun, Xuhui Zhou, Jiarui Liu, Weihua Du, Haojia Sun, Yiqing Xie, Qianou Ma, Sihao Chen, Mengting Wan, Longqi Yang, Pei Zhou, Sherry Wu, Sean Welleck, Graham Neubig, Yiming Yang, Maarten Sap

Language models increasingly role-play as people (patients, students, job applicants), but they often fail to capture the subtle social norms that humans learn through verbal feedback. DITTO treats verbal feedback as a primary training signal: after each rollout, the model receives subjective, multi-faceted guidance (such as "that was dismissive") and generates an improved version; both the initial attempt and the revision are then jointly optimized via reinforcement learning. The team benchmarked the approach on SOUL, a new 10-task suite covering theory of mind, role-play, and user simulation. It yields a 36% gain over the base model and outperforms GPT-4 on most tasks, suggesting that absorbing linguistic critique, not just scalar rewards, makes AI more convincingly human.
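
The rollout, feedback, and revision loop described above can be sketched roughly as follows. This is a minimal illustration, not DITTO's implementation: the summary does not say how the revision is conditioned on feedback, how rewards are computed, or which RL objective is used, so every name here (`Sample`, `collect_feedback_pair`, the toy `generate`/`critique`/`score` stubs, and the prompt template) is a hypothetical placeholder. The sketch only shows how one prompt could produce two training samples that an RL update would then optimize together.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Sample:
    """One (prompt, response, reward) triple handed to an RL update."""
    prompt: str
    response: str
    reward: float


def collect_feedback_pair(
    prompt: str,
    generate: Callable[[str], str],
    critique: Callable[[str, str], str],
    score: Callable[[str, str], float],
) -> List[Sample]:
    """Roll out once, obtain verbal feedback, then roll out a revision.

    Returns both attempts so a downstream optimizer can train on them jointly.
    The prompt template below is an assumption made for illustration.
    """
    first = generate(prompt)
    feedback = critique(prompt, first)
    revision_prompt = (
        f"{prompt}\n\nPrevious attempt: {first}\n"
        f"Feedback: {feedback}\nRevised attempt:"
    )
    revised = generate(revision_prompt)
    return [
        Sample(prompt, first, score(prompt, first)),
        # The revision is scored against the original task, not the template.
        Sample(revision_prompt, revised, score(prompt, revised)),
    ]


# Toy stand-ins for the policy, the feedback source, and the reward signal.
def toy_generate(prompt: str) -> str:
    if "Feedback:" in prompt:
        return "Sorry to hear that. How can I help?"
    return "Whatever."


def toy_critique(prompt: str, response: str) -> str:
    return "That was dismissive."


def toy_score(prompt: str, response: str) -> float:
    return 1.0 if "Sorry" in response else 0.0


samples = collect_feedback_pair(
    "Role-play a nurse greeting an anxious patient.",
    toy_generate,
    toy_critique,
    toy_score,
)
for s in samples:
    print(f"reward={s.reward:.1f} response={s.response!r}")
```

In a real training setup the two samples would feed a policy-gradient style update in place of the `print` loop; which update rule applies is left open here because the summary does not specify it.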