Why do fine-tuned AI models suddenly overuse words like 'delve'?
Xiaoyang Ming, Jose Hernandez, Thomas Stephan Juzek
May 29, 2026
After RLHF training, language models develop systematic word preferences, such as overusing "delve" or "furthermore", that diverge from both their base versions and human-written text. The authors built an automated metric, the Triangulated Preference Shift score, that isolates these shifts by comparing three sources: a human-text reference, base models, and their fine-tuned counterparts. This removes the need for manual curation. Analysis across six model families suggests that preference learning pushes models toward formal, prestige-oriented language.