cs.CL

Why do fine-tuned AI models suddenly overuse words like 'delve'?

Xiaoyang Ming, Jose Hernandez, Thomas Stephan Juzek

May 29, 2026

After RLHF training, language models develop systematic word preferences, such as overusing "delve" or "furthermore", that diverge from both their base versions and human-written text. The authors propose an automated metric, the Triangulated Preference Shift score, that isolates these shifts by comparing word usage across three sources: human-written text, the base model, and the fine-tuned model. This removes the need for manual curation. Their analysis across six model families suggests that preference learning pushes models toward formal, prestige-oriented language.
Published as "Isolating LLM Lexical Bias: A Curation-Free Triangulated Metric for Preference-Stage Learning" (arXiv:2606.00334).
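The summary does not state how the score is computed. Purely as an illustration of the triangulation idea, the Python sketch below assumes a hypothetical rule: a word's shift is the smaller of two log frequency ratios, fine-tuned vs. base and fine-tuned vs. human. A word therefore scores high only when the fine-tuned model overuses it relative to both references. The function names (`word_counts`, `smoothed_rate`, `shift_score`, `rank_shifted_words`), the crude tokenizer, the Laplace smoothing, and the min-of-log-ratios rule are all assumptions for this sketch, not the paper's definition.

```python
import math
import re
from collections import Counter


def word_counts(text: str) -> Counter:
    # Crude lowercase word tokenizer; real corpora need proper tokenization.
    return Counter(re.findall(r"[a-z']+", text.lower()))


def smoothed_rate(counts: Counter, word: str, vocab_size: int, alpha: float = 1.0) -> float:
    # Additive (Laplace) smoothing so unseen words never give a zero rate or log(0).
    total = sum(counts.values())
    return (counts[word] + alpha) / (total + alpha * vocab_size)


def shift_score(word: str, human: Counter, base: Counter, tuned: Counter, vocab_size: int) -> float:
    # Hypothetical triangulation: log-ratio of tuned vs. base and tuned vs. human.
    # Taking the minimum means a word scores high only if the tuned model
    # overuses it relative to BOTH its base model and human text.
    r_h = smoothed_rate(human, word, vocab_size)
    r_b = smoothed_rate(base, word, vocab_size)
    r_t = smoothed_rate(tuned, word, vocab_size)
    return min(math.log(r_t / r_b), math.log(r_t / r_h))


def rank_shifted_words(human_text: str, base_text: str, tuned_text: str, top_k: int = 5):
    human, base, tuned = map(word_counts, (human_text, base_text, tuned_text))
    # Shared vocabulary so all three corpora are smoothed consistently.
    vocab = set(human) | set(base) | set(tuned)
    scores = {w: shift_score(w, human, base, tuned, len(vocab)) for w in vocab}
    # Sort by descending score, breaking ties alphabetically for stable output.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top_k]


if __name__ == "__main__":
    human_text = "we explore the data and look at the results"
    base_text = "we explore the data and look closely at the results"
    tuned_text = "furthermore we delve into the data and delve through the results"
    for word, score in rank_shifted_words(human_text, base_text, tuned_text, top_k=4):
        print(f"{word:12s} {score:.3f}")
```

On these toy strings "delve" ranks first, because it appears only in the fine-tuned text and does so twice. Taking the minimum of the two ratios is one simple way to require that both comparisons agree. If a word's rise over the base model merely brings it back to its human-typical frequency, it scores near zero or below. Meaningful scores would require large corpora rather than single sentences.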