Can language models be tricked into changing what they know about people?
Yuanpu Cao, Ziyi Yin, Fenglong Ma, Jinghui Chen
June 2, 2026
Large language models can be edited to change their factual knowledge, but what happens when you try to manipulate their records of what famous people actually believe? Researchers built a benchmark of 2,178 real opinions from 261 public figures across 19 issues, then tested whether standard editing techniques could safely change them. Current methods mostly fail: they alter surface responses but break internal consistency, leaving contradictions between the edited opinion and the supporting evidence the model generates. The team proposes a new alignment-focused editing approach that fixes these inconsistencies without explicit retraining instructions.