cs.CL

Can language models be tricked into changing what they know about people?

Yuanpu Cao, Ziyi Yin, Fenglong Ma, Jinghui Chen

June 2, 2026

Large language models can be edited to change their factual knowledge, but what happens when an edit targets a model's knowledge of what famous people actually believe? Researchers built a benchmark of 2,178 real opinions from 261 public figures across 19 issues, then tested whether standard editing techniques could safely change them. Current methods mostly fail: they alter surface responses but break internal consistency, so the edited opinion ends up contradicting the supporting evidence the model generates. The team proposes a new alignment-focused editing approach that fixes these inconsistencies without explicit retraining instructions.
Published as "Can Factual Opinions Be Edited (Manipulated) in Large Language Models?", arXiv:2606.03096