← Back to Computation and Language
cs.CL

How to fix facts in vision-language models without breaking everything else?

Leijiang Gu, Zhen Zeng, Feng Li, Xinjian Gao, Zenglin Shi

May 28, 2026

When you fix a factual error in a multimodal language model, the change often stays trapped in that single example or accidentally corrupts unrelated knowledge. LDKE solves this by identifying which specific layers control each fact and separating target-relevant features from irrelevant ones. The method uses fast localization to find critical layers and a classifier to route information properly, achieving both broad generalization and high precision across multiple benchmarks.
Published as Towards Localized and Disentangled Knowledge Editing for Multimodal Large Language Models arXiv:2605.29826
Read the original paper →