Why AI knowledge edits fail on slightly different images or phrasings

Multimodal language models can be edited to correct individual facts, but the edits often fail to generalize; the corrected answer is lost when the same concept appears in a different image or is phrased differently. This work treats robust editing as a group problem: identify semantically equivalent inputs and keep the model's predictions consistent across them. Two techniques do the work. First, adversarial variants are generated in the latent space to expose brittle regions. Second, a low-rank alignment at the edit layer stabilizes those regions. The reported results show substantially better generalization without degrading existing knowledge.
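To make the two steps concrete, here is a minimal toy sketch in NumPy. It is not the authors' implementation. It assumes the edit layer is a single linear map, that an edit means forcing a latent key `k` to produce a target output `v`, and that "semantically equivalent inputs" can be stood in for by small perturbations of `k`. All names (`rank_one_edit`, `adversarial_variants`, `low_rank_align`, `demo`) are hypothetical. For a linear layer the worst-case perturbations have a closed form (the top right singular vectors), so no gradient search is needed here; a real multimodal model would require one.

```python
import numpy as np

def rank_one_edit(W, k, v):
    """Smallest (Frobenius-norm) rank-one update that makes the layer map k to v."""
    return W + np.outer(v - W @ k, k) / (k @ k)

def adversarial_variants(W_edit, k, eps, n_variants):
    """Worst-case latent perturbations of k for a linear layer (toy assumption).

    Within an eps-ball, the output moves most along the top right singular
    vectors of W_edit, so those directions are used as the variants.
    """
    _, _, Vt = np.linalg.svd(W_edit)
    return k[:, None] + eps * Vt[:n_variants].T  # shape (d_in, n_variants)

def low_rank_align(W, K, v):
    """Minimum-norm update of rank <= K.shape[1] mapping every column of K to v."""
    V = np.tile(v[:, None], (1, K.shape[1]))
    delta_W = (V - W @ K) @ np.linalg.pinv(K)
    return W + delta_W, delta_W

def demo(seed=0, d_in=16, d_out=8, eps=0.5, n_variants=3):
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(d_out, d_in))  # toy stand-in for the edit layer
    k = rng.normal(size=d_in)           # latent key of the edited input
    v = rng.normal(size=d_out)          # target output encoding the new fact

    # Naive edit: correct on k itself, but brittle nearby.
    W_naive = rank_one_edit(W, k, v)
    K_adv = adversarial_variants(W_naive, k, eps, n_variants)

    # Group edit: align the original key and its adversarial variants together.
    group = np.column_stack([k, K_adv])
    W_robust, delta_W = low_rank_align(W, group, v)

    def drift(M):
        # Largest output error over the adversarial variants.
        return np.linalg.norm(M @ K_adv - v[:, None], axis=0).max()

    return drift(W_naive), drift(W_robust), np.linalg.matrix_rank(delta_W)

if __name__ == "__main__":
    naive, robust, rank = demo()
    print(f"drift after naive edit:  {naive:.3e}")
    print(f"drift after group edit:  {robust:.3e}")
    print(f"rank of the group update: {rank}")
```

The sketch runs a single round: the variants are found against the naive edit and then folded into one low-rank update. It does not model the preservation of unrelated knowledge, which a fuller treatment would need to constrain explicitly.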