Teaching vision-language models to forget on demand

Vision-language models trained on massive datasets often absorb copyrighted or sensitive material, and that knowledge must be removable on demand. Existing unlearning approaches handle one-off deletion requests, but real deployments receive removal requests sequentially over time. CATA uses task arithmetic to represent each forget request as a vector in weight space, then aggregates these vectors while suppressing conflicting updates that would undo earlier deletions. In experiments, CATA outperforms baselines on three counts: removing the target knowledge, maintaining model performance on retained tasks, and preventing removed knowledge from re-emerging under sequential updates.
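
To make the idea of aggregating forget vectors concrete, here is a minimal sketch in Python with NumPy. It is not CATA's actual algorithm, which the summary above does not spell out. It assumes that a forget vector is the weight difference between a model fine-tuned on the forget data and the base model (the standard task-arithmetic definition), and it assumes a simple sign-based rule for conflicts: any coordinate of a new forget vector that points against the accumulated vector is dropped. The function names, the `scale` parameter, and the toy four-weight "model" are all hypothetical.

```python
import numpy as np

def forget_vector(theta_base, theta_tuned_on_forget):
    """Task vector for one forget request: fine-tuned weights minus base weights."""
    return theta_tuned_on_forget - theta_base

def aggregate(accumulated, new_vec):
    """Add a new forget vector, zeroing coordinates whose sign opposes the
    accumulated vector (assumed conflict rule, not taken from the source)."""
    conflict = (accumulated * new_vec) < 0
    return accumulated + np.where(conflict, 0.0, new_vec)

def apply_forgetting(theta_base, accumulated, scale=1.0):
    """Negate the aggregated forget vector to move away from the forgotten knowledge."""
    return theta_base - scale * accumulated

# Toy example: a "model" with four weights and two sequential forget requests.
theta_base = np.zeros(4)
v1 = forget_vector(theta_base, np.array([1.0, -1.0, 0.0, 2.0]))
v2 = forget_vector(theta_base, np.array([-1.0, -1.0, 3.0, 1.0]))

accumulated = aggregate(v1, v2)  # coordinate 0 of v2 opposes v1 and is dropped
theta_unlearned = apply_forgetting(theta_base, accumulated, scale=0.5)
```

In this toy run the second request would have pushed the first weight back toward its pre-deletion direction, so that coordinate is ignored while the rest of the update is kept. A real system would operate on full parameter tensors and would likely use a more careful conflict criterion than a per-coordinate sign check.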