Teaching multimodal AI new tasks without forgetting old ones

Yuehao Liu, Shanyan Guan, Weijia Zhang, Xuanming Shang, Yanhao Ge, Wei Li, Chao Ma

As multimodal large language models learn new tasks sequentially, they tend to overwrite previously acquired knowledge, a problem called catastrophic forgetting. Existing fixes either store past training data (raising privacy concerns) or add architectural overhead that hurts generalization. Octopus sidesteps both issues with History-Free Gradient Orthogonalization (HiFGO), which constrains new gradient updates to be orthogonal to directions that would disturb prior task parameters, using no historical examples. A two-stage fine-tuning strategy separates task adaptation from regularization, balancing plasticity and stability. On the UCIT continual learning benchmark, Octopus outperforms the previous best by 2.14% on average accuracy and 6.82% on last-task accuracy.
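
To make the orthogonality constraint concrete, here is a minimal sketch of generic gradient projection: the part of a gradient that lies in a set of protected directions is removed before the update is applied. This illustrates the general idea only and is not the HiFGO procedure itself. The summary above does not say how Octopus derives the protected directions without historical data, so the sketch simply assumes they are given. The names `orthonormalize`, `project_orthogonal`, `protected`, and `grad` are hypothetical.

```python
from typing import List

Vector = List[float]


def dot(u: Vector, v: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def orthonormalize(directions: List[Vector], eps: float = 1e-12) -> List[Vector]:
    """Gram-Schmidt: build an orthonormal basis for span(directions)."""
    basis: List[Vector] = []
    for d in directions:
        residual = list(d)
        for b in basis:
            coeff = dot(residual, b)
            residual = [r - coeff * x for r, x in zip(residual, b)]
        norm = dot(residual, residual) ** 0.5
        if norm > eps:  # skip directions already covered by the basis
            basis.append([r / norm for r in residual])
    return basis


def project_orthogonal(grad: Vector, protected: List[Vector]) -> Vector:
    """Return grad with every component inside span(protected) removed."""
    out = list(grad)
    for b in orthonormalize(protected):
        coeff = dot(out, b)
        out = [g - coeff * x for g, x in zip(out, b)]
    return out


# Assumption: these directions stand in for "directions that would disturb
# prior task parameters"; how HiFGO obtains them is not shown here.
protected = [[1.0, 0.0, 0.0], [1.0, 1.0, 0.0]]
grad = [0.5, -2.0, 3.0]
safe_grad = project_orthogonal(grad, protected)
```

In this toy case the protected directions span the first two coordinates, so only the third component of `grad` survives the projection. A parameter update along `safe_grad` leaves the protected subspace untouched to first order, which is the stability half of the plasticity and stability trade-off described above.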