cs.CV

Reusing old classification weights to align vision and language models

David Méndez, Roberto Confalonieri, Natalia Díaz Rodríguez

May 21, 2026

Vision-language models need expensive training on huge paired datasets to connect images and text. This work recycles the classification heads of pretrained vision models, weights that are normally discarded, as semantic anchors for alignment. The approach works in two ways: directly, as zero-shot alignment signals, and as data augmentation when mixed with real image-text pairs. Applied to standard post-hoc alignment methods, it improves cross-modal retrieval and zero-shot and few-shot classification across benchmarks.
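The idea of treating a classification head as a set of semantic prototypes can be sketched in a few lines. The example below is an illustration under stated assumptions, not the authors' procedure: it treats each row of a classifier's weight matrix as a vector in the vision feature space, pairs it with a text embedding of the class name, and fits a ridge-regularised linear map as a stand-in for a post-hoc alignment method. All data is synthetic, and the names `fit_linear_alignment`, `retrieval_accuracy`, `head_weights`, and `class_text` are hypothetical.

```python
import numpy as np


def fit_linear_alignment(vision_vecs, text_vecs, ridge=1e-3):
    """Fit a ridge-regularised map M such that vision_vecs @ M approximates text_vecs."""
    d = vision_vecs.shape[1]
    gram = vision_vecs.T @ vision_vecs + ridge * np.eye(d)
    return np.linalg.solve(gram, vision_vecs.T @ text_vecs)


def l2_normalize(x):
    """Scale each row to unit length so dot products become cosine similarities."""
    return x / np.linalg.norm(x, axis=1, keepdims=True)


def retrieval_accuracy(alignment, images, texts):
    """Fraction of images whose most similar text (by cosine) is the matching one."""
    sims = l2_normalize(images @ alignment) @ l2_normalize(texts).T
    return float((sims.argmax(axis=1) == np.arange(len(images))).mean())


rng = np.random.default_rng(0)
num_classes, d_vision, d_text = 50, 16, 12

# ASSUMPTION: synthetic stand-in for a pretrained classifier head,
# one weight row (prototype) per class, living in the vision feature space.
head_weights = rng.normal(size=(num_classes, d_vision))

# ASSUMPTION: synthetic stand-in for text embeddings of the class names,
# generated from a hidden linear relation plus a little noise.
hidden_map = rng.normal(size=(d_vision, d_text))
class_text = head_weights @ hidden_map + 0.01 * rng.normal(size=(num_classes, d_text))

# Use 1: fit the alignment from (prototype, class-name embedding) pairs only.
align_proto = fit_linear_alignment(head_weights, class_text)

# Use 2: treat the prototype pairs as augmentation for a small set of real pairs.
num_real = 20
real_image = rng.normal(size=(num_real, d_vision))
real_text = real_image @ hidden_map + 0.01 * rng.normal(size=(num_real, d_text))
align_mixed = fit_linear_alignment(
    np.vstack([real_image, head_weights]),
    np.vstack([real_text, class_text]),
)

# Held-out synthetic image-text pairs for a toy cross-modal retrieval check.
test_image = rng.normal(size=(30, d_vision))
test_text = test_image @ hidden_map

acc_proto = retrieval_accuracy(align_proto, test_image, test_text)
acc_mixed = retrieval_accuracy(align_mixed, test_image, test_text)
print(f"prototype-only: {acc_proto:.2f}, mixed: {acc_mixed:.2f}")
```

Because the synthetic text embeddings are generated from a linear relation, this toy setup says nothing about how well the approach works on real models; it only shows where the recycled head weights would enter an alignment pipeline in each of the two modes.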
Published as: Supervised Classification Heads as Semantic Prototypes: Unlocking Vision-Language Alignment via Weight Recycling (arXiv:2605.22484)