Can AI models secretly re-link medical images to patient reports?

Clinical vision-language models learn to match radiographs with their corresponding reports in a shared embedding space. That becomes a privacy problem when images and reports are meant to be kept separate, because the model itself can be used to re-link them. Researchers tested this "cross-modal linkage" vulnerability on 406,241 paired examples from MIMIC-CXR and CheXpert Plus and found that stronger models retrieved the correct report at rates far above random chance, even when disease labels were removed. Applying differential privacy only to the projection layer that connects the two modalities reduced re-linkage by 62% while barely affecting image classification (AUROC dropped by 0.2 percentage points), which suggests a practical defense.
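
To make the two ideas above concrete, here is a minimal NumPy sketch. It is not the researchers' code. It assumes re-linkage is scored as top-k retrieval accuracy under cosine similarity, and that the defense resembles a DP-SGD-style update (per-example gradient clipping plus Gaussian noise) applied only to the projection layer's gradients. The function names `top_k_linkage_rate` and `dp_noisy_gradient`, and the parameters `clip_norm` and `noise_multiplier`, are hypothetical names chosen for illustration.

```python
import numpy as np


def top_k_linkage_rate(image_emb, report_emb, k=1):
    """Fraction of images whose true report (same row index) ranks in the top k.

    Assumes row i of image_emb and row i of report_emb come from the same study.
    Random guessing would score about k / n for n candidate reports.
    """
    img = image_emb / np.linalg.norm(image_emb, axis=1, keepdims=True)
    rep = report_emb / np.linalg.norm(report_emb, axis=1, keepdims=True)
    sim = img @ rep.T                      # cosine similarity, shape (n, n)
    true_sim = np.diag(sim)[:, None]       # similarity to the correct report
    # Rank of the true report = number of reports that are strictly more similar.
    rank = (sim > true_sim).sum(axis=1)
    return float((rank < k).mean())


def dp_noisy_gradient(per_example_grads, clip_norm, noise_multiplier, rng):
    """DP-SGD-style gradient for one layer (here, the hypothetical projection layer).

    Clips each example's gradient to L2 norm <= clip_norm, sums the clipped
    gradients, adds Gaussian noise with std noise_multiplier * clip_norm,
    and averages over the batch.
    """
    n = per_example_grads.shape[0]
    flat = per_example_grads.reshape(n, -1)
    norms = np.linalg.norm(flat, axis=1)
    scale = np.minimum(1.0, clip_norm / np.maximum(norms, 1e-12))
    clipped = flat * scale[:, None]
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=flat.shape[1])
    noisy_mean = (clipped.sum(axis=0) + noise) / n
    return noisy_mean.reshape(per_example_grads.shape[1:])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, d = 200, 32
    images = rng.normal(size=(n, d))
    # Synthetic "reports": noisy copies of the image embeddings (illustrative only).
    reports = images + 0.5 * rng.normal(size=(n, d))
    print("top-1 linkage rate:", top_k_linkage_rate(images, reports, k=1))
    print("chance level:", 1 / n)

    grads = rng.normal(size=(n, d, 8))     # fake per-example projection gradients
    update = dp_noisy_gradient(grads, clip_norm=1.0, noise_multiplier=1.1, rng=rng)
    print("noisy gradient shape:", update.shape)
```

Restricting the noisy update to the projection layer leaves the image encoder's gradients untouched, which is one plausible reading of why classification AUROC could stay nearly unchanged. A formal privacy guarantee would additionally require a privacy accountant, which this sketch omits.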