cs.CV

Why medical imaging models struggle with synthetic data

Mischa Dombrowski, Felix Nützel, Bernhard Kainz

May 16, 2026

Generative augmentation with latent diffusion models promises to address class imbalance in medical imaging, but this work identifies a fundamental bottleneck: pretrained autoencoders reconstruct medical images successfully, yet organize their latent representations in ways that confuse downstream classifiers. Experiments with five autoencoder families across chest X-rays, dermatoscopy, CT, and echocardiography show that this learnability gap persists regardless of architecture or hyperparameter choices. The authors introduce noise-conditioned latent classifiers with FiLM layers, together with image-space distillation, which achieve 64× throughput gains and 120× memory savings and also serve as a diagnostic of latent space quality. The core finding: latent space structure, not perceptual fidelity or domain fine-tuning, is the primary barrier to closing the performance gap between real and synthetic medical data.
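
The summary does not describe how the noise-conditioned classifiers are built, so the following is only a minimal sketch of the general FiLM idea (feature-wise linear modulation: scale and shift each feature using a conditioning vector), here conditioned on a diffusion noise level. Everything in it is an illustrative assumption rather than the authors' implementation: the names `noise_embedding` and `FiLM`, the sinusoidal embedding and its scaling, the `(1 + gamma)` parameterization, and the dimensions.

```python
import math
import random


def noise_embedding(t, dim=8):
    """Sinusoidal embedding of a scalar noise level t (assumed to lie in [0, 1])."""
    half = dim // 2
    pos = t * 1000.0  # assumed scaling so nearby noise levels get distinct codes
    freqs = [math.exp(-math.log(10000.0) * i / half) for i in range(half)]
    return [math.sin(pos * f) for f in freqs] + [math.cos(pos * f) for f in freqs]


def init_linear(rng, n_in, n_out, scale=0.1):
    """Random weights (one row per output unit) and zero biases."""
    weights = [[rng.gauss(0.0, scale) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def linear(weights, bias, x):
    """Dense layer: dot product of each weight row with x, plus bias."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


class FiLM:
    """Feature-wise linear modulation: out = (1 + gamma(c)) * features + beta(c)."""

    def __init__(self, rng, cond_dim, n_features):
        self.gamma_w, self.gamma_b = init_linear(rng, cond_dim, n_features)
        self.beta_w, self.beta_b = init_linear(rng, cond_dim, n_features)

    def __call__(self, features, cond):
        gamma = linear(self.gamma_w, self.gamma_b, cond)
        beta = linear(self.beta_w, self.beta_b, cond)
        # (1 + gamma) keeps the layer close to the identity when weights are small.
        return [(1.0 + g) * f + b for g, f, b in zip(gamma, features, beta)]


rng = random.Random(0)
film = FiLM(rng, cond_dim=8, n_features=4)

# Hypothetical feature vector computed from a noisy latent.
latent_features = [0.5, -1.0, 2.0, 0.0]

# The same features are modulated differently at different noise levels.
low_noise = film(latent_features, noise_embedding(0.05))
high_noise = film(latent_features, noise_embedding(0.9))
```

In a classifier of this kind, a layer like `FiLM` would typically sit between the ordinary layers, so a single network can adapt its features to however much noise the latent currently carries. A real implementation would use a deep learning framework with trained weights; plain Python is used here only to keep the sketch self-contained.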
Published as "The Learnability Gap in Medical Latent Diffusion", arXiv:2605.17087