Why medical imaging models struggle with synthetic data

Generative augmentation with latent diffusion models promises to address class imbalance in medical imaging, but this work identifies a fundamental bottleneck: pretrained autoencoders reconstruct medical images well, yet they organize their latent representations in ways that downstream classifiers find hard to learn from. Experiments with five autoencoder families across chest X-rays, dermatoscopy, CT, and echocardiography show that this learnability gap persists regardless of architecture or hyperparameter choices. The authors introduce noise-conditioned latent classifiers with FiLM layers and image-space distillation, which achieve 64× throughput gains and 120× memory savings and also serve as a diagnostic of latent space quality. The core finding: latent space structure, rather than perceptual fidelity or a lack of domain fine-tuning, is the primary barrier to closing the performance gap between real and synthetic medical data.
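
For readers unfamiliar with FiLM (feature-wise linear modulation), the sketch below illustrates the general idea of conditioning a classifier's features on a noise level: a conditioning vector predicts a per-feature scale (gamma) and shift (beta), which are then applied to the features. This is a minimal illustration in plain Python, not the authors' implementation. The sinusoidal noise embedding, the layer sizes, and all function names (noise_embedding, linear, film, init_weights) are assumptions made for the example.

```python
import math
import random


def noise_embedding(sigma, dim):
    """Sinusoidal embedding of a scalar noise level (an assumed choice).

    dim is expected to be even and at least 2.
    """
    half = dim // 2
    freqs = [math.exp(-math.log(10000.0) * i / half) for i in range(half)]
    return [math.sin(sigma * f) for f in freqs] + [math.cos(sigma * f) for f in freqs]


def linear(x, weight, bias):
    """Dense layer; weight is a list of rows, one row per output unit."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weight, bias)]


def film(features, cond, w_gamma, b_gamma, w_beta, b_beta):
    """FiLM: scale and shift each feature with values predicted from cond."""
    gamma = linear(cond, w_gamma, b_gamma)  # per-feature scale
    beta = linear(cond, w_beta, b_beta)     # per-feature shift
    return [g * h + b for g, h, b in zip(gamma, features, beta)]


def init_weights(rows, cols, rng, scale=0.1):
    """Small random weights; hypothetical initialisation for the demo."""
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


# Demo: modulate a 4-dimensional feature vector at noise level 0.3.
rng = random.Random(0)
n_features, emb_dim = 4, 8
features = [0.5, -1.0, 2.0, 0.0]
cond = noise_embedding(0.3, emb_dim)
modulated = film(
    features,
    cond,
    init_weights(n_features, emb_dim, rng),
    [1.0] * n_features,  # gamma bias of 1 keeps the layer near identity
    init_weights(n_features, emb_dim, rng),
    [0.0] * n_features,  # beta bias of 0 adds no shift at initialisation
)
print(modulated)
```

Setting the gamma bias to 1 and the beta bias to 0 means that with zero weights the layer passes features through unchanged, so the noise conditioning only alters the features as the weights move away from zero. In a real model the same modulation would typically be applied per channel inside a neural network framework rather than on Python lists.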