cs.CV

Why self-supervised features make one-step image generation 39× better

Hugues Van Assel, Edward De Brouwer, Saeed Saremi, Gabriele Scalia, Aviv Regev

May 30, 2026

The paper trains one-step image generators by matching generated samples to real data in the space of frozen self-supervised learning (SSL) features, using the Sinkhorn divergence as the matching loss. The key insight is that SSL features suppress reconstruction noise, producing a compact geometry in which distribution matching becomes tractable; the result is a reported 39× FID improvement on ImageNet. Surprisingly, the features that work best for training are not the ones that work best for evaluation metrics, which exposes how those metrics can be gamed. The code has been released.
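To make the matching step concrete, here is a minimal sketch of a debiased Sinkhorn divergence between two batches of feature vectors. It is an illustration under stated assumptions, not the paper's implementation: this summary does not give the exact loss, cost function, or regularization strength, so the squared Euclidean cost, uniform sample weights, `eps`, and `n_iters` are all illustrative choices, and the function names are hypothetical. In practice the input arrays would come from the frozen SSL features; plain NumPy arrays stand in for them here.

```python
import numpy as np


def _logsumexp(a, axis):
    # Numerically stable log(sum(exp(a))) along one axis.
    m = np.max(a, axis=axis, keepdims=True)
    return np.squeeze(m, axis=axis) + np.log(np.sum(np.exp(a - m), axis=axis))


def entropic_ot(x, y, eps=0.1, n_iters=200):
    """Entropic OT value between uniform empirical measures on x and y.

    Uses log-domain Sinkhorn iterations with a squared Euclidean cost
    (an assumed cost; the summary does not specify one).
    """
    n, m = x.shape[0], y.shape[0]
    cost = ((x[:, None, :] - y[None, :, :]) ** 2).sum(axis=-1)  # shape (n, m)
    log_a = np.full(n, -np.log(n))  # uniform weights on x
    log_b = np.full(m, -np.log(m))  # uniform weights on y
    f = np.zeros(n)  # dual potential on x
    g = np.zeros(m)  # dual potential on y
    for _ in range(n_iters):
        f = -eps * _logsumexp((g[None, :] - cost) / eps + log_b[None, :], axis=1)
        g = -eps * _logsumexp((f[:, None] - cost) / eps + log_a[:, None], axis=0)
    # Dual objective <a, f> + <b, g> with uniform weights.
    return f.mean() + g.mean()


def sinkhorn_divergence(x, y, eps=0.1, n_iters=200):
    """Debiased divergence: OT(x, y) - 0.5 * OT(x, x) - 0.5 * OT(y, y)."""
    return (
        entropic_ot(x, y, eps, n_iters)
        - 0.5 * entropic_ot(x, x, eps, n_iters)
        - 0.5 * entropic_ot(y, y, eps, n_iters)
    )


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    real_feats = rng.normal(size=(64, 16))        # stand-in for features of real images
    fake_feats = rng.normal(size=(64, 16)) + 1.0  # stand-in for features of generated images
    print(sinkhorn_divergence(real_feats, fake_feats))
```

Subtracting the two self-transport terms removes the bias of entropic OT, so the divergence is zero when both batches are identical. A training loop would differentiate this quantity with respect to the generator's parameters, which requires an autodiff framework rather than plain NumPy.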
Published as "Generate in Reconstruction Space, Match in Semantic Space: Transport Geometry for One-Step Generation" (arXiv:2606.00514)