Predicting if synthetic training data will work without training models
Patryk Bartkowiak, Bartosz Kotrys, Dominik Michels, Soren Pirk, Wojtek Palubicki
May 21, 2026
Training computer vision models on synthetic data is cheap, but it is hard to know in advance whether a given synthetic dataset will transfer to real applications. SADGE combines appearance and geometric similarity scores, computed using DINOv3 and MASt3R, to predict downstream performance on object detection, segmentation, and pose estimation. Across 15 synthetic-to-real benchmarks, its score correlates with actual transfer performance at r=0.88, outperforming both appearance-only and geometry-only metrics. Practitioners can therefore screen candidate datasets before committing to expensive training runs.
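To make the evaluation idea concrete, here is a minimal sketch of how a combined similarity score could be checked against real transfer performance. It is not the paper's method: the weighted-average combination, the function names `combined_score` and `pearson_r`, and all numeric values are hypothetical, and it assumes the reported r is a Pearson correlation, which the summary does not specify.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


def combined_score(appearance, geometry, weight=0.5):
    """Hypothetical combination: a weighted average of the two similarities.

    The actual SADGE formula is not given in this summary.
    """
    return weight * appearance + (1.0 - weight) * geometry


# Made-up per-dataset values, for illustration only (not results from the paper).
appearance_sim = [0.62, 0.71, 0.55, 0.80, 0.47]    # appearance similarity to real data
geometry_sim = [0.58, 0.66, 0.70, 0.77, 0.40]      # geometric similarity to real data
real_performance = [0.41, 0.52, 0.45, 0.63, 0.28]  # e.g. downstream accuracy after training

combined = [combined_score(a, g) for a, g in zip(appearance_sim, geometry_sim)]

for name, scores in [
    ("appearance only", appearance_sim),
    ("geometry only", geometry_sim),
    ("combined", combined),
]:
    print(f"{name}: r = {pearson_r(scores, real_performance):.2f}")
```

In a screening workflow along these lines, the score would be computed for each candidate synthetic dataset, and only the highest-scoring candidates would go on to full training.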