cs.CV

Predicting if synthetic training data will work without training models

Patryk Bartkowiak, Bartosz Kotrys, Dominik Michels, Soren Pirk, Wojtek Palubicki

May 21, 2026

Training computer vision models on synthetic data is cheap, but it is hard to know in advance whether a synthetic dataset will transfer to real applications. SADGE combines appearance and geometric similarity scores (using DINOv3 and MASt3R) to predict downstream performance on object detection, segmentation, and pose estimation. Across 15 synthetic-to-real benchmarks, its estimates correlate with real transfer performance at r = 0.88, outperforming appearance-only and geometry-only metrics. This lets practitioners screen candidate datasets before committing to expensive training.
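The general idea of blending an appearance score with a geometry score and checking that the blend tracks real performance can be sketched as follows. This is an illustration under stated assumptions, not the SADGE method: it assumes per-image feature vectors have already been extracted (for example by a backbone such as DINOv3), uses cosine similarity of mean embeddings as a stand-in appearance score, takes the geometry score as a given input, and combines the two with a hypothetical linear weight `alpha`. The helper names (`mean_embedding`, `combined_score`, `pearson_r`) and all numbers in the demo are invented for illustration. The summary reports only the correlation value, so the correlation coefficient used here (Pearson) is also an assumption.

```python
import math
from statistics import fmean


def mean_embedding(feats: list[list[float]]) -> list[float]:
    """Average a set of per-image feature vectors into one vector (hypothetical helper)."""
    return [fmean(col) for col in zip(*feats)]


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors; used here as a stand-in appearance score."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def combined_score(appearance: float, geometry: float, alpha: float = 0.5) -> float:
    """Hypothetical linear blend of appearance and geometry similarity (not SADGE's rule)."""
    return alpha * appearance + (1.0 - alpha) * geometry


def pearson_r(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation between predicted similarity and measured downstream performance."""
    mx, my = fmean(xs), fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


if __name__ == "__main__":
    # Toy features for one synthetic/real pair (made-up numbers, not from the paper).
    synth_feats = [[1.0, 0.0], [0.8, 0.2]]
    real_feats = [[0.9, 0.1], [1.0, 0.0]]
    app = cosine_similarity(mean_embedding(synth_feats), mean_embedding(real_feats))
    print(f"appearance similarity: {app:.3f}")

    # Fictional scores for four synthetic-to-real pairs (not from the paper).
    appearance = [0.62, 0.71, 0.80, 0.55]
    geometry = [0.58, 0.75, 0.70, 0.50]
    downstream = [0.41, 0.57, 0.60, 0.33]  # e.g. real-data accuracy after training
    similarity = [combined_score(a, g) for a, g in zip(appearance, geometry)]
    print(f"r = {pearson_r(similarity, downstream):.2f}")
```

In practice the blend weight would be chosen on held-out benchmarks; a high correlation on those is what would justify using the score to rank datasets without training.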
Published as SADGE: Structure and Appearance Domain Gap Estimation of Synthetic and Real Data arXiv:2605.22467