Can flowing queries through latent space beat static detectors?
Yao Wei, Andrea Cavallaro, Changjae Oh
May 30, 2026
Open-vocabulary object detectors typically rely on static decoder queries, which limits the range of objects they can find. FlowOVD instead treats query generation as a continuous transport process: using rectified flow, it progressively warps generic queries into text-guided ones. The approach gains 2.5% over GroundingDINO on COCO and 15% on the harder, long-tailed LVIS dataset, suggesting that continuous latent dynamics improve generalization without requiring additional training.
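To make the idea of "flowing" a query concrete, here is a minimal sketch of rectified-flow-style transport using Euler integration. It is not FlowOVD's implementation. The names `oracle_velocity` and `transport_query`, the three-dimensional toy vectors, and the closed-form velocity are all assumptions made for illustration. In a real detector, the velocity would presumably come from a learned, text-conditioned network rather than from a known target.

```python
from typing import Callable, List

Vector = List[float]


def oracle_velocity(query: Vector, t: float, target: Vector) -> Vector:
    """Hypothetical stand-in for a learned, text-conditioned velocity field.

    It points from the current query toward the target, scaled so that a
    straight-line path arrives exactly at the target when t reaches 1.
    """
    return [(g - q) / (1.0 - t) for q, g in zip(query, target)]


def transport_query(
    query: Vector,
    target: Vector,
    num_steps: int = 4,
    velocity: Callable[[Vector, float, Vector], Vector] = oracle_velocity,
) -> List[Vector]:
    """Integrate dx/dt = v(x, t) from t = 0 to t = 1 with Euler steps.

    Returns every intermediate query, starting with the input query.
    """
    dt = 1.0 / num_steps
    current = list(query)
    path = [list(current)]
    for step in range(num_steps):
        t = step * dt  # t stays below 1, so the oracle never divides by zero
        v = velocity(current, t, target)
        current = [q + dt * vi for q, vi in zip(current, v)]
        path.append(current)
    return path


# Toy data (assumed): one generic query and the text-guided query it should become.
generic_query = [0.0, 1.0, -2.0]
text_guided_target = [4.0, -1.0, 2.0]

path = transport_query(generic_query, text_guided_target, num_steps=4)
for i, q in enumerate(path):
    print(f"t = {i / 4:.2f}: {q}")
```

Because the toy velocity field is exact, the query moves along a straight line and lands on the target after the final step. A learned field would only approximate this path, and the number of integration steps would become a trade-off between accuracy and cost.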