cs.CV

Can flowing queries through latent space beat static detectors?

Yao Wei, Andrea Cavallaro, Changjae Oh

May 30, 2026

Open-vocabulary object detectors typically rely on static decoder queries, which limits what they can find. FlowOVD instead treats query generation as a continuous transport process using rectified flow, progressively warping generic queries into text-guided ones. The approach gains 2.5% over GroundingDINO on COCO and 15% on the harder, long-tailed LVIS dataset, showing that continuous latent dynamics unlock better generalization without requiring additional training.
Published as "FlowOVD: Learning Generative Latent Flows for Zero-shot Open-vocabulary Detection" (arXiv:2606.00782)
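The transport idea in the summary can be illustrated with a minimal rectified-flow sketch. It assumes the standard rectified-flow recipe (a straight-line path x_t = (1 - t)·x0 + t·x1, a velocity regression target x1 - x0, and Euler integration from t = 0 to t = 1). The summary does not describe FlowOVD's actual network, loss, conditioning, or step count, so every function here (`interpolate`, `target_velocity`, `oracle_velocity`, `transport_queries`) is hypothetical. In particular, `oracle_velocity` is a toy stand-in for a learned, text-conditioned velocity network; it steers every query straight at one text embedding.

```python
from typing import Callable, List

Vector = List[float]
# Hypothetical signature: velocity(x, t, text_embedding) -> dx/dt
VelocityFn = Callable[[Vector, float, Vector], Vector]


def interpolate(x0: Vector, x1: Vector, t: float) -> Vector:
    """Straight-line rectified-flow path: x_t = (1 - t) * x0 + t * x1."""
    return [(1.0 - t) * a + t * b for a, b in zip(x0, x1)]


def target_velocity(x0: Vector, x1: Vector) -> Vector:
    """Regression target for training a rectified-flow velocity field: x1 - x0."""
    return [b - a for a, b in zip(x0, x1)]


def oracle_velocity(x: Vector, t: float, text_emb: Vector) -> Vector:
    """Toy stand-in for a learned network (assumption, not FlowOVD's model).

    On a straight path that ends at text_emb, the velocity at time t
    is (text_emb - x) / (1 - t).
    """
    return [(c - xi) / (1.0 - t) for xi, c in zip(x, text_emb)]


def transport_queries(queries: List[Vector], text_emb: Vector,
                      velocity: VelocityFn, num_steps: int = 4) -> List[Vector]:
    """Euler-integrate dx/dt = velocity(x, t, text_emb) from t = 0 to t = 1."""
    dt = 1.0 / num_steps
    transported = []
    for q in queries:
        x = list(q)  # copy so the generic queries are left unchanged
        for k in range(num_steps):
            t = k * dt  # t stays below 1, so the oracle never divides by zero
            v = velocity(x, t, text_emb)
            x = [xi + dt * vi for xi, vi in zip(x, v)]
        transported.append(x)
    return transported


if __name__ == "__main__":
    generic = [[0.0, 0.0], [1.0, -1.0]]  # hypothetical generic decoder queries
    text = [2.0, 4.0]                    # hypothetical text embedding
    print(transport_queries(generic, text, oracle_velocity, num_steps=4))
```

Because rectified flow favors straight paths, a few Euler steps suffice here. With the oracle field, both queries end at the text embedding. A trained detector would instead learn the velocity from data and would not collapse all queries onto a single point.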