cs.CV

How relationships between objects improve detecting unseen categories

Yi Chen, Yinghao Lu, Zhehao Li, Chenchen Yan, Jiafei Wu, Chong Wang, Jiangbo Qian

June 4, 2026

Open-vocabulary object detection aims to identify objects from categories never seen during training. Most approaches distill knowledge from vision-language models but ignore how objects relate to each other, that is, their spatial arrangements and interactions. This work adds scene graphs to capture these structured relationships, using a Relation Attention Module to amplify relevant cues and a caption-based alignment branch to connect visual relationships with semantic knowledge. On COCO and LVIS, the method achieves higher accuracy on novel categories than comparable approaches.
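To make the idea of relation-guided attention concrete, here is a minimal sketch in plain Python. It is not the paper's Relation Attention Module, whose design is not described in this summary. It only assumes that each detected object has a feature vector and that a scene graph supplies subject-object index pairs. The function name `relation_attention` and its inputs are hypothetical.

```python
import math


def relation_attention(features, edges):
    """Refine each object feature by attending over its scene-graph neighbours.

    features: list of equal-length float vectors, one per detected object.
    edges: list of (subject_index, object_index) pairs from a scene graph.
    Illustrative assumption only; not the paper's actual module.
    """
    n = len(features)
    dim = len(features[0])

    # Each object may attend to itself and to any object it shares an edge with.
    neighbours = {i: {i} for i in range(n)}
    for s, o in edges:
        neighbours[s].add(o)
        neighbours[o].add(s)

    refined = []
    for i in range(n):
        idx = sorted(neighbours[i])
        # Scaled dot-product scores, computed only against allowed neighbours.
        scores = [
            sum(a * b for a, b in zip(features[i], features[j])) / math.sqrt(dim)
            for j in idx
        ]
        # Numerically stable softmax over the neighbour scores.
        top = max(scores)
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        # Weighted sum of neighbour features gives the relational context.
        context = [
            sum(w * features[j][k] for w, j in zip(weights, idx))
            for k in range(dim)
        ]
        # Residual connection keeps the original appearance feature.
        refined.append([f + c for f, c in zip(features[i], context)])
    return refined
```

An object with no edges attends only to itself, so its feature is simply doubled by the residual sum. Connected objects mix in their neighbours' features in proportion to their similarity. A real detector would use learned projections and typed relations in place of raw dot products.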
Published as "Unveiling the Unknown: Open Vocabulary Object Detection with Scene Graphs", arXiv:2606.05916