How relationships between objects improve detecting unseen categories
Yi Chen, Yinghao Lu, Zhehao Li, Chenchen Yan, Jiafei Wu, Chong Wang, Jiangbo Qian
June 4, 2026
Open-vocabulary object detection must identify objects from categories never seen during training. Most approaches distill knowledge from vision-language models but ignore how objects relate to each other (their spatial arrangements and interactions). This work adds scene graphs to capture these structured relationships, using a Relation Attention Module to amplify relevant relational cues and a caption-based alignment branch to connect visual relationships with semantic knowledge. On COCO and LVIS, the method achieves higher accuracy on novel categories than comparable approaches.