← Back to Computer Vision
cs.CV

Can commonsense reasoning fix scene graph models on rare relationships?

Maëlic Neau, Salim Baloch, Jakob Suchan, Zoe Falomir, Mehul Bhatt

June 4, 2026

Scene graph models learn to describe visual relationships between objects in images, but fail on uncommon ones due to sparse training labels. This work mines spatial, functional, and qualitative constraints automatically from data—like "cups are typically on tables"—then applies logical reasoning at test time to correct predictions. The approach works across datasets and model architectures without manual rules or retraining.
Published as Visual Commonsense Driven Knowledge Refinements for Scene Graph Generation arXiv:2606.06369
Read the original paper →