Can commonsense reasoning fix scene graph models on rare relationships?
Maëlic Neau, Salim Baloch, Jakob Suchan, Zoe Falomir, Mehul Bhatt
June 4, 2026
Scene graph models learn to describe visual relationships between objects in images, but they fail on uncommon relationships because training labels for those are sparse. This work automatically mines spatial, functional, and qualitative constraints from data (for example, "cups are typically on tables"), then applies logical reasoning at test time to correct predictions. The approach works across datasets and model architectures without manual rules or retraining.
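To make the mine-then-correct idea concrete, here is a minimal toy sketch in Python. It is not the authors' method: it assumes the simplest possible constraint type (which subject and object classes each predicate has been observed with) and a hard filter at test time, whereas the paper describes richer spatial, functional, and qualitative constraints. The function names `mine_constraints` and `correct_prediction`, the triples, and the scores are all hypothetical.

```python
from collections import defaultdict


def mine_constraints(triples):
    """Mine per-predicate type constraints from (subject, predicate, object) triples.

    Hypothetical simplification: a predicate is admissible only for subject and
    object classes it was observed with in the training annotations.
    """
    domain = defaultdict(set)  # predicate -> subject classes seen with it
    range_ = defaultdict(set)  # predicate -> object classes seen with it
    for subj, pred, obj in triples:
        domain[pred].add(subj)
        range_[pred].add(obj)
    return dict(domain), dict(range_)


def correct_prediction(subj, obj, scores, domain, range_):
    """Pick the best-scoring predicate that satisfies the mined constraints.

    Falls back to the model's unconstrained top choice when no predicate is
    admissible, so the filter never leaves a pair without a prediction.
    """
    feasible = {
        pred: score
        for pred, score in scores.items()
        if subj in domain.get(pred, ()) and obj in range_.get(pred, ())
    }
    ranked = feasible or scores
    return max(ranked, key=ranked.get)


# Toy training annotations (illustrative only, not from any dataset).
train = [
    ("cup", "on", "table"),
    ("book", "on", "table"),
    ("person", "holding", "cup"),
    ("person", "riding", "horse"),
]
domain, range_ = mine_constraints(train)

# Hypothetical model scores for the pair (cup, table); the raw top choice
# "riding" violates the mined constraints because cups never ride anything.
scores = {"riding": 0.5, "on": 0.4, "holding": 0.1}
corrected = correct_prediction("cup", "table", scores, domain, range_)
print(corrected)
```

Because the correction happens after the model has produced its scores, a filter of this kind needs no retraining and does not depend on the model architecture, which matches the property the summary claims for the approach.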