Can commonsense reasoning fix scene graph models on rare relationships?

Scene graph models learn to describe visual relationships between objects in images, but fail on uncommon ones due to sparse training labels. This work mines spatial, functional, and qualitative constraints automatically from data—like "cups are typically on tables"—then applies logical reasoning at test time to correct predictions. The approach works across datasets and model architectures without manual rules or retraining.