Explaining black-box image classifiers through causal concept discovery

Chiara Maria Russo, Simone Carnemolla, Simone Palazzo, Daniela Giordano, Concetto Spampinato, Matteo Pennisi

Understanding why an image classifier makes a specific prediction is difficult when its internals are inaccessible. OCCAM addresses this black-box setting: it discovers visual concepts automatically, localizes them in images via text-guided segmentation, and uses object removal (an intervention) to measure each concept's causal contribution to the final prediction. The method then aggregates these local explanations across datasets into a structured ontology that shows how concepts relate to and influence one another, exposing systematic biases and dependencies. Evaluated on Broden and ImageNet-S with multiple classifiers, OCCAM outperforms existing attribution methods and provides both per-image explanations and global insight into classifier reasoning.
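
The intervention step can be illustrated with a minimal sketch. It is not the OCCAM implementation, and it makes several assumptions: the concept masks are taken as given (in the method they come from text-guided segmentation), a constant fill stands in for real object removal, and a concept's effect is taken to be the drop in the target-class score after removal. All names (`remove_concept`, `concept_effects`, `aggregate_effects`, `toy_classifier`) are hypothetical.

```python
import numpy as np


def remove_concept(image, mask, fill=0.0):
    """Return a copy of `image` with the masked pixels overwritten by `fill`.

    Assumption: a constant fill is a crude stand-in for real object removal.
    """
    out = image.copy()
    out[mask] = fill
    return out


def concept_effects(classifier, image, masks, target):
    """Drop in the target-class score when each concept is removed.

    `classifier` is queried only through its outputs (black-box access).
    """
    base = classifier(image)[target]
    return {
        name: float(base - classifier(remove_concept(image, mask))[target])
        for name, mask in masks.items()
    }


def aggregate_effects(per_image_effects):
    """Average each concept's effect over the images in which it appears."""
    collected = {}
    for effects in per_image_effects:
        for name, value in effects.items():
            collected.setdefault(name, []).append(value)
    return {name: float(np.mean(values)) for name, values in collected.items()}


def toy_classifier(image):
    """Hypothetical black box: the class-1 score is the mean red intensity."""
    p = float(image[..., 0].mean())
    return np.array([1.0 - p, p])


# A 4x4 RGB image whose top-left 2x2 patch is pure red.
image = np.zeros((4, 4, 3))
image[:2, :2, 0] = 1.0

# Boolean masks for two hypothetical concepts.
red_patch = np.zeros((4, 4), dtype=bool)
red_patch[:2, :2] = True
dark_corner = np.zeros((4, 4), dtype=bool)
dark_corner[2:, 2:] = True

effects = concept_effects(
    toy_classifier,
    image,
    {"red_patch": red_patch, "dark_corner": dark_corner},
    target=1,
)
```

In this toy setting, removing `red_patch` lowers the class-1 score from 0.25 to 0, while removing `dark_corner` leaves it unchanged, so only the first concept receives a nonzero effect. Averaging such per-image effects with `aggregate_effects` gives a simple dataset-level summary; the ontology construction described above is not covered by this sketch.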