Measuring what truly drives an image classifier's decision

Existing zero-shot textual explanation methods for image classifiers often miss the features that actually drive predictions. FaithTrace addresses this by computing an influence score, defined as the directional derivative of the class logit along text-induced directions in feature space, which measures faithfulness directly. The same influence score also serves as the basis for quantitative evaluation benchmarks for textual explanations. Experiments show that FaithTrace produces more faithful explanations than baselines, giving practitioners a principled way to understand classifier decisions. Code will be released.
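
To make the notion of a directional derivative of the class logit concrete, the following is a minimal sketch and not FaithTrace's released implementation. It assumes a hypothetical linear classification head (`weights`, `bias`) over feature vectors, and a hypothetical `direction` vector standing in for a text-induced direction already expressed in the same feature space. The unit normalization of the direction and the central finite-difference estimate are also illustrative assumptions.

```python
import numpy as np


def class_logit(features, weights, bias, class_idx):
    """Logit of one class under a linear head (assumed here for illustration only)."""
    return float(weights[class_idx] @ features + bias[class_idx])


def influence_score(features, direction, weights, bias, class_idx, eps=1e-4):
    """Estimate the directional derivative of the class logit along `direction`.

    The direction is normalized to unit length (an assumption), and the
    derivative is approximated with a central finite difference.
    """
    norm = np.linalg.norm(direction)
    if norm == 0:
        raise ValueError("direction must be non-zero")
    unit = direction / norm
    plus = class_logit(features + eps * unit, weights, bias, class_idx)
    minus = class_logit(features - eps * unit, weights, bias, class_idx)
    return (plus - minus) / (2.0 * eps)


# Hypothetical toy setup: 3 classes, 5-dimensional feature space.
rng = np.random.default_rng(0)
weights = rng.normal(size=(3, 5))
bias = rng.normal(size=3)
features = rng.normal(size=5)        # stands in for an image's feature vector
text_direction = rng.normal(size=5)  # stands in for a text-induced direction

score = influence_score(features, text_direction, weights, bias, class_idx=1)
```

Under these assumptions, a large positive score means that moving the features along the text-induced direction raises the class logit, while a score near zero suggests the described concept has little local effect on the prediction. For a nonlinear classifier the same quantity could be obtained with automatic differentiation instead of finite differences.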