Why model interpretation methods are getting the baseline wrong

Existing model interpretation techniques—which highlight which parts of an image or input matter for a prediction—routinely ignore the baseline: what you compare the input against. This choice is fundamental but rarely justified. The authors unify gradient-based methods, Integrated Gradients, and Taylor expansion under a common framework, exposing flaws in popular methods like LayerCAM and original Integrated Gradients. They propose a revised approach with an explicit, principled baseline and an attribution error metric that measures fidelity rather than relying on flawed evaluation methods. The work clarifies why layer-wise interpretations differ—different layers extract features at different abstraction levels—and should all be considered valid.