Can we trust explanations of neural network decisions in Alzheimer’s Disease (AD) detection?

Key Findings

A plethora of explanation methods for convolutional neural networks is available, each with individual benefits and drawbacks.

Even though generated attribution maps may appear similar on visual inspection, they can differ significantly in their fidelity to the underlying classifier.

Visual inspection does not reveal major differences between attribution maps. All explanations highlight the hippocampal area, as well as the frontal and temporal lobes.

[Figure: attribution maps]

To evaluate the fidelity of the explanation methods, we employed the perturbation-based *deletion* metric.

[Figure: deletion metric]

Interestingly, this revealed differences between the explanation methods in their ability to decrease the predicted probability for AD.

[Figure: deletion metric with mean image as reference]
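As a rough illustration of how a deletion-style metric can be computed, the following sketch replaces the most relevant voxels with reference values step by step and records the predicted probability after each step. It is a minimal example under stated assumptions, not the evaluation code used in this work: `toy_predict` is a hypothetical logistic stand-in for the trained classifier, the random 4×4×4 volume stands in for an MRI scan, and the names `deletion_curve` and `deletion_auc` are chosen here for illustration only.

```python
import numpy as np

def deletion_curve(predict, image, attribution, reference, steps=10):
    """Replace voxels with reference values, most relevant first,
    and record the model output after every step."""
    order = np.argsort(attribution.ravel())[::-1]  # highest attribution first
    perturbed = image.astype(float).ravel().copy()
    ref_flat = np.broadcast_to(reference, image.shape).ravel()
    probs = [predict(perturbed.reshape(image.shape))]
    for chunk in np.array_split(order, steps):
        perturbed[chunk] = ref_flat[chunk]
        probs.append(predict(perturbed.reshape(image.shape)))
    return np.array(probs)

def deletion_auc(probs):
    """Trapezoidal area under the deletion curve on a [0, 1] x-axis.
    A smaller area means the probability dropped faster."""
    x = np.linspace(0.0, 1.0, len(probs))
    return float(np.sum((probs[1:] + probs[:-1]) / 2.0 * np.diff(x)))

# Hypothetical toy setup: a logistic model on a tiny random volume.
rng = np.random.default_rng(0)
weights = rng.normal(size=(4, 4, 4))

def toy_predict(x):
    return 1.0 / (1.0 + np.exp(-np.sum(weights * x)))

image = rng.normal(size=(4, 4, 4))
reference = np.zeros_like(image)  # null image; a mean image could be used instead

informed = weights * (image - reference)   # exact contributions for this toy model
random_map = rng.normal(size=image.shape)  # uninformative attribution map

auc_informed = deletion_auc(deletion_curve(toy_predict, image, informed, reference))
auc_random = deletion_auc(deletion_curve(toy_predict, image, random_map, reference))
print(f"AUC informed: {auc_informed:.3f}, AUC random: {auc_random:.3f}")
```

In this toy setting the informed map cannot yield a larger area than the random one, because removing the largest contributions first lowers the model output at least as fast as any other ordering. For a real network no such guarantee holds, which is why the metric is useful for comparing explanation methods.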

Interpretation of attribution maps should always include the employed reference image, since its selection allows generating an almost arbitrary explanation.

Explanation methods usually employ a *reference image*, either implicitly or explicitly: the computed attribution map tries to explain the difference in model output by the *relative* difference between the model input and the reference image. Since the reference image is often chosen to be a null image, i.e., an MRI image containing zero signal, this can lead to seemingly counterintuitive explanations.

[Figure: feature attribution]

For example, we observed explanation methods assigning the hippocampal area a negative contribution when classifying an AD sample as AD, even though hippocampal atrophy is a gold standard biomarker in AD detection. We hypothesize that the neural network learned to utilize atrophy patterns, i.e., areas with reduced signal strength, to classify AD. However, since a null image was used as the reference to generate explanations, even the atrophied AD hippocampus had more overall signal than the null image. Thus, the atrophied AD hippocampus was *decreasing* the probability for Alzheimer's Disease *relative to the reference image*. Therefore, the explanation method ended up assigning a negative contribution.
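The sign flip can be reproduced with a deliberately simple toy model. The sketch below assumes a linear score in which lower hippocampal signal raises the AD score (a negative weight), and attributes each region as weight × (input − reference), which is what reference-based methods such as Integrated Gradients reduce to for a linear model. All numbers, the two-region layout, and the name `linear_attribution` are hypothetical and chosen only for illustration.

```python
import numpy as np

def linear_attribution(weights, image, reference):
    """Attribution of each region for a linear score w . x,
    measured relative to a reference image."""
    return weights * (image - reference)

# Hypothetical two-region "image": [hippocampus, other region].
weights = np.array([-2.0, 0.5])      # less hippocampal signal -> higher AD score
ad_subject = np.array([0.4, 1.0])    # atrophied hippocampus: reduced signal
null_reference = np.zeros(2)         # null image: zero signal everywhere
control_mean = np.array([0.8, 1.0])  # mean image of healthy controls

attr_null = linear_attribution(weights, ad_subject, null_reference)
attr_control = linear_attribution(weights, ad_subject, control_mean)

# Relative to the null image, the hippocampus carries *more* signal,
# so its contribution towards AD is negative.
print("null reference:   ", attr_null)
# Relative to healthy controls, it carries *less* signal,
# so its contribution towards AD is positive.
print("control reference:", attr_control)

# Attributions sum to the score difference between input and reference.
score = lambda x: float(np.dot(weights, x))
assert np.isclose(attr_null.sum(), score(ad_subject) - score(null_reference))
assert np.isclose(attr_control.sum(), score(ad_subject) - score(control_mean))
```

The model and the input are identical in both cases; only the reference changes, and with it the sign of the hippocampal attribution.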

In conclusion, we found the mean image of the control class to generate more interpretable explanations, since it asks the question:

“Why was this subject predicted to have AD - compared to a healthy control subject?”

Conference Talk

The results have been presented at the German Conference on Medical Image Computing (BVM) 2025.

Acknowledgements

Funding was provided by the BMBF (01IS22077) and DFG (DY151/2-1). Data was provided by the Alzheimer’s Disease Neuroimaging Initiative (ADNI).