Despite the tremendous achievements of deep convolutional neural networks (CNNs) in most computer vision tasks, understanding how they actually work remains a significant challenge. In this paper, we propose a novel two-step visualization method that aims to shed light on how deep CNNs recognize images and the objects therein. We start with a layer-wise relevance propagation (LRP) step, which estimates a pixel-wise relevance map over the input image. We then construct a context-aware saliency map from the LRP-generated map, which predicts regions close to the foci of attention. We show that our algorithm clearly and concisely identifies the key pixels that contribute to the underlying neural network's comprehension of images. Experimental results on the ILSVRC2012 validation dataset with two well-established deep CNNs demonstrate that combining LRP with visual salience estimation gives great insight into how a CNN model perceives and understands a presented scene, in relation to what it has learned during the prior training phase.
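To make the two-step idea concrete, the sketch below illustrates the first step, an epsilon-rule LRP backward pass, on a tiny bias-free two-layer ReLU network, followed by a simple normalization-and-threshold stand-in for the saliency step. The network, its weights, and the thresholding are hypothetical toy assumptions for illustration, not the paper's actual architecture or context-aware saliency model.

```python
import numpy as np

def lrp_epsilon(weights, activations, relevance, eps=1e-6):
    # One epsilon-LRP step through a bias-free linear layer:
    # R_i = a_i * sum_j( w_ij * R_j / (z_j + eps * sign(z_j)) )
    z = activations @ weights                     # pre-activations z_j
    stabilizer = eps * np.where(z >= 0, 1.0, -1.0)
    s = relevance / (z + stabilizer)
    return activations * (weights @ s)

# Toy two-layer ReLU network with random fixed weights (hypothetical).
rng = np.random.default_rng(0)
x = rng.random(4)                                 # four "input pixels"
W1 = rng.standard_normal((4, 3))
W2 = rng.standard_normal((3, 2))
a1 = np.maximum(0.0, x @ W1)                      # hidden ReLU activations
out = a1 @ W2                                     # class scores

# Step 1: propagate the winning class score back to the input pixels.
R2 = np.zeros_like(out)
R2[out.argmax()] = out[out.argmax()]
R1 = lrp_epsilon(W2, a1, R2)                      # hidden-layer relevance
R0 = lrp_epsilon(W1, x, R1)                       # pixel-wise relevance map

# Step 2 (toy stand-in for the context-aware saliency map):
# normalize relevance magnitudes and keep the most salient pixels.
sal = np.abs(R0) / (np.abs(R0).max() + 1e-12)
mask = sal > 0.5                                  # "focus of attention" pixels
```

Note that the epsilon rule approximately conserves relevance across layers (the sum of `R0` stays close to the propagated class score), which is the property that lets the resulting map be read as a pixel-wise attribution of the network's decision.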