In this thesis we develop a framework for interpreting the decisions of highly nonlinear classifier functions, with a focus on neural networks. Specifically, we formalise the idea of separating the input parameters into relevant and irrelevant ones as an explicit optimisation problem.

First, we describe what is generally understood as a relevance map for classifiers and give an overview of the existing methods to produce such maps. We explain how we used relevance maps to detect artefacts in the PASCAL VOC dataset and to track the focus of neural agents playing the Atari video games Breakout and Pinball at a human-like level.

Towards a formal definition of relevance maps, we generalise the concept of prime implicants from abductive logic to a probabilistic setting by introducing δ-relevant sets. For a d-ary Boolean function Φ: {0,1}^d → {0,1} and an assignment to its variables x = (x1, x2, . . . , xd), we consider the problem of finding those subsets of the variables that are sufficient to determine the function output with a given probability δ. We show that the problem of finding small δ-relevant sets is NP-hard to approximate within a factor of d^(1−α) for any α > 0. The associated decision problem turns out to be NP^PP-complete.

We further generalise δ-relevant sets from the binary to the continuous domain. This leads naturally to a rate-distortion trade-off between the size of the δ-relevant set (rate) and the change in the classifier prediction (distortion). Relevance maps can then be interpreted as greedy approximations of the rate-distortion function. Evaluating this function even approximately turns out to be NP-hard, so we develop a heuristic solution strategy based on a convex relaxation of the combinatorial problem and assumed density filtering (ADF) for deep ReLU neural networks.
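The notion of a δ-relevant set can be written out formally. The following is a sketch consistent with the description above; the precise formulation in the thesis may differ (in particular in the choice of reference distribution, taken here to be uniform):

```latex
% Sketch of the δ-relevant set condition (notation may differ from the thesis).
% For Φ: {0,1}^d → {0,1} and an assignment x, a set S ⊆ {1,…,d} is
% δ-relevant for x if fixing the variables in S to their values in x
% determines the output with probability at least δ:
\[
  \Pr_{y}\bigl[\,\Phi(y) = \Phi(x) \,\big|\, y_S = x_S\,\bigr] \;\ge\; \delta,
\]
% where y is drawn uniformly from {0,1}^d. The hardness result then
% concerns approximating the minimal size of such a set S: this is
% NP-hard within a factor of d^{1-\alpha} for any \alpha > 0.
```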
This results in our own explanation method, which we call Rate-Distortion Explanations (RDE).

To show that the approximations in ADF are necessary, we give a complete characterisation of the families of probability distributions that are invariant under the action of ReLU neural network layers. We demonstrate that the only invariant families are either degenerate or amount to sampling.

Subsequently, we propose and discuss several benchmark tests and numerical evaluation methods for relevance maps. We compare RDE to a representative collection of established relevance methods and demonstrate that it outperforms its competitors on a wide range of tasks.

Finally, we discuss how knowledge of the true data distribution is crucial for any existing explanation method. We critically examine our own method with regard to potential artefacts and introduce a stronger, information-theoretic requirement based on the conditional entropy. We develop a novel approach, called Arthur-Merlin regularisation, along with a new framework. The framework is then extended to realistic algorithms and datasets, and we discuss under which assumptions the guarantees still hold.
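The rate-distortion view of relevance maps can be illustrated with a toy greedy procedure: grow a set of "kept" input components, always adding the one that most reduces the expected change in the classifier output when the remaining components are resampled. This is a hypothetical sketch only, not the thesis's RDE algorithm; the classifier f, the uniform reference distribution, and the greedy loop are all illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(0)

def f(x):
    # Toy "classifier": only the first two coordinates matter.
    return float(x[0] + x[1] > 1.0)

def distortion(f, x, keep, n_samples=500):
    """Estimated probability that f changes when the components
    outside `keep` are resampled uniformly (the 'distortion')."""
    base = f(x)
    changed = 0
    for _ in range(n_samples):
        z = rng.uniform(0.0, 1.0, size=x.shape)
        z[list(keep)] = x[list(keep)]  # fix the kept components to x
        changed += f(z) != base
    return changed / n_samples

def greedy_relevance(f, x, rate):
    """Greedily grow the kept set up to size `rate` (the 'rate'),
    each step adding the component that lowers distortion the most."""
    keep = set()
    while len(keep) < rate:
        best = min(set(range(len(x))) - keep,
                   key=lambda i: distortion(f, x, keep | {i}))
        keep.add(best)
    return keep

x = np.array([0.9, 0.9, 0.1, 0.5])
print(greedy_relevance(f, x, rate=2))  # → {0, 1}
```

With this toy f, keeping either of the first two coordinates already pins the output down with high probability, and keeping both makes the distortion zero, so the greedy procedure recovers exactly the decision-relevant components.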