Abstract.
Deep learning, and in particular neural networks (NNs), has seen a surge in popularity over the past decade. Neural networks are increasingly used in decision-making systems, often safety-critical ones, such as self-driving, medical diagnosis and natural language processing. Thus, there is an urgent need for methodologies to aid the safe development of AI-based systems. In this thesis, we investigate the role that explainability and uncertainty can play in providing safety assurance for AI applications based on neural networks.

Our first contribution, studied primarily for decisions based on neural network models, is a method to derive local explanations with provable robustness and optimality guarantees, called Optimal Robust Explanations (OREs). OREs imply the model prediction and thus provide a sufficient reason for the model's decision. We develop an algorithm to extract OREs that uses a neural network verification tool, such as Marabou or Neurify, as a black-box solver. We demonstrate the usefulness of OREs in model development and safety assurance tasks such as model debugging, bias evaluation and repair of explanations provided by non-formal explainers such as Anchors.

Our second contribution focuses on an autonomous driving scenario enabled by an end-to-end Bayesian neural network (BNN) controller trained on data from the Carla simulator. BNNs can capture the uncertainty within the learning model, while retaining the main advantages intrinsic to neural networks. We propose two methods to evaluate the safety of BNN controller decisions in the presence of uncertainty, in offline and online settings. We develop techniques to approximate bounds on the safety of the entire system with respect to given criteria, with high probability and a priori statistical guarantees.

Our final contribution is a collection of methods that combine the uncertainty information available from Bayesian neural networks with local explanation methods. We show how to formulate Bayesian versions of existing feature scoring explanation methods, as well as a Bayesian version of our OREs, called Bayes-optimal robust explanations (B-OREs). We define a covering explanation, which condenses the information produced from a number of BNN posterior samples into a single explanation, together with the probability that this explanation is an explanation of a random posterior sample. In the case of Bayes-optimal robust covering explanations, we obtain the probability that the explanation implies the prediction. We combine Bayesian covering explanations with a notion of feature uncertainty to order the features that appear in the covering explanation by importance, and we show that feature uncertainty can be used to provide a global overview of the input features that the model most associates with each class.