Machine learning – the science of building systems from data – is revolutionising computer science and artificial intelligence (AI). Much of its success is due to deep neural networks, which have demonstrated outstanding performance in perception tasks such as image classification. Solutions based on deep learning are now being deployed in a multitude of real-world systems, from virtual personal assistants, through automated decision making in business, to self-driving cars.
Deep neural networks have good generalisation properties, but are unstable with respect to so called adversarial examples, where a small, intentional input perturbation causes a false prediction, often with high confidence. Adversarial examples are still poorly understood but attracting attention because of the potential risks to safety and security in applications. The image shows a traffic light correctly classified by a convolutional neural network, which is then misclassified after changing only a few pixels. This illustrative example, though somewhat artificial, since in practice the controller would rely on additional sensor input when making a decision, highlights the need for appropriate mechanisms and frameworks to prevent the occurrence of similar robustness issues during deployment.
We have developed a game-based method for formal verification of deep neural networks. In contrast to heuristic search for adversarial examples, verification approaches aim to provide formal guarantees on the robustness of neural network models. This is illustrated above, where we contrast training, testing and verification (images from www.cleverhans.io).
We work with the maximum safe radius problem, which for a given input sample computes the minimum distance to an adversarial example, and demonstrate that, under the assumption of Lipschitz continuity, this problem can be approximated using finite optimisation by discretising the input space, and the approximation has provable guarantees, i.e., the error is bounded. We then show that the resulting optimisation problem can be reduced to the solution of a two-player turn-based game, where the first player selects features and the second perturbs the image within the feature. The method is anytime, in the sense of approximating the value of a game by monotonically improving its upper and lower bounds. We evaluate the method on images and also extend it to videos, where input perturbations are performed on optical flows.
Software:DeepGame
We study adversarial robustness of NLP models using maximal safe radius computation in the embedding space and formulate a framework for evaluating semantic robustness. We also build on abduction-based explanations for machine learning and develop a method for computing local explanations for neural network models in NLP. These explanations comprise a subset of the words of the input text that satisfies two key features: optimality with respect to a user-defined cost function, such as the length of explanation, and robustness, in that they ensure prediction invariance for any bounded perturbation in the embedding space of the left-out words. Our method can be configured with different perturbation sets in the embedded space (the image shows an example of perturbations in the k-NN neighbourhood and a standard norm ball) and used to detect bias in predictions.
We study adversarial robustness for Gaussian process models, defined as invariance of the model’s decision to bounded perturbations, in contrast to distributional robustness. We develop a comprehensive theory, anytime algorithms and implementation based on branch-and-bound optimisation for computing provable guarantees of adversarial robustness of Gaussian process models, for both multi-class classification and regression. This involves computing lower and upper bounds on its prediction range. The image illustrates the working of the method, where a region R is refined into R1 and R2 to improve the bounds.
We have developed novel techniques for formally verifying the safe execution of deep reinforcement learning systems. These systems are stochastic due to the possibility of faults, noise or randomised policies, so we synthesise probabilistic guarantees which bound the likelihood of an unsafe state being reached within a specified horizon. To tackle the challenges of continuous state spaces and neural network policy representations, we develop automated methods that construct abstractions as finite Markov processes and then verify them using an extension of PRISM with support for interval-based probabilistic models.
The image shows heatmaps of failure probability upper bounds for subregions of initial states for the pendulum benchmark (x/y-axis: pole angle/angular velocity). Left: the initial abstraction; Right: the abstraction after 50 refinement steps.
SafeDRL
We study certification of fairness for neural networks. Rather than focusing on group fairness, we consider individual fairness, where the decision for any pair of similar individual should be bounded by a maximum decision tolerance. Working with a range of metrics, including the Mahalanobis distance, we propose a method to over-approximate the resulting optimisation problem using piecewise-linear functions to lower and upper bound the neural network’s non-linearities globally over the input space. We encode this computation as the solution of a Mixed-Integer Linear Programming (MILP) problem and show how this formulation can be used to encourage models’ fairness at training time by modifying the loss function.
The image shows certified individual fairness (y axis) as a function of maximum similarity (x axis) on four benchmarks.
To know more about these models and analysis techniques, follow the links below.
Software:DeepGame
SafeDRL