Abstract.
We consider the problem of certifying the individual fairness (IF) of feed-forward neural networks (NNs). In particular, we work with the ε-δ-IF formulation, which, given an NN and a similarity metric learnt from data, requires that the output difference between any pair of ε-similar individuals is bounded by a maximum decision tolerance δ ≥ 0. Working with a range of metrics, including the Mahalanobis distance, we propose a method to over-approximate the resulting optimisation problem using piecewise-linear functions to lower and upper bound the NN's non-linearities globally over the input space. We encode this computation as the solution of a Mixed-Integer Linear Programming problem and demonstrate that it can be used to compute IF guarantees on four datasets widely used for fairness benchmarking. We show how this formulation can be used to encourage models' fairness at training time by modifying the NN loss, and empirically confirm that our approach yields NNs that are orders of magnitude fairer than state-of-the-art methods.
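In symbols (writing f for the NN, d for the learnt similarity metric, and X for the input space; this notation is introduced here for illustration and restates the definition above), ε-δ-IF requires that

    ∀ x, x′ ∈ X:  d(x, x′) ≤ ε  ⟹  |f(x) − f(x′)| ≤ δ.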