|
Notes:
Pre-print available from https://arxiv.org/abs/2412.10186.
|
|
Abstract.
Data errors, corruptions, and poisoning attacks during training pose a major threat
to the reliability of modern AI systems. While extensive effort has gone into
empirical mitigations, the evolving nature of attacks and the complexity of data
require a more principled, provable approach to robustly learn on such data—and to
understand how perturbations influence the final model. To this end, we introduce
MIBP-Cert, a novel certification method based on mixed-integer bilinear programming
(MIBP) that computes sound, deterministic bounds to provide provable robustness
even under complex threat models. By computing the set of parameters reachable
through perturbed or manipulated data, we can predict all possible outcomes and
guarantee robustness. To make solving this optimization problem tractable, we
propose a novel relaxation scheme that bounds each training step without sacrificing
soundness. We demonstrate the applicability of our approach to continuous and
discrete data, as well as different threat models—including complex ones that were
previously out of reach.
|