Abstract.
As a discipline, machine learning has contributed to significant breakthroughs in Natural Language Processing (NLP), which aims to design algorithms that manipulate text and produce insights, such as classification and summarization, comparable to those of humans. Natural language poses challenges that reflect peculiarities of human intelligence, such as grasping the meaning of a sentence or preserving long-range relationships between words that may appear far apart from each other.
A considerable body of recent literature provides evidence that NLP models behave inconsistently under slight manipulations of a text, such as word substitutions. Unlike computer vision (CV), where a pixel manipulation produces a (possibly unnatural) image, NLP algorithms rely on text representations in the form of embedding vectors, in which the linguistic constituents (i.e., words, phrases, sentences) are transformed into multi-dimensional vectors of real numbers, marking a clear separation between human and machine representations.
In this thesis, we investigate guarantees and the formal explainability of NLP models through the lens of adversarial robustness. We review the applicability to NLP of adversarial robustness as defined in CV, namely the region of maximal safety of a neural network (NN) decision against discrete and continuous perturbations. We develop an evaluation framework that certifies adversarial robustness for different models, and we analyze how the validity of such certificates vanishes as the setting grows in complexity. This investigation is a prelude to novel definitions of robustness that are aligned with linguistics and aim to assess a model's syntactic and semantic capabilities.
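For concreteness, and using illustrative notation rather than the thesis' own (the classifier $f$, input embedding $x$, norm $\|\cdot\|_p$, and budget $\epsilon$ below are assumptions of this sketch), the CV-style notion of adversarial robustness referred to here can be written as
\[
\forall x' \,:\, \|x' - x\|_p \le \epsilon \;\Longrightarrow\; \operatorname*{arg\,max}_i f_i(x') = \operatorname*{arg\,max}_i f_i(x),
\]
i.e., the NN decision is invariant within an $\epsilon$-ball around the input; a certificate proves this implication, while an attack refutes it by exhibiting a counterexample $x'$.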
With semantic robustness, we introduce a framework to test a model against linguistic phenomena; syntax robustness, in turn, aims to falsify the hypothesis that NLP models embed high-order linguistic structures such as syntactic trees. Extensive experimentation on various architectures and benchmarks validates the proposed concepts and sheds light on how brittle these architectures are to slight linguistic variations, to which humans are exceptionally robust.
We finally investigate the role of robustness as a property for explaining neural networks: we propose the notion of an optimal robust explanation (ORE) as an optimal and robust portion of an input text that is nevertheless sufficient, on its own, to imply a model's decision. We implement and test this notion of explanation on various neural networks and datasets to reveal the explanatory landscape of NLP models through the lens of robustness.
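As a hedged sketch of this notion (the symbols $S$, $c$, $N(w_j)$, and $F$ are illustrative and not necessarily the thesis' formal notation), an ORE over an input text $x = (w_1, \dots, w_n)$ can be thought of as a subset of words, optimal with respect to a cost function, whose values alone fix the model's decision:
\[
S^{*} \in \operatorname*{arg\,min}_{S \subseteq \{1,\dots,n\}} c(S)
\quad \text{subject to} \quad
\forall x' \,:\, \big(x'_i = w_i \;\forall i \in S \;\wedge\; x'_j \in N(w_j) \;\forall j \notin S\big) \Longrightarrow F(x') = F(x),
\]
where $N(w_j)$ denotes the admissible perturbations of word $w_j$ and $F$ the model's decision.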
All the software and tools developed for this thesis have been released under permissive, open-source licenses to satisfy reproducibility requirements and to encourage other researchers to develop tools that assess and improve the robustness of NLP models against edge cases and linguistic phenomena, which by their nature constitute a non-negligible part of the spectrum of human language.