Katharina Hoedt, Verena Praher, Arthur Flexer, Gerhard Widmer,
"Constructing Adversarial Examples to Investigate The Plausibility of Explanations in Deep Audio And Image Classifiers"
, in Neural Computing and Applications, Vol. 35, No. 14, pp. 10011-10029, 2023
Original title:
Constructing Adversarial Examples to Investigate The Plausibility of Explanations in Deep Audio And Image Classifiers
Language of the title:
English
Original abstract:
Given the rise of deep learning and its inherent black-box nature, the desire to interpret these systems and explain their
behaviour became increasingly more prominent. The main idea of so-called explainers is to identify which features of
particular samples have the most influence on a classifier's prediction, and present them as explanations. Evaluating
explainers, however, is difficult, due to reasons such as a lack of ground truth. In this work, we construct adversarial
examples to check the plausibility of explanations, perturbing input deliberately to change a classifier's prediction. This
allows us to investigate whether explainers are able to detect these perturbed regions as the parts of an input that strongly
influence a particular classification. Our results from the audio and image domain suggest that the investigated explainers
often fail to identify the input regions most relevant for a prediction; hence, it remains questionable whether explanations
are useful or potentially misleading.