{"id":32840,"date":"2022-03-03T10:41:37","date_gmt":"2022-03-03T09:41:37","guid":{"rendered":"https:\/\/www.inovex.de\/?p=32840"},"modified":"2025-03-04T07:51:47","modified_gmt":"2025-03-04T06:51:47","slug":"hide-adversarial-attacks-using-explainable-artificial-intelligence","status":"publish","type":"post","link":"https:\/\/www.inovex.de\/de\/blog\/hide-adversarial-attacks-using-explainable-artificial-intelligence\/","title":{"rendered":"\u201cWasn\u2019t Me\u201d or How to Hide Adversarial Attacks Using Explainable AI"},"content":{"rendered":"<p>Deep neural networks are generally considered black boxes: We often do not understand which input features a model&#8217;s decisions are based on. Explainable Artificial Intelligence (XAI) techniques promise to offer a peek inside the box \u2013 but how robust are they? In this blog post, I am going to show you how to hide an adversarial attack on images by manipulating their explanations to make them appear \u201cunsuspicious.\u201c<br \/>\n<!--more--><\/p>\n<div id=\"ez-toc-container\" class=\"ez-toc-v2_0_82_2 counter-hierarchy ez-toc-counter ez-toc-custom ez-toc-container-direction\">\n<div class=\"ez-toc-title-container\"><p class=\"ez-toc-title\" style=\"cursor:inherit\"><\/p>\n<\/div><nav><ul class='ez-toc-list ez-toc-list-level-1 ' ><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-1\" href=\"https:\/\/www.inovex.de\/de\/blog\/hide-adversarial-attacks-using-explainable-artificial-intelligence\/#Motivation\" >Motivation<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-2\" href=\"https:\/\/www.inovex.de\/de\/blog\/hide-adversarial-attacks-using-explainable-artificial-intelligence\/#Basics\" >Basics<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-3\" href=\"https:\/\/www.inovex.de\/de\/blog\/hide-adversarial-attacks-using-explainable-artificial-intelligence\/#Adversarial-fine-tuning\" 
>Adversarial fine-tuning<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-4\" href=\"https:\/\/www.inovex.de\/de\/blog\/hide-adversarial-attacks-using-explainable-artificial-intelligence\/#Evaluation-metrics\" >Evaluation metrics<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-5\" href=\"https:\/\/www.inovex.de\/de\/blog\/hide-adversarial-attacks-using-explainable-artificial-intelligence\/#Results\" >Results<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-6\" href=\"https:\/\/www.inovex.de\/de\/blog\/hide-adversarial-attacks-using-explainable-artificial-intelligence\/#Summary-conclusion\" >Summary &amp; conclusion<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-7\" href=\"https:\/\/www.inovex.de\/de\/blog\/hide-adversarial-attacks-using-explainable-artificial-intelligence\/#References\" >References<\/a><\/li><\/ul><\/nav><\/div>\n<h2><span class=\"ez-toc-section\" id=\"Motivation\"><\/span>Motivation<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>Nowadays, deep learning models are used in many different areas, even safety-critical ones such as autonomous driving. However, these systems do make mistakes from time to time, which can have grave consequences: Since 2018, Teslas running on \u201c<a href=\"https:\/\/www.tesla.com\/autopilot\" target=\"_blank\" rel=\"noopener\">Autopilot\u201c<\/a>\u00a0have crashed into parked emergency vehicles about a dozen times in the United States, thereby <a href=\"https:\/\/www.forbes.com\/sites\/bradtempleton\/2021\/09\/20\/teslas-are-crashing-into-emergency-vehicles-too-much-so-nhtsa-asks-other-car-companies-about-it\/\" target=\"_blank\" rel=\"noopener\">injuring multiple and even killing one passenger<\/a>. 
The National Highway Traffic Safety Administration (NHTSA) has since picked up an investigation into the accidents.<br \/>\nWhen I read about this, I asked myself whether it was possible for a data scientist at Tesla to somehow hide the shortcomings of their model, so that the auditors would come up empty-handed. I imagined the following \u2013 hypothetical \u2013 scenario: The data scientist investigates the model failure and finds out that emergency vehicles are consistently misclassified as pedestrian crossings by the computer vision system, causing the cars not to behave as expected. They decide to manipulate the images captured right before the crash with the goal of having them classified as emergency vehicles by the model, thus effectively hiding the error. However, they also know that the auditors will use particular XAI techniques in their investigation and are worried that the manipulation will be uncovered due to suspicious <em>explanations<\/em> created based on the attacked images. As a result, they decide to also manipulate the model in order to create explanations for the attacked images that look very similar to the ones of the original images, while at the same time making sure both original and attacked images are still classified as their corresponding target class label. This way, the manipulation can effectively be hidden \u201cin plain sight\u201c.<\/p>\n<h2><span class=\"ez-toc-section\" id=\"Basics\"><\/span>Basics<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>Based on the hypothetical scenario described above, I came up with a multi-step approach to manipulate both input images and model. 
In the next sections, I will briefly describe the initial steps taken before diving into the central aspect: the <em>adversarial fine-tuning<\/em>.<\/p>\n<h3>Dataset &amp; model<\/h3>\n<p>For my experiments, I decided to use the <a href=\"https:\/\/github.com\/zalandoresearch\/fashion-mnist\" target=\"_blank\" rel=\"noopener\">Fashion-MNIST dataset<\/a> and a rather simple Convolutional Neural Network (CNN) classifier. The Fashion-MNIST dataset was created by <a href=\"https:\/\/jobs.zalando.com\/en\/tech\/?gh_src=281f2ef41us\" target=\"_blank\" rel=\"noopener\">Zalando<\/a> and contains 70,000 fashion items in 10 different categories, e.g. sandal, trousers, dress, etc. The images are rather small (28&#215;28 pixels) and come in 8-bit grayscale format.<\/p>\n<p>The CNN classifier I used is based on an <a href=\"https:\/\/github.com\/pytorch\/examples\/blob\/2639cf050493df9d3cbf065d45e6025733add0f4\/mnist\/main.py\" target=\"_blank\" rel=\"noopener\">example architecture provided in the PyTorch GitHub repository<\/a> which I simply reimplemented using <a href=\"https:\/\/www.pytorchlightning.ai\/\" target=\"_blank\" rel=\"noopener\">PyTorch Lightning<\/a>. It consists of two convolutional layers with 3&#215;3 kernels, followed by two fully connected layers. The model contains ReLU activations in all but the last layer, which uses a Softmax instead. It also includes a max-pooling layer after the convolutional layers and two Dropout layers. 
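If you want to follow along, a plain-PyTorch sketch of such an architecture could look like the snippet below. Mind that the concrete channel and layer sizes are assumptions taken from the linked PyTorch example (the post does not spell them out), using the standard 28×28 Fashion-MNIST input size:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FashionCNN(nn.Module):
    """Sketch of the classifier: two 3x3 conv layers, max-pooling,
    two dropout layers, and two fully connected layers."""
    def __init__(self, num_classes: int = 10):
        super().__init__()
        self.conv1 = nn.Conv2d(1, 32, 3)   # 28x28 -> 26x26
        self.conv2 = nn.Conv2d(32, 64, 3)  # 26x26 -> 24x24
        self.dropout1 = nn.Dropout(0.25)
        self.dropout2 = nn.Dropout(0.5)
        self.fc1 = nn.Linear(64 * 12 * 12, 128)  # after 2x2 max-pool: 12x12
        self.fc2 = nn.Linear(128, num_classes)

    def forward(self, x):
        x = F.relu(self.conv1(x))
        x = F.relu(self.conv2(x))
        x = F.max_pool2d(x, 2)
        x = self.dropout1(x)
        x = torch.flatten(x, 1)
        x = F.relu(self.fc1(x))
        x = self.dropout2(x)
        return F.softmax(self.fc2(x), dim=1)  # last layer uses Softmax

model = FashionCNN()
print(model(torch.zeros(4, 1, 28, 28)).shape)  # torch.Size([4, 10])
```

The Softmax output matches the description above; the original PyTorch example returns log-probabilities instead.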
The image below shows an overview of the architecture.<\/p>\n<figure id=\"attachment_32663\" aria-describedby=\"caption-attachment-32663\" style=\"width: 640px\" class=\"wp-caption aligncenter\"><img loading=\"lazy\" decoding=\"async\" class=\"wp-image-32664 size-large\" src=\"https:\/\/www.inovex.de\/wp-content\/uploads\/\/fashion_mnist_architecture_white-1024x336.png\" alt=\"Architecture sketch of the Convolutional Neural Network model used.\" width=\"640\" height=\"210\" srcset=\"https:\/\/www.inovex.de\/wp-content\/uploads\/fashion_mnist_architecture_white-1024x336.png 1024w, https:\/\/www.inovex.de\/wp-content\/uploads\/fashion_mnist_architecture_white-300x98.png 300w, https:\/\/www.inovex.de\/wp-content\/uploads\/fashion_mnist_architecture_white-768x252.png 768w, https:\/\/www.inovex.de\/wp-content\/uploads\/fashion_mnist_architecture_white-400x131.png 400w, https:\/\/www.inovex.de\/wp-content\/uploads\/fashion_mnist_architecture_white-360x118.png 360w, https:\/\/www.inovex.de\/wp-content\/uploads\/fashion_mnist_architecture_white.png 1262w\" sizes=\"auto, (max-width: 640px) 100vw, 640px\" \/><figcaption id=\"caption-attachment-32663\" class=\"wp-caption-text\">Figure 1: Architecture sketch of the simple CNN classifier.<\/figcaption><\/figure>\n<p>I trained the model on the Fashion-MNIST training set for 25 epochs with a batch size of 64 using the <a href=\"https:\/\/arxiv.org\/abs\/1212.5701\" target=\"_blank\" rel=\"noopener\">Adadelta<\/a> optimizer. The initial learning rate was set to 1.5 and statically decreased every epoch by a factor of 0.85. After training, the model achieved an accuracy of 93.03 % on the standard Fashion-MNIST test set.<\/p>\n<h3>Adversarial attack on the data<\/h3>\n<p>Unfortunately, CNN models are easily fooled: By changing a few pixels in an input image, the model can be tricked into classifying the image as a completely different category. There are various ways to adversarially attack images based on an image classifier. 
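To give you an idea of how such an attack works mechanically, here is a sketch of a single FGSM-style gradient step (Fast Gradient Sign Method). This is only an illustration of the general idea, not the attack used in this work, and `model` is assumed to return class logits:

```python
import torch
import torch.nn.functional as F

def fgsm_attack(model, image, label, eps=0.05):
    """Perturb `image` by one step along the sign of the input gradient,
    increasing the classification loss for the true `label`."""
    image = image.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(image), label)
    loss.backward()
    adv = image + eps * image.grad.sign()
    return adv.clamp(0.0, 1.0).detach()  # keep pixel values in a valid range
```

DeepFool differs in that it iteratively steps towards the nearest linearized decision boundary instead of taking one fixed-size step, which typically yields much smaller perturbations.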
In my thesis, I used the so-called DeepFool technique <a href=\"https:\/\/arxiv.org\/abs\/1511.04599\" target=\"_blank\" rel=\"noopener\">[Moosavi-Dezfooli et al., 2016]<\/a>, an iterative attack based on linearizations of the model&#8217;s decision hyperplanes.<\/p>\n<p>In my experimental setup, I attacked all images of the Fashion-MNIST dataset based on the CNN classifier trained earlier. All successfully attacked images (from now on called <i>adversarials<\/i>) were then saved as PyTorch tensors together with their adversarial labels. The images for which the attack did not work were not used in the next steps.<\/p>\n<h3>Creating the visual explanations<\/h3>\n<p>As initially described, Explainable Artificial Intelligence (XAI) techniques are used with the goal of making the predictions of a model and how they relate to the input features more understandable. The field of XAI offers a rich taxonomy and many different ways of creating these explanations. In my thesis, I concentrated on visual XAI methods that are based on the gradient of the model&#8217;s output with respect to a particular input image or intermediate feature representations. The techniques I chose are called Gradient-weighted Class Activation Mapping (Grad-CAM) <a href=\"https:\/\/arxiv.org\/abs\/1610.02391\" target=\"_blank\" rel=\"noopener\">[Selvaraju et al., 2017]<\/a> and Guided Backpropagation <a href=\"https:\/\/lmb.informatik.uni-freiburg.de\/Publications\/2015\/DB15a\/\" target=\"_blank\" rel=\"noopener\">[Springenberg et al., 2015]<\/a>. You can refer to <a href=\"https:\/\/medium.com\/@mohamedchetoui\/grad-cam-gradient-weighted-class-activation-mapping-ffd72742243a\" target=\"_blank\" rel=\"noopener\">this article<\/a> to learn more about them.<\/p>\n<p>I created visual explanations for all the original and adversarial Fashion-MNIST image pairs and their corresponding labels. 
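Conceptually, Grad-CAM needs only one forward and one backward pass: capture a convolutional layer's activations and output gradients via hooks, average the gradients spatially into per-channel weights, and sum the weighted activation maps. The following is a simplified sketch of that idea, not the implementation used in the thesis:

```python
import torch
import torch.nn.functional as F

def grad_cam(model, layer, image, target):
    """Minimal Grad-CAM: weight `layer`'s activation maps by the spatially
    averaged gradient of the target-class score, then clip at zero."""
    acts, grads = [], []
    h_fwd = layer.register_forward_hook(lambda mod, inp, out: acts.append(out))
    h_bwd = layer.register_full_backward_hook(lambda mod, gin, gout: grads.append(gout[0]))
    score = model(image)[0, target]  # score of the class to be explained
    model.zero_grad()
    score.backward()
    h_fwd.remove()
    h_bwd.remove()
    weights = grads[0].mean(dim=(2, 3), keepdim=True)  # one weight per channel
    return F.relu((weights * acts[0]).sum(dim=1))      # weighted sum over channels
```

Guided Backpropagation instead modifies the backward pass of every ReLU so that only positive gradients are propagated, producing a fine-grained pixel-level map rather than a coarse channel-weighted one.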
For this, I used the Python library <a href=\"https:\/\/captum.ai\/\" target=\"_blank\" rel=\"noopener\">Captum<\/a> which offers implementations of various XAI techniques for PyTorch models. In Figure 2, you can see an example of the original and adversarial explanations created using Grad-CAM and Guided Backpropagation.<\/p>\n<figure id=\"attachment_32612\" aria-describedby=\"caption-attachment-32612\" style=\"width: 1220px\" class=\"wp-caption alignnone\"><img loading=\"lazy\" decoding=\"async\" class=\"wp-image-32612 size-full\" src=\"https:\/\/www.inovex.de\/wp-content\/uploads\/fmnist_initial_explanations_gliffy.png\" alt=\"Visualization of initial original and adversarial explanation maps for different Fashion-MNIST categories. The Grad-CAM explanations are displayed on the right, and the Guided Backpropagation explanations on the left.\" width=\"1220\" height=\"1137\" srcset=\"https:\/\/www.inovex.de\/wp-content\/uploads\/fmnist_initial_explanations_gliffy.png 1220w, https:\/\/www.inovex.de\/wp-content\/uploads\/fmnist_initial_explanations_gliffy-300x280.png 300w, https:\/\/www.inovex.de\/wp-content\/uploads\/fmnist_initial_explanations_gliffy-1024x954.png 1024w, https:\/\/www.inovex.de\/wp-content\/uploads\/fmnist_initial_explanations_gliffy-768x716.png 768w, https:\/\/www.inovex.de\/wp-content\/uploads\/fmnist_initial_explanations_gliffy-400x373.png 400w, https:\/\/www.inovex.de\/wp-content\/uploads\/fmnist_initial_explanations_gliffy-360x336.png 360w\" sizes=\"auto, (max-width: 1220px) 100vw, 1220px\" \/><figcaption id=\"caption-attachment-32612\" class=\"wp-caption-text\">Figure 2: Visualization of initial original and adversarial explanation maps for different Fashion-MNIST categories. 
The Grad-CAM explanations are displayed on the right, and the Guided Backpropagation explanations on the left.<\/figcaption><\/figure>\n<p>Noticeably, some of the originals (first and third columns) and adversarials (second and fourth columns) look very distinct \u2013 and this is what I wanted to change with my adversarial fine-tuning procedure: One of the goals of my thesis was to make the adversarial explanations look \u201cunsuspicious\u201c, meaning that they should be visually similar to the original explanations, even though they were created based on a different class label. In the next section, you&#8217;ll find out how this can be achieved.<\/p>\n<h2><span class=\"ez-toc-section\" id=\"Adversarial-fine-tuning\"><\/span>Adversarial fine-tuning<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>The adversarial fine-tuning represents the central part of my thesis and is based on the results of the previous steps, namely the CNN model, the original and adversarial images (+ labels), and the corresponding visual explanations. As you may recall, in the hypothetical scenario described at the start, the malicious actor wants to hide the adversarial attack on the images from the auditors. 
I have attempted to do the same by setting the following goals for the manipulation:<\/p>\n<p>The manipulated model needs to&#8230;<\/p>\n<ol>\n<li>retain the classification performance on the original Fashion-MNIST images,<\/li>\n<li>consistently (mis-)classify the adversarial images as their initial adversarial class label, and<\/li>\n<li>produce explanations for original and adversarial image pairs that are visually similar.<\/li>\n<\/ol>\n<p>In the next section, I will introduce the composite loss function I came up with to formalize these requirements.<\/p>\n<h3>Composite loss function<\/h3>\n<p>The <a href=\"https:\/\/ml-cheatsheet.readthedocs.io\/en\/latest\/loss_functions.html#id11\" target=\"_blank\" rel=\"noopener\">cross-entropy loss<\/a> is a popular choice for single-target classification tasks. It is used with models that output a probability score between 0 and 1 for each class. A low cross-entropy value indicates that the predicted class probability is close to the true class probability (which is what we want). On the other hand, the cross-entropy value of a class will be high if it was assigned a high probability even though it is not the ground truth class, or if the predicted probability of the ground truth class is low.<\/p>\n<p>Since goals 1 and 2 both concern the classification performance of the model, on original and adversarial images respectively, the cross-entropy loss can be used in both cases. The first loss term \\(L_{CE_{Orig}}\\) takes into account all original images and their ground truth labels to make sure that the model still works correctly on the unaltered data instances after the manipulation. The second loss term \\(L_{CE_{Adv}}\\) is used on a subset of the adversarial images, namely the ones whose original counterpart belongs to a certain target class \\(y_t\\). 
The reason why I chose to include only a part of the adversarials is that I believe it is more likely that a malicious actor would want to manipulate the model with respect to a particular image class rather than all of them.<\/p>\n<p>The third goal specifies that the explanations of original and adversarial image pairs should be visually similar. Again, similarly to the second loss term, this term is applied only to a subset of the data, namely the pairs of explanations of original and adversarial images whose original labels belong to target class \\(y_t\\). As my model manipulation targets only one class, it makes sense to only try and obscure the attack on data instances related to this particular class.<\/p>\n<p>In order to quantify and improve the \u201cvisual similarity\u201c between two explanations, I tried out two different loss metrics: the <a href=\"https:\/\/en.wikipedia.org\/wiki\/Mean_squared_error\" target=\"_blank\" rel=\"noopener\">Mean Squared Error (MSE)<\/a> and the <a href=\"https:\/\/en.wikipedia.org\/wiki\/Pearson_correlation_coefficient\" target=\"_blank\" rel=\"noopener\">Pearson Correlation Coefficient (PCC)<\/a>. The MSE can simply be minimized together with the two cross-entropy loss terms, as smaller values indicate less difference and thus a higher similarity between the pixels of the two explanation images. The PCC, however, represents a measure of linear correlation in the range of [-1, 1], where -1 indicates perfect negative and 1 perfect positive linear correlation. In order to use this metric as a basis for the loss term, I decided to limit its range to only positive values and subtract the result from one to be able to minimize it during training:<\/p>\n<p>\\(SIM_{PCC}(h_1, h_2) = 1 &#8211; max(0, PCC(h_1, h_2))\\)<\/p>\n<p>My initial experiments showed that the similarity loss based on the PCC metric works well for my task, as minimizing the term above also yields visually similar-looking explanations. 
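Written out in code, the clamped similarity looks like this (a plain-Python sketch operating on flattened explanation maps; the actual training code works on tensors):

```python
import math

def pcc(h1, h2):
    """Pearson correlation coefficient between two flattened explanation maps."""
    n = len(h1)
    m1, m2 = sum(h1) / n, sum(h2) / n
    cov = sum((a - m1) * (b - m2) for a, b in zip(h1, h2))
    s1 = math.sqrt(sum((a - m1) ** 2 for a in h1))
    s2 = math.sqrt(sum((b - m2) ** 2 for b in h2))
    return cov / (s1 * s2)

def sim_pcc(h1, h2):
    """1 - max(0, PCC): 0 for perfectly positively correlated maps, 1 for
    uncorrelated ones; the clamp maps negative correlation to 1 as well."""
    return 1.0 - max(0.0, pcc(h1, h2))
```

Minimizing `sim_pcc` therefore pushes two maps towards perfect positive correlation, while anti-correlated maps receive the same (maximal) loss as uncorrelated ones.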
The MSE, on the other hand, did not yield very good results: After a few training epochs, the explanations already diverged from one another and visual artifacts started appearing. You can see an example of this in Figure 3, where the adversarials, in particular, seem to undergo undesired changes:<\/p>\n<figure id=\"attachment_32619\" aria-describedby=\"caption-attachment-32619\" style=\"width: 1340px\" class=\"wp-caption aligncenter\"><img loading=\"lazy\" decoding=\"async\" class=\"wp-image-32619 size-full\" src=\"https:\/\/www.inovex.de\/wp-content\/uploads\/MSE-vs-PCC.png\" alt=\"Figure 3: Results of the manipulation when using MSE as a similarity loss. After a few training epochs, artifacts start appearing.\" width=\"1340\" height=\"422\" srcset=\"https:\/\/www.inovex.de\/wp-content\/uploads\/MSE-vs-PCC.png 1340w, https:\/\/www.inovex.de\/wp-content\/uploads\/MSE-vs-PCC-300x94.png 300w, https:\/\/www.inovex.de\/wp-content\/uploads\/MSE-vs-PCC-1024x322.png 1024w, https:\/\/www.inovex.de\/wp-content\/uploads\/MSE-vs-PCC-768x242.png 768w, https:\/\/www.inovex.de\/wp-content\/uploads\/MSE-vs-PCC-400x126.png 400w, https:\/\/www.inovex.de\/wp-content\/uploads\/MSE-vs-PCC-360x113.png 360w\" sizes=\"auto, (max-width: 1340px) 100vw, 1340px\" \/><figcaption id=\"caption-attachment-32619\" class=\"wp-caption-text\">Figure 3: Results of the manipulation when using MSE as a similarity loss. After a few training epochs, artifacts start appearing.<\/figcaption><\/figure>\n<p>Based on these initial results, I decided to go with the PCC-based loss. 
Without using too much mathematical notation, my final similarity loss term \\(L_{SIM}\\) looks like this:<\/p>\n<p>\\(L_{SIM} = \\frac{1}{2} \\left( SIM_{PCC}(h_{orig}, h_{adv}^{*}) + SIM_{PCC}(h_{orig}, h_{orig}^{*}) \\right) \\)<\/p>\n<p>The first term inside the brackets calculates the PCC-based similarity between the original explanation before the manipulation (\\(h_{orig}\\)) and the adversarial explanation after the manipulation (\\(h_{adv}^{*}\\)). It ensures that the adversarial explanation will look similar to the initial original explanation and hence \u201cunsuspicious\u201c. The second term represents the similarity between the pre- (\\(h_{orig}\\)) and post-manipulation (\\(h_{orig}^{*}\\)) original explanations. I added this term in order to prevent the original explanations from changing too much due to the model manipulation. The similarity loss term can now be minimized together with the other components.<\/p>\n<p>The whole composite loss function has the following form:<\/p>\n<p>\\(L = L_{CE_{Orig}} + L_{CE_{Adv}} + \\gamma L_{SIM}\\)<\/p>\n<p>The \\(\\gamma\\) parameter is a weighting factor that can be used to change the influence of the similarity loss term on the training. 
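Putting the terms together, one step of the fine-tuning objective might be sketched as follows. Mind that this is a schematic reconstruction: `explain` stands in for any differentiable explanation function, `sim_pcc` is a tensor version of the clamped-PCC similarity, and the batch layout (including the frozen pre-manipulation explanations `h_orig_pre`) is an assumption of mine:

```python
import torch
import torch.nn.functional as F

def sim_pcc(h1, h2):
    """1 - max(0, PCC) on flattened tensors (eps avoids division by zero)."""
    h1 = h1.flatten() - h1.flatten().mean()
    h2 = h2.flatten() - h2.flatten().mean()
    corr = (h1 * h2).sum() / (h1.norm() * h2.norm() + 1e-8)
    return 1.0 - corr.clamp(min=0.0)

def fine_tuning_loss(model, explain, batch, gamma=1.0):
    """Composite objective L = L_CE_Orig + L_CE_Adv + gamma * L_SIM."""
    x_orig, y_orig, x_adv, y_adv, h_orig_pre = batch
    loss_ce_orig = F.cross_entropy(model(x_orig), y_orig)  # goal 1
    loss_ce_adv = F.cross_entropy(model(x_adv), y_adv)     # goal 2
    h_adv = explain(model, x_adv, y_adv)                   # post-manipulation maps
    h_orig = explain(model, x_orig, y_orig)
    loss_sim = 0.5 * (sim_pcc(h_orig_pre, h_adv)           # goal 3
                      + sim_pcc(h_orig_pre, h_orig))
    return loss_ce_orig + loss_ce_adv + gamma * loss_sim
```

The returned scalar can be backpropagated as usual, so the model weights receive gradient signal from all three goals at once.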
In my experiments, I found that setting \\(\\gamma=1.0\\) and hence giving it the same influence as the cross-entropy terms usually led to good results.<\/p>\n<h3>Choosing target classes for the manipulation<\/h3>\n<p>As mentioned earlier, I based the approach on a target class \\(y_t\\), meaning that only the adversarial counterparts of originals belonging to that target class were subjected to the manipulation.<\/p>\n<figure id=\"attachment_32697\" aria-describedby=\"caption-attachment-32697\" style=\"width: 477px\" class=\"wp-caption alignright\"><img loading=\"lazy\" decoding=\"async\" class=\"wp-image-32698 size-full\" src=\"https:\/\/www.inovex.de\/wp-content\/uploads\/FashionMNIST_GradCAM_and_GB_pre_manipulation_sim_boxplots_PCC_2.jpg\" alt=\"Figure 4: Box plots showing the initial explanation similarities of original and adversarial explanations for all Fashion-MNIST categories. The top plot displays Grad-CAM, the bottom plot Guided Backpropagation similarities.\" width=\"477\" height=\"432\" srcset=\"https:\/\/www.inovex.de\/wp-content\/uploads\/FashionMNIST_GradCAM_and_GB_pre_manipulation_sim_boxplots_PCC_2.jpg 477w, https:\/\/www.inovex.de\/wp-content\/uploads\/FashionMNIST_GradCAM_and_GB_pre_manipulation_sim_boxplots_PCC_2-300x272.jpg 300w, https:\/\/www.inovex.de\/wp-content\/uploads\/FashionMNIST_GradCAM_and_GB_pre_manipulation_sim_boxplots_PCC_2-400x362.jpg 400w, https:\/\/www.inovex.de\/wp-content\/uploads\/FashionMNIST_GradCAM_and_GB_pre_manipulation_sim_boxplots_PCC_2-360x326.jpg 360w\" sizes=\"auto, (max-width: 477px) 100vw, 477px\" \/><figcaption id=\"caption-attachment-32697\" class=\"wp-caption-text\">Figure 4: Box plots showing the initial explanation similarities of original and adversarial explanations for all Fashion-MNIST categories. The top plot displays Grad-CAM, the bottom plot Guided Backpropagation similarities.<\/figcaption><\/figure>\n<p>This left me with the task of deciding which classes I wanted to base the manipulation on. 
I figured that it would make sense to choose the ones that would be the \u201ceasiest\u201c and \u201chardest\u201c to manipulate and formalized this requirement by picking the classes with the highest and lowest initial PCC-based similarities regarding their original and adversarial explanations. The box plots in Figure 4 show these explanation similarities for all Fashion-MNIST classes for both Grad-CAM and Guided Backpropagation:<\/p>\n<p>As you can see, <i>Sandal<\/i> and <i>Coat<\/i> are the top and bottom classes for Grad-CAM, while the exact opposite is true for Guided Backpropagation. I figured the top class would be easiest, and the bottom class the hardest to manipulate.<\/p>\n<h3>Replacing ReLU with Softplus activations<\/h3>\n<p><a href=\"https:\/\/www.kaggle.com\/dansbecker\/rectified-linear-units-relu-in-deep-learning\" target=\"_blank\" rel=\"noopener\">Rectified Linear Units<\/a> are a very popular activation function choice in deep learning architectures due to their piece-wise linearity. However, this also comes with a downside: Their second derivative is zero wherever it is defined <a href=\"https:\/\/papers.nips.cc\/paper\/2019\/hash\/bb836c01cdc9120a9c984c525e4b1a4a-Abstract.html\" target=\"_blank\" rel=\"noopener\">[Dombrowski et al., 2019]<\/a>. This is a problem in my case, as the XAI techniques use the gradient (first derivative) to create explanations. In order to know in which direction the model weights need to be changed to produce explanations that are more similar, the second derivative is needed as well.<\/p>\n<figure id=\"attachment_32651\" aria-describedby=\"caption-attachment-32651\" style=\"width: 500px\" class=\"wp-caption alignright\"><img loading=\"lazy\" decoding=\"async\" class=\"wp-image-32651\" src=\"https:\/\/www.inovex.de\/wp-content\/uploads\/\/relu_softplus_white-300x236.png\" alt=\"Figure 5: ReLU and Softplus activation function plots. 
The plots for the Softplus function show different values of beta\" width=\"500\" height=\"393\" srcset=\"https:\/\/www.inovex.de\/wp-content\/uploads\/relu_softplus_white-300x236.png 300w, https:\/\/www.inovex.de\/wp-content\/uploads\/relu_softplus_white-768x604.png 768w, https:\/\/www.inovex.de\/wp-content\/uploads\/relu_softplus_white-400x314.png 400w, https:\/\/www.inovex.de\/wp-content\/uploads\/relu_softplus_white-360x283.png 360w, https:\/\/www.inovex.de\/wp-content\/uploads\/relu_softplus_white.png 972w\" sizes=\"auto, (max-width: 500px) 100vw, 500px\" \/><figcaption id=\"caption-attachment-32651\" class=\"wp-caption-text\">Figure 5: ReLU and Softplus activation function plots. The plots for the Softplus function show different values of <em>\u03b2<\/em>.<\/figcaption><\/figure>\n<p>The solution proposed by Dombrowski et al. includes replacing the ReLU activations with Softplus activations of the form:<\/p>\n<p>\\(Softplus(x) = \\frac{1}{\\beta} log(1+e^{\\beta x})\\).<\/p>\n<p>For larger values of \\(\\beta\\), the Softplus closely approximates the ReLU function but has a well-defined second derivative. The plot in Figure 5 shows the ReLU function and Softplus functions using different values of \\(\\beta\\).<\/p>\n<p>For my adversarial fine-tuning, I set \\(\\beta=30\\) because the corresponding model achieved the same classification performance as the one using ReLUs. During the test stage, I reintroduced the original ReLU activations.<\/p>\n<h2><span class=\"ez-toc-section\" id=\"Evaluation-metrics\"><\/span>Evaluation metrics<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>After running the fine-tuning, I needed to assess whether the three goals I defined earlier were met. For this, I used the following metrics:<\/p>\n<ul>\n<li><b>Accuracy \/ adversarial misclassification rate<\/b>: Accuracy is a standard metric to assess the classification performance of a model. 
The adversarial misclassification rate is in essence a custom accuracy metric, but one used specifically for assessing how many of the adversarials were (successfully) misclassified by the model. I use this metric to assess the performance on the adversarials whose original counterparts belong to the target category \\(y_t\\).<\/li>\n<li><b>PCC<\/b>, <b>MSE,<\/b> and <a href=\"https:\/\/medium.com\/srm-mic\/all-about-structural-similarity-index-ssim-theory-code-in-pytorch-6551b455541e\" target=\"_blank\" rel=\"noopener\"><b>Structural Similarity Index Measure (SSIM)<\/b><\/a>: These three metrics are also used in related works (e.g. <a href=\"https:\/\/papers.nips.cc\/paper\/2019\/hash\/bb836c01cdc9120a9c984c525e4b1a4a-Abstract.html\" target=\"_blank\" rel=\"noopener\">[Dombrowski et al., 2019]<\/a>) to assess the similarity between visual explanations. For PCC and SSIM higher values indicate higher similarity, while for the MSE it is the opposite. Using these metrics, I compare the similarities between the original and adversarial explanations before and after the manipulation to see whether they have been increased.<\/li>\n<\/ul>\n<h2><span class=\"ez-toc-section\" id=\"Results\"><\/span>Results<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<h3>Classification performance<\/h3>\n<p>As specified by goals 1 and 2, the accuracy and adversarial misclassification rate should not deteriorate after the manipulation. The table in Figure 6 displays the classification performance of the model before (columns <i>pre<\/i>) and after (columns <i>post<\/i>) the manipulation on each combination of XAI technique and target class. 
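As a small sketch, the adversarial misclassification rate from the metric list above boils down to the following (`pred_labels` being the model's predictions on the attacked images and `adv_labels` the labels the attack aimed for; both names are mine):

```python
def adv_misclassification_rate(pred_labels, adv_labels):
    """Fraction of adversarial images the model classifies as their
    adversarial target label, i.e. for which the attack is (still) effective."""
    hits = sum(p == a for p, a in zip(pred_labels, adv_labels))
    return hits / len(adv_labels)
```

For example, `adv_misclassification_rate([1, 2, 0, 4], [1, 2, 3, 4])` gives 0.75, since three of the four adversarials still land in their adversarial class.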
Mind that the post-manipulation results are calculated based on k-fold cross-validation runs with <i>k<\/i>=5 and hence always include the mean and standard deviation:<\/p>\n<p>&nbsp;<\/p>\n<figure id=\"attachment_32631\" aria-describedby=\"caption-attachment-32631\" style=\"width: 940px\" class=\"wp-caption aligncenter\"><img loading=\"lazy\" decoding=\"async\" class=\"wp-image-32631\" src=\"https:\/\/www.inovex.de\/wp-content\/uploads\/\/Classification_results_table-1024x361.png\" alt=\"Figure 6: Screenshot of the classification results table.\" width=\"940\" height=\"332\" srcset=\"https:\/\/www.inovex.de\/wp-content\/uploads\/Classification_results_table-1024x361.png 1024w, https:\/\/www.inovex.de\/wp-content\/uploads\/Classification_results_table-300x106.png 300w, https:\/\/www.inovex.de\/wp-content\/uploads\/Classification_results_table-768x271.png 768w, https:\/\/www.inovex.de\/wp-content\/uploads\/Classification_results_table-400x141.png 400w, https:\/\/www.inovex.de\/wp-content\/uploads\/Classification_results_table-360x127.png 360w, https:\/\/www.inovex.de\/wp-content\/uploads\/Classification_results_table.png 1332w\" sizes=\"auto, (max-width: 940px) 100vw, 940px\" \/><figcaption id=\"caption-attachment-32631\" class=\"wp-caption-text\">Figure 6: Screenshot of the classification results table.<\/figcaption><\/figure>\n<p>As can be seen, the accuracy actually improved slightly for both the manipulations based on class <i>Sandal<\/i> and <i>Coat<\/i> when manipulating the Grad-CAM explanations. For Guided Backpropagation, however, the accuracy deteriorated by close to 1 % in both cases. The adversarial misclassification rate was improved by a large margin in all of the manipulations.<\/p>\n<p>It seems that it is generally possible to achieve goals 1 and 2, with the small addition that there could be a slight deterioration in accuracy on the original data. 
But who knows, maybe someone can improve this in the future?<\/p>\n<h3>Explanation similarities<\/h3>\n<h4>Grad-CAM<\/h4>\n<figure id=\"attachment_32659\" aria-describedby=\"caption-attachment-32659\" style=\"width: 467px\" class=\"wp-caption alignright\"><img loading=\"lazy\" decoding=\"async\" class=\"wp-image-32659 \" src=\"https:\/\/www.inovex.de\/wp-content\/uploads\/FashionMNIST_GradCAM_top_and_bottom_similarities_white_1.png\" alt=\"Figure 7: Box plot visualizations of pre- and post-manipulation explanation similarities for Grad-CAM and classes Sandal and Coat.\" width=\"467\" height=\"470\" srcset=\"https:\/\/www.inovex.de\/wp-content\/uploads\/FashionMNIST_GradCAM_top_and_bottom_similarities_white_1.png 964w, https:\/\/www.inovex.de\/wp-content\/uploads\/FashionMNIST_GradCAM_top_and_bottom_similarities_white_1-298x300.png 298w, https:\/\/www.inovex.de\/wp-content\/uploads\/FashionMNIST_GradCAM_top_and_bottom_similarities_white_1-150x150.png 150w, https:\/\/www.inovex.de\/wp-content\/uploads\/FashionMNIST_GradCAM_top_and_bottom_similarities_white_1-768x773.png 768w, https:\/\/www.inovex.de\/wp-content\/uploads\/FashionMNIST_GradCAM_top_and_bottom_similarities_white_1-400x402.png 400w, https:\/\/www.inovex.de\/wp-content\/uploads\/FashionMNIST_GradCAM_top_and_bottom_similarities_white_1-360x362.png 360w\" sizes=\"auto, (max-width: 467px) 100vw, 467px\" \/><figcaption id=\"caption-attachment-32659\" class=\"wp-caption-text\">Figure 7: Box plot visualizations of pre- and post-manipulation explanation similarities for Grad-CAM and classes Sandal and Coat.<\/figcaption><\/figure>\n<p>So how did my approach fare regarding goal 3. 
of increasing the similarity between original and adversarial explanations?<\/p>\n<p>In Figure 7, you can see the results for all similarity metrics before and after the manipulation based on Grad-CAM and both <i>Sandal<\/i> and <i>Coat<\/i>.<\/p>\n<p>The box plots show that the explanation similarity increased for both classes after the manipulation across all metrics (mind, however, that the mean SSIM (orange triangle) decreased for <i>Sandal<\/i> while the median increased). For the PCC metric of class <i>Coat<\/i> this change is especially large: the median (black line) changed from -0.97 to 0.88 and the mean from -0.87 to 0.34. Visibly, the post-manipulation PCC values show a large spread, indicating that there are still many explanations that display a low similarity.<\/p>\n<p>Now let\u2019s look at some qualitative results. Figure 8 displays the largest positive change in PCC explanation similarity based on Grad-CAM for class <i>Sandal<\/i>.<\/p>\n<figure id=\"attachment_32667\" aria-describedby=\"caption-attachment-32667\" style=\"width: 940px\" class=\"wp-caption alignnone\"><img loading=\"lazy\" decoding=\"async\" class=\"wp-image-32667 \" src=\"https:\/\/www.inovex.de\/wp-content\/uploads\/gradcam_sandal_top_k_difference-2.png\" alt=\"Figure 8: Largest positive change in PCC explanation similarity due to the manipulation.\" width=\"940\" height=\"643\" srcset=\"https:\/\/www.inovex.de\/wp-content\/uploads\/gradcam_sandal_top_k_difference-2.png 1237w, https:\/\/www.inovex.de\/wp-content\/uploads\/gradcam_sandal_top_k_difference-2-300x205.png 300w, https:\/\/www.inovex.de\/wp-content\/uploads\/gradcam_sandal_top_k_difference-2-1024x700.png 1024w, https:\/\/www.inovex.de\/wp-content\/uploads\/gradcam_sandal_top_k_difference-2-768x525.png 768w, https:\/\/www.inovex.de\/wp-content\/uploads\/gradcam_sandal_top_k_difference-2-400x274.png 400w, https:\/\/www.inovex.de\/wp-content\/uploads\/gradcam_sandal_top_k_difference-2-360x246.png 360w\" sizes=\"auto, 
(max-width: 940px) 100vw, 940px\" \/><figcaption id=\"caption-attachment-32667\" class=\"wp-caption-text\">Figure 8: Largest positive change in PCC explanation similarity due to the manipulation.<\/figcaption><\/figure>\n<p>Before the manipulation, the original and adversarial explanations look very distinct, which is also indicated by the strongly negative linear correlation according to the PCC metrics displayed above the originals. After the manipulation, the explanations have changed so much that they look almost identical, as the high PCC values confirm.<\/p>\n<p>To get a full picture, we also need to look at the opposite cases, meaning the explanations that changed for the worse. In Figure 9, you can see the largest negative changes in PCC explanation similarity. Here, it becomes apparent that the manipulation did not work as expected for all samples. As you can see in the first row, the initial explanations of the original and adversarial images were already quite similar. However, the manipulation led to a profound change in the original explanation, hence decreasing the similarity.<\/p>\n<figure id=\"attachment_32642\" aria-describedby=\"caption-attachment-32642\" style=\"width: 941px\" class=\"wp-caption alignnone\"><img loading=\"lazy\" decoding=\"async\" class=\"wp-image-32642 \" src=\"https:\/\/www.inovex.de\/wp-content\/uploads\/gradcam_sandal_bottom_k_differenc.png\" alt=\"Figure 9: Largest negative change in PCC explanation similarity due to the manipulation.\" width=\"941\" height=\"619\" srcset=\"https:\/\/www.inovex.de\/wp-content\/uploads\/gradcam_sandal_bottom_k_differenc.png 1256w, https:\/\/www.inovex.de\/wp-content\/uploads\/gradcam_sandal_bottom_k_differenc-300x197.png 300w, https:\/\/www.inovex.de\/wp-content\/uploads\/gradcam_sandal_bottom_k_differenc-1024x673.png 1024w, https:\/\/www.inovex.de\/wp-content\/uploads\/gradcam_sandal_bottom_k_differenc-768x505.png 768w,
https:\/\/www.inovex.de\/wp-content\/uploads\/gradcam_sandal_bottom_k_differenc-400x263.png 400w, https:\/\/www.inovex.de\/wp-content\/uploads\/gradcam_sandal_bottom_k_differenc-360x237.png 360w\" sizes=\"auto, (max-width: 941px) 100vw, 941px\" \/><figcaption id=\"caption-attachment-32642\" class=\"wp-caption-text\">Figure 9: Largest negative change in PCC explanation similarity due to the manipulation.<\/figcaption><\/figure>\n<h4>Guided Backpropagation<\/h4>\n<p>For Guided Backpropagation, the situation was a bit different, as the manipulation seemed to work well for class <i>Sandal<\/i>, but not for class <i>Coat<\/i>, which is shown in Figure 10.<\/p>\n<figure id=\"attachment_32656\" aria-describedby=\"caption-attachment-32656\" style=\"width: 650px\" class=\"wp-caption alignright\"><img loading=\"lazy\" decoding=\"async\" class=\"wp-image-32656 \" src=\"https:\/\/www.inovex.de\/wp-content\/uploads\/FashionMNIST_GradCAM_top_and_bottom_similarities_white.png\" alt=\"Figure 10: Box plot visualizations of pre- and post-manipulation explanation similarities for Guided Backpropagation and classes Sandal and Coat.\" width=\"650\" height=\"655\" srcset=\"https:\/\/www.inovex.de\/wp-content\/uploads\/FashionMNIST_GradCAM_top_and_bottom_similarities_white.png 962w, https:\/\/www.inovex.de\/wp-content\/uploads\/FashionMNIST_GradCAM_top_and_bottom_similarities_white-298x300.png 298w, https:\/\/www.inovex.de\/wp-content\/uploads\/FashionMNIST_GradCAM_top_and_bottom_similarities_white-150x150.png 150w, https:\/\/www.inovex.de\/wp-content\/uploads\/FashionMNIST_GradCAM_top_and_bottom_similarities_white-768x774.png 768w, https:\/\/www.inovex.de\/wp-content\/uploads\/FashionMNIST_GradCAM_top_and_bottom_similarities_white-400x403.png 400w, https:\/\/www.inovex.de\/wp-content\/uploads\/FashionMNIST_GradCAM_top_and_bottom_similarities_white-360x363.png 360w\" sizes=\"auto, (max-width: 650px) 100vw, 650px\" \/><figcaption id=\"caption-attachment-32656\" 
class=\"wp-caption-text\">Figure 10: Box plot visualizations of pre- and post-manipulation explanation similarities for Guided Backpropagation and classes Sandal and Coat.<\/figcaption><\/figure>\n<p>On the right-hand side, you can see that both the PCC and MSE similarities saw some improvement for class <i>Sandal<\/i>. For class <i>Coat<\/i>, on the other hand, there was a significant deterioration: The explanations of original and adversarial images are less similar after the manipulation than before across all three metrics. I investigated this issue in an ablation study by using the weighted cross-entropy loss (see parameter <i>weight<\/i> in the PyTorch docs) as the first loss term and assigning different weights to class <i>Coat<\/i> while leaving the weights of the other classes at 1, thereby pushing the model to put more emphasis on getting this class right. Inspecting the results, I noticed that the higher I chose the weight, the better the classification performance of the manipulated model became. The explanation similarity, however, moved in the opposite direction, dropping as the weight increased. Apparently, for this particular Fashion-MNIST class, I cannot have my cake and eat it too. \ud83d\ude09<\/p>\n<h4>Additional results<\/h4>\n<p>Because I had some time left towards the end of my thesis, I also wanted to know whether I could achieve good results for other Fashion-MNIST classes with the two XAI methods I chose and whether it is possible to manipulate other XAI methods in the same way.<\/p>\n<p>Regarding the first case, I tried to manipulate the classes <i>Trousers<\/i> and <i>Dress<\/i>, as well as both of them combined.
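As a side note, the class-weighting used in the ablation above can be sketched in a few lines of PyTorch. Only the `weight` parameter of `CrossEntropyLoss` comes from the actual PyTorch API; the class index for <i>Coat<\/i> follows the standard Fashion-MNIST labeling, and the weight value of 5.0 is purely illustrative, not a value from the thesis:

```python
import torch
import torch.nn as nn

# Fashion-MNIST has 10 classes; index 4 is "Coat".
# The weight 5.0 is illustrative -- in the ablation, several values
# were tried while all other classes stayed at 1.
class_weights = torch.ones(10)
class_weights[4] = 5.0

# `weight` rescales each class's contribution to the loss, so
# misclassifying "Coat" images is penalized more heavily.
criterion = nn.CrossEntropyLoss(weight=class_weights)

torch.manual_seed(0)
logits = torch.randn(8, 10)           # a batch of 8 model outputs
targets = torch.randint(0, 10, (8,))  # ground-truth class indices
loss = criterion(logits, targets)     # scalar, non-negative
```

In the fine-tuning setup, this weighted term would simply replace the unweighted classification loss while the explanation-similarity term stays unchanged.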
Without much hyperparameter tuning (rather, I just guessed which parameters could work well) I got quite decent results as you can see in Figure 11 for Grad-CAM.<\/p>\n<figure id=\"attachment_32661\" aria-describedby=\"caption-attachment-32661\" style=\"width: 950px\" class=\"wp-caption aligncenter\"><img loading=\"lazy\" decoding=\"async\" class=\"wp-image-32661 \" src=\"https:\/\/www.inovex.de\/wp-content\/uploads\/FashionMNIST_GradCAM_Trousers_Dress_and_merged_similarities_white.png\" alt=\"Figure 11: Box plot visualizations of pre- and post-manipulation explanation similarities for Grad-CAM and classes Trousers, Dress, and their combination.\" width=\"950\" height=\"561\" srcset=\"https:\/\/www.inovex.de\/wp-content\/uploads\/FashionMNIST_GradCAM_Trousers_Dress_and_merged_similarities_white.png 1162w, https:\/\/www.inovex.de\/wp-content\/uploads\/FashionMNIST_GradCAM_Trousers_Dress_and_merged_similarities_white-300x177.png 300w, https:\/\/www.inovex.de\/wp-content\/uploads\/FashionMNIST_GradCAM_Trousers_Dress_and_merged_similarities_white-1024x605.png 1024w, https:\/\/www.inovex.de\/wp-content\/uploads\/FashionMNIST_GradCAM_Trousers_Dress_and_merged_similarities_white-768x453.png 768w, https:\/\/www.inovex.de\/wp-content\/uploads\/FashionMNIST_GradCAM_Trousers_Dress_and_merged_similarities_white-400x236.png 400w, https:\/\/www.inovex.de\/wp-content\/uploads\/FashionMNIST_GradCAM_Trousers_Dress_and_merged_similarities_white-360x213.png 360w\" sizes=\"auto, (max-width: 950px) 100vw, 950px\" \/><figcaption id=\"caption-attachment-32661\" class=\"wp-caption-text\">Figure 11: Box plot visualizations of pre- and post-manipulation explanation similarities for Grad-CAM and classes Trousers, Dress, and their combination.<\/figcaption><\/figure>\n<p>Astonishingly, the manipulation also worked for the XAI techniques Gradients * Inputs and Integrated Gradients. 
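For context, the PCC similarity reported throughout these figures is simply the Pearson correlation coefficient between the two flattened attribution maps: +1 for explanations that vary together perfectly, -1 for perfectly anti-correlated ones. A minimal NumPy sketch (the helper name and toy inputs are mine, not from the thesis code):

```python
import numpy as np

def explanation_pcc(expl_a: np.ndarray, expl_b: np.ndarray) -> float:
    """Pearson correlation between two flattened attribution maps."""
    a = expl_a.ravel().astype(np.float64)
    b = expl_b.ravel().astype(np.float64)
    # np.corrcoef returns the 2x2 correlation matrix; the
    # off-diagonal entry is the PCC between a and b.
    return float(np.corrcoef(a, b)[0, 1])

# Identical maps correlate perfectly ...
m = np.arange(16, dtype=np.float64).reshape(4, 4)
print(round(explanation_pcc(m, m), 6))   # 1.0
# ... while an inverted map is perfectly anti-correlated.
print(round(explanation_pcc(m, -m), 6))  # -1.0
```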
In Figure 12, you can see some example manipulation results for Integrated Gradients on target class <i>Dress<\/i>. After the manipulation, the explanations look strikingly similar.<\/p>\n<figure id=\"attachment_32646\" aria-describedby=\"caption-attachment-32646\" style=\"width: 940px\" class=\"wp-caption aligncenter\"><img loading=\"lazy\" decoding=\"async\" class=\"wp-image-32646\" src=\"https:\/\/www.inovex.de\/wp-content\/uploads\/integrated_gradients_dress-1024x715.png\" alt=\"Figure 12: Pre- and post-manipulation explanations for original and adversarial images created using Integrated Gradients and class Dress.\" width=\"940\" height=\"656\" srcset=\"https:\/\/www.inovex.de\/wp-content\/uploads\/integrated_gradients_dress-1024x715.png 1024w, https:\/\/www.inovex.de\/wp-content\/uploads\/integrated_gradients_dress-300x209.png 300w, https:\/\/www.inovex.de\/wp-content\/uploads\/integrated_gradients_dress-768x536.png 768w, https:\/\/www.inovex.de\/wp-content\/uploads\/integrated_gradients_dress-400x279.png 400w, https:\/\/www.inovex.de\/wp-content\/uploads\/integrated_gradients_dress-360x251.png 360w, https:\/\/www.inovex.de\/wp-content\/uploads\/integrated_gradients_dress.png 1232w\" sizes=\"auto, (max-width: 940px) 100vw, 940px\" \/><figcaption id=\"caption-attachment-32646\" class=\"wp-caption-text\">Figure 12: Pre- and post-manipulation explanations for original and adversarial images created using Integrated Gradients and class Dress.<\/figcaption><\/figure>\n<h2><span class=\"ez-toc-section\" id=\"Summary-conclusion\"><\/span>Summary &amp; conclusion<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>To summarize, most of the adversarial fine-tuning experiments I conducted in my thesis were quite successful.
The three main goals of 1) retaining the classification performance on the original images, 2) ensuring the adversarial images are consistently misclassified, and 3) manipulating the model in such a way that the original and adversarial explanations look very similar were achieved in most cases. Only when manipulating based on Guided Backpropagation and class <i>Coat<\/i> was I unable to meet all of them.<\/p>\n<p>Generally, these findings are in line with previous works by Dombrowski et al., Ghorbani et al., and Heo et al., who also manipulated explanations, albeit with approaches that differ from mine. You should definitely check out their work if you want to know where I got my inspiration!<\/p>\n<p>Finally, all of these results in the space of attacking XAI techniques leave us with a few open questions:<\/p>\n<ul>\n<li>Which other XAI techniques can be manipulated in this way?<\/li>\n<li>Can we come up with a standardized and reproducible approach to evaluate the robustness of XAI techniques?<\/li>\n<li>And what are the implications of these findings for the use of XAI techniques outside of scientific research?<\/li>\n<\/ul>\n<p>I hope this post was able to spark your interest in attacks on Explainable Artificial Intelligence \u2013 I, for my part, am excited to see what directions the research in this area takes and how this in turn will affect machine learning applications in the future.<\/p>\n<h2><span class=\"ez-toc-section\" id=\"References\"><\/span>References<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<h3>GitHub repository<\/h3>\n<ul>\n<li>You can find the code for the master&#8217;s thesis in the following GitHub repo: <a href=\"https:\/\/github.com\/inovex\/hiding-adversarial-attacks\">https:\/\/github.com\/inovex\/hiding-adversarial-attacks<\/a><\/li>\n<\/ul>\n<h3 id=\"references-papers\">Papers<\/h3>\n<ul>\n<li>[Moosavi-Dezfooli et al., 2016] &#8211; <a
href=\"https:\/\/www.cv-foundation.org\/openaccess\/content_cvpr_2016\/html\/Moosavi-Dezfooli_DeepFool_A_Simple_CVPR_2016_paper.html\" target=\"_blank\" rel=\"noopener\">DeepFool: A Simple and Accurate Method to Fool Deep Neural Networks<\/a><\/li>\n<li>[Dombrowski et al., 2019] &#8211; <a href=\"https:\/\/papers.nips.cc\/paper\/2019\/hash\/bb836c01cdc9120a9c984c525e4b1a4a-Abstract.html\" target=\"_blank\" rel=\"noopener\">Explanations can be manipulated and geometry is to blame<\/a><\/li>\n<li>[Selvaraju et al., 2017] &#8211; <a href=\"https:\/\/arxiv.org\/abs\/1610.02391\" target=\"_blank\" rel=\"noopener\">Grad-CAM: Visual Explanations from Deep Networks via Gradient-based Localization<\/a><\/li>\n<li>[Springenberg et al., 2015] &#8211; <a href=\"https:\/\/lmb.informatik.uni-freiburg.de\/Publications\/2015\/DB15a\/\" target=\"_blank\" rel=\"noopener\">Striving for Simplicity: The All Convolutional Net<\/a><\/li>\n<\/ul>\n","protected":false},"excerpt":{"rendered":"<p>Deep neural networks are generally considered black boxes: We often do not understand which input features a model&#8217;s decisions are based on. Explainable Artificial Intelligence (XAI) techniques promise to offer a peek inside the box \u2013 but how robust are they? 
In this blog post, I am going to show you how to hide an [&hellip;]<\/p>\n","protected":false},"author":265,"featured_media":34999,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"inline_featured_image":false,"ep_exclude_from_search":false,"footnotes":""},"tags":[511,150,264],"service":[76],"coauthors":[{"id":265,"display_name":"Stefanie Stoppel","user_nicename":"sstoppel"}],"class_list":["post-32840","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","tag-artificial-intelligence-2","tag-computer-vision","tag-ml-interpretability","service-artificial-intelligence"],"acf":[]}