Machine Learning Interpretability: Explaining Blackbox Models with LIME (Part II)

Posted on: May 27, 2019

This is the second part of our series about Machine Learning interpretability. We want to describe LIME (Local Interpretable Model-Agnostic Explanations), a popular technique to explain blackbox models. It was proposed by Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin in their paper “Why Should I Trust You?”: Explaining the Predictions of Any Classifier, which they first presented at the ACM SIGKDD Conference on Knowledge Discovery and Data Mining in 2016. Please check out our previous article if you are not familiar with the concept of interpretability.

We previously made a distinction between model-specific and model-agnostic techniques as well as between global and local techniques. LIME can be classified as a model-agnostic technique with a local scope. In other words, it enables us to explain particular predictions of any model. In contrast to the techniques described in the first article, LIME is even applicable to models for text and image classification.

What is LIME?

The idea behind LIME is to approximate a complex model locally by an interpretable model and to use that simple model to explain a prediction of a particular instance of interest. It’s comparable to the Global Surrogate technique, but it differs in the fact that it is based on sampled instances that are weighted by proximity to the instance to be explained. Hence, instead of trying to capture the overall behavior of the model, LIME just attempts to be locally faithful to the classifier. The following image shows the non-linear decision boundary of some complex classifier. LIME fitted a linear model, represented by the dashed decision boundary, to sampled instances. Notice that this simple model mimics the complex model in the vicinity of the instance of interest sufficiently well.

Local Faithfulness of LIME

The red and blue areas are separated by the decision boundary of a complex, non-linear model. LIME determined a linear model based on weighted data points to explain the complex model’s prediction for the highlighted instance. (source: https://arxiv.org/abs/1602.04938)

Unlike other interpretability techniques, explanations that are generated by LIME are based on so-called interpretable components which can completely differ from the input features of the original model. Basically, interpretable components are representations of the underlying data which are understandable to humans. For example, an interpretable component can be a subset of words of a text corpus or a contiguous region of an image. The concept of interpretable components enables LIME to be generalizable to high dimensional domains like text or image classification.

Whatever type of model you try to explain with LIME, you have to think about appropriate ways to determine interpretable components. Common approaches are the bag-of-words model for text classification or pixel segmentation for image classification. Based on that mapping, LIME acts on binary vectors that indicate the presence or absence of interpretable components, while the original model can be based on much more complex features. LIME estimates the strength and direction of influence of each interpretable component by fitting interpretable models to perturbed instances of those binary vectors. In doing so, it evaluates how perturbations (e.g. removing words from a text, hiding parts of an image) affect the prediction of the original model.
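To make the binary representation more tangible, here is a minimal sketch for the text case. It assumes a very simple mapping in which every distinct word is one interpretable component; the function names (interpretable_components, apply_mask, sample_masks) are our own illustrations and not part of any library.

```python
import random

def interpretable_components(text):
    """Map a text to its distinct words, in order of first appearance."""
    seen = []
    for word in text.lower().split():
        if word not in seen:
            seen.append(word)
    return seen

def apply_mask(text, components, mask):
    """Rebuild the text, dropping every word whose component is switched off."""
    active = {c for c, bit in zip(components, mask) if bit == 1}
    return " ".join(w for w in text.split() if w.lower() in active)

def sample_masks(n_components, n_samples, seed=0):
    """Draw random presence/absence vectors over the components."""
    rng = random.Random(seed)
    return [[rng.randint(0, 1) for _ in range(n_components)]
            for _ in range(n_samples)]

text = "the squirrel hides the nut"
components = interpretable_components(text)  # ['the', 'squirrel', 'hides', 'nut']
original = [1] * len(components)             # every component is present

# Switching off the third component removes the word "hides".
print(apply_mask(text, components, [1, 1, 0, 1]))  # the squirrel the nut
```

Each perturbed text would then be passed to the original classifier, which never sees the binary vector itself.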

Interpretable Components of an image

LIME determined contiguous regions of the original image by pixel segmentation (source: https://arxiv.org/abs/1602.04938)
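For images, switching a component off roughly means hiding one segment. The following sketch assumes a toy 4x4 grayscale image and a hand-made segmentation into three regions; in practice the segmentation would come from a superpixel algorithm, and hide_segments is a hypothetical helper of ours.

```python
def hide_segments(image, segments, mask, fill=0):
    """Return a copy of the image in which switched-off segments are filled."""
    return [[pixel if mask[seg] == 1 else fill
             for pixel, seg in zip(img_row, seg_row)]
            for img_row, seg_row in zip(image, segments)]

# Toy grayscale "image" and a made-up segmentation into regions 0, 1 and 2.
image = [[9, 9, 5, 5],
         [9, 9, 5, 5],
         [7, 7, 7, 7],
         [7, 7, 7, 7]]
segments = [[0, 0, 1, 1],
            [0, 0, 1, 1],
            [2, 2, 2, 2],
            [2, 2, 2, 2]]

# Binary vector [1, 0, 1]: keep regions 0 and 2, hide region 1.
perturbed = hide_segments(image, segments, mask=[1, 0, 1])
for row in perturbed:
    print(row)
```

The fill value is a design choice: a constant colour is the simplest option, but a mean colour per segment would work in the same way.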

Technically, any interpretable model can be used within the LIME framework. When fitting interpretable models, LIME attempts to balance simplicity and local fidelity. Therefore, a measure of complexity and a locality-aware loss are needed. In most practical cases, sparse linear models, especially the LASSO, perform sufficiently well. The complexity of LASSO models can be measured by the number of non-zero weights. Local fidelity is usually measured by a squared loss in which each sampled instance is weighted by an exponential kernel of its distance to the instance of interest.
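In the notation of the original paper, this trade-off can be written as follows. Here f is the original model, g an interpretable model from a class G, Ω(g) its complexity, π_x the proximity kernel around the instance x, z a perturbed sample and z' its binary interpretable representation. The distance function D and the kernel width σ are choices left to the user.

```latex
\xi(x) = \operatorname*{arg\,min}_{g \in G} \; \mathcal{L}(f, g, \pi_x) + \Omega(g)

\mathcal{L}(f, g, \pi_x) = \sum_{z, z' \in \mathcal{Z}} \pi_x(z) \, \bigl( f(z) - g(z') \bigr)^2,
\qquad
\pi_x(z) = \exp\!\left( -\frac{D(x, z)^2}{\sigma^2} \right)
```

For sparse linear models, Ω(g) limits the number of non-zero weights, which is what keeps the explanation short enough to read.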

Now, let’s put it all together to describe each step of the LIME procedure:

  1. Determine an interpretable representation of the instance of interest
  2. Draw a sample by disturbing the interpretable representation
  3. Apply the original model to the perturbed instances
  4. Fit an interpretable model to proximity-weighted sampled instances and the predictions of the original model
  5. Use the interpretable model to draw conclusions about the relevance of each interpretable component

Application of LIME to an image classification task

LIME applies a complex model to perturbed instances of the original image and fits an interpretable model to the outcome in order to estimate the importance of each interpretable component. (source: https://arxiv.org/abs/1602.04938)
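The five steps listed above can be sketched end to end in a few lines of plain Python. This is a simplified illustration under several assumptions, not the reference implementation: the black_box function is a made-up stand-in for a complex model, the distance is the share of removed components, the kernel width of 0.75 is an arbitrary choice, and the interpretable model is a weighted least-squares fit with a tiny ridge term instead of the LASSO.

```python
import math
import random

def black_box(mask):
    """Made-up stand-in for a complex model; non-linear in the components."""
    m0, m1, m2, _m3 = mask  # the fourth component is deliberately ignored
    return 1.0 / (1.0 + math.exp(-(3 * m0 + m0 * m1 - 3 * m2)))

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x

def lime_explain(predict, n_components, n_samples=1000, kernel_width=0.75, seed=0):
    rng = random.Random(seed)
    # Step 1: in the interpretable representation, the instance of interest
    # has every component present.
    instance = [1] * n_components
    # Step 2: perturb the representation by switching components off at random.
    samples = [instance] + [[rng.randint(0, 1) for _ in range(n_components)]
                            for _ in range(n_samples)]
    # Step 3: apply the original model to every perturbed instance.
    targets = [predict(s) for s in samples]

    # Proximity: exponential kernel on the share of removed components.
    def proximity(s):
        distance = s.count(0) / n_components
        return math.exp(-(distance ** 2) / kernel_width ** 2)

    weights = [proximity(s) for s in samples]
    # Step 4: weighted least squares with intercept via the normal equations.
    rows = [[1.0] + [float(bit) for bit in s] for s in samples]
    dim = n_components + 1
    xtwx = [[sum(w * r[i] * r[j] for w, r in zip(weights, rows))
             for j in range(dim)] for i in range(dim)]
    for i in range(1, dim):
        xtwx[i][i] += 1e-6  # tiny ridge term for numerical stability
    xtwy = [sum(w * r[i] * y for w, r, y in zip(weights, rows, targets))
            for i in range(dim)]
    coefficients = solve(xtwx, xtwy)
    # Step 5: the coefficients (without the intercept) estimate the relevance
    # of each interpretable component.
    return coefficients[1:]

relevance = lime_explain(black_box, n_components=4)
for index, value in enumerate(relevance):
    print(f"component {index}: {value:+.3f}")
```

With this toy model, the first component should receive a clearly positive weight, the third a clearly negative one, and the ignored fourth component a weight close to zero.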

How to use LIME?

Enough has been said about the theory behind LIME. Let’s finally use it to explain some predictions. We’ll start with an image classifier:

Image recognition

Let’s import all relevant libraries and a pre-trained Inception-V3 for image classification. Inception-V3 is a deep neural network proposed by Szegedy et al. (Google) which is composed of 42 layers with roughly 24 million parameters. As described in this LIME tutorial, we cloned the fork of tf-slim and put the pre-trained Inception-V3 into tf-models/slim/pretrained.

In addition, let’s define some functions like pre-process and predict:

Now to the image recognition task: We want the classifier to recognize the squirrel in the following image. Let’s see if Inception-V3 is able to spot it:

squirrel prediction

Actually, the Inception net is quite sure that there is a fox squirrel in the image. Pretty impressive!

Now that we have observed the outcome of our complex classifier, let’s apply LIME to identify the parts of the image with the most influence on the squirrel prediction. To do this, we feed an explainer with the squirrel image and pass it the number of patches to be determined (num_features) along with the number of perturbations of the interpretable representation (num_samples):

Ok, let’s see which parts of the image contributed most to the squirrel prediction:

squirrel explanation

It was the squirrel’s face and its bushy tail. Both are quite reasonable causes for the squirrel prediction. Well done, Inception!

Tabular data

LIME is not only applicable to image data. We can also use it to explain predictions that are based on tabular data. As biology teaches us, squirrels are mammals. Let’s find out whether a neural net classifier is able to learn and predict this, too. We therefore train a multi-layer perceptron on the Zoo Data Set, a small tabular dataset in which various species are categorized into seven classes based on 16 different features:

We imported all necessary packages, loaded the dataset and did some preprocessing on the data, such as removing the observation for the squirrel from the training dataset. Finally, we trained the neural net on 75% of the original dataset.

Now, let’s see which class the classifier assigns to the squirrel:

As we can see, the classifier assigns the highest probability of 88.1% to the mammal class, which is actually correct. Nice!

But why does the model decide upon the mammal class? Since the neural network is a complex, non-linear model, it is difficult for us as humans to understand the model’s decision process. Yet, we can leverage LIME to support us in comprehending its decision. Let’s have a look at the LIME estimate of relevant features for the squirrel prediction:

We took the calculated class probabilities and listed the impact of all features (16 in total) for the top class. In this particular case, the class is mammal.

The graph below shows the estimated impact of each feature. Bars starting in the middle and expanding to the left (colored in turquoise) decrease the predicted probability that the instance in question belongs to the class mammal. Bars in blue expanding to the right increase the predicted probability that the squirrel is a mammal. We can see that most features had a positive contribution to the probability of the mammal class.

For example, the fact that a squirrel gives milk to feed its offspring and that it is not venomous are plausible explanations, and it makes sense that these two facts had a positive contribution to the final prediction. Furthermore, the length of a bar indicates the magnitude of the feature’s influence. Thus, giving milk, having fur and not laying eggs are the most influential or rather most discriminative features for a squirrel being a mammal.
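Reading such a plot boils down to two rules: the sign of a weight gives the direction of the influence, and its absolute value gives the magnitude. The sketch below applies both rules to a list of (feature, weight) pairs. The feature names and numbers are invented placeholders, not the values behind the plot, and summarize is a hypothetical helper of ours.

```python
def summarize(explanation, top_k=3):
    """Rank features by absolute weight and label the direction of each."""
    ranked = sorted(explanation, key=lambda item: abs(item[1]), reverse=True)
    summary = []
    for name, weight in ranked[:top_k]:
        if weight > 0:
            direction = "supports"
        elif weight < 0:
            direction = "opposes"
        else:
            direction = "neutral"
        summary.append((name, direction, abs(weight)))
    return summary

# Placeholder (feature, weight) pairs; the values are made up for illustration.
explanation = [
    ("feature_a", 0.05),
    ("feature_b", -0.12),
    ("feature_c", 0.30),
    ("feature_d", 0.18),
]

for name, direction, magnitude in summarize(explanation):
    print(f"{name}: {direction} the class (|weight| = {magnitude:.2f})")
```

Positive weights correspond to the bars expanding to the right and negative weights to the bars expanding to the left.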

Conclusion

While most of the techniques that were described in the first blog post are global techniques, LIME is a technique that has a local scope. Therefore, LIME allows us to explain particular predictions of any classifier. The LIME framework is flexible in the sense that any interpretable model can be used to explain predictions. Furthermore, the concept of interpretable components enables LIME to be applicable to high dimensional domains such as image or text classification. Overall, LIME can be used to support model selection and to generate trust by reviewing expressive examples.

Read on

You might want to have a look at our deep learning portfolio. If you’re looking for new challenges, you might also want to consider our job offerings for Data Scientists, ML Engineers or BI Developers.

