You may know the LIME algorithm from some of my earlier blog posts. It can be quite useful for “debugging” data sets and understanding machine learning models better. But LIME is fooled very easily. Here we use the eli5 TextExplainer, which is based on LIME, and the 20 Newsgroups data set to show how LIME can fail.

In [1]:
import numpy as np
from sklearn.datasets import fetch_20newsgroups
from eli5.lime import TextExplainer

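# we only look at the case atheism vs. computer graphics
# assumption: the second category is 'comp.graphics' (taken from the comment above);
# any further arguments of the original call are unknown and left at their defaults
data = fetch_20newsgroups(categories=['alt.atheism', 'comp.graphics'])
X, y = data.data, data.target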

te = TextExplainer(random_state=42, n_samples=10000)

Now we code a fairly stupid black-box model that just uses the length of the longest token in the document as a feature and splits the data based on this feature.

In [2]:
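class BlackBoxModel():
    def fit(self, X=None, y=None):
        # nothing to learn: the decision rule below is hard-coded
        return self
    def predict(self, X, y=None):
        # class 1 if the longest space-separated token has at least 27 characters
        return np.array([int(max([len(x_ii) for x_ii in x_i.split(" ")]) >= 27) for x_i in X])
    def predict_proba(self, X, y=None):
        # assumed reconstruction: one-hot "probabilities" derived from predict
        return np.array([1 - self.predict(X), self.predict(X)]).T
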
In [3]:
model = BlackBoxModel()
model.fit(X, y)
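
To make the decision rule concrete, here is a minimal standalone sketch of the single feature the model relies on. The helper name `longest_token_length` and the two toy documents are my own illustrative assumptions and not part of the notebook; the threshold of 27 is the one used in `predict`.

```python
def longest_token_length(doc):
    """Length of the longest space-separated token (hypothetical helper)."""
    return max(len(token) for token in doc.split(" "))

THRESHOLD = 27  # same cut-off as in the black-box model's predict method

docs = [
    "short words only",
    "From: af774@cleveland.Freenet.Edu (Chad Cipiti)",
]

for doc in docs:
    length = longest_token_length(doc)
    # prints the feature value and the resulting class label (0 or 1)
    print(length, int(length >= THRESHOLD))
```

The second toy document ends up in class 1 only because the email address happens to be exactly 27 characters long; the content of the text plays no role at all.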

Let’s explain the predictions of our black-box model:

In [4]:
print(X[2])

From: af774@cleveland.Freenet.Edu (Chad Cipiti)
Subject: Good shareware paint and/or animation software for SGI?
Organization: Case Western Reserve University, Cleveland, OH (USA)
Lines: 15
Reply-To: af774@cleveland.Freenet.Edu (Chad Cipiti)

Does anyone know of any good shareware animation or paint software for an SGI
 machine?  I've exhausted everyplace on the net I can find and still don't hava
 a nice piece of software.

Thanks alot!


This post is clearly about computer graphics. So let’s see what our black-box model predicts and how LIME would explain it.

In [5]:
y_pred = model.predict([X[2]])

So the prediction is correct, but let’s see if we can understand why.

In [6]:
te.fit(X[2], model.predict_proba)
te.show_prediction()

Out[6]: (probability 0.999, score 6.993) top features

+7.976 Highlighted in text (sum)
-0.983 <BIAS>

from: (chad cipiti) subject: good shareware paint and/or animation software for sgi? organization: case western reserve university, cleveland, oh (usa) lines: 15 replyto: (chad cipiti) nntppostinghost: does anyone know of any good shareware animation or paint software for an sgi machine? ive exhausted everyplace on the net i can find and still dont hava a nice piece of software. thanks alot! chad

So we would conclude that email addresses and words like software and university are important, but we know they are not: the model only looks at the length of the longest token. This issue always arises when you are not sure what kind of features your model uses internally. For example, it can also be present in more subtle ways in large language models like BERT. You can find out more about this issue on the eli5 website. Note that it is fairly easy to rationalize this explanation as a human if you are not really careful. This is what makes Explainable AI so dangerous. So be aware of these issues and apply these methods carefully.
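
One plausible way to see where the misleading highlights come from is a simple leave-one-word-out check. This is not LIME itself, only a rough sketch of the kind of perturbation it builds on, and it assumes the explainer removes word-level tokens (matched here with the regular expression `\w+`). The functions `predict` and `flips_when_removed` and the shortened example document are hypothetical and only serve as illustration.

```python
import re

def predict(doc, threshold=27):
    """Same rule as the black-box model: threshold on the longest token."""
    return int(max(len(t) for t in doc.split(" ")) >= threshold)

def flips_when_removed(doc):
    """Return the words whose removal alone changes the prediction."""
    base = predict(doc)
    flipped = []
    for match in re.finditer(r"\w+", doc):
        # cut exactly this word out of the text, keep everything else
        perturbed = doc[:match.start()] + doc[match.end():]
        if predict(perturbed) != base:
            flipped.append(match.group())
    return flipped

# shortened, made-up variant of the header of the example post
doc = "From: af774@cleveland.Freenet.Edu (Chad Cipiti) Subject: paint software"

print(flips_when_removed(doc))
# → ['af774', 'cleveland', 'Freenet', 'Edu']
```

Every piece of the email address looks important, because removing any of them pushes the longest token below 27 characters. A word-level explanation can therefore point at the right place in the text and still say nothing about the feature the model actually uses.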
