Named entity recognition series:
- Introduction To Named Entity Recognition In Python
- Named Entity Recognition With Conditional Random Fields In Python
- Guide To Sequence Tagging With Neural Networks In Python
- Sequence Tagging With A LSTM-CRF
- Enhancing LSTMs With Character Embeddings For Named Entity Recognition
- State-Of-The-Art Named Entity Recognition With Residual LSTM And ELMo
- Evaluate Sequence Models In Python
- Named Entity Recognition With Bert
- Interpretable Named Entity Recognition With Keras And LIME
In the previous posts, we saw how to build strong and versatile named entity recognition systems and how to properly evaluate them. But often you want to understand your model beyond the metrics. So in this tutorial I will show you how you can build an explainable and interpretable NER system with keras and the LIME algorithm.
What does explainable mean?
Deep neural networks are quite successful in many use cases, but these models can be hard to debug and it is often unclear what is going on inside them. Our aim is to understand how much certain words influence the prediction of our named entity tagger. We want a human-understandable, qualitative explanation that enables an interpretation of the underlying algorithm.
Load the data
We use the dataset you already know from my previous posts about named entity recognition.
import pandas as pd
import numpy as np
from tqdm import tqdm, trange
data = pd.read_csv("ner_dataset.csv", encoding="latin1").fillna(method="ffill")
data.tail(10)
words = list(set(data["Word"].values))
n_words = len(words); n_words
tags = list(set(data["Tag"].values))
n_tags = len(tags); n_tags
class SentenceGetter(object):
    def __init__(self, data):
        self.n_sent = 1
        self.data = data
        self.empty = False
        agg_func = lambda s: [(w, p, t) for w, p, t in zip(s["Word"].values.tolist(),
                                                           s["POS"].values.tolist(),
                                                           s["Tag"].values.tolist())]
        self.grouped = self.data.groupby("Sentence #").apply(agg_func)
        self.sentences = [s for s in self.grouped]

    def get_next(self):
        try:
            s = self.grouped["Sentence: {}".format(self.n_sent)]
            self.n_sent += 1
            return s
        except:
            return None
getter = SentenceGetter(data)
sentences = getter.sentences
This is what the sentences in the dataset look like.
labels = [[s[2] for s in sent] for sent in sentences]
sentences = [" ".join([s[0] for s in sent]) for sent in sentences]
sentences[0]
The sentences are annotated with the BIO scheme, so the labels look like this.
print(labels[0])
Preprocess the data
We first build a vocabulary of the 5000 most common words and map all remaining words to the “UNK” token.
from collections import Counter
from keras.preprocessing.sequence import pad_sequences
word_cnt = Counter(data["Word"].values)
vocabulary = set(w[0] for w in word_cnt.most_common(5000))
Now we create the word index and pad the sequence to a common length.
max_len = 50
word2idx = {"PAD": 0, "UNK": 1}
# offset by 2 so word indices don't collide with the special PAD and UNK tokens
word2idx.update({w: i + 2 for i, w in enumerate(words) if w in vocabulary})
tag2idx = {t: i for i, t in enumerate(tags)}
X = [[word2idx.get(w, word2idx["UNK"]) for w in s.split()] for s in sentences]
X = pad_sequences(maxlen=max_len, sequences=X, padding="post", value=word2idx["PAD"])
y = [[tag2idx[l_i] for l_i in l] for l in labels]
y = pad_sequences(maxlen=max_len, sequences=y, padding="post", value=tag2idx["O"])
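As a quick sanity check (this decoding step is just for illustration and not part of the original pipeline), we can map the first encoded sentence back to words and tags to confirm that the padding and the UNK mapping behave as expected.
idx2word = {i: w for w, i in word2idx.items()}
idx2tag = {i: t for t, i in tag2idx.items()}
# decode the first padded sequence back to words and tags
print([idx2word[i] for i in X[0][:15]])
print([idx2tag[i] for i in y[0][:15]])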
Lastly, we split the data into a training and a test set.
from sklearn.model_selection import train_test_split
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.1, shuffle=False)
Now we are ready to build our model.
Set up the NER model
We use the simple LSTM model from this earlier post. But the procedure shown here applies to all kinds of sequence models.
from keras.models import Model, Input
from keras.layers import LSTM, Embedding, Dense, TimeDistributed, SpatialDropout1D, Bidirectional
word_input = Input(shape=(max_len,))
model = Embedding(input_dim=n_words + 2, output_dim=50, input_length=max_len)(word_input)
model = SpatialDropout1D(0.1)(model)
model = Bidirectional(LSTM(units=100, return_sequences=True, recurrent_dropout=0.1))(model)
out = TimeDistributed(Dense(n_tags, activation="softmax"))(model)
model = Model(word_input, out)
model.compile(optimizer="rmsprop",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
history = model.fit(X_tr, y_tr.reshape(*y_tr.shape, 1),
                    batch_size=32, epochs=5,
                    validation_split=0.1, verbose=1)
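Before moving on to the explanations, it can be useful to glance at the raw predictions on a held-out sentence. This is only a small illustrative check; the decoding below is not part of the original code.
# predict tags for the first test sentence and decode them (idx2tag from the check above)
p = model.predict(X_te[:1])                  # shape (1, max_len, n_tags)
pred_tags = [idx2tag[i] for i in p.argmax(axis=-1)[0]]
print(pred_tags[:10])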
Now look at the predictions and explain them
To explain the predictions, we use the LIME algorithm implemented in the eli5 library. We assume you already know what the algorithm is doing. You can read more about it in this post.
from eli5.lime import TextExplainer
from eli5.lime.samplers import MaskingTextSampler
Now we create a small python class that holds the preprocessing and the prediction of our model. To apply LIME we just need a function that makes predictions on texts. We use the closure pattern in get_predict_function, which returns a function that takes a list of texts, preprocesses them, and returns the predictions of our previously trained model.
The trick
To make the LIME algorithm work for us, we need to rephrase our problem as a simple multi-class classification problem. We do this by selecting beforehand the word for which we want to explain the prediction. This is done by passing the word_index to the get_predict_function method.
class NERExplainerGenerator(object):
    def __init__(self, model, word2idx, tag2idx, max_len):
        self.model = model
        self.word2idx = word2idx
        self.tag2idx = tag2idx
        self.idx2tag = {v: k for k, v in tag2idx.items()}
        self.max_len = max_len

    def _preprocess(self, texts):
        X = [[self.word2idx.get(w, self.word2idx["UNK"]) for w in t.split()]
             for t in texts]
        X = pad_sequences(maxlen=self.max_len, sequences=X,
                          padding="post", value=self.word2idx["PAD"])
        return X

    def get_predict_function(self, word_index):
        def predict_func(texts):
            X = self._preprocess(texts)
            p = self.model.predict(X)
            return p[:, word_index, :]
        return predict_func
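To see what this gives us, here is a quick check with a made-up sentence (the text and the word index are purely illustrative): the returned function maps a list of raw strings to one probability distribution over all tags for the chosen word position.
gen = NERExplainerGenerator(model, word2idx, tag2idx, max_len)
demo_func = gen.get_predict_function(word_index=1)
# one row per input text, one column per tag
print(demo_func(["President Obasanjo visited Nigeria"]).shape)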
Let’s have a look at some interesting samples, for example the sentence at index 46781 in our data set.
index = 46781
label = labels[index]
text = sentences[index]
print(text)
print()
print(" ".join([f"{t} ({l})" for t, l in zip(text.split(), label)]))
for i, w in enumerate(text.split()):
    print(f"{i}: {w}")
Now we can start explaining the predictions. First we initialize our generator object.
explainer_generator = NERExplainerGenerator(model, word2idx, tag2idx, max_len)
We want to explain the NER prediction for the word “Obasanjo”, so we pick word_index=4 and generate the respective prediction function.
word_index = 4
predict_func = explainer_generator.get_predict_function(word_index=word_index)
Here we have to specify a sampler for the LIME algorithm. This controls how the algorithm generates perturbed versions of the text we want to explain. Read more about this in this article or the eli5 documentation.
sampler = MaskingTextSampler(
    replacement="UNK",
    max_replace=0.7,
    token_pattern=None,
    bow=False
)
samples, similarity = sampler.sample_near(text, n_samples=4)
print(samples)
Finally, we set up the TextExplainer and explain the prediction.
te = TextExplainer(
    sampler=sampler,
    position_dependent=True,
    random_state=42
)
te.fit(text, predict_func)
te.explain_prediction(
    target_names=list(explainer_generator.idx2tag.values()),
    top_targets=3
)
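If you want a rough idea of how faithful the local surrogate model is to our LSTM on the sampled neighbourhood, eli5 exposes quality metrics on the fitted explainer; the exact numbers are of course data-dependent.
# mean KL divergence and score of the white-box surrogate fitted by LIME
print(te.metrics_)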
Very nice! As expected, the model predicted I-per for the later part of a person's name. The word President is a strong indicator that the following word is part of a name, which suggests that in this dataset President often directly precedes an annotated person.
In this article you learned a handy method to dig deeper into what your named entity recognition system does, how it interacts with your dataset, and which signals it picks up. I hope you found it useful and enjoyed it. See you next time.
07/16/2019 at 2:24 pm
Hi Tobias,
Thank you for your great post.
I was trying to see if the lime algorithm can be used to explain the output from the Bert model you provided in your previous posts, but I am getting this error,
AttributeError: ‘ResNet’ object has no attribute ‘predict’
when I run the last part of the code in this page.
I have not changed anything from your code, so I was wondering if the lime algorithm works with Bert model or pytorch at all?
If there is a way out, could you please explain?
Thank you very much
07/16/2019 at 3:33 pm
Hi Zara,
it should work with Bert. You just have to pass a predict function to lime. That function should take the raw texts and produce predictions from them. To use the NERExplainerGenerator you need to rewrite it to work with Bert, which means writing the preprocessing and the call to the Bert model in this class. Does this help you? A rough sketch of what I mean is below.
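This is only a sketch, not tested against the code from the Bert post: it assumes a fine-tuned token-classification model loaded with a recent version of the Hugging Face transformers library, and the model path is a placeholder.
import numpy as np
import torch
from transformers import BertTokenizerFast, BertForTokenClassification

# placeholder path to your fine-tuned NER model
tokenizer = BertTokenizerFast.from_pretrained("path/to/finetuned-bert-ner")
bert_model = BertForTokenClassification.from_pretrained("path/to/finetuned-bert-ner")
bert_model.eval()

def get_bert_predict_function(word_index):
    def predict_func(texts):
        probs = []
        for text in texts:
            # tokenize pre-split words so we can map sub-tokens back to words
            enc = tokenizer(text.split(), is_split_into_words=True,
                            truncation=True, return_tensors="pt")
            with torch.no_grad():
                logits = bert_model(**enc).logits[0]      # (n_subtokens, n_tags)
            token_probs = torch.softmax(logits, dim=-1)
            word_ids = enc.word_ids()
            try:
                # take the first sub-token of the word we want to explain
                pos = word_ids.index(word_index)
                probs.append(token_probs[pos].numpy())
            except ValueError:
                # the word was truncated away; fall back to a uniform distribution
                n_tags = token_probs.shape[-1]
                probs.append(np.full(n_tags, 1.0 / n_tags))
        return np.array(probs)
    return predict_func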
07/17/2019 at 8:11 am
Yes, thank you for the explanation. It helps a lot.