How to use magnitude with keras

This time we look at Magnitude, a feature-packed Python package and vector storage file format developed by Plasticity for using vector embeddings in machine learning models in a fast, efficient, and simple manner. We want to take the embeddings Magnitude provides and use them in keras.

Vector space embedding models have become increasingly common in machine learning and traditionally have been popular for natural language processing applications. A fast, lightweight tool to consume these large vector space embedding models efficiently is lacking.

The Magnitude file format (.magnitude) for vector embeddings is intended to be a more efficient universal vector embedding format that allows for lazy-loading for faster cold starts in development, LRU memory caching for performance in production, multiple key queries, direct featurization to the inputs for a neural network, performant similarity calculations, and other nice to have features for edge cases like handling out-of-vocabulary keys or misspelled keys and concatenating multiple vector models together. It also is intended to work with large vector models that may not fit in memory.

Installation

You can install the magnitude package easily with pip:

!pip install pymagnitude


Now you have to download the embedding model you want to use. We picked the fasttext+subword model for a first try. You can also download different and more powerful models like ELMo here.

!wget 'http://magnitude.plasticity.ai/fasttext+subword/wiki-news-300d-1M.magnitude' --output-document 'wiki-news-300d-1M.magnitude'

--2019-02-12 12:46:23--  http://magnitude.plasticity.ai/fasttext+subword/wiki-news-300d-1M.magnitude
Resolving magnitude.plasticity.ai (magnitude.plasticity.ai)... 52.216.129.178
Connecting to magnitude.plasticity.ai (magnitude.plasticity.ai)|52.216.129.178|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1692762112 (1.6G) [application/x-www-form-urlencoded]
Saving to: 'wiki-news-300d-1M.magnitude'

wiki-news-300d-1M.m 100%[===================>]   1.58G  1.13MB/s    in 25m 25s

2019-02-12 13:11:48 (1.06 MB/s) - 'wiki-news-300d-1M.magnitude' saved [1692762112/1692762112]


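MAX_WORDS = 500 # The maximum number of words the sequence model will consider

from pymagnitude import Magnitude, MagnitudeUtils

# pad_to_length sets the sequence length that query results are padded to
vecs = Magnitude('wiki-news-300d-1M.magnitude',
                 #case_insensitive=True,
                 pad_to_length=MAX_WORDS)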


sentence  = vecs.query(["play", "some", "music", "on", "the", "living", "room", "speakers", "."])
sentence.shape

(500, 300)
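To make the fixed output shape more concrete, here is a minimal pure-Python sketch of what padding a sequence of word vectors to a fixed length amounts to. The helper `pad_or_truncate` is hypothetical and not part of pymagnitude, and padding with zeros on the right is an assumption made for illustration.

```python
def pad_or_truncate(vectors, length, dim):
    """Return exactly `length` rows of size `dim`.

    Longer inputs are cut off, shorter ones are filled with zero rows
    on the right (assumed padding scheme, for illustration only).
    """
    rows = [list(v) for v in vectors[:length]]
    rows.extend([0.0] * dim for _ in range(length - len(rows)))
    return rows


# Nine toy "word vectors" of dimension 4, padded to a sequence length of 12
toy_sentence = [[1.0] * 4 for _ in range(9)]
padded = pad_or_truncate(toy_sentence, length=12, dim=4)
print(len(padded), len(padded[0]))  # 12 4
```

Every sentence ends up with the same number of rows, which is what lets a batch of sentences be stacked into one rectangular array for the network.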


Build a keras generator to wrap Magnitude

We load the IMDB movie reviews dataset for sentiment classification. It’s a dataset of 25,000 movie reviews from IMDB, labeled by sentiment (positive/negative). Reviews have been preprocessed, and each review is encoded as a sequence of word indexes (integers).

import numpy as np
from keras.datasets import imdb

(x_train, y_train), (x_test, y_test) = imdb.load_data(
    path="imdb.npz",
    num_words=None,
    skip_top=0,
    maxlen=None,
    seed=113,
    start_char=1,
    oov_char=2,
    index_from=3,
)


word_index = imdb.get_word_index()

inv_word_index = {0: "#PAD#", 1: "#START#", 2: "#OOV#"}
inv_word_index.update({v + 3: k for k, v in word_index.items()})

# map indices back to the words
x_train = [[inv_word_index[x_ij] for x_ij in x_i] for x_i in x_train]
x_test = [[inv_word_index[x_ij] for x_ij in x_i] for x_i in x_test]
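The offset of 3 in the inverse mapping is easy to get wrong, so here is a small self-contained sketch with a toy vocabulary. `toy_word_index` and `encoded_review` are made-up stand-ins for `imdb.get_word_index()` and a real review.

```python
# Toy stand-in for imdb.get_word_index(): word -> frequency rank (1-based)
toy_word_index = {"the": 1, "film": 2, "brilliant": 3}

INDEX_FROM = 3  # same offset as index_from=3 in load_data

# Indices 0, 1 and 2 are reserved; real words start at rank + INDEX_FROM
inv = {0: "#PAD#", 1: "#START#", 2: "#OOV#"}
inv.update({rank + INDEX_FROM: word for word, rank in toy_word_index.items()})

encoded_review = [1, 4, 5, 2, 6]  # start marker, "the", "film", unknown word, "brilliant"
decoded = [inv[i] for i in encoded_review]
print(decoded)  # ['#START#', 'the', 'film', '#OOV#', 'brilliant']
```

Note that index 3 itself is never assigned to a word with these settings, because the most frequent word has rank 1 and therefore lands on index 4.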


This is how the text looks now.

" ".join(x_train[0])

"#START# this film was just brilliant casting location scenery story direction everyone's really suited the part they played and you could just imagine being there robert redford's is an amazing actor and now the same being director norman's father came from the same scottish island as myself so i loved the fact there was a real connection with this film the witty remarks throughout the film were great it was just brilliant so much that i bought the film as soon as it was released for retail and would recommend it to everyone to watch and the fly fishing was amazing really cried at the end it was so sad and you know what they say if you cry at a film it must have been good and this definitely was also congratulations to the two little boy's that played the part's of norman and paul they were just brilliant children are often left out of the praising list i think because the stars that play them all grown up are such a big profile for the whole film but these children are amazing and should be praised for what they have done don't you think the whole story was so lovely because it was true and was someone's life after all that was shared with us all"


Build the model

We first specify a few hyperparameters for our model. Note that we already specified the maximum sequence length when loading the vectors.

HIDDEN_UNITS = 128 # The number of hidden units from the LSTM
DROPOUT_RATIO = .2 # The ratio to dropout
BATCH_SIZE = 256 # The number of examples per train/validation step
EPOCHS = 10 # The number of times to repeat through all of the training data
LEARNING_RATE = .001 # The learning rate for the optimizer


Now we build a simple Bidirectional LSTM with keras.

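from keras.models import Sequential
from keras.layers import LSTM, Bidirectional, Dropout, Dense
from keras.optimizers import Adam

# Assumed minimal architecture: one bidirectional LSTM over the padded
# (MAX_WORDS, vecs.dim) input, dropout, and a single sigmoid output unit
model = Sequential()
model.add(Bidirectional(LSTM(HIDDEN_UNITS), input_shape=(MAX_WORDS, vecs.dim)))
model.add(Dropout(DROPOUT_RATIO))
model.add(Dense(1, activation='sigmoid'))

model.compile(
    loss='binary_crossentropy',
    optimizer=Adam(lr=LEARNING_RATE), # newer Keras versions call this argument learning_rate
    metrics=['accuracy']
)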
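For a rough sense of model size, the following sketch counts trainable parameters by hand. It assumes a single bidirectional LSTM layer with concatenated outputs over 300-dimensional vectors, followed by one sigmoid output unit. That architecture is an assumption for illustration; the formula is the standard LSTM count of four gates, each with input weights, recurrent weights, and a bias.

```python
def lstm_params(input_dim, units):
    # Four gates, each with input weights, recurrent weights and a bias vector
    return 4 * (units * (input_dim + units) + units)


EMBEDDING_DIM = 300  # dimensionality of the wiki-news-300d vectors
HIDDEN_UNITS = 128   # same value as the hyperparameter above

bilstm = 2 * lstm_params(EMBEDDING_DIM, HIDDEN_UNITS)  # forward + backward direction
dense = 2 * HIDDEN_UNITS + 1  # assumed: one sigmoid unit on the concatenated states
print(bilstm, dense, bilstm + dense)  # 439296 257 439553
```

Because the embeddings come from Magnitude rather than from a trainable embedding layer, none of the vocabulary-sized weights show up in this count.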


The MagnitudeUtils class offers some nice utility functions to prepare batches.

training_batches = MagnitudeUtils.batchify(x_train, y_train, BATCH_SIZE) # Split the training data into batches
num_batches_per_epoch_train = int(np.ceil(len(x_train)/float(BATCH_SIZE)))
test_batches = MagnitudeUtils.batchify(x_test, y_test, BATCH_SIZE)  # Split the test data into batches
num_batches_per_epoch_test = int(np.ceil(len(x_test)/float(BATCH_SIZE)))
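As a rough mental model, batching can be pictured as an endless generator that cycles over the data in slices, with the number of steps per epoch given by a ceiling division. The function `batchify_sketch` below is a hypothetical pure-Python illustration, not the MagnitudeUtils implementation.

```python
import math


def batchify_sketch(X, y, batch_size):
    """Yield (X_batch, y_batch) pairs forever, cycling through the data."""
    while True:
        for start in range(0, len(X), batch_size):
            yield X[start:start + batch_size], y[start:start + batch_size]


X = list(range(10))
y = [i % 2 for i in X]

steps = math.ceil(len(X) / 4)  # batches needed to see every example once
gen = batchify_sketch(X, y, 4)
sizes = [len(next(gen)[0]) for _ in range(steps + 1)]
print(steps, sizes)  # 3 [4, 4, 2, 4]
```

The last batch of a pass is smaller when the data size is not a multiple of the batch size, and the generator then starts over. With 25,000 reviews and a batch size of 256, the same ceiling division gives 98 steps per epoch.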


Next, we build the batch generators to train our model. Each batch is lazily transformed by Magnitude only when it’s needed.

train_batch_generator = (
    (
        vecs.query(X_train_batch),
        y_train_batch
    ) for X_train_batch, y_train_batch in training_batches
)

test_batch_generator = (
    (
        vecs.query(X_test_batch),
        y_test_batch
    ) for X_test_batch, y_test_batch in test_batches
)
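The laziness comes from Python's generator expressions: nothing is looked up until a batch is actually requested. The sketch below shows this with `fake_query`, a hypothetical stand-in for `vecs.query` that only records how often it is called.

```python
calls = []


def fake_query(batch):
    # Hypothetical stand-in for vecs.query: records each call, returns dummy vectors
    calls.append(len(batch))
    return [[0.0] for _ in batch]


batches = [(["a", "b"], [0, 1]), (["c"], [1])]
lazy = ((fake_query(X_batch), y_batch) for X_batch, y_batch in batches)

print(calls)  # [] : creating the generator has not queried anything yet
first = next(lazy)
print(calls)  # [2] : only the first batch has been transformed
```

This keeps memory usage flat, since only the batch currently being trained on has to exist as a dense array of vectors.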


And now, we can finally train the model.

model.fit_generator(
    generator=train_batch_generator,
    steps_per_epoch=num_batches_per_epoch_train,
    validation_data=test_batch_generator,
    validation_steps=num_batches_per_epoch_test,
    epochs=EPOCHS,
    use_multiprocessing=True
)

Epoch 1/10
98/98 [==============================] - 1975s 20s/step - loss: 0.5775 - acc: 0.6937 - val_loss: 0.4343 - val_acc: 0.8173
Epoch 2/10
98/98 [==============================] - 149s 2s/step - loss: 0.4574 - acc: 0.8015 - val_loss: 0.4039 - val_acc: 0.8328
Epoch 3/10
98/98 [==============================] - 149s 2s/step - loss: 0.4135 - acc: 0.8318 - val_loss: 0.3845 - val_acc: 0.8362
Epoch 4/10
98/98 [==============================] - 149s 2s/step - loss: 0.4094 - acc: 0.8283 - val_loss: 0.4195 - val_acc: 0.8393
Epoch 5/10
98/98 [==============================] - 149s 2s/step - loss: 0.4170 - acc: 0.8242 - val_loss: 0.3773 - val_acc: 0.8413
Epoch 6/10
98/98 [==============================] - 150s 2s/step - loss: 0.3724 - acc: 0.8484 - val_loss: 0.3742 - val_acc: 0.8422
Epoch 7/10
98/98 [==============================] - 149s 2s/step - loss: 0.3585 - acc: 0.8551 - val_loss: 0.4121 - val_acc: 0.8380
Epoch 8/10
98/98 [==============================] - 150s 2s/step - loss: 0.3629 - acc: 0.8543 - val_loss: 0.3558 - val_acc: 0.8502
Epoch 9/10
98/98 [==============================] - 150s 2s/step - loss: 0.3602 - acc: 0.8528 - val_loss: 0.3425 - val_acc: 0.8561
Epoch 10/10
98/98 [==============================] - 149s 2s/step - loss: 0.3428 - acc: 0.8618 - val_loss: 0.3520 - val_acc: 0.8490

<keras.callbacks.History at 0x7fa615c1b278>


Magnitude caches frequently used words. This is why the time per epoch drops so sharply once the first epoch has warmed up the cache: from 1975s in epoch 1 to about 150s in every epoch after that.
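The effect of such a cache can be illustrated with Python's built-in `functools.lru_cache`. This is only a stand-in to show the principle, not Magnitude's internal cache, and `lookup` is a hypothetical function that fakes an expensive vector lookup.

```python
from functools import lru_cache

lookups = 0  # counts how often the "expensive" lookup really runs


@lru_cache(maxsize=1000)
def lookup(word):
    global lookups
    lookups += 1  # only reached on a cache miss
    return tuple(float(ord(c)) for c in word)  # made-up "vector"


for epoch in range(2):
    for word in ["the", "film", "was", "the", "film"]:
        lookup(word)

print(lookups)                    # 3
print(lookup.cache_info().hits)   # 7
```

Out of ten requests only three reach the slow path. Since natural language reuses a small set of frequent words heavily, a real corpus behaves in a similar way after the first pass.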

The model's performance is not very strong so far, but you have learned how to use Magnitude to bring embedding models into keras. One very handy feature I particularly like is the ability to handle out-of-vocabulary words and misspellings. Try it yourself. If you want to learn how to use ELMo embeddings with keras and TensorFlow Hub, have a look at this post.
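To get an intuition for why misspellings can still be handled, recall that subword-aware models such as fastText work with character n-grams. The sketch below is not Magnitude's algorithm. It only compares the character trigrams of two words, using the hypothetical helpers `char_ngrams` and `ngram_overlap`, to show that a misspelled word still shares most of its pieces with the correct spelling.

```python
def char_ngrams(word, n=3):
    """Return the set of character n-grams of a word, with boundary markers."""
    padded = f"<{word}>"
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}


def ngram_overlap(a, b):
    """Jaccard similarity between the character n-gram sets of two words."""
    grams_a, grams_b = char_ngrams(a), char_ngrams(b)
    return len(grams_a & grams_b) / len(grams_a | grams_b)


print(ngram_overlap("speakers", "speakrs"))  # 0.5
print(ngram_overlap("speakers", "music"))    # 0.0
```

A vector built from shared subword pieces therefore ends up much closer to the intended word than to an unrelated one, even though the misspelling itself never appeared in the training data.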

• “Magnitude: A Fast, Efficient Universal Vector Embedding Utility Package”