September 25, 2020

The missing guide on data preparation for language modeling

The missing guide on data preparation for language modeling
Language models gained popularity in NLP in the recent years. Sometimes you might have enough data and want to train a language model like BERT or RoBERTa from scratch. While there are many tutorials about tokenization and on how to train the model, there is not much information about how to load the data into the model. This guide aims to close this gap.

December 10, 2019

How the LIME algorithm fails

You maybe know the LIME algorithm from some of my earlier blog posts. It can be quite useful to “debug” data sets and understand machine learning models better. But LIME is fooled very easily.

November 24, 2019

Cluster discovery in german recipes

If you are dealing with a large collections of documents, you will often find yourself in the situation where you are looking for some structure and understanding what is contained in the documents. Here I’ll show you a convenient method for discovering and understanding clusters of text documents. Read more

PrivacyImprintRSS

© depends-on-the-definition 2017-2022