September 25, 2020

The missing guide on data preparation for language modeling

Language models have gained popularity in NLP in recent years. Sometimes you might have enough data and want to train a language model like BERT or RoBERTa from scratch. While there are many tutorials on tokenization and on how to train the model, there is not much information about how to load the data into the model. This guide aims to close this gap.

© depends-on-the-definition 2017-2020