September 25, 2020
Language models have gained popularity in NLP in recent years. Sometimes you might have enough data and want to train a language model like BERT or RoBERTa from scratch. While there are many tutorials on tokenization and on how to train the model, there is not much information on how to load the data into the model. This guide aims to close this gap.