Language models have gained popularity in NLP in recent years. Sometimes you have enough data and want to train a language model like BERT or RoBERTa from scratch. While there are many tutorials on tokenization and on training the model itself, there is little information on how to load the data into the model. This guide aims to close that gap.