September 25, 2020

The missing guide on data preparation for language modeling

Language models have gained popularity in NLP in recent years. Sometimes you have enough data and want to train a language model like BERT or RoBERTa from scratch. While there are many tutorials on tokenization and on how to train the model, there is little information on how to load the data into the model. This guide aims to close this gap.

May 20, 2020

Latent Dirichlet allocation from scratch

Today, I’m going to talk about topic models in NLP. Specifically, we will see how the Latent Dirichlet Allocation model works and implement it from scratch in numpy. What is a topic model? Assume we are given a large collection of documents.

December 28, 2019

How explainable AI fails and what to do about it

This article relies heavily on "Stop Explaining Black Box Machine Learning Models for High Stakes Decisions and Use Interpretable Models Instead" by Cynthia Rudin and on some of my personal experiences. I will mainly focus on technical issues and leave out most of the governance and ethics-related issues that derive from these.

© depends-on-the-definition 2017-2022