algorithms

July 19, 2022

How to calculate shapley values from scratch

The shapley value is a popular and useful method to explain machine learning models. The shapley value of a feature is the average contribution of a feature value to the prediction. In this article I’ll show you how to compute shapley values from scratch. Read more

#NLP | #Algorithms

May 24, 2021

Learning unsupervised embeddings for textual similarity with transformers

In this article, we look at SimCSE, a simple contrastive sentence embedding framework, which can be used to produce superior sentence embeddings, from either unlabeled or labeled data. The idea behind the unsupervised SimCSE is to simply predicts the input sentence itself, with only dropout used as noise. Read more

#NLP | #Algorithms | #from-scratch

May 20, 2020

Latent Dirichlet allocation from scratch

Today, I’m going to talk about topic models in NLP. Specifically we will see how the Latent Dirichlet Allocation model works and we will implement it from scratch in numpy. What is a topic model? Assume we are given a large collections of documents. Read more

#data quality | #NLP | #algorithms | #machine learning

January 21, 2020

Find label issues with confident learning for NLP

In every machine learning project, the training data is the most valuable part of your system. In many real-world machine learning projects the largest gains in performance come from improving training data quality. Training data is often hard to aquire and since the data can be large, quality can be hard to check. Read more

#algorithms | #NLP | #XAI

December 10, 2019

How the LIME algorithm fails

You maybe know the LIME algorithm from some of my earlier blog posts. It can be quite useful to “debug” data sets and understand machine learning models better. But LIME is fooled very easily.

#machine learning | #NLP | #algorithms

November 24, 2019

Cluster discovery in german recipes

If you are dealing with a large collections of documents, you will often find yourself in the situation where you are looking for some structure and understanding what is contained in the documents. Here I’ll show you a convenient method for discovering and understanding clusters of text documents. Read more

#computer vision | #algorithms | #deep learning

August 5, 2019

Model uncertainty in deep learning with Monte Carlo dropout in keras

Deep learning models have shown amazing performance in a lot of fields such as autonomous driving, manufacturing, and medicine, to name a few. However, these are fields in which representing model uncertainty is of crucial importance. Read more

#algorithms | #NLP

April 7, 2019

Introduction to n-gram language models

You might have heard, that neural language models power a lot of the recent advances in natural language processing. Namely large models like Bert and GPT-2. But there is a fairly old approach to language modeling that is quite successful in a way. Read more

#machine learning | #algorithms | #NLP | #deep learning

February 12, 2019

How to use magnitude with keras

This time we have a look into the magnitude library, a feature-packed Python package and vector storage file format for utilizing vector embeddings in machine learning models in a fast, efficient, and simple manner developed by Plasticity. Read more

#kaggle | #NLP | #algorithms

November 24, 2018

Understanding text data with topic models

This is the first post of my series about understanding text datasets. A lot of the current NLP progress is made in predictive performance. But in practice, you often want and need to know, what is going on in your dataset. Read more

#Algorithms | #NLP | #XAI | #from-scratch

June 2, 2018

Debugging black-box text classifiers with LIME

Often in text classification, we use so called black-box classifiers. By black-box classifiers I mean a classification system where the internal workings are completely hidden from you. A famous example are deep neural nets, in text classification often recurrent or convolutional neural nets. Read more