Handling dataset shift

Why does this matter?

The performance of a model tends to decrease over time. Most models are deployed in dynamic and continually changing environments. When these changes alter the distribution of our input data or the relationship between the input data and the target variable, our model can suddenly become obsolete. This phenomenon is generally referred to as dataset shift.

For example, customer behaviour was drastically altered by the COVID-19 pandemic (more people buying online, more masks and hand sanitizers sold, etc…). It is easy to see how a model trained on pre-pandemic data may fail to predict customer behaviour.

This Roadmap

This guide will help you to mitigate dataset shifts. Firstly, we will cover the definitions of different types of dataset shift. Secondly, we will address how to detect it. Finally we will cover a few possible strategies to handle the shifts.

Last updated