Step 2: Detecting dataset shifts

To make sure that your algorithm does not break or fail unexpectedly once in production, it is essential to monitor the behaviour of the system and detect dataset shifts.

There are a number of statistical properties that you may want to calculate and monitor over time, including:

  • The performance of the model (e.g. accuracy). A sustained drop in performance indicates model decay, so it is advisable to set up alerts that fire when this happens (see the monitoring sketch after this list).

  • Performance-related metrics such as classification confidence. Large fluctuations in these metrics may indicate concept drift.

  • Specific features and their relationship to the target variable. If you suspect that certain features are particularly vulnerable to drift, it is worth tracking how they change over time.

  • Statistical tests can help you measure how much the distribution of the training data differs from that of the current data (see the sketch after this list). Some of these tests are:

    • the population stability index (PSI)

    • the Kolmogorov-Smirnov statistic (KS)

    • histogram intersection

  • Novelty detection is suitable for more complex applications, such as computer vision: you can estimate the likelihood that a new data point comes from the original training distribution (see the sketch after this list).

  • Data points from adjacent time windows can also be compared to detect concept drift. The scikit-multiflow package offers various algorithms of this type, including the ADWIN algorithm implemented in our example notebook below (a minimal standalone sketch follows this list).
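
As a starting point for the first two items in the list above, the sketch below computes rolling accuracy and confidence statistics over a window of recent predictions and raises alerts when they cross a threshold. The window size and threshold values are illustrative assumptions that you would tune for your own system.

```python
import numpy as np

# Illustrative thresholds and window size -- tune these for your own system.
ACCURACY_ALERT_THRESHOLD = 0.85
CONFIDENCE_STD_ALERT_THRESHOLD = 0.15
WINDOW_SIZE = 500

def check_recent_predictions(y_true, y_pred, confidences):
    """Compute rolling metrics over the most recent window and flag anomalies."""
    y_true = np.asarray(y_true)[-WINDOW_SIZE:]
    y_pred = np.asarray(y_pred)[-WINDOW_SIZE:]
    confidences = np.asarray(confidences)[-WINDOW_SIZE:]

    accuracy = np.mean(y_true == y_pred)
    confidence_std = np.std(confidences)

    alerts = []
    if accuracy < ACCURACY_ALERT_THRESHOLD:
        alerts.append(f"Accuracy dropped to {accuracy:.3f}")
    if confidence_std > CONFIDENCE_STD_ALERT_THRESHOLD:
        alerts.append(f"Confidence is unusually volatile (std={confidence_std:.3f})")
    return accuracy, confidence_std, alerts
```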
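
The three statistical tests above can be computed with NumPy and SciPy. The sketch below is one possible implementation, assuming continuous features binned from the reference (training) sample; the synthetic data and the 0.25 PSI rule of thumb are illustrative, not part of the original text.

```python
import numpy as np
from scipy import stats

def psi(expected, actual, bins=10, eps=1e-6):
    """Population Stability Index between a reference and a current sample."""
    # Bin edges come from the reference (training) distribution; current values
    # falling outside that range are ignored in this sketch.
    edges = np.histogram_bin_edges(expected, bins=bins)
    expected_pct = np.histogram(expected, bins=edges)[0] / len(expected)
    actual_pct = np.histogram(actual, bins=edges)[0] / len(actual)
    # Avoid division by zero and log of zero in sparse bins.
    expected_pct = np.clip(expected_pct, eps, None)
    actual_pct = np.clip(actual_pct, eps, None)
    return np.sum((actual_pct - expected_pct) * np.log(actual_pct / expected_pct))

def histogram_intersection(expected, actual, bins=10):
    """Overlap of two normalised histograms: 1.0 = identical, 0.0 = disjoint."""
    edges = np.histogram_bin_edges(np.concatenate([expected, actual]), bins=bins)
    e = np.histogram(expected, bins=edges)[0] / len(expected)
    a = np.histogram(actual, bins=edges)[0] / len(actual)
    return np.sum(np.minimum(e, a))

# Example: compare a training-time feature to the same feature in production.
rng = np.random.default_rng(42)
train_feature = rng.normal(0.0, 1.0, size=5_000)
live_feature = rng.normal(0.5, 1.2, size=5_000)  # shifted distribution

print(f"PSI: {psi(train_feature, live_feature):.3f}")  # > 0.25 is often read as a large shift
print(f"KS:  {stats.ks_2samp(train_feature, live_feature)[0]:.3f}")
print(f"Histogram intersection: {histogram_intersection(train_feature, live_feature):.3f}")
```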
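
The text does not prescribe a particular novelty-detection method, so the sketch below uses scikit-learn's IsolationForest as one common choice: it is fitted on the training distribution and then scores how plausible new points are under it.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
# Reference data the model was trained on (two correlated features).
X_train = rng.multivariate_normal([0, 0], [[1, 0.6], [0.6, 1]], size=2_000)

# Fit a novelty detector on the training distribution.
detector = IsolationForest(random_state=0).fit(X_train)

# New production points: one typical, one far outside the training distribution.
X_new = np.array([[0.1, -0.2], [6.0, -5.0]])
print(detector.predict(X_new))        # 1 = looks like training data, -1 = novelty
print(detector.score_samples(X_new))  # lower scores = more anomalous
```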
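
The example notebook referenced above implements ADWIN; as a minimal standalone sketch of the same idea, the snippet below feeds a simulated stream with an abrupt mean shift into scikit-multiflow's ADWIN detector, which flags a change when the means of two adjacent sub-windows of the stream differ significantly. The simulated stream and shift point are illustrative assumptions.

```python
import numpy as np
from skmultiflow.drift_detection import ADWIN

adwin = ADWIN()

# Simulate a stream whose mean shifts halfway through (concept drift at index 1000).
rng = np.random.default_rng(1)
stream = np.concatenate([rng.normal(0.0, 0.1, 1_000), rng.normal(0.8, 0.1, 1_000)])

for i, value in enumerate(stream):
    adwin.add_element(value)
    if adwin.detected_change():
        print(f"Change detected at index {i}")
```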

Monitoring could include dashboards, ad-hoc analysis and investigations, or calculations implemented directly in your system and paired with alerts for unexpected behaviour. You can find an implementation of various techniques for monitoring dataset shift in our notebook, which can be accessed here or downloaded as a file:
