Model Drift Introduction and Concepts

Taxes, death, and model drift are the only three certainties in life. Ok, I might have added that last one to the adage, but the truth is that all models suffer from decay.
After developing a Machine Learning model, you will always see the same pattern develop:
- The model has an expected performance on the test set during development.
- The model behaves differently (generally, a bit worse) after going into production.
- The model's performance degrades over time.
After a couple of years, there's a high likelihood that your model's performance is way worse than when you first developed it. This can happen for a multitude of reasons, but the fundamental cause is that the world changes.
When the world changes, the data we use to represent real-life information also changes. The underlying data distributions shift, which will inevitably impact how our machine learning models learn and perform.
In this blog post, we'll examine examples of situations where underlying changes in the world can impact your models. Understanding these examples will better prepare you to develop plans and explain to your leadership why machine learning operations (MLOps) are so important today, defining the success of a data-driven organization.
Data Distribution Shift
Data distribution shift is the most common cause of model drift.
This occurs when the distribution of one of your variables shifts significantly compared to the time of development. Let's assume that we have a model that uses the age of a user as a feature for predicting app churn[1] with the following distribution:

[1] Churn is the act of a customer voluntarily deciding to stop using a service.
Time t is when you developed the model. Now, imagine that the age distribution of the user base you are studying shifts to something like the following after a year:

The user base is now much older than it was at time t. This inherently leads to worse performance by your algorithm. Your model is accustomed to a certain data distribution that no longer reflects current reality. If the shift is too large, you might end up predicting on examples for which you had a very small sample size during training.
This effect gets even worse if you use models that are highly dependent on feature distributions (for example, tree-based models rely on the sample distribution to choose the cutoffs for their branches).
Now, consider models with hundreds of features, where every feature can suffer from distribution shift: the impact on your models is enormous.
Catching data distribution shift is relatively easy (a minimal sketch follows this list):
- Store the parameters of the distribution at the time of training: mean, median, standard deviation, IQR, kurtosis, etc.
- Compare these values against the new data you are predicting on. The more they deviate from each other, the greater the impact.
- Don't forget that you need to compare central tendency and spread measures simultaneously. Two distributions can have the same mean or median but completely different shapes.
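To make this concrete, here is a minimal sketch of the idea, assuming a single numeric "age" feature and synthetic data; the particular statistics and the Kolmogorov-Smirnov check are illustrative choices, not a prescription:

```python
import numpy as np
from scipy import stats

def describe(x):
    """Summary statistics stored at training time and recomputed on new data."""
    return {
        "mean": np.mean(x),
        "median": np.median(x),
        "std": np.std(x),
        "iqr": np.subtract(*np.percentile(x, [75, 25])),
        "kurtosis": stats.kurtosis(x),
    }

rng = np.random.default_rng(42)
age_train = rng.normal(loc=32, scale=6, size=10_000)   # age distribution at time t
age_prod = rng.normal(loc=45, scale=9, size=10_000)    # older user base a year later

train_stats, prod_stats = describe(age_train), describe(age_prod)
for name in train_stats:
    print(f"{name}: train={train_stats[name]:.2f}  prod={prod_stats[name]:.2f}")

# A two-sample Kolmogorov-Smirnov test compares the whole shape, not just the moments.
print(stats.ks_2samp(age_train, age_prod))
```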
Monitoring tools such as MLflow or Azure Machine Learning Studio will help you keep track of your features' data distributions. With these tools, it's quite common to retrain models automatically, using triggers based on certain distribution shifts.
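As a rough illustration of how that tracking could look with MLflow, the sketch below logs batch statistics as metrics; the experiment name, metric keys, and synthetic batch are all assumptions made for the example:

```python
import mlflow
import numpy as np

# Hypothetical experiment name; any name that groups your monitoring runs will do.
mlflow.set_experiment("churn-feature-monitoring")

# A synthetic batch of the "age" feature arriving at scoring time.
age_batch = np.random.default_rng(7).normal(loc=45, scale=9, size=5_000)

with mlflow.start_run(run_name="daily-feature-check"):
    mlflow.log_metric("age_mean", float(np.mean(age_batch)))
    mlflow.log_metric("age_std", float(np.std(age_batch)))
    mlflow.log_metric("age_median", float(np.median(age_batch)))
    # A retraining pipeline could be triggered once these metrics drift past a chosen threshold.
```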
Edge Cases
Another common situation that affects machine learning models in production is the issue of edge cases. For example, let's imagine that you are forecasting sales for a specific product. Ideally, you would like the world to remain static, with the factors affecting the sales of your product staying constant.
However, edge cases may suddenly arise and undermine your entire model. An abnormally low or high value in your sales can break your predictions. This happened during the COVID-19 pandemic, when edge cases became the norm, rendering most machine learning models useless.
Although this is a very obvious edge case situation, sometimes edge cases are trickier to spot. Typically, it requires a deep understanding of the underlying phenomena we are working with. Nevertheless, there are a couple of actions you can take:
- Apart from checking a distribution, monitor obvious outliers in your variables. This is what we normally call uni-dimensional edge cases.
- Also, look at combinations of variables and features that may be edge cases when looked at together. This is what we normally call multi-dimensional edge cases (a sketch of both checks follows this list).
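Here is a minimal sketch of both checks on hypothetical sales data (units sold and price); the 1.5 * IQR rule, the Isolation Forest, and the contamination level are illustrative choices rather than the only options:

```python
import numpy as np
from sklearn.ensemble import IsolationForest

# Hypothetical sales data: units sold and price, where price tends to drop as volume rises.
rng = np.random.default_rng(0)
units = rng.normal(loc=100, scale=15, size=1_000)
price = 40 - 0.2 * units + rng.normal(scale=1, size=1_000)
X = np.column_stack([units, price])

# Uni-dimensional check: flag values outside 1.5 * IQR of each feature separately.
q1, q3 = np.percentile(X, [25, 75], axis=0)
iqr = q3 - q1
uni_outliers = (X < q1 - 1.5 * iqr) | (X > q3 + 1.5 * iqr)
print("rows with a uni-dimensional outlier:", int(uni_outliers.any(axis=1).sum()))

# Multi-dimensional check: an Isolation Forest scores unusual *combinations* of features.
iso = IsolationForest(contamination=0.01, random_state=0).fit(X)
new_rows = np.array([[100.0, 20.0],    # ordinary combination
                     [130.0, 26.0]])   # each value is plausible alone, but the pair is unusual
print(iso.predict(new_rows))           # -1 marks a likely multi-dimensional edge case
```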
However, looking at outliers shouldn't be the only way to spot edge cases. The "difficult" edge cases to identify are normally not outliers and need to be understood with the help of business users who understand the context of the underlying data.
Feedback Loops
In certain scenarios, your model's output feeds the features of other models, or even its own features, as in the case of time-series models.
Typically, if you have model outputs that work as features, this will produce some level of feedback loop. Checking the output values of your models and keeping track of their distribution should be enough to understand the likelihood of producing a feedback loop.
In particular, it's important to measure explainability metrics such as SHAP values to understand whether the influence of features generated by models is changing over time. Ideally, you should try to avoid involving any feature that is produced or can be influenced by the output of a machine learning model. If you can't avoid this due to certain business or technical decisions, then monitoring becomes absolutely critical.
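As a rough sketch of that idea, assuming the `shap` package and a toy model in which a hypothetical "other_model_score" feature comes from an upstream model, you could compare each feature's mean absolute SHAP value at training time against a fresh production batch:

```python
import numpy as np
import shap
from sklearn.ensemble import GradientBoostingRegressor

# Toy training data; "other_model_score" stands in for a feature produced by another model.
feature_names = ["age", "sessions_per_week", "other_model_score"]
rng = np.random.default_rng(1)
X_train = rng.normal(size=(1_000, 3))
y_train = X_train[:, 0] + 0.5 * X_train[:, 2] + rng.normal(scale=0.1, size=1_000)

model = GradientBoostingRegressor(random_state=1).fit(X_train, y_train)
explainer = shap.TreeExplainer(model)

def mean_abs_shap(X):
    """Mean absolute SHAP value per feature: a rough measure of each feature's influence."""
    values = explainer.shap_values(X)
    return dict(zip(feature_names, np.abs(values).mean(axis=0)))

# Compare influence on the training data against a production batch where the
# upstream model's score has drifted upwards.
X_prod = X_train + np.array([0.0, 0.0, 1.5])
print("training:  ", mean_abs_shap(X_train))
print("production:", mean_abs_shap(X_prod))
```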
Missing-as-a-Feature
In model development pipelines, we typically perform a step related to missing value imputation, enabling your model to deal with non-existent information. During model development, it is common practice to examine the number of missing values in your features and address them only if the proportion of missing data in a column falls below a specified threshold, discarding the feature if it's over that value.
A common problem arises when you put your model into production and one of the features starts receiving more missing values than expected. What will happen? You will start to predict based mainly on missing information (and your imputation method)!
This is a pattern commonly called "missing-as-a-feature". Essentially, you end up using your imputation method as the value of the feature in your predictions.

Notice that tracking distributions is not enough to spot missing-as-a-feature scenarios. You need to explicitly monitor the percentage of missing values going into your model before the imputation step in your pipeline.
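A minimal sketch of that check, assuming the raw features arrive as a pandas DataFrame before imputation; the column names, toy data, and the 10-percentage-point alert threshold are all assumptions:

```python
import pandas as pd

def missing_rate(df: pd.DataFrame) -> pd.Series:
    """Percentage of missing values per column."""
    return df.isna().mean() * 100

# Toy data: "age" is rarely missing at training time but very often missing in production.
train_df = pd.DataFrame({"age": [25, 31, None, 45, 29],
                         "income": [40_000, None, 52_000, 61_000, 47_000]})
prod_df = pd.DataFrame({"age": [None, None, 38, None, None],
                        "income": [39_000, 48_000, None, 55_000, 50_000]})

baseline = missing_rate(train_df)   # stored alongside the model at training time
current = missing_rate(prod_df)     # recomputed on every batch, before imputation runs
increase = current - baseline
print(increase[increase > 10])      # the 10-percentage-point alert threshold is an assumption
```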
That's it! Thank you for taking the time to read this post. Model drift is an area of active research, namely around how to use automatic triggers to retrain your model, or how to decide whether you should train a new model from scratch.
There are many tools available that will help you manage your post-production models more effectively. The following is not an exhaustive list but contains some of the most well-known frameworks that you can experiment with:
- https://mlflow.org/ – Open-source MLOps platform with awesome features. If you are curious, check out the library's repo.
- https://azure.microsoft.com/en-us/products/machine-learning/ – Azure ML Studio. If you want to get a bit fancy and play around with a product from Microsoft, Azure ML Studio has plenty of functionalities (based on MLflow) that integrate quite well with the rest of the Azure stack.
- https://github.com/NannyML/NannyML – Post-deployment open-source library with a lot of functionality related to model drift.
See you on the next post!
Feel free to visit my YouTube channel, Udemy profile, or Substack:
- Substack: thedatajourney.substack.com
- YouTube: youtube.com/@TheDataJourney42
- Udemy: https://www.udemy.com/user/ivo-bernardo/