Choosing the Right Path: Churn Models vs. Uplift Models

Imagine that we work in e-commerce and a product manager asks us to build a churn model.
But what are we really being asked for?
What can a churn model give us?
It estimates how likely a specific customer is to leave us. Our next steps are then driven by a heuristic:
If we give a discount to the clients who are likely to churn, they will stay.
However, our real goal is a bit different. Suppose we can do only two things: give a treatment or not give it. In our case the treatment is a discount. When we offer the discount, there are four possible outcomes.

- Margin giveaway. We offered a discount, the user used it and purchased an item, but the user would have made the purchase even without the discount. This is a negative outcome because margin has been given away.
- Treatment costs. We offered a discount, but the user did not use it and did not make a purchase. This is also a negative outcome because each treatment has a cost, such as sending an SMS message, which adds up across a large user base.
- Success. We offered a discount, the user used it, and made a purchase solely because of the offer. This is the outcome we are aiming for.
- Lost customer. We offered a discount, but the user ended up leaving us. For example, in a subscription service, the user receives a notification with a discount, realizes they have been paying for the subscription for the past 6 months, and decides to cancel. This is the worst outcome we can encounter.
Our true goal is not to estimate the probability of churn but to apply the most appropriate treatment to each user.
How do we start working toward this goal?
To begin, it is crucial to run a simple A/B test: give the discount to one randomly chosen group and keep a control group that receives no discount.
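As a rough illustration, the random split for such an A/B test could look like the sketch below. The function name `assign_ab_groups`, the 50/50 share, and the fixed seed are assumptions made for the example, not part of any particular experimentation platform.

```python
import random


def assign_ab_groups(user_ids, treatment_share=0.5, seed=42):
    """Randomly split users into a treatment group (discount) and a control group (no discount)."""
    rng = random.Random(seed)      # fixed seed so the split is reproducible
    shuffled = list(user_ids)
    rng.shuffle(shuffled)          # random order removes any selection bias
    cut = int(len(shuffled) * treatment_share)
    return {"treatment": shuffled[:cut], "control": shuffled[cut:]}


groups = assign_ab_groups(range(1000))
print(len(groups["treatment"]), len(groups["control"]))  # 500 500
```

Random assignment is what later allows us to attribute the difference in purchases between the two groups to the discount itself.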
After the experiment, we have three primary approaches to modeling uplift.
Two-Model Approach
The first approach involves building two separate models: one trained on the control group (without any discount) and one trained on the treatment group (with a discount). Any type of ML model can be used for each of them.
By running each client through both models, we can calculate the uplift as the difference between the two predicted purchase probabilities.
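The idea can be sketched in a few lines of Python. To keep the example self-contained, `SegmentRateModel` is a hypothetical toy model that simply memorizes the purchase rate per customer segment; in practice any classifier that outputs a purchase probability would take its place. The data is made up for illustration.

```python
from collections import defaultdict


class SegmentRateModel:
    """Toy stand-in for any classifier: predicts the purchase rate seen per segment."""

    def fit(self, segments, purchased):
        totals, buys = defaultdict(int), defaultdict(int)
        for segment, y in zip(segments, purchased):
            totals[segment] += 1
            buys[segment] += y
        self.rates_ = {s: buys[s] / totals[s] for s in totals}
        return self

    def predict_purchase_proba(self, segments):
        return [self.rates_[s] for s in segments]


# Illustrative A/B test log: (segment, treated, purchased)
rows = [
    ("new", 1, 1), ("new", 1, 1), ("new", 1, 0), ("new", 1, 1),
    ("new", 0, 0), ("new", 0, 1), ("new", 0, 0), ("new", 0, 0),
    ("loyal", 1, 1), ("loyal", 1, 1), ("loyal", 1, 1), ("loyal", 1, 0),
    ("loyal", 0, 1), ("loyal", 0, 1), ("loyal", 0, 1), ("loyal", 0, 0),
]


def fit_two_models(rows):
    """Fit one model on treated users and one on control users."""
    treated = [(s, y) for s, w, y in rows if w == 1]
    control = [(s, y) for s, w, y in rows if w == 0]
    model_t = SegmentRateModel().fit(*zip(*treated))
    model_c = SegmentRateModel().fit(*zip(*control))
    return model_t, model_c


def predict_uplift(model_t, model_c, segments):
    """Uplift = P(purchase | discount) - P(purchase | no discount)."""
    p_t = model_t.predict_purchase_proba(segments)
    p_c = model_c.predict_purchase_proba(segments)
    return [t - c for t, c in zip(p_t, p_c)]


model_t, model_c = fit_two_models(rows)
print(predict_uplift(model_t, model_c, ["new", "loyal"]))  # [0.5, 0.0]
```

In this toy data, loyal customers buy at the same rate with or without the discount, so their uplift is zero: treating them would be pure margin giveaway.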
Pros:
- It is easy to implement.
Cons:
- It does not directly predict uplift. Each model estimates the probability of the user's action (purchase), and uplift is only derived afterwards.
- Each of the two models has its own error, and these errors compound when the predictions are subtracted, leading to a larger overall error.
Target Transformation
The second approach transforms the target variable itself. We create a new target that encodes uplift, so that a single model can be trained on it.
We introduce the new target variable Z using the following formula:

Z = Y * W + (1 - Y) * (1 - W)

Here, Y is the original target variable and W indicates whether the treatment was applied. In other words, Y indicates whether a purchase was made (1) or not (0), and W indicates whether the discount was given (1) or not (0).
The transformed variable Z takes the value 1 in two cases:
- The user belongs to the treatment group (W = 1) and Y = 1 (the discount was given to the user and the user made a purchase).
- The user belongs to the control group (W = 0) and Y = 0 (the discount wasn't given to the user and the user didn't make a purchase).
Then we just need to train a model (for example, logistic regression) on the new target.
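To calculate uplift, we can use the following formula, where X denotes the user's features. It holds when the treatment and control groups are the same size, that is, each user had a 50% chance of receiving the discount:

Uplift(X) = 2 * P(Z = 1 | X) - 1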
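A minimal Python sketch of the transformation follows, under the assumption of equally sized treatment and control groups. To stay self-contained it estimates P(Z = 1) as a simple frequency; in practice a classifier such as logistic regression would be trained on Z and its predicted probability used instead. The helper names and the data are illustrative.

```python
def transform_target(purchased, treated):
    """Z = Y*W + (1-Y)*(1-W): 1 for treated buyers and for control non-buyers."""
    return purchased * treated + (1 - purchased) * (1 - treated)


def uplift_from_z_probability(p_z):
    """Uplift = 2 * P(Z=1) - 1, valid when treatment and control are balanced."""
    return 2 * p_z - 1


# Illustrative balanced sample: 4 treated users, 4 control users
treated = [1, 1, 1, 1, 0, 0, 0, 0]      # W
purchased = [1, 1, 0, 1, 0, 1, 0, 0]    # Y

z = [transform_target(y, w) for y, w in zip(purchased, treated)]
p_z = sum(z) / len(z)                   # frequency estimate of P(Z = 1)
print(z)                                # [1, 1, 0, 1, 1, 0, 1, 1]
print(uplift_from_z_probability(p_z))   # 0.5
```

The result matches the direct difference in purchase rates for this sample (0.75 in treatment minus 0.25 in control), which is the reason the transformed target is a useful proxy for uplift.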
Pros:
- It is still easy to implement.
- It is more robust and stable than the first approach because there is only one model.
Cons:
- It still does not directly predict uplift. We predict the transformed variable and derive uplift from it.
Tree-Based Models
The third approach capitalizes on tree-based models.
The goal is to identify the subpopulations within a dataset that are most responsive to the treatment, thereby enabling targeted interventions for maximum impact.

An example decision tree for uplift is depicted in the image above, with the uplift values shown in red. From the image, we can see that the overall uplift at the root is 0.0127 (the metric here is arbitrary). However, as we descend into the tree, we observe certain subpopulations with a higher uplift.
These subpopulations become our target as they hold the potential for maximum benefits.
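Once every user has a predicted uplift score, targeting can be as simple as ranking users and treating only those above a threshold. The sketch below assumes a hypothetical dictionary of scores and illustrative `min_uplift` and `budget` parameters; in practice the threshold would depend on the cost of the discount.

```python
def select_users_to_treat(predicted_uplift, min_uplift=0.0, budget=None):
    """Return user ids ordered by predicted uplift, keeping only scores above min_uplift."""
    ranked = sorted(predicted_uplift.items(), key=lambda item: item[1], reverse=True)
    chosen = [user for user, score in ranked if score > min_uplift]
    return chosen if budget is None else chosen[:budget]


# Hypothetical uplift scores per user
scores = {"u1": 0.30, "u2": -0.05, "u3": 0.02, "u4": 0.12}
print(select_users_to_treat(scores, min_uplift=0.05))  # ['u1', 'u4']
```

Users with negative predicted uplift, such as `u2` here, are exactly the ones a discount could push away, so they are excluded.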
How do we build this tree?
There are numerous tutorials available on constructing decision trees, but here I will outline the basic approach.
- Select features and identify the target variable, which, in our case, is uplift.
- Choose a splitting criterion to determine how nodes are divided.
- Build the tree by recursively repeating the splitting process until a stopping criterion is met.
It's worth noting that there are three commonly used splitting criteria for building uplift trees, listed below in order of popularity:
- KL divergence
- Chi-Square
- Euclidean Distance
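For a binary outcome such as purchase, each of these criteria measures how far apart the purchase rates of the treatment and control groups are inside a node, and a split is good when that distance grows in the child nodes. The sketch below shows one way to compute this. The function names, the clipping constant, and the data are assumptions for illustration, and the gain omits the normalization terms that some implementations add.

```python
import math

EPS = 1e-9  # keeps probabilities away from 0 and 1 to avoid log(0) and division by zero


def _clip(p):
    return min(max(p, EPS), 1 - EPS)


def kl_divergence(p_t, p_c):
    """KL divergence between treatment and control purchase distributions."""
    p_t, p_c = _clip(p_t), _clip(p_c)
    return p_t * math.log(p_t / p_c) + (1 - p_t) * math.log((1 - p_t) / (1 - p_c))


def chi_square(p_t, p_c):
    """Chi-square distance, summed over the two outcomes (purchase, no purchase)."""
    p_c = _clip(p_c)
    return (p_t - p_c) ** 2 / p_c + (p_t - p_c) ** 2 / (1 - p_c)


def euclidean(p_t, p_c):
    """Squared Euclidean distance, summed over the two outcomes."""
    return 2 * (p_t - p_c) ** 2


def purchase_rates(rows):
    """rows: (treated, purchased) pairs; both groups must be present in the node."""
    treated = [y for w, y in rows if w == 1]
    control = [y for w, y in rows if w == 0]
    return sum(treated) / len(treated), sum(control) / len(control)


def split_gain(parent, left, right, divergence=kl_divergence):
    """Weighted divergence of the children minus the divergence of the parent."""
    n = len(parent)
    after = (len(left) / n) * divergence(*purchase_rates(left)) \
        + (len(right) / n) * divergence(*purchase_rates(right))
    return after - divergence(*purchase_rates(parent))


# Illustrative node split on a hypothetical feature: (treated, purchased)
left = [(1, 1), (1, 1), (1, 0), (1, 1), (0, 0), (0, 1), (0, 0), (0, 0)]
right = [(1, 1), (1, 1), (1, 1), (1, 0), (0, 1), (0, 1), (0, 1), (0, 0)]
parent = left + right

print(split_gain(parent, left, right, divergence=euclidean))  # 0.125
print(split_gain(parent, left, right) > 0)                    # True
```

Building the tree then amounts to evaluating this gain for every candidate split, taking the best one, and recursing into the children until a stopping criterion such as a minimum node size is reached.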
Pros:
- It is one of the most accurate methods.
- Because the base learner is a decision tree, we can build forests and other ensembles of trees that increase accuracy and reduce variance.
Cons:
- It is a decision-tree method, so the algorithm tends to favor categorical variables with many levels when choosing splits. One way to mitigate this is to replace such variables with a mean encoding.
Conclusion
Now we know that addressing customer churn requires strategies that go beyond just estimating the probability of churn. The ultimate goal is to apply the most appropriate treatment to each user and deliver business impact instead of churn probability.
Uplift modeling, which can be applied to various business challenges beyond churn, offers a powerful solution with immediate business impact.
There are still many intriguing questions about uplift modeling, such as handling multiple treatments, estimating different uplift models, and using multi-armed bandits in production, but I will save those answers for the next post.