How to Optimize Your Marketing Budget
MARKETING ANALYTICS

Marketing Mix models are a powerful tool for understanding the impact of different marketing channels on sales. By building a marketing mix model, marketers can quantify the contribution of each channel to their overall sales, and then use this information to optimize their budget allocation.
So far, I have written an entire series about building marketing mix models, yet I still owe you an article about how to use these models to optimize media spending. Today is your lucky day since in this article, I will show you just that!

If you are new to Marketing Mix Modeling, you can start with my introductory article:
Prerequisites
Before we can optimize something, we have to build a model first. We will do it very quickly, so we can get to the main section of this article as soon as possible.
The Data
First, let us load some data. I will use the same dataset as in my old articles.
import pandas as pd
from sklearn.model_selection import cross_val_score, TimeSeriesSplit
data = pd.read_csv(
'https://raw.githubusercontent.com/Garve/datasets/4576d323bf2b66c906d5130d686245ad205505cf/mmm.csv',
parse_dates=['Date'],
index_col='Date'
)
X = data.drop(columns=['Sales'])
y = data['Sales']
The dataset looks like this:

The logic behind this table is the following: imagine you work in a company that sells some product. You can see the weekly sales of this product in the column Sales. In order to boost these sales, you spend some money on advertising, in our example TV, Radio, and Banner advertising. We now want to model the sales using advertising spending and more control variables, such as weekday, month, product price, weather, …

The Model
Complex models such as XGBoost or deep neural networks are hard to interpret and optimize. We turn to a proven method that uses interpretable carryover and saturation effects to build a generalized additive model instead, as done here:

The carryover and saturation blocks are intuitive feature transformations:
- The carryover models the fact that media spending at time t might still influence the sales at times t + 1, t + 2, …, or the other way around, that sales observed at time t are also influenced by the spending at times t – 1, t – 2, …
- The saturation models diminishing rates of return, e.g., increasing your spending in a channel from 0 € to 100,000 € has a big impact, but changing it from 1,000,000,000 € to 1,000,100,000 € hardly matters anymore.
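To make the two transformations tangible, here is a minimal numpy sketch. It assumes a geometric carryover and an exponential saturation, which is the form the code later in this article uses; the helper names carryover and saturation, the lags convention (the current period counts as the first lag) and the value of beta are only illustrative.

```python
import numpy as np

def carryover(spend, strength, lags):
    """Geometric carryover: adstock[t] = sum of strength**i * spend[t - i] for i < lags."""
    spend = np.asarray(spend, dtype=float)
    adstock = np.zeros_like(spend)
    for i in range(min(lags, len(spend))):
        # spend from i periods ago still counts, discounted by strength**i
        adstock[i:] += strength**i * spend[: len(spend) - i]
    return adstock

def saturation(x, beta):
    """Exponential saturation: increasing, but with diminishing returns."""
    return 1 - np.exp(-beta * np.asarray(x, dtype=float))

# a single burst of spending keeps echoing in the following periods
adstock = carryover([100.0, 0.0, 0.0, 0.0], strength=0.5, lags=3)
print(adstock)  # adstock is 100, 50, 25, 0

# the same extra 100,000 EUR matters a lot at low spend and hardly at all at high spend
beta = 1e-5  # hypothetical saturation coefficient
gain_low = saturation(100_000, beta) - saturation(0, beta)
gain_high = saturation(1_000_100_000, beta) - saturation(1_000_000_000, beta)
print(gain_low, gain_high)
```

The first print shows the carryover, the second one the saturation: the gain at the low spending level is large, while the gain at the high level is numerically zero.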
Note: In the graphic, the control variables are omitted. This is fine because we do not need them for optimization: unlike the money we put into our media channels, we cannot change them. The only control variable that we can change is the price, but we assume that it is constant here, and that we only really want to optimize our media expenditures.
So, the model has the form

for some yet-to-be-defined functions saturation and carryover. As an example, let us assume that

and

β is the saturation coefficient, λ the carryover strength, and ℓ the carryover length.
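Written out to match the numpy implementation further down, the model and its two building blocks can be stated as follows. Treat the exact indexing as an assumption: the sum over lags mirrors the code's range(lags), so ℓ counts the current period too, and base_t is only a placeholder name for the omitted control-variable part.

```latex
\begin{aligned}
\text{Sales}_t &= \text{base}_t + \sum_{c \,\in\, \{\text{TV},\,\text{Radio},\,\text{Banners}\}} \alpha_c \cdot \text{saturation}_c\big(\text{carryover}_c(x_c)_t\big), \\
\text{saturation}(x) &= 1 - e^{-\beta x}, \\
\text{carryover}(x)_t &= \sum_{i=0}^{\ell-1} \lambda^{i} \, x_{t-i}, \qquad x_s = 0 \ \text{for } s < 1 .
\end{aligned}
```

Here x_c is the spending series of channel c, and each channel has its own α, β, λ, and ℓ.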
We can learn these parameters by treating them as hyperparameters, or by employing Bayesian methods and treating them as normal, learnable parameters. We have seen in detail how to get these parameters in my previous articles about marketing mix modeling, so I will not go further into this topic.
Instead, let us assume that we have the numbers now, and we want to use them to create an optimized media budget allocation plan.
Optimizing The Media Budget
Let us assume that our previous marketing mix modeling attempts left us with the following parameters:
N = 200 # number of observations
# previous marketing mix modeling has given us these parameters
tv_coef = 10000 # α
tv_lags = 4 # ℓ
tv_carryover = 0.5 # λ
tv_saturation = 0.002 # β
radio_coef = 8000
radio_lags = 2
radio_carryover = 0.2
radio_saturation = 0.0001
banners_coef = 14000
banners_lags = 0
banners_carryover = 0.2
banners_saturation = 0.001
We will now rebuild the marketing mix model in Python using numpy.
But why? We have built a model using scikit-learn or PyMC already! Can't we reuse it?
Good question! We could use our pre-trained model and hand it over to a general-purpose optimization algorithm that tries to find media spend inputs that maximize sales. However, this is called black-box optimization, and it has the problem that it tends to get stuck in local optima instead of finding a global optimum.
Another problem with black-box optimization is that the algorithms typically have various parameters that you have to play around with to find a good (but maybe not optimal) solution. That's why some people say that this kind of optimization is more of an art than a science.
Convexity to The Rescue
If we can formulate our problem as a convex optimization problem, we can solve it using libraries such as cvxpy that are guaranteed to find the best media budget allocation. I used this library here already to solve another optimization problem.
In order to use a convex optimization method, our model has to be convex or concave, where concave means that putting a minus sign in front of the model makes it convex.

For example, if our model is y = x², it would be a convex function that is easy to minimize. y = 100 - x² would be a concave model that is easy to maximize.
I will not go into any further detail; just know that our model is in fact a concave function! In a carryover-saturation model like the one we created, it is sufficient that the second derivative of the saturation function is negative; then the model is concave.
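The second-derivative condition can be checked directly. The short derivation below assumes the exponential saturation used in the code of this article, with β > 0:

```latex
\begin{aligned}
\text{saturation}(x) &= 1 - e^{-\beta x}, \\
\text{saturation}'(x) &= \beta \, e^{-\beta x} > 0, \\
\text{saturation}''(x) &= -\beta^{2} e^{-\beta x} < 0 .
\end{aligned}
```

Each channel term is this concave function applied to a linear carryover and scaled by a positive coefficient. Composing a concave function with a linear map keeps it concave, and so does summing with positive weights, so the total over channels and weeks stays concave.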

However, if we use other saturation functions such as Adbudg or other typical S-shaped functions, they might be neither concave nor convex, which makes it more difficult to optimize them.

Ok, enough of the theory. Just remember for now that our model is concave, which is great since then we can find a global optimum, i.e. a budget allocation that yields the maximum sales.
Reimplementing Our Model in Numpy
First, let us define some matrices that take care of the carryover effect.
import numpy as np
tv_carryover_matrix = sum([np.diag(tv_carryover**i*np.ones(N-i), k=-i) for i in range(tv_lags)])
radio_carryover_matrix = sum([np.diag(radio_carryover**i*np.ones(N-i), k=-i) for i in range(radio_lags)])
banners_carryover_matrix = np.eye(N)
I know that this is hard to grasp, so let's take a look at one of these matrices.

This implements a carryover with a strength of 0.2 and a length of 1. You can see this if you multiply this matrix with a spending vector.
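As a small illustration, the same construction can be printed for a tiny horizon. The sketch below uses N = 5 instead of 200 purely for readability, and the spending vector is made up.

```python
import numpy as np

N = 5                  # small horizon so the matrix is readable (the article uses N = 200)
radio_lags = 2
radio_carryover = 0.2

# same construction as above: strength**i on the i-th subdiagonal
radio_carryover_matrix = sum(
    np.diag(radio_carryover**i * np.ones(N - i), k=-i) for i in range(radio_lags)
)
print(radio_carryover_matrix)

spend = np.array([100.0, 0.0, 50.0, 0.0, 0.0])  # made-up weekly radio spends
adstock = radio_carryover_matrix @ spend
print(adstock)  # adstock is 100, 20, 50, 10, 0
```

The matrix has ones on the diagonal and 0.2 directly below it, so each week receives its own spend plus 20% of the spend of the week before.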

With this out of the way, let us continue with the saturation. This is only a simple formula involving exp, so no problem.
We can write:
sales = (
tv_coef * np.sum(1 - np.exp(-tv_saturation * tv_carryover_matrix @ data["TV"]))
+ radio_coef * np.sum(1 - np.exp(-radio_saturation * radio_carryover_matrix @ data["Radio"]))
+ banners_coef * np.sum(1 - np.exp(-banners_saturation * banners_carryover_matrix @ data["Banners"]))
)
This gives us the sum of the sales that come from our marketing efforts since we ignore the control variables here. The number is 3,584,648.73 €, and we want to increase it now by changing our media spending! Spoiler: it turns out that we can increase this number by about 1.5 million to 5,054,070.21 €. Wow! Not bad for just juggling some numbers.

About the result
You can also see an interesting pattern here.
It seems that spending your budget equally is your best bet.
Only the first period's spend is a bit higher, but then it kind of balances out. That is because of how our model works: In the first period, there is no adstock yet, so we have to invest a bit more to get our sales rolling. Starting from the second period, we only have to put so much into the model to keep the adstock high, but not too high because of the saturation. In the last period, the adstock does not matter anymore since time ends there, as far as the model is concerned. That's why the optimized budget there is lower.
Reimplementing Our Model in CVXPY
Alright, now we are ready to get to that optimal solution using cvxpy. First, we define the variables, in our case one for each channel and each timestep, so 3N = 3 · 200 = 600 variables in total.
Without anything else, the optimum would be setting all variables to infinity, so we need some constraints. We want
- all variables to be non-negative, and
- the sum of all of these 600 variables to be smaller than or equal to what we have spent historically.
Then, we want to optimize the model that we have implemented using numpy functions, but using their cvxpy equivalents instead, which typically means writing cp instead of np. We can even reuse the carryover matrices from before!
import cvxpy as cp
original_total_spends = data[["TV", "Radio", "Banners"]].sum().sum()
# declaring variables to be optimized, N=200 per channel
tv = cp.Variable(N)
radio = cp.Variable(N)
banners = cp.Variable(N)
# the constraints, positive spends and a bounded total budget
constraints = [
tv >= 0,
radio >= 0,
banners >= 0,
cp.sum(tv + radio + banners) <= original_total_spends,
]
# cvxpy formulation, the model looks like the numpy version
problem = cp.Problem(
cp.Maximize(
tv_coef * cp.sum(1 - cp.exp(-tv_saturation * tv_carryover_matrix @ tv))
+ radio_coef * cp.sum(1 - cp.exp(-radio_saturation * radio_carryover_matrix @ radio))
+ banners_coef * cp.sum(1 - cp.exp(-banners_saturation * banners_carryover_matrix @ banners))
), # like the numpy model, sum of all sales
constraints
)
We can now solve this maximization problem in a very short time via
problem.solve()
# Output:
# 5054070.207463957
Nice! We can get the optimal budget via tv.value, radio.value, and banners.value. You can see that the spends are roughly constant for each week in each channel, which is maybe not as interesting as expected. But optimal is optimal, so we will take it.
We could have gotten 5 million instead of 3.6 million in the past. While this is nice to know, it is worthless now and might just upset the business. However, we can use this logic now to optimize future marketing spends as well, of course!
Further Constraints
That's it, now you have a basic budget optimization tool! And the good part is that you can model even more constraints that might come from the business. As an example, the business might say that the total radio spends are quite high:
sum(radio.value)
# Output:
# 524290.3686626207 (= 524,290.37 €)
The business wants it to be less than 300,000 €, for strategic reasons that the model cannot know. Alright, no problem, let's add it to the constraint set!
constraints = [
tv >= 0,
radio >= 0,
banners >= 0,
cp.sum(tv + radio + banners) <= original_total_spends,
cp.sum(radio) <= 300000 # new constraint
]
Easy as that. We can let the optimization run again and we end up with slightly reduced optimized sales of 4,990,178.80 €. But if we check the sum of the radio spends now
sum(radio.value)
# Output:
# 299999.9992275703
we can see that the business constraint was respected. And we can add even more constraints, such as
- the sum of two channels should be smaller or greater than some number, or
- in some weeks we don't allow any media spending.
You only have to model it using some sums and equalities or inequalities.
Conclusion
In this article, we first recapped the formulas for marketing mix models. This was important because we needed to reimplement the models. Luckily, since our models are easy and interpretable, this was no problem at all.
Our model had in fact another great property: it's concave! In this case, the maximum value of sales is uniquely defined, and we could get to it via convex optimization. Optimizing functions that are neither convex nor concave is difficult in general, and more of an art that involves tuning many hyperparameters. That's why we didn't go this route.
As a grand finale, we optimized our media budget! It was about time. We have even seen how to incorporate more constraints into the model, such as that some channels need some minimum or maximum budget allocations. Using this approach, you can now optimize your future media budget allocation.
Another optimization that we did not talk about is minimizing your media budget under the constraint that you want to make a certain minimal amount of sales, i.e. spending as little money as you can to still reach your goal. This is something you can also implement yourself easily! In contrast, what we did in this article was take all of the money we have and make as many sales as possible.
I hope that you learned something new, interesting, and valuable today. Thanks for reading!