Bayesian Marketing Mix Modeling in Python via PyMC

Author: Murphy

Marketing Analytics

Photo by Greg Rakozy on Unsplash

In this article, I want to combine two concepts discussed in earlier posts: Bayesian modeling and Marketing Mix Modeling. Since the chances are high that you are unfamiliar with at least one of these topics, let me give you a quick introduction and some further reading. I will

  1. explain what marketing mix modeling is,
  2. explain what Bayesian modeling is, and
  3. motivate why it makes sense to combine both.

Then, I will show you how to do it in practice using PyMC.

Preliminaries

If you are an eager reader of my articles (thank you!), you can skip a few sections to get right to the code. Otherwise, please just continue reading.

Marketing Mix Modeling

A fundamental problem for every business is to decide on which channels to spend the marketing budget on. You could spend 1,000€ on TV ads, 2,000€ on radio ads, and 3,000€ on web banners each day, just going with your gut feeling. But is this any good?

Maybe the web banner channel is saturated already, and spending only 1,500€ there is just as good as 3,000€. Then you could save 1,500€, or put them into other, better-performing channels for more sales.

Or maybe some channel even has a negative ROI – for each Euro you spend there on ads, you get less than one Euro back in return. We should definitely not waste too much money on such a channel, at least if it is not strategically important from a business perspective.


To answer questions like this, you must understand how the different media spendings (TV, radio, …) impact your sales or other KPIs of interest.

In marketing mix modeling, you start with a dataset of media spendings. It usually gets extended with some control variables, i.e., further information about anything that might impact the target KPI, such as holidays, the weather, soccer championships, lockdowns, the price of a product, and many more. To keep it concise, we will omit the control variables here. Then, of course, you need a KPI you want to predict. This is often sales, the number of new customers, and such. So, a typical dataset might look like this:

Image by the author.
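As an illustration, here is a small synthetic stand-in for such a dataset. The column names TV, Radio, and Banners mirror the dataset loaded later in this article; the numbers are made up:

```python
import pandas as pd

# Synthetic stand-in for a typical MMM dataset: daily media spendings
# per channel plus the KPI we want to predict. The numbers are made up.
mmm_data = pd.DataFrame(
    {
        "TV": [500.0, 0.0, 1200.0, 800.0],
        "Radio": [0.0, 300.0, 150.0, 0.0],
        "Banners": [250.0, 250.0, 0.0, 400.0],
        "Sales": [10500.0, 9200.0, 12400.0, 11000.0],
    },
    index=pd.to_datetime(
        ["2020-01-01", "2020-01-02", "2020-01-03", "2020-01-04"]
    ).rename("Date"),
)

X_demo = mmm_data.drop(columns=["Sales"])  # media spendings (features)
y_demo = mmm_data["Sales"]                 # KPI (target)
```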

In my older articles, I describe the motivation and how to conduct marketing mix modeling in more detail. In order to understand the rest of this article, please check them both out here:

Introduction to Marketing Mix Modeling in Python

An Upgraded Marketing Mix Modeling in Python

Bayesian Modeling

Many estimators and models stem from a maximum likelihood approach. As an example, imagine that you want to estimate the probability p of a coin showing heads. You flip it 10 times and see 8 heads; what do you conclude? A natural estimate for the probability is then p = 8 / 10 = 80%, which also happens to be the maximum likelihood estimate. You could also calculate a confidence interval to see if this estimate is reliable, but we want to take another path.
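You can verify numerically that p = 80% maximizes the likelihood of seeing 8 heads in 10 flips; a quick grid-search sketch:

```python
import numpy as np

# Likelihood of 8 heads in 10 flips as a function of p. The binomial
# coefficient is a constant factor and can be dropped for maximization.
p_grid = np.linspace(0.01, 0.99, 981)
likelihood = p_grid**8 * (1 - p_grid) ** 2

p_mle = p_grid[np.argmax(likelihood)]
print(round(p_mle, 2))  # 0.8
```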

Imagine that we want to incorporate some prior knowledge about the probability p. If you have drawn the coin randomly from your wallet, for example, there is no reason to think that the coin is biased, so p should not be too far away from 50%, assuming that you are not a magician.

With Bayesian modeling, you can incorporate this prior knowledge to end up with a density estimate of p, i.e., not a single value but an entire distribution. This distribution will likely peak between the maximum likelihood estimate and the prior, maybe 65%.

The mode of the red curve is the maximum likelihood estimate. Image by the author.

All in all, Bayesian modeling is about finding a trade-off between prior knowledge and observed data. In the picture above, this means the following: without any data, we start off with the blue curve. This is just a belief, a gut feeling. Then, we observe data that tells us to move the blue curve more toward the red one. We end up with the yellow hybrid curve depicting the so-called posterior distribution.
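For the coin example, this trade-off can even be computed in closed form. Assuming, purely for illustration, a Beta(10, 10) prior centered at 50%, observing 8 heads and 2 tails yields a Beta(18, 12) posterior whose mean lands between the prior mean (50%) and the maximum likelihood estimate (80%):

```python
from scipy import stats

# Prior belief: the coin is roughly fair (illustrative assumption).
prior_alpha, prior_beta = 10, 10

# Observed data: 8 heads, 2 tails.
heads, tails = 8, 2

# The Beta prior is conjugate to the binomial likelihood, so the
# posterior is again a Beta distribution with updated parameters.
posterior = stats.beta(prior_alpha + heads, prior_beta + tails)

print(posterior.mean())  # 0.6, between the prior mean 0.5 and the MLE 0.8
```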

You can read more about the motivation here:

A gentle Introduction to Bayesian Inference


Now, understanding the theory is good, but we also have to be able to apply it to get things done. Usually, I use the awesome Python library PyMC for Bayesian modeling. You can see it in action here:

Bayesian Linear Regression in Python via PyMC3

Why Marketing Mix Modeling with Bayes?

You can define marketing mix models with a lot of hyperparameters:

  • the saturation
  • the carryover strength
  • the carryover length

Then, you can use a hyperparameter optimization method to find the optimal combination. I have done that in my other article about marketing mix modeling, An Upgraded Marketing Mix Modeling in Python.

This approach works fine, but there is a thing about it that I don't like:

The hyperparameter estimates are often unstable.

This means that entirely different sets of hyperparameters might yield equally good models. There might be

  • Model A, with a TV carryover strength of 0.4 and a TV saturation of 0.8, and
  • Model B, with a TV carryover strength of 0.9 and a TV saturation of 0.5,

both having the same r² or MAPE on the test set. From a prediction standpoint, both models are interchangeable if you stay within the marketing spend boundaries you have seen so far.

However, extrapolating with Model A differs entirely from extrapolating with Model B. This is a rather unsatisfying and troublesome behavior because extrapolation is exactly what you do when optimizing the media budget. If you have historically spent between 0€ and 1,000€ per day on TV ads, for the optimization you have to know what happens when you spend 5,000€ or even 10,000€ as well.

You need a model with excellent extrapolation capabilities.
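To see why this matters, here is a small sketch with two made-up saturation curves (all numbers hypothetical): they nearly coincide on the historically observed spend range, yet tell completely different stories once you extrapolate:

```python
import numpy as np

def saturation(x, coef, a):
    # Simple exponential saturation curve: coef * (1 - exp(-a * x)).
    return coef * (1 - np.exp(-a * x))

# Two hypothetical parameter sets, standing in for "Model A" and "Model B".
spend_observed = np.linspace(100, 1000, 50)  # historical spend range
model_a = saturation(spend_observed, coef=20, a=0.0005)
model_b = saturation(spend_observed, coef=12, a=0.001)

# Within the observed range, the two curves are close...
in_range_gap = np.max(np.abs(model_a - model_b) / model_a)

# ...but at a spend of 10,000€ they disagree wildly.
extrapolation_gap = abs(
    saturation(10_000, 20, 0.0005) - saturation(10_000, 12, 0.001)
) / saturation(10_000, 20, 0.0005)

print(in_range_gap < 0.2, extrapolation_gap > 0.3)  # True True
```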

And usually, you have to choose between more than only two models. In this case, you can proceed in at least two ways:

  • You can pick the first model you create because you are not even aware of the problem. This approach is straightforward but dangerous.
  • You can pick a model that seems right to you, some domain experts, or the stakeholders. For some people, this is okay, but I prefer not to bake the output expectations into the model because of the following question:

If somebody already knows the answer, why would I even build a model replicating that answer?


There might be ways to do something reasonable from here as well, but I want to show you how to circumvent this problem using Bayesian modeling.

Modeling Start!

First, let us grab our dataset.

import pandas as pd

data = pd.read_csv(
  'https://raw.githubusercontent.com/Garve/datasets/4576d323bf2b66c906d5130d686245ad205505cf/mmm.csv',
  parse_dates=['Date'],
  index_col='Date'
)

X = data.drop(columns=['Sales'])
y = data['Sales']

Then, we must define the saturation and carryover functionality, similar to the last article. In the PyMC language, it might look like this:

import pytensor.tensor as pt

def saturate(x, a):
    return 1 - pt.exp(-a*x)

def carryover(x, strength, length=21):
    w = pt.as_tensor_variable(
        [pt.power(strength, i) for i in range(length)]
    )

    x_lags = pt.stack(
        [pt.concatenate([
            pt.zeros(i),
            x[:x.shape[0]-i]
        ]) for i in range(length)]
    )

    return pt.dot(w, x_lags)

The saturation function should be easy to grasp. The carryover, however, is a bit of work. Basically, you can express the carryover transformation as a matrix-vector multiplication. You just have to assemble the matrix x_lags and the vector w first. As an example, we can transform the input vector x = (x₁, x₂, x₃, x₄) with a carryover length of 3 via

Image by the author.
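A NumPy sketch of this matrix-vector view (with a made-up strength of 0.5 and length 3) confirms that it is equivalent to the more obvious loop formulation yₜ = Σᵢ strengthⁱ · xₜ₋ᵢ:

```python
import numpy as np

x = np.array([100.0, 200.0, 0.0, 50.0])  # spendings x1..x4
strength, length = 0.5, 3                # hypothetical carryover params

# Weight vector w = (strength^0, strength^1, strength^2).
w = strength ** np.arange(length)

# Lag matrix: row i is x shifted right by i positions, zero-padded.
x_lags = np.stack(
    [np.concatenate([np.zeros(i), x[: len(x) - i]]) for i in range(length)]
)

matrix_result = w @ x_lags

# The same transformation written as an explicit loop over lags.
loop_result = np.array(
    [sum(strength**i * x[t - i] for i in range(length) if t - i >= 0)
     for t in range(len(x))]
)

print(np.allclose(matrix_result, loop_result))  # True
```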

The carryover function in the code above does exactly this. If we have these functions, we can finally start modeling.

import pymc as pm

with pm.Model() as mmm:
    channel_contributions = []

    for channel in X.columns:
        coef = pm.Exponential(f'coef_{channel}', lam=0.0001)
        sat = pm.Exponential(f'sat_{channel}', lam=1)
        car = pm.Beta(f'car_{channel}', alpha=2, beta=2)

        channel_data = X[channel].values
        channel_contribution = pm.Deterministic(
            f'contribution_{channel}',
            coef * saturate(
                carryover(
                    channel_data,
                    car
                ),
                sat
            )
        )

        channel_contributions.append(channel_contribution)

    base = pm.Exponential('base', lam=0.0001)
    noise = pm.Exponential('noise', lam=0.0001)

    sales = pm.Normal(
        'sales',
        mu=sum(channel_contributions) + base,
        sigma=noise,
        observed=y
    )

    trace = pm.sample(tune=3000)

All parameters (no hyperparameters anymore!) are defined via parameter = pm. ...: the regression coefficients coef, saturation strength sat, carryover strength car, the baseline base, and the noise noise.
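A quick sanity check of how informative these priors are (computed here with scipy, whose exponential distribution takes a scale = 1/λ instead of PyMC's rate λ): Exponential(λ=0.0001) has a mean of 10,000, so coef, base, and noise are barely constrained, while Beta(2, 2) gently centers the carryover strength around 0.5 and Exponential(λ=1) keeps the saturation parameter small.

```python
from scipy import stats

# Prior means of the distributions used in the model above.
# Note: scipy parameterizes the exponential via scale = 1 / lam.
coef_prior = stats.expon(scale=1 / 0.0001)  # pm.Exponential(lam=0.0001)
sat_prior = stats.expon(scale=1 / 1)        # pm.Exponential(lam=1)
car_prior = stats.beta(2, 2)                # pm.Beta(alpha=2, beta=2)

print(coef_prior.mean())  # 10000.0 -> very weak prior on coefficients
print(sat_prior.mean())   # 1.0     -> saturation parameter expected small
print(car_prior.mean())   # 0.5     -> carryover centered at 50%
```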

Note that I did not treat the carryover length as a parameter; instead, I fixed it at 21. This is because I have not yet figured out how to create matrices and vectors with variable dimensions in the carryover function in PyMC. Otherwise, I would use a Poisson random variable for the carryover length. Leave me a note if you know how to do this properly!

Tags: Bayesian Statistics Editors Pick Hands On Tutorials Marketing Analytics Marketing Mix Modeling
