MMM: Bayesian Framework for Marketing Mix Modeling and ROAS

Author: Murphy  |  Time: 2025-03-22 21:26:17

This post was co-authored with Rafael Guedes.

Introduction

Scalable internet businesses depend on marketing to drive growth. Marketing is not their only growth lever, of course, but beyond a certain scale very few companies can afford not to be extremely efficient in acquiring customers. Two hot topics in which companies are investing heavily to bring Artificial Intelligence (AI) capabilities into marketing are Media Mix Modeling (MMM) and Customer Lifetime Value (LTV) prediction. Both focus on increasing the return on the investment that organizations make in marketing. This article covers what MMM is and the best practices for applying it.

MMM is a technique that allows marketing teams to measure the impact of their investments and how each of them contributes to driving conversions. The complexity of this task has increased rapidly in recent years as the number of platforms available for advertising has skyrocketed. This has spread potential customers across many media channels, which we can separate into offline and online buckets. Traditional offline channels do not rely on digital media and range from newspaper, radio, and television ads and coupons to a booth at a trade show. Online channels have multiplied, and companies often use many of them together, such as email, social media, organic search, paid search, affiliate marketing, and influencer marketing.

One important caveat is that a good MMM requires an equally accurate data-driven attribution model, i.e., a model of which channels contributed to acquiring a specific customer. Also note that while attribution is performed at the user level, MMM is usually applied at the acquisition-channel level. Data-driven attribution is out of the scope of this article.

In this article, our focus is twofold. First, we develop a Bayesian model designed to increase transparency on how each media channel performs. Second, we optimize the budget allocation to maximize our variable of interest, which in this case is revenue. Besides providing a detailed view of how a Bayesian approach works for MMM, we also walk through implementing and applying it on a public dataset. We test the model's accuracy and calculate each channel's Return On Ad Spend (ROAS). Finally, we optimize a hypothetical budget across three channels to maximize revenue.

Figure 1: Marketing Mix Modelling – Optimising budget across different media channels (image by author with DALL-E)

As always, the code is available on our GitHub.

Media Mix Modeling: What is it?

MMM empowers organizations across the globe by measuring the effectiveness of their advertising channels and providing transparency on how media spending impacts sales. These models play an important role in supporting the decision-making process of budget allocation across channels by optimizing a target variable of interest, such as sales, return on ad spend (ROAS), revenue, conversion, LTV, etc.

Over the past years, several models have been proposed to capture the influence that spending has on these variables of interest [1]. They are typically based on weekly or monthly data aggregated at a geographic level. We are interested in modeling the relationship between the dependent variable(s), i.e., one or more of the variables of interest defined above, and a set of independent variables. Some independent variables are obvious, e.g., the ad spend on each channel. Still, we can extend our approach to include further effects such as price, product distribution, inflation, weather, seasonality, and market competition.

Traditional approaches rely on regression methods to infer causation from correlation. However, the response of sales to media spending is not linear: there is saturation, which means diminishing returns at high spending levels. Moreover, advertising has a lag or carryover effect, meaning that spending in a given week can still impact sales in the following weeks.

Figure 2: Example of Ad Saturation Curve and Ad Lag effect (image by author)

Bayesian Methods for Media Mix Modeling

Bayesian models can be specified to account for both the saturation (shape) and the lag (carryover) effects.

Before diving into the model details, let's define a hypothetical dataset to clarify which variables the model takes as input. Suppose we have weekly data at a country level, where each row represents a week ($t$) and each column represents either a media channel ($m$) or a control variable ($c$), such as seasonality or product price. The media spend of channel $m$ in week $t$ is denoted $X_{t,m}$, and the value of control variable $c$ in the same week is denoted $Z_{t,c}$.

Table 1: Hypothetical weekly dataset with 3 media channels, one control variable, and the target variable (Sales)
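To make the notation concrete, here is a minimal NumPy sketch of the data layout, with $X$ as a weeks × channels matrix and $Z$ as a weeks × controls matrix. The values are random placeholders, not the figures from Table 1, and the channel names other than Facebook, as well as the `price` control, are hypothetical.

```python
import numpy as np

# Synthetic, purely illustrative data: NOT the values shown in Table 1.
rng = np.random.default_rng(seed=0)

n_weeks = 8                                        # weeks t = 1, ..., T
channels = ["facebook", "channel_2", "channel_3"]  # hypothetical channel names
controls = ["price"]                               # hypothetical control variable

# X[t, m]: media spend of channel m in week t (rows = weeks, columns = channels)
X = rng.uniform(0, 1_000, size=(n_weeks, len(channels)))
# Z[t, c]: value of control variable c in week t
Z = rng.uniform(9, 11, size=(n_weeks, len(controls)))
# y[t]: target variable (sales) observed in week t
y = rng.uniform(5_000, 10_000, size=n_weeks)

print(X.shape, Z.shape, y.shape)  # (8, 3) (8, 1) (8,)
```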

Lag or Carryover Effect

The carryover effect is modeled by a function called adstock [1], which captures the cumulative effect of spending in a specific channel. It transforms the channel's spend time series into a weighted average of the media spend in the current week and the previous $L-1$ weeks. $L$ is the maximum duration of the carryover effect for a particular media channel, and it determines how many lagged weeks receive a weight $w_m$ in the weighted average.

Equation 1: Adstock function that models the carryover effect
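Following [1] and the notation introduced above, the adstock of channel $m$ in week $t$ can be written as the normalized weighted average below, where $w_m(l)$ is the weight given to the spend from $l$ weeks earlier:

```latex
\operatorname{adstock}\left(X_{t-L+1,m}, \dots, X_{t,m};\, w_m, L\right)
  = \frac{\sum_{l=0}^{L-1} w_m(l)\, X_{t-l,m}}{\sum_{l=0}^{L-1} w_m(l)}
```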

$L$ can be set differently for each media channel. It is a hyperparameter to be defined by a domain expert. If no prior information exists for a particular channel, the authors of [1] advise setting $L$ to a large number, such as 13 weeks, to capture potentially long-lagged effects.
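The weighted average above can be sketched in a few lines of NumPy. This is a minimal illustration, not the implementation from our repository: the function name `adstock` is ours, the weights are passed in explicitly (their length sets $L$), and weeks before the start of the series are assumed to have zero spend.

```python
import numpy as np

def adstock(spend, weights):
    """Normalized weighted-average adstock for a single channel.

    spend:   1-D array of weekly spend, oldest week first.
    weights: 1-D array of length L; weights[l] is the weight w_m(l)
             applied to the spend from l weeks earlier.
    Spend before the first week is assumed to be zero (an assumption).
    """
    spend = np.asarray(spend, dtype=float)
    weights = np.asarray(weights, dtype=float)
    L = len(weights)
    # Prepend L-1 zeros so that every week has a full window of L values.
    padded = np.concatenate([np.zeros(L - 1), spend])
    out = np.empty_like(spend)
    for t in range(len(spend)):
        # Reverse the window so that index 0 is week t itself, index l is l weeks back.
        window = padded[t : t + L][::-1]
        out[t] = np.dot(weights, window) / weights.sum()
    return out
```

For example, `adstock([100, 0, 0], [1.0, 0.5])` returns roughly `[66.67, 33.33, 0.0]`: a single burst of spend is spread over the current and the following week, and the normalization keeps the result on the same scale as the raw spend.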

The equation that defines the weight can have two different forms:

  1. Immediate (geometric) adstock [2], used when the advertising effect peaks at the same time as the ad exposure, i.e., sales peak in the same week in which we increased the spending of a media channel. In equation 2, $\alpha_m$ is the retention rate of the ad effect.
Equation 2: Geometric decay function
  2. Delayed adstock [1], used when the advertising effect takes time to build up and does not impact sales immediately. In equation 3, $\theta_m$ is the delay of the peak effect.
Equation 3: Delayed Adstock function
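In the same notation, the two weight functions can be written as follows, with $l$ the lag in weeks and the usual constraints on the retention rate $\alpha_m$ and the peak delay $\theta_m$:

```latex
\begin{aligned}
w_m^{\text{geo}}(l;\, \alpha_m) &= \alpha_m^{\,l},
  && l = 0, \dots, L-1,\quad 0 < \alpha_m < 1 \\
w_m^{\text{del}}(l;\, \alpha_m, \theta_m) &= \alpha_m^{\,(l-\theta_m)^2},
  && l = 0, \dots, L-1,\quad 0 < \alpha_m < 1,\quad 0 \le \theta_m \le L-1
\end{aligned}
```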

Let's return to our hypothetical dataset and calculate the immediate and delayed adstock for the Facebook channel. To start, we add 5 more weeks to the dataset. We consider a retention rate ($\alpha_m$) of 80% and a peak delay ($\theta_m$) of 5 weeks. We then calculate the weights for the immediate and the delayed effect to obtain the final values of the immediate and delayed adstock at week 8.

Table 2: Immediate and Delayed Adstock calculation for Facebook

Figure 3 shows how much each week's spending contributes to the sales volume at week 8.

Figure 3: Immediate and Delayed Adstock effect for our hypothetical Facebook Spend (image by author)
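The weight calculation behind this example can be sketched with NumPy for $\alpha_m = 0.8$, $\theta_m = 5$, and a window of $L = 8$ weeks (week 1 to week 8). Since the spend values of Table 2 are not reproduced here, the sketch uses a hypothetical spend series with a single burst in week 3, chosen only to contrast the two weightings.

```python
import numpy as np

alpha, theta, L = 0.8, 5, 8   # retention rate, peak delay (weeks), window length
lags = np.arange(L)           # l = 0 is week 8 itself, l = 7 is week 1

w_geo = alpha ** lags                   # immediate/geometric weights: alpha^l
w_del = alpha ** ((lags - theta) ** 2)  # delayed weights: alpha^((l - theta)^2)

for l in lags:
    print(f"week {L - l}: geometric={w_geo[l]:.3f}  delayed={w_del[l]:.3f}")

# Hypothetical spend for weeks 1..8 (NOT the figures of Table 2):
# a single burst in week 3, i.e., 5 weeks before week 8.
spend = np.array([0, 0, 100, 0, 0, 0, 0, 0], dtype=float)
window = spend[::-1]  # reorder so that window[l] is the spend l weeks before week 8

adstock_geo = np.dot(w_geo, window) / w_geo.sum()
adstock_del = np.dot(w_del, window) / w_del.sum()
print(f"week-8 adstock: geometric={adstock_geo:.1f}, delayed={adstock_del:.1f}")
# → week-8 adstock: geometric=7.9, delayed=27.9
```

The geometric weights decay monotonically from week 8 backwards, while the delayed weights peak at a lag of 5 weeks (week 3), so the same burst contributes far more to the week-8 adstock under the delayed form.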

Saturation or Shape Effect

The saturation or shape effect is modeled by transforming the media spend through a curvature function, such as the logistic saturation function [3], defined as follows:

Equation 4: Logistic saturation function
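A common form of the logistic saturation function, consistent with the description that follows (zero response at zero spend, approaching 1 as spend grows, steepness set by $\lambda$), is:

```latex
f(x;\, \lambda) = \frac{1 - e^{-\lambda x}}{1 + e^{-\lambda x}}, \qquad x \ge 0,\ \lambda > 0
```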

where $x$ represents the media spend and $\lambda$ controls the steepness of the saturation curve, i.e., how quickly the effect of media spend saturates. A low $\lambda$ value yields a more gradual increase in the response function, which means media spending has a noticeable effect over a large range of values. Conversely, a high $\lambda$ value makes the curve flatten early, so returns on additional spending diminish already at lower spend levels. Figure 4 illustrates these different behaviors.

Figure 4: Logistic saturation functions based on different parameters (source)
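The behavior in Figure 4 can be reproduced qualitatively with a few lines of NumPy. This sketch assumes the logistic form $(1 - e^{-\lambda x}) / (1 + e^{-\lambda x})$, which equals $\tanh(\lambda x / 2)$, and that spend has been scaled to a small range (e.g., divided by its maximum), since $\lambda$ only has a meaningful scale relative to the units of $x$. The $\lambda$ values are arbitrary.

```python
import numpy as np

def logistic_saturation(x, lam):
    """(1 - exp(-lam * x)) / (1 + exp(-lam * x)), i.e. tanh(lam * x / 2)."""
    x = np.asarray(x, dtype=float)
    return (1 - np.exp(-lam * x)) / (1 + np.exp(-lam * x))

spend = np.array([0.0, 0.5, 1.0, 2.0, 4.0])  # scaled spend levels (illustrative)
for lam in (0.5, 1.0, 4.0):
    # Larger lam: the response approaches 1 at smaller spend (saturates sooner).
    print(lam, np.round(logistic_saturation(spend, lam), 3))
```

With a large $\lambda$ most of the attainable response is reached at low spend, which is exactly the diminishing-returns regime described above; with a small $\lambda$ the curve keeps rising across the whole spend range.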

It is difficult to know in advance which parameter values to use, since they depend on how each channel behaves. In a Bayesian approach, however, we do not fix these parameters: we place a prior distribution on each of them, and the model combines these priors with the observed data to produce posterior distributions of plausible parameter values. Therefore, we must specify a distribution rather than a single value.
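To illustrate what specifying a distribution rather than a single value means, the sketch below draws parameters from two priors with NumPy and looks at the implied range of saturation curves (a simple prior predictive check). The Beta(1, 3) prior on the retention rate and the Gamma(3, 1) prior on $\lambda$ are hypothetical choices for illustration, not the priors of our model, and the logistic form $(1 - e^{-\lambda x}) / (1 + e^{-\lambda x})$ is assumed. Updating these priors into posteriors would in practice be done with a probabilistic programming library such as PyMC, which this sketch does not attempt.

```python
import numpy as np

rng = np.random.default_rng(seed=42)
n_draws = 1_000

# Hypothetical priors, for illustration only:
alpha = rng.beta(1.0, 3.0, size=n_draws)             # retention rate, in (0, 1)
lam = rng.gamma(shape=3.0, scale=1.0, size=n_draws)  # saturation steepness, > 0

# Saturation curve implied by each lambda draw, at a few scaled spend levels.
x = np.linspace(0.0, 2.0, 5)
lx = np.outer(lam, x)                       # shape (n_draws, len(x))
curves = (1 - np.exp(-lx)) / (1 + np.exp(-lx))

# 5th, 50th and 95th percentiles across draws, for each spend level.
lower, median, upper = np.percentile(curves, [5, 50, 95], axis=0)
print("median response:", np.round(median, 3))
print("90% prior band width:", np.round(upper - lower, 3))
print("mean prior retention rate:", round(float(alpha.mean()), 3))
```

The width of the band shows how much the prior alone allows the response curve to vary; the data then narrows this range during inference.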

Combining the Carryover and Shape Effect

As mentioned in the previous two sections, modeling the carryover and shape effects requires applying both transformations to the media spend of each channel. This raises the question of which transformation should be applied first. The authors of [1] suggest the following:

  • Apply the carryover first and then the shape effect if the media spending is heavily concentrated in certain periods.
  • Apply the shape effect first and then the carryover if the media spending is evenly distributed across multiple time periods.

Since organizations tend to concentrate their marketing activity, the most common approach is the carryover → shape effect combination.
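The effect of the ordering can be checked numerically. The minimal sketch below re-implements a normalized geometric adstock and a logistic saturation of the form $(1 - e^{-\lambda x}) / (1 + e^{-\lambda x})$ with NumPy, and applies both orders to a hypothetical schedule with all spend in a single week; the parameter values are arbitrary. It only shows that the two orders produce different transformed series; it does not by itself tell us which order suits a given channel.

```python
import numpy as np

def geometric_adstock(spend, alpha, L):
    """Normalized geometric adstock; spend before week 1 is assumed to be zero."""
    w = alpha ** np.arange(L)
    padded = np.concatenate([np.zeros(L - 1), np.asarray(spend, dtype=float)])
    # For week t, reverse padded[t : t + L] so that index 0 is week t itself.
    return np.array([np.dot(w, padded[t : t + L][::-1]) / w.sum()
                     for t in range(len(spend))])

def logistic_saturation(x, lam):
    x = np.asarray(x, dtype=float)
    return (1 - np.exp(-lam * x)) / (1 + np.exp(-lam * x))

# Hypothetical "burst" schedule: all spend concentrated in week 3 (scaled units).
spend = np.array([0.0, 0.0, 3.0, 0.0, 0.0, 0.0])
alpha, L, lam = 0.7, 4, 1.5

carryover_then_shape = logistic_saturation(geometric_adstock(spend, alpha, L), lam)
shape_then_carryover = geometric_adstock(logistic_saturation(spend, lam), alpha, L)

print("carryover -> shape:", np.round(carryover_then_shape, 3))
print("shape -> carryover:", np.round(shape_then_carryover, 3))
```

Applying the saturation first caps the burst before it is spread over the following weeks, whereas applying the carryover first spreads the raw spend and only then saturates the accumulated value, so the two series differ in both level and shape.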

Using the carryover → shape combination, sales $y_t$ at week $t$ can be modeled as a linear combination of the transformed media spend and the control variables, where a regression coefficient $\beta_m$ captures the distinct effect of each media channel $m$.

Equation 5: Modeling sales combining Carryover → Shape Effect and control variables
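One plausible way to write this model, in the spirit of [1], is shown below. Here $\operatorname{adstock}(\cdot)$ is the weighted-average carryover transformation, $f(\cdot;\lambda_m)$ is the logistic saturation function with channel-specific steepness $\lambda_m$, and the baseline $\tau$, the control coefficients $\gamma_c$, the numbers of channels $M$ and control variables $C$, and the noise term $\varepsilon_t$ are our notation for this sketch:

```latex
y_t = \tau
    + \sum_{m=1}^{M} \beta_m \, f\!\left(\operatorname{adstock}\left(X_{t-L+1,m}, \dots, X_{t,m};\, w_m, L\right);\, \lambda_m\right)
    + \sum_{c=1}^{C} \gamma_c \, Z_{t,c}
    + \varepsilon_t
```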

where

