How to Create an RFM Model in BigQuery

Author:Murphy  |  View: 30010  |  Time: 2025-03-22 22:32:02
Created via DALL-E

In this article we will cover:

  • What RFM Segmentation is and its importance in marketing
  • How to create the RFM quantiles in Bigquery
  • How and what RFM segments you can create from the RFM quantiles
  • Considerations for your own RFM model

*Note – all data used in this article is fictional and generated by myself in BigQuery.

Let's start with the basics, what even is an RFM Model?

An RFM (Recency, Frequency, Monetary) model is a customer segmentation technique that uses past purchase behaviour to divide customers into distinct groups.

  • Recency measures how recently a customer made a purchase
  • Frequency assesses how often a customer transacts
  • Monetary looks at how much a customer spends

These three segments alone can improve your understanding of your customer base, but you can combine these together to form segments. These segments help you identify which customers are your best customers, those who are slipping away, or others who are engaged but spend less.


Getting Started

Time for an example, and to keep it simple we'll just focus on the frequency segment for the moment.

Imagine you're in charge of a large community garden, where hundreds of gardeners come to plant and care for the plants. To understand which gardeners are your most frequent, you decide to log when each gardener visits.

Created via DALL-E

With our new data log of say 100 gardeners, we can count how many times each visited the garden, then split them into five equal parts known as quintiles.

Just so it seems a little more human, we could give each quintile a name:

  1. First Quintile – The Rare
  2. Second Quintile – The Occasional
  3. Third Quintile – The Steady
  4. Fourth Quintile – The Committed
  5. Fifth Quintile – The Devoted

Below is a query which calculates the frequency quintiles and provides an aggregated view of each segment.

WITH rfm_groupings AS (
SELECT
    gardener_id,
    total_visits,
    NTILE(5) OVER (ORDER BY total_visits) AS frequency_rank
FROM
    (
    SELECT
        gardener_id,
        COUNT(date_visited_garden) AS total_visits
    FROM
        `medium_examples.gardener_visits`
    GROUP BY
        gardener_id
    )
)

SELECT
    frequency_rank,
    COUNT(gardener_id) AS num_gardeners,
    MIN(total_visits) AS min_frequency,
    MAX(total_visits) AS max_frequency
FROM
    rfm_groupings
GROUP BY
    frequency_rank
ORDER BY
    frequency_rank

Query Breakdown

  1. The CTE: It begins by creating a temporary table (WITH clause) named rfm_groupings. This table is going to help us organise our gardeners based on how often they visit the garden.
  2. Count the Visits: Inside this temporary table, we count how many times each gardener has visited the garden total_visits.
  3. Rank the Gardeners: Next, we use the NTILE(5) function, which splits all our gardeners into 5 equal groups based on their visit frequency.
  4. Summarise the Groups: After we've got our gardeners neatly grouped, the final step is to look at each frequency group and find out how many gardeners are in it (COUNT(gardener_id) AS num_gardeners), as well as the least (MIN(total_visits) AS min_frequency) and the most (MAX(total_visits) AS max_frequency) frequent visits in each group.
An aggregated view of the frequency segment

Above is the result. An aggregated view which shows how these gardeners were split into five groups based on how frequent they visit the garden.

Now, you usually don't use the RFM model to create an aggregated view like I've shown above, but I've added this here just so you can get an idea of what's happening behind the curtain.

Let's talk through the data. The 20 gardeners we call "Devoted" were super busy bees, visiting their plots between 138 and 167 times last year. That's a lot of gardening!

On the flip side, we've got the "Rare" visitors. These folks made it to the garden only 3 to 27 times in the same period. They're not around much, but there's always a spot for them to grow something when they can make it.

This gives us the F part of our RFM model!

With frequency covered, next up, "Monetary" and ** "Recency"**.


Including Monetary and Recency

Created via DALL-E

Creating the "Monetary" and "Recency" metrics for the RFM model is quite straightforward as we already have the structure laid out from creating "Frequency":

  • Monetary Metric: To determine each gardener's total spend, we simply add up their spend when visiting the community garden.
  • Recency Metric: To calculate recency, we find out how many days have passed since each gardener's last visit. The fewer days since the last visit, the better, as it indicates more recent engagement.

The Query

WITH rfm_groupings AS (
SELECT
    gardener_id,
    NTILE(5) OVER (ORDER BY days_since_last_visit desc) AS recency_rank,
    NTILE(5) OVER (ORDER BY total_visits) AS frequency_rank,
    NTILE(5) OVER (ORDER BY total_spend) AS monetary_rank,
FROM
    (
    SELECT
        gardener_id,
        DATE_DIFF(current_date() - 1, max(date_visited_garden), day) as days_since_last_visit,
        COUNT(date_visited_garden) AS total_visits,
        SUM(spend_during_visit) as total_spend,
    FROM
        `medium_examples.gardener_visits`
    GROUP BY
        gardener_id
    )
)

SELECT
    concat(recency_rank, frequency_rank, monetary_rank) as rfm_segment,
    recency_rank,
    frequency_rank,
    monetary_rank,
    COUNT(gardener_id) AS num_gardeners,
FROM
    rfm_groupings
GROUP BY
    recency_rank, 
    frequency_rank, 
    monetary_rank

Query Breakdown

Recency Ranking with a Twist: For recency_rank, it orders by days_since_last_visit in descending order (desc). This is because a lower number of days since the last visit signifies more recent activity, which is desirable. However, NTILE naturally segments in ascending order, so to correctly align our ranking with the RFM model's logic—where the most recent visits are more valuable—the order is inverted.

Combining RFM Segments: The final SELECT statement combines the individual ranks into a single rfm_segment, counting the number of gardeners (num_gardeners) within each unique RFM combination. This allows us to see how gardeners cluster within the RFM framework, identifying high-value individuals or those who might need more engagement.

10 Segments shown as an example of the RFM model

Putting Our RFM Model to Work

Now that we've got our RFM model ready, it's time for the interesting part: figuring out how to use it and what it can tell us about our customers.

Remember, the whole point of breaking down our customer base into these segments is so we can get to know them better and make our Marketing messages hit home more effectively.

Note – I've generated a new larger dataset to help demonstrate the RFM model going forward. No particular theme in mind this time.


In the Rfm model, a score of 5 is like an A+, and a 1 is, well, not so great. So, our all-stars are the ones scoring between 4 and 5 across the board. These customers are recent shoppers, they visit often, and they don't hold back on spending.

To figure out where everyone fits, you can build a case statement to group these together. If there's a group we haven't named yet, we just keep refining our case statement until everyone's included.

Below is a starting point, notice how I have ‘Other‘ defined in the ELSE part of the case statement. This allows us to add a segment at a time, while keeping an eye on what RFM combinations haven't yet been captured in any of our segments by using string_agg.

SELECT
    case 
        when    
            recency_rank between 4 and 5 
            and frequency_rank between 4 and 5 
            and monetary_rank between 4 and 5 
        then "Top Tier" 
        else "Other" 
    end as clean_rfm_segment,

    string_agg
        (
        distinct concat(recency_rank, frequency_rank, monetary_rank) 
        order by concat(recency_rank, frequency_rank, monetary_rank) desc
        ) as captured_rfm_segments,

    count(customer_id) as total_customers
FROM
    rfm_groupings 
GROUP BY
    clean_rfm_segment

Looking at our Top Tier customers above. They fit into 8 specific RFM combos, making up about 14% of our customer base – that's 2,079 people who are really into what we're offering.

Finishing the RFM Segmentation

Through some trial and error, I've expanded the case statement to encompass all RFM combinations into meaningful segments.

    case 
        when    
            recency_rank between 4 and 5 
            and frequency_rank between 4 and 5 
            and monetary_rank between 4 and 5 
        then "1. Advocates"

        when
            recency_rank between 3 and 5 
            and frequency_rank between 3 and 5 
            and monetary_rank between 3 and 5 
        then "2. Loyalists"

        when
            recency_rank between 3 and 5 
            and frequency_rank between 2 and 5 
            and monetary_rank between 1 and 5 
        then "3. Promising"

        when
            recency_rank between 3 and 5 
            and frequency_rank = 1
            and monetary_rank between 1 and 5
        then "4. Need Attention"

        when    
            recency_rank between 2 and 3 
            and frequency_rank between 1 and 5 
            and monetary_rank between 3 and 5 
        then "5. Losing High Value" 
        when    
            recency_rank between 2 and 3 
            and frequency_rank between 1 and 5
            and monetary_rank between 1 and 3
        then "6. Losing Low Value" 
        when    
            recency_rank = 1 
        then "7. Lost" 
    end

Photo by Nik on Unsplash

Understanding The Customer Segments

Advocates

  • Who They Are: Recent, frequent shoppers with high spending.
  • Strategy: Ideal for exclusive rewards and further engagement opportunities.

Loyalists

  • Who They Are: Customers showing strong recent engagement and significant spending.
  • Strategy: Deserve recognition and retention efforts, perfect for loyalty programs.

Promising

  • Who They Are: Recent customers showing potential but with lower frequency and spend.
  • Strategy: Need encouragement to boost their shopping frequency and spending.

Need Attention

  • Who They Are: Customers who have made a recent purchase but have only done so once.
  • Strategy: An opportunity to introduce these typically new customers to the brand and products, encouraging deeper engagement.

Losing High Value

  • Who They Are: Once frequent and high-spending customers now showing decreased recency.
  • Strategy: Require personalized re-engagement strategies and incentives to rekindle their interest and loyalty.

Losing Low Value

  • Who They Are: Customers similar to the high-value segment but with overall lower spending.
  • Strategy: Also in need of re-engagement efforts, though strategies may differ given their lower value.

Lost

  • Who They Are: Customers who have not made a purchase in a long time, making them unlikely to return.
  • Strategy: Focus efforts on more engaged segments, as re-engaging lost customers may not be resource-efficient.

Wrapping Up

The RFM model is a potent, yet simple tool for better understanding customer behaviour and enhancing marketing strategies. It's key to remember that there isn't a one-size-fits-all approach to RFM Segmentation; many adjust the number of segments from three to ten based on your customer base size and desired level of detail.

The examples provided are meant to illustrate the concept and should be customized to fit your business's unique needs and the nature of your customer interactions. While I've utilised the NTILE function for ease, manually setting segment criteria allows for tailored segmentation that aligns more closely with your specific business model.

Ultimately, RFM segmentation is about unlocking deeper customer engagement and crafting more effective marketing campaigns. By aligning your segmentation strategy with your business's distinct attributes and goals, you can maximize the impact of your marketing efforts.

Photo by Jason Wong on Unsplash

That brings this article to an end. If you have any questions or challenges, please feel free to comment, and I'll reply when I can.

I frequently write articles for BigQuery and Looker Studio. If you're interested, consider following me here on Medium for more!

All images, unless otherwise noted, are by the author.

Stay classy folks! Tom

Tags: Bigquery Customer Value Marketing Rfm Segmentation

Comment