SQL Mastery: Advanced Techniques for Data Professionals

Author:Murphy  |  View: 22269  |  Time: 2025-03-22 22:15:11

During my tenure as a lead data analyst at Chime, three crucial SQL techniques— Window Functions, Regex, and CTEs – significantly advanced my capabilities, propelling me from intermediate proficiency to the expertise required for a lead analyst role. This article details these so you can up-level your skills and unlock new dimensions in data exploration.

Window Functions

A window function (or analytic function) makes a calculation across multiple rows that are related to the current row, and lets you calculate things like:

  • Rankings
  • Running totals
  • 7-day moving averages (i.e. average values from 7 rows before the current row)

Creating rankings with window functions is an extremely powerful technique in Analytics and data science. Consider for this transactions dataset, where we have transactions made by customers.

A sample transaction table screenshot, dummy data created by me using ChatGPT.

Ranking Window Functions:

A ranking window function allows us to add a column to generate a rank for each customer's first, second, third etc. transaction. We could also add a ranking for their biggest to smallest transaction by amount

  • RANK() assigns a rank to each row within a partition based on specified criteria.
  • PARTITION BY divides the result set into partitions, and ranks are calculated separately for each partition.
  • ORDER BY determines the order in which rows are ranked within each partition, with earlier rows receiving lower ranks.
SELECT *
, row_number() OVER (PARTITION BY user_id ORDER BY amount desc) AS transaction_amt_rank
FROM transactions;
Here we have our transaction_amt_rank which ranks each customer's transactions by amount descending (largest = rank 1).

There are a variety of different window functions that you can use in SQL that have a few differences: rank(), dense_rank(), row_number() all vary, personally I like to use row_number() for it's lack of repetition. For a full explainer, check out Data Camp's cheat sheet


Regex

What is Regex? Regex is a type of data magic✨ and a powerful tool for pattern matching and text manipulation.

Regex stands for "regular expressions" and is a sequence of characters, used to search and locate specific sequences of characters that match a pattern.

Image created by me, using DALL-E. This is DALL-E's interpretation of Regex.

If we look at our dummy transaction data, we have a list of merchant names, but most of them have extra stuff tacked on, like "TARGET #5326." This issue is very common in transaction data, where each retail location is unique and merchant_name often contains extra info, but in an analysis of spending behaviour, we only care that the person went to "Target"

TL;DR want all of the "Target", regardless of the numbers or words after them, to just say "Target." That's where regex is an extremely powerful tool to have in your arsenal.

Let's look at the code for how we could accomplish this:

Sql">REGEXP_REPLACE(source, pattern, replacement [, flags])

Comment