SQL Explained: Common Table Expressions

Author:Murphy  |  View: 29961  |  Time: 2025-03-22 21:37:33

Common Table Expressions (or CTEs as they are known) in SQL are temporary, named result sets containing intermediate data that are derived from another SQL query. Once you have data in a CTE, you can reference that data one or more times within the same query.

Given the above description, you might think that CTEs sound a lot like regular temporary tables in SQL and in some respects they are. So, why would you use a CTE rather than a temporary table? To answer this we need to take a look at two of the main disadvantages of temporary tables.

One is that temporary tables can contribute to more complex code, especially if they're used across different parts of a large SQL script. They need explicit creation, deletion and possibly indexes built on them, adding overhead to your SQL and session management.

Secondly, temporary tables consume physical storage which might be a consideration if space is tight and you have lots of them. Additionally, when looking at queries that use temporary tables it might not be clear what data is contained in the temporary table and where the data came from.

CTEs don't have the above issues. First off, they are ephemeral so that once the SQL session has ended the CTE goes out of scope and any memory it uses is freed up.

Secondly, you can see exactly what data is "in" a CTE. Its creation and population are right there in your SQL script.

Now, as well as their advantages, CTEs do have certain disadvantages and probably shouldn't be used in the following cases.

  • You have to refer to the data that would be contained in a CTE more than once or twice in your query. This is because the CTE has to be re-populated each time it's referenced. However, if your CTE data volume is small, even this needn't be a bar from using one.
  • Related to the above, CTEs cannot be indexed so will not be as performant as an indexed temporary table if the data volume is large.

So, to summarize, use CTEs in the following scenarios.

  • You want to avoid using temporary tables
  • The CTE data volume is relatively low
  • You are referring to the CTE data in your query just once (maybe twice depending on the data volume)

One last advantage of CTEs I haven't mentioned yet is that many modern SQL dialects implement the ability to have recursive CTEs, i.e. CTEs that refer to themselves. Naturally, this means it's much easier to code recursive and hierarchical-based SQL queries. We'll see some examples of that later on.

Now that we have a fuller understanding of what CTEs are and do, it's time to take a look at some examples of their use.

NB: certain SQL operations shown below that include date and date arithmetic operations may not apply to all database SQL dialects. Check your SQL documentation if you're using a database other than Oracle

Setting up a test environment.

I use Oracle's live Sql website to run my tests. I explained how to get access to and use this service in a previous article I wrote on using Grouping Sets, Rollup and Cube in SQL. It's completely free to set up and use. The link to that article is below.

SQL Explained: Grouping Sets, Rollup, and Cube

Sample table creation & population

For the non-recursive CTE examples, we'll use a table of customer transactions. Here are the table creation and data insert statements needed to re-create the input table and data.

CREATE TABLE transactions (
    TransactionID INT PRIMARY KEY,
    CustomerID INT,
    Amount DECIMAL(10, 2),
    TransactionDate DATE
);

INSERT INTO Transactions (TransactionID, CustomerID, Amount, TransactionDate)
VALUES (1, 1001, 150, TO_DATE('2021-01-01', 'YYYY-MM-DD'));

INSERT INTO Transactions (TransactionID, CustomerID, Amount, TransactionDate)
VALUES (2, 1002, 200, TO_DATE('2021-01-04', 'YYYY-MM-DD'));

INSERT INTO Transactions (TransactionID, CustomerID, Amount, TransactionDate)
VALUES (3, 1001, 100, TO_DATE('2021-01-04', 'YYYY-MM-DD'));

INSERT INTO Transactions (TransactionID, CustomerID, Amount, TransactionDate)
VALUES (4, 1003, 250, TO_DATE('2021-01-05', 'YYYY-MM-DD'));

INSERT INTO Transactions (TransactionID, CustomerID, Amount, TransactionDate)
VALUES (5, 1002, 300, TO_DATE('2021-01-05', 'YYYY-MM-DD'));

INSERT INTO Transactions (TransactionID, CustomerID, Amount, TransactionDate)
VALUES (6, 1003, 180, TO_DATE('2021-01-08', 'YYYY-MM-DD'));

INSERT INTO Transactions (TransactionID, CustomerID, Amount, TransactionDate)
VALUES (7, 1001, 190, TO_DATE('2021-01-08', 'YYYY-MM-DD'));
SELECT * FROM Transactions

TRANSACTIONID CUSTOMERID AMOUNT TRANSACTIONDATE
============= ================= ===============
1             1001       150    01-JAN-21
2             1002       200    04-JAN-21
3             1001       100    04-JAN-21
4             1003       250    05-JAN-21
5             1002       300    05-JAN-21
6             1003       180    08-JAN-21
7             1001       190    08-JAN-21

The syntax for a standard CTE is surprisingly simple, just,

WITH cte_name [(column_list)] AS (cte_query_definition) statement

Where:

  • cte_name is the name given to the CTE
  • column_list is an optional list of column names for the CTE
  • cte_query_definition is a query that defines the CTE's result set
  • statement is a single SELECT, INSERT, UPDATE, DELETE, or MERGE statement that references the CTE

Test 1 – A simple CTE

Retrieve all customers who have a total spend of greater than $250

WITH CustomerTotals AS (
    SELECT CustomerID, SUM(Amount) AS TotalSpent
    FROM Transactions
    GROUP BY CustomerID
)
SELECT CustomerID, TotalSpent
FROM CustomerTotals
WHERE TotalSpent > 250;

CUSTOMERID TOTALSPENT
========== ==========
1001       440
1002       500
1003       430

In this case, the non-CTE version of this query is also straightforward. Just this,

SELECT CustomerID, SUM(Amount) AS TotalSpent
FROM Transactions
GROUP BY CustomerID
HAVING SUM(Amount) > 250;

Test 2 – A more complex CTE

Where CTEs come into their own is if you have a query similar to this.

Retrieve customers whose average spend is greater than the average spend for all customers

Using CTEs, a query like this becomes relatively easy. The two aggregate values we require can naturally be split out into two individual CTEs and we then simply compare the results of both to determine what to retrieve.

WITH CustomerAverages AS (
    SELECT CustomerID, AVG(Amount) AS AvgAmount
    FROM Transactions
    GROUP BY CustomerID
),
OverallAverage AS (
    SELECT AVG(Amount) AS OverallAvg
    FROM Transactions
)
SELECT a.CustomerID, a.AvgAmount
FROM CustomerAverages a, OverallAverage o
WHERE a.AvgAmount > o.OverallAvg;

CUSTOMERID AVGAMOUNT
========== =========
1002       250
1003       215

Compare this with the non-CTE version,

SELECT CustomerID, AVG(Amount) AS AvgAmount
FROM transactions
GROUP BY CustomerID
HAVING AVG(Amount) > (
    SELECT AVG(sub.AvgAmount)
    FROM (
        SELECT AVG(Amount) AS AvgAmount
        FROM transactions
        GROUP BY CustomerID
    ) sub
);

To my mind, using a CTE makes it a lot more obvious what's going on and what the author of the SQL's intent is.

Test 3 – Recursive CTEs

Recursive CTEs are queries where one of the CTEs in a query refers to itself. Sharp-eyed readers may have noticed that it was possible to write our CTE example from above using a recursive CTE.


WITH CustomerAverages AS (
    SELECT CustomerID, AVG(Amount) AS AvgAmount
    FROM Transactions
    GROUP BY CustomerID
),
OverallAverage AS (
    SELECT AVG(AvgAmount) AS OverallAvg
    FROM CustomerAverages
)
SELECT a.CustomerID, a.AvgAmount
FROM CustomerAverages a, OverallAverage o
WHERE a.AvgAmount > o.OverallAvg;

Since all the information we needed for the second CTE: OverallAverage was ** already contained in the first CTE: CustomerAverage**s, we could use the data in the first CTE in our calculations for the second CTE.

NB: In most SQL dialects, you specify that you're using a recursive CTE by putting the reserved keyword recursive in between the with clause and the CTE name, like this,

with recursive cte(col1, col2 etc...) as my_cte
...
...

Oracle assumes a CTE is recursive if it refers to itself in its query so we don't need the special recursive keyword.

For our final example, we'll use a recursive CTE to fill in gaps in our original time-series data.

Referring back to our original table data, you'll notice that we are missing customer data entries for the 2nd & 3rd of January and the 6th & 7th of January.

Our task is to produce a report that shows the sum of the customer spend per day. Where there are no entries for a date, we return zero for the sum of the customer spend value for that date.

This is a good use case for recursive CTEs.

WITH DateRange (dt) AS (
    SELECT MIN(TRANSACTIONDATE) 
    FROM Transactions
    UNION ALL
    SELECT dt + INTERVAL '1' DAY 
    FROM DateRange WHERE dt < (SELECT MAX(TRANSACTIONDATE) FROM Transactions)
),
AggregatedData AS (
    SELECT TRANSACTIONDATE, SUM(AMOUNT) AS TOTAL_SPEND
    FROM Transactions
    GROUP BY TRANSACTIONDATE
)
SELECT dr.dt AS TRANSACTIONDATE,
       NVL(ad.TOTAL_AMOUNT, 0) AS TOTAL_SPEND
FROM DateRange dr
LEFT JOIN AggregatedData ad ON dr.dt = ad.TRANSACTIONDATE
ORDER BY dr.dt;

TRANSACTIONDATE  TOTAL_SPEND
===============  ===========
01-JAN-21        150
02-JAN-21        0
03-JAN-21        0
04-JAN-21        300
05-JAN-21        550
06-JAN-21        0
07-JAN-21        0
08-JAN-21        370
  • The DateRange CTE: This part creates a continuous date range from the earliest to the latest transaction date ensuring there are no date gaps.
  • The AggregatedData CTE: Sums up the transactions for each date.
  • The Final SELECT: Joins the date range with the aggregated transaction data, using an NVL function to replace NULL values with 0, ensuring days without transactions show a total of 0.

Summary

In conclusion, Common Table Expressions (CTEs) are a versatile and powerful feature in SQL that can enhance the readability, maintainability, and execution of complex queries. Through this article, I've explored the basic structure and functionality of CTEs, looking into both non-recursive and recursive types to demonstrate their utility in a variety of scenarios.

The three test cases I included showed not only how CTEs simplify the management of complex SQL operations by breaking them down into more manageable parts but also showcased their strength in handling recursive data.

I would also argue that the use of CTEs allows for cleaner and more organized SQL scripts, making it easier for developers and analysts to write, debug, and optimize their database queries.

If you're still unsure about CTEs, think about how you would solve the problem shown in my Test 3 example without using them. I think you'll find that traditional SQL approaches to a problem like this are more cumbersome and less efficient.

_OK, that's all for me just now. I hope you found this article useful. If you did, please check out my profile page at this link. From there, you can see my other published stories and subscribe to get notified when I post new content._

I know times are tough and wallets constrained, but if you got real value from this article, please consider buying me a wee dram.

If you liked this content, I think you'll find these articles interesting too.

SQL Explained: Grouping Sets, Rollup, and Cube

Gradio: Beyond the Interface

Tags: Common Table Expressions CTE Data Science Recursive Cte Sql

Comment