Anatomy of Windows Functions

Author:Murphy  |  View: 22430  |  Time: 2025-03-22 21:17:20
Photo by Marcus Woodbridge on Unsplash

Introduction

The IT area is known for its constant changes, with new tools, new frameworks, new cloud providers, and new LLMs being created every day. However, even in this busy world, some principles, paradigms, and tools seem to challenge the status quo of ‘nothing is forever'. And, in the data area, there is no example of this as imposing as the SQL language.

Since its creation back in the 80s, it passed the age of Data Warehouses, materialized itself in the Hadoop/Data-lake/Big Data as Hive, and is still alive today as one of the Spark APIs. The world changed a lot but SQL remained not only alive but very important and present.

But SQL is like chess, easy to understand the basic rules but hard to master! It is a language with many possibilities, many ways to solve the same problem, many functions and keywords, and, unfortunately, many underrated functionalities that, if better known, could help us a lot when building queries.

Because of this, in this post, I want to talk about one of the not-so-famous SQL features that I found extremely useful when building my daily queries: Window Functions.

What is a Window Function

The traditional and most famous SGBDs (PostgreSQL, MySQL, and Oracle) are based on relational algebra concepts. In it, the lines are called tuples, and, the tables, are relations. A relation is a set (in the mathematical sense) of tuples, i.e. there is no ordering or connection between them. Because of that, there is no default ordering of lines in a table, and the calculus performed on one line does not impact and it is not impacted by the results of another. Even clauses like ORDER BY, only order tables, and it is not possible to make calculus in a line based on the values of other lines.

Simply put, window functions fix this, extending the SQL functionalities, and allowing us to perform calculations in one row based on the values ​​of other lines.

Basic cases for understanding/ The Anatomy

1-Aggregating Without Aggregation

The most trivial example to understand Windows functions is the ability to ‘aggregate without aggregation‘.

When we made an aggregation with traditional GROUP BY, the whole table is condensed into a second table, where each line represents a group's element. With Windows Functions, instead of condensing the lines, it's possible to create a new column in the same table containing the aggregation results.

For example, if you need to add up all the expenses in your expense table, traditionally you would do:

SELECT SUM(value) AS total FROM myTable

With Windows functions, you would make something like that:

SELECT *, SUM(value) OVER() FROM myTable
-- Note that the window function is defined at column-level
-- in the query

The image below shows the results:

Image 1. Traditional Group By vs Windows Functions.

Rather than creating a new table, it will return the aggregation's value in a new column. Note that the value is the same, but the table was not ‘summarized‘, the original lines were maintained – we just calculated an aggregation without aggregating the table

Tags: Data Engineering Data Science Database Getting Started Sql

Comment