5 Redshift SQL Functions You Need to Know

Author: Murphy

If you're a new Redshift user, you may find that the SQL syntax varies from the SQL you've written within other data warehouses.

Each data warehouse has its own flavor of SQL and Redshift is no exception.

At first, it can be frustrating to discover that your favorite functions do not exist. However, there are a lot of great Redshift functions that you can take advantage of in your code.

In this article, I will walk you through the most helpful Redshift functions I've discovered in my work. Each function includes a definition and code example of how to use it.

PIVOT

PIVOT is a function that's built into Redshift that allows you, well, to pivot your data. What do I mean by this? Pivoting allows you to reshape your data where the values in rows become columns or values in columns become rows.

PIVOT can help you:

  • count values in a column
  • aggregate row values
  • derive boolean fields based on column or row values

I recently used PIVOT in Redshift to find whether different pages were active or not for each user. To do this, I needed to PIVOT the page_type field and use the user_id field to group the data.

I set a condition within the PIVOT function to COUNT(*) for each of the different page types, as each user could only have one of each type.

Keep in mind that if a user can have multiple pages of each type, casting the COUNT to a boolean this way will not work (a workaround is sketched after the example below).

The code looked like this:

Sql">SELECT
  id, 
  has_homepage::boolean, 
  has_contacts_page::boolean, 
  has_about_page::boolean
FROM (SELECT id, page_type FROM user_pages WHERE is_active) 
PIVOT(COUNT(*) FOR page_type IN ('home' AS has_homepage, 'contact' AS has_contact_page, 'about' AS has_about_page))

Without the use of PIVOT, I would have had to create a separate CTE for each page_type and then JOIN all of these together in the final CTE. Using PIVOT made my code much more clear and concise.
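
If users can have multiple rows per page type, one option is to keep the raw counts inside the PIVOT and derive the booleans by comparing to zero in the outer query. A minimal sketch, using the same hypothetical user_pages table as above:

SELECT
  id,
  (home_count > 0) AS has_homepage,
  (contact_count > 0) AS has_contact_page,
  (about_count > 0) AS has_about_page
FROM (SELECT id, page_type FROM user_pages WHERE is_active)
PIVOT(COUNT(*) FOR page_type IN ('home' AS home_count, 'contact' AS contact_count, 'about' AS about_count))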

LTRIM

LTRIM allows you to trim characters from the beginning of a text field: you pass it a list of characters, and it strips any leading characters that appear in that list. It lets you bypass the CHARINDEX and LENGTH gymnastics you might otherwise need to strip prefixes.

A few weeks ago I took the time to create a custom dbt macro to trim strings from text fields. I knew the code would have to be used many times, in multiple use cases. At the time, I hadn't realized this function even existed!

Once I discovered LTRIM, I removed that macro and simply used this function in all the places I referenced that macro.

Let's say you have a data ingestion tool that adds the name of the tool to the beginning of each field you ingest. To trim that prefix, you would write this:

SELECT
  LTRIM(source_id, 'airbyte_') AS source_id,
  LTRIM(source_name, 'airbyte_') AS source_name 
FROM source_data

The values that were previously airbyte_32534058 and airbyte_hubspot will now be 32534058 and hubspot.
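
One caveat: LTRIM treats its second argument as a set of characters rather than an exact prefix, so it can strip more than you expect when the remaining value also starts with characters from that set. A quick hypothetical illustration:

SELECT
  -- returns 'flow', not 'airflow': the leading 'a', 'i', and 'r' of 'airflow' are also in the trim list
  LTRIM('airbyte_airflow', 'airbyte_') AS trimmed_value

If you need to remove an exact prefix only, it can be safer to check for it with LIKE or strip it with SUBSTRING instead.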

NVL2

NVL2 is a Redshift function that helps you handle NULL and NOT NULL values. It is similar to COALESCE, with a slight difference.

NVL2 allows you to specify a column name, a value to return if the column's value is NOT NULL, and a value to return if the column's value is NULL.

COALESCE takes in two or more expressions and returns the first one that is NOT NULL. NVL2 only takes in an expression and two values: the first is returned when the expression is NOT NULL, and the second when it is NULL.

For example, if you have multiple contact fields, you could use NVL2 to ensure at least one of the contact values is returned for a user.

SELECT 
  user_id, 
  NVL2(user_email, user_email, account_email) AS email_address 
FROM users

In this case, if a user does not have an email in the user_email column, NVL2 will check whether they have a value in the account_email column to return instead.
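
For contrast, COALESCE can chain through more than two fallbacks, and NVL2 can return values that aren't drawn from the inspected column at all. A small sketch against the same users table (billing_email is a hypothetical extra column):

SELECT
  user_id,
  -- first non-NULL value among three contact columns
  COALESCE(user_email, account_email, billing_email) AS any_email,
  -- label the row based on whether user_email is populated
  NVL2(user_email, 'has user email', 'missing user email') AS email_status
FROM users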

JSON_EXTRACT_PATH_TEXT

It's no secret that handling JSON objects in the data warehouse can be a pain. Luckily, JSON_EXTRACT_PATH_TEXT makes this fairly seamless to do in Redshift.

JSON_EXTRACT_PATH_TEXT allows you to extract the value stored at a specified key (or path of keys), with the option of returning NULL instead of an error for records whose JSON is invalid. This is a benefit compared to JSON_PARSE, which will throw an error as soon as it hits malformed JSON.

JSON_EXTRACT_PATH_TEXT takes in three kinds of arguments: the field you are parsing, one or more key names that form the path to the value you wish to extract, and an optional boolean indicating whether NULLs should be returned for invalid records.

For example, let's say you want to extract a user's email from a user_contact_info JSON object. To do this, you specify the key that holds email values (email in this case). Then, to keep records with malformed JSON from erroring out the query, you pass TRUE as the final argument.

SELECT
  user_id,
  JSON_EXTRACT_PATH_TEXT(user_contact_info, 'email', TRUE) AS user_email 
FROM users

Now, every user whose contact info contains an email gets that value back, and records with invalid JSON return NULL instead of breaking the query.
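
Since the function accepts a series of path elements, it can also reach into nested objects. A small sketch, assuming a hypothetical user_contact_info value shaped like {"contact": {"email": "..."}}:

SELECT
  user_id,
  -- walk the path contact -> email inside the JSON string
  JSON_EXTRACT_PATH_TEXT(user_contact_info, 'contact', 'email', TRUE) AS user_email
FROM users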

NULLIF

NULLIF allows you to compare two arguments and return NULL if they are equal. If they are not equal, the first argument is returned.

This function comes in handy when validating that values you expect to be equal across two fields actually are. It can also be good for situations where you only want a value returned when the fields differ.

For example, if we wanted to validate that the emails in the two fields mentioned earlier, user_email and account_email, were the same, we could use this function to help identify the users where they vary.

SELECT
  user_id, 
  user_email, 
  account_email
FROM users
WHERE NULLIF(user_email, account_email) IS NOT NULL

Here, using NULLIF in the WHERE clause allows us to return both user_email and account_email, so we can properly compare the differences.
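
One thing to keep in mind: if user_email itself is NULL, NULLIF returns NULL and the row is filtered out even when account_email is populated. If you also want to catch those rows, one option (sketched here against the same hypothetical users table) is to compare coalesced values:

SELECT
  user_id,
  user_email,
  account_email
FROM users
-- treat NULL emails as empty strings so a NULL vs. non-NULL pair still counts as a mismatch
WHERE NULLIF(COALESCE(user_email, ''), COALESCE(account_email, '')) IS NOT NULL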

Conclusion

Now that you know some of the most helpful SQL functions available in Redshift, you can feel confident working in Redshift and writing efficient queries.

And remember, if you are coding something that you think may have a function already built out, search through the Redshift docs! They are quite easy to use and will allow you to discover a function that could save you coding time.

Happy querying!

Tags: Data Engineering, Data Warehouse, Function, Redshift, SQL
