Optimize Data Warehouse Storage with Views and Tables

Author:Murphy  |  View: 29345  |  Time: 2025-03-23 19:12:42

With the rise of modern data stacks, many companies are moving their databases from on-prem to the cloud. They are beginning to utilize Data Warehouse tools like Snowflake, Redshift, and DuckDB in order to take advantage of all of the benefits of the cloud.

While these data warehouses typically help smaller companies save money, compute costs on the cloud can easily rack up. It is essential that you optimize your warehouse for storage and computing costs. This means you need to understand the best way to store your data so that it can be utilized by data teams in a cost-effective manner.

In this article, we will discuss the difference between views and tables, the different types of views that exist in data warehouses, and the use cases for each of them. By the end of this article, you should be able to identify the best option for storing your different datasets while saving on costs.

What is a view?

A view is a defined query that sits on top of a table. Unlike a table, it doesn't store the actual data. It always contains the latest data because it reruns every time it is queried. Whereas a table is only as fresh as the last time it was created or updated, no matter when you query it.

There are two main types of views- non-materialized and materialized views.

Non-materialized Views

Non-materialized views are what people typically think of when they think about a view. This type only runs when the view is actually queried, otherwise, it is not stored in the database.

Non-materialized views are great because they take up no storage space, which means you don't have to worry about paying for a lot of storage. They also only run when they are needed, saving you money in computing resources. This means, if a source table isn't needed for months or weeks at a time, you won't have to pay to maintain it. You only pay for it once the analyst or analytics engineer resumes working with that table.

The best part? Non-materialized views still have all of the same functions as a table! You can perform joins, aggregations, and window functions on them if need be.

Unfortunately, just like with everything, there is always a con that comes with all of the pros. Non-materialized views are not ideal for large amounts of data with complex logic since this logic is run every time the view is queried.

For example, I typically create all of my source data tables as non-materialized views that reference my raw data. These are simple SELECT statements that contain basic functions such as column renaming, casting, and data cleaning. Because their underlying logic is simple, they run fast whenever I query these source tables.

If I were to create complex data models containing joins and window functions as views, chances are my views would never load when I queried them. Or they would just take an extremely long time! Obviously, this isn't ideal. You would end up using way more computing power to run this query on a view than you would by creating that view as a table instead.

Remember: Non-materialized views are great to utilize, but only when the logic creating them is a simple SELECT statement.

Materialized Views

Materialized views are the less common view out of the two we discuss. Materialized views behave more like a table. They are faster to query and considered more accessible than non-materialized views. And, just like a table, they take up more storage space in your data warehouse and require more computing resources. This in turn means they are the more expensive option out of the two types of views.

It's not often that you will want to utilize them. In fact, I've never come across a use case where it made sense to use them. According to Snowflake's documentation, you should only use materialized views if ALL of the following are true:

  • The results of the view are used frequently
  • The query powering the view uses a lot of resources
  • The view changes frequently

It's very rare for all three of these to be the case with your base/staging, intermediate, and core dbt models. Base/staging models don't consume a lot of resources and intermediate and core data models don't change frequently. Of course, there are always exceptions to this, but I have yet to experience a scenario when this is true.

How these views should be used in data modeling

If you are an analytics engineer, then you may be wondering how un-materialized and materialized views can be used in data modeling. Let's look at dbt base (or staging) models as well as core models.

‍dbt Base Models

dbt base models exist as views on top of your raw data. They are created as un-materialized views in order to keep the integrity of the raw data while utilizing proper naming conventions and company standards. The code in these models is basic SQL select statements that read directly from the raw data ingested into your warehouse via ELT from ingestion tools like Airbyte. A typical base model looks like this:

select
  ad_id AS facebook_ad_id,
  account_id,
  ad_name AS ad_name_1,
  adset_name,
  month(date) AS month_created_at,
  date::timestamp_ntz AS created_at,
  spend
from {{ source('facebook', 'basic_ad')}}

‍If you look at the underlying logic of this file in dbt, it actually compiles in Snowflake (my data warehouse of choice)to look like this:

create or replace view data_mart_dev.base.base_facebook_ads 
  as (

    select
      ad_id AS facebook_ad_id,
      account_id,
      ad_name AS ad_name_1,
      adset_name,
      month(date) AS month_created_at,
      date::timestamp_ntz AS created_at,
      spend
    from raw.facebook.basic_ad

    );

Because you are only using basic date functions and renaming columns, the views are still fast to query on demand. This in turn saves storage space that you would otherwise use to save an almost identical copy of the raw data.

dbt Core Models

Your core models in dbt are more complex than your base models and often contain multiple CTEs, joins, and window functions. While you may have a specific use case to create these as materialized views, you will most likely create these as a table in your data warehouse. Tables are ideal for handling complex transformations that will take a long time to run if stored as a view.

Here is a code example of one of my core data models:

with

  fb_spend_unioned AS (

    select created_at, spend, 'company_1' AS source from {{   ref('base_fb_ads_company1')}}
    UNION ALL
    select created_at, spend, 'company_2' AS source from {{ ref('base_fb_ads_company2')}}

  ),

  fb_spend_summed AS (

    select
      month(created_at) AS spend_month,
      year(created_at) AS spend_year,
      created_at AS spend_date,
      sum(spend) AS spend
    from fb_spend_unioned 
    where spend != 0
    group by
      created_at,
      month(created_at),
      year(created_at)

  )

  select * from fb_spend_summed

When compiled in Snowflake as SQL, the code will look like this:

create or replace table data_mart_dev.core.fb_spend_summed

  as (

    with 

    fb_spend_unioned AS (

    select created_at, spend, 'company_1' AS source from {{   ref('base_fb_ads_company1')}}
    UNION ALL
    select created_at, spend, 'company_2' AS source from {{ ref('base_fb_ads_company2')}}

),

  fb_spend_summed AS (

    select
      month(created_at) AS spend_month,
      year(created_at) AS spend_year,
      created_at AS spend_date,
      sum(spend) AS spend
    from fb_spend_unioned 
    where spend != 0
    group by
      created_at,
      month(created_at),
      year(created_at)

  )

  select * from fb_spend_summed

  );

Notice that this is being created as a table within Snowflake rather than a view. This is ideal for any data that will be directly used in a BI tool, which most core data models are. They can be easily queried on demand without the underlying logic needing to be run. This ensures fast dashboards that stakeholders can trust.

Conclusion

Views and tables exist for different reasons in your data warehouse. Views do not store the actual data and can be used as a tool to save money with simple queries that sit on top of other tables. Tables should be utilized to store data generated by more complex logic, ensuring performance and availability are always high.

When used correctly, non-materialized views are a great tool for saving money within Snowflake without sacrificing performance. I highly recommend using them for your base models within dbt in order to create high-quality data that follows all of the company standards you've put in place. And, don't forget to use tables with your core dbt models. The performance increase is worth the higher cost!

For more on analytics engineering, subscribe to my free weekly newsletter where I share learning resources, tutorials, best practices, and more.

Check out my first ebook, The ABCs of Analytics Engineering, an all-in-one guide to getting started in the role of analytics engineering.

Tags: Cloud Computing Data Analytics Data Engineering Data Storage Data Warehouse

Comment