Cooking with Snowflake

Snowflake optimization

The Snowflake community is rife with information dumps on how to optimize expensive queries. We know because we combed through a ton of them. What we present here are three tactical ways in which we've done this at Toplyne.

Toplyne's business involves extracting real-time insights from real-time data. This data is currently sourced from our customers' product analytics, CRM, and payment systems.

CRM and payment data volumes are mostly manageable. A product will have a limited set of paying customers and marginally more who are tracked in a CRM. However, product analytics data is much higher in volume.

Toplyne's POC (proof-of-concept) and MVP (minimum viable product) were built on product analytics data. We knew right from the beginning that we would need a data warehousing solution to handle it. The solution had to meet two clear requirements:

  1. It should easily ingest a few hundred gigabytes of data.
  2. It should offer a simple, concise API to interact with this data.

We compared three solutions: BigQuery, Redshift, and Snowflake.

Post-exploration, Snowflake was the clear choice, for one simple reason: its SQL-based interface. SQL meant there was no cold-start latency for our engineering team. None of the engineers at Toplyne came from a data warehousing background, yet we found ourselves up to speed very quickly.

The process of interacting with customers' product analytics data is simple:

  1. The product analytics data lands in Snowflake via a connector. (There are plenty of off-the-shelf as well as native connectors for this.)
  2. Log in to Snowflake and use the built-in worksheets to write SQL, as sketched below.
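
To make step 2 concrete, here is a minimal sketch of querying the ingested events programmatically with the Snowflake Python connector instead of a worksheet. The credentials, warehouse, database, and `events` table names are placeholders for illustration, not Toplyne's actual setup.

```python
# Minimal sketch of step 2: run a SQL query against ingested product analytics
# data using the Snowflake Python connector. All names below are placeholders.
import os

import snowflake.connector

conn = snowflake.connector.connect(
    account=os.environ["SNOWFLAKE_ACCOUNT"],
    user=os.environ["SNOWFLAKE_USER"],
    password=os.environ["SNOWFLAKE_PASSWORD"],
    warehouse="ANALYTICS_WH",        # hypothetical virtual warehouse
    database="PRODUCT_ANALYTICS",    # hypothetical database holding connector output
    schema="RAW",
)

try:
    cur = conn.cursor()
    # Count daily active users over the last 7 days from the raw events table.
    cur.execute(
        """
        SELECT DATE_TRUNC('day', event_time) AS day,
               COUNT(DISTINCT user_id)       AS daily_active_users
        FROM events
        WHERE event_time >= DATEADD('day', -7, CURRENT_TIMESTAMP())
        GROUP BY 1
        ORDER BY 1
        """
    )
    for day, dau in cur.fetchall():
        print(day, dau)
finally:
    conn.close()
```

The same query runs unchanged in a Snowflake worksheet; going through the connector simply makes it easy to wire the results into downstream jobs.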
