4 Things All Analytics Engineers Must Learn in 2024

Author: Murphy

I've decided to make 2024 my year of learning and investing in my growth. While this is important to do every year, we sometimes put it on the back burner and instead focus on progressing in our careers, or maybe our businesses.

Unfortunately, many people don't realize how important learning new things is to grow in their careers—especially a career in data. It is easy to fall behind in your knowledge, as the space constantly changes.

Analytics engineers need to carve out proper time each week to ensure they expand their knowledge on different elements of the data stack, key fundamentals of analytics engineering, and all the new tools on the market.

Use this list as a guide on what to focus on as an analytics engineer. Choose a different one of these topics to study each week or month, and watch your success in your role skyrocket.

How to enforce data contracts between analytics engineers and software engineers

Data contracts have been all the rage for the last few years, and there's a good reason why: they address a problem that has existed between analytics engineers and software engineers forever!

Many of us have simply accepted that problem as the norm rather than trying to change it, which is why it's taken so long for any progress to be made.

If you aren't familiar with data contracts, these are the agreements and expectations set between data producers and data consumers on what the data will look like.

It's common for data producers to change a data type or field name without letting the downstream consumer (in our case, analytics engineers) know. Then, because one of our data models depends on the changed field, something downstream breaks.

The data team is then forced into fire-fighting mode, updating that field in the staging layer of its data models so that downstream dependencies can use the modified field.

While one field may not seem like a big deal, if producers constantly make changes without consumers being on board, consumers end up spending all of their time putting out these fires, pulling them away from key work. Not to mention the changed fields that go unnoticed, or the ones that cause data downtime for the business.

How can you learn to enforce these contracts? Well, tools are just now coming to market in an attempt to do this. dbt recently launched a contracts feature that allows you to set "guarantees" on your data models.

When a contract is enforced, the data type of every column is declared, allowing dbt to catch any changes. If the model's columns or data types no longer match the contract when the model goes to build, dbt will stop the model from building.

In addition to declaring data types, you can also specify not_null, primary_key, foreign_key, and unique constraints using dbt contracts. Just be sure to check whether your chosen data warehouse supports these, as not all constraints are supported by every warehouse.
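To make that concrete, here's a minimal sketch of what an enforced contract might look like in a model's properties file. The model, file, and column names are made up for illustration, and you should check the dbt docs for the exact options your warehouse supports:

```yaml
# models/staging/_staging__models.yml (hypothetical file name)
models:
  - name: stg_orders
    config:
      contract:
        enforced: true        # dbt checks the model's output against this spec at build time
    columns:
      - name: order_id
        data_type: integer
        constraints:
          - type: not_null
          - type: primary_key
      - name: customer_id
        data_type: integer
        constraints:
          - type: not_null
      - name: order_total
        data_type: numeric
```

If the SQL behind stg_orders ever starts returning a different set of columns or data types, the build fails right there instead of silently breaking models downstream.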

How to model your data to be used by data analysts

Data modeling has been around for decades, but it has only recently changed with the invention of cloud data warehouses and data transformation tools like dbt.

While many analytics engineers learn about dim and fact models from dbt documentation itself, I have found it helpful to seek out other data modeling resources to deepen my understanding.

Dimensional data modeling was first written about in 1996 by Ralph Kimball in his book The Data Warehouse Toolkit. This is a great place to start if you want to understand dim and fact models in a context outside of dbt.

Learning dimensional modeling also means learning related concepts like the star schema and the snowflake schema, which describe how dim and fact models relate to one another in the overall architecture of your data warehouse.
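As a rough sketch, a star schema puts one fact table at the center and joins it out to its dimensions. The table and column names below are purely illustrative:

```sql
-- Hypothetical star schema query: the fact table holds measures and foreign keys,
-- while each dimension holds the descriptive attributes used to slice them.
select
    d.date_day,
    c.customer_region,
    p.product_category,
    sum(f.order_total) as total_revenue
from fct_orders as f
left join dim_dates     as d on f.order_date_key = d.date_key
left join dim_customers as c on f.customer_key   = c.customer_key
left join dim_products  as p on f.product_key    = p.product_key
group by 1, 2, 3
```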

I've personally been taking a data engineering course with Zach Wilson which has covered data modeling in-depth during the first week. However, this covers data modeling from the perspective of a data engineer who has to process large amounts of data rather than from an analytics perspective.

How to set naming conventions and standards across the entire data stack

Setting naming conventions and standards across a data stack seems like a fairly simple thing to do, but boy could that not be further from the truth. Analytics engineers like to build. They don't necessarily like to document and standardize processes. However, this is exactly what will allow your data stack to scale.

When you don't set proper standards for your code, data models, event tracking, reverse ETL jobs, etc., your data environment becomes the wild wild west. Nobody can figure things out on their own; everyone constantly has to seek out the person who defined those things to understand them.

This creates a huge bottleneck in the analytics workflow. Even worse, if someone who wrote 80% of your models or pipelines leaves the company, you are screwed!

Standards and naming conventions come down to a process problem rather than a technical problem. For analytics engineers, process is just as important as the technical work, for the reasons I've just mentioned.

A few things that have greatly helped me:

  • Defining a dbt style guide and having this live in your GitHub repo
  • Documenting all my Airbyte data syncs and reverse ETL jobs in Notion and assigning them owners
  • Creating a GitHub branch naming convention
  • Following best practices when setting up a new tool

We are currently dealing with a problem of poorly named events in Segment. Because we never followed best practices or set naming standards, we have hundreds of messy events, many of them duplicated, cluttering our data warehouse. We now have to take the time to merge all of these datasets, repoint event names, and essentially start from scratch in setting it up the right way.

Learn proper standardization and naming convention practices now so you don't run into this same problem! I recommend looking at the documentation provided by each tool you use in the Modern Data Stack and following their recommendations. If something doesn't fit your business, redefine it in a way that does.
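As one hedged example of what a dbt style guide might standardize, here's a sketch of a staging model that follows a few common conventions: stg_<source>__<object> file naming, import CTEs, and explicit snake_case renames. The source, table, and column names are made up for illustration:

```sql
-- models/staging/segment/stg_segment__page_viewed.sql (hypothetical file)
with source as (

    -- one import CTE per upstream table
    select * from {{ source('segment', 'page_viewed') }}

),

renamed as (

    select
        id                as event_id,
        anonymous_id,
        received_at       as received_at_utc,
        context_page_path as page_path

    from source

)

select * from renamed
```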

How to use Git

Last but not least, every analytics engineer needs to learn how to use Git. While I know enough Git commands to create a new branch, add my changes, commit them, and push them, I still struggle with using Git to prevent and resolve merge conflicts. For some reason, something just always goes wrong.

You don't realize how much time you waste trying to solve issues like merge conflicts via the command line when you don't understand what's happening under the hood. This is why I've made it a goal for myself to master merge conflicts via Git.
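As a sketch of what I'm working toward, here's one common way to surface and resolve a merge conflict locally instead of in the pull request. The branch and file names are illustrative:

```bash
# Bring main up to date, then merge it into your feature branch
git checkout main
git pull origin main
git checkout feature/stg-orders-refactor
git merge main                       # conflicts show up here, on your machine

# See which files are conflicted
git status

# Open each file, resolve the <<<<<<< / ======= / >>>>>>> markers, then:
git add models/staging/stg_orders.sql
git commit                           # completes the merge
git push origin feature/stg-orders-refactor
```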

And, if you aren't using Git at all, especially with dbt, I highly recommend you learn. This will allow you to effectively work via the command line, speeding up code changes.

Using a version-control platform like GitHub is essential for maintaining high-quality code in production. It acts as a system of checks and balances, allowing you to track every change that has been made. That way, if something goes wrong, you can revert the change or quickly pinpoint the fix.

It also lets you take advantage of features like pull requests, which allow someone else to review your code before you merge it to production. This is a best practice all analytics teams should be following!

For more on Git and how to use it to improve your analytics workflows, check out this article.

What's next for 2024?

Now that you know the different things you should be learning as an analytics engineer this year, I want you to choose a different topic every week or month (depending on how much time you have). Dedicate that time to researching how you can improve.

I recommend searching different articles on Medium, looking at books, and checking out data newsletters to find the information you will need to be successful.

Remember, learning is a constant journey. We can't expect to improve at anything if we don't take the time out of our days to deepen our knowledge and skillset. Seeking answers to the problems you're facing will ultimately lead to great success as an analytics engineer.

For more on improving your skills as an analytics engineer, check out my weekly newsletter where I share tutorials, best practices, and data strategy.

Tags: Analytics, Analytics Engineering, Data Modeling, dbt, Modern Data Stack
