A Day in the Life of a Data Scientist




Despite Data Science being one of the fastest-growing careers of the past few years, many people in my day-to-day life still don't quite understand what I do (and by extension, what data scientists in general do).

I've even had people at my own workplace ask me multiple times, "What is it that you do again?"

For those who don't fully understand what a data scientist does (but want to), or those who are interested in the career but aren't sure it would be the right fit, I want to provide some insight into what data scientists do on a day-to-day basis.

Check on my models


Usually one of the first things I do in the morning (besides checking Slack messages and email) is take a look at my modeling dashboards.

Dashboards typically show some window of historical data plotted alongside the predicted values so you can see how the forecasts have been doing, along with the forecast for the next time period.

I typically check for a few things:

  • How the model performed on its most recent forecast. This is a good initial sanity check for how the model is doing. If I see a MAPE of 3% for the latest forecast and the predicted values are tracking well with the actual values, that's definitely a good sign. Additionally, if the goal of the model is more specific, such as predicting highs or lows, I can spot-check how well it did at that task over the last time period. Of course, you can always get lucky, so it's important to check not just the most recent predictions but the historical data as well.
  • How it has been performing over the past few time periods (days, weeks, a month, etc.). In addition to the basic chart showing predictions vs. actuals over time, I also have charts showing error-metric averages over time (such as MAPE, RMSE, R², and MAE); a sketch of how these can be computed follows this list. Examining these, I aim to detect any signs of model decay. If performance looks like it is steadily decreasing over time, this could indicate an issue such as data or concept drift, which means I have to investigate further, get to the bottom of the issue, and potentially retrain my model with new data or new/improved features.

I've written more in depth about model decay and the different kinds of drift that can occur and cause a model's performance to decline:

What is Model Decay?

  • Any abnormal data (spikes, zeros, flat lines). In my current domain (energy), meter readings will sometimes randomly malfunction (causing nulls and/or zeros). Other times, there will be a read error where the value is incorrectly reported as much larger than it actually is. Obviously, these incorrect values cannot remain in the database, because I do not want the model to retrain on bad data. These issues can be automated to some extent (such as excluding or interpolating zeros and nulls when retraining the model; see the cleaning sketch below), but other times I have to manually correct them in the database. With spikes in particular, I'll either set them to null so they can be dropped, or I can sometimes reach out to a colleague who is able to import the actual/corrected data.
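
To make the error-tracking check concrete, here is a minimal sketch of how weekly MAPE and RMSE might be computed with pandas. The column names and synthetic data are hypothetical, not from my actual pipeline:

```python
import numpy as np
import pandas as pd

# Hypothetical daily actuals vs. forecasts (synthetic data for illustration)
rng = np.random.default_rng(42)
actual = 100 + rng.normal(0, 5, 90)
forecast = actual + rng.normal(0, 3, 90)
df = pd.DataFrame({
    "ds": pd.date_range("2024-01-01", periods=90, freq="D"),
    "actual": actual,
    "forecast": forecast,
})

# Per-row absolute percentage error and squared error
df["ape"] = (df["actual"] - df["forecast"]).abs() / df["actual"].abs()
df["se"] = (df["actual"] - df["forecast"]) ** 2

# Weekly MAPE and RMSE; a steady upward trend here suggests model decay
weekly = df.groupby(df["ds"].dt.to_period("W")).agg(
    mape=("ape", "mean"),
    rmse=("se", lambda s: np.sqrt(s.mean())),
)
print(weekly)
```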
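
And here is a sketch of the kind of automated cleanup mentioned in the last bullet: nulling out zeros and spikes, then interpolating short gaps. The spike threshold and rolling window are hypothetical choices; real read errors often still need manual correction:

```python
import numpy as np
import pandas as pd

def clean_meter_reads(reads: pd.Series, spike_factor: float = 5.0) -> pd.Series:
    """Null out zeros and implausible spikes, then interpolate short gaps."""
    reads = reads.replace(0, np.nan)                 # malfunctioning meters report zeros
    local_median = reads.rolling(24, min_periods=1).median()
    reads = reads.mask(reads > spike_factor * local_median)  # reads far above the local median are likely errors
    return reads.interpolate(limit=3)                # fill short gaps only; longer ones get manual review
```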

Meetings


Meetings are a big part of being a data scientist. As a technical expert, you will have to communicate and collaborate with people from many other domains and fields of expertise. Because the work I do is considered confusing and complex by people who haven't studied it, an important part of my job is learning how to explain the projects I'm working on to others, especially those who know nothing about data science.

In my day-to-day there are three types of meetings I typically attend:

  • Team meetings. These are very general meetings where I meet with my entire team, which consists of data scientists, product managers, business intelligence and other analysts, and engineers. I present what I'm working on, get input from other team members with different kinds of domain knowledge (such as what variables might affect a certain energy type or building type's consumption), and give project updates.
  • Code review. This is a meeting I typically have with other data scientists only. We go much more in depth, usually screen sharing to look at the actual code. Here we discuss any changes we want to make to our models as well as the efficiency of the code and how it could be improved (run time, organization, readability). If we're running into any errors, we also debug together or discuss possible solutions.
  • Project specific meetings. These meetings are usually a mix of data scientists, software engineers, project managers, and interested clients. They pertain to a specific project, for example, "24 Hour Energy Consumption Forecasting for Building 5". Here, we initially do a lot of project scoping: we discuss what the goal of the model is, what types of features to include in our model, what kind of data we are using, and how we will deliver the product to the client. In later meetings, I give project updates using more technical language as well as present relevant findings from my analyses of the data and initial model training. If we need to pivot on methodology, I explain why, how I will do it, and provide a timeline for when I'll complete the changes.

Code & model improvements


Now this is where I talk about the fun part – coding! Though coding is not all I do, it is a very important part of my job as a data scientist. Once I have scoped out a project and crafted a plan for how I will go about implementing and deploying the finished product, I start coding.

I code primarily in Python but I also use a good amount of SQL in order to query the data I need.

When I'm first developing a new model, I start out in Jupyter notebooks. Notebooks are a great way to experiment: testing different ways of cleaning/preprocessing data, training different models with different features, and more. Once I have my final model, I move over to .py files and a more general project structure (which may include things like Dockerfiles, a main file, and modules).
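
As one hypothetical example (the folder and file names are illustrative, not my actual project), that final structure might look like:

```
energy_forecast/
├── Dockerfile
├── README.md
├── requirements.txt
├── main.py              # entry point: load data, train, forecast, write results
└── src/
    ├── preprocessing.py # cleaning and feature engineering
    ├── model.py         # training and prediction logic
    └── utils.py         # shared helpers
```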

I also spend a good amount of time improving old code: restructuring it by adding new classes, arguments, or global variables; abstracting out duplicated code; and removing arguments and variables I no longer need.

Lastly, when I'm not working on a tight deadline for a specific project, I like to work on learning new libraries and models.

Recently, I learned about the Nixtla libraries for time series forecasting and spent a good chunk of time reading the documentation and implementing my own models. It was a valuable time investment because it has saved me a ton of time on methods I'd previously implemented by hand (such as forecasting using lag columns).
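
As a rough sketch of what that looks like, the example below uses mlforecast from the Nixtla ecosystem, which builds lag features automatically. The series is synthetic and the model and lag choices are illustrative, not my production setup:

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from mlforecast import MLForecast

# Synthetic hourly consumption series in Nixtla's expected long format
ds = pd.date_range("2024-01-01", periods=500, freq="H")
rng = np.random.default_rng(0)
y = 10 + 3 * np.sin(2 * np.pi * ds.hour / 24) + rng.normal(0, 0.5, len(ds))
df = pd.DataFrame({"unique_id": "building_5", "ds": ds, "y": y})

# lags=[1, 24] replaces the lag columns I used to build by hand
fcst = MLForecast(models=[LinearRegression()], freq="H", lags=[1, 24])
fcst.fit(df)
print(fcst.predict(24))  # forecast the next 24 hours
```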

Documentation

A good chunk of my job is documenting my work. I want to make it easy for new employees to catch up on what I've been working on so that they can help me, offer suggestions, and even take over parts of the code themselves.

Ah yes, the README file… every programmer is all too familiar with it. It's truly a project's backbone because it explains how to actually use all of the code you've written. Typically, the README contains the following (a minimal skeleton is sketched after the list):

  • Background information on the project (what it does, why it was built)
  • An outline of the project's structure (A look inside the project: where all the folders and files are & their names)
  • Instructions for installing, importing, and/or setting up the project/code
  • Basic instructions for getting started using the code
  • Who owns/maintains the project and how to get help or more information
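
A minimal skeleton covering those sections might look like this (the project name, commands, and channel are hypothetical):

```markdown
# Building 5 Energy Forecasting

Forecasts 24-hour energy consumption for Building 5 from hourly meter data.

## Project structure
- `main.py` — entry point
- `src/` — preprocessing, feature, and model modules

## Setup
    pip install -r requirements.txt

## Usage
    python main.py --building 5 --horizon 24

## Maintainers
Data Science team — ask in #ds-support for help or more information.
```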

Docstrings and comments are just as important, if not more important, than the README file. They document what is going on at the level of each file, method, and line of code. I spend a lot of time ensuring that my docstrings and comments are accurate, and whenever I change the code, I update them to match.
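
For example, a docstring on a forecasting helper might look like the following (the function itself is hypothetical):

```python
import pandas as pd

def forecast_consumption(history: pd.DataFrame, horizon: int = 24) -> pd.DataFrame:
    """Forecast hourly energy consumption for a single building.

    Args:
        history: DataFrame with 'ds' (timestamp) and 'y' (kWh) columns.
        horizon: Number of hours ahead to predict.

    Returns:
        DataFrame with 'ds' and 'yhat' columns covering the next `horizon` hours.
    """
    ...  # model training and prediction would go here
```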

In addition to docstrings, comments, and a README file, I usually also have more documentation written in Google Docs and Slides. Here, I go more in depth on the background and use case as well as what kind of model I'm using and why. I may also provide examples and images of charts, along with links to websites that explain some of the ML concepts I'm describing for people who want to learn more.

Overall

Being a data scientist is fun because you get to solve problems every day. Unfortunately, this can also be the exhausting part of the job. It's not a repetitive job by any means, so it's never boring, but sometimes you just want to relax and do something that doesn't require so much mental power.

It's important to remember that each data scientist will have a different experience, especially across companies and industries. However, I do hope this post provided you with more insight as to what a data scientist does every day.

