Your First Year as a Data Scientist: A Survival Guide

Author: Murphy | Time: 2025-03-22 21:44:37

In August I'll be coming up on my second year as a full-time data scientist! (Technically my third year, but I was an intern my first year in the field.)

I've learned a tremendous amount in the past few years. Although my university degree in data science taught me a lot, there's only so much college can do to prepare you for the workforce.

Nothing beats getting a ton of firsthand experience.

Making mistakes on my own, fixing them, and repeating the same tasks over and over has allowed me to develop and solidify my skill set more fully.

The first year can be daunting whether or not you have a data science degree. The work environment is fast-paced, at times stressful, and intimidating.

Luckily, there are steps you can take to make your first year easier and more enjoyable. In this article, I want to share a few tips for surviving, and ultimately thriving, in your first year.

Find a mentor

This is probably one of the best pieces of advice I can offer someone starting out. A mentor is someone who has more experience working full-time in the industry (ideally at least a couple of years more than you) and who is friendly, supportive, and wants to help you improve.

A mentor could be almost anyone: a coworker, an old friend who graduated before you, or a family friend who's been in the field for a long time.

Ideally, though, your mentor also works in your organization. Your manager can serve as a mentor in many ways, but managers often already have a lot on their plate, so asking a coworker with a few more years of experience to mentor you might be your best bet.

Mentors can:

Share their own projects and code with you as a reference point (this is why a mentor at your own company is ideal: someone at another company can't share their code with you, because it's confidential or proprietary)
Review your code and catch mistakes you wouldn't have noticed
Check in with you weekly, biweekly, or monthly (depending on their availability) to answer questions, address concerns, and give feedback on your progress
Point you to lesser-known libraries, models, or other tools that have made their job easier

If you find someone you think would be a good mentor, you can reach out by email or LinkedIn and let them know you're looking to connect with more senior people in the field. If things go well, you can ask whether they'd be interested in mentoring you.

Master the basics

As a beginner data scientist, it's important to focus primarily on learning and practicing the fundamentals of data science and how they apply to your job.

Many new data scientists are excited to jump straight to the "cool" stuff: deep learning, LLMs, sentiment analysis, and so on. They want to try complex models and fancy hyperparameter tuning methods.

You need to slow down.

The reality is that when programming, even experts make mistakes. Bugs will appear, and models will sometimes produce strange outputs. If you've put all your energy into building the most complicated model you could find without understanding its building blocks and the fundamentals of the machine learning life cycle, how will you debug it?

It will be very difficult if not impossible.

Focus on mastering the basics, such as:

Data exploration. Can you generate appropriate visuals and charts to fully explore the dataset at hand? Do you know how to examine the statistical properties of a dataframe? How to detect outliers? Can you draw useful inferences from the exploration process that will help you build a good model?
Data cleaning. After exploring the data and identifying potential weak or important points, can you remove outliers and nulls? Should you interpolate those values instead (and if so, what's the best way to do it)? Can you identify and handle bad data? Are you also comfortable working with columns of different data types, for example converting between them (e.g., a datetime to a string and back, a datetime to an int, an int to a float)?
Data preprocessing. Do you know how to generate new features from raw data (for example, creating time series columns like "hour" from a timestamp)? Can you encode categorical features (e.g., one-hot, ordinal, or cyclical encoding) so your model gets proper inputs? Can you scale numerical values using MinMaxScaler or StandardScaler (and do you know when it's necessary)?
Feature selection. Once you've explored your data properly and gathered useful insights, do you know which features to start with? After adding them, can you select the best features for your model and exclude those that aren't helping or are degrading performance?
Selecting and training a simple model. Do you know when and why to use linear regression instead of a random forest? An XGBoost model instead of a random forest? How much training data do you need to collect? Can you evaluate the model's performance using cross-validation and a train/test split?
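To make the checklist above more concrete, here is a minimal sketch of those fundamentals in Python, assuming pandas, NumPy, and scikit-learn are installed. The dataset is synthetic and hypothetical (hourly "usage" readings with a made-up "building_type" column), and the IQR outlier rule, linear interpolation, and linear regression are just one reasonable set of choices, not the only right ones.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical hourly data: a timestamp, a categorical building type, and a target "usage".
rng = np.random.default_rng(0)
n = 500
df = pd.DataFrame({
    "timestamp": pd.date_range("2024-01-01", periods=n, freq="h"),
    "building_type": rng.choice(["office", "school", "retail"], size=n),
    "usage": rng.normal(100, 10, size=n),
})
df.loc[[10, 200], "usage"] = [1000.0, -500.0]  # inject two obvious outliers
df.loc[[50, 51], "usage"] = np.nan  # inject missing values

# Exploration: summary statistics plus IQR-based outlier detection.
print(df["usage"].describe())
q1, q3 = df["usage"].quantile([0.25, 0.75])
iqr = q3 - q1
is_outlier = (df["usage"] < q1 - 1.5 * iqr) | (df["usage"] > q3 + 1.5 * iqr)

# Cleaning: treat outliers as missing, then fill every gap by linear interpolation.
df.loc[is_outlier, "usage"] = np.nan
df["usage"] = df["usage"].interpolate(method="linear", limit_direction="both")

# Type conversions: datetime -> string -> datetime, and datetime -> int (Unix seconds).
as_str = df["timestamp"].dt.strftime("%Y-%m-%d %H:%M:%S")
round_trip = pd.to_datetime(as_str)
epoch_s = (df["timestamp"] - pd.Timestamp("1970-01-01")) // pd.Timedelta(seconds=1)

# Preprocessing: derive "hour", encode it cyclically, and one-hot encode the category.
df["hour"] = df["timestamp"].dt.hour
df["hour_sin"] = np.sin(2 * np.pi * df["hour"] / 24)  # hour 23 ends up close to hour 0
df["hour_cos"] = np.cos(2 * np.pi * df["hour"] / 24)
df = pd.get_dummies(df, columns=["building_type"], dtype=float)

# Simple model: the scaler lives inside the pipeline, so it is fit on training folds only.
features = ["hour_sin", "hour_cos"] + [c for c in df.columns if c.startswith("building_type_")]
X, y = df[features], df["usage"]
# shuffle=False keeps time order, so the test set is the most recent 20% of rows.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, shuffle=False)
model = make_pipeline(StandardScaler(), LinearRegression())
cv_mae = -cross_val_score(model, X_train, y_train, cv=5, scoring="neg_mean_absolute_error")
model.fit(X_train, y_train)
test_mae = np.mean(np.abs(model.predict(X_test) - y_test))
print(f"CV MAE: {cv_mae.mean():.2f}  test MAE: {test_mae:.2f}")
```

Two details in this sketch tend to trip up newcomers: the scaler sits inside the pipeline so held-out data never influences it, and the time-ordered split avoids training on the future. On real data you would also plot the series and check with domain experts before deciding whether an outlier is an error or a genuine event.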

Yes, a lot of these basics are taught in school or in data science certifications, and you've probably done all of these steps in personal projects. But implementing a workflow that incorporates all of them in a corporate project is a very different experience. Domain expertise comes into play, as do the specifics of your company's data and the business case or goal of the model.

Therefore, it's crucial to build confidence in these basic areas before you complicate your workflow with fancy optimization techniques or highly complex models.

Study your domain

I've talked a lot about domain knowledge. Each company operates in its own domain, whether that's energy, finance, sales, marketing, climate, healthcare, or something else.

Depending on the industry you're working in, there will be differences in the way you:

Collect and preprocess data. Data will come in various formats and structures. Maybe you'll have to get really good at reading SQL query results into Python, or at processing JSON files. Some kinds of data are more prone to spikes and outliers than others. Sometimes you'll want to remove those outliers; other times you may want to keep them so your model can learn to predict them in the future.
Select features for models. More likely than not, you'll need to pull data from multiple sources. You might initially be given training data that contains only a timestamp and a value, and based on what kind of data it is, you'll need to decide which features to train your model on. These could be weather features, customer demographic data, sales data for other products, and more. You need domain knowledge to figure out which features might have a significant impact on your target variable.
Choose which models to train. Some domains will primarily use classifiers (for example, spam detection or computer vision and image classification). Others will use regressors for time series forecasting. Within forecasting, depending on the available data and the domain, statistical models (such as ARIMA) or tree-based models (such as random forest and XGBoost) may be preferable.
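As an illustration of pulling data from more than one source, the sketch below loads hypothetical meter readings from a SQL table and hypothetical weather observations from nested JSON, then joins them on timestamp. The table name `meter_readings`, the JSON layout, and all column names are made up for the example, and an in-memory SQLite database stands in for a real company database.

```python
import json
import sqlite3

import pandas as pd

# Stand-in for a company database: an in-memory SQLite table of hypothetical meter readings.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE meter_readings (ts TEXT, value REAL)")
conn.executemany(
    "INSERT INTO meter_readings VALUES (?, ?)",
    [("2024-01-01 00:00", 10.0), ("2024-01-01 01:00", 12.5), ("2024-01-01 02:00", 11.0)],
)
# Read the result of a SQL query straight into a DataFrame, parsing the timestamp column.
readings = pd.read_sql_query(
    "SELECT ts, value FROM meter_readings ORDER BY ts", conn, parse_dates=["ts"]
)
conn.close()

# Hypothetical weather observations delivered as nested JSON.
weather_json = """
[{"time": "2024-01-01 00:00", "obs": {"temp_c": -2.0, "humidity": 80}},
 {"time": "2024-01-01 01:00", "obs": {"temp_c": -1.5, "humidity": 78}},
 {"time": "2024-01-01 02:00", "obs": {"temp_c": -1.0, "humidity": 75}}]
"""
# json_normalize flattens nested keys into columns such as "obs.temp_c".
weather = pd.json_normalize(json.loads(weather_json))
weather["time"] = pd.to_datetime(weather["time"])

# Left-join weather onto the readings so each reading gains candidate weather features.
merged = readings.merge(weather, left_on="ts", right_on="time", how="left").drop(columns="time")
print(merged)
```

A left join keeps every reading even when a weather observation is missing, which leaves NaNs that you then have to clean. If the two sources aren't sampled at exactly the same times, `pd.merge_asof` (which matches each row to the nearest earlier key in sorted data) is a common alternative to an exact-key merge.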

As data scientists, we come in with a lot of technical knowledge, but we don't always have the tools to apply it to the specific problems in our field.

One of the best ways to gain more knowledge is to learn from others on your team or in your broader work community. More likely than not, you'll have colleagues who are experts in all kinds of work, such as various kinds of engineers and analysts.

Try to set up a recurring meeting (anywhere from once a week to once a month) with one person or a group where you can learn more about your domain and ask questions.

For example, at my company I had a recurring meeting called "Energy Basics" for a while, where I learned things like how energy meters work, which factors affect energy consumption in different building types, and what affects usage for different energy types (electric, gas, water, steam, etc.).

Talk to your fellow tech coworkers and see if they're also interested in meeting with domain experts. This can also make the experience more enjoyable and motivating.

I provided an example of domain knowledge at work in a previous article, "3 Ways to Think Like a (Great) Data Scientist."

Take online courses

Many companies will pay for you to take online courses and earn certifications.

Courses let you brush up on skills you may have forgotten since your school days, as well as expand into areas of data science and ML that your degree didn't cover.

I've taken courses on deep learning, statistics, and machine learning engineering, and they've been great supplemental resources for my job. The deep learning and statistics courses let me review material I learned in college and do some test projects for extra practice. My machine learning engineering course gave me a lot of valuable knowledge my degree didn't (such as how to put models into production, and how and when to retrain and monitor them).

Data science degrees can only cover so much. On top of that, every student has to choose certain electives and forgo others. After you graduate and get your first job, it's important to keep learning and try to fill in those gaps.

Ask your manager if this is something your company offers and how you can take advantage of it.

Ask for help

For those of us who went to university, asking for help on projects and assignments can feel foreign.

Stuck on a problem in college? Too bad: the professor and TAs can only help you so much. They can't just give you the answer.

In the corporate environment, that's not the case. Chances are someone else has already solved the problem you're facing. And though you can find great information online, nothing beats having someone sit down and work through the problem with you. When it's a coworker, you get the added bonus of someone who knows the domain, business case, database, and data format.

You might default to sitting at your computer for hours, scratching your head over a bug. But remind yourself that many people around you have been in your position before and would be happy to help. Even if they can't give you the complete answer, they can provide pieces of the puzzle.

Also, don't be afraid to ask coworkers to look over your work or examine a bug to see whether they can attack it from a different angle. Two heads are better than one.

Overall

Your first year will be full of mistakes, learning opportunities, and milestones. The most important thing to remember is that you're not alone. You may have noticed that four of the five tips I gave involve support from others: mentorship, asking for help, taking courses, and learning domain knowledge from coworkers. These all tie into my remaining point, mastering the basics.

If you prioritize learning in your first year, ingrain the fundamentals of data science into your mind and muscle memory, and rely on those with more experience to guide you, you will build a phenomenal foundation for your future data science career.
