How to Build Good Habits as a Data Scientist
It's almost March, which means that most people who made a solemn New Year's resolution a mere two months ago have likely already abandoned it. (That's what both Google and ChatGPT tell us, anyway.)
Is it a reason to despair? On the contrary—in the absence of calendar-based (read: externally imposed and arbitrary) expectations to set goals for ourselves, now's the perfect time to reflect on the areas where we actually want to grow. We can then take small, concrete steps to make steady progress.
To help you on this self-guided journey, we're highlighting four excellent articles that approach the topic of strong habits in a wide range of data science and machine learning contexts. What they have in common is a commitment to sustainable, gradual improvement over quick, short-lived fixes.
- Making your data consistently reliable is a process. Barr Moses and coauthor Will Robins have observed hundreds of data teams struggle to maintain data-quality standards. They've distilled the lessons they've learned into a six-step, long-term sequence ("a data reliability marathon," if you will), one that can inspire change among individual practitioners as well as entire organizations.
- Leadership is in the details. Having worked under both excellent and less-effective managers, and after leading a few ML projects himself, Aliaksei Mikhailiuk has gained a deep understanding of the principles that make success possible. He shares insights about communication, infrastructure, and documentation, and the practical steps you can take to improve in each of these areas.
- How to run a tight data science ship. Whether you're a solo consultant or part of a large, resource-rich data team, you can—and should—refuse to let chaos dominate your workflows. Alexandre Rosseto Lemos recently published a practical guide for anyone who'd like to grow their organizational skills; it's full of advice tailored specifically to the needs of data professionals.
- Keep track of your own progress and accomplishments. Whether your performance review is coming up or you're looking for a new position, having a strong record of your accomplishments is key. Semi Koen's latest post emphasizes the importance of taking detailed, real-time notes of your work highlights, and of getting into the habit of advocating for yourself based on robust self-assessment.
If you've made it this far, you're clearly in possession of a strong reading habit; why not continue to nurture it with a few more recent standouts?
- How should you choose the right similarity measure for your recommendation algorithm? Jiahui Wang's concise introduction covers four essential types.
- For a beginner-friendly primer on the bias-variance trade-off, head right over to Cassie Kozyrkov's breezy three-part series on the topic.
- Comparing the performance of models is a crucial part of any machine learning project; that's why, according to Joris Guerin, it's so important to design robust ML experiments.
- Diving deep into the math underpinning reinforcement learning, Shailey Dash explains stochastic theory in the context of the Markov decision process (MDP).
- Jorge Martín Lasaosa shared a comprehensive guide to the XGBoost algorithm, taking us from the original paper to a full Python implementation.
We hope you consider becoming a Medium member this week – it's the most direct and effective way to support the work we publish.
Until the next Variable,
TDS Editors