Roll Up Your Sleeves: 9 Data and Machine Learning Project Walkthroughs You Should Explore
Feeling inspired to write your first TDS post? We're always open to contributions from new authors.
Welcome to a special, hands-on, project-focused edition of The Variable! We know how important practical skills are to data and ML professionals these days. When job searches are as competitive as they are, there's extra value in demonstrating your ability to solve complex, real-world problems, use cutting-edge tools effectively, and draw meaningful insights even from the messiest datasets.
To help you expand your toolkit and inspire you to learn new topics and try out new workflows, we've gathered some of our most interesting recent tutorials. They zoom in on the nitty-gritty details of project execution, and, in many cases, come with lots of code snippets you can experiment with on your end. Let's get to it!
- Exploring LLMs for ICD Coding – Part 1
  Working at the intersection of ML and healthcare management, Anand Subramanian shows the potential of leveraging large language models to streamline the crucial process of clinical coding.
- How to Build Neural Networks for Node Classification
  In her latest post, Claudia Ng offers a thorough guide to building graph-based neural networks – starting out with nothing more than a CSV file and PyTorch Geometric.
- From Data to Dashboard: Visualizing the Ancient Maritime Silk Road with Dash Leaflet and SeaRoute libraries
  For all geospatial data aficionados out there, Maria Mouschoutzi, PhD's debut TDS post explains how to tackle the challenge of visualizing maritime routes.
- Building an Observable arXiv RAG Chatbot with LangChain, Chainlit, and Literal AI
  In a detailed walkthrough of a RAG-based project, Tahreem Rasul takes us through the steps it takes to create a semantic research-paper engine by stringing together a suite of powerful tools.
- Using LLMs to Learn From YouTube
  Approaching retrieval-augmented generation from a different direction, Alok Suresh's guide explores ways to extract information from videos and use it for a better-performing question-answering chatbot.
- Unlocking Valuable Data and Model Insights with Python Packages Yellowbrick and PiML (with Code)
  In the mood for tinkering with some Python? Dr. Theophano Mitsa shares an accessible introduction to the Yellowbrick and PiML packages (and shows how you can use them to better understand model behavior).
- Building Transformer Models for Proteins From Scratch
  Computational biology is one of the fields that has seen the most innovation thanks to recent advances in AI. Case in point: Yuan Tian's fascinating work on building a basic protein transformer model to predict the antigen specificity of antibody sequences.
- Exploring Shiny for Python With A Puppy Traits Web Application
  Shiny for Python has opened up the previously R-focused library to a wider audience of data scientists. Deepsha Menghani's step-by-step tutorial will help you make the most of its app-building powers.
- Recreating PyTorch from Scratch (with GPU Support and Automatic Differentiation)
  "What is happening internally during these operations? How does all of this work?" These are the questions that Lucas de Lima Nogueira asked himself when using PyTorch – so he attempted to recreate the library himself.
Ready to roll down your sleeves and ponder more theoretical questions for a while? We've got you covered.
- Sydney Nye's new guide to graph theory is a comprehensive resource for learners, covering its history, underlying math, and potential applications.
- How can physics principles open up the space for deeper insights into our data? Tim Lou, PhD's thought-provoking article points to fascinating interdisciplinary connections.
- "Is it better to engineer a feature to contain as much information as possible about a code system, or to find a way to let a model do the work?" Valerie Carey explores alternative treatments for hierarchical categoricals.
- Using an engaging, fishing-inspired example, Jarom Hulet offers a detailed explainer on multi-armed bandit problems (and how to solve them).
- To round out your reading this week, we recommend Elliott Stam's thoughtful reflection on data ROI: a helpful primer for teams and managers on how to avoid practices that lead to negative returns.
Thank you for supporting the work of our authors! We love publishing articles from new authors, so if you've recently written an interesting project walkthrough, tutorial, or theoretical reflection on any of our core topics, don't hesitate to share it with us.
Until the next Variable,
TDS Team