How to Look at Common Machine Learning Tasks with Fresh Eyes
We'd never recommend changing robust, well-performing workflows just for the sake of change; "if it ain't broke, don't fix it" is a common folksy idiom for a reason: it's very often the correct approach.
Still, there's a sizeable gap between "very often" and "always," and our most frustrating days at work typically come about when our time-tested methods underperform or fail to produce the outcomes we expect. This is where expanding our knowledge base really pays off: instead of getting stuck in the mental equivalent of a spinning wheel of death, we try something different, tinker with our process, and (sooner or later) move forward with a new solution.
In the spirit of embracing fresh perspectives, we've put together a lineup of excellent recent posts that offer an original spin on common machine learning workflows. They cover procedures like drift detection and model training, and tasks ranging from image segmentation to named-entity recognition. Make room in your toolkit—you'll want to add these!
Before diving in, a quick update: if you're looking for other ways to stay up-to-date with our best recent articles beyond the Variable, we just launched several Medium lists to help you discover more great reads.
- Algorithmic recommender systems are everywhere, from e-commerce sites to streaming services, and their outputs can sometimes feel repetitive and obvious. As Christabelle Pabalan shows, there's no reason to settle for uninspired choices—in fact, injecting recommender systems with elements of novelty and serendipity can result in better user retention.
- Detecting drift in models trained on unstructured data, like embeddings used in LLM-powered apps, "is a fairly new topic, and there are no ‘best practice' methods," say Elena Samuylova and Olga Filippova. To help you pick the most effective approach, they ran several experiments and share clear recommendations based on their findings.
- Many data scientists and ML practitioners view the rapid rise of synthetic-data options for model training as a cause for celebration, but recognize that it comes with serious concerns around data quality and long-term performance. Vincent Vatter walks us through recent research from Microsoft that points to a productive path forward.
- Model calibration is a key step in many classification tasks, but calculating it accurately can be tricky. Maja Pavlovic is here to help with a clear, practical tutorial on handling expected calibration error (ECE); see the short sketch after this list for the core idea.
- If you've reached a dead end in your recent image-segmentation project using convolutional neural networks, Dhruv Matani and Naresh offer an alternative: give a Vision Transformer-based model a try instead.
- As a data scientist at the NOS, the Dutch Public Broadcasting Foundation, Felix van Deelen has access to a rich corpus of news stories; Felix's debut TDS article explores the potential of using this textual data in named-entity recognition projects.
- There's no one-size-fits-all solution for detecting anomalies in your data, which makes it a good idea to familiarize yourself with a few options. Viyaleta Apgar introduces us to a beginner-friendly technique based on the Gaussian distribution and shows how to implement it in the context of a multivariate model (a minimal sketch appears after this list).
- To optimize your regression model more effectively, Erdogan Taskesen proposes adding a Bayesian flavor to the hyperparameter-tuning step of model training; the tutorial includes a full implementation built on the HGBoost library (a generic sketch of the idea follows below).
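To make a few of the techniques above concrete, here are some minimal sketches. First, expected calibration error for a binary classifier: this is a generic, binned implementation of the metric rather than code from Maja Pavlovic's tutorial, and the function name, 0.5 decision threshold, and default of ten bins are our own illustrative choices.

```python
import numpy as np

def expected_calibration_error(y_true, y_prob, n_bins=10):
    """Binned ECE: weighted average of |accuracy - confidence| across confidence bins."""
    y_true = np.asarray(y_true)
    y_prob = np.asarray(y_prob)
    # Confidence of the predicted class, and whether that prediction was correct
    confidences = np.where(y_prob >= 0.5, y_prob, 1 - y_prob)
    accuracies = ((y_prob >= 0.5).astype(int) == y_true).astype(float)

    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = (confidences > lo) & (confidences <= hi)
        if in_bin.any():
            # Bin weight times the gap between mean accuracy and mean confidence
            ece += in_bin.mean() * abs(accuracies[in_bin].mean() - confidences[in_bin].mean())
    return ece

# Example: four predicted probabilities and their true labels
print(expected_calibration_error([0, 1, 1, 0], [0.2, 0.9, 0.6, 0.4]))
```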
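Next, the Gaussian approach to anomaly detection boils down to fitting a density to "normal" data and flagging low-density points. The sketch below assumes SciPy is available and uses an arbitrary threshold epsilon; it is our own simplification, not Viyaleta Apgar's exact implementation.

```python
import numpy as np
from scipy.stats import multivariate_normal

def fit_gaussian(X):
    """Estimate the mean vector and covariance matrix from 'normal' training data."""
    return X.mean(axis=0), np.cov(X, rowvar=False)

def flag_anomalies(X, mu, cov, epsilon=1e-3):
    """Flag points whose density under the fitted Gaussian falls below epsilon."""
    density = multivariate_normal(mean=mu, cov=cov).pdf(X)
    return density < epsilon

# Example: fit on well-behaved data, then score a batch that contains an outlier
rng = np.random.default_rng(0)
train = rng.normal(size=(500, 2))
test = np.vstack([rng.normal(size=(5, 2)), [[6.0, 6.0]]])
mu, cov = fit_gaussian(train)
print(flag_anomalies(test, mu, cov))
```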
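Finally, for the Bayesian hyperparameter-tuning idea, the outline below uses Hyperopt's tree-structured Parzen estimator with scikit-learn's histogram-based gradient boosting as the model. It sketches the general concept rather than the HGBoost library's actual API, and the search space and evaluation budget are arbitrary choices for illustration.

```python
from hyperopt import Trials, fmin, hp, tpe
from sklearn.datasets import make_classification
from sklearn.ensemble import HistGradientBoostingClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=500, random_state=0)

# Illustrative search space over two boosting hyperparameters
space = {
    "learning_rate": hp.loguniform("learning_rate", -4, 0),
    "max_leaf_nodes": hp.choice("max_leaf_nodes", [15, 31, 63]),
}

def objective(params):
    # Minimize negative cross-validated accuracy
    model = HistGradientBoostingClassifier(random_state=0, **params)
    return -cross_val_score(model, X, y, cv=3).mean()

best = fmin(objective, space, algo=tpe.suggest, max_evals=25, trials=Trials())
print(best)
```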
Thank you for supporting our authors! If you enjoy the articles you read on TDS, consider becoming a Medium member – it unlocks our entire archive (and every other post on Medium, too).
Until the next Variable,
TDS Editors