To Really Learn a New Topic, Take Your Time
In a culture that valorizes speed above (almost) all else—"move fast and break things"—it's a good idea to remind ourselves that sometimes, slowing down is the most effective shortcut. This is especially true when you're in the process of exploring complex ideas, something that data science and machine learning professionals need to do on a more or less daily basis.
We've just marked the longest day of the year here in the Northern Hemisphere, so we figured there's no better time to celebrate some of our longest—and best-executed—deep dives. Whether they tackle a thorny theoretical concept or walk us through a cutting-edge tool or workflow, they are the kind of articles that require us to pause, think, and digest—and reward us with new and long-lasting insights. Happy reading!
- Ethical and regulatory concerns around personal health information (PHI) make dealing with medical images extremely complicated. Adrienne Kline shares a detailed overview of an open-source tool that provides a "robust protocol" around "de-identification, sequestering of relevant patient information, ROI [region of interest] identification, and file compression of medical images."
- How are our expectations structured, and what do they have to do with statistical reasoning? Sachin Date's latest work is a fascinating and patient guide to the theoretical underpinnings of the concept of expectation: it starts on a choppy crossing of the English Channel and ends with the math behind the quantum wave function.
- Reza Bagheri's articles tend to be one-stop resources that readers bookmark and turn to again and again. We suspect Reza's new deep dive, on autoencoders and their role in dimensionality reduction, will be no exception, as it covers both the essential theoretical elements and the PyTorch implementation of linear and non-linear autoencoders.
- For a beginner-friendly—but equally useful—introduction to working with PyTorch, Leonie Monigatti's tutorial on image classification is a great choice. It's comprehensive and clear, and while the examples used here are images of big cats, it's easily adaptable to more common real-world classification projects.
- The challenge of sorting through mountains of data and locating the right information quickly is a familiar one in every growing organization. Janna Lipenkova's primer on Text2SQL presents an innovative framework that combines the accessibility of large language models with the querying power of SQL.
- To end our selection on a particularly reflective note, we leave you with Andre Ye's piece on the fundamental structure and inner workings of algorithms. It zooms in on one of the most thought-provoking (and most important) questions people working with data should reflect on: what does "learning" mean for a machine?
Thank you for supporting our authors! If you enjoy the articles you read on TDS, consider becoming a Medium member – it unlocks our entire archive (and every other post on Medium, too).
Until the next Variable,
TDS Editors