This Is How LLMs Break Down the Language
The science and art behind tokenization - 21552 Murphy 2025-03-22
One-Tailed Vs. Two-Tailed Tests
Choosing between one- and two-tailed hypotheses affects every stage of A/B testing. Learn why the hypothesis direction matters and explore the pros and cons of each approach.
LettuceDetect: A Hallucination Detection Framework for RAG Applications
How to capitalize on ModernBERT’s extended context window to build a token-level classifier for hallucination detection
How to Spot and Prevent Model Drift Before it Impacts Your Business
3 essential methods to track model drift you should know
Experiments Illustrated: How We Optimized Premium Listings on Our Nursing Job Board
Also, how georandomization can help clean up spillovers
Experiments Illustrated: How Random Assignment Saved Us $1M in Marketing Spend
Also, a casual intro to the multiple comparisons problem
Are You Still Using LoRA to Fine-Tune Your LLM?
A look at this year’s crop of LoRA alternatives
Linear Regression in Time Series: Sources of Spurious Regression
Why does the autocorrelation of the error term matter?
From Fuzzy to Precise: How a Morphological Feature Extractor Enhances AI's Recognition Capabil
Mimicking human visual perception to truly understand objects
The Impact of GenAI and Its Implications for Data Scientists
What we can learn from Anthropic’s analysis of millions of Claude.ai chats
Mastering Hadoop, Part 3: Hadoop Ecosystem: Get the most out of your cluster
Exploring the Hadoop ecosystem: key tools to maximize your cluster’s potential
Mastering Hadoop, Part 2: Getting Hands-On — Setting Up and Scaling Hadoop
Understanding Hadoop’s core components before installation and scaling
Six Organizational Models for Data Science
Setting a team up for success or failure
Platform-Mesh, Hub and Spoke, and Centralised | 3 Types of data team
Why understanding team structure is critical for data and AI
Fourier Transform Applications in Literary Analysis
How mathematics and data analysis can offer a head start to analysing poetry, before even reading the words.
How to Make Your LLM More Accurate with RAG & Fine-Tuning
And when to use which one
Mastering the Poisson Distribution: Intuition and Foundations
Take a dive into the foundations and exemplifying use cases of the Poisson distribution
Anatomy of a Parquet File
Parquet from scratch: A Python deep dive into a raw parquet file
Heatmaps for Time Series
Visualizing trends and outliers with non-linear color scales
Algorithm Protection in the Context of Federated Learning
A pragmatic look into protecting algorithms and models deployed into real-world federated analysis and learning settings in healthcare.
Genius Cliques: Mapping out the Nobel Network
Combining Network Science, Data Visualization, and Wikipedia to uncover hidden connections between all the Nobel laureates.
Data Science Expertise Comes in Many Shapes and Forms
Our weekly selection of must-read Editors' Picks and original features
