HyperLogLog implemented using SQL
We look at an implementation of the HyperLogLog cardinality estimation algorithm written entirely in declarative SQL - 27695 Murphy 2025-03-23
K-means Clustering: An Introductory Guide and Practical Application
Using clustering algorithms such as K-means is one of the most popular starting points for machine learning. K-means clustering is an... - 26414 Murphy 2025-03-23
The 4 Small but Powerful Ways to Improve Your Data Skills This Year
Level up Your Data Game by Mastering These 4 Skills - 28121 Murphy 2025-03-23
Benchmarking Machine Learning Models with Cross-Validation and Matplotlib in Python
Learn how to create an object-oriented approach to compare and evaluate the performance of machine learning models using cross-validation... - 26449 Murphy 2025-03-23
The smart, flexible way to run code on Kubernetes
When I was a beginner using Kubernetes, my main concern was getting code to run on the cluster. Thrown into a new world, I saw all these... - 24484 Murphy 2025-03-23
How To Forecast With Moving Average Models
Tutorial and theory on how to carry out forecasts with moving average models for time series analysis - 28548 Murphy 2025-03-23
Introducing PyCircular: A Python Library for Circular Data Analysis
Circular data can present unique challenges when it comes to analysis and modeling - 28104 Murphy 2025-03-23
Can Reinforcement Learning Generalize Beyond Its Training?
A Case Study in Model Generalization - 25628 Murphy 2025-03-23
Variance Reduction with Importance Sampling
Mathematical explanation and Python implementation - 20294 Murphy 2025-03-23
Convolutional vs Feedforward Autoencoders for Image Denoising
Cleaning corrupted images using convolutional and feedforward autoencoders - 21098 Murphy 2025-03-23
Do Transformers Lose to Linear Models?
Long-term forecasting using Transformers may not be the way to go - 22236 Murphy 2025-03-23
TDSP: When Agile Meets Data Science
A practical guide to applying agile principles to data science projects - 27768 Murphy 2025-03-23
How You Can (and Why You Should) Access Amazon S3 Resources with Python
Use automation to move data to and from the cloud - 26713 Murphy 2025-03-23
ONNX: The Standard for Interoperable Deep Learning Models
Learn about the benefits of using the ONNX standard for deploying models across frameworks and hardware platforms - 21108 Murphy 2025-03-23
Handling Slowly Changing Dimensions (SCD) using Delta Tables
Handling the challenge of slowly changing dimensions using the Delta Framework - 20407 Murphy 2025-03-23
Simulating the Card Game ‘War’
A coding story about a simple game with an infinite twist - 23565 Murphy 2025-03-23
On the Importance of Compressing Big Data
Why and How to Minimize Your Data Storage Footprint - 28759 Murphy 2025-03-23
Anomaly Detection using Sigma Rules (Part 1): Leveraging Spark SQL Streaming
Sigma rules are used to detect anomalies in cyber security logs. We use Spark structured streaming to evaluate Sigma rules at scale. - 24846 Murphy 2025-03-23
Understanding Noisy Data and Uncertainty in Machine Learning
The actual reason your machine learning model isn't working - 27382 Murphy 2025-03-23
Overcoming the Limitations of Large Language Models
How to enhance LLMs with human-like cognitive skills - 20572 Murphy 2025-03-23
Genius Cliques: Mapping out the Nobel Network
Combining Network Science, Data Visualization, and Wikipedia to uncover hidden connections between all the Nobel laureates.
Data Science Expertise Comes in Many Shapes and Forms
Our weekly selection of must-read Editors' Picks and original features