Unlock the Secret to Efficient Batch Prediction Pipelines Using Python, a Feature Store and GCS
THE FULL STACK 7-STEPS MLOPS FRAMEWORK

This tutorial represents lesson 3 out of a 7-lesson course that will walk you step-by-step through how to design, implement, and deploy an ML system using MLOps best practices. During the course, you will build a production-ready model to forecast energy consumption levels for the next 24 hours across multiple consumer types in Denmark.
By the end of this course, you will understand all the fundamentals of designing, coding and deploying an ML system using a batch-serving architecture.
This course targets intermediate and advanced machine learning engineers who want to level up their skills by building their own end-to-end projects.
Nowadays, certificates are everywhere. Building advanced end-to-end projects that you can later show off is the best way to get recognition as a professional engineer.
Table of Contents:
- Course Introduction
- Course Lessons
- Data Source
- Lesson 3: Batch Prediction Pipeline. Package Python Modules with Poetry.
- Lesson 3: Code
- Conclusion
- References
Course Introduction
By the end of this 7-lesson course, you will know how to:
- design a batch-serving architecture
- use Hopsworks as a feature store
- design a feature engineering pipeline that reads data from an API
- build a training pipeline with hyperparameter tuning
- use W&B as an ML Platform to track your experiments, models, and metadata
- implement a batch prediction pipeline
- use Poetry to build your own Python packages
- deploy your own private PyPI server
- orchestrate everything with Airflow
- use the predictions to code a web app using FastAPI and Streamlit
- use Docker to containerize your code
- use Great Expectations to ensure data validation and integrity
- monitor the performance of the predictions over time
- deploy everything to GCP
- build a CI/CD pipeline using GitHub Actions
If that sounds like a lot, don't worry. Once you complete this course, you will understand everything listed above. Most importantly, you will know WHY I used all these tools and how they work together as a system.
If you want to get the most out of this course, I suggest you access the GitHub repository containing all the lessons' code. The course is designed for you to quickly read and replicate the code alongside the articles.
By the end of the course, you will know how to implement the diagram below. Don't worry if something doesn't make sense to you. I will explain everything in detail.

By the end of Lesson 3, you will know how to implement and integrate the batch prediction pipeline and package all the Python modules using Poetry.
Course Lessons:
- Batch Serving. Feature Stores. Feature Engineering Pipelines.
- Training Pipelines. ML Platforms. Hyperparameter Tuning.
- Batch Prediction Pipeline. Package Python Modules with Poetry.
- Private PyPI Server. Orchestrate Everything with Airflow.
- Data Validation for Quality and Integrity using GE. Model Performance Continuous Monitoring.
- Consume and Visualize your Model's Predictions using FastAPI and Streamlit. Dockerize Everything.
- Deploy All the ML Components to GCP. Build a CI/CD Pipeline Using GitHub Actions.
- [Bonus] Behind the Scenes of an ‘Imperfect’ ML Project – Lessons and Insights
If you want to grasp this lesson fully, we recommend you check out our previous lesson, which talks about designing a training pipeline that uses a feature store and an ML platform:
A Guide to Building Effective Training Pipelines for Maximum Results
Data Source
We used a free & open API that provides hourly energy consumption values for all the energy consumer types within Denmark [1].
They provide an intuitive interface where you can easily query and visualize the data. You can access the data here [1].
The data has 4 main attributes:
- Hour UTC: the UTC datetime when the data point was observed.
- Price Area: Denmark is split into two price areas, DK1 and DK2, separated by the Great Belt. DK1 is west of the Great Belt, and DK2 is east of it.
- Consumer Type: the Industry Code DE35, owned and maintained by Danish Energy.
- Total Consumption: total electricity consumption in kWh.
Note: The observations have a lag of 15 days. For our demo use case, that is not a problem, as we can simulate the same steps we would take in real time.
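To make this concrete, below is a minimal sketch of how you might pull a recent slice of this data with Python. The endpoint and field names (ConsumptionDE35Hour, HourUTC, PriceArea, ConsumerType_DE35, TotalCon) are my assumptions about the public Energi Data Service API, so verify them against the official documentation [1]:

```python
import requests
import pandas as pd

# NOTE: the endpoint and field names below are assumptions based on the
# public Energi Data Service API; double-check them in the docs [1].
API_URL = "https://api.energidataservice.dk/dataset/ConsumptionDE35Hour"

# Ask for the 500 most recent records; the API wraps them in a "records" key.
response = requests.get(
    API_URL,
    params={"offset": 0, "limit": 500, "sort": "HourUTC DESC"},
    timeout=30,
)
response.raise_for_status()
records = response.json()["records"]

# Keep only the 4 attributes described above.
df = pd.DataFrame.from_records(records)
df = df[["HourUTC", "PriceArea", "ConsumerType_DE35", "TotalCon"]]
df["HourUTC"] = pd.to_datetime(df["HourUTC"])
print(df.head())
```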

The data points have an hourly resolution. For example: "2023-04-15 21:00Z", "2023-04-15 20:00Z", "2023-04-15 19:00Z", etc.
We will model the data as multiple time series: each unique (price area, consumer type) tuple represents its own time series.
Thus, we will build a model that forecasts, independently for every time series, the energy consumption for the next 24 hours.
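As a quick illustration of this framing, here is a minimal sketch that splits the raw data into independent series keyed by (price area, consumer type). It reuses the hypothetical DataFrame and column names from the snippet above:

```python
import pandas as pd

# "df" is assumed to be the DataFrame fetched in the previous snippet,
# with columns: HourUTC, PriceArea, ConsumerType_DE35, TotalCon.
df = df.sort_values("HourUTC")

# Each unique (price area, consumer type) pair is its own time series.
time_series = {
    key: group.set_index("HourUTC")["TotalCon"]
    for key, group in df.groupby(["PriceArea", "ConsumerType_DE35"])
}

# Downstream, the model forecasts the next 24 hourly values
# for every one of these series independently.
for key, series in list(time_series.items())[:3]:
    print(key, "->", len(series), "hourly observations")
```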
Check out the video below to better understand what the data looks like.