Unlock the Secret to Efficient Batch Prediction Pipelines Using Python, a Feature Store and GCS
THE FULL STACK 7-STEPS MLOPS FRAMEWORK

This tutorial represents lesson 3 out of a 7-lesson course that will walk you step-by-step through how to design, implement, and deploy an ML system using MLOps best practices. During the course, you will build a production-ready model to forecast energy consumption levels for the next 24 hours across multiple consumer types in Denmark.
By the end of this course, you will understand all the fundamentals of designing, coding and deploying an ML system using a batch-serving architecture.
This course targets intermediate and advanced machine learning engineers who want to level up their skills by building their own end-to-end projects.
Nowadays, certificates are everywhere. Building advanced end-to-end projects that you can later show off is the best way to get recognition as a professional engineer.
Table of Contents:
- Course Introduction
- Course Lessons
- Data Source
- Lesson 3: Batch Prediction Pipeline. Package Python Modules with Poetry.
- Lesson 3: Code
- Conclusion
- References
Course Introduction
By the end of this 7-lesson course, you will know how to:
- design a batch-serving architecture
- use Hopsworks as a feature store
- design a feature engineering pipeline that reads data from an API
- build a training pipeline with hyperparameter tuning
- use W&B as an ML Platform to track your experiments, models, and metadata
- implement a batch prediction pipeline
- use Poetry to build your own Python packages
- deploy your own private PyPI server
- orchestrate everything with Airflow
- use the predictions to code a web app using FastAPI and Streamlit
- use Docker to containerize your code
- use Great Expectations to ensure data validation and integrity
- monitor the performance of the predictions over time
- deploy everything to GCP
- build a CI/CD pipeline using GitHub Actions
If that sounds like a lot, don't worry. Once you complete this course, you will understand everything listed above. Most importantly, you will know WHY I used all these tools and how they work together as a system.
If you want to get the most out of this course, I suggest you access the GitHub repository containing all the lessons' code. The course is designed for you to quickly read and replicate the code alongside the articles.
By the end of the course, you will know how to implement the diagram below. Don't worry if something doesn't make sense to you. I will explain everything in detail.

By the end of Lesson 3, you will know how to implement and integrate the batch prediction pipeline and package all the Python modules using Poetry.
Course Lessons:
- Batch Serving. Feature Stores. Feature Engineering Pipelines.
- Training Pipelines. ML Platforms. Hyperparameter Tuning.
- Batch Prediction Pipeline. Package Python Modules with Poetry.
- Private PyPI Server. Orchestrate Everything with Airflow.
- Data Validation for Quality and Integrity using GE. Model Performance Continuous Monitoring.
- Consume and Visualize your Model's Predictions using FastAPI and Streamlit. Dockerize Everything.
- Deploy All the ML Components to GCP. Build a CI/CD Pipeline Using GitHub Actions.
- [Bonus] Behind the Scenes of an ‘Imperfect’ ML Project – Lessons and Insights
If you want to grasp this lesson fully, we recommend you check out our previous lesson, which talks about designing a training pipeline that uses a feature store and an ML platform:
A Guide to Building Effective Training Pipelines for Maximum Results
Data Source
We used a free & open API that provides hourly energy consumption values for all the energy consumer types within Denmark [1].
They provide an intuitive interface where you can easily query and visualize the data. You can access the data here [1].
The data has 4 main attributes:
- Hour UTC: the UTC datetime when the data point was observed.
- Price Area: Denmark is split into two price areas, DK1 and DK2, separated by the Great Belt. DK1 is west of the Great Belt, and DK2 is east of it.
- Consumer Type: the Industry Code DE35, owned and maintained by Danish Energy.
- Total Consumption: total electricity consumption in kWh.
Note: The observations have a lag of 15 days. For our demo use case, that is not a problem, as we can simulate the same steps we would take in real time.
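To make this concrete, below is a minimal sketch of how you might pull a recent slice of this data with Python. The endpoint and field names (ConsumptionDE35Hour, HourUTC, PriceArea, ConsumerType_DE35, TotalCon) are my assumptions about the public Energi Data Service API, so verify them against the official documentation [1]:

```python
import requests
import pandas as pd

# NOTE: the endpoint and field names below are assumptions based on the
# public Energi Data Service API; double-check them in the docs [1].
API_URL = "https://api.energidataservice.dk/dataset/ConsumptionDE35Hour"

# Ask for the 500 most recent records; the API wraps them in a "records" key.
response = requests.get(
    API_URL,
    params={"offset": 0, "limit": 500, "sort": "HourUTC DESC"},
    timeout=30,
)
response.raise_for_status()
records = response.json()["records"]

# Keep only the 4 attributes described above.
df = pd.DataFrame.from_records(records)
df = df[["HourUTC", "PriceArea", "ConsumerType_DE35", "TotalCon"]]
df["HourUTC"] = pd.to_datetime(df["HourUTC"])
print(df.head())
```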

The data points have an hourly resolution. For example: "2023-04-15 21:00Z", "2023-04-15 20:00Z", "2023-04-15 19:00Z", etc.
We will model the data as multiple time series: each unique (price area, consumer type) tuple represents its own time series.
Thus, we will build a model that forecasts, independently for every time series, the energy consumption for the next 24 hours.
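As a quick illustration of this framing, here is a minimal sketch that splits the raw data into independent series keyed by (price area, consumer type). It reuses the hypothetical DataFrame and column names from the snippet above:

```python
import pandas as pd

# "df" is assumed to be the DataFrame fetched in the previous snippet,
# with columns: HourUTC, PriceArea, ConsumerType_DE35, TotalCon.
df = df.sort_values("HourUTC")

# Each unique (price area, consumer type) pair is its own time series.
time_series = {
    key: group.set_index("HourUTC")["TotalCon"]
    for key, group in df.groupby(["PriceArea", "ConsumerType_DE35"])
}

# Downstream, the model forecasts the next 24 hourly values
# for every one of these series independently.
for key, series in list(time_series.items())[:3]:
    print(key, "->", len(series), "hourly observations")
```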
Check out the video below to better understand what the data looks like.