Introduction to Multi-Stage Image Build for Python
I never paid too much attention to the size of my images until I started deploying my code into GitHub Actions using containers. The math here is simple: the bigger your container image, the longer it takes to load, and therefore the higher your costs. The moment my Python image size reached 5 GB (thanks, PyTorch!), I started to explore more efficient image-build approaches.
TLDR – I was able to reduce the size of my baseline image by 65% using the multi-stage build approach.
In this post, we will review three build approaches and see how, with a few simple steps, we can reduce the size of our Python image. We will start with our baseline build using the official Python image – python:3.10, then explore the slim version of the image, and finish by introducing a more advanced approach – the multi-stage build.

Prerequisites
To follow along with this tutorial, you will need the following settings:
- Docker Desktop (or equivalent) if you are using a macOS or Windows OS machine, or Docker installed if you are using a Linux OS
- Docker Hub account to pull the image from
Throughout this tutorial, we will use different flavors of the official Python image – python:3.10.
Scope
Before getting started, let's define the settings of the dockerized Python environment. This will enable us to make an apples-to-apples comparison and benchmark the different build sizes. We will use Python 3.10 and set the environment using the below helper files:
requirements.txt
wheel
pandas
numpy
matplotlib
requests
plotly
install_requirements.sh
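#!/usr/bin/env bash
# Name of the virtual environment, passed as the first argument
VENV_NAME=$1
# Create the virtual environment and activate it by default in new shells
python3 -m venv /opt/$VENV_NAME \
    && export PATH=/opt/$VENV_NAME/bin:$PATH \
    && echo "source /opt/$VENV_NAME/bin/activate" >> ~/.bashrc
# Install the build tools needed to compile Python packages
apt-get update && apt-get install -y --no-install-recommends \
    gcc \
    python3-dev \
    build-essential \
    && rm -rf /var/lib/apt/lists/*
# Activate the environment and install the required libraries
source /opt/$VENV_NAME/bin/activate
pip install --upgrade pip
pip3 install --no-cache-dir -r ./requirements/requirements.txt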
Here, the requirements.txt file defines the libraries required for the Python environment. The install_requirements.sh bash script sets the virtual environment and installs the libraries from the requirements.txt file.
Last but not least, we will use the below bash script to build the images:
build_docker.sh
#!/bin/bash
echo "Build the docker"
dockerfile_name=$1
image_name=$2
# $1 is the Dockerfile name and $2 is the image name
docker build . -f $1 \
    --progress=plain \
    --build-arg VENV_NAME="python-poc" \
    -t $2
This script uses two arguments (i.e., $1 and $2 in the script) that represent the following:
- The build Dockerfile name
- The image name
So, for example, if we want to build an image using a Dockerfile named Dockerfile.base and name the image rkrispin/python-base:3.10, we will use the following command on the CLI:
>bash build_docker.sh Dockerfile.base rkrispin/python-base:3.10
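The script above assumes that both arguments are always supplied. As a minimal sketch, a guard like the one below could run before the docker build call. The function name check_build_args is hypothetical and is not part of the original repository.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not in the original repository): validate the two
# positional arguments before build_docker.sh calls docker build.
check_build_args() {
    # Expect exactly two arguments: the Dockerfile name and the image name
    if [ "$#" -ne 2 ]; then
        echo "Usage: bash build_docker.sh <dockerfile_name> <image_name>" >&2
        return 1
    fi
    # The Dockerfile must exist relative to the current folder
    if [ ! -f "$1" ]; then
        echo "Dockerfile not found: $1" >&2
        return 1
    fi
    return 0
}
```

One possible way to use it is to add the line check_build_args "$@" || exit 1 near the top of build_docker.sh, so a typo in the Dockerfile name fails fast instead of starting a build.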
All the code in this post is available on the below repository:
GitHub – RamiKrispin/python-image-optimization: Approaches for Python Image Size Optimizations …
Baseline Python Image Build
We will use the default Python 3.10 image – python:3.10 to set our baseline build. Before we start to build the baseline image, let's pull the python:3.10 image and review its size:
>docker pull python:3.10
3.10: Pulling from library/python
1e92f3a395ff: Pull complete
374850c6db17: Pull complete
421c44fab18b: Pull complete
b9717a38adec: Pull complete
51795e508cf7: Pull complete
4915706e8f81: Pull complete
0b4c880267da: Pull complete
21a087e2a4d2: Pull complete
Digest: sha256:f68383667ffe53e85cc0fe4f5a604d303dfa364f238ac37a4675980a2b93b1c5
Status: Downloaded newer image for python:3.10
docker.io/library/python:3.10
Using the docker images command, you can see that the default Python image size is 1 GB:
>docker images
REPOSITORY   TAG    IMAGE ID       CREATED       SIZE
python       3.10   ae8368ce557d   3 weeks ago   1.01GB
As we are going to build the baseline image on top of the python:3.10 image, the final image size will be greater than 1 GB. To build the baseline image, we will use the following Dockerfile:
Dockerfile.base
FROM python:3.10
ARG VENV_NAME="my_project"
ENV VENV_NAME=$VENV_NAME
RUN mkdir requirements
COPY install_requirements.sh requirements.txt requirements/
RUN bash ./requirements/install_requirements.sh $VENV_NAME
This simple build includes the following steps:
- Import the python:3.10 image as the base image
- Use an argument variable to set the virtual environment name
- Create a local folder and copy the helper files – install_requirements.sh and requirements.txt
- Set the virtual environment and install the required libraries
We will use the build_docker.sh helper bash script to build the image, calling the Dockerfile.base file and naming the image rkrispin/python-base:3.10:
>bash build_docker.sh Dockerfile.base rkrispin/python-base:3.10
Let's use the docker images command again to review the size of the new image and compare it with the size of the base image:
>docker images
REPOSITORY             TAG    IMAGE ID       CREATED          SIZE
rkrispin/python-base   3.10   abdde42f41ed   17 seconds ago   1.38GB
python                 3.10   ae8368ce557d   3 weeks ago      1.01GB
As you can see from the docker images output, the size of the baseline image is 1.38 GB. This means that setting the virtual environment and installing the required libraries added roughly 370 MB to the image size. Potentially, the image size could be reduced by reducing the size of one of the following:
- Base image (e.g., python:3.10)
- The virtual environment
Generally, there is more room for improvement in reducing the size of the base image than in the virtual environment. We will address both in the coming sections.
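To see how much each build step contributes to the total size, one option is the docker history command, which lists the layers of an image along with their sizes. The sketch below wraps it in a small function; the name show_layer_sizes is hypothetical, and the example call assumes the baseline image from this post was built locally.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the size of each layer of an image together
# with the instruction that created it.
show_layer_sizes() {
    local image="$1"
    # Fail with a clear message if the docker CLI is not available
    if ! command -v docker >/dev/null 2>&1; then
        echo "docker is not installed or not on PATH" >&2
        return 1
    fi
    docker history --format "table {{.Size}}\t{{.CreatedBy}}" "$image"
}

# Example call, assuming the baseline image was built locally:
# show_layer_sizes rkrispin/python-base:3.10
```

In the output, the layer created by the RUN instruction that calls install_requirements.sh should account for most of the size added on top of the base image.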
Using A Slim Image
In this section, we will address the size of the base image. A quick win would be to use the slim version of the base image – python:3.10-slim. The slim image is a minimalist version of the python:3.10 image. Its size is 154 MB, which is about 15% of the size of the original Python image.
Let's rebuild the baseline image, this time using the python:3.10-slim image as the base image:
Dockerfile.slim
FROM python:3.10-slim
ARG VENV_NAME="my_project"
ENV VENV_NAME=$VENV_NAME
RUN mkdir requirements
COPY install_requirements.sh requirements.txt requirements/
RUN bash ./requirements/install_requirements.sh $VENV_NAME
We will again use the build_docker.sh bash script to build the image, calling the Dockerfile.slim file and naming the image rkrispin/python-slim:3.10:
>bash build_docker.sh Dockerfile.slim rkrispin/python-slim:3.10
We can now go ahead and check the new image size using the docker images command:
>docker images
REPOSITORY             TAG         IMAGE ID       CREATED              SIZE
rkrispin/python-slim   3.10        67443164153f   About a minute ago   829MB
rkrispin/python-base   3.10        abdde42f41ed   2 hours ago          1.38GB
python                 3.10        ae8368ce557d   3 weeks ago          1.01GB
python                 3.10-slim   d9d4dd71d4ee   3 weeks ago          154MB
As you can see, simply by using the slim image, we reduced the image size by 40%, from 1.38 GB to 829 MB.
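The 40% figure follows directly from the two sizes reported by docker images. As a small worked example, the hypothetical function below computes the percentage reduction between two sizes given in MB, using integer arithmetic and rounding to the nearest whole percent.

```shell
#!/usr/bin/env bash
# Hypothetical helper: percentage reduction between two image sizes given
# in MB, rounded to the nearest whole percent.
pct_reduction() {
    local before="$1" after="$2"
    # Integer arithmetic: adding before/2 rounds to the nearest percent
    echo $(( ( (before - after) * 100 + before / 2 ) / before ))
}

pct_reduction 1380 829   # baseline (1.38 GB) vs. slim build (829 MB), prints 40
```

The same function can be reused for any other pair of sizes from the docker images output.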
Note: Setting the virtual environment and installing the required libraries on top of the slim image added 675 MB to the image size (i.e., from 154 MB to 829 MB), more than it added on top of the full python:3.10 image. The main reason for this difference is that dependencies missing from the slim image were installed while setting the virtual environment. One thing to take into account is that using a minimalist image as a baseline can, in some cases, turn out to be a double-edged sword. If some of the environment dependencies are not available on the slim image, you may end up spending more time identifying the missing dependencies and installing them.
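One way to shorten that search is to check up front which command-line tools an environment lacks. The sketch below is a hypothetical helper, not part of the original repository, that prints every command from its argument list that cannot be found on the PATH. The tool names in the example call are only illustrative.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print each of the given commands that is not
# available on PATH in the current environment.
missing_tools() {
    local tool
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# Example: check for parts of the compiler toolchain that
# install_requirements.sh installs (output depends on the environment)
missing_tools gcc make
```

Running this inside a container started from the slim image, before installing anything, gives a quick list of what the build script would need to add.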
In the next section, we will review a more advanced build method using the multi-stage approach.
Multi-Stage Builds Approach
Before getting started, let's pause and explain what a multi-stage build is. The philosophy behind the multi-stage build is fairly simple – building binary applications requires some tools and dependencies that are no longer needed once the build is done. In a single-stage build (i.e., the regular build approach), those dependencies are left over and take up unnecessary space. A multi-stage build solves this issue by using the following steps:
- Create a builder image to compile the binary applications. This image includes all the tools and dependencies required to build the binary applications
- Create a second image that simply copies the built binary applications from the builder image
The figure below illustrates the multi-stage process: using the builder image to create the binaries and then setting a second image (i.e., the final image) that copies the binaries from the builder without the dependencies.

In the next section, we will review the process of setting up a multi-stage build. We will leverage the previous Dockerfile – Dockerfile.slim to create the first stage (i.e., the builder image) and set the Python virtual environment. We will then introduce the second stage, where we will copy the virtual environment from the builder image.
Multi-Stage Build with Slim Image
Let's now take the Dockerfile we used in the previous example (Dockerfile.slim) and transition it to a multi-stage build using the following approach:
- Set the Dockerfile.slim as the builder image
- Set a second build, again using the python:3.10-slim image, and copy the Python virtual environment from the builder image

The implementation of this approach is shown in the Dockerfile below:
Dockerfile.multi-stage
# Stage I
FROM python:3.10-slim AS builder
ARG VENV_NAME="my_project"
ENV VENV_NAME=$VENV_NAME
RUN mkdir requirements
COPY install_requirements.sh requirements/
COPY requirements.txt requirements/
RUN bash ./requirements/install_requirements.sh $VENV_NAME
# Stage II
FROM python:3.10-slim
ARG VENV_NAME="my_project"
ENV VENV_NAME=$VENV_NAME
COPY --from=builder /opt/$VENV_NAME /opt/$VENV_NAME
RUN echo "source /opt/$VENV_NAME/bin/activate" >> ~/.bashrc
Here, the first stage of this build (commented as Stage I) is a duplication of the Dockerfile.slim file we used in the previous build. To define the first stage as the builder, we added the AS builder argument to the FROM command. The second stage of the build (commented as Stage II) creates a new image. We use python:3.10-slim as the base image and then copy the Python virtual environment from the builder image using the COPY command with the --from=builder argument.
Note that the second stage does not inherit any attributes or settings from the first stage unless we use specific commands or arguments such as the --from=builder argument. Therefore, after we copy the virtual environment from the builder image, we need to update the .bashrc file and set the virtual environment as our default again. Alternatively, we could copy the .bashrc file from the builder, as the install_requirements.sh script already set it on the builder image.
Let's now go ahead and use the build_docker.sh script again to build the Dockerfile.multi-stage file and name the image rkrispin/python-multi-stage:3.10:
>bash build_docker.sh Dockerfile.multi-stage rkrispin/python-multi-stage:3.10
Using the docker images command, you can see that the size of the new image – rkrispin/python-multi-stage is 483 MB. This is a drop in the image size of 65% (!!!) with respect to the baseline image – rkrispin/python-base.
>docker images
REPOSITORY                    TAG         IMAGE ID       CREATED         SIZE
rkrispin/python-multi-stage   3.10        44c0a6f79a62   8 seconds ago   483MB
rkrispin/python-slim          3.10        67443164153f   3 hours ago     829MB
rkrispin/python-base          3.10        abdde42f41ed   6 hours ago     1.38GB
python                        3.10-slim   d9d4dd71d4ee   3 weeks ago     154MB
python                        3.10        ae8368ce557d   3 weeks ago     1.01GB
There is also a difference of about 350 MB between the size of the rkrispin/python-slim image and that of the rkrispin/python-multi-stage image. This reflects the size of the items that were installed and used to set the virtual environment but were no longer needed once the environment was set.
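To confirm that the build tools really were left behind, one option is to look for them inside each image. The sketch below assumes both images from this post were built locally. The function name image_has_tool is hypothetical, and the check relies only on docker run and the shell built-in command -v.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if a command is available inside an image.
image_has_tool() {
    local image="$1" tool="$2"
    # Start a throwaway container and look the command up on its PATH
    docker run --rm "$image" sh -c "command -v $tool" >/dev/null 2>&1
}

# Example calls, assuming both images from this post were built locally:
# image_has_tool rkrispin/python-slim:3.10 gcc && echo "gcc present"
# image_has_tool rkrispin/python-multi-stage:3.10 gcc || echo "gcc absent"
```

Note that the function also returns a non-zero status if the container fails to start for any other reason, so a failure is a hint to investigate rather than proof that the tool is missing.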
Summary
In this post, we saw different methods to reduce the image size. Simply replacing the base image with the slim image reduced the image size by 40%, from 1.38 GB to 829 MB. We then reviewed the multi-stage build approach, which enabled us to further reduce the image size to 483 MB. This reflects a total drop in the image size of about 0.9 GB with respect to the baseline build, or 65%.
The next step would be to evaluate alternative images to the slim image, such as the Python Alpine image. While we might achieve additional size reduction, that might require an additional effort to identify and set missing dependencies.
Resources
- Code: https://github.com/RamiKrispin/python-image-optimization
- Multi-Stage Builds: https://docs.docker.com/build/building/multi-stage/
- Setting A Dockerized Python Environment – The Elegant Way – https://medium.com/p/f716ef85571d
- Setting A Dockerized Python Environment – The Hard Way – https://towardsdatascience.com/setting-a-dockerized-python-environment-the-hard-way-e62531bca7a0
More content is available on my data science channel.