Introduction to Multi-Stage Image Build for Python

Author:Murphy  |  View: 23512  |  Time: 2025-03-22 22:00:34

I never paid much attention to the size of my images until I started deploying my code to GitHub Actions using containers. The math here is simple: the bigger your container, the longer the load time, and therefore the higher your costs. The moment my Python image size reached 5 GB (thanks, PyTorch!), I started to explore more efficient image-build approaches.

TLDR – I was able to reduce the size of my baseline image by 65% using the multi-stage build approach.

In this post, we will review three build approaches and see how, with a few simple steps, we can reduce the size of our Python image. We will start with our baseline build using the official Python image – python:3.10, then explore the slim version of the image, and finish by introducing a more advanced approach – the multi-stage build.

A cargo ship full of containers (created by the author with Midjourney)

Prerequisites

To follow along with this tutorial, you will need the following:

  • Docker Desktop (or equivalent) if you are using a macOS or Windows OS machine, or Docker installed if you are using a Linux OS
  • Docker Hub account to pull the image from

Throughout this tutorial, we will use different flavors of the official Python image – python:3.10.

Scope

Before getting started, let's define the settings of the dockerized Python environment. This will enable us to make an apples-to-apples comparison and benchmark the different build sizes. We will use Python 3.10 and set the environment using the below helper files:

requirements.txt

wheel
pandas
numpy
matplotlib
requests
plotly

install_requirements.sh

#!/usr/bin/env bash

VENV_NAME=$1

# Create the virtual environment and activate it by default in new shells
python3 -m venv /opt/$VENV_NAME \
    && export PATH=/opt/$VENV_NAME/bin:$PATH \
    && echo "source /opt/$VENV_NAME/bin/activate" >> ~/.bashrc

# Install the build tools and clean up the apt cache
apt-get update && apt-get install -y --no-install-recommends \
    gcc \
    python3-dev \
    build-essential \
    && rm -rf /var/lib/apt/lists/*

source /opt/$VENV_NAME/bin/activate

pip install --upgrade pip

pip3 install --no-cache-dir -r ./requirements/requirements.txt

The requirements.txt file lists the libraries required for the Python environment. The install_requirements.sh bash script sets up the virtual environment and installs the libraries from the requirements.txt file.

Last but not least, we will use the below bash script to build the images:

build_docker.sh

#!/bin/bash

echo "Build the docker"
dockerfile_name=$1
image_name=$2

docker build . -f "$dockerfile_name" \
               --progress=plain \
               --build-arg VENV_NAME="python-poc" \
               -t "$image_name"

This script uses two arguments (i.e., $1 and $2 in the script) that represent the following:

  • The build Dockerfile name
  • The image name

So, for example, if we want to build an image using a Dockerfile named Dockerfile.base and name the image rkrispin/python-base:3.10, we will use the following command on the CLI:

>bash build_docker.sh Dockerfile.base rkrispin/python-base:3.10
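
For readers who prefer to drive the build from Python, the same command can be assembled as an argument list. This is a minimal sketch and not part of the original repository: the function name docker_build_command is hypothetical, and the flags simply mirror the ones used in build_docker.sh.

```python
import shlex


def docker_build_command(dockerfile, image, venv_name="python-poc"):
    """Return the argv list equivalent to the command in build_docker.sh."""
    return [
        "docker", "build", ".",
        "-f", dockerfile,
        "--progress=plain",
        "--build-arg", f"VENV_NAME={venv_name}",
        "-t", image,
    ]


cmd = docker_build_command("Dockerfile.base", "rkrispin/python-base:3.10")
# Print the command as a shell-quoted string for inspection
print(shlex.join(cmd))
```

On a machine with Docker installed, the list could then be passed to subprocess.run(cmd, check=True) to start the build.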

All the code in this post is available on the below repository:

GitHub – RamiKrispin/python-image-optimization: Approaches for Python Image Size Optimizations …

Baseline Python Image Build

We will use the default Python 3.10 image – python:3.10 to set our baseline build. Before we start to build the baseline image, let's pull the python:3.10 image and review its size:

>docker pull python:3.10
3.10: Pulling from library/python
1e92f3a395ff: Pull complete
374850c6db17: Pull complete
421c44fab18b: Pull complete
b9717a38adec: Pull complete
51795e508cf7: Pull complete
4915706e8f81: Pull complete
0b4c880267da: Pull complete
21a087e2a4d2: Pull complete
Digest: sha256:f68383667ffe53e85cc0fe4f5a604d303dfa364f238ac37a4675980a2b93b1c5
Status: Downloaded newer image for python:3.10
docker.io/library/python:3.10

Using the docker images command, you can see that the default Python image size is about 1 GB:

>docker images
REPOSITORY               TAG           IMAGE ID       CREATED         SIZE
python                   3.10          ae8368ce557d   3 weeks ago     1.01GB

As we are going to build the baseline image on top of the python:3.10 image, the final image size will be greater than 1 GB. To build the baseline image, we will use the following Dockerfile:

Dockerfile.base

FROM python:3.10

ARG VENV_NAME="my_project"
ENV VENV_NAME=$VENV_NAME

RUN mkdir requirements

COPY install_requirements.sh requirements.txt requirements/

RUN bash ./requirements/install_requirements.sh $VENV_NAME

This simple build includes the following steps:

  • Import the python:3.10 image as the base image
  • Use an argument variable to set the virtual environment name
  • Create a local folder and copy the helper files – install_requirements.sh and requirements.txt
  • Set the virtual environment and install the required libraries

We will use the build_docker.sh helper bash script to build the image, calling the Dockerfile.base file and naming the image rkrispin/python-base:3.10:

>bash build_docker.sh Dockerfile.base rkrispin/python-base:3.10

Let's use the docker images command again to review the size of the new image and compare it with the size of the base image:

>docker images
REPOSITORY             TAG       IMAGE ID       CREATED          SIZE
rkrispin/python-base   3.10      abdde42f41ed   17 seconds ago   1.38GB
python                 3.10      ae8368ce557d   3 weeks ago      1.01GB

As you can see from the docker images output, the size of the baseline image is 1.38 GB. This means that setting up the virtual environment and installing the required libraries added about 370 MB to the image size (1.38 GB minus 1.01 GB). Potentially, the image size could be reduced by shrinking one of the following:

  • Base image (e.g., python:3.10)
  • The virtual environment

Generally, there is more room for improvement in reducing the size of the base image than in the virtual environment. We will address both in the coming sections.
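
The overhead quoted above can be reproduced from the SIZE column of the docker images output. The snippet below is a rough sketch: the helper name parse_size is hypothetical, and it assumes the sizes are reported in decimal units (1 GB = 1000 MB), so the result is only as precise as the rounded figures shown.

```python
import re

# Decimal multipliers; an assumption about how the SIZE column is reported
_UNITS = {"B": 1, "KB": 10**3, "MB": 10**6, "GB": 10**9}


def parse_size(text):
    """Convert a SIZE string such as '1.38GB' or '829MB' to bytes."""
    match = re.fullmatch(r"\s*([\d.]+)\s*([KMG]?B)\s*", text, flags=re.IGNORECASE)
    if match is None:
        raise ValueError(f"unrecognized size: {text!r}")
    value, unit = match.groups()
    return round(float(value) * _UNITS[unit.upper()])


base = parse_size("1.01GB")      # python:3.10
baseline = parse_size("1.38GB")  # rkrispin/python-base:3.10
added_mb = (baseline - base) / 10**6
print(f"Added on top of the base image: {added_mb:.0f} MB")
```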

Using A Slim Image

In this section, we will address the size of the base image. A quick win would be to use the slim version of the base image – python:3.10-slim. The slim image is a minimalist version of the python:3.10 image. Its size is 154 MB, which is about 15% of the size of the original Python image.

Let's rebuild the baseline image, this time using the python:3.10-slim image as the base image:

Dockerfile.slim

FROM python:3.10-slim

ARG VENV_NAME="my_project"
ENV VENV_NAME=$VENV_NAME

RUN mkdir requirements

COPY install_requirements.sh requirements.txt requirements/

RUN bash ./requirements/install_requirements.sh $VENV_NAME

We will again use the build_docker.sh bash script to build the image, calling the Dockerfile.slim file and naming the image rkrispin/python-slim:3.10:

>bash build_docker.sh Dockerfile.slim rkrispin/python-slim:3.10

We can now go ahead and check the new image size using the docker images command:

>docker images
REPOSITORY             TAG         IMAGE ID       CREATED              SIZE
rkrispin/python-slim   3.10        67443164153f   About a minute ago   829MB
rkrispin/python-base   3.10        abdde42f41ed   2 hours ago          1.38GB
python                 3.10        ae8368ce557d   3 weeks ago          1.01GB
python                 3.10-slim   d9d4dd71d4ee   3 weeks ago          154MB

As you can see, simply by using the slim image, we reduced the image size by 40%, from 1.38 GB to 829 MB.

Note: Setting up the virtual environment and installing the required libraries on top of the slim image added 675 MB to the image size (i.e., from 154 MB to 829 MB). The main reason for this difference is that dependencies missing from the slim image were installed while setting up the virtual environment. Keep in mind that using a minimalist image as a base can be a double-edged sword: if some of the environment's dependencies are not available in the slim image, you may end up spending more time identifying the missing dependencies and installing them.

In the next section, we will review a more advanced build method using the multi-stage approach.

Multi-Stage Builds Approach

Before getting started, let's pause and explain what a multi-stage build is. The philosophy behind the multi-stage build is fairly simple – building binary applications requires some tools and dependencies that are no longer needed once the build is done. In a single-stage build (i.e., the regular build approach), those dependencies are left over and take up unnecessary space. A multi-stage build solves this issue by using the following steps:

  • Create a builder image to compile the binary applications. This image includes all the tools and dependencies required to build the binary applications
  • Create a second image that simply copies the built binary applications from the builder image

The figure below illustrates the multi-stage process: using the builder image to create the binaries and then setting a second image (i.e., the final image) that copies the binaries from the builder without the dependencies.

Multi-stage build illustration (Image credit: Rami Krispin)

In the next section, we will review the process of setting up a multi-stage build. We will leverage the previous Dockerfile – Dockerfile.slim to create the first stage (i.e., the builder image) and set the Python virtual environment. We will then introduce the second stage, where we will copy the virtual environment from the builder image.

Multi-Stage Build with Slim Image

Let's now take the Dockerfile we used in the previous example (Dockerfile.slim) and transition it to a multi-stage build using the following approach:

  • Set the Dockerfile.slim as the builder image
  • Set a second build, again using the python:3.10-slim image, and copy the Python virtual environment from the builder image

Multi-stage build with the Python slim image (Image credit: Rami Krispin)

The implementation of this approach is shown in the Dockerfile below:

Dockerfile.multi-stage

# Stage I
FROM python:3.10-slim AS builder

ARG VENV_NAME="my_project"
ENV VENV_NAME=$VENV_NAME

RUN mkdir requirements
COPY install_requirements.sh requirements/

COPY requirements.txt requirements/
RUN bash ./requirements/install_requirements.sh $VENV_NAME

# Stage II
FROM python:3.10-slim

ARG VENV_NAME="my_project"
ENV VENV_NAME=$VENV_NAME

COPY --from=builder /opt/$VENV_NAME /opt/$VENV_NAME
RUN echo "source /opt/$VENV_NAME/bin/activate" >> ~/.bashrc

The first stage of this build (commented as Stage I) is a duplication of the Dockerfile.slim file we used in the previous build. To define the first stage as the builder, we added the AS builder argument to the FROM command. The second stage of the build (commented as Stage II) creates a new image. We use python:3.10-slim as the base image and then copy the Python virtual environment from the builder image using the COPY command with the --from=builder argument.

Note that the second build does not inherit any attributes or settings from the first build unless we use specific commands or arguments such as the --from=builder argument. Therefore, after we copy the virtual environment from the builder image, we need to update the .bashrc file and set it up again as our default virtual environment. Alternatively, we could copy the .bashrc file from the builder, as the install_requirements.sh script already set it up on the builder image.
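
Because the final stage only receives what was explicitly copied, it can be worth checking that the environment arrived intact. The following is a minimal sketch of such a sanity check, not part of the original repository: the helper names in_virtualenv and missing_packages are hypothetical, and the list assumes each library in requirements.txt is imported under its own name.

```python
import importlib.util
import sys


def in_virtualenv():
    """True when the interpreter runs inside a venv (sys.prefix differs from the base)."""
    return sys.prefix != sys.base_prefix


def missing_packages(names):
    """Return the import names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Import names for the libraries in requirements.txt (wheel omitted; it is a build helper)
required = ["pandas", "numpy", "matplotlib", "requests", "plotly"]

print("virtual environment active:", in_virtualenv())
print("missing packages:", missing_packages(required))
```

Running it with the interpreter under /opt/$VENV_NAME/bin inside the final image should report an active virtual environment and an empty list of missing packages.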

Let's now use the build_docker.sh script again to build the Dockerfile.multi-stage file and name the image rkrispin/python-multi-stage:3.10:

>bash build_docker.sh Dockerfile.multi-stage rkrispin/python-multi-stage:3.10 

Using the docker images command, you can see that the size of the new image – rkrispin/python-multi-stage – is 483 MB. This is a drop in the image size of 65% (!!!) with respect to the baseline image – rkrispin/python-base.

>docker images
REPOSITORY                    TAG         IMAGE ID       CREATED         SIZE
rkrispin/python-multi-stage   3.10        44c0a6f79a62   8 seconds ago   483MB
rkrispin/python-slim          3.10        67443164153f   3 hours ago     829MB
rkrispin/python-base          3.10        abdde42f41ed   6 hours ago     1.38GB
python                        3.10-slim   d9d4dd71d4ee   3 weeks ago     154MB
python                        3.10        ae8368ce557d   3 weeks ago     1.01GB

There is also a difference of about 350 MB between the size of the rkrispin/python-slim image and that of the rkrispin/python-multi-stage image. This reflects the size of the items that were installed and used to set up the virtual environment but were no longer needed once the environment was set.

Summary

In this post, we saw different methods to reduce the image size. Simply replacing the base image with the slim image reduced the image size by 40%, from 1.38 GB to 829 MB. We then reviewed the multi-stage build approach, which enabled us to further reduce the image size to 483 MB. This reflects a total drop in the image size of about 0.9 GB with respect to the baseline build, or 65%.
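
The percentages in this summary follow directly from the sizes reported by docker images. As a quick check, the sketch below recomputes them; the sizes are the rounded values from this post converted to MB, and the function name reduction_pct is hypothetical.

```python
# Image sizes in MB, taken from the docker images output in this post
sizes_mb = {
    "rkrispin/python-base": 1380,
    "rkrispin/python-slim": 829,
    "rkrispin/python-multi-stage": 483,
}


def reduction_pct(before, after):
    """Percentage drop in size when going from `before` to `after`."""
    return (before - after) / before * 100


baseline = sizes_mb["rkrispin/python-base"]
for name, size in sizes_mb.items():
    saved = baseline - size
    print(f"{name}: {size} MB, saves {saved} MB ({reduction_pct(baseline, size):.0f}%)")
```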

The next step would be to evaluate alternative images to the slim image, such as the Python Alpine image. While we might achieve additional size reduction, that might require an additional effort to identify and set missing dependencies.

Resources

More content is available on my data science channel.
