Pytest Tutorial: An Introduction To Unit Testing

Author:Murphy  |  View: 23202  |  Time: 2025-03-23 18:55:19

Background

Imagine you are a data scientist who has just developed an awesome new model that is going to bring the company a lot of money. The next step is to send it to production. You spend a few days making the code PEP 8 compliant, applying [linting](https://en.wikipedia.org/wiki/Lint_%28software%29), etc. Finally, you create a _pull request_ on GitHub, excited about your new release. Then a Software Engineer asks: 'I don't see any tests here?'

This scenario has happened to me and is quite common among junior Data Scientists. Testing is an essential part of any software project, and Data Science is no different. It is therefore an important concept and tool to nail down, as it will be invaluable in your career. In this post, I dive into why testing is needed and how to carry it out easily using Pytest.

What are Tests?

Testing is something we do naturally: we run the code and check whether the output is what we expected. This is called exploratory testing. However, it is not ideal, especially when you have a large codebase with numerous steps, as it becomes hard to detect where a problem is occurring.

Therefore, it is common practice to write tests for your code. Each test supplies some input and checks the result against an expected output. This _automates_ the testing process and speeds up debugging.

The most commonly written tests are _unit tests_. These test small blocks of code, typically functions and classes, to verify that each block does what it should.

The general advantages of unit tests are:

  • Faster debugging and issue finding
  • Earlier identification of bugs
  • More robust and maintainable code
  • Better code design with less complexity

Unit tests form the foundation of the testing pyramid, with _integration_ and _system_ testing following.

Software testing pyramid. Diagram by author.

What is Pytest?

Pytest is an easy-to-use Python package for unit testing. Alongside Python's built-in unittest framework, it is one of the most popular testing packages. Pytest has several advantages over other testing frameworks:

  • Open source
  • Skip and label tests
  • Parallelized test execution (through plugins such as pytest-xdist)
  • Very easy and intuitive to use

Now let's begin some testing!

Installation and Setup

You can install pytest through pip by running the following in your terminal or command line:

pip install pytest

If you want a specific version, append the version number after the ==:

pip install pytest==

You can verify it is installed on your machine through:

pytest --version

The best practice is to keep the tests in a separate directory, such as tests/, apart from the main code. Another requirement is that all test files are named test_*.py or *_test.py, using snake case. Similarly, all test functions should start with `test` and all test classes with `Test` ([camel case](https://en.wikipedia.org/wiki/Camel_case)). This ensures that `pytest` knows which functions, classes, and files are tests.
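
To make the naming rules concrete, here is a minimal sketch that checks names against the conventions described above. The helper names are hypothetical and exist only for illustration; this is not how pytest itself collects tests.

```python
from fnmatch import fnmatchcase

# File-name patterns from the conventions above. These helpers are
# hypothetical illustrations, not pytest's own collection code.
TEST_FILE_PATTERNS = ("test_*.py", "*_test.py")


def looks_like_test_file(filename: str) -> bool:
    """Return True if the file name matches one of the test file patterns."""
    return any(fnmatchcase(filename, pattern) for pattern in TEST_FILE_PATTERNS)


def looks_like_test_function(name: str) -> bool:
    """Test functions and methods start with 'test'."""
    return name.startswith("test")


def looks_like_test_class(name: str) -> bool:
    """Test classes start with 'Test'."""
    return name.startswith("Test")


for candidate in ("test_calculations.py", "calculations_test.py", "calculations.py"):
    print(candidate, looks_like_test_file(candidate))
```

Under these rules, calculations.py is treated as ordinary code, while the other two file names would be picked up as test files.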

Basic Example

Let's go through a very simple example.

First, we will create a new directory pytest-example/ containing two files: calculations.py and test_calculations.py. In the calculations.py file, we will code the following function:

def sum(a: float, b: float) -> float:
    """
    Calculate the sum of the two numbers.

    :param a: The first number to be added.
    :param b: The second number to be added.
    :return: The sum of the two numbers.
    """
    return a + b

And in the test_calculations.py file, we write its corresponding unit test:

from calculations import sum

def test_sum():
    assert sum(5, 10) == 15

This test can be run by executing either of these commands:

pytest
pytest test_calculations.py

And the output looks like this:

Image from author.

Good news, our test passed!

However, if our assert is incorrect:

def test_sum():
    assert sum(5, 10) == 10

The output would be:

Image from author.
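
Under the hood, a test fails when its assert statement raises an AssertionError, which pytest catches and reports. The following is a minimal sketch of that idea, not pytest's actual implementation: `run_check` is a hypothetical helper, and the function is renamed `add` here (an assumption) so the sketch does not shadow Python's built-in `sum`.

```python
from typing import Callable


def add(a: float, b: float) -> float:
    # Stand-in for the article's sum function, renamed for this sketch.
    return a + b


def sum_is_15() -> None:
    assert add(5, 10) == 15  # correct expectation


def sum_is_10() -> None:
    assert add(5, 10) == 10  # deliberately wrong expectation


def run_check(check: Callable[[], None]) -> str:
    """Hypothetical mini-runner: a check fails when its assert raises."""
    try:
        check()
    except AssertionError:
        return "failed"
    return "passed"


results = {check.__name__: run_check(check) for check in (sum_is_15, sum_is_10)}
print(results)
```

A real runner such as pytest does far more than this, for example showing the compared values in the failure report, but the pass or fail decision rests on the same mechanism.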

Several Tests

It is possible to have several tests for different functions. For example, let's add another function to calculations.py:

def sum(a: float, b: float) -> float:
    """
    Calculate the sum of the two numbers.

    :param a: The first number to be added.
    :param b: The second number to be added.
    :return: The sum of the two numbers.
    """
    return a + b

def multiply(a: float, b: float) -> float:
    """
    Calculate the product of the two numbers.

    :param a: The first number to be multiplied.
    :param b: The second number to be multiplied.
    :return: The product of the two numbers.
    """
    return a * b

And then add the test for the multiply function in test_calculations.py:

from calculations import sum, multiply

def test_sum():
    assert sum(5, 10) == 15

def test_multiply():
    assert multiply(5, 10) == 50

Executing pytest:

Image from author.

The two tests have passed!

However, what if you wanted, say, to just run the test_multiply function? Well, all you need to do is pass that function name as an argument when executing pytest:

pytest test_calculations.py::test_multiply

Image from author.

As we can see, pytest only ran test_multiply as we wanted!

If we now wanted to add a divide function, it would be good practice to group the functions into a class:

class Calculations:
    def __init__(self, a: float, b: float) -> None:
        """
        Initialize the Calculation object with two numbers.

        :param a: The first number.
        :param b: The second number.
        """
        self.a = a
        self.b = b

    def sum(self) -> float:
        """
        Calculate the sum of the two numbers.

        :return: The sum of the two numbers.
        """
        return self.a + self.b

    def multiply(self) -> float:
        """
        Calculate the product of the two numbers.

        :return: The product of the two numbers.
        """
        return self.a * self.b

    def divide(self) -> float:
        """
        Calculate the quotient of the two numbers.

        :return: The quotient of the two numbers.
        """
        return self.a / self.b

And the corresponding tests in test_calculations.py:

from calculations import Calculations
import pytest

class TestCalculations:
    def test_sum(self):
        calculations = Calculations(5, 10)
        assert calculations.sum() == 15

    def test_multiply(self):
        calculations = Calculations(5, 10)
        assert calculations.multiply() == 50

    def test_divide(self):
        calculations = Calculations(5, 10)
        assert calculations.divide() == 0.5
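
The divide method has two cases worth testing beyond the happy path: division by zero, and results that are not exactly representable as floats. The sketch below is one possible way to cover them, using `pytest.raises` and `pytest.approx`. The test names and input values are assumptions chosen for illustration, and the class is a trimmed copy of the one above so the snippet stands alone.

```python
import pytest


class Calculations:
    # Trimmed copy of the article's class, kept here so the sketch is self-contained.
    def __init__(self, a: float, b: float) -> None:
        self.a = a
        self.b = b

    def divide(self) -> float:
        return self.a / self.b


def test_divide_by_zero() -> None:
    # pytest.raises passes only if the block raises the given exception.
    with pytest.raises(ZeroDivisionError):
        Calculations(5, 0).divide()


def test_divide_inexact_float() -> None:
    # pytest.approx compares floats within a tolerance rather than exactly.
    assert Calculations(1, 3).divide() == pytest.approx(0.3333333)
```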

Pytest Fixtures

In the above TestCalculations class, notice that we initialise the Calculations class several times. This is not optimal and luckily pytest has fixtures to address this exact scenario:

from calculations import Calculations
import pytest

@pytest.fixture
def calculations():
    return Calculations(5, 10)

class TestCalculations:
    def test_sum(self, calculations):
        assert calculations.sum() == 15

    def test_multiply(self, calculations):
        assert calculations.multiply() == 50

    def test_divide(self, calculations):
        assert calculations.divide() == 0.5

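Instead of repeating the Calculations(5, 10) setup in every test, we define it once in a function decorated with @pytest.fixture and request it by naming that function as a test argument. By default, pytest runs the fixture again for each test that requests it, so every test still receives a fresh object.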
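
It can be unclear how a test receives its `calculations` argument without anyone calling the fixture explicitly. pytest matches each argument name against the registered fixture names. The toy model below sketches that idea in plain Python. The `fixture` decorator, `FIXTURES` registry, and `call_with_fixtures` helper are hypothetical stand-ins, not pytest's real machinery.

```python
import inspect
from typing import Any, Callable, Dict

# Hypothetical registry mapping fixture names to the functions that build them.
FIXTURES: Dict[str, Callable[[], Any]] = {}


def fixture(func: Callable[[], Any]) -> Callable[[], Any]:
    """Toy stand-in for @pytest.fixture: register func under its own name."""
    FIXTURES[func.__name__] = func
    return func


def call_with_fixtures(check: Callable[..., Any]) -> Any:
    """Build each argument by calling the fixture with the matching name."""
    parameters = inspect.signature(check).parameters
    kwargs = {name: FIXTURES[name]() for name in parameters}
    return check(**kwargs)


class Calculations:
    # Trimmed copy of the article's class.
    def __init__(self, a: float, b: float) -> None:
        self.a = a
        self.b = b

    def sum(self) -> float:
        return self.a + self.b


@fixture
def calculations() -> Calculations:
    return Calculations(5, 10)


def check_sum(calculations: Calculations) -> Calculations:
    assert calculations.sum() == 15
    return calculations


first = call_with_fixtures(check_sum)
second = call_with_fixtures(check_sum)
print(first is second)  # False: the fixture function ran once per call
```

In this model, as with pytest's default function scope, the fixture is rebuilt for each test, so tests cannot leak state to each other through it.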

Pytest Parametrize

Up to this point, we have only passed one test case for each test function. However, there may be multiple edge cases you want to test and verify. Pytest makes this process very easy through the parametrize decorator:

from calculations import Calculations
import pytest

@pytest.fixture
def calculations():
    return Calculations(5, 10)

class TestCalculations:

    @pytest.mark.parametrize("a, b, expected_output",
                             [(1, 3, 4), (10, 50, 60), (100, 0, 100)])
    def test_sum(self, a, b, expected_output):
        assert Calculations(a, b).sum() == expected_output

    def test_multiply(self, calculations):
        assert calculations.multiply() == 50

    def test_divide(self, calculations):
        assert calculations.divide() == 0.5

Here we have used the pytest.mark.parametrize decorator to test several inputs for the sum method. The output looks like this:

Image from author.

Notice that 5 tests pass instead of 3. This is because test_sum now runs three times, once for each set of parameters, which adds two extra tests.
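
The count of five can be reproduced by expanding the parameter sets by hand. This is only a rough sketch of the bookkeeping: the list names are assumptions, and the bracketed IDs imitate the style pytest shows in verbose mode for simple values.

```python
# The three parameter sets passed to pytest.mark.parametrize above.
sum_cases = [(1, 3, 4), (10, 50, 60), (100, 0, 100)]

# Each parameter set becomes its own test; the other two tests run once each.
collected = [f"test_sum[{a}-{b}-{expected}]" for a, b, expected in sum_cases]
collected += ["test_multiply", "test_divide"]

# Every parametrized case is checked independently, so one failing case
# would not hide the results of the others.
for a, b, expected in sum_cases:
    assert a + b == expected

for name in collected:
    print(name)
print(len(collected))  # 5
```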

Summary & Further Thoughts

Testing, particularly unit testing, is an essential skill for a Data Scientist to learn, as it helps prevent bugs and speeds up development. Pytest is one of the most common testing packages in Python: an easy-to-use framework with an intuitive testing procedure. In this article, we have shown how to use Pytest, including its fixture and parametrize features.

The full code used in this article is available here:

Medium-Articles/Software Engineering /pytest-example at main · egorhowell/Medium-Articles

