How to Boost the Performance of Python Using Caching Techniques
Once you are past the Python-newbie stage, it is time to explore Python's built-in features. I bet many of these out-of-the-box features will surprise you, and in this article I'll introduce one of them.
Have you ever encountered scenarios where a function is executed many times and some of its results can be reused? In the first example below, you will see that a caching mechanism made a recursive function roughly 120 times faster. So, in this article, I'll introduce the lru_cache decorator from functools, which has been built into Python since version 3.2.
In the functools module, there is another decorator called cache that is more straightforward. However, being easy to use sometimes means being less flexible. So, I'll start with the cache decorator and its limitations, and then focus on how lru_cache gives us more flexibility.
1. Recap of the @cache decorator

In one of my previous articles, I introduced the @cache decorator from the built-in functools module. In that article, I used a recursive Fibonacci function to demonstrate the power of the caching mechanism and achieved roughly a 120-times speed-up over the version without caching!
The example is as follows. Let's first write a Fibonacci recursive function.
def fibonacci(n):
    if n < 2:
        return n
    return fibonacci(n-1) + fibonacci(n-2)
Like most other programming languages, Python has to build a call stack for a recursive function and compute the value in every stack frame.
However, the cache decorator will improve the performance significantly, and it is not difficult to use. We just need to import it from the functools module and add the decorator to the function.
from functools import cache

@cache
def fibonacci_cached(n):
    if n < 2:
        return n
    return fibonacci_cached(n-1) + fibonacci_cached(n-2)
Here are the running results and the performance comparison.

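To reproduce a comparison like this yourself, a minimal sketch is below; it assumes both fibonacci() and fibonacci_cached() from the snippets above are defined in the same script, and the exact numbers will of course vary by machine.
import timeit

# time 10 calls of each version; the cached version only does real work on its first call
t_plain = timeit.timeit(lambda: fibonacci(30), number=10)
t_cached = timeit.timeit(lambda: fibonacci_cached(30), number=10)
print(f"without cache: {t_plain:.4f}s, with cache: {t_cached:.6f}s")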
If you want to know why the performance has improved significantly, please check the article below.
How to Use Python Built-In Decoration to Improve Performance Significantly
Rewriting the function using lru_cache
In fact, we don't really need to "rewrite" anything. We can use the lru_cache decorator exactly the same way as the cache decorator; only the decorator's name changes.
from functools import lru_cache

@lru_cache
def fibonacci_cached(n):
    if n < 2:
        return n
    return fibonacci_cached(n-1) + fibonacci_cached(n-2)
As you can see, a similar performance improvement is reproduced.

OK. Now, you may ask why we need lru_cache at all, and what the differences are between it and the cache decorator. The remaining sections will answer that.
2. What is the limitation of "@cache"?

Before introducing the lru_cache decorator, we need to understand the limitations of the cache decorator. The major limitation is memory consumption.
Let me simulate a use case. Suppose we are developing an API endpoint that gets user details from a database. To keep it simple, I will skip all of those steps and only define a function that returns a randomly generated user, so you can easily copy-paste my code and try it out straight away.
import random
import string
import tracemalloc
from functools import cache

@cache
def get_user_data(user_id):
    return {
        "user_id": user_id,
        "name": "User " + str(user_id),
        "email": f"user{user_id}@example.com",
        "age": random.randint(18, 60),
        "self-introduction": ''.join(random.choices(string.ascii_letters, k=1000))
    }
In the above code, the random module is used to generate the age and self-introduction for the user. The self-introduction is just a 1,000-character string of random letters for simulation purposes.
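A quick way to convince yourself that the caching works: since the age and the self-introduction are random, a second call with the same user_id can only return identical values if the result comes from the cache. A minimal check (the id 42 is arbitrary):
first = get_user_data(42)
second = get_user_data(42)
print(first is second)                  # True: the exact same dict object is returned from the cache
print(first["age"] == second["age"])    # True, even though the age was generated randomly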
Now, let's write a simulation function that calls this get_user_data() function. To understand the memory usage, we will use the tracemalloc module, which is also built into Python.
def simulate(n):
    tracemalloc.start()
    _ = [get_user_data(i) for i in range(n)]
    current, peak = tracemalloc.get_traced_memory()
    print(f"Current memory usage: {current/(1024**2):.3f} MB")
    tracemalloc.stop()
So, we run the get_user_data() function n times and use the tracemalloc module to track the memory usage at the same time.
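For reference, the results below correspond to calls like the ones here; it is best to run each in a fresh interpreter so the cache and the traced memory both start from zero:
simulate(10_000)
simulate(100_000)
simulate(1_000_000)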
For 10,000 simulations, the memory usage is about 14 MB.

For 100,000 simulations, it grows to about 143 MB.

And for 1,000,000 simulations, it reaches about 1421 MB.

More importantly, the memory used for caching will never be released until the process terminates. Also, in real-world scenarios, we may not know in advance how much memory the cache will need, which can let memory usage grow out of control.
I guess you have already realised it: this is exactly the case where we need lru_cache.
Why is it called "lru"?
The term "lru" in the lru_cache decorator stands for "Least Recently Used". It indicates that the caching mechanism manages memory by retaining recently accessed entries and discarding the least recently used ones.
Wait, that sounds like we can put an upper limit on how much the cache is allowed to hold. Yes, this is exactly the flexibility that lru_cache can give us.
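For reference, lru_cache accepts a maxsize argument: the default is 128 entries, and maxsize=None makes the cache unbounded, which is exactly what @cache is under the hood (a thin wrapper around lru_cache(maxsize=None)). A small sketch with two throwaway functions:
from functools import lru_cache

@lru_cache(maxsize=128)     # the default: keep at most 128 results
def bounded(n):
    return n * n

@lru_cache(maxsize=None)    # unbounded, equivalent to @cache
def unbounded(n):
    return n * n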
3. Controlling the maximum memory size

Now, let's have a look at how to control the maximum size of the cache, which in turn bounds its memory usage.
We can reuse the previous get_user_data() function for comparison, with a couple of changes for demonstration purposes. Firstly, we need to import lru_cache and use it as the decorator on the function.
import random
import string
import tracemalloc
from functools import lru_cache

@lru_cache(maxsize=1000)
def get_user_data(user_id):
    return {
        "user_id": user_id,
        "name": "User " + str(user_id),
        "email": f"user{user_id}@example.com",
        "age": random.randint(18, 60),
        "self-introduction": ''.join(random.choices(string.ascii_letters, k=1000))
    }
For demonstration purposes, I set maxsize=1000 so that we can compare the memory usage with the cache decorator, which has no such control.
Then, we need to modify the simulate function as well. This time, let's output the memory usage every 100 runs.
def simulate(n):
    tracemalloc.start()
    for i in range(n):
        _ = get_user_data(i)
        if i % 100 == 0:
            current, peak = tracemalloc.get_traced_memory()
            print(f"Iteration {i}: Current memory usage: {current/(1024**2):.3f} MB")
    tracemalloc.stop()
Then, let's simulate 2,000 runs.
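The run itself is just one line; the comment notes what to expect based on the results that follow:
simulate(2000)   # memory grows for roughly the first 1,000 iterations, then flattens out once the cache is full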

It is obvious that the memory usage accumulates over the first 1,000 runs. After that, for runs 1,001–2,000, the memory usage stays stable with no significant increase. The chart below shows the trend.

Therefore, when we set the maxsize of lru_cache, it works exactly like the cache decorator until it reaches the upper limit on the number of objects it is allowed to cache. Once it reaches that bound, it starts evicting entries, beginning with the least recently used one.
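To see the eviction in action on a tiny scale, here is a small sketch with a throwaway square() function and maxsize=2; the print statement only fires on a cache miss, so you can watch values being recomputed after they are evicted:
from functools import lru_cache

@lru_cache(maxsize=2)
def square(n):
    print(f"computing {n}")   # only runs on a cache miss
    return n * n

square(1)   # computing 1
square(2)   # computing 2
square(1)   # served from the cache, nothing printed
square(3)   # computing 3 -> evicts 2, the least recently used entry
square(2)   # computing 2 again, because it was evicted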
That is how lru_cache is more flexible: it mitigates the memory concerns of caching.
4. More tools for cache management

OK, it is great to see how easy it is to use cache and lru_cache in Python. But can we manage what we have cached? Setting a maximum size is definitely a good start, but is there anything more? Yes, let's have a look at two cache-management functions.
Using cache_info() to monitor the performance
The first useful function is cache_info(). It shows the current size of the cache, how many calls were answered from cached data (hits), and how many calls had to compute the result because it was not cached yet (misses).
Let's use the simple Fibonacci function again.
from functools import lru_cache

@lru_cache(maxsize=32)
def fibonacci(n):
    if n < 2:
        return n
    return fibonacci(n-1) + fibonacci(n-2)
OK, now let's run the function and check the cache info.
print(fibonacci(30))
print(fibonacci.cache_info())
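If you run this, the two printed lines should look like the ones below, matching the numbers discussed next (832040 is fibonacci(30)):
832040
CacheInfo(hits=28, misses=31, maxsize=32, currsize=31)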

We can see that maxsize shows the limit of the lru_cache, which was set to 32 objects at most. There are already 31 values cached (currsize). There were 28 hits, meaning 28 calls were answered directly from the cache, and 31 misses, meaning 31 times the function had to run its logic to calculate the value because it was not cached yet. It looks like our caching strategy is pretty successful for this function.
The cache_info() function gives us criteria to verify whether our caching actually helped. For example, if there were very few hits and many misses, the function is probably not worth caching.
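One handy way to use these numbers is to compute a hit ratio. A minimal sketch follows; the 10% threshold is just an arbitrary illustration, not a rule:
info = fibonacci.cache_info()
total_calls = info.hits + info.misses
hit_ratio = info.hits / total_calls if total_calls else 0.0
print(f"Hit ratio: {hit_ratio:.0%}")    # roughly 47% for the fibonacci(30) run above
if hit_ratio < 0.1:
    print("Very few hits -- this function may not be worth caching")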
Using cache_clear() to reset the cache
What this function does is easy to guess from its name: it clears all the cached results once we run it. Let's use the Fibonacci example again.
from functools import lru_cache

@lru_cache(maxsize=32)
def fibonacci(n):
    if n < 2:
        return n
    return fibonacci(n-1) + fibonacci(n-2)
# Run the function and check the info
print(fibonacci(30))
print(fibonacci.cache_info())
# Clear the cache and check the info again
fibonacci.cache_clear()
print(fibonacci.cache_info())
In the above code, we simply run the Fibonacci function again. Checking the cache info shows exactly the same stats as before, of course. After that, we run cache_clear() to drop all the cached results, and when we check the info again, nothing is left.
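For reference, the two cache_info() lines printed by the code above should look like this; the first matches the earlier run, and the second shows that cache_clear() resets both the stored values and the hit/miss counters:
CacheInfo(hits=28, misses=31, maxsize=32, currsize=31)
CacheInfo(hits=0, misses=0, maxsize=32, currsize=0)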

Knowing these two functions makes it much easier to manage our cache. For example, one of the big issues with caching is that the memory is never released, but if we clear the cache periodically or dynamically, this issue can be mitigated or even resolved.
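As a sketch of what "periodically or dynamically clearing the cache" could look like in practice, here is one hypothetical approach: wrap the cached function and clear it whenever the cache is older than a chosen time-to-live. All the names and the 300-second interval are illustrative assumptions, not a prescribed pattern:
import time
from functools import lru_cache

CACHE_TTL_SECONDS = 300             # hypothetical refresh interval
_last_cleared = time.monotonic()

@lru_cache(maxsize=10_000)
def get_user_data(user_id):
    # in a real application this would query the database
    return {"user_id": user_id, "name": f"User {user_id}"}

def get_user_data_fresh(user_id):
    # hypothetical wrapper: drop the whole cache once it is older than the TTL,
    # so no cached entry outlives CACHE_TTL_SECONDS
    global _last_cleared
    if time.monotonic() - _last_cleared > CACHE_TTL_SECONDS:
        get_user_data.cache_clear()
        _last_cleared = time.monotonic()
    return get_user_data(user_id)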
Summary

In this article, I started by introducing the cache decorator from functools, a Python built-in module. Although it is easier to use, it has some limitations, such as the lack of memory control. lru_cache, on the other hand, gives us the flexibility to make sure we don't end up with a memory problem in more complex situations.
Apart from that, I also introduced a few more tips, such as showing the statistics of the existing cache and clearing the cached results. By using these management functions well, we can achieve more customised behaviour, avoid bugs, and use memory more efficiently. Hope this article helps!