Is Julia Faster than Python and Numba?

Author:Murphy  |  View: 26332  |  Time: 2025-03-23 12:42:06

Optimisation

Photo by Stanos on Unsplash

Numba is a widely used optimisation library for Python that brings function execution speed into the same ballpark as the C language, and C is undoubtedly rapid.

Is that level of optimisation enough to compete with a newer, purpose-built, targeted language like Julia? And if so, are there any caveats to achieving that level of execution speed in Python?


Introduction

Photo by Ann H on Pexels

I have previously written an article comparing NumPy to Julia. The outcome was essentially that Julia is indeed faster than NumPy, in general. However, it is a bit more nuanced than that, so I encourage you to check out the article to get the whole story:

Is Julia Really Faster than Python and Numpy?

One of the most common responses to that article was something along the lines of:

Well you should also use Numba. It is simple to implement, and makes things even faster!

-quite a few people

…so this article is going to attempt to address that suggestion head on.

Does using Numba match, or even exceed, the speed of Julia? Is it as easy to use and implement as people seem to claim? And, are there any downsides?

Let's find out…

A quick primer on Julia

Photo by Ann H on Pexels

As some of you may not have read the previous article about NumPy, I will repeat the "What is Julia?" section included in that article here, but feel free to skip ahead if you have already read the previous article.

What is Julia?

Just in case you have no idea what Julia is, here is a quick primer.

Julia is an open source language that is dynamically typed, intuitive, and easy to use like Python, but with the speed of execution of a language like C.

It has been around approximately 11 years (born in 2012), so it is a relatively new language. However, it is at a stage of maturity where you wouldn't call it a fad.

The original creators of the language are active in a relevant field of work:

For the work we do – scientific computing, machine learning, data mining, large-scale linear algebra, distributed and parallel computing – …

  • julialang.org – Jeff Bezanson, Stefan Karpinski, Viral B. Shah, Alan Edelman

All in all, it is a modern language specifically designed to be used in the field of data science. The aims of the creators themselves tell you a great deal:

We want the speed of C with the dynamism of Ruby. We want a language that's homoiconic, with true macros like Lisp, but with obvious, familiar mathematical notation like Matlab. We want something as usable for general programming as Python, as easy for statistics as R, as natural for string processing as Perl, as powerful for linear algebra as Matlab, as good at gluing programs together as the shell. Something that is dirt simple to learn, yet keeps the most serious hackers happy. We want it interactive and we want it compiled.

(Did we mention it should be as fast as C?)

  • julialang.org – Jeff Bezanson, Stefan Karpinski, Viral B. Shah, Alan Edelman

Sounds quite exciting, right?

Incidentally, if you want an idea as to how Python and Julia compare side-by-side in terms of syntax and general usage, then you may want to check out my other article, which takes an in depth look at running a deep learning image classification problem using both Julia (Flux) and Python (TensorFlow):

Julia's Flux vs Python's TensorFlow: How Do They Compare?

What is Numba, and why is it so fast (and popular)?

Photo by Towfiqu barbhuiya on Unsplash

The idea behind Numba is extremely simple.

Pre-compile the Python code to machine code, and execute the compiled code rather than the Python code. Or for a little more detail:

Numba translates Python functions to optimized machine code at runtime using the industry-standard LLVM compiler library. Numba-compiled numerical algorithms in Python can approach the speeds of C or FORTRAN.

numba.pydata.org

To approach the speed of C (or FORTRAN) by definition means that Numba is indeed going to be extremely fast.

Implementation is (generally) easy

One of the major pluses to the way Numba is implemented is that it is, in most cases, very easy to use. Here is an example of a normal Python function, and then the equivalent Numba function:

import numpy as np
from numba import njit

# normal function
def loop_function(a, b):
    c = np.zeros(a.shape, dtype=np.float32)
    for i in np.arange(c.shape[0]):
        if a[i] < b[i]:
            c[i] = 1.0
        else:
            c[i] = 2.0
    return c, a + b

# the same function with the Numba decorator added
@njit
def loop_function_numba(a, b):
    c = np.zeros(a.shape, dtype=np.float32)
    for i in np.arange(c.shape[0]):
        if a[i] < b[i]:
            c[i] = 1.0
        else:
            c[i] = 2.0
    return c, a + b

Literally a single decorator! It would appear to be verging on criminal not to implement Numba on every function. (more on the reality of that later…)
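As a quick sanity check, the sketch below (not taken from the notebooks for this article) compiles the plain function with njit and confirms that both versions return the same arrays. The array size, the seed and the no-op fallback decorator used when Numba is not installed are all assumptions made for illustration.

```python
import numpy as np

try:
    from numba import njit
except ImportError:
    # Assumption: fall back to a no-op decorator so the sketch still runs
    # (without any speed-up) when Numba is not installed.
    def njit(func):
        return func


def loop_function(a, b):
    c = np.zeros(a.shape, dtype=np.float32)
    for i in np.arange(c.shape[0]):
        if a[i] < b[i]:
            c[i] = 1.0
        else:
            c[i] = 2.0
    return c, a + b


# njit can be applied as a plain function call as well as a decorator
loop_function_numba = njit(loop_function)

rng = np.random.default_rng(12)  # hypothetical seed
a = rng.standard_normal(1_000).astype(np.float32)
b = rng.standard_normal(1_000).astype(np.float32)

c_py, sum_py = loop_function(a, b)
c_nb, sum_nb = loop_function_numba(a, b)

# Both versions should agree on the flag array and on the element-wise sum
same_flags = np.array_equal(c_py, c_nb)
same_sums = np.allclose(sum_py, sum_nb)
print(same_flags, same_sums)
```

Keeping the undecorated function around like this is also handy for debugging, as errors raised from plain Python are usually easier to read than compilation errors.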

So should I replace NumPy with Numba?

Numba is not a replacement for, or alternative to, NumPy. It is designed to be utilised in addition to NumPy.

Numba is designed to be used with NumPy arrays and functions. Numba generates specialized code for different array data types and layouts to optimize performance. Special decorators can create universal functions that broadcast over NumPy arrays just like NumPy functions do.

-numba.pydata.org

This is excellent, as NumPy is already extremely powerful, and Numba just elevates it even further. Plus, there is no requirement to re-write all your NumPy code if you decide you want to utilise Numba.

Numba's other tricks

The compilation step is only the start. Numba has an extensive set of additional features that can potentially further increase execution speed. Some examples:

  1. Parallel processing – if you have a CPU with multiple cores, you can potentially use them in parallel to speed up processing
  2. Fastmath – relax strict floating-point rules, trading some numerical accuracy for speed of execution
  3. Cache – save compiled code in a cache to reduce compile overhead on repeat future usage
  4. CUDA – Use your GPU for your calculations

There are of course more options than the four above, and plenty of customisation should you need it. This article will primarily focus on parallel processing and Fastmath from the list above.

Using a GPU (CUDA) warrants a whole other article to do it justice, so that won't be covered in this article.

The disadvantages

There are two main negative points to Numba.

The first, and most obvious, is that compilation is required, and compilation takes time.

If the function is only executed once, the compilation time may be a significant disadvantage. However, if the code requires repeat usage of the compiled function (like in a loop), then the disadvantage could become negligible. It all depends on the circumstances.
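To put rough numbers on "it all depends on the circumstances", the sketch below works out how many calls are needed before the one-off compile cost is repaid. The function name and all of the timings are hypothetical, chosen only to illustrate the arithmetic; they are not measurements from this article.

```python
import math


def break_even_calls(compile_s, plain_call_s, jit_call_s):
    """Number of calls needed before the one-off compile cost is repaid.

    Returns None when the compiled function is not faster per call, because
    the compile cost can then never be recovered.
    """
    saving_per_call = plain_call_s - jit_call_s
    if saving_per_call <= 0:
        return None
    return math.ceil(compile_s / saving_per_call)


# Hypothetical timings in seconds, chosen only to illustrate the arithmetic
calls_needed = break_even_calls(compile_s=0.5, plain_call_s=0.004, jit_call_s=0.001)
print(calls_needed)
```

If the function is going to be called fewer times than this, the plain version wins overall, even though the compiled version is faster per call.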

The second, as per the quote directly from the Numba website itself, is that Numba can only be implemented on functions it is designed to be used with.

Numba is an open source JIT compiler that translates a subset of Python and NumPy code into fast machine code.

-numba.pydata.org

I will point out that the functions available are quite extensive, so you may not find that this is an issue, but it is a negative point all the same. There is also quite a bit of built-in flexibility to allow custom code to be written, such as NumPy ufuncs. So there are some workarounds if you do have quite unique requirements.

The basis of the speed test

Image by StockSnap from Pixabay

As alluded to in the introduction to this article, this particular article is the culmination of two previous articles.

The first provides a guide on how to utilise NumPy vectorization to speed up your Python code:

How to Speedup Data Processing with Numpy Vectorization

…a natural progression was to see how NumPy, and its implementation of vectorization, stacked up against Julia:

Is Julia Really Faster than Python and Numpy?

…and now, mainly due to comments on the previous Julia article, I think it will be interesting to see how Julia stacks up against the added features of Numba.

So, let's get into it!

How will the tests work?

Photo by Nguyen Dang Hoang Nhu on Unsplash

There will be three different functions tested. Each function will increase in complexity.

Function 1 – A simple summation

The inputs (a and b) to the following functions are defined as a 1D array/vector with one million elements. Each element is a random number taken from a normal distribution, and of type float32.

For the Python function, the arrays will be NumPy arrays.

# Python + NumPy + Numba
import numpy as np
from numba import njit

# Example of an input array
series1 = np.random.randn(1000000).astype(np.float32)

@njit
def sum_nums_numba(a, b):
    return a + b

# Julia
using Random  # provides MersenneTwister

# Example of an input array
series1 = randn(MersenneTwister(12), Float32, 1000000);

function sum_nums(a::Vector{Float32}, b::Vector{Float32})
    return a + b
end

Function 2 – A loop function

Loops are ubiquitous, and therefore worth looking at.

The input arrays will be the same as for Function 1.

# Python + NumPy + Numba
@njit
def loop_function_numba(a, b):
    c = np.zeros(a.shape, dtype=np.float32)
    for i in np.arange(c.shape[0]):
        if a[i] < b[i]:
            c[i] = 1.0
        else:
            c[i] = 2.0
    return c, a + b

# Julia
function loop_function(a::Vector{Float32}, b::Vector{Float32})
    c::Vector{Float32} = zeros(Float32, size(a))
    for i = 1:size(c)[1]
        a[i] < b[i] ? c[i] = 1.0 : c[i] = 2.0
    end
    return c, a + b
end

Function 3 – Matrix manipulation

Matrix manipulation is a key component of many algorithms and tasks in the field of data science (especially deep learning), and so is an important factor to consider.

Input into the functions will take the form of a 100 by 100 matrix of random numbers taken from a normal distribution.

# Example of an input matrix
matrix1 = np.random.randn(100,100).astype(np.float32)

# Python + NumPy + Numba
@njit
def matrix_func(mat_a, mat_b):
    a = mat_a.T
    c = np.dot(mat_b,a)
    d = c.reshape(50,50,4)
    e = mat_b.reshape(50,4,50)
    f = d.reshape(200,50)
    g = e.reshape(50,200)
    h = np.dot(f,g)
    i = h.reshape(40000)
    result = 0.0
    for j in np.arange(i.shape[0]):
        result = result + (i[j] - np.sum(i)) / np.sqrt(abs(np.average(i)))
    return result

# Example of an input matrix
using Random, Statistics  # MersenneTwister, and mean (used in the function below)
matrix1 = randn(MersenneTwister(12), Float32, 10000);
matrix1 = reshape(matrix1,(100,100));

# Julia
function matrix_func(mat_a, mat_b)
    a = mat_a'
    c = mat_b * a
    d = reshape(c,(50,50,4))
    e = reshape(mat_b,(50,4,50))
    f = reshape(d,(200,50))
    g = reshape(e,(50,200))
    h = f * g
    i = reshape(h,40000)
    result = 0.0
    for j = 1:size(i)[1]
        result = result + (i[j] - sum(i)) / sqrt(abs(mean(i)))
    end
    return result
end

Additional investigations

To make things a little more informative, the following items will also be investigated:

  1. All of the Numba functions will be timed both with and without the inclusion of the compilation stage. This will help judge the impact of compilation on overall execution time
  2. A single iteration, and multiple iterations will be performed. Again, to investigate the impact of compilation on execution time
  3. Apart from the 'normal' running of the functions, the effectiveness of parallel processing in both Numba and Julia will be compared
  4. The additional benefits of the Fastmath parameter in Numba will be investigated

The measurements

The timing of the functions will be conducted using the timeit module in Python and the BenchmarkTools module in Julia.

# Python
iterations = 10000
timeit.timeit(stmt=numba_func, setup=setup, number=iterations)

# Julia
@benchmark sum_nums(rand_array1, rand_array2) samples=10000
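For anyone wanting to reproduce the "with compilation" and "without compilation" timings, the following is a minimal sketch of one way to do it. It passes a callable to timeit rather than the stmt/setup strings shown above, so it is not the exact harness from the notebooks. The seed, the reduced iteration count and the no-op fallback decorator are assumptions made for illustration.

```python
import timeit

import numpy as np

try:
    from numba import njit
except ImportError:
    # Assumption: no-op stand-in so the sketch runs without Numba installed
    def njit(func):
        return func


@njit
def sum_nums_numba(a, b):
    return a + b


rng = np.random.default_rng(12)  # hypothetical seed
a = rng.standard_normal(1_000_000).astype(np.float32)
b = rng.standard_normal(1_000_000).astype(np.float32)

iterations = 100  # far fewer than the 10000 used in the article, to keep this quick

# The first call triggers compilation, so this total includes the compile step
with_compilation = timeit.timeit(lambda: sum_nums_numba(a, b), number=iterations)

# The function is already compiled for float32 arrays, so this total excludes it
without_compilation = timeit.timeit(lambda: sum_nums_numba(a, b), number=iterations)

print(f"with compilation:    {with_compilation:.4f} s")
print(f"without compilation: {without_compilation:.4f} s")
```

The order matters: the run that is meant to include compilation has to be the very first call of the function with those argument types in the session.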

Some general information (environment, versions, etc.)

All of the numbers that follow were produced on the exact same hardware, which uses a 4-core/8-thread CPU (i7-4790K) (exact details are printed in the Jupyter notebooks).

The software versions were as follows:

Julia: 1.9.2

Python: 3.11.4

NumPy: 1.23.5

Numba: 0.57.1

The notebooks for this article

Photo by Jessica Lewis on Pexels

All the code used to generate the results in this article is available in its entirety in two Jupyter notebooks here:

notebooks/julia-numba-comparison at main · thetestspecimen/notebooks

The results

Photo by Anna Nekrashevich on Pexels

Starting with…

A simple summation

First of all, over a single iteration.

A simple summation of two arrays over one iteration – Graph by Author

This first run over a single iteration is just to illustrate the significant overhead, compared to the overall execution time, that the compilation of the function may have when using Numba.

It is worth pointing this out to avoid falling into the trap of just using Numba for everything, and not thinking about whether it is appropriate. If you have to execute a relatively simple function only once, then it will typically be better to just use NumPy directly, or at least take advantage of caching if your situation allows for it.

Now let's increase the iterations to reduce the impact of the initial compilation.

A simple summation of two arrays over 10000 iterations – Graph by Author

Well, there you have it. Julia is dead last!

In reality the difference is small (over 10000 iterations: Julia 7.2s, NumPy 5.1s), but it is a difference all the same.

Even more surprising is that Numba actually makes the execution slightly slower than NumPy, even if the Numba compilation time is ignored.

This goes a long way to illustrate that properly vectorized NumPy calculations are very well optimised, which is another element to consider when thinking about whether Numba is worth using for your particular application.

Note: for more detail on what exactly NumPy vectorization is, and how it works, please check out my previous article where I go into detail:

How to Speedup Data Processing with Numpy Vectorization

Looped Function

Moving on to a more realistic and slightly more complicated scenario, let's have a look at a looping function.

Essentially, the function runs element by element through a pair of 1 million element arrays and sets each element of the output based on an if-else statement.

# Python + NumPy + Numba
@njit
def loop_function_numba(a, b):
    c = np.zeros(a.shape, dtype=np.float32)
    for i in np.arange(c.shape[0]):
        if a[i] < b[i]:
            c[i] = 1.0
        else:
            c[i] = 2.0
    return c, a + b

A loop function over 100 iterations (Comp – includes the function compilation time, Para – parallel processing) – Graph by Author

Yet again, Numba comes out on top.

Interestingly, even including the compilation time in the Numba run still has Numba coming out faster than Julia. Obviously, if there were fewer iterations, this lead would diminish, and then ultimately reverse. However, the execution stage is definitely quicker.

You will also note that parallel processing can help significantly in the case of both Julia and Numba. Some slight adjustments to the function must be made but nothing too extreme.

For Julia it is just a case of adding Threads.@threads in front of the for loop:

# parallelised Julia function
function loop_function_para(a::Vector{Float32}, b::Vector{Float32})
    c::Vector{Float32} = zeros(Float32, size(a))
    Threads.@threads for i = 1:size(c)[1]
        a[i] < b[i] ? c[i] = 1.0 : c[i] = 2.0
 end
 return c, a + b
end

For Numba it just requires adding parallel=True to the decorator and swapping out np.arange for prange (i.e. parallel range), which is imported from numba.

@njit(parallel=True)
def loop_function_numba_par(a, b):
    c = np.zeros(a.shape, dtype=np.float32)
    for i in prange(c.shape[0]):
        if a[i] < b[i]:
            c[i] = 1.0
        else:
            c[i] = 2.0
    return c, a + b

A few more microseconds were gained via the use of Fastmath, which essentially relaxes strict floating-point rules, trading a little precision for execution speed. This could be very useful in the realms of machine/deep learning, where high numeric precision is not necessarily required when training a model.

Again just a simple addition to the decorator:

@njit(parallel=True, fastmath=True)
def loop_function_numba_par_fast(a, b):
    c = np.zeros(a.shape, dtype=np.float32)
    for i in prange(c.shape[0]):
        if a[i] < b[i]:
            c[i] = 1.0
        else:
            c[i] = 2.0
    return c, a + b
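To make "trading precision for speed" a little more concrete: one of the things fast-math style optimisation allows a compiler to do is regroup floating-point operations, and floating-point addition is not associative. The plain Python sketch below does not use Numba at all. It simply shows that regrouping the same three additions changes the last bits of the result, which is the kind of small discrepancy you accept when enabling such an option.

```python
# Floating-point addition is not associative: regrouping changes the rounding
left_to_right = (0.1 + 0.2) + 0.3
regrouped = 0.1 + (0.2 + 0.3)

# The two results differ, but only in the final bits of the float
difference = abs(left_to_right - regrouped)
print(left_to_right == regrouped)
print(difference)
```

Whether differences of this size matter depends entirely on the application, so it is worth comparing results with and without the option before relying on it.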

Matrix manipulation

Matrix manipulation is an essential part of the data science workflow, and specifically of deep learning/neural networks.

With that in mind, I thought it might be interesting to see how Julia and Numba cope with chaining some different matrix manipulations together. Including:

  • the ubiquitous dot product
  • transposition
  • reshaping
  • summation
  • square root
  • absolute value
  • average value

# the Python code for reference
@njit
def matrix_func(mat_a, mat_b):
    a = mat_a.T
    c = np.dot(mat_b,a)
    d = c.reshape(50,50,4)
    e = mat_b.reshape(50,4,50)
    f = d.reshape(200,50)
    g = e.reshape(50,200)
    h = np.dot(f,g)
    i = h.reshape(40000)
    result = 0.0
    for j in np.arange(i.shape[0]):
        result = result + (i[j] - np.sum(i)) / np.sqrt(abs(np.average(i)))
    return result

Matrix manipulation over 20 iterations (Comp – includes the function compilation time, Para – parallel processing) – Graph by Author

Julia comes out on top in this case, by quite a margin (approx. 10 times quicker). Why this is the case is difficult to say, and would require further investigation. However, I suspect it is down to some of the limitations of Numba, which I will discuss further in the next section.

Another thing to note here is that due to the extended execution time of this function compared to Functions 1 and 2, the compilation time is already insignificant at just 20 iterations.

Ultimately though, Numba is as quick as they say, and it is easy to use in the majority of cases. It easily pushes Python and its ecosystem into the territory of Julia, and keeps it, in general, bang up to date.

With some caveats…

The Limitations

Photo by RDNE Stock project on Pexels

It is fair to say that Numba can indeed keep up with, and sometimes exceed, Julia in terms of execution speed.

However, there is one major difference between Julia and Numba. Numba is an external library for a language, whereas the methods used in Julia are native methods integrated into the core language.

What this essentially means is that with Julia you are very unlikely to hit incompatibility issues or limitations on method application. The same cannot be said for Numba. Not only are the limitations of the application of Numba explicitly defined in the official documentation:

Supported Python features

Supported NumPy features

…but you are much more likely to find bugs due to incompatibility.

Some bugs I found

The code in this article is very limited. Pretty simple functions. However, I still had issues with Numba, and had to adjust my analysis as a result.

On the last benchmarks conducted on 2D arrays/matrices (Function 3), there were various adjustments that had to be made just to get the function to run when using Numba.

It should be noted that none of the following are a problem when using straight Python/NumPy without Numba.

  1. You cannot use np.reshape with a second argument (i.e. you cannot specify the re-order type 'F', 'C' etc.). This was an issue as Julia and Python use a different matrix index ordering by default, and attempting to keep things as comparable/fair as possible required looking into this second argument.
  2. np.matmul is not supported and therefore you have to use np.dot. This is not an issue for 2D arrays like those used in this article, but these methods are not equivalent for higher dimensional arrays, and you therefore may have an issue if you rely on np.matmul for higher dimensional arrays.
  3. Open bug – using np.reshape after a transpose requires a copy to be taken, or it fails.
  4. Open bug – no support for integer arrays with np.dot. Having been forced to abandon np.matmul for np.dot (due to point 2), I then hit another issue during testing before moving to floats.

I lost quite a bit of time with the four points above, as it is not always clear whether you are doing something wrong (i.e. trying to implement something that is not supported), or dealing with a bug. If I wasn't dealing with "play" code for an article, and this was project code, it could become quite frustrating.
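For readers less familiar with points 1 and 2, here is a small illustration in plain NumPy (no Numba involved) of what the reshape order argument changes, and of how np.dot and np.matmul diverge once the inputs have more than two dimensions. The array shapes are arbitrary choices for illustration, and the reshape-then-transpose equivalence is shown as a NumPy fact only, not as a tested Numba workaround.

```python
import numpy as np

# Point 1: the same six values laid out row-major ('C', the NumPy default)
# versus column-major ('F', the layout Julia uses by default)
values = np.arange(6)
row_major = values.reshape((2, 3), order="C")
col_major = values.reshape((2, 3), order="F")

# A column-major reshape can also be written without the order argument:
# reshape to the reversed shape, then transpose
col_major_alt = values.reshape((3, 2)).T

# Point 2: np.dot and np.matmul agree for 2D inputs...
m1 = np.ones((3, 4), dtype=np.float32)
m2 = np.ones((4, 5), dtype=np.float32)
same_in_2d = np.array_equal(np.dot(m1, m2), np.matmul(m1, m2))

# ...but not for stacks of matrices: matmul multiplies matching pairs,
# whereas dot combines every matrix in t1 with every matrix in t2
t1 = np.ones((2, 3, 4), dtype=np.float32)
t2 = np.ones((2, 4, 5), dtype=np.float32)
matmul_shape = np.matmul(t1, t2).shape
dot_shape = np.dot(t1, t2).shape

print(row_major.tolist(), col_major.tolist())
print(same_in_2d, matmul_shape, dot_shape)
```

The difference in output shapes for the stacked case is why swapping np.matmul for np.dot is only safe when every input is known to be 2D.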

Conclusion

Photo by Ann H on Pexels

Numba is excellent, and most importantly everything people claim it is:

Fast and easy to implement

…and yes, to generalise, it is basically as fast as Julia. As long as it is used in the appropriate circumstances.

However, I can't overlook the fact that Numba has some serious limitations when compared directly to Julia. Of course, the importance of those limitations will vary depending on your own requirements, and particular constraints.

In the real world, all this basically means is that if your current projects, or infrastructure, rely on Python, and changing to a new language is too much (lack of experienced devs, too much legacy code, not enough budget etc.), then, thanks to the hard work and persistence of the devs behind libraries like NumPy and Numba, Python is still bang up to date in terms of speed and features in the field of data science. Numba, and NumPy, very effectively fill a gap.

However, when writing code in Julia you are well aware of the fact that the base language is heavily optimised without any external libraries. It is also designed with data science in mind from the ground up, as that is what the creators needed themselves:

For the work we do – scientific computing, machine learning, data mining, large-scale linear algebra, distributed and parallel computing – …

  • julialang.org – Jeff Bezanson, Stefan Karpinski, Viral B. Shah, Alan Edelman

You don't have to be aware, or knowledgeable, about external libraries and tools to ensure that your project is fast with Julia. It will be fast by default. This allows more thought to go into what you are trying to achieve, rather than constantly having to consider the best way to speed things up, or optimise.

That is why, if reasonably possible, it makes a lot of sense to switch over to Julia (in my opinion!).

Note: If you want to know more about the ins and outs of Julia for something like deep learning when compared to Python (TensorFlow), then be sure to take a look at this article:

Julia's Flux vs Python's TensorFlow: How Do They Compare?



Tags: Deep Dives Julia Machine Learning Numba Numpy
