Is Matplotlib Still the Best Python Library for Static Plots?

Author:Murphy  |  View: 23057  |  Time: 2025-03-22 23:12:06

Visualisation

A Matplotlib scatter plot – Image by Author

Matplotlib is probably the first plotting library that every data scientist, or data analyst, comes across if they work in the Python programming language. It appears to be used everywhere.

…so, is it so ubiquitous because it is the best available? Or has it just been around in the industry for a long time? What alternatives are there, and how do they measure up?


Introduction

You may note in the title of the article that I have specifically mentioned static plots.

Static vs Dynamic

Although dashboards and interactive plots are an extremely important part of exploring and presenting data, there will always be a need for static plots.

Reports, technical papers, articles, and anything in print are always going to require static plots. Being able to quickly and easily produce clear, logical and beautiful static plots therefore remains absolutely essential. With that in mind, this article will primarily focus on static plots.

Static plots pose additional challenges

Static plots require all the information to be displayed effectively in one view. There are no dynamic overlays, or zooming and panning, to help gain further information on the fly like you might see in a dashboard or interactive plot.


As the old saying goes "a picture can convey a thousand words". However, the CAN in that statement is very important! If the plot is poorly constructed, the message definitely will not be conveyed effectively.

User vs Library

To some extent, producing a well-thought-out plot comes down to the skill of the practitioner, but it also depends on the usability and extensibility of the plotting library in question.

The Plan

The aim of this article is to make a comparison of some of the most commonly used plotting libraries in terms of their usability and extensibility. The baseline in this particular case will be Matplotlib.

The first difficulty is deciding what to plot, as every plotting library has a plethora of chart types available. So in this instance I have decided to go with a scatterplot.

Why a scatterplot?

A scatterplot is probably one of the most basic plot types, but it is easily one of the most used.

However, precisely because it is such a simple plot, it is possible to present multiple variables effectively on the same plot. This adds to the work the plotting library needs to do to render the data clearly, without the confusion of a cluttered plot.

On one single plot it is possible to present four separate variables using the following plot features:

  • x-axis
  • y-axis
  • point colour
  • point size

Side note: in theory point colour could be swapped out for point type (i.e. cross, circle, star etc.), which would be more appropriate for black and white printing, for example. However, I think dealing with colour is more of a challenge for the libraries in this instance.

That is a total of four interrelated variables on a single plot. It sounds simple enough at first, but to produce a clear plot the library needs to:

  • produce two separate legends (point colour and point size), and position the legends sensibly on the plot so as not to obstruct the plotted data
  • translate data (in this case strings) into colour maps
  • generate a sensible range of point sizes for the data so levels are clearly distinguishable without obscuring other data
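Swapping colour for marker shape, as the side note above suggests, is fairly mechanical in Matplotlib. The sketch below is one way to do it, under stated assumptions: the six rows are hypothetical placeholders (not the article's dataset), and the marker choices are arbitrary. Each field is drawn in its own scatter call, so the legend picks up the labels directly.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical placeholder rows with the same columns as the article's dataset
earnings = pd.DataFrame({
    "age": [25, 34, 47, 52, 29, 61],
    "experience": [2, 10, 22, 30, 5, 38],
    "field": ["Management", "Creative", "Technical", "Management", "Creative", "Technical"],
    "earnings": [60000, 45000, 120000, 210000, 52000, 150000],
})

# One marker shape per field instead of one colour per field (black and white friendly)
markers = {"Management": "o", "Creative": "^", "Technical": "s"}

fig, ax = plt.subplots()
for field, group in earnings.groupby("field"):
    # Hollow black markers, so the plot survives monochrome printing
    ax.scatter(group["age"], group["experience"], s=group["earnings"] / 200,
               marker=markers[field], facecolors="none", edgecolors="black", label=field)
ax.legend(title="Field")
```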

What about all the other types of plot?

There are quite a lot of other types of plots! – Photo by Lukas on pexels.com

A very valid question.

Unfortunately, it would be too involved to consider all the available plots from every library. However, I believe that, on the whole, the comparisons made in this article will be relevant regardless of the exact plot you would like to produce.

The only significant feature not considered in this article that I think would warrant further investigation / comparison is the ability to produce multiple plots on the same chart. However, this would take a whole new article to cover effectively. For now, this is considered outside of the scope of what will be discussed.

How to decide success?

Your circumstances will dictate whether the priority is concise coding (i.e. speed and efficiency), or the ability to have fine grained control.

I think both scenarios are important, and therefore the data will be plotted twice by each library.

  • The first plot (a simple plot) – will use the absolute minimum input, just enough to get the data plotted. The idea is to see how well the library is tuned to plot data using its defaults, with minimal input from the user.
  • The second plot (a better plot) – will then be produced with the aim of improving the plot to be more effectively laid out, and provide a better presentation of the data. How much extra code and effort is required to generate a better plot? Is it even possible to achieve the end result you want using that particular library?

Based on these two tests, the aim is to illustrate how proficient each library is out of the box, and then how customisable it can be if required.

It should also allow a judgement to be made on which library might be worth the effort to learn as a great all-rounder.

The Data

For the purposes of this article a completely synthetic dataset has been generated which compares the earnings of three different industries based on work experience and age. The data is tabulated, and has 100 entries. A sample is detailed below:

Data for the article – age [years], field, experience [years], earnings [USD] – Table by Author

Full access to the dataset in csv format is available on GitHub here:

notebooks/datasets/earnings_syn at main · thetestspecimen/notebooks

The data includes smaller numbers (ages), much larger numbers with quite a large range (earnings), and even strings (field). This should provide some degree of challenge to the different libraries in terms of scaling, and appropriate unit conversions (i.e. strings to colours, and point scaling).

Please note that in the upcoming code sections the tabulated data will be held in a Pandas DataFrame called "earnings":

import pandas as pd

earnings = pd.read_csv('earnings.csv')

Reference Notebooks

The full code to load the data and produce all of the plots detailed in this article is available in a Jupyter notebook here:

notebooks/matplotlib_article.ipynb at main · thetestspecimen/notebooks

If you prefer to dive right in then you can also open the notebook in Colab:

Matplotlib

As detailed in the "Plan" section earlier in the article, Matplotlib will be used as a baseline.

Background

Matplotlib has been around since 2003, which at the time of writing makes the library just over 20 years old!

It is fairly ubiquitous in the field of data science, and it would be fair to say that a lot of people have a love-hate relationship with it. It is very flexible, but can also get quite involved.

A quick recommendation

One of the major reasons for confusion (or straight up hate!), at least in my opinion, is that Matplotlib actually has two different interfaces:

  • the "pyplot" interface, which is intended to be simpler to use for quick plots
  • the "Axes" interface, which is much more flexible, and uses an Object Oriented Programming (OOP) approach

The "Axes" interface is actually pretty easy to use, and much more flexible. I would therefore encourage anyone using Matplotlib to just learn the "Axes" interface (also sometimes referred to as an API, or Application Programming Interface).

For more on the differences between the two interfaces (or APIs), please see here:

API Reference – Matplotlib 3.8.2 documentation
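As a rough illustration of the difference between the two interfaces, here is the same trivial scatter written both ways. This is a minimal sketch with made-up x and y values; the Agg backend is selected only so it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt

x = [1, 2, 3, 4]
y = [10, 20, 15, 30]

# pyplot interface: functions act implicitly on the "current" figure and axes
plt.figure()
plt.scatter(x, y)
plt.title("pyplot interface")
plt.xlabel("x")

# Axes interface: create the objects explicitly and call methods on them
fig, ax = plt.subplots()
ax.scatter(x, y)
ax.set_title("Axes interface")
ax.set_xlabel("x")
```

The Axes version keeps a handle (`ax`) on exactly what is being modified, which is what makes multi-legend plots like the one later in this section manageable.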

A simple plot

The simple plot is intended to use the least amount of code needed to get a half-decent plot made.

Here is the outcome with Matplotlib:

import matplotlib.pyplot as plt

# matplotlib cannot map strings to colours automatically, so we define a colour mapping here
colour_map = {'Management':'tab:orange', 'Creative':'tab:red', 'Technical':'tab:grey'}

# s is the marker area in points squared, so the raw earnings values are scaled down by 200
plt.scatter(earnings['age'], earnings['experience'], c=earnings['field'].map(colour_map), s=earnings['earnings']/200)
plt.show()
A very basic Matplotlib plot – Image by Author

Looks pretty, but the reality is that it isn't very good.

There are no legends, no titles, no grid lines, and the colours needed to be mapped manually. Not only that, but the point sizes have had to be scaled down manually, otherwise the plot would be absolutely unreadable due to the enormous point sizes obscuring the whole plot.
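The manual division by 200 works because Matplotlib's `s` argument is a marker area in points squared. A slightly more general alternative, sketched below as a hypothetical helper (`scale_sizes` is not a Matplotlib function), is to map the data linearly onto a fixed size range, similar in spirit to the `sizes=(50, 1500)` argument that seaborn accepts. Any point size legend would then need to invert the same mapping.

```python
import numpy as np


def scale_sizes(values, size_range=(50, 1500)):
    """Hypothetical helper: linearly map data values onto a marker-area range (points**2)."""
    values = np.asarray(values, dtype=float)
    lo, hi = size_range
    span = values.max() - values.min()
    if span == 0:
        # All values identical: give every point the midpoint size
        return np.full_like(values, (lo + hi) / 2)
    # Smallest value maps to lo, largest to hi, everything else in between
    return lo + (values - values.min()) / span * (hi - lo)


sizes = scale_sizes([50000, 175000, 300000])
```

With this mapping, earnings of 50000, 175000 and 300000 become marker areas of 50, 775 and 1500 points squared, which could then be passed as `s=` instead of `earnings['earnings']/200`.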

Let's see what it takes to get something more acceptable.

A better plot

This time the "Axes" OOP interface (or API) will be used.

import matplotlib.pyplot as plt
from sklearn.preprocessing import LabelEncoder

# the c argument expects numbers (or colours), so encode the text labels as integers
field_encoder = LabelEncoder()
field_labels = field_encoder.fit_transform(earnings['field'])

# create the figure
fig, ax = plt.subplots(figsize=(8,5), dpi=100, layout='constrained')
scatter = ax.scatter(earnings['age'], earnings['experience'], c=field_labels, s=earnings['earnings']/200, alpha=0.7, edgecolors='none', cmap="Set1")
ax.set_title('Earnings')
ax.set_xlabel('Age [Years]')
ax.set_ylabel('Experience [Years]')
ax.set_xlim([10, 70])
ax.set_ylim([-5, 55])
ax.grid(visible=True, which='major', axis='both')

# COLOUR LEGEND
# Extract the legend elements automatically generated by matplotlib
# and convert the labels to the correct text labels
handles, labels = scatter.legend_elements()
# the generated labels are formatted strings (e.g. '$\mathdefault{0}$'), so keep only the digits
label_ints = [int(''.join(i for i in x if i.isdigit())) for x in labels]
labels = field_encoder.inverse_transform(label_ints).tolist()
colour_legend = ax.legend(handles=handles, labels=labels, loc="upper left", title="Field", bbox_to_anchor=(0, 0.85))
# add the first legend as an artist so the second ax.legend() call does not replace it
ax.add_artist(colour_legend)

# EARNINGS LEGEND
entries = 6
# func reverses the /200 point scaling and converts to thousands of USD for the labels
kw = {"prop":"sizes", "num":entries, "color":scatter.cmap(0.9), "fmt":"${x:2g}k", "func":lambda s: s*200/1000}
earnings_legend = ax.legend(*scatter.legend_elements(**kw), loc="upper left", mode="expand", ncols=entries, borderpad=1.7, labelspacing=2, handletextpad=1.2)

plt.show()
A clearer Matplotlib scatterplot – Image by Author

That is a significant amount of code for one plot, and the truth is some of it is fairly obscure, and can take some real digging in the docs.

In particular generating both the colour and point size legends took an excessive amount of gymnastics, for a rather unspectacular result.

  • A whole section of code to make sure the colour legend has the correct labels, and is placed sensibly
  • Another section of code to create the point size legend, including a slightly obscure requirement to reverse the point scaling so that the legend points and the plotted data points are scaled consistently

The legend point scaling was also somewhat opaque in terms of guidance in the official documentation, which didn't help a great deal.

This plot will serve as a comparison point as we dive into the other libraries. It is not a requirement to match it exactly, but a comparable level of clarity is required.
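One way to sidestep the label-parsing step for the colour legend, sketched here as an alternative rather than the approach used above, is to draw each field with its own `scatter` call and a `label`, so that `ax.legend()` builds the colour legend itself. The rows are hypothetical placeholders. Note that the legend markers take their size from each group's points, so they may still need adjusting.

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical placeholder rows with the same columns as the article's dataset
earnings = pd.DataFrame({
    "age": [25, 34, 47, 52, 29, 61],
    "experience": [2, 10, 22, 30, 5, 38],
    "field": ["Management", "Creative", "Technical", "Management", "Creative", "Technical"],
    "earnings": [60000, 45000, 120000, 210000, 52000, 150000],
})

colour_map = {"Management": "tab:orange", "Creative": "tab:red", "Technical": "tab:grey"}

fig, ax = plt.subplots(figsize=(8, 5), layout="constrained")
for field, group in earnings.groupby("field"):
    # Each call carries its own text label, so no integer encoding is needed
    ax.scatter(group["age"], group["experience"], s=group["earnings"] / 200,
               color=colour_map[field], alpha=0.7, edgecolors="none", label=field)

colour_legend = ax.legend(title="Field", loc="upper left")
```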

Seaborn

Seaborn is generally considered an excellent library for generating pretty statistical plots quickly and efficiently.

As per the official website:

Seaborn is a Python data visualization library based on matplotlib. It provides a high-level interface for drawing attractive and informative statistical graphics.

seaborn.pydata.org

As stated in the quote above, it is based on Matplotlib, which has some interesting implications.

A simple plot

import matplotlib.pyplot as plt
import seaborn as sns

sns.scatterplot(x="age", y="experience", hue="field", size="earnings", palette="tab10", data=earnings)
plt.show()
A simple scatterplot created by seaborn – Image by Author

That is basically one line to generate the plot. Impressive!

However, there are obvious problems here.

  • the legend actually covers up some of the data
  • the auto-selection of point sizes is not great. It is reasonably difficult to see the difference between the extremes at 300000 and 50000, but the difference between two adjacent levels (50000 vs 100000) is almost impossible to discern
  • the lack of grid lines makes the plot difficult to read
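For the missing grid lines specifically, seaborn does have a native option that avoids Matplotlib calls: selecting one of its built-in styles with `sns.set_theme`. A minimal sketch with hypothetical placeholder rows; note that `set_theme` changes global Matplotlib settings for every plot that follows, not just this one.

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical placeholder rows with the same columns as the article's dataset
earnings = pd.DataFrame({
    "age": [25, 34, 47, 52, 29, 61],
    "experience": [2, 10, 22, 30, 5, 38],
    "field": ["Management", "Creative", "Technical", "Management", "Creative", "Technical"],
    "earnings": [60000, 45000, 120000, 210000, 52000, 150000],
})

# The "whitegrid" style switches grid lines on for all subsequent plots
sns.set_theme(style="whitegrid")

ax = sns.scatterplot(x="age", y="experience", hue="field", size="earnings",
                     palette="tab10", data=earnings)
```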

A better plot

As seaborn is based on Matplotlib, when it is required to start configuring the plot more precisely, it will basically require the same syntax and adjustments as you find in Matplotlib.

To illustrate this, these two blocks of code produce exactly the same plot:

# Option 1: set up the Matplotlib axes first, then plot onto them with seaborn
fig, ax = plt.subplots()
ax.set_title('Earnings')
ax.set_xlabel('Age [Years]')
ax.set_ylabel('Experience [Years]')
ax.set_xlim([10, 70])
ax.set_ylim([-5, 55])
ax.grid(visible=True, which='major', axis='both')

sns.scatterplot(x="age", y="experience", hue="field", size="earnings", sizes=(50, 1500), alpha=0.7, palette="tab10", data=earnings, ax=ax)
plt.show()

# Option 2: plot with seaborn first, then adjust the returned axes
s_plt = sns.scatterplot(x="age", y="experience", hue="field", size="earnings", sizes=(50, 1500), alpha=0.7, palette="tab10", data=earnings)

s_plt.set_title('Earnings')
s_plt.set_xlabel('Age [Years]')
s_plt.set_ylabel('Experience [Years]')
s_plt.set_xlim([10, 70])
s_plt.set_ylim([-5, 55])
s_plt.grid(visible=True, which='major', axis='both')
plt.show()

The first sets up a plot in Matplotlib and then plots "on it" using seaborn. The second produces the plot in seaborn with adjustments using Matplotlib syntax applied afterwards.

The result is identical:

Various methods can create the same plot in seaborn – Image by Author

…and with a little further customisation the plot can be spaced out a little using Matplotlib syntax:

fig, ax = plt.subplots(figsize=(8,5), dpi=100, layout='constrained')
ax.set_title('Earnings')
ax.set_xlabel('Age [Years]')
ax.set_ylabel('Experience [Years]')
ax.set_xlim([10, 70])
ax.set_ylim([-5, 55])
ax.grid(visible=True, which='major', axis='both')

sns.scatterplot(x="age", y="experience", hue="field", size="earnings", sizes=(50, 1500), alpha=0.7, palette="tab10", data=earnings, ax=ax)
plt.show()
The final seaborn scatterplot – Image by Author

As seaborn is based on Matplotlib, it is possible to tailor the plots using exactly the same methods as in Matplotlib. However, due to the way some items are rendered, it may not be as easy to fully tailor the plot.

For example, in the plot above the legend is rendered as a single "legend item" (i.e. the 'field' and 'earnings' legends are combined). I couldn't find an easy way to fully tailor the legend to avoid the overlapping earnings circles. (The overlap is caused by scaling the point sizes up to a more appropriate size on the main graph.)

I'm not saying it is impossible, as I am aware of how to do this in a pure Matplotlib plot with separate legends. However, it certainly isn't obvious how a more precise customisation of the combined legend could be achieved in this instance. (If someone has the solution, please let me know! I'm sure there must be one.)
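One possible approach, which I have not checked against the exact figure above, is `sns.move_legend` (available in seaborn 0.11.2 and later). It rebuilds the combined legend at a new location and passes extra keyword arguments through to Matplotlib's legend, so `labelspacing` can be used to pull the large earnings circles apart. The placeholder rows and the specific values chosen are assumptions.

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical placeholder rows with the same columns as the article's dataset
earnings = pd.DataFrame({
    "age": [25, 34, 47, 52, 29, 61],
    "experience": [2, 10, 22, 30, 5, 38],
    "field": ["Management", "Creative", "Technical", "Management", "Creative", "Technical"],
    "earnings": [60000, 45000, 120000, 210000, 52000, 150000],
})

fig, ax = plt.subplots(figsize=(8, 5), layout="constrained")
sns.scatterplot(x="age", y="experience", hue="field", size="earnings",
                sizes=(50, 1500), alpha=0.7, palette="tab10", data=earnings, ax=ax)

# Re-create the legend outside the axes; labelspacing is forwarded to Matplotlib's legend
sns.move_legend(ax, "upper left", bbox_to_anchor=(1.02, 1), labelspacing=2)
```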

Regardless, due to the reliance on Matplotlib the solution is likely to be quite involved, which is not ideal.

More crossover from Matplotlib

There are other areas where crossover from Matplotlib is less than ideal.

Essentially, to configure seaborn plots effectively you need some proficiency in Matplotlib.

For example, it is also possible to produce a scatterplot using "relplot", another plotting function available in seaborn.

However, the methods for even simple things, like adding a title, are not the same as those in "scatterplot".

relplot = sns.relplot(x="age", y="experience", hue="field", size="earnings", kind='scatter', sizes=(50, 1500), alpha=0.7, palette="tab10", height=5, aspect=1.6, data=earnings)
relplot.fig.suptitle('Earnings') # this line uses a different method to 'scatterplot'
relplot.set_xlabels('Age [Years]')
relplot.set_ylabels('Experience [Years]')
plt.show()
A scatterplot showing the use of the "relplot" method – Image by Author

This is basically a hangover from Matplotlib's structure: relplot is "figure-level", whereas scatterplot is "axes-level". I won't bore you with exactly what that means, but suffice to say you have to interact with them differently, which can get quite confusing.
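For anyone who does want a little of that detail, the practical difference shows up in what each function returns. A minimal sketch, using hypothetical placeholder rows: the axes-level `scatterplot` returns a Matplotlib `Axes`, while the figure-level `relplot` returns a seaborn `FacetGrid` that wraps a whole figure (reachable through `.figure` in seaborn 0.11.2 and later; the `.fig` attribute used above also still works).

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.axes
import pandas as pd
import seaborn as sns

# Hypothetical placeholder rows with the same columns as the article's dataset
earnings = pd.DataFrame({
    "age": [25, 34, 47, 52, 29, 61],
    "experience": [2, 10, 22, 30, 5, 38],
    "field": ["Management", "Creative", "Technical", "Management", "Creative", "Technical"],
    "earnings": [60000, 45000, 120000, 210000, 52000, 150000],
})

# Axes-level: draws onto one Matplotlib Axes and returns it
ax = sns.scatterplot(x="age", y="experience", hue="field", data=earnings)
ax.set_title("Earnings")               # an Axes method

# Figure-level: builds its own figure and returns a FacetGrid
g = sns.relplot(x="age", y="experience", hue="field", kind="scatter", data=earnings)
g.figure.suptitle("Earnings")          # a Figure method, reached through the grid
g.ax.set_title("Earnings (Axes)")      # the single underlying Axes is also available
```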

There is slightly more customisation available natively in relplot than in scatterplot (namely, plot height and aspect ratio in this particular instance), which I suppose is the general idea behind offering it. However, I think the confusion it generates detracts from the overall experience of using the seaborn library.


All in all, in my opinion, although a simple plot is much easier to generate, customisation can be more confusing, and maybe even more difficult / frustrating, than working directly in Matplotlib.

plotnine

plotnine is based on ggplot2, a plotting library written in the R language. The library uses the "Grammar of Graphics" philosophy.

Grammar of Graphics – a general scheme for data visualization which breaks up graphs into semantic components such as scales and layers.

wikipedia.org

ggplot2 has an excellent reputation (both within the R community and beyond) for producing elegant plots from simple, intuitive inputs. plotnine is also an excellent and faithful Python implementation of the R original.

…and guess what the plotting backend of plotnine utilises. Yeah, Matplotlib!

However, even though the backend is based on Matplotlib (just like seaborn), the coding syntax is completely different from both Matplotlib and seaborn.

A simple plot

Let's look at the first simple example to see how different the syntax really is:

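from plotnine import aes, geom_point, ggplot

# in a Jupyter notebook, the expression below renders as the cell output
(ggplot(earnings)
 + aes(x='age', y='experience', color='field', size='earnings')
 + geom_point()
)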
The simple scatterplot by plotnine is actually quite accomplished – Image by Author

Incredibly simple and intuitive inputs (data + plot type). Almost like building blocks.

The result is also by far and away the most impressive so far considering the very minimal code input. The plot is:

  • clear and well scaled (including the x-axis not extending to zero)
  • axis names and legend names are auto generated
  • automatically added major and minor grids
  • legends are clear and well positioned

The only negative is that the 'earnings' legend could perhaps have a few more levels, and the point sizes could be slightly larger. However, these really are minor nit-picks.

A better plot

Considering the excellent starting point, let's see how easy it is to tailor the graph.

from plotnine import (aes, element_rect, element_text, geom_point, ggplot, labs,
                      scale_color_brewer, scale_size_continuous, scale_x_continuous,
                      scale_y_continuous, theme, theme_light)

(ggplot(earnings)
 + aes(x='age', y='experience', color='field', size='earnings')
 + geom_point(alpha=0.7)
 + scale_x_continuous(limits = (20, 65))
 + scale_y_continuous(limits = (0, 40))
 + labs(title='Earnings', x='Age [Years]', y='Experience [Years]', color='Field', size='Earnings')
 + scale_color_brewer(type="qual", palette="Set1")
 + theme_light()
 + theme(title = element_text(hjust = 0.5), figure_size = (8, 5), legend_key = element_rect(color = "White"))
 + scale_size_continuous(breaks = (50000, 100000, 150000, 200000, 250000, 300000),
                         labels = ("$50k","$100k","$150k","$200k","$250k","$300k"),
                         range = (2,22),
                         limits = (8000,310000))
)
A more refined plot generated by plotnine – Image by Author

Although the code required seems significantly longer than the simple version, in reality it is logical and very easy to follow. Just "add" the element changes that are required. A simple modular approach that I think suits plot building very well.


The result is arguably the best-looking plot yet. There are no overlapping legend points as in the seaborn plot, even though the point sizes were increased, and it certainly doesn't require the coding gymnastics needed in native Matplotlib.

That probably explains the well-deserved reputation that ggplot2 and its "Grammar of Graphics" philosophy have.

The only negative I can think of is that if you wanted to make some very specific layout changes, you would probably eventually hit a limit of what is possible. For example, could I move the legends to be laid out exactly as they are in the Matplotlib example? I seriously doubt it.

Ultimately, plotnine cannot, by definition, be more flexible than Matplotlib, as it is based on it.

Vega-Altair

Vega-Altair is a great library for simple intuitive plotting, and one of the few that are not based on Matplotlib.

This is also the first library in this article that has the added advantage of being interactive, if you need it to be.

A simple plot

import altair as alt

alt.Chart(earnings).mark_circle().encode(
    x='age',
    y='experience',
    color='field',
    size='earnings',
)
A simple scatterplot output by Vega-Altair – Image by Author

An impressive start. I would say this is roughly on par with plotnine:

  • the initial input required is just as terse as plotnine
  • the number of auto-generated legend entries is better than plotnine (i.e. there are more levels, which I think is more appropriate)
  • the range of point scales is on par with (or slightly better than?) plotnine (i.e. small to big is a wider range, and easier to differentiate), and definitely an improvement over seaborn
  • auto generated axis and legend titles

It does lack appropriate x-axis scaling (i.e. it unnecessarily starts at zero), and the point sizes could generally be larger.

All things considered I think plotnine does a better job for the simple graph, but it is close.

A better plot

Let's step things up a bit.

alt.Chart(earnings, title="Earnings").mark_circle().encode(
    alt.X('age', axis=alt.Axis(title='Age [Years]', tickMinStep = 5, tickCount=11), scale=alt.Scale(domain=(15,70))),
    alt.Y('experience', axis=alt.Axis(title='Experience [Years]', tickMinStep = 5), scale=alt.Scale(domain=(-5,45))),
    color='field',
    size=alt.Size('earnings', scale=alt.Scale(range=[50, 3000], domain=[8000, 308000])),
    tooltip = [alt.Tooltip('earnings')]
).properties(
    width=700,
    height=400
)
A more refined scatterplot generated by Vega-Altair – Image by Author

This is a different approach from plotnine's, but comparably compact code all the same. Adding a little more customisation is also relatively easy and intuitive, and adding an interactive element is straightforward:

tooltip = [alt.Tooltip('earnings')]
An example of the interactive elements of Vega-Altair – Video by Author

However, there were a few quirks that I think are a negative.

For example, setting "tickMinStep=5" causes the grid lines to be out of sync with the ticks. The only solution is to explicitly set the "tickCount" parameter. This seems an unnecessary requirement, and should be automatic.

I also see no easy way to customise the legend label text for the earnings legend. Although this doesn't really matter a great deal in this case, it would be a good feature to have. (Again if anybody knows how this is achieved please let me know!)

Overall it is a very impressive library, and should definitely be a consideration, especially if interactivity is a requirement.

Plotly

I don't really consider Plotly to be a serious consideration as far as static plotting is concerned, as it is predominantly meant for interactivity, and building dashboards. However, it is a very popular library, so I thought I would at least mention it.

A simple plot

You can quite easily produce a simple plot.

import plotly.express as px

fig = px.scatter(earnings, x="age", y="experience", color="field",
                 size='earnings',hover_data=['earnings'])
fig.show()
A basic scatterplot generated by Plotly – Image by Author

Very respectable as a simple chart. Missing a legend for the point scales, but other than that easily as good as any other library that has been mentioned. With added interactivity:

Interactivity with Plotly is quite extensive, as can be seen by the data available using tooltips – Video by Author

However, at least in my experience, significant customisations to the appearance of the plot are generally complicated and quite involved. The documentation in this area is also lacking, in my opinion.

I don't really see this as a failing of the library, as that isn't what it was designed, or created, for. If you need to generate interactive plots or dashboards it should definitely be on your list.

Discussion

Photo by fauxels on pexels.com

So, is Matplotlib still the best library for static plots?

As with many things, this depends entirely on your requirements. If you have very specific needs, or like to be able to precisely configure every element of your plot, then I would argue Matplotlib is still far and away the single best library available for plotting in the world of Python.

Matplotlib is built in such a way that it is seemingly extensible to the extreme, as long as you are willing to delve deep into an extensive set of documentation. This extensibility is likely due in part to the object-oriented approach it has adopted, a tried and tested method in the software development industry.

It also speaks volumes that two of the other libraries on this list are built on Matplotlib, which suggests it really is a very solid and reliable library.

However, more often than not in the field of data science, such exacting standards are not necessary, as long as the result is clear, interpretable, and perhaps somewhat pretty.

I would therefore argue that on the whole, it is better to consider one of the other libraries featured in this article, as they all reach acceptable levels of precision without resorting to coding gymnastics.

Seaborn

Seaborn scatterplot – Image by Author

A great option if you are already familiar with Matplotlib and want pretty, easy-to-produce plots for basic circumstances.

However, if you have no knowledge of Matplotlib at all, then maybe give this one a miss if you require even slight customisations…or learn Matplotlib first!

A great sister library to Matplotlib, but certainly not a replacement.

plotnine

plotnine scatterplot – Image by Author

In my opinion, the ultimate winner in terms of fast, accurate, pretty and simple to construct static plots.

The syntax may be different from anything you have used before (unless you are porting over from R), but it is so intuitive that I don't think it really matters.

Unless you need the extreme levels of customisation offered by Matplotlib, it is a solid choice for most people.

Vega-Altair

Vega-Altair scatterplot – Image by Author

A close second to plotnine. On par in terms of usability and simplicity, but had some unnecessary quirks that set it back a little.

However, if some interactivity is required, this would be a solid choice as an accomplished all-rounder to have in your arsenal.

Plotly

Plotly scatterplot with interactivity – Image by Author

As previously mentioned, I think Plotly is out of the running for static plotting, but a serious consideration if interactivity and dashboards are required.

Conclusion

Matplotlib scatterplot – Image by Author

Well there you have it.

If you need a static plotting library I think it will be hard to beat plotnine. Simple, readable code, pretty plots, and feature rich.

That is unless you need the ultimate in extensibility, in which case Matplotlib is indeed still the master!


A final note

I'm aware there are other plotting libraries out there. If you think that there might be a contender in the static plots realm I missed, please let me know which one and why. I'm always happy to pick up new tools.

