5 Steps to Build Beautiful Bar Charts with Python

Motivation
Telling a compelling story with data gets way easier when the charts supporting this very story are clear, self-explanatory and visually pleasing to the audience.
In many cases, substance and form are equally important. Great data poorly presented will not get the attention it deserves, while poor data presented in a slick way will easily be discredited.
Matplotlib makes it quick and easy to plot data with off-the-shelf functions but the fine tuning steps take more effort. I spent quite some time researching best practices to build compelling charts with Matplotlib, so you don't have to.
In this article I focus on bar charts and explain how I stitched together the bits of knowledge I found here and there to go from this…

… to that:

0 The Data
To illustrate the methodology, I used a public dataset on airline delays for US domestic flights in 2008:
2008, "Data Expo 2009: Airline on time data", https://doi.org/10.7910/DVN/HG7NV7, Harvard Dataverse, V1 Public domain CC0 1.0
After importing the necessary packages to read the data and build our graphs, I simply grouped the data by month and calculated the mean arrival delay, using the code below:
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib as mpl
from matplotlib.ticker import MaxNLocator
df = pd.read_csv('DelayedFlights.csv')
df = df[['Month', 'ArrDelay']] # Let's only keep the columns useful to us
df = df[~df['ArrDelay'].isnull()] # Get rid of cancelled and diverted flights
# Group by Month and get the mean
delay_by_month = df.groupby(['Month']).mean()['ArrDelay'].reset_index()
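If you want to follow along without downloading the CSV, a small synthetic frame with the same two columns is enough to exercise the grouping step. This is only a sketch: the values below are invented for illustration and are not taken from the dataset, and `as_index=False` is simply another way to keep `Month` as a regular column.

```python
import pandas as pd

# Made-up stand-in for DelayedFlights.csv: same column names, invented values
df = pd.DataFrame({
    'Month':    [1, 1, 2, 2, 2, 3],
    'ArrDelay': [10.0, 30.0, 5.0, None, 15.0, 40.0],
})

# Drop rows without an arrival delay, as in the article
df = df[df['ArrDelay'].notna()]

# Mean delay per month, keeping Month as a regular column
delay_by_month = df.groupby('Month', as_index=False)['ArrDelay'].mean()
print(delay_by_month)
```

The resulting frame has one row per month, which is the shape every plotting step in this article expects.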
The dataset used throughout the article to build the different versions of the bar chart is as follows:

1 The Basic Plot
To be fair, with two lines of code you can already build a bar chart and get some basic insights from it. Admittedly this chart is not the most beautiful, nor is it the most useful as key information is lacking, but you can already tell that traveling in December will likely result in a delayed flight.
# Create the figure and axes objects, specify the size and the dots per inch
fig, ax = plt.subplots(figsize=(13.33,7.5), dpi = 96)
# Plot bars
bar1 = ax.bar(delay_by_month['Month'], delay_by_month['ArrDelay'], width=0.6)

2 The Essentials
Let's add a few vital things to our chart to make it more readable to the audience.
- Grids: Grid lines are essential to a chart's readability. Their transparency is set to 0.5 so they don't interfere too much with the data points.
- X-axis and Y-axis reformatting: I deliberately added more parameters than necessary here to give a more comprehensive view of the fine-tuning possibilities. For example, the x-axis did not need a major_formatter and a major_locator object as we are only setting up labels, but if your x-axis holds numeric values, then these can come in handy.
- Bar labels: Bar labels are added on top of each bar to make the comparison between close data points easier and give more details about their actual values.
# Create the grid
ax.grid(which="major", axis='x', color='#DAD8D7', alpha=0.5, zorder=1)
ax.grid(which="major", axis='y', color='#DAD8D7', alpha=0.5, zorder=1)
# Reformat x-axis label and tick labels
ax.set_xlabel('', fontsize=12, labelpad=10) # No need for an axis label
ax.xaxis.set_label_position("bottom")
ax.xaxis.set_major_formatter(lambda s, i : f'{s:,.0f}')
ax.xaxis.set_major_locator(MaxNLocator(integer=True))
ax.xaxis.set_tick_params(pad=2, labelbottom=True, bottom=True, labelsize=12, labelrotation=0)
labels = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']
ax.set_xticks(delay_by_month['Month'], labels) # Map integers numbers from the series to labels list
# Reformat y-axis
ax.set_ylabel('Delay (minutes)', fontsize=12, labelpad=10)
ax.yaxis.set_label_position("left")
ax.yaxis.set_major_formatter(lambda s, i : f'{s:,.0f}')
ax.yaxis.set_major_locator(MaxNLocator(integer=True))
ax.yaxis.set_tick_params(pad=2, labelleft=True, left=False, labelsize=12)
# Add label on top of each bar
ax.bar_label(bar1, labels=[f'{e:,.1f}' for e in delay_by_month['ArrDelay']], padding=3, color='black', fontsize=8)
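For a numeric x-axis, the formatter and locator pair does real work. The sketch below uses invented values in the thousands (not from the dataset) and a hypothetical helper named `thousands`. It wraps the helper in `FuncFormatter`, which should also work on older Matplotlib versions that do not accept a bare function.

```python
import matplotlib.pyplot as plt
from matplotlib.ticker import FuncFormatter, MaxNLocator

def thousands(value, pos):
    """Format a tick value with a thousands separator and no decimals."""
    return f'{value:,.0f}'

fig, ax = plt.subplots(figsize=(6, 3))
ax.bar([1000, 2000, 3000], [2.5, 4.0, 3.2], width=600)  # invented data

ax.xaxis.set_major_locator(MaxNLocator(integer=True))  # ticks on integers only
ax.xaxis.set_major_formatter(FuncFormatter(thousands))

fig.canvas.draw()  # force a draw so the tick labels are populated
print([t.get_text() for t in ax.get_xticklabels()])
```

Swapping the format string (for example `'{value:.1%}'` for percentages) is usually all it takes to adapt this to other kinds of figures.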

3 The Professional Look
Adding a few more features to our graph will make it look way more professional. They apply to any graph (not only bar charts) and are independent of the data we are using in this article. Thanks to the code snippet below, these adjustments take little to no effort to implement. My advice: save it and re-use it at will. You can tweak the settings to create your own visual identity.
- Spines: The spines make up the box visible around the graph. They are removed, except for the right one which is set to be a bit thicker.
- Red line and rectangle on top: A red line and rectangle are added above the title to nicely isolate the graph from the text above it.
- Title and subtitle: What is a graph without a title to introduce it? The subtitle can be used to further explain the content or even present a first conclusion.
- Source: A must-have, in all charts ever produced.
- Margin adjustments: The margins surrounding the graph area are adjusted to make sure all the space available is used.
- White background: Setting a white background (instead of the transparent default) will be useful when sending the chart via email, Teams or any other tool where a transparent background can be problematic.
# Remove the spines
ax.spines[['top','left','bottom']].set_visible(False)
# Make the right spine thicker
ax.spines['right'].set_linewidth(1.1)
# Add in red line and rectangle on top
ax.plot([0.12, .9], [.98, .98], transform=fig.transFigure, clip_on=False, color='#E3120B', linewidth=.6)
ax.add_patch(plt.Rectangle((0.12,.98), 0.04, -0.02, facecolor='#E3120B', transform=fig.transFigure, clip_on=False, linewidth = 0))
# Add in title and subtitle
ax.text(x=0.12, y=.93, s="Average Airlines Delay per Month in 2008", transform=fig.transFigure, ha='left', fontsize=14, weight='bold', alpha=.8)
ax.text(x=0.12, y=.90, s="Difference in minutes between scheduled and actual arrival time averaged over each month", transform=fig.transFigure, ha='left', fontsize=12, alpha=.8)
# Set source text
ax.text(x=0.1, y=0.12, s="Source: Kaggle - Airlines Delay - https://www.kaggle.com/datasets/giovamata/airlinedelaycauses", transform=fig.transFigure, ha='left', fontsize=10, alpha=.7)
# Adjust the margins around the plot area
plt.subplots_adjust(left=None, bottom=0.2, right=None, top=0.85, wspace=None, hspace=None)
# Set a white background
fig.patch.set_facecolor('white')
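Since these adjustments do not depend on the data, one way to re-use them is to wrap them in a function. The following is only a sketch: the name `apply_house_style` and its parameters are my own invention, the coordinates are the ones used above, and the bar values are placeholders.

```python
import matplotlib.pyplot as plt
from matplotlib.colors import to_hex
from matplotlib.patches import Rectangle

def apply_house_style(fig, ax, title, subtitle, source, accent='#E3120B'):
    """Apply spines, accent line, title, subtitle, source, margins and background."""
    # Keep only the right spine and make it slightly thicker
    for side in ('top', 'left', 'bottom'):
        ax.spines[side].set_visible(False)
    ax.spines['right'].set_linewidth(1.1)

    # Accent line and small rectangle above the title (figure coordinates)
    ax.plot([0.12, 0.9], [0.98, 0.98], transform=fig.transFigure,
            clip_on=False, color=accent, linewidth=0.6)
    ax.add_patch(Rectangle((0.12, 0.98), 0.04, -0.02, facecolor=accent,
                           transform=fig.transFigure, clip_on=False, linewidth=0))

    # Title, subtitle and source, also in figure coordinates
    ax.text(0.12, 0.93, title, transform=fig.transFigure, ha='left',
            fontsize=14, weight='bold', alpha=0.8)
    ax.text(0.12, 0.90, subtitle, transform=fig.transFigure, ha='left',
            fontsize=12, alpha=0.8)
    ax.text(0.1, 0.12, source, transform=fig.transFigure, ha='left',
            fontsize=10, alpha=0.7)

    # Margins around the plot area and a white (non-transparent) background
    fig.subplots_adjust(bottom=0.2, top=0.85)
    fig.patch.set_facecolor('white')

fig, ax = plt.subplots(figsize=(13.33, 7.5), dpi=96)
ax.bar([1, 2, 3], [4.0, 6.5, 5.2], width=0.6)  # placeholder values
apply_house_style(fig, ax, "Chart title", "Chart subtitle", "Source: placeholder")
```

Exposing the accent colour as a parameter is a design choice that makes it easy to adapt the look to another visual identity.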

4 The Color Gradient
The graph as we left it in the previous section is neat and tidy and ready to be included in a presentation. Playing with the color of the bars and adding a gradient to better visualize the variations is not essential but will add an appealing feature to it.
This use case does not necessarily have the best documentation online but is actually not too hard to implement with the LinearSegmentedColormap and Normalize classes of Matplotlib.
# Colours - Choose the extreme colours of the colour map
colours = ["#2196f3", "#bbdefb"]
# Colormap - Build the colour maps
cmap = mpl.colors.LinearSegmentedColormap.from_list("colour_map", colours, N=256)
norm = mpl.colors.Normalize(delay_by_month['ArrDelay'].min(), delay_by_month['ArrDelay'].max()) # linearly normalizes data into the [0.0, 1.0] interval
# Plot bars
bar1 = ax.bar(delay_by_month['Month'], delay_by_month['ArrDelay'], color=cmap(norm(delay_by_month['ArrDelay'])), width=0.6, zorder=2)
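To see what the two objects do on their own, here is a small sketch with invented values (not from the dataset): `Normalize` turns each value into a position between 0 and 1, and the colour map turns that position into a colour between the two extremes.

```python
from matplotlib.colors import LinearSegmentedColormap, Normalize, to_hex

colours = ["#2196f3", "#bbdefb"]  # same extremes as in the article
cmap = LinearSegmentedColormap.from_list("colour_map", colours, N=256)

values = [10.0, 20.0, 30.0]  # invented values
norm = Normalize(min(values), max(values))

for value in values:
    fraction = float(norm(value))  # position of the value in [0, 1]
    rgba = cmap(fraction)          # colour at that position
    print(value, fraction, to_hex(rgba))
```

With this ordering, the smallest value gets the first colour of the list and the largest value gets the last one, so reversing the list flips the direction of the gradient.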

5 The Final Touch
To get to the end result, introduced at the beginning of the article, the only thing left to do is to implement these few extra components:
- Average data line: Showing the average data line on the graph is a useful way to help the audience quickly figure out what is going on.
- Second color scale: Thanks to a second color scale we highlight data above average (or any threshold) to make the visualization easier to grasp in a short amount of time.
- Legend: When we added a second color scale, we introduced the need for a legend on our chart.
# Find the average data point and split the series in 2
average = delay_by_month['ArrDelay'].mean()
below_average = delay_by_month[delay_by_month['ArrDelay'] < average]
above_average = delay_by_month[delay_by_month['ArrDelay'] >= average]
# Colours - Choose the extreme colours of the colour map
colors_high = ["#ff5a5f", "#c81d25"] # Extreme colours of the high scale
colors_low = ["#2196f3","#bbdefb"] # Extreme colours of the low scale
# Colormap - Build the colour maps
cmap_low = mpl.colors.LinearSegmentedColormap.from_list("low_map", colors_low, N=256)
cmap_high = mpl.colors.LinearSegmentedColormap.from_list("high_map", colors_high, N=256)
norm_low = mpl.colors.Normalize(below_average['ArrDelay'].min(), average) # linearly normalizes data into the [0.0, 1.0] interval
norm_high = mpl.colors.Normalize(average, above_average['ArrDelay'].max())
# Plot bars and average (horizontal) line
bar1 = ax.bar(below_average['Month'], below_average['ArrDelay'], color=cmap_low(norm_low(below_average['ArrDelay'])), width=0.6, label='Below Average', zorder=2)
bar2 = ax.bar(above_average['Month'], above_average['ArrDelay'], color=cmap_high(norm_high(above_average['ArrDelay'])), width=0.6, label='Above Average', zorder=2)
ax.axhline(y=average, color='grey', linewidth=3)
# Determine the y-limits of the plot
ymin, ymax = ax.get_ylim()
# Convert the average to an axes fraction and nudge the label just above the line
y_pos = (average - ymin) / (ymax - ymin) + 0.03
# Annotate the average line
ax.text(0.88, y_pos, f'Average = {average:.1f}', ha='right', va='center', transform=ax.transAxes, size=8, zorder=3)
# Add legend
ax.legend(loc="best", ncol=2, bbox_to_anchor=[1, 1.07], borderaxespad=0, frameon=False, fontsize=8)
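Rather than converting the average to an axes fraction by hand, the label can also be positioned with a blended transform. The sketch below uses invented values and relies on `ax.get_yaxis_transform()`, where x is read as an axes fraction and y as a data value. It is an alternative I am suggesting here, not the approach used for the charts in this article.

```python
import matplotlib.pyplot as plt

fig, ax = plt.subplots(figsize=(6, 3))
values = [12.0, 30.0, 45.0, 21.0]  # invented values
ax.bar(range(1, len(values) + 1), values, width=0.6, zorder=2)

average = sum(values) / len(values)
ax.axhline(y=average, color='grey', linewidth=3)

# x is an axes fraction (0 = left edge, 1 = right edge); y is a data value
label = ax.text(0.88, average, f'Average = {average:.1f}',
                transform=ax.get_yaxis_transform(),
                ha='right', va='bottom', size=8, zorder=3)
```

Because y is expressed in data units, the label stays attached to the line even if the y-limits change later on.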

6 Final Thoughts
The intent of this article was to share the knowledge gathered here and there to build a more compelling bar chart using Matplotlib. I tried to make it as practical as possible with re-usable code snippets.
I am sure there are other adjustments to be made that I did not think of. If you have any improvement ideas, feel free to comment and make this article more useful to all!
This article only focused on bar charts. Stay tuned for more!
Thanks for reading all the way to the end of the article! Feel free to leave a message below, or reach out to me through LinkedIn if you have any questions / remarks!