Visualize Data Ranges with Matplotlib

Plotting discrete data is straightforward; representing ranges of data is more involved. Fortunately, Python's matplotlib library has a built-in function, fill_between(), that lets you easily visualize data ranges. In this Quick Success Data Science project, we'll use it to benchmark the National Oceanic and Atmospheric Administration's annual hurricane outlook.
The Dataset
Every May, NOAA releases its "Atlantic Hurricane Outlook" report for the June-November hurricane season. These outlooks include predicted ranges for named storms, hurricanes, and major hurricanes (defined as Category 3 and higher). You can find an example report for 2021 here [1]. NOAA/National Weather Service data is provided by the US government as open data, free to use for any purpose.
In order to benchmark the accuracy of these forecasts, we'll use the annual hurricane season summaries provided by Wikipedia. These summaries provide the actual number of storms and hurricanes for each year. You can find the 2021 season entry here [2]. Wikipedia pages are provided under a CC BY-SA 4.0 license.
Wikipedia also includes lists for _La Niña and El Niño_ events [3][4]. These represent weather patterns that occur in the Pacific Ocean every few years. During La Niña years, the water in the eastern Pacific is colder than normal, cooling the air above it. The opposite occurs in El Niño years.
The La Niña pattern favors stronger hurricane activity in the Atlantic basin while El Niño suppresses hurricane development [5]. To check this, we'll also color-code our plot for these events.
For convenience, I've already compiled all this information for the years 2001–2022 and stored it as a CSV file in this Gist.
NOAA issues an updated hurricane forecast every August, so you need to take care when selecting data and referencing predictions. We'll be using the May outlooks.
Installing Libraries
We'll use pandas for data handling and matplotlib for plotting. Install them with either:
conda install matplotlib pandas
or
pip install matplotlib pandas
The Code
The following code was written in JupyterLab and is described by cell.
Importing Modules
Besides performing data analysis and plotting, we're going to make a custom marker to represent a hurricane. To do this, we'll need to import NumPy, Python's numerical analysis package, and a matplotlib module known as mpath, used for working with polylines.
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.path as mpath
import pandas as pd
Loading the Dataset
The CSV file contains low and high values for both predicted hurricanes (H) and major hurricanes (MH). It also includes the actual number of hurricanes and major hurricanes, and whether or not the season fell in a La Niña or El Niño event. A transitional year is labeled as a "Weak Event."
df = pd.read_csv('https://bit.ly/44YgahT')
df.head(3)

Defining a Function to Draw a Hurricane Marker
While we could use a simple circle to post the actual number of hurricanes on our scatter plot, wouldn't a classic hurricane icon look so much better?
Unfortunately, matplotlib doesn't come with a hurricane marker. However, code to draw a hurricane marker was provided in a Stack Overflow answer which I've reproduced below (Stack Overflow content is cc-wiki licensed) [6].
This function uses matplotlib's mpath module to build and return a Path object representing a series of line and curve segments. How this code works isn't important for this project, but if you want to see a detailed explanation, visit the Stack Overflow link at the start of the snippet.
# The following code was adapted from Stack Overflow:
# https://stackoverflow.com/questions/44726675/custom-markers-using-python-matplotlib
# Asked by: https://stackoverflow.com/users/5689281/kushal
# Answered by: https://stackoverflow.com/users/4124317/importanceofbeingernest
def get_hurricane_symbol():
    """Return a hurricane warning symbol as a matplotlib path."""
    # Create a numpy array for the symbol's coordinates and codes:
    coordinates = np.array([[2.444, 7.553],
                            [0.513, 7.046],
                            [-1.243, 5.433],
                            [-2.353, 2.975],
                            [-2.578, 0.092],
                            [-2.075, -1.795],
                            [-0.336, -2.870],
                            [2.609, -2.016]])
    # Shift the x-coordinates:
    coordinates[:, 0] -= 0.098
    # Define path codes:
    codes = [1] + [2] * (len(coordinates) - 2) + [2]
    # Duplicate and reverse the coordinates:
    coordinates = np.append(coordinates, -coordinates[::-1], axis=0)
    # Duplicate the codes:
    codes += codes
    # Create and return the matplotlib path:
    return mpath.Path(3 * coordinates, codes, closed=False)
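If you're curious about the integer path codes, here is a minimal sketch (not part of the Stack Overflow answer) that builds a simple diamond marker the same way. It relies on matplotlib's Path class exposing named constants for these codes, with MOVETO equal to 1 and LINETO equal to 2. The diamond shape and the name get_diamond_symbol are illustrative assumptions.

```python
import numpy as np
import matplotlib.path as mpath


def get_diamond_symbol():
    """Return a diamond-shaped matplotlib path (illustrative only)."""
    # Vertices traced clockwise, ending back at the starting point:
    vertices = np.array([[0.0, 1.0],
                         [1.0, 0.0],
                         [0.0, -1.0],
                         [-1.0, 0.0],
                         [0.0, 1.0]])
    # Move the "pen" to the first vertex, then draw lines to the rest:
    codes = [mpath.Path.MOVETO] + [mpath.Path.LINETO] * (len(vertices) - 1)
    return mpath.Path(vertices, codes)


diamond = get_diamond_symbol()
```

Passing marker=diamond to plt.plot() or plt.scatter() should work the same way as passing the hurricane symbol.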
Plotting Actual Hurricanes vs. Predicted Hurricanes
The code below uses the matplotlib [fill_between()](https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.fill_between.html) method to capture NOAA's predicted number of hurricanes for each year. It requires the DataFrame column name for the x argument, a minimum value for y1, and a maximum value for y2. Adding a label argument ensures that the range shading will be referenced in the legend.
# Call the function to build the hurricane marker:
symbol = get_hurricane_symbol()
# Initialize the figure:
plt.figure(figsize=(10, 4))
# Plot the actual number of hurricanes per year:
plt.plot(df.Year, df['Actual H'],
         label='Actual Value',
         marker=symbol,
         markersize=17,
         c='darkblue',
         linestyle='None',
         lw=1)
# Shade NOAA's predicted range of hurricanes for each year:
plt.fill_between(x=df.Year,
                 y1=df['Predicted H Low'],
                 y2=df['Predicted H High'],
                 alpha=0.3,
                 label='Predicted Range')
plt.xlabel('Year')
plt.ylabel('Number of Hurricanes')
plt.legend(loc='lower right')
plt.grid(True, c='lightgrey', alpha=0.5)
plt.title('Actual Number of Atlantic Hurricanes vs. NOAA May Prediction (2001-2022)');
# Optional code to save the figure:
# plt.savefig('range_plot.png', bbox_inches='tight', dpi=600)

This simple yet elegant plot is filled with useful information. For instance, over the last 22 years, the actual number of hurricanes has landed within the predicted ranges 11 times. This is the same accuracy as flipping a coin. Lately, NOAA has started using wider ranges, which increases accuracy but decreases precision.
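A count like "11 of 22" can be checked with a boolean mask rather than by reading the chart. The sketch below shows the idea on a tiny DataFrame of made-up placeholder values that reuses the column names from the CSV file. It assumes that a value sitting exactly on the low or high boundary counts as a hit, which may not match how the figure above was tallied.

```python
import pandas as pd

# Made-up placeholder values, NOT the NOAA data:
toy = pd.DataFrame({'Year': [2001, 2002, 2003, 2004],
                    'Predicted H Low': [5, 6, 6, 6],
                    'Predicted H High': [7, 8, 9, 8],
                    'Actual H': [9, 4, 7, 8]})

# True where the actual count lies inside the predicted range (inclusive):
in_range = ((toy['Actual H'] >= toy['Predicted H Low'])
            & (toy['Actual H'] <= toy['Predicted H High']))

hits = int(in_range.sum())
print(f'{hits} of {len(toy)} seasons fell within the predicted range')
```

Swapping toy for the DataFrame loaded earlier should give the hit count for the real outlooks.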
Changing the Fill Style
I really like the previous fill style for the predicted range, but there are alternatives. In this example, we pass a step argument to the fill_between() method. Now, instead of a continuous polygon, we get discrete vertical bars.
plt.figure(figsize=(10, 4))
plt.plot(df.Year, df['Actual H'],
         label='Actual Value',
         marker=symbol,
         markersize=17,
         c='darkblue',
         linestyle='None',
         lw=1)
plt.fill_between(x=df.Year,
                 y1=df['Predicted H Low'],
                 y2=df['Predicted H High'],
                 step='mid',
                 alpha=0.3,
                 label='Predicted Range')
plt.xlabel('Year')
plt.ylabel('Number of Hurricanes')
plt.legend(loc='lower right')
plt.grid(True, c='lightgrey', alpha=0.5)
plt.title('Actual Number of Atlantic Hurricanes vs. NOAA May Prediction (2001-2022)');
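The step argument also accepts 'pre' and 'post'. As a rough sketch of the difference, the snippet below draws all three options side by side using made-up placeholder values. With 'mid', each bar is centered on its x position, which is why it suits yearly data; 'pre' and 'post' extend each value to the left or right of its x position instead.

```python
import numpy as np
import matplotlib.pyplot as plt

# Made-up placeholder values for illustration only:
x = np.arange(5)
low = np.array([2, 3, 2, 4, 3])
high = np.array([5, 6, 4, 7, 6])

fig, axes = plt.subplots(1, 3, figsize=(10, 3), sharey=True)
for ax, step in zip(axes, ['pre', 'mid', 'post']):
    # Shade the range using the given step style:
    ax.fill_between(x, low, high, step=step, alpha=0.3)
    # Mark the underlying data points so the step offsets are visible:
    ax.plot(x, low, 'o', c='darkblue')
    ax.plot(x, high, 'o', c='darkblue')
    ax.set_title(f"step='{step}'")
```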

Adding El Niño and La Niña Events
To evaluate the impact of El Niño and La Niña events on the number and intensity of hurricanes, let's make use of the "Event" column of the DataFrame.
First, we need to make a dictionary that maps the event to a color. Since La Niña represents a cooling event, we'll use blue. El Niño warming events will be red, and weak events will be nondescript grey.
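The mapping step can be tried in isolation first. This small sketch applies the dictionary to a toy Series; the 'Unknown' label is a made-up value included only to show the fallback to black ('k') when an event isn't found in the dictionary.

```python
import pandas as pd

color_mapping = {'Nina': 'blue',
                 'Nino': 'red',
                 'Weak Event': 'grey'}

# Toy Event values; 'Unknown' is a made-up label to trigger the fallback:
events = pd.Series(['Nina', 'Nino', 'Weak Event', 'Unknown'])

# Look up each event's color, defaulting to black if it is missing:
colors = events.apply(lambda x: color_mapping.get(x, 'k'))
print(colors.tolist())
```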
We'll add a separate custom legend for the events just beneath the figure title. Note the use of $\u25CF$ to draw circles. This is a symbol from the handy STIX font collection.
# Plot the predicted ranges and color the actual values by event.
# Define a dictionary to map event names to matplotlib colors:
color_mapping = {'Nina': 'blue',
                 'Nino': 'red',
                 'Weak Event': 'grey'}
# Map the Event column to colors. Use black if x not found:
df['colors_mapped'] = df['Event'].apply(lambda x: color_mapping.get(x, 'k'))
plt.figure(figsize=(10, 4))
plt.scatter(df.Year, df['Actual H'],
            label='Actual Value',
            marker=symbol,
            s=300,
            c=df.colors_mapped,
            linestyle='None',
            lw=1)
plt.fill_between(x=df.Year,
                 y1=df['Predicted H Low'],
                 y2=df['Predicted H High'],
                 alpha=0.3,
                 label='Predicted Range')
plt.xlabel('Year')
plt.ylabel('Number of Hurricanes')
plt.legend(loc='lower right')
plt.grid(True, c='lightgrey', alpha=0.5)
# Add event legend as title:
plt.suptitle('Actual Number of Atlantic Hurricanes vs. NOAA May Prediction (2001-2022)')
plt.figtext(0.4, 0.9, '$\u25CF$ La Nina', fontsize='medium', c='b', ha='right')
plt.figtext(0.5, 0.9, '$\u25CF$ El Nino', fontsize='medium', c='r', ha='center')
plt.figtext(0.6, 0.9, '$\u25CF$ Weak Event', fontsize='medium', c='grey', ha='left');

These results appear to support the theory that El Niño events suppress hurricane formation in the Atlantic, at least versus La Niña events. To see if they also impact hurricane intensity, let's plot the major hurricane data.
Plotting Major Hurricanes
Major hurricanes are defined as those rated Category 3 or higher. The following code updates the plot for these values.
plt.figure(figsize=(10, 4))
plt.scatter(df.Year, df['Actual MH'],
            label='Actual Value',
            marker=symbol,
            s=300,
            c=df.colors_mapped,
            linestyle='None',
            lw=1)
plt.fill_between(x=df.Year,
                 y1=df['Predicted MH Low'],
                 y2=df['Predicted MH High'],
                 alpha=0.3,
                 label='Predicted Range')
plt.xlabel('Year')
plt.ylabel('Number of Major Hurricanes (Cat 3+)')
plt.legend(loc='lower right')
plt.grid(True, c='lightgrey', alpha=0.5)
# Add event legend as title:
plt.suptitle('Actual Number of Major Atlantic Hurricanes vs. NOAA May Prediction (2001-2022)')
plt.figtext(0.4, 0.9, '$\u25CF$ La Nina', fontsize='medium', c='b', ha='right')
plt.figtext(0.5, 0.9, '$\u25CF$ El Nino', fontsize='medium', c='r', ha='center')
plt.figtext(0.6, 0.9, '$\u25CF$ Weak Event', fontsize='medium', c='grey', ha='left');

With the exception of 2004, which some sources classify as a weak event, this chart supports the idea that hurricane formation is suppressed during El Niño events [7]. Forecast accuracy is also slightly better for major hurricanes, with 13 of 22 falling within the predicted range.
Drawing Ranges Using Vertical Lines
Another way to plot ranges is to use matplotlib's vlines() method to draw vertical lines. This is an attractive alternative to the fill_between() method, though it's more labor-intensive and doesn't automatically include the range in the legend.
# Redraw plot with vertical lines for ranges:
plt.figure(figsize=(10, 4))
# Use a scatter plot for actual values:
plt.scatter(df.index, df['Actual H'],
            label='Actual Value',
            marker=symbol,
            c='darkblue',
            s=350)
# Draw vertical lines for the predicted ranges:
for i, row in df.iterrows():
    plt.vlines(x=i,
               ymin=row['Predicted H Low'],
               ymax=row['Predicted H High'],
               alpha=0.4,
               lw=6,
               zorder=0)
x = range(len(df))
plt.xticks(x, df.Year, rotation=90)
plt.xlabel('Year')
plt.ylabel('Number of Hurricanes')
plt.legend(loc='lower right')
plt.grid(True, color='lightgray', alpha=0.5)
plt.title('Actual Number of Atlantic Hurricanes vs. NOAA May Prediction');
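In the loop above, no label is passed to vlines(), so the ranges are absent from the legend. As a possible shortcut, vlines() also accepts array inputs and a label argument, so a single call can draw every range and register one legend entry. The sketch below uses made-up placeholder values rather than the NOAA data.

```python
import numpy as np
import matplotlib.pyplot as plt

# Made-up placeholder values, NOT the NOAA data:
years = np.array([2019, 2020, 2021, 2022])
low = np.array([4, 6, 6, 6])
high = np.array([8, 10, 10, 10])

fig, ax = plt.subplots(figsize=(6, 3))
# vlines() accepts arrays, so one call draws every range:
ax.vlines(x=years, ymin=low, ymax=high,
          alpha=0.4, lw=6, zorder=0,
          label='Predicted Range')
ax.set_xticks(years)
ax.legend(loc='lower right')

# Collect the legend entries to confirm the range was registered:
handles, labels = ax.get_legend_handles_labels()
```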

Evaluating the Atlantic Multidecadal Oscillation
We've now covered the fill_between() method, but since we've got all this data at hand, let's take a moment to examine an interesting theory on hurricane formation involving the Atlantic Multidecadal Oscillation (AMO) [8].
The AMO is a feature defined by decades-long variability in North Atlantic sea surface temperatures. Little is known about the AMO; it may represent a persistent periodic climate driver or just a transient feature [9].
The AMO index is calculated by subtracting the global mean sea surface temperature (SST) anomalies from the North Atlantic SST anomalies [9]. When the AMO index is high, sea surface temperatures are warmer than usual, potentially contributing to increased hurricane activity and intensity.
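The subtraction described above amounts to the following arithmetic. The anomaly values here are made up purely for illustration, and published AMO indices may involve additional processing steps that aren't shown.

```python
import numpy as np

# Made-up SST anomalies in degrees C, for illustration only:
north_atlantic_anom = np.array([0.40, 0.10, -0.20])
global_mean_anom = np.array([0.25, 0.15, 0.05])

# AMO index = North Atlantic anomaly minus global mean anomaly:
amo_index = north_atlantic_anom - global_mean_anom
print(amo_index)
```

A positive result means the North Atlantic is warmer than the global mean anomaly would suggest; a negative result means it is cooler.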
Because this is a long-wavelength phenomenon, we'll need a database that counts hurricanes back to 1920 or so. I've already recorded Wikipedia's hurricane list for this timeframe and stored it at this Gist.
It should be noted that storm counts before the use of airplanes (in the mid-1940s) and satellite data (in the mid-1960s) are less reliable. For example, count estimates between 1886 and 1910 are believed to have an undercount bias of zero to four storms per year [10].
In the following plot, the AMO index boundaries are taken from Wikipedia and NOAA [8][11].
# Load the 1920-2022 hurricane dataset:
df = pd.read_csv('https://bit.ly/3sZnvQX')
# Plot major hurricanes per year with regression line and AMO shading:
plt.figure(figsize=(10, 4))
plt.plot(df.Year, df.MH,
         label='Actual Value',
         marker=symbol,
         markersize=17,
         c='darkblue',
         linestyle='None',
         lw=1)
plt.xlabel('Year')
plt.xticks(range(1920, 2021, 10))
plt.ylabel('Number of Major Hurricanes (Cat 3+)')
plt.grid(True, c='lightgrey', alpha=0.5)
plt.title('Number of Major Atlantic Hurricanes by Year 1920-2022',
          fontsize=18)
# Add a shaded span for AMO highs:
plt.text(1940, 6.5, 'AMO High', c='firebrick')
plt.axvspan(1926, 1964,
            color='red',
            alpha=0.2)
plt.text(2005, 6.5, 'AMO High', c='firebrick')
plt.axvspan(1995, 2022,
            color='red',
            alpha=0.2)
# Calculate m (slope) and b (intercept) of linear regression line:
m, b = np.polyfit(df.Year, df.MH, 1)
# Add linear regression line to plot:
plt.plot(df.Year, m*df.Year+b, c='darkblue', ls=':');
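For a quick sanity check on np.polyfit(), the sketch below fits a degree-1 polynomial to synthetic points that lie exactly on the line y = 2x + 1. With degree 1, the function returns the coefficients from highest power to lowest, so the slope comes first and the intercept second.

```python
import numpy as np

# Synthetic points lying exactly on y = 2x + 1:
x = np.array([0.0, 1.0, 2.0, 3.0])
y = 2.0 * x + 1.0

# Fit a straight line; coefficients are returned highest power first:
m, b = np.polyfit(x, y, 1)
print(f'slope = {m:.2f}, intercept = {b:.2f}')
```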

Here's the same data presented as a bar chart:
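The code for the bar chart isn't shown, but one way to sketch it is to swap plt.plot() for a bar plot and keep the axvspan() shading. The years and counts below are made-up placeholders rather than the Wikipedia data, and the shaded span is chosen only to fit this toy range.

```python
import numpy as np
import matplotlib.pyplot as plt

# Made-up placeholder counts, NOT the Wikipedia data:
years = np.arange(1990, 2000)
major = np.array([1, 0, 2, 1, 3, 2, 4, 3, 5, 4])

fig, ax = plt.subplots(figsize=(10, 4))
# Draw one bar per year:
bars = ax.bar(years, major, color='darkblue')
# Shade an illustrative span, as done for the AMO highs:
ax.axvspan(1995, years[-1] + 0.5, color='red', alpha=0.2)
ax.set_xlabel('Year')
ax.set_ylabel('Number of Major Hurricanes (Cat 3+)')
ax.grid(True, c='lightgrey', alpha=0.5)
```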

And here's the scatterplot for all Atlantic hurricanes over this time period. The AMO effect is less obvious for the frequency of storms.

Although scientists recognize the apparent relationship between the AMO index and the number of major hurricanes, there's not enough data at present to draw firm conclusions. As you might expect, the most popular explanation for the increase in major hurricanes in the most recent AMO high is anthropogenic climate change.
Summary
The matplotlib fill_between() method is a handy way to display a range of values on a plot. In this project, we used it to show NOAA's annual hurricane forecasts versus the actual outcomes. In addition, we used matplotlib's mpath module to draw a custom marker to represent hurricanes. The result was an attractive and easy-to-parse infographic.
We also added El Niño, La Niña, and AMO events to our plots. The results supported established observations that El Niño seems to suppress Atlantic hurricanes, and high AMO index events seem to promote them.
Citations
1. Climate Prediction Center Internet Team, 2021, "NOAA 2021 Atlantic Hurricane Season Outlook," Climate Prediction Center – Atlantic Hurricane Outlook (noaa.gov).
2. Wikipedia contributors, "2021 Atlantic hurricane season," Wikipedia, The Free Encyclopedia, https://en.wikipedia.org/w/index.php?title=2021_Atlantic_hurricane_season&oldid=1175731221 (accessed September 19, 2023).
3. Wikipedia contributors, "El Niño," Wikipedia, The Free Encyclopedia, https://en.wikipedia.org/w/index.php?title=El_Ni%C3%B1o&oldid=1174548902 (accessed September 19, 2023).
4. Wikipedia contributors, "La Niña," Wikipedia, The Free Encyclopedia, https://en.wikipedia.org/w/index.php?title=La_Ni%C3%B1a&oldid=1174382856 (accessed September 19, 2023).
5. Bell, Gerry, 2014, "Impacts of El Niño and La Niña on the hurricane season," NOAA Climate.gov, Impacts of El Niño and La Niña on the hurricane season | NOAA Climate.gov.
6. ImportanceOfBeingErnest, "Custom Markers using Matplotlib," Stack Overflow, June 24, 2017, Custom markers using Python (matplotlib) – Stack Overflow (accessed September 19, 2023).
7. Null, Jan, 2023, "El Niño and La Niña Years and Intensities," Golden Gate Weather Services, El Niño and La Niña Years and Intensities (ggweather.com) (accessed September 19, 2023).
8. Wikipedia contributors, "Atlantic multidecadal oscillation," Wikipedia, The Free Encyclopedia, https://en.wikipedia.org/w/index.php?title=Atlantic_multidecadal_oscillation&oldid=1175329341 (accessed September 19, 2023).
9. Knudsen, M., Seidenkrantz, MS., Jacobsen, B. et al., "Tracking the Atlantic Multidecadal Oscillation through the last 8,000 years," Nat Commun 2, 178 (2011). https://doi.org/10.1038/ncomms1186.
10. Wikipedia contributors, "List of Atlantic hurricane records," Wikipedia, The Free Encyclopedia, https://en.wikipedia.org/w/index.php?title=List_of_Atlantic_hurricane_records&oldid=1168214070 (accessed September 19, 2023).
11. NOAA, 2017, "Atlantic Multidecadal Oscillation Low-Frequency Climate Mode," Atlantic Oceanographic and Meteorological Laboratory, Gulf of Mexico ESR (noaa.gov).
Thanks!
Thanks for reading. My goal is to help you hone your Python skills and have fun doing it. Follow me for more Quick Success Data Science projects in the future.