Make a Punchcard Plot with Seaborn

A punchcard plot, also called a table bubble chart, is a type of visualization for highlighting cyclical trends in data. It displays data in a rigid matrix or grid format, usually composed of days of the week versus hours of the day. Circles represent data points at the intersections of the rows and columns and their size conveys the data value. Color can be used to include additional information.

The name "punchcard" is an allusion to old-timey "timecards" that workers would stamp or "punch" in a machine to record their comings and goings.
To build a punchcard plot, you need timestamped data. In this Quick Success Data Science project, we'll use a Kaggle dataset to track the times when bicycles are rented in Washington, D.C.
The Dataset
The Kaggle Bike Sharing in Washington D.C. Dataset contains the hourly and daily counts of bikes rented in 2011 and 2012 through the Capital Bikeshare system in Washington, D.C. [1]. This data is released under a CC0 1.0 license. For details about the dataset contents, see the readme file.
For convenience, I've already stored this data in a public Gist.
Installing Libraries
Besides Python, you'll need the pandas data analysis library and the seaborn plotting library. You can install them with:
conda install pandas seaborn
or
pip install pandas seaborn
The Code
The following commented code was written in JupyterLab and is described by cell.
Importing Libraries and Loading the Data
After importing matplotlib and seaborn for plotting and pandas for data analysis, we'll read the CSV file of rental data into a pandas DataFrame, keeping only the columns for the season of the year, the weekday, the hour, and the count (number of rentals).
The weekdays are stored numerically (starting with 0 for Sunday). For readability, we'll map these to the name of the day and make a new column, called "Day," to hold the names. Also for readability, we'll rename the "hr" and "cnt" columns to "Hour" and "count," capitalizing "Hour" as it will be used for a figure label.
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
# Load the data:
df = pd.read_csv('https://bit.ly/3raML6h',
                 usecols=['season', 'weekday', 'hr', 'cnt'])
# Define a dictionary to map numerical values to day names
day_mapping = {0: 'Sunday', 1: 'Monday',
               2: 'Tuesday', 3: 'Wednesday',
               4: 'Thursday', 5: 'Friday',
               6: 'Saturday'}
# Map weekday numbers to names and create a new column:
df['Day'] = df['weekday'].map(day_mapping)
df = df.rename(columns={'hr': 'Hour', 'cnt': 'count'})
df.head(3)
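As a quick sanity check on the mapping step, here is a minimal sketch using a small made-up Series rather than the real dataset. The names day_mapping_demo and weekday_demo are hypothetical. It shows that pandas' map() returns NaN for any value missing from the dictionary, so counting NaNs is an easy way to catch unexpected weekday codes.

```python
import pandas as pd

# Hypothetical stand-in for the 'weekday' column; 7 is deliberately invalid.
weekday_demo = pd.Series([0, 6, 3, 7])

day_mapping_demo = {0: 'Sunday', 1: 'Monday', 2: 'Tuesday',
                    3: 'Wednesday', 4: 'Thursday', 5: 'Friday',
                    6: 'Saturday'}

# Values not found in the dictionary become NaN.
mapped_demo = weekday_demo.map(day_mapping_demo)
n_unmapped = int(mapped_demo.isna().sum())

print(mapped_demo.tolist())
print('Unmapped values:', n_unmapped)
```

On the real data, the same isna().sum() check on df['Day'] should report zero if every weekday code falls between 0 and 6.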

Creating a DataFrame for the Summer Months
For this analysis, we'll focus on the summer season, which will provide good examples of both the recreational and business use of bicycles. The seasons are labeled numerically in the dataset, with summer labeled 2.
First, we'll make a new DataFrame with just the summer data, then use pandas' groupby() and sum() methods to aggregate the rentals by day and by hour. Since there are 7 days and 24 hours in a day, this new DataFrame will have 168 rows.
# Create a new DataFrame for the summer season:
df_summer = (df[(df['season'] == 2)].copy()
             .groupby(['Day', 'Hour'])['count']
             .sum().reset_index())
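To see why the aggregation yields 168 rows, here is a small sketch on synthetic data (the rental counts are made up, and df_groupby_demo is a hypothetical name). Each day and hour pair gets two summer records and one record from another season. After filtering and grouping, only the summer records should be summed.

```python
import itertools

import pandas as pd

days_demo = ['Sunday', 'Monday', 'Tuesday', 'Wednesday',
             'Thursday', 'Friday', 'Saturday']

# Synthetic records: two summer rows (season 2) and one non-summer row
# (season 3) for every day/hour combination.
rows = [{'season': season, 'Day': day, 'Hour': hour, 'count': count}
        for day, hour in itertools.product(days_demo, range(24))
        for season, count in [(2, 5), (2, 7), (3, 100)]]
df_groupby_demo = pd.DataFrame(rows)

# Same filter-then-aggregate pattern as used for df_summer.
summer_demo = (df_groupby_demo[df_groupby_demo['season'] == 2]
               .groupby(['Day', 'Hour'])['count']
               .sum().reset_index())

print(summer_demo.shape)           # one row per day/hour pair
print(summer_demo['Day'].unique()) # groupby sorts the days alphabetically
```

Note that groupby() sorts its keys by default, which is why the days come out in alphabetical order.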
Making the Punchcard Plot
To make the punchcard plot, we'll use seaborn's scatterplot() function. The size of the circular markers is controlled by the "count" column. You can play with the sizes and figsize arguments to find combinations that are aesthetically pleasing.
While we have the option of converting the hours to datetime, so that "10" becomes "10:00," I feel this clutters the x-axis without adding much value.
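If you'd like to try the clock-style labels anyway, here is a minimal sketch. It builds the labels with plain string formatting rather than a true datetime conversion, and hour_labels is a hypothetical name.

```python
# Build 'HH:00' labels for the 24 hours of the day.
hour_labels = [f'{hour:02d}:00' for hour in range(24)]

print(hour_labels[:3])
print(hour_labels[10])

# Possible use with the plot in this article (rotating helps avoid overlap):
# ax.set_xticks(range(24))
# ax.set_xticklabels(hour_labels, rotation=90)
```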
In addition, the days of the week are plotted in alphabetical order. I prefer this as it keeps the weekend days grouped together, rather than split out at the top and bottom of the plot. It also plots Friday adjacent to Monday, which is useful for easily comparing behaviors at the start and end of the work week.
fig, ax = plt.subplots(figsize=(8, 5))
ax = sns.scatterplot(data=df_summer,
                     x='Hour',
                     y='Day',
                     size='count',
                     color='k',
                     alpha=0.7,
                     sizes=(1, 300))
ax.legend(loc='right',
          shadow=True,
          bbox_to_anchor=(1.2, 0.8),
          ncol=1)
ax.set_xticks(range(24))
ax.set_xticklabels(list(range(24)))
ax.set_title('Washington D.C. Summer Bike Share Rentals (2011-12)');
# Optional code to add a grid:
# sns.set_style('whitegrid')
# ax.grid(True)
# Optional code to save figure as an image:
# plt.savefig('file_name.png', bbox_inches='tight', dpi=600)
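The alphabetical day order comes from groupby(), which sorts its keys. If you'd rather plot the days in calendar order, one option is to convert the "Day" column to an ordered pandas Categorical and sort on it before plotting. The sketch below shows only the pandas step, on made-up data. DAY_ORDER and df_order_demo are hypothetical names, and it assumes the plotting function follows the category order, which is worth confirming in your seaborn version.

```python
import pandas as pd

# Assumed calendar order, starting the week on Monday.
DAY_ORDER = ['Monday', 'Tuesday', 'Wednesday', 'Thursday',
             'Friday', 'Saturday', 'Sunday']

# Small synthetic stand-in for df_summer.
df_order_demo = pd.DataFrame({'Day': ['Sunday', 'Friday', 'Monday', 'Wednesday'],
                              'Hour': [8, 8, 8, 8],
                              'count': [10, 40, 35, 38]})

# An ordered Categorical sorts by the given category order, not alphabetically.
df_order_demo['Day'] = pd.Categorical(df_order_demo['Day'],
                                      categories=DAY_ORDER,
                                      ordered=True)
df_order_demo = (df_order_demo.sort_values(['Day', 'Hour'])
                 .reset_index(drop=True))

print(df_order_demo['Day'].tolist())
```

Keep in mind that changing the day order also changes the row positions, so any annotations placed with numeric y values would need to be adjusted.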

This is a cool-looking plot with several easy-to-distinguish patterns. First, renting behaviors on the weekend are notably different than during the work week. Second, the weekdays show little variation. Friday and Monday, which are conveniently adjacent, show very similar trends. Third, morning and evening rush hours are obvious during the work week.
We can make this plot even easier to read by highlighting events such as rush hours and the weekend.
Highlighting Rush Hour
According to the internet, rush hour in Washington, D.C. runs from 6:00 to 9:00 a.m. and from 4:00 to 7:00 p.m. To highlight these periods, raise the alpha argument of scatterplot() to 1 in the previous code, then add the following code to the bottom and rerun.
# Add shading for rush hour:
ax.text(x=6, y=0.6, s='Rush Hour', c='firebrick')
ax.axvspan(xmin=6, xmax=9, facecolor='red', alpha=0.3)
ax.text(x=16, y=0.6, s='Rush Hour', c='firebrick')
ax.axvspan(xmin=16, xmax=19, facecolor='red', alpha=0.3);

Highlighting the Weekends
To highlight the weekend days, add the code below and replot the figure.
# Add shading for weekend:
ax.axhspan(ymin='Sunday', ymax='Saturday', fc='grey', alpha=0.3)
ax.text(x=1, y=2.6, s='Weekend', c='k');
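The numeric y value in the text() call makes more sense once you know where each day sits on the axis. The sketch below assumes that matplotlib places categories at integer positions (0, 1, 2, ...) in the order they first appear, which here is alphabetical because of the earlier groupby(). The names day_positions and weekend_span are hypothetical.

```python
days_axis_demo = ['Sunday', 'Monday', 'Tuesday', 'Wednesday',
                  'Thursday', 'Friday', 'Saturday']

# Alphabetical order mirrors the order of the 'Day' column in df_summer.
day_positions = {day: i for i, day in enumerate(sorted(days_axis_demo))}
print(day_positions)

# A span padded by half a row on each side would cover both weekend rows fully.
weekend_span = (day_positions['Saturday'] - 0.5,
                day_positions['Sunday'] + 0.5)
print(weekend_span)

# Possible numeric alternative to the string-based call (an assumption, not
# the article's original code):
# ax.axhspan(ymin=weekend_span[0], ymax=weekend_span[1], fc='grey', alpha=0.3)
```

Under this assumption, Saturday and Sunday sit at positions 2 and 3, so a label at y=2.6 falls between the two weekend rows.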

While this plot is useful, it's difficult to tell which days had the highest and lowest total rentals. To check this, we can make a new DataFrame for the summer, aggregating only on days.
# Aggregate the summer rentals by day only:
df_summer_days = (df[(df['season'] == 2)].copy()
                  .groupby(['Day'])['count'].sum().reset_index())
df_summer_days = df_summer_days.sort_values('count')
print(df_summer_days)

Now we can quantitatively judge the differences between the days.
sns.barplot(data=df_summer_days, x='Day', y='count', color='grey');

Interpreting and Using the Data
An interesting thing to note in the punchcard plot is that more bikes are rented during the afternoon rush hour than during the morning rush hour. These additional riders likely represent people running errands or riding for recreation.
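To put numbers on this comparison, you could total the weekday rentals in each rush-hour window. The following is a minimal sketch on made-up data. The helper rush_hour_totals() is hypothetical, and it assumes each "Hour" value labels the hour that starts at that time, so hours 6 through 8 cover 6:00 to 9:00 a.m. and hours 16 through 18 cover 4:00 to 7:00 p.m.

```python
import pandas as pd


def rush_hour_totals(df, weekend=('Saturday', 'Sunday')):
    """Return (morning, evening) weekday rental totals for the rush-hour windows."""
    weekdays = df[~df['Day'].isin(weekend)]
    # between() is inclusive at both ends by default.
    morning = weekdays.loc[weekdays['Hour'].between(6, 8), 'count'].sum()
    evening = weekdays.loc[weekdays['Hour'].between(16, 18), 'count'].sum()
    return int(morning), int(evening)


# Synthetic stand-in for df_summer; the counts are invented for illustration.
df_rush_demo = pd.DataFrame({
    'Day': ['Monday', 'Monday', 'Monday', 'Saturday', 'Monday'],
    'Hour': [7, 17, 12, 17, 18],
    'count': [100, 150, 40, 500, 60],
})

morning_total, evening_total = rush_hour_totals(df_rush_demo)
print(morning_total, evening_total)
```

Calling rush_hour_totals(df_summer) should give the corresponding totals for the real data.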
During the work week, the best time to perform maintenance on bikes would be between 9:00 a.m. and 4:00 p.m., when commuters are at work. And to increase revenue, you'd want to analyze whether lowering rental costs during off-peak hours would incentivize bike utilization.
Looking at the "summer days" DataFrame, we can see that bikes are rented the most on Saturdays, with a drop-off on Sundays. This presents an opportunity to increase bike use on Sundays through promotions or price cuts. Likewise, bike rentals steadily increase during the work week, again suggesting the need to incentivize the early part of the week.
Summary
Punchcard plots represent an interesting way to visualize timestamped data. Just as traditional timecards let you monitor employee work habits, punchcard plots help you detect cyclical trends with just a glance.
In this project, we used seaborn's scatterplot() function to make a punchcard plot. A nice thing about seaborn is that it's built on matplotlib and can take advantage of matplotlib's advanced customization options. In this case, we assisted the data analysis process by using shading and text annotation to highlight important time periods such as weekends and rush hours.
Citations
- Fanaee-T, Hadi, and Gama, Joao, "Event labeling combining ensemble detectors and background knowledge", Progress in Artificial Intelligence (2013): pp. 1–15, Springer Berlin Heidelberg, doi:10.1007/s13748-013-0040-3.
Thanks!
Thanks for reading and please follow me for more Quick Success Data Science projects in the future.