I've been meaning to learn D3 for a while. To be honest, D3 has always been overkill for the types of problems I've worked on (where visualizing the data was just a means to an end, not the final product itself). As a Python developer I often use tools like matplotlib, plotly, seaborn, pandas (or geopandas), and bokeh to "get the job done". Recently, however, I've been spending time creating data visualizations just for fun, and it seemed like the perfect time to start learning D3.
In this article I'll show you how I created a graphic like the one above for 5 peaks (Everest, Ama Dablam, Cho Oyu, Lhotse, and Manaslu) using Python, D3, and Illustrator. I will go over:
1. Inspiration.
2. Getting the data.
3. Initial data preparation.
4. Selecting 5 peaks to visualize.
5. Preparing the data for plotting.
6. Creating an SVG with D3.
7. Saving the SVG and importing into Illustrator.
8. Working with the SVG in Illustrator.
9. Adding final touches.
10. Lessons learned.
1. Inspiration
This visualization was inspired by "Gisa's Timeline" created by Barbara Rebolledo. I was looking for a nonstandard way to visualize the number of deaths during Himalayan expeditions and thought Barbara's timeline looked interesting (and it provided the perfect excuse to use D3 since creating something like that in Python would've been a nightmare).
2. Getting The Data
The Himalayan Database is a compilation of records for all expeditions that have climbed in the Nepal Himalaya.
Specifically, I extracted information on Himalayan expeditions by following the instructions on the Himalayan Database website. The dataset is a small CSV file (with a little under 11,200 rows) containing expedition records. Here are the first 5 rows:
expid peakid year season host route1 route2 route3 route4 nation leaders sponsor success1 success2 success3 success4 ascent1 ascent2 ascent3 ascent4 claimed disputed countries approach bcdate smtdate smttime smtdays totdays termdate termreason termnote highpoint traverse ski parapente camps rope totmembers smtmembers mdeaths tothired smthired hdeaths nohired o2used o2none o2climb o2descent o2sleep o2medical o2taken o2unkwn othersmts campsites accidents achievment agency comrte stdrte primrte primmem primref primid chksum
0 ANN260101 ANN2 1960 1 1 NW Ridge-W Ridge NaN NaN NaN UK J. O. M. Roberts NaN True False False False 1st NaN NaN NaN False False India, Nepal Marshyangdi->Hongde->Sabje Khola 3/15/60 5/17/60 1530.0 63 0 - - 1 NaN 7937 False False False 6 0 10 2 0 9 1 0 False True False True False True False False False Climbed Annapurna IV (ANN4-601-01) BC(15/03,3350m),ABC(4575m),C1(5365m),C2(5800m)... NaN NaN NaN False False False False False NaN 2442047
1 ANN269301 ANN2 1969 3 1 NW Ridge-W Ridge NaN NaN NaN Yugoslavia Ales Kunaver Mountaineering Club of Slovenia True False False False 2nd NaN NaN NaN False False NaN Marshyangdi->Hongde->Sabje Khola 9/25/69 10/22/69 1800.0 27 31 10/26/69 1 NaN 7937 False False False 6 0 10 2 0 0 0 0 False False True False False False False False False Climbed Annapurna IV (ANN4-693-02) LowBC(25/09,3950m),BC(27/09,4650m),C1(27/09,53... Draslar frostbitten hands and feet NaN NaN False False False False False NaN 2445501
2 ANN273101 ANN2 1973 1 1 W Ridge-N Face NaN NaN NaN Japan Yukio Shimamura Sangaku Doshikai Annapurna II Expedition 1973 True False False False 3rd NaN NaN NaN False False NaN Marshyangdi->Pisang->Salatang Khola 3/16/73 5/6/73 2030.0 51 0 - - 1 NaN 7937 False False False 5 0 6 1 0 8 0 0 False False True False False False False False False NaN BC(16/03,3300m),C1(21/03,4200m),C2(10/04,5000m... NaN NaN NaN False False False False False NaN 2446797
3 ANN278301 ANN2 1978 3 1 N Face-W Ridge NaN NaN NaN UK Richard J. Isherwood British Annapurna II Expedition False False False False NaN NaN NaN NaN False False NaN Marshyangdi->Pisang->Salatang Khola 9/8/78 10/2/78 NaN 24 27 10/5/78 4 Abandoned at 7000m (on A-IV) due to bad weather 7000 False False False 0 0 2 0 0 0 0 0 True False True False False False False False False NaN BC(08/09,5190m),xxx(02/10,7000m) NaN NaN NaN False False False False False NaN 2448822
4 ANN279301 ANN2 1979 3 1 N Face-W Ridge NW Ridge of A-IV NaN NaN UK Paul Moores NaN False False False False NaN NaN NaN NaN False False NaN Pokhara->Marshyangdi->Pisang->Sabje Khola - - 10/18/79 NaN 0 0 10/20/79 4 Abandoned at 7160m due to high winds 7160 False False False 0 0 3 0 0 0 0 0 True False True False False False False False False NaN BC(3500m),ABC,Biv1,Biv2,Biv3,Biv4,Biv5,xxx(18/... NaN NaN NaN False False False False False NaN 2449204
3. Initial Data Preparation
I wanted to show the evolution of the number of deaths over time and, if possible, I also wanted to add information on summit success rate and death rate (these ended up being just small dots in the final visualization). I decided to focus on the following columns:
year – When did the expedition take place?
expid – The expedition ID (unique when combined with year).
peakid – Peak ID.
totmembers + tothired – The number of members in the expedition.
smtmembers + smthired – The number of members that summited.
mdeaths + hdeaths – The number of members that passed away.
Including expid is useful because it lets us look up additional information about an expedition whenever clarification is needed. For example, there are expeditions with 0 members. I assumed this was an error, but it's also possible that these records aren't expeditions at all and instead represent some other form of record. We can check this assumption by looking up some of these expeditions online. Let's take expedition ANNS7130, for example. This expedition appears to have 0 members. However, the Himalayan Database Online shows that there is exactly one member: Tomoyo Minegishi.
The expedition has one member. Number of members listed as 0 in the "Total Mbrs" field.
Clearly there is an issue with the dataset and I decided to drop these records (expeditions with 0 members) from the analysis.
After dropping NaNs, grouping by year and peakid, counting the total number of members, summits, and deaths, and dropping any (year, peakid) combinations with 0 members (there were only 23 such combinations), this is what the data (exp_df) looked like:
no_summits is the number of members that summited.
no_deaths is the number of member deaths.
no_exped is the number of expeditions.
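The aggregation described above can be sketched with the standard library alone. This is only an illustration, not the code used for the article (which relied on pandas): the helper name `aggregate_expeditions` and the zero-member sample record are made up, NaN handling is omitted, and only the group-level filter on 0 members is shown.

```python
from collections import defaultdict


def aggregate_expeditions(rows):
    """Sum members, summits, and deaths per (year, peakid).

    Hired staff are counted together with regular members, and groups whose
    combined member count is 0 are dropped.
    """
    totals = defaultdict(
        lambda: {"no_summits": 0, "no_members": 0, "no_deaths": 0, "no_exped": 0}
    )
    for row in rows:
        group = totals[(row["year"], row["peakid"])]
        group["no_summits"] += row["smtmembers"] + row["smthired"]
        group["no_members"] += row["totmembers"] + row["tothired"]
        group["no_deaths"] += row["mdeaths"] + row["hdeaths"]
        group["no_exped"] += 1
    return {key: group for key, group in totals.items() if group["no_members"] > 0}


# The first two rows use values from the sample shown earlier;
# the third is a made-up record with 0 members (hypothetical peak ID).
sample_rows = [
    {"year": 1960, "peakid": "ANN2", "totmembers": 10, "tothired": 9,
     "smtmembers": 2, "smthired": 1, "mdeaths": 0, "hdeaths": 0},
    {"year": 1969, "peakid": "ANN2", "totmembers": 10, "tothired": 0,
     "smtmembers": 2, "smthired": 0, "mdeaths": 0, "hdeaths": 0},
    {"year": 1971, "peakid": "XXXX", "totmembers": 0, "tothired": 0,
     "smtmembers": 0, "smthired": 0, "mdeaths": 0, "hdeaths": 0},
]
exp_summary = aggregate_expeditions(sample_rows)
print(exp_summary[(1960, "ANN2")])
```

Keying the groups by a (year, peakid) tuple plays the same role as grouping by those two columns in a DataFrame.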
There are 406 peaks left in the database. I thought picking just a few of them would be best for the type of plot I wanted to create.
4. Selecting 5 Peaks To Visualize
Looking at the number of expeditions for each peak I saw that just a handful of peaks contained the bulk of the expeditions since 1905:
>>> key_exp = exp_df.groupby(by='peakid')[['no_exped']].sum().reset_index()
>>> key_exp.no_exped.describe()
mean 27.485222
min 1.000000
25% 2.000000
50% 3.000000
75% 8.000000
max 2303.000000
75% of all peaks have 8 or fewer expeditions since 1905 (at least according to the Himalayan Database)! For example, here are 10 peaks with only one expedition since 1905:
I decided to focus on the 5 peaks with the most expeditions: Everest, Ama Dablam, Cho Oyu, Manaslu, and Lhotse.
>>> key_exp.sort_values(by='no_exped', inplace=True, ascending=False, ignore_index=False)
>>> key_exp.iloc[:5, :]
peakid no_exped
84 EVER 2303 # Everest
1 AMAD 1525 # Ama Dablam
45 CHOY 1350 # Cho Oyu
233 MANA 754 # Manaslu
210 LHOT 497 # Lhotse
5. Preparing The Data For Plotting
To make my life easier when plotting, I made a few changes to the DataFrame.
Create all year/peak combinations for the 5 chosen peaks
I chose to remove years before 1921 because there weren't any expeditions to the 5 chosen peaks before then. Adding all (year, peak) combinations for these peaks introduces some NaN values, which can be replaced with 0:
year peakid no_summits no_members no_deaths no_exped
0 1921 AMAD NaN NaN NaN NaN
1 1921 CHOY NaN NaN NaN NaN
2 1921 EVER 0.0 30.0 2.0 1.0
3 1921 LHOT NaN NaN NaN NaN
4 1921 MANA NaN NaN NaN NaN
At this point our dataset is a DataFrame with a little over 500 rows.
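One way to build every (year, peak) combination and fill the gaps with 0 is sketched below using only the standard library. The helper name `fill_missing_combinations` and the dictionary layout are assumptions made for illustration; the only data used is the 1921 Everest row from the table above.

```python
from itertools import product

COLUMNS = ("no_summits", "no_members", "no_deaths", "no_exped")


def fill_missing_combinations(records, years, peaks):
    """Return one row per (year, peak), using 0 where no record exists."""
    filled = []
    for year, peak in product(years, peaks):
        values = records.get((year, peak), {})
        row = {"year": year, "peakid": peak}
        row.update({column: values.get(column, 0) for column in COLUMNS})
        filled.append(row)
    return filled


peaks = ["AMAD", "CHOY", "EVER", "LHOT", "MANA"]
# Only the 1921 Everest record is known here (taken from the table above).
records = {
    (1921, "EVER"): {"no_summits": 0, "no_members": 30, "no_deaths": 2, "no_exped": 1},
}
grid = fill_missing_combinations(records, range(1921, 1922), peaks)
for row in grid:
    print(row)
```

Widening the `range` to cover every year of interest yields the full grid of combinations.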
NOTE: Adding all year and peakid combinations was not strictly necessary, but I wasn't sure whether I wanted to include years and peaks with no expeditions in the visualization. I decided to leave all combinations in to start and make a decision after seeing the visualizations.
Add "is_good_seas" flag
I added a column called is_good_seas ("seas" stands for "season") with values that will be set to True whenever a (year, peak) combination has at least one expedition but no deaths (i.e., is a "good" season):
"Success rate" is defined as succrate = no_summits / no_members, and "death rate" is simply deathrate = no_deaths / no_members. I added these rates as new columns in the DataFrame along with two other columns: a column flagging when the death rate was higher than 10%, and a column flagging when the success rate was higher than 70% (these are numbers I played with after creating a first draft of the visualization). This is what the DataFrame looked like at this point:
Dropping unnecessary columns is not strictly necessary, but it helps keep things clean. To this end, I removed the no_summits, no_members, succrate, and deathrate columns. I also sorted by year and peakid (ascending) and added a temporal index (idx) to each peakid:
The idx column serves the same conceptual purpose as the year column, but I thought it might be useful when plotting.
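The flagging and indexing steps from the last few paragraphs can be sketched roughly as follows. This is a standard-library illustration rather than the original pandas code: the function names are hypothetical, the rates are assumed to be 0 when a combination has no members, and the "XXXX" records are made up.

```python
from collections import defaultdict


def add_flags(record, death_threshold=0.10, success_threshold=0.70):
    """Derive the flag columns for one (year, peak) record and drop the raw counts."""
    members = record["no_members"]
    # Assumption: both rates are treated as 0 when there were no members.
    succrate = record["no_summits"] / members if members else 0.0
    deathrate = record["no_deaths"] / members if members else 0.0
    return {
        "year": record["year"],
        "peakid": record["peakid"],
        "no_deaths": record["no_deaths"],
        "no_exped": record["no_exped"],
        "is_good_seas": record["no_exped"] > 0 and record["no_deaths"] == 0,
        "high_deathrate": deathrate > death_threshold,
        "high_succrate": succrate > success_threshold,
    }


def add_temporal_index(records):
    """Sort by (year, peakid) and number each peak's rows 0, 1, 2, ..."""
    counters = defaultdict(int)
    indexed = []
    for record in sorted(records, key=lambda r: (r["year"], r["peakid"])):
        indexed.append({**record, "idx": counters[record["peakid"]]})
        counters[record["peakid"]] += 1
    return indexed


records = [
    # Made-up record for a hypothetical peak: 8 of 10 members summited, no deaths.
    {"year": 1922, "peakid": "XXXX", "no_summits": 8, "no_members": 10, "no_deaths": 0, "no_exped": 1},
    # Everest 1921, taken from the table earlier in this section.
    {"year": 1921, "peakid": "EVER", "no_summits": 0, "no_members": 30, "no_deaths": 2, "no_exped": 1},
    # Made-up (year, peak) combination with no expeditions.
    {"year": 1921, "peakid": "XXXX", "no_summits": 0, "no_members": 0, "no_deaths": 0, "no_exped": 0},
]
plot_rows = add_temporal_index([add_flags(r) for r in records])
for row in plot_rows:
    print(row)
```

Because every (year, peak) combination is still present at this stage, numbering each peak's rows in year order gives an idx equal to the number of years elapsed since the first year.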
In the end I decided to remove records where there were no expeditions whatsoever (this is something I decided to do after creating a first draft of the plot, where I realized that including these records resulted in the plot looking too cluttered). After filtering, I dropped the no_exped column:
I wanted the thickness of each square in the plot to represent the number of deaths relative to every other year and peak combination. In other words, I wanted peaks with fewer expeditions (and therefore fewer deaths) to be made up of thinner squares than peaks with more expeditions (and therefore more deaths). The problem is that Everest's plot would then be made up of nice thick squares, while the plots for the other 4 peaks would be made up of thin (barely visible) squares. To address this issue, I decided to log-transform the number of deaths across all 5 peaks and years. I also normalized the non-zero values (after log-transforming) to be in the interval [0.5, 3] (because this value is meant to be used as the line thickness when plotting).
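A minimal sketch of this transformation is shown below. The article only states that the values were log-transformed and normalized to [0.5, 3]; the natural logarithm, the min-max scaling, and the function name `scale_deaths` are assumptions made for illustration.

```python
import math


def scale_deaths(deaths, low=0.5, high=3.0):
    """Log-transform the non-zero counts and min-max scale them to [low, high].

    Zero counts stay at 0 (those rows are drawn as thin black squares instead).
    """
    logs = [math.log(d) for d in deaths if d > 0]
    if not logs:
        return [0.0 for _ in deaths]
    smallest, largest = min(logs), max(logs)
    span = largest - smallest
    scaled = []
    for d in deaths:
        if d <= 0:
            scaled.append(0.0)
        elif span == 0:
            # All non-zero counts are equal; fall back to the thinnest line.
            scaled.append(low)
        else:
            scaled.append(low + (high - low) * (math.log(d) - smallest) / span)
    return scaled


widths = scale_deaths([0, 1, 4, 16])
print(widths)
```

In this example the count of 16 is sixteen times the count of 1, yet its line is only six times as thick, which is the compression the log transform is meant to provide.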
After log-transforming and normalizing I had something like this:
6. Creating An SVG With D3
I decided to create a plot for each peak separately. To make things specific, let's assume I'm creating the plot for Manaslu (the code will be reused for the other peaks by simply changing the path to the CSV file with the data).
Basic setup
I started by creating a bare-bones HTML file called index.html (in the same folder as the CSV files created above) that includes the D3 library and has the title "Squares With D3".
Next, I created an SVG container (you can think of this as the canvas on which the visual elements will be drawn) and started a <script> tag where the only thing I did was select the container by its id.
If you open index.html in a web browser, you should see a blank page.
Adding a background color
Adding a background color is easy: simply draw a rectangle with the desired color and make sure it covers the entire SVG. Specifically (omitting everything outside the <script> tag):
If you open index.html in a web browser, you should see this.
Now we're ready to start adding some data to our SVG.
Adding peak name
Let's add the name of the peak towards the top left of the SVG. First, define some constants:
x0 and y0: used to specify where to place the peak name.
blackColor: used to specify text color.
Then, load the CSV file, store the peak name as a variable called peakName, and add it to the SVG:
The SVG should now look like this:
IMPORTANT: If your plot is suddenly blank, the browser may be blocking manaslu.csv because of its CORS policy (this typically happens when index.html is opened directly from the file system). If this happens, open a terminal in the folder containing index.html, start a simple HTTP server (for example, by typing python3 -m http.server), then open localhost:8000/ in your browser and navigate to index.html.
Next, we'll draw the squares.
Logic for drawing squares
I wanted each square to be a closed path. The path would be composed of two vertical lines and two diagonal lines. Starting from the year 1921, I would decide whether or not to draw a square for that year (idx = 0) using the following logic:
Draw a red square if there were deaths that year. The line thickness should be determined by the no_deaths column.
Draw a black square with line thickness 0.25 if there were expeditions but no deaths (black squares represent "good" seasons) that year.
Don't draw anything for years with no expeditions (these years were removed from the DataFrame so enforcing this requirement is simple).
Then, take a step to the right and move on to the next year (idx = 1).
Repeat.
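Before moving to D3, the decision logic above can be sanity-checked outside the browser. The sketch below is a Python illustration only: the function name `plan_squares`, the literal color names, the step size, and the sample rows are all assumptions, not values from the article.

```python
def plan_squares(rows, step=10.0, good_width=0.25):
    """Return one drawing instruction per row, in the order the rows are given.

    Rows for years with no expeditions are assumed to be absent already,
    so every row produces exactly one square.
    """
    plan = []
    for row in rows:
        if row["is_good_seas"]:
            # Expeditions but no deaths: thin black square.
            stroke, width = "black", good_width
        else:
            # At least one death: red square, thickness taken from no_deaths.
            stroke, width = "red", row["no_deaths"]
        plan.append({
            "x_offset": row["idx"] * step,  # one step to the right per year
            "stroke": stroke,
            "stroke_width": width,
        })
    return plan


# Made-up rows: idx 1 is missing because that year had no expeditions.
rows = [
    {"idx": 0, "is_good_seas": False, "no_deaths": 1.75},
    {"idx": 2, "is_good_seas": True, "no_deaths": 0.0},
]
for instruction in plan_squares(rows):
    print(instruction)
```

Because the horizontal offset comes from idx rather than from the row's position in the list, skipped years leave a visible gap in the timeline.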
Drawing the squares
I started by defining a few additional constants to specify line lengths, the angle of the diagonal lines, step size when moving to the right, and the specific red color I wanted to use (omitting everything outside the <script> tag):
Next, iterate through every row in data (remember that rows are already sorted ascending by year/idx) and draw a square using the logic from the previous section. This can be done by adding the following code:
// Add this after the code for adding the peak name
// and still inside d3.csv("manaslu.csv").then(data => {})
// Iterate through each row in the data for squares
data.forEach(row => {
  // Extract is_good_seas from CSV and convert to boolean
  const isGoodSeason = row.is_good_seas === "True";
  // Calculate key values for square coordinates
  // (CSV fields are strings; the multiplication coerces row.idx to a number)
  const x = x0 + row.idx * translationStep;
  const x2 = x + diag_len * Math.cos((-angle) * (Math.PI / 180));
  const y2 = y0 + vert_len + diag_len * Math.sin(angle * (Math.PI / 180));
  // Draw square as a closed path through its four corners
  svg.append("path")
    .attr("d", d3.line().curve(d3.curveLinearClosed)([
      [x, y0],
      [x, y0 + vert_len],
      [x2, y2],
      [x2, y2 - vert_len],
    ]))
    .style("stroke", seasonColors[isGoodSeason])
    .style("stroke-width", isGoodSeason ? 0.25 : row.no_deaths)
    .style("fill", backgroundColor);
});
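The corner arithmetic used in the loop can be checked on its own. The Python sketch below mirrors the same formulas; the function name `square_vertices` and every constant value are made up for illustration, since the actual constants are not shown here.

```python
import math


def square_vertices(idx, x0, y0, vert_len, diag_len, angle_deg, step):
    """Return the four corners of one "square" in the order they are drawn."""
    x = x0 + idx * step
    angle = math.radians(angle_deg)
    x2 = x + diag_len * math.cos(-angle)
    y2 = y0 + vert_len + diag_len * math.sin(angle)
    return [(x, y0), (x, y0 + vert_len), (x2, y2), (x2, y2 - vert_len)]


# Hypothetical constants, chosen only to make the numbers easy to follow.
corners = square_vertices(idx=3, x0=50, y0=100, vert_len=40, diag_len=30, angle_deg=30, step=10)
for corner in corners:
    print(corner)
```

Since the y-axis of an SVG points down, a positive angle makes the diagonal sides slope downward to the right, so each "square" is really a parallelogram with two vertical sides of length vert_len.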
This is the SVG we have at this point:
Adding red/black dots for flagging death rate and success rate
We've finished drawing the red/black squares. However, I wanted to add a small black dot below each square whenever that year had a success rate greater than 70%, and a small red dot whenever that year had a death rate greater than 10%. Doing this is straightforward. Simply add this code after drawing the squares:
// Add this after the code for drawing the squares
// and still inside data.forEach(row => {})
// Check if "high_deathrate" is True, then add a red dot below the square
if (row.high_deathrate === "True") {
  svg.append("circle")
    .attr("cx", x2)
    .attr("cy", y2 + 10)
    .attr("r", 2.5)
    .style("stroke", redColor)
    .style("fill", backgroundColor);
}
// Check if "high_succrate" is True, then add a black dot below the square
const secondCircleOffset = row.high_deathrate === "False" ? 10 : 20;
if (row.high_succrate === "True") {
  svg.append("circle")
    .attr("cx", x2)
    .attr("cy", y2 + secondCircleOffset)
    .attr("r", 2.5)
    .style("fill", blackColor);
}
Note that I added some logic to take care of cases where both high_succrate == True and high_deathrate == True. Specifically, this line:
row.high_deathrate === "False" ? 10 : 20;
moves the black dot further down whenever a red dot has already been drawn (it turns out this case never occurred, so I never got to see it in action).
This is what the final SVG looks like:
At this point we've finished our work with D3. We're now ready to save our SVG and start working with it in Illustrator.
7. Saving The SVG & Importing It Into Illustrator
Before we're able to work with the SVG in Illustrator we need to save it.
Saving the SVG
If you're using Chrome, you can right click on your SVG and click on "Inspect" to open Chrome developer tools:
Then, find the SVG element in the "Elements" tab of the developer tools, right click on it, and select Copy > Copy element:
Next, open a text editor and paste the contents. Save the file and make sure to use .svg as the file extension:
manaslu.svg
What if I'm not using Chrome?
Other browsers have similar functionality. However, if this doesn't work (for whatever reason) another option is to add a button to your HTML file that allows you to download the SVG when the button is clicked.
Opening the SVG in Illustrator
If you open manaslu.svg in Illustrator you might see something like this:
Honestly, I'm not sure why the background is black, but changing the color back to what it should be is easy (just three clicks):
8. Working With The SVG In Illustrator
Adobe Illustrator is a powerful vector graphics editor that allows users to create and manipulate digital artwork. Unlike presentation software such as PowerPoint, Illustrator is specialized for graphic design and illustration. Think of Illustrator as a digital canvas where you can create intricate designs, logos, icons, and illustrations with precision.
I won't go over the entire Illustrator process I followed but there are a few key things you can do in Illustrator that I want to highlight (to give you a sense of what's possible if you've never used Illustrator before).
Locking objects
I like to lock the background so that it can't be moved or modified. Simply select the background and go to Object > Lock > Selection. This is a great feature when there are elements that you absolutely don't want to be messing with.
Grouping objects
Like in PowerPoint, you can group objects in Illustrator. This is very useful because it helps you avoid accidentally moving squares independently and thus "fudging the data". Essentially, it helps prevent doing things like this:
Without grouping the squares and dots it's very easy to accidentally do something like this. Technically this is still possible to do even if you group the objects, so grouping helps but doesn't prevent accidentally messing up the data. It's important to keep this in mind and be careful when editing SVGs in Illustrator.
Selecting similar objects
Suppose I want to change the opacity of the fill of all squares to 20%, but I don't want to affect the opacity of the outline. Illustrator makes it very easy to do this. One way to achieve this effect is to select one of the squares, then go to Select > Same > Fill Color. This will select everything with the same fill color. Then you can edit the opacity of the fill color from the Appearance panel:
9. Final Touches
I'm a sucker for textures so I decided to open the Illustrator file and add a paper texture. The basic steps are as follows:
Open the Illustrator file.
Download the image of a texture (Unsplash has lots of free options).
Convert the texture to black and white and adjust brightness and contrast to isolate the texture.
Drag the texture on top of your image in Illustrator.
Change the "transparency" mode to achieve the desired effect.
(Left) Original texture by Kiwihug on Unsplash. (Right) Desaturating texture and isolating the desired texture.
Exporting
Because I'm sharing the final image online and the image contains colors with opacity, I decided to export the image as a PNG. I chose the "Type Optimized" Anti-aliasing setting to help maintain sharpness in the text.
This is what the design looks like straight out of Illustrator:
The final images
Here's what the final image for Everest looks like:
If you're interested, the 5 final images are available on my website.
10. Lessons Learned
Not colorblind friendly
I shared the final visualizations with a friend and was quickly reminded that they have color vision deficiency (CVD)! This is what the visualizations probably looked like to them (depending on the type of CVD):
CVD type: Protanopia. CVD type: Deuteranopia.
In hindsight, I should've picked a different color palette. Adobe Color provides excellent tools for constructing color palettes that are accessible to people with CVD:
The Adobe Color website shows you what a color palette looks like to people with different types of CVD and highlights potential issues.
Editing in a low brightness interface
In the past I've learned the hard way about the value of:
A well-calibrated monitor.
Being able to precisely control brightness (for consistency).
Unfortunately, I made the mistake of not having a look at the final graphic in Illustrator with a lighter interface background. This would've shown me that the image was a bit dark prior to exporting.
Things often look brighter when plotted against a dark background.
Final Comments
Creating the initial SVG with D3 made things a lot simpler than trying to create this kind of plot in Python directly.