Death In The Himalayas
Learning D3
I've been meaning to learn D3 for a while. To be honest, D3 has always been overkill for the types of problems I've worked on (where visualizing the data was just a means to an end, not the final product itself). As a Python developer I often use tools like matplotlib, plotly, seaborn, pandas (or geopandas), and bokeh to "get the job done". Recently, however, I've been spending time creating data visualizations just for fun and it seems like the perfect time to start learning D3.
In this article I'll show you how I created a graphic like the one above for 5 peaks (Everest, Ama Dablam, Cho Oyu, Lhotse, and Manaslu) using Python, D3, and Illustrator. I will go over:
- Inspiration.
- Getting the data.
- Initial data preparation.
- Selecting 5 peaks to visualize.
- Preparing the data for plotting.
- Creating an SVG with D3.
- Saving the SVG and importing into Illustrator.
- Working with the SVG in Illustrator.
- Adding final touches.
- Lessons learned.
1. Inspiration
This visualization was inspired by "Gisa's Timeline" created by Barbara Rebolledo. I was looking for a nonstandard way to visualize the number of deaths during Himalayan expeditions and thought Barbara's timeline looked interesting (and it provided the perfect excuse to use D3 since creating something like that in Python would've been a nightmare).
2. Getting The Data
The data I used was obtained from the Himalayan Database and is the same dataset I used for the article "Visualizing Everest Expeditions".
The Himalayan Database is a compilation of records for all expeditions that have climbed in the Nepal Himalaya.
Specifically, I extracted information on Himalayan expeditions by following the instructions on the Himalayan Database website. The dataset is a small CSV file (with a little under 11,200 rows) containing expedition records. Here are the first 5 rows:
expid peakid year season host route1 route2 route3 route4 nation leaders sponsor success1 success2 success3 success4 ascent1 ascent2 ascent3 ascent4 claimed disputed countries approach bcdate smtdate smttime smtdays totdays termdate termreason termnote highpoint traverse ski parapente camps rope totmembers smtmembers mdeaths tothired smthired hdeaths nohired o2used o2none o2climb o2descent o2sleep o2medical o2taken o2unkwn othersmts campsites accidents achievment agency comrte stdrte primrte primmem primref primid chksum
0 ANN260101 ANN2 1960 1 1 NW Ridge-W Ridge NaN NaN NaN UK J. O. M. Roberts NaN True False False False 1st NaN NaN NaN False False India, Nepal Marshyangdi->Hongde->Sabje Khola 3/15/60 5/17/60 1530.0 63 0 - - 1 NaN 7937 False False False 6 0 10 2 0 9 1 0 False True False True False True False False False Climbed Annapurna IV (ANN4-601-01) BC(15/03,3350m),ABC(4575m),C1(5365m),C2(5800m)... NaN NaN NaN False False False False False NaN 2442047
1 ANN269301 ANN2 1969 3 1 NW Ridge-W Ridge NaN NaN NaN Yugoslavia Ales Kunaver Mountaineering Club of Slovenia True False False False 2nd NaN NaN NaN False False NaN Marshyangdi->Hongde->Sabje Khola 9/25/69 10/22/69 1800.0 27 31 10/26/69 1 NaN 7937 False False False 6 0 10 2 0 0 0 0 False False True False False False False False False Climbed Annapurna IV (ANN4-693-02) LowBC(25/09,3950m),BC(27/09,4650m),C1(27/09,53... Draslar frostbitten hands and feet NaN NaN False False False False False NaN 2445501
2 ANN273101 ANN2 1973 1 1 W Ridge-N Face NaN NaN NaN Japan Yukio Shimamura Sangaku Doshikai Annapurna II Expedition 1973 True False False False 3rd NaN NaN NaN False False NaN Marshyangdi->Pisang->Salatang Khola 3/16/73 5/6/73 2030.0 51 0 - - 1 NaN 7937 False False False 5 0 6 1 0 8 0 0 False False True False False False False False False NaN BC(16/03,3300m),C1(21/03,4200m),C2(10/04,5000m... NaN NaN NaN False False False False False NaN 2446797
3 ANN278301 ANN2 1978 3 1 N Face-W Ridge NaN NaN NaN UK Richard J. Isherwood British Annapurna II Expedition False False False False NaN NaN NaN NaN False False NaN Marshyangdi->Pisang->Salatang Khola 9/8/78 10/2/78 NaN 24 27 10/5/78 4 Abandoned at 7000m (on A-IV) due to bad weather 7000 False False False 0 0 2 0 0 0 0 0 True False True False False False False False False NaN BC(08/09,5190m),xxx(02/10,7000m) NaN NaN NaN False False False False False NaN 2448822
4 ANN279301 ANN2 1979 3 1 N Face-W Ridge NW Ridge of A-IV NaN NaN UK Paul Moores NaN False False False False NaN NaN NaN NaN False False NaN Pokhara->Marshyangdi->Pisang->Sabje Khola - - 10/18/79 NaN 0 0 10/20/79 4 Abandoned at 7160m due to high winds 7160 False False False 0 0 3 0 0 0 0 0 True False True False False False False False False NaN BC(3500m),ABC,Biv1,Biv2,Biv3,Biv4,Biv5,xxx(18/... NaN NaN NaN False False False False False NaN 2449204
3. Initial Data Preparation
I wanted to show the evolution of the number of deaths over time and, if possible, I also wanted to add information on summit success rate and death rate (these ended up being just small dots in the final visualization). I decided to focus on the following columns:
- year: When did the expedition take place?
- expid: The expedition ID (unique when combined with year).
- peakid: Peak ID.
- totmembers + tothired: The number of members in the expedition.
- smtmembers + smthired: The number of members that summited.
- mdeaths + hdeaths: The number of members that passed away.
Including expid is useful because it lets us look up additional information about an expedition whenever clarification is needed. For example, there are expeditions with 0 members. I assume this is an error, but it's also possible that these records aren't expeditions at all and instead represent some other form of record. We can confirm that our intuition is correct by looking up some of these expeditions online. Let's take expedition ANNS7130, for example. This expedition appears to have 0 members. However, the Himalayan Database Online shows that there is exactly one member: Tomoyo Minegishi.


Clearly there is an issue with the dataset and I decided to drop these records (expeditions with 0 members) from the analysis.
After dropping NaNs, grouping by year and peakid, counting the total number of members, summits, and deaths, and dropping any (year, peakid) combinations with 0 members (there were only 23 such combinations), this is what the data (exp_df) looked like:
>>> exp_df
year peakid no_summits no_members no_deaths no_exped
0 1905 KANG 0 9 5 1
1 1907 KABN 0 2 0 1
2 1909 JONG 0 1 0 1
3 1909 LNPO 1 1 0 1
4 1910 KANG 0 1 0 1
- year and peakid are defined as before.
- no_members is the number of members.
- no_summits is the number of members that summited.
- no_deaths is the number of member deaths.
- no_exped is the number of expeditions.
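For readers who want to reproduce this step, here is a minimal pandas sketch of the aggregation described above. It is not the exact code behind the article: the function name aggregate_expeditions and the toy rows (with made-up IDs such as "A1" and "AAAA") are assumptions for illustration, and only the column names come from the dataset.

```python
import pandas as pd


def aggregate_expeditions(raw: pd.DataFrame) -> pd.DataFrame:
    """Aggregate expedition records into one row per (year, peakid)."""
    cols = ["year", "peakid", "expid", "totmembers", "tothired",
            "smtmembers", "smthired", "mdeaths", "hdeaths"]
    df = raw[cols].dropna().copy()
    # Members and hired staff are counted together, as in the column list above.
    df["no_members"] = df["totmembers"] + df["tothired"]
    df["no_summits"] = df["smtmembers"] + df["smthired"]
    df["no_deaths"] = df["mdeaths"] + df["hdeaths"]
    out = df.groupby(["year", "peakid"], as_index=False).agg(
        no_summits=("no_summits", "sum"),
        no_members=("no_members", "sum"),
        no_deaths=("no_deaths", "sum"),
        no_exped=("expid", "count"),
    )
    # Drop (year, peakid) combinations that end up with 0 members.
    return out[out["no_members"] > 0].reset_index(drop=True)


# Toy rows (hypothetical values, not taken from the Himalayan Database).
raw = pd.DataFrame({
    "expid": ["A1", "A2", "B1", "C1"],
    "peakid": ["AAAA", "AAAA", "BBBB", "CCCC"],
    "year": [2000, 2000, 2000, 2001],
    "totmembers": [4, 6, 3, 0],
    "tothired": [1, 2, 0, 0],
    "smtmembers": [2, 0, 3, 0],
    "smthired": [1, 0, 0, 0],
    "mdeaths": [0, 1, 0, 0],
    "hdeaths": [0, 0, 0, 0],
})
exp_df = aggregate_expeditions(raw)
```

In the toy data the two "AAAA" expeditions collapse into a single row, and the "CCCC" record with 0 members is dropped.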
There are 406 peaks left in the database. I thought picking just a few of them would be best for the type of plot I wanted to create.
4. Selecting 5 Peaks To Visualize
Looking at the number of expeditions for each peak I saw that just a handful of peaks contained the bulk of the expeditions since 1905:
>>> key_exp = exp_df.groupby(by='peakid')[['no_exped']].sum().reset_index()
>>> key_exp.no_exped.describe()
mean 27.485222
min 1.000000
25% 2.000000
50% 3.000000
75% 8.000000
max 2303.000000
75% of all peaks have 8 or fewer expeditions since 1905 (at least according to the Himalayan Database)! For example, here are 10 peaks with only one expedition since 1905:
>>> key_exp.tail(10)
peakid no_exped
252 NALS 1 # Nalakankar South
68 DHEC 1 # Dhechyan Khang
186 KUML 1 # Khumbutse
343 SAUL 1 # Saula
342 SATO 1 # Sat Peak
71 DOGA 1 # Dogari
340 SANK 1 # Sano Kailash
254 NAN2 1 # Nangamari II
75 DOR2 1 # Dorje Lakpa II
129 HMLE 1 # Himlung East
I decided to focus on the 5 peaks with the most expeditions: Everest, Ama Dablam, Cho Oyu, Manaslu, and Lhotse.
>>> key_exp.sort_values(by='no_exped', inplace=True, ascending=False, ignore_index=False)
>>> key_exp.iloc[:5, :]
peakid no_exped
84 EVER 2303 # Everest
1 AMAD 1525 # Ama Dablam
45 CHOY 1350 # Cho Oyu
233 MANA 754 # Manaslu
210 LHOT 497 # Lhotse
5. Preparing The Data For Plotting
To make my life easier when plotting, I made a few changes to the DataFrame.
Create all year/peak combinations for the 5 chosen peaks
I chose to remove years before 1921 because there weren't any expeditions before that for the 5 chosen peaks. After adding all (year, peak) combinations for these peaks we'll have introduced some NaN values which can be replaced with 0:
year peakid no_summits no_members no_deaths no_exped
0 1921 AMAD NaN NaN NaN NaN
1 1921 CHOY NaN NaN NaN NaN
2 1921 EVER 0.0 30.0 2.0 1.0
3 1921 LHOT NaN NaN NaN NaN
4 1921 MANA NaN NaN NaN NaN
At this point our dataset is a DataFrame with a little over 500 rows.
NOTE: Adding all year and peakid combinations was not strictly necessary, but I wasn't sure whether I wanted to include years and peaks with no expeditions in the visualization. I decided to leave all combinations in to start and make a decision after seeing the visualizations.
Add "is_good_seas" flag
I added a column called is_good_seas ("seas" stands for "season") that is set to True whenever a (year, peak) combination has at least one expedition but no deaths (i.e., is a "good" season):
year peakid no_summits no_members no_deaths no_exped is_good_seas
510 2023 AMAD 27.0 126.0 0.0 8.0 True
511 2023 CHOY 5.0 9.0 0.0 1.0 True
512 2023 EVER 677.0 1251.0 18.0 50.0 False
513 2023 LHOT 107.0 153.0 0.0 20.0 True
514 2023 MANA 8.0 44.0 0.0 5.0 True
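As a rough sketch (assuming the DataFrame uses the column names shown above), the flag can be computed with a single boolean expression. The first two rows below come from the table above; the "XXXX" row is a made-up combination with no expeditions, added to show that it is not flagged as a good season.

```python
import pandas as pd

df = pd.DataFrame({
    "year": [2023, 2023, 2023],
    "peakid": ["AMAD", "EVER", "XXXX"],  # "XXXX" is a hypothetical peak
    "no_deaths": [0.0, 18.0, 0.0],
    "no_exped": [8.0, 50.0, 0.0],
})

# A "good" season has at least one expedition and no deaths.
df["is_good_seas"] = (df["no_exped"] > 0) & (df["no_deaths"] == 0)
```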
Add "death rate" and "success rate" columns
"Success rate" is defined as succrate = no_summits / no_members, and "death rate" is simply deathrate = no_deaths / no_members. I added these rates as new columns in the DataFrame along with two other columns: a column flagging when the death rate was higher than 10%, and a column flagging when the success rate was higher than 70% (these are numbers I played with after creating a first draft of the visualization). This is what the DataFrame looked like at this point:
year peakid no_summits no_members no_deaths no_exped is_good_seas deathrate high_deathrate succrate high_succrate
200 1961 AMAD 4.0 5.0 0.0 1.0 True 0.000000 False 0.800000 True
450 2011 AMAD 284.0 402.0 1.0 79.0 False 0.002488 False 0.706468 True
466 2014 CHOY 231.0 328.0 0.0 45.0 True 0.000000 False 0.704268 True
477 2016 EVER 678.0 935.0 5.0 80.0 False 0.005348 False 0.725134 True
481 2017 CHOY 77.0 105.0 0.0 6.0 True 0.000000 False 0.733333 True
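The rates and flags can be sketched as follows. The helper name add_rates and the threshold constants are assumptions for illustration; the first two toy rows are copied from the table above and the third is a combination with no expeditions. For that row the division is 0/0, which pandas evaluates to NaN, and NaN compares as False against both thresholds.

```python
import pandas as pd

DEATH_RATE_THRESHOLD = 0.10    # flag death rates higher than 10%
SUCCESS_RATE_THRESHOLD = 0.70  # flag success rates higher than 70%


def add_rates(df: pd.DataFrame) -> pd.DataFrame:
    """Add deathrate, succrate, and their high_* flag columns."""
    out = df.copy()
    out["deathrate"] = out["no_deaths"] / out["no_members"]
    out["succrate"] = out["no_summits"] / out["no_members"]
    out["high_deathrate"] = out["deathrate"] > DEATH_RATE_THRESHOLD
    out["high_succrate"] = out["succrate"] > SUCCESS_RATE_THRESHOLD
    return out


toy = pd.DataFrame({
    "year": [1961, 2016, 1921],
    "peakid": ["AMAD", "EVER", "AMAD"],
    "no_summits": [4.0, 678.0, 0.0],
    "no_members": [5.0, 935.0, 0.0],
    "no_deaths": [0.0, 5.0, 0.0],
})
rates = add_rates(toy)
```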
Drop unnecessary columns, sort, and add time idx
Dropping unnecessary columns is not strictly necessary, but it helps keep things clean. To this end, I removed the no_summits, no_members, succrate, and deathrate columns. I also sorted by peakid and year (ascending) and added a temporal index (idx) to each peakid:
year peakid no_deaths no_exped is_good_seas high_deathrate high_succrate idx
0 1921 AMAD 0.0 0.0 False False False 0
1 1922 AMAD 0.0 0.0 False False False 1
2 1923 AMAD 0.0 0.0 False False False 2
3 1924 AMAD 0.0 0.0 False False False 3
4 1925 AMAD 0.0 0.0 False False False 4
The idx column serves the same conceptual purpose as the year column, but I thought it might be useful when plotting.
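A minimal sketch of the sorting and indexing step, assuming every year is present for every peak at this point (which is the case after adding all combinations). The helper add_time_index is hypothetical; it uses a running count within each peak, which under that assumption equals year - 1921.

```python
import pandas as pd

FIRST_YEAR = 1921  # first year kept for the 5 chosen peaks


def add_time_index(df: pd.DataFrame) -> pd.DataFrame:
    """Sort by peak and year, then number the years within each peak."""
    out = df.sort_values(["peakid", "year"]).reset_index(drop=True)
    out["idx"] = out.groupby("peakid").cumcount()
    return out


toy = pd.DataFrame({
    "year": [1922, 1921, 1923, 1921, 1922, 1923],
    "peakid": ["EVER", "EVER", "EVER", "AMAD", "AMAD", "AMAD"],
})
indexed = add_time_index(toy)
```

If some years had already been removed, computing idx as year - FIRST_YEAR directly would be the safer choice.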
In the end I decided to remove records where there were no expeditions whatsoever (this is something I decided to do after creating a first draft of the plot, where I realized that including these records resulted in the plot looking too cluttered). After filtering, I dropped the no_exped column:
year peakid no_deaths is_good_seas high_deathrate high_succrate idx
37 1958 AMAD 0.0 True False False 37
38 1959 AMAD 2.0 False True False 38
40 1961 AMAD 0.0 True False True 40
57 1978 AMAD 0.0 True False False 57
58 1979 AMAD 1.0 False False False 58
Log-transform and normalize no_deaths
I wanted the thickness of each square in the plot to represent the number of deaths relative to every other year and peak combination. In other words, I wanted peaks with fewer expeditions (and therefore fewer deaths) to be made up of thinner squares than peaks with more expeditions (and therefore more deaths). With the raw counts, however, Everest's plot would be made up of nice thick squares while the plots for the other 4 peaks would be made up of thin (barely visible) squares. To address this issue, I decided to log-transform the number of deaths across all 5 peaks and years. I also normalized the non-zero values (after log-transforming) to be in the interval [0.5, 3] (because this value is meant to be used as the line thickness when plotting).
After log-transforming and normalizing I had something like this:
year peakid no_deaths is_good_seas high_deathrate high_succrate idx
37 1958 AMAD 0.000000 True False False 37
38 1959 AMAD 1.099531 False True False 38
40 1961 AMAD 0.000000 True False True 40
57 1978 AMAD 0.000000 True False False 57
58 1979 AMAD 0.500000 False False False 58
The no_deaths column values are in the interval [0.5, 3] or are equal to 0.
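Here is a sketch of the transformation, assuming a natural log followed by min-max scaling of the non-zero values (the article does not state the exact formula, and scale_deaths is a hypothetical helper). With raw counts ranging from 1 to 18 (18 being the largest count in the tables shown here), this assumption reproduces the values in the table above: 1 death maps to 0.5 and 2 deaths map to 1.099531.

```python
import math


def scale_deaths(counts, lo=0.5, hi=3.0):
    """Log-transform non-zero counts and rescale them to [lo, hi]; zeros stay 0."""
    logs = [math.log(c) for c in counts if c > 0]
    if not logs:
        return [0.0 for _ in counts]
    log_min, log_max = min(logs), max(logs)
    span = log_max - log_min
    scaled = []
    for c in counts:
        if c == 0:
            scaled.append(0.0)   # no deaths: nothing to scale
        elif span == 0:
            scaled.append(lo)    # all non-zero counts are equal
        else:
            scaled.append(lo + (hi - lo) * (math.log(c) - log_min) / span)
    return scaled


widths = scale_deaths([0, 1, 2, 4, 7, 18])
```

Keeping zeros at exactly 0 keeps the years without deaths separate from the scaled range.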
Add peak name and split into 5 CSV files
The Himalayan Database has a table for mapping peakid to the peak name, which I merged into the DataFrame:
year peakid no_deaths is_good_seas high_deathrate high_succrate idx pkname
0 1958 AMAD 0.000000 True False False 37 Ama Dablam
1 1959 AMAD 1.099531 False True False 38 Ama Dablam
2 1961 AMAD 0.000000 True False True 40 Ama Dablam
3 1978 AMAD 0.000000 True False False 57 Ama Dablam
4 1979 AMAD 0.500000 False False False 58 Ama Dablam
I then split the data into 5 CSV files: ama_dablam.csv, cho_oyu.csv, everest.csv, lhotse.csv, and manaslu.csv, one for each peak:
# ama_dablam.csv (49 rows)
year peakid no_deaths no_exped is_good_seas high_deathrate high_succrate idx pkname
0 1958 AMAD 0.000000 1.0 True False False 37 Ama Dablam
1 1959 AMAD 1.099531 1.0 False True False 38 Ama Dablam
2 1961 AMAD 0.000000 1.0 True False True 40 Ama Dablam
3 1978 AMAD 0.000000 1.0 True False False 57 Ama Dablam
4 1979 AMAD 0.500000 4.0 False False False 58 Ama Dablam
# cho_oyu.csv (53 rows)
year peakid no_deaths no_exped is_good_seas high_deathrate high_succrate idx pkname
49 1951 CHOY 0.000000 1.0 True False False 30 Cho Oyu
50 1952 CHOY 0.000000 1.0 True False False 31 Cho Oyu
51 1954 CHOY 0.000000 2.0 True False False 33 Cho Oyu
52 1958 CHOY 0.500000 1.0 False False False 37 Cho Oyu
53 1959 CHOY 1.699062 1.0 False True False 38 Cho Oyu
# everest.csv (76 rows)
year peakid no_deaths no_exped is_good_seas high_deathrate high_succrate idx pkname
102 1921 EVER 1.099531 1.0 False False False 0 Everest
103 1922 EVER 2.183097 1.0 False True False 1 Everest
104 1924 EVER 1.699062 1.0 False False False 3 Everest
105 1933 EVER 0.000000 1.0 True False False 12 Everest
106 1934 EVER 0.500000 1.0 False True False 13 Everest
# lhotse.csv (52 rows)
year peakid no_deaths no_exped is_good_seas high_deathrate high_succrate idx pkname
178 1955 LHOT 0.0 1.0 True False False 34 Lhotse
179 1956 LHOT 0.0 1.0 True False False 35 Lhotse
180 1972 LHOT 0.0 1.0 True False False 51 Lhotse
181 1973 LHOT 0.0 1.0 True False False 52 Lhotse
182 1974 LHOT 0.5 2.0 False False False 53 Lhotse
# manaslu.csv (60 rows)
year peakid no_deaths no_exped is_good_seas high_deathrate high_succrate idx pkname
230 1950 MANA 0.0 1.0 True False False 29 Manaslu
231 1952 MANA 0.0 1.0 True False False 31 Manaslu
232 1953 MANA 0.0 1.0 True False False 32 Manaslu
233 1954 MANA 0.0 1.0 True False False 33 Manaslu
234 1955 MANA 0.0 1.0 True False False 34 Manaslu
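Splitting the DataFrame into one file per peak can be sketched like this. The helper split_by_peak and the way file names are derived from pkname are assumptions, and the demo writes to a temporary directory so nothing is left on disk. The two demo rows are taken from the tables above.

```python
import tempfile
from pathlib import Path

import pandas as pd


def split_by_peak(df: pd.DataFrame, out_dir) -> list:
    """Write one CSV per peak, named after the peak (e.g. ama_dablam.csv)."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for pkname, peak_df in df.groupby("pkname"):
        path = out_dir / (pkname.lower().replace(" ", "_") + ".csv")
        peak_df.to_csv(path, index=False)
        paths.append(path)
    return paths


demo = pd.DataFrame({
    "year": [1958, 1958],
    "peakid": ["AMAD", "CHOY"],
    "no_deaths": [0.0, 0.5],
    "is_good_seas": [True, False],
    "high_deathrate": [False, False],
    "high_succrate": [False, False],
    "idx": [37, 37],
    "pkname": ["Ama Dablam", "Cho Oyu"],
})

with tempfile.TemporaryDirectory() as tmp:
    written = sorted(p.name for p in split_by_peak(demo, tmp))
    ama_text = (Path(tmp) / "ama_dablam.csv").read_text()
```

Note that pandas writes booleans as the strings True and False, so the flag columns need to be compared against "True" and "False" once the CSV is loaded in JavaScript.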
6. Creating An SVG With D3
I decided to create a plot for each peak separately. To make things specific, let's assume I'm creating the plot for Manaslu (the code will be reused for the other peaks by simply changing the path to the CSV file with the data).
Basic setup
I started by creating a bare-bones HTML file called index.html (in the same folder as the CSV files created above) that includes the D3 library:
Squares With D3
Next, I created an SVG container (you can think of this as the canvas on which the visual elements will be drawn) and started a <script> tag where the only thing I did was select the container by its id:
Squares With D3
If you open index.html in a web browser, you should see a blank page.
Adding a background color
Adding a background color is easy: simply draw a rectangle with the desired color and make sure it covers the entire SVG. Specifically (omitting everything outside the <script> tag):

Now we're ready to start adding some data to our SVG.
Adding peak name
Let's add the name of the peak towards the top left of the SVG. First, define some constants:
- x0 and y0: used to specify where to place the peak name.
- blackColor: used to specify text color.
Then, load the CSV file, store the peak name as a variable called peakName, and add it to the SVG:
The SVG should now look like this:

IMPORTANT: If your plot is suddenly blank, the browser may be blocking the request for manaslu.csv because of its CORS policy. If this happens, open a terminal and start a simple HTTP server (you can do this by typing python3 -m http.server), then open localhost:8000/ in your browser, navigate to the index.html file, and open it.
Next, we'll draw the squares.
Logic for drawing squares
I wanted each square to be a closed path. The path would be composed of two vertical lines and two diagonal lines. Starting from the year 1921, I would decide whether or not to draw a square for that year (idx = 0) using the following logic:
- Draw a red square if there were deaths that year. The line thickness should be determined by the no_deaths column.
- Draw a black square with line thickness 0.25 if there were expeditions but no deaths that year (black squares represent "good" seasons).
- Don't draw anything for years with no expeditions (these years were removed from the DataFrame so enforcing this requirement is simple).
- Then, take a step to the right and move on to the next year (idx = 1).
- Repeat.
Drawing the squares
I started by defining a few additional constants to specify line lengths, the angle of the diagonal lines, step size when moving to the right, and the specific red color I wanted to use (omitting everything outside the <script> tag):
Next, iterate through every row in data (remember that rows are already sorted ascending by year/idx) and draw a square using the logic from the previous section. This can be done by adding the following code:
// Add this after the code for adding the peak name
// and still inside d3.csv("manaslu.csv").then(data => {})
// Iterate through each row in the data for squares
data.forEach(row => {
    // Extract is_good_seas from CSV and convert to boolean
    const isGoodSeason = row.is_good_seas === "True";
    // Calculate key values for square coordinates
    const x = x0 + row.idx * translationStep;
    const x2 = x + diag_len * Math.cos((-angle) * (Math.PI / 180));
    const y2 = y0 + vert_len + diag_len * Math.sin(angle * (Math.PI / 180));
    // Draw square as a closed path through its four corners
    svg.append("path")
        .attr("d", d3.line().curve(d3.curveLinearClosed)([
            [x, y0],
            [x, y0 + vert_len],
            [x2, y2],
            [x2, y2 - vert_len],
        ]))
        .style("stroke", seasonColors[isGoodSeason])
        .style("stroke-width", isGoodSeason ? 0.25 : row.no_deaths)
        .style("fill", backgroundColor);
});
This is the SVG we have at this point:

Adding red/black dots for flagging death rate and success rate
We've finished drawing the red/black squares. However, I wanted to add a small black dot below each square whenever that year had a success rate greater than 70%, and a small red dot whenever that year had a death rate greater than 10%. Doing this is straightforward. Simply add this code after drawing the squares:
    // Add this after the code for drawing the squares
    // and still inside data.forEach(row => {})
    // Check if "high_deathrate" is True, then add a red dot below the square
    if (row.high_deathrate === "True") {
        svg.append("circle")
            .attr("cx", x2)
            .attr("cy", y2 + 10)
            .attr("r", 2.5)
            .style("stroke", redColor)
            .style("fill", backgroundColor);
    }
    // Check if "high_succrate" is True, then add a black dot below the square
    const secondCircleOffset = row.high_deathrate === "False" ? 10 : 20;
    if (row.high_succrate === "True") {
        svg.append("circle")
            .attr("cx", x2)
            .attr("cy", y2 + secondCircleOffset)
            .attr("r", 2.5)
            .style("fill", blackColor);
    }
Note that I added some logic to take care of cases where both high_succrate == True and high_deathrate == True. Specifically, this line:
row.high_deathrate === "False" ? 10 : 20;
moves the black dot further down whenever a red dot has already been drawn (it turns out this case never occurred, so I didn't get to see this in action).
This is what the final SVG looks like:

At this point we've finished our work with D3. We're now ready to save our SVG and start working with it in Illustrator.
7. Saving The SVG & Importing It Into Illustrator
Before we're able to work with the SVG in Illustrator we need to save it.
Saving the SVG
If you're using Chrome, you can right click on your SVG and click on "Inspect" to open Chrome developer tools:

Then, find the SVG element in the "Elements" tab of the developer tools, right click on it, and select Copy > Copy element:

Next, open a text editor and paste the contents. Save the file and make sure to use .svg as the file extension:

What if I'm not using Chrome?
Other browsers have similar functionality. However, if this doesn't work (for whatever reason) another option is to add a button to your HTML file that allows you to download the SVG when the button is clicked.
Opening the SVG in Illustrator
If you open manaslu.svg in Illustrator you might see something like this:

Honestly, I'm not sure why the background is black, but changing the color back to what it should be is easy (just three clicks):

8. Working With The SVG In Illustrator
Adobe Illustrator is a powerful vector graphics editor that allows users to create and manipulate digital artwork. Unlike presentation software such as PowerPoint, Illustrator is specialized for graphic design and illustration. Think of Illustrator as a digital canvas where you can create intricate designs, logos, icons, and illustrations with precision.
I won't go over the entire Illustrator process I followed but there are a few key things you can do in Illustrator that I want to highlight (to give you a sense of what's possible if you've never used Illustrator before).
Locking objects
I like to lock the background so that it can't be moved or modified. Simply select the background and go to Object > Lock > Selection. This is a great feature when there are elements that you absolutely don't want to be messing with.
Grouping objects
Like in PowerPoint, you can group objects in Illustrator. This is very useful because it helps you avoid accidentally moving squares independently and thus "fudging the data". Essentially, it helps prevent doing things like this:

Selecting similar objects
Suppose I want to change the opacity of the fill of all squares to 20%, but I don't want to affect the opacity of the outline. Illustrator makes it very easy to do this. One way to achieve this effect is to select one of the squares, then go to Select > Same > Fill Color. This will select everything with the same fill color. Then you can edit the opacity of the fill color from the Appearance panel:

9. Final Touches
I'm a sucker for textures so I decided to open the Illustrator file and add a paper texture. The basic steps are as follows:
- Open the Illustrator file.
- Download the image of a texture (Unsplash has lots of free options).
- Convert the texture to black and white and adjust brightness and contrast to isolate the texture.
- Drag the texture on top of your image in Illustrator.
- Change the "transparency" mode to achieve the desired effect.

Exporting
Because I'm sharing the final image online and the image contains colors with opacity, I decided to export the image as a PNG. I chose the "Type Optimized" Anti-aliasing setting to help maintain sharpness in the text.

This is what the design looks like straight out of Illustrator:

The final images
Here's what the final image for Everest looks like:

If you're interested, the 5 final images are available on my website.
10. Lessons Learned
Not colorblind friendly
I shared the final visualizations with a friend and was quickly reminded that they have color vision deficiency (CVD)! This is what the visualizations probably looked like to them (depending on the type of CVD):


In hindsight, I should've picked a different color palette. Adobe Color provides excellent tools for constructing color palettes that are accessible to people with CVD:

Editing in a low brightness interface
In the past I've learned the hard way about the value of:
- A well-calibrated monitor.
- Being able to precisely control brightness (for consistency).
Unfortunately, I made the mistake of not having a look at the final graphic in Illustrator with a lighter interface background. This would've shown me that the image was a bit dark prior to exporting.

Final Comments
- Creating the initial SVG with D3 made things a lot simpler than trying to create this kind of plot in Python directly.
- The video "Cleaning up a Python data visualization in Adobe Illustrator (pandas to ai2html)" by Jonathan Soma seems to cover a lot of important ideas if you're interested in getting started with editing data-based graphics in Illustrator.
- The entire code is available in this GitHub repo (there may be small differences from what was presented here).