Tagging Mountaineering Accident Reports Using bart-large-mnli


I discovered the Himalayan Database a few weeks ago and decided to create a few "whimsical" visualizations based on this dataset. In two previous articles I created a simple elevation plot for Everest expeditions and a plot showing the relative number of deaths for 5 Himalayan peaks. This time I wanted to explore expedition accident reports.

The dataset I'll be using is a small CSV file with information on Himalayan expeditions (about 11,300 rows), where each record/row represents one expedition in the Himalayas. Here are 5 sample records for expeditions in Annapurna II:

       expid peakid  year  season  host            route1            route2 route3 route4      nation               leaders                                        sponsor  success1  success2  success3  success4 ascent1 ascent2 ascent3 ascent4  claimed  disputed     countries                                   approach   bcdate   smtdate  smttime  smtdays  totdays  termdate  termreason                                         termnote  highpoint  traverse    ski  parapente  camps  rope  totmembers  smtmembers  mdeaths  tothired  smthired  hdeaths  nohired  o2used  o2none  o2climb  o2descent  o2sleep  o2medical  o2taken  o2unkwn                           othersmts                                          campsites                           accidents achievment agency  comrte  stdrte  primrte  primmem  primref primid   chksum
0  ANN260101   ANN2  1960       1     1  NW Ridge-W Ridge               NaN    NaN    NaN          UK      J. O. M. Roberts                                            NaN      True     False     False     False     1st     NaN     NaN     NaN    False     False  India, Nepal           Marshyangdi->Hongde->Sabje Khola  3/15/60   5/17/60   1530.0       63        0     -   -           1                                              NaN       7937     False  False      False      6     0          10           2        0         9         1        0    False    True   False     True      False     True      False    False    False  Climbed Annapurna IV (ANN4-601-01)  BC(15/03,3350m),ABC(4575m),C1(5365m),C2(5800m)...                                 NaN        NaN    NaN   False   False    False    False    False    NaN  2442047
1  ANN269301   ANN2  1969       3     1  NW Ridge-W Ridge               NaN    NaN    NaN  Yugoslavia          Ales Kunaver                Mountaineering Club of Slovenia      True     False     False     False     2nd     NaN     NaN     NaN    False     False           NaN           Marshyangdi->Hongde->Sabje Khola  9/25/69  10/22/69   1800.0       27       31  10/26/69           1                                              NaN       7937     False  False      False      6     0          10           2        0         0         0        0    False   False    True    False      False    False      False    False    False  Climbed Annapurna IV (ANN4-693-02)  LowBC(25/09,3950m),BC(27/09,4650m),C1(27/09,53...  Draslar frostbitten hands and feet        NaN    NaN   False   False    False    False    False    NaN  2445501
2  ANN273101   ANN2  1973       1     1    W Ridge-N Face               NaN    NaN    NaN       Japan       Yukio Shimamura  Sangaku Doshikai Annapurna II Expedition 1973      True     False     False     False     3rd     NaN     NaN     NaN    False     False           NaN        Marshyangdi->Pisang->Salatang Khola  3/16/73    5/6/73   2030.0       51        0     -   -           1                                              NaN       7937     False  False      False      5     0           6           1        0         8         0        0    False   False    True    False      False    False      False    False    False                                 NaN  BC(16/03,3300m),C1(21/03,4200m),C2(10/04,5000m...                                 NaN        NaN    NaN   False   False    False    False    False    NaN  2446797
3  ANN278301   ANN2  1978       3     1    N Face-W Ridge               NaN    NaN    NaN          UK  Richard J. Isherwood                British Annapurna II Expedition     False     False     False     False     NaN     NaN     NaN     NaN    False     False           NaN        Marshyangdi->Pisang->Salatang Khola   9/8/78   10/2/78      NaN       24       27   10/5/78           4  Abandoned at 7000m (on A-IV) due to bad weather       7000     False  False      False      0     0           2           0        0         0         0        0     True   False    True    False      False    False      False    False    False                                 NaN                   BC(08/09,5190m),xxx(02/10,7000m)                                 NaN        NaN    NaN   False   False    False    False    False    NaN  2448822
4  ANN279301   ANN2  1979       3     1    N Face-W Ridge  NW Ridge of A-IV    NaN    NaN          UK           Paul Moores                                            NaN     False     False     False     False     NaN     NaN     NaN     NaN    False     False           NaN  Pokhara->Marshyangdi->Pisang->Sabje Khola    -   -  10/18/79      NaN        0        0  10/20/79           4             Abandoned at 7160m due to high winds       7160     False  False      False      0     0           3           0        0         0         0        0     True   False    True    False      False    False      False    False    False                                 NaN  BC(3500m),ABC,Biv1,Biv2,Biv3,Biv4,Biv5,xxx(18/...                                 NaN        NaN    NaN   False   False    False    False    False    NaN  2449204

The dataset has a column called "accidents" (trust me, it's in there!). This column contains expedition accident reports (it also contains NaN values whenever no accident report was provided). For example, the second row in the DataFrame above has the following report: "Draslar frostbitten hands and feet". Another accident report (not visible in the 5 rows above) reads: "Dewaele exhausted, shocked, needed O2 and was brought down; much slight frostbite; leader's serious lung ailment; Ang Lhakpa fatal fall".

After looking through several reports I thought it might be interesting to tag reports with topics. In this article I will show you how I did just that. The focus will be on tagging the accident reports, but I'll also explain how I created the visualization at the start of this article. Specifically, I'll go over the following topics:

  1. Coming up with an initial set of tags.
  2. Tagging the data.
  3. Refining the tags.
  4. Plotting results in Python.
  5. How I edited the Python plot in Adobe Illustrator.

The Accident Reports

After loading the expeditions dataset into Python I created a Pandas DataFrame (acc_df) with the expid, peakid, and accidents columns, filtered the data to the 5 peaks with the most expeditions (Ama Dablam, Cho Oyu, Everest, Lhotse, and Manaslu), and added an accident ID column:

>>> acc_df.head(3)

   acc_id peakid                                          accidents
0       0   EVER  Dewaele exhausted, shocked, needed O2 and was ...
1       1   EVER                  Walker minor pulmonary edema only
2       2   EVER  Rabold stroke (?) and Slade CO2 in tent; Lithe...
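For reference, here's a minimal sketch of how acc_df might be constructed (my own reconstruction, not necessarily the exact code used; the CSV filename is hypothetical):

import pandas as pd

exped_df = pd.read_csv("expeditions.csv")  # hypothetical filename

# Keep the 5 peaks with the most expeditions
top_peaks = exped_df["peakid"].value_counts().head(5).index

# Keep only rows with an accident report and add an accident ID column
acc_df = (
    exped_df.loc[exped_df["peakid"].isin(top_peaks), ["peakid", "accidents"]]
    .dropna(subset=["accidents"])
    .reset_index(drop=True)
)
acc_df.insert(0, "acc_id", acc_df.index)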

Here are a few more examples of accident reports:

- Unoccupied tents at C3 and again at C2 blown away
- Batard had painfully wind-dried eyes at 7800m
- Prodanov death at 8600m; Yankov frostbitten fingers; 1 member hit by falling empty oxygen bottle in Apr; Savov frostbitten toes
- Doctor had cerebral edema at BC; 3 killed and 2 hurt in avalanche
- None except Sterbova extreme exhaustion

After spending some time looking through the reports, it became clear that a lot of them mention things like falls, frostbite, stomach issues, and high-altitude illness, but there are many other topics described. If I wanted to tag each accident description with the types of issues that the expedition faced I could do it manually like this:

- Description: "Unoccupied tents at C3 and again at C2 blown away".
- Tags: ["high wind"]

- Description: "Prodanov death at 8600m; Yankov frostbitten fingers; 1 member hit by falling empty oxygen bottle in Apr; Savov frostbitten toes".
- Tags: ["death", "frostbite", "people falling", "human error"]

but there are a few things to consider. First, manual tagging would be annoying to say the least. Second, how do I even decide which tags to use? In the example above, I could've tagged the first accident with the tag "tents blown away" instead of "high wind".

In the following sections I share how I automated tagging accident descriptions and addressed these points.


What Tags Should I Use?

In some cases, the set of potential tags follows immediately from the project goal. For example, when performing sentiment analysis the potential tags might be "positive" and "negative". In the TV series Silicon Valley, the characters develop an app to determine whether an image depicts a hot dog or not (the potential tags are "hot dog" and "not hot dog").

Photo by Ball Park Brand on Unsplash.

In this project, however, I'm not starting with a specific question that needs answering. I'm not asking: "How many expeditions experienced frostbite?" or "How many expeditions experienced acute mountain sickness (AMS)?". All I have is a high-level idea (from spending some time looking at the data and from personal experience) of the types of issues mountaineers often face… so that's where I'm going to start.

The Plan

I decided to start by constructing an initial set of tags with the help of OpenAI's ChatGPT, adding any other tags I thought might occur frequently in the data. I could then use a zero-shot sequence classifier (like Facebook's bart-large-mnli model) to assign one or more of these tags to each accident report. Looking at any untagged accident reports could give me an idea of whether additional tags are needed. Tags can also be grouped after looking at frequency counts, and tags can be removed if they're often implied by the presence or absence of other tags.

Start with a set of tags. Add new tags by looking at reports with no tags. Merge tags based on tag counts. Remove tags based on entailment from other tags. – Image by author.

An Initial Set of Tags

To come up with an initial set of tags I asked ChatGPT for help:

Snapshot from a conversation with ChatGPT.

I took ChatGPT's response and added a few tags of my own (30 tags total):

 0. acute mountain sickness (AMS)
 1. avalanches
 2. blizzards
 3. blood clot
 4. broken bones
 5. cerebral edema (HACE)
 6. equipment failure
 7. exhaustion
 8. extreme cold
 9. frostbite
10. getting lost
11. glacial crevasses
12. heart problems
13. high winds
14. hypothermia
15. icefall
16. inadequate preparation
17. lack of experience
18. lack of fitness
19. mental stress
20. people falling
21. pulmonary edema (HAPE)
22. respiratory problems
23. rockfall
24. running out of food
25. running out of water
26. snowfall
27. steep rock
28. stomach problems
29. storms

To tag accident descriptions with one or more of these tags I decided to use Facebook's bart-large-mnli model.


bart-large-mnli

First, what is "bart-large"? Here's a brief excerpt from Hugging Face:

BART is a transformer encoder-decoder (seq2seq) model with a bidirectional (BERT-like) encoder and an autoregressive (GPT-like) decoder. BART is pre-trained by (1) corrupting text with an arbitrary noising function, and (2) learning a model to reconstruct the original text. BART is particularly effective when fine-tuned for text generation (e.g. summarization, translation) but also works well for comprehension tasks (e.g. text classification, question answering).

The bart-large model has been fine-tuned on different datasets to generate new models like bart-large-mnli, which was trained on the MultiNLI (MNLI) dataset and has over 5.85M downloads:

Snapshot taken from the Hugging Face website [Feb-1–2024].

The bart-large-mnli model can be used for zero-shot classification. As an example (copied from the model card on Hugging Face), imagine you have a sentence that reads: "I have a problem with my iphone that needs to be resolved asap!!", and a set of 5 possible tags: urgent, not urgent, phone, tablet, computer. The bart-large-mnli model can help you assign one or more tags to the sentence. For each tag, the model returns an estimate of the probability that the sentence talks about the tag. Specifically, running the model on the sample sentence above gives the following result:

Snapshot from the bart-large-mnli model card on Hugging Face [Feb-1–2024].

The model is telling us that the sentence is probably talking about something urgent (with a probability of 99.9%) and about a phone (with a probability of 99.5%). This is in fact correct.

Tagging accident descriptions is the same problem. Instead of the sentence "I have a problem with my iphone that needs to be resolved asap!!" we'll have a sentence representing an accident report, and instead of 5 labels we'll have 30.
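Under the hood, the zero-shot pipeline phrases each candidate tag as a natural language inference (NLI) hypothesis and scores how strongly the sentence entails it. Here's a rough sketch of that mechanism, adapted from the approach described on the model card (treat it as illustrative rather than the exact pipeline internals):

from transformers import AutoModelForSequenceClassification, AutoTokenizer
import torch

tokenizer = AutoTokenizer.from_pretrained("facebook/bart-large-mnli")
model = AutoModelForSequenceClassification.from_pretrained("facebook/bart-large-mnli")

premise = "I have a problem with my iphone that needs to be resolved asap!!"
hypothesis = "This example is urgent."  # one hypothesis per candidate tag

inputs = tokenizer(premise, hypothesis, return_tensors="pt", truncation=True)
logits = model(**inputs).logits  # order: [contradiction, neutral, entailment]

# For multi-label scoring, drop "neutral" and softmax over contradiction
# vs. entailment; the entailment probability is the tag score
probs = torch.softmax(logits[0, [0, 2]], dim=0)
print(f"P(urgent) = {probs[1].item():.3f}")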


Tagging Accidents

To start tagging accidents I'm going to use the pipeline utility from Hugging Face's transformers library (which requires installing either TensorFlow or PyTorch). After installing the dependencies, we can instantiate a zero-shot classifier based on the bart-large-mnli model as follows:

from transformers import pipeline

classifier = pipeline("zero-shot-classification",
                      model="facebook/bart-large-mnli")

Now we're ready to start classifying! Let's briefly remind ourselves of what our dataset looks like:

>>> acc_df.head(3)

   acc_id peakid                                          accidents
0       0   EVER  Dewaele exhausted, shocked, needed O2 and was ...
1       1   EVER                  Walker minor pulmonary edema only
2       2   EVER  Rabold stroke (?) and Slade CO2 in tent; Lithe...

>>> acc_df.shape
(1681, 3)

Okay, so we have a DataFrame with 1681 rows and 3 columns. The "accidents" column contains the accident reports we want to tag.

Suppose row is a pandas Series representing one of the rows in acc_df (let's take the first row as an example):

>>> row = acc_df.iloc[0, :]
>>> row

acc_id                                                       0
peakid                                                    EVER
accidents    Dewaele exhausted, shocked, needed O2 and was ...

>>> row['accidents']
"Dewaele exhausted, shocked, needed O2 and was brought down; much slight frostbite; leader's serious lung ailment; Ang Lhakpa fatal fall"

What we need to do is ask the model to assign tags to the report "Dewaele exhausted, shocked, needed O2 and was brought down; much slight frostbite; leader's serious lung ailment; Ang Lhakpa fatal fall". If we assume that our candidate tags are stored in a Python list called candidate_tags, then tagging the sentence can be done as follows:

candidate_tags = ['acute mountain sickness (AMS)', 'avalanches', 'blizzards',
                  'blood clot', 'broken bones', 'cerebral edema (HACE)',
                  'equipment failure', 'exhaustion', 'extreme cold', 'frostbite',
                  'getting lost', 'glacial crevasses', 'heart problems',
                  'high winds', 'hypothermia', 'icefall', 'inadequate preparation',
                  'lack of experience', 'lack of fitness', 'mental stress',
                  'people falling', 'pulmonary edema (HAPE)', 'respiratory problems',
                  'rockfall', 'running out of food', 'running out of water',
                  'snowfall', 'steep rock', 'stomach problems', 'storms']

LABEL_THRESHOLD = 0.75
result = classifier(row['accidents'], candidate_tags, multi_label=True)
filtered_tags = [(tag, score)
                 for tag, score in zip(result['labels'], result['scores'])
                 if score >= LABEL_THRESHOLD]

Running this code resulted in the following tags (note that I'm only returning tags where the tag probabilities are reported to be at least 0.75):

>>> filtered_tags

[('respiratory problems', 0.9938778281211853),  # "serious lung ailment"
 ('people falling', 0.9840078949928284),  # "fatal fall"
 ('extreme cold', 0.946757972240448),  # presumably because of "frostbite"
 ('frostbite', 0.9222371578216553),  # "frostbite"
 ('exhaustion', 0.9137904047966003)]  # "exhausted"

The results look pretty good! The tagging process was a bit slow, though (it took a few seconds to tag a single accident report). Because of this, I decided to divide the accidents DataFrame (acc_df) into batches (one for each peak) to process sequentially, tagging the individual accident reports within each batch in parallel as follows:

import os
from concurrent.futures import ProcessPoolExecutor, as_completed

import pandas as pd


def classify_sequence(row):
    """Classify a row's accident description, keeping tags above the threshold."""

    result = classifier(row['accidents'], candidate_tags, multi_label=True)
    filtered_tags = [(tag, score)
                     for tag, score in zip(result['labels'], result['scores'])
                     if score >= LABEL_THRESHOLD]

    return row['acc_id'], filtered_tags


if __name__ == "__main__":

    # Iterate over batches (one batch per peak)
    for peakid in acc_df.peakid.unique():
        df = acc_df.query(f"peakid == '{peakid}'")
        print(f"Working on peakid = '{peakid}'")

        # Tag the accident reports within the batch in parallel
        # (worker processes need access to `classifier`, e.g. via fork on Linux)
        num_workers = 6
        with ProcessPoolExecutor(max_workers=num_workers) as executor:
            futures = [executor.submit(classify_sequence, this_row)
                       for _, this_row in df.iterrows()]
            results = [future.result() for future in as_completed(futures)]

        # Format results
        tagged_df = pd.DataFrame(results, columns=['acc_id', 'tags'])
        tagged_df.sort_values(by='acc_id', ascending=True, inplace=True)

        # Append the batch's tags to the output CSV (write the header only once)
        if os.path.isfile(TAGGED_ACC_FILEPATH):
            tagged_df.to_csv(TAGGED_ACC_FILEPATH, mode='a', header=False, index=False)
        else:
            tagged_df.to_csv(TAGGED_ACC_FILEPATH, index=False)

After running the code (which took about an hour) the result is a CSV file with 1681 rows containing the list of tags associated with each accident:

acc_id,tags
0,"[('respiratory problems', 0.9938778281211853), ('people falling', 0.9840078949928284), ('extreme cold', 0.946757972240448), ('frostbite', 0.9222371578216553), ('exhaustion', 0.9137904047966003)]"
1,"[('respiratory problems', 0.9940309524536133)]"
2,"[('people falling', 0.9906882047653198)]"

Do We Need More Tags?

Let's have a look at our tagged data. First, load the tagged accident information as a pandas DataFrame:

>>> tagged_df.head(3)

   acc_id                                               tags
0       0  [('respiratory problems', 0.9938778281211853),...
1       1     [('respiratory problems', 0.9940309524536133)]
2       2           [('people falling', 0.9906882047653198)]

How many accident reports had no tags?

One way to determine if we need to add more tags to our set of potential tags is by having a look at the accident reports that received no tags at all:

>>> tagged_df.tags.eq('[]').sum()
271

>>> tagged_df.tags.eq('[]').sum() / tagged_df.shape[0]
0.16121356335514575

There are 271 accident reports without tags, which is approximately 16% of all accident reports. 16% is not a huge number and I'm tempted to say that the tags we selected already cover a large portion of the accident reports. Still, let's have a look at some of these untagged reports to see what's going on. First, I massaged the tagged_df DataFrame a bit:

# Parse the stringified tag lists and explode to one tag per row
tagged_df['tags'] = tagged_df['tags'].apply(eval)  # lists were saved as strings in the CSV
tagged_df = tagged_df.explode('tags', ignore_index=True)

# Split the 'tags' column into separate 'tag' and 'prob' columns
tagged_df[['tag', 'prob']] = tagged_df['tags'].apply(pd.Series)
tagged_df.drop(columns=['tags'], inplace=True)

>>> tagged_df.head(3)

   acc_id                   tag      prob
0       0  respiratory problems  0.993878
1       0        people falling  0.984008
2       0          extreme cold  0.946758

Then I added the tag_id and accidents columns:

# Add tag_id column
tag_df = pd.read_csv(TAGS_FILEPATH)
tagged_df = tagged_df.merge(tag_df, how='left', on='tag')

# Add accident information by acc_id
acc_df = pd.read_csv(ACCIDENT_FILEPATH)
tagged_df = tagged_df.merge(acc_df, how='left', on='acc_id')

>>> tagged_df.head(3)

   acc_id                   tag      prob  tag_id peakid                                          accidents
0       0  respiratory problems  0.993878    22.0   EVER  Dewaele exhausted, shocked, needed O2 and was ...
1       0        people falling  0.984008    20.0   EVER  Dewaele exhausted, shocked, needed O2 and was ...
2       0          extreme cold  0.946758     8.0   EVER  Dewaele exhausted, shocked, needed O2 and was ...

Now we can extract a few untagged accident reports:

>>> tagged_df.query('tag.isna()')['accidents'].head(3)

39                                      Nothing serious
58                      None except Abrego mild illness
64      Various illnesses which caused early departures

It turns out that 81 of the 271 accident reports without tags simply say things like "Nothing serious" (or something to that effect). It is understandable that these reports would have no associated tags (after all, no accidents occurred). If we ignore these 81 records, the remaining 190 untagged records represent approximately 11% of all records.

Should we add more tags to recover the 11% of untagged records?

Out of the 190 untagged records (11% of all records), each went untagged for one of the following reasons:

  1. The probability scores for all tags fell below 0.75 due to the use of minimizing words like "slight" or "little": "Menchenin slight altitude sickness". After removing the word "slight", the probability for the tag "acute mountain sickness (AMS)" goes from 0.38 to 0.83 (see the sketch after this list).
  2. Ambiguity: "Kjellbeerg's illness & Warner's illness". It's unclear what happened so tag probabilities end up being lower than 0.75.
  3. Miscellaneous problems not covered by the initial set of tags: "Attacked by Sherpas". We have no tag for this.
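For reason 1, a quick sanity check is to rescore a report with and without the minimizing word (a sketch reusing the classifier from earlier; per the numbers quoted above, the score should jump from roughly 0.38 to 0.83):

report = "Menchenin slight altitude sickness"
for text in (report, report.replace("slight ", "")):
    result = classifier(text, ["acute mountain sickness (AMS)"], multi_label=True)
    print(f"{text!r} -> {result['scores'][0]:.2f}")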

If an accident report is untagged due to either of the first two reasons, simply adding more tags wouldn't change the situation. However, if a record is untagged due to the third reason, that would highlight the need for more tags. By adding the tag "attack by Sherpas", for example, we could tag at least one of the 190 untagged reports. Still, it seems most untagged reports were untagged for reasons 1 and 2, so adding tags is not likely to significantly affect the number of tagged accident reports. Let's also keep in mind that only 11% of accident reports are missing tags in the first place, so trying to recover all of them is probably "boiling the ocean".

In conclusion, lowering the tag acceptance threshold is more likely to help recover tags for untagged records than adding new tags.

Should we add tags to capture Yeti attacks?

Photo by Marcus Ganahl on Unsplash.

As discussed, the untagged records do not highlight a need for more tags. However, it's still technically possible that a very common mountaineering problem is not being captured by our tags. As an example, imagine every single accident report ended with the following phrase:

"By the way, we were attacked by a Yeti."

It would then make sense to add the tag "Yeti attack" to our set of tags. In other words, just because an accident report has at least one tag, it doesn't mean we found all issues described in the report. How do we know we're not missing a prominent category (like Yeti attacks)? The truth is we don't know for sure, but two (of many) ways to explore this question are:

  1. Having an LLM look at large concatenated chunks of accident reports, asking the model to extract the most common types of issues, and cross-referencing these issues with our existing set of tags. If any new topics emerge, we can add them to our set of tags.
  2. Using a topic modeling technique like Latent Dirichlet Allocation (LDA) for topic extraction. Again, if any new topics emerge, we can add them to our set of tags.

I ended up using both of these methods and didn't identify any Yeti attacks or other tags that seemed frequent enough, so I decided not to add more tags.

Should we lower the tag acceptance threshold?

So, I decided that I wouldn't benefit much from adding more tags, but what about lowering the tag acceptance threshold? If you recall, I only assigned tags to an accident report if the probability that the report described the tag topic was at least 0.75. This feels kind of arbitrary.

Actually, the choice of a 0.75 tag acceptance threshold was not entirely arbitrary. I tried higher acceptance thresholds and realized there were a lot of untagged records that should've received one of the initial tags. I then lowered the threshold to 0.6 to reduce the number of untagged records and found that a lot of accident reports received tags that were only weakly related to the accident description. The decision to set the tag acceptance threshold to 0.75 was therefore based on trial and error. In an unsupervised setting without a specific objective to optimize, the choice of threshold was always going to come down to trial and error and heuristics.

Now that we've decided we don't need more tags and that we aren't going to lower the tag acceptance threshold, we can move on to the next step: deciding whether we should remove or group tags.


Removing Tags

When might a tag be unnecessary? A tag could be considered unnecessary if its presence can largely be inferred from the presence/absence of other tags. Consider, for example, the tags: extreme cold, frostbite, and hypothermia. These tags are clearly related. I expect that many of the accident reports tagged with frostbite or hypothermia will also be tagged with extreme cold.

In this Venn diagram of tag counts, the "extreme cold" tag doesn't contribute much additional information. – Image by author.

In the Venn diagram above, if the "frostbite" and "hypothermia" circles cover most of the "extreme cold" circle (e.g., if they cover more than 80%), we may decide to remove the "extreme cold" tag entirely since it's often representing duplicate information.

Of course, we can go down the rabbit hole of trying to identify complex relationships between tags and removing tags that can be inferred from other tags via these complex relationships, but what I want to do here is very simple: there are certain high-count tags that I'm worried are often just re-stating information. Specifically, there are only two cases I'm concerned about:

Case 1: "extreme cold" being a duplicate of "frostbite" and "hypothermia".

Case 2: "respiratory problems" being a duplicate of "HAPE" and "AMS".

Case 1

Regarding the first case, it seems that out of all the accident reports that received the extreme cold tag, only 16% did not also receive either the hypothermia or frostbite tags. It might therefore be a good idea to remove the extreme cold tag. Here are a few randomly sampled accident reports that received the extreme cold tag that further support this decision (a sketch of the overlap computation follows the list):

- Pepevnik and Breznik frostbite
- Na Temba snowblind temporarily; cook boy pneumonia
- Yoon got frostbite in middle finger on right hand and both toes
- Prokopenko all fingers frostbitten badly
- Latallo died of exhaustion while descending from C3 in a strong blizzard
- Frostbitten fingers
- Vadasdi has frostnip on both feet (big toes)
- Darabont got slight frostbite at middle finger of left hand on summit day, but is expected to recover 100%
- G. P. Kumar and Sirdar Pasang Gombu had 2 frostbitten fingers
- Marcos frostbite
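For completeness, here's a sketch of how the 16% overlap figure might be computed from the exploded tagged_df (my own reconstruction):

# Accident IDs tagged with "extreme cold" vs. the related tags
cold_ids = set(tagged_df.query('tag == "extreme cold"')['acc_id'])
related_ids = set(tagged_df.query('tag in ["frostbite", "hypothermia"]')['acc_id'])

# Fraction of "extreme cold" reports with neither related tag
only_cold = cold_ids - related_ids
print(f"{len(only_cold) / len(cold_ids):.0%}")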

Case 2

Regarding the second case, over 70% of accident reports that received the "respiratory problems" tag were not tagged with either AMS or HAPE. Therefore, it's a good idea to keep this tag.

Tag Frequencies

After removing the extreme cold tag, here are the value counts (per tag):

Frequency counts by tag. – Image by author.

Grouping Tags

There are a few low-count tags that could be grouped with other tags. Specifically, I decided to implement the following mappings:

  • "HAPE", "HACE", "AMS" => "high-altitude illness".
  • "running out of food", "running out of water" => "running out of resources".
  • "storms", "high wind", "blizzards" => "weather-related issues".
  • "lack of fitness", "lack of experience" => "inadequate preparation".

After implementing this mapping, I had a DataFrame that looked like this (note the new_tag column which contains the mapped tag labels):

tagged_df.query('new_tag == "high-altitude illness"').head(3)

    acc_id                            tag                new_tag      prob  tag_id peakid                                          accidents
12       6  acute mountain sickness (AMS)  high-altitude illness  0.915278       0   EVER  Andrews altitude sickness and severely frostbi...
24      12  acute mountain sickness (AMS)  high-altitude illness  0.833330       0   EVER                         1 member altitude sickness
64      38  acute mountain sickness (AMS)  high-altitude illness  0.793059       0   EVER  3 members frostbite; Teare developed altitude ...

The updated value counts (computed while making sure not to double-count tags introduced by the tag mapping) are as follows:

Frequency counts by tag after applying tag mapping. – Image by author.

Final Inspection

Out of the 10 tags with the highest frequencies, two seemed problematic upon further inspection: inadequate preparation and steep rock. Here are some examples of accident reports associated with these tags:

inadequate preparation: "Mukherjee frostbitten during summit attempt; Chaitanya fatal fall".

steep rock: "1 Sherpa hit by falling stone on Lhotse face; 1 Sherpa seriously ill from altitude".

In general, it seems these topics are poorly related to the accidents they claim to tag. As such, I decided to delete these tags from the list.
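Dropping them is a one-liner (a sketch, using the new_tag column from the grouping step):

tagged_df = tagged_df.query('new_tag not in ["inadequate preparation", "steep rock"]')

Here are the final value counts: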

Frequency counts by tag after removing tags. – Image by author.

The Final (Tagged) Dataset

In the end, these were the top 10 tags (along with a randomly sampled accident report associated with the tag):

  1. frostbite: "Surchat frostnipped toe".
  2. respiratory problems: "Goettler's slight frostbite; Dujmovits lung infection".
  3. people falling: "1 Sherpa injured in Icefall brought to KTM".
  4. hypothermia: "Simunek frostbite left thumb but will heal".
  5. high-altitude illness: "Lhakpa Gyalzen Sherpa acute altitude sickness".
  6. broken bones: "2 Sherpas injured in icefall (1 broken leg, 1 broken ribs and 'banged' head)".
  7. stomach problems: "Kleppinger had constant stomach problems & never got beyond C1".
  8. avalanches: "2 avalanches (1 fatal)".
  9. snowfall: "2 Sherpas snowblinded; also 2 [members] had snowblindness & leader little snowblind".
  10. icefall: "Nothing serious (snow, falling ice pieces, but not hurt)".

It's not perfect, but it doesn't seem too bad either. Our final tagged dataset looks like this:

>>> tagged_df.head(3)

   acc_id                   tag      prob  tag_id peakid                                          accidents               new_tag
0       0  respiratory problems  0.993878      22   EVER  Dewaele exhausted, shocked, needed O2 and was ...  respiratory problems
1       0        people falling  0.984008      20   EVER  Dewaele exhausted, shocked, needed O2 and was ...        people falling
2       0             frostbite  0.922237       9   EVER  Dewaele exhausted, shocked, needed O2 and was ...             frostbite

In the following section I'll show you how I created the sorted stream graph at the beginning of this article.


Plotting As a Sorted Stream Graph

I wanted to somehow plot the tag counts for each (tag, peak) combination. However, because Everest has significantly more expeditions than the other peaks, I decided to create a column count_frac equal to count / no_exped:

>>> tag_count_df.head()

  peakid                new_tag  count  no_exped  count_frac
0   AMAD   respiratory problems     75      1525    0.049180
1   AMAD              frostbite     68      1525    0.044590
2   AMAD         people falling     33      1525    0.021639
3   AMAD            hypothermia     32      1525    0.020984
4   AMAD  high-altitude illness     26      1525    0.017049
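Here's a sketch of how tag_count_df might be built (my own reconstruction; exped_counts is a hypothetical peakid-to-total-expeditions mapping):

# Count each (acc_id, new_tag) pair once so a report tagged with, say,
# both HAPE and AMS contributes a single "high-altitude illness" count
unique_tags = tagged_df.drop_duplicates(subset=["acc_id", "new_tag"])

tag_count_df = (
    unique_tags.groupby(["peakid", "new_tag"])
    .size()
    .reset_index(name="count")
)
tag_count_df["no_exped"] = tag_count_df["peakid"].map(exped_counts)  # hypothetical mapping
tag_count_df["count_frac"] = tag_count_df["count"] / tag_count_df["no_exped"]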

The column count_frac represents the proportion of all expeditions for any given peak that experienced an accident related to a specific tag. I then created a plot for count_frac using a sorted stream graph. Why? Just for fun (a sorted stream graph would usually not be my call for this dataset, but I was interested to see what it would look like).

The "Bare Bones" Plot

I recently started writing a small Python package called pyllplot (still very much a work in progress) which I can use to create the sorted stream graph. To use the package, I installed it from the GitHub repo:

pip install git+https://github.com/karlahrnndz/pyllplot.git

Then I created the bare bones sorted stream graph as follows:

from pyllplot import SortedStream

# Make sorted stream graph
sorted_stream = SortedStream(
    tag_count_df,
    x_col="new_tag",
    height_col="count_frac",
    label_col="peakid",
    pad=0.0,
    centered=True,
    ascending=True,
)
color_palette = ["#ca0203", "#131220", "#ead8da", "#f2b202", "#48c8f6"]
sorted_stream.plot(filepath=PLOT_FILEPATH, color_palette=color_palette, title=None, figsize=(14, 6))

Disclaimer: I'm working on changing the implementation so usage of SortedStream() will likely change in the near future.

This is what the plot looked like straight out of Python:

Initial sorted stream plot created with custom Python package. – Image by author.

The x-axis specifies the tag, and the color specifies the peak. For each (tag, peak) combination, the width of the band is equal to count_frac, the fraction/percentage of expeditions for that peak that experienced an accident with the tag in question. The peak with the highest value of count_frac appears on top, the peak with the second highest value appears second from the top, and so on. In between tags, the width and position of a band is interpolated using monotonic interpolation.

Polishing In Adobe Illustrator

Here are a few checkpoints for the steps I followed to clean up the plot:

  • [Step 1] Open chart in Illustrator, ungroup all elements, release clipping masks, and remove unnecessary elements:
First step in Illustrator. – Image by author.
  • [Step 2] Select the 5 bands, remove outline, and lightly smooth paths (it may be hard to tell but the band outlines are smoother than before):
Image by author.
  • [Step 3] Add dark vertical bars aligned to tags, set their transparency mode to "color burn", and reduce their opacity to 50%:
Image by author.
  • [Step 4] Remove unnecessary parts of the bars using the shape builder tool (I also smoothed out the paths a bit more):
Image by author.
  • [Step 5] Add paper texture (I used this texture by Kiwihug on Unsplash) and change band transparency mode to "multiply". I also moved the black band to the top:
Image by author.
  • [Step 6] Change font, add annotations, edit legend, and play with layout:
Final image. – Image by author.

That's it! This is the final plot.

It seems like Everest is often on top, except when it comes to "Frostbite" and "Respiratory Problems", which are more commonly mentioned (relative to the number of expeditions) on Cho Oyu.


Final Thoughts

Regarding the tagging process:

Runtime was slow (about an hour).

  • Using a higher number of workers would naturally speed things up.
  • If you're working with a larger dataset, you could consider using models like Google's "Gemini" or OpenAI's "GPT" and ask the models to tag multiple accident reports at a time.

In the first case, you're likely looking at paying to increase the number of workers, and in the second case you're looking at spending time designing the prompt and (possibly) paying to increase the API request limit.

Regarding the plot:

As I mentioned, a sorted stream graph is not really appropriate in this case (because the x-axis is not a continuous variable). If this visualization hadn't been part of a personal project, I would've chosen a different chart type.

I should also mention that smoothing paths in Illustrator does mess with the data a bit. In a real-world scenario you might want to keep this type of smoothing to a minimum. Still, in a chart like this, where the relative position (top to bottom) of each band is the most important feature, a little extra smoothing isn't necessarily a big issue.

The code:

The full code for this project can be found on GitHub.

Evaluating performance:

If you really needed to evaluate performance or are looking for a more principled way to select parameters (like the tag acceptance threshold), you could consider manually tagging a subset of the data with the final tags and cross-referencing with the model results.
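For example, if you had a hand-labeled subset, you could score the model's tags against it by treating each (accident, tag) pair as one prediction (a sketch with hypothetical DataFrame names):

# model_df and manual_df are hypothetical: one row per (acc_id, tag) pair
model_pairs = set(zip(model_df["acc_id"], model_df["tag"]))
true_pairs = set(zip(manual_df["acc_id"], manual_df["tag"]))

tp = len(model_pairs & true_pairs)
precision = tp / len(model_pairs)
recall = tp / len(true_pairs)
print(f"precision={precision:.2f}, recall={recall:.2f}")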

Improving performance:

In addition to tuning hyperparameters (like the tag acceptance threshold), you can consider trying out other models and combining their results. Of course, you might need to think about whether you care more about precision or recall.
