The History of Bodybuilding Through Network Visualization

Author: Murphy | Views: 25742 | Time: 2025-03-22 21:31:52

All images and code were created by the author unless the source is marked otherwise.

I have been passionate about lifting weights for about a decade now, so it was time to map out the greatest legends of the sport using a data-driven approach. My focus is competitive bodybuilding, the discipline made iconic by Arnold Schwarzenegger. To do this, I first collected data on the Top 3 positions of every Mr. Olympia competition and built the shared-podium network of the placers. This network, later visualized in Gephi, is meant to highlight the different eras of the sport and their key figures.

Additionally, this article aims to illustrate how one can use data science and network visualization to map out hidden connections within any social ecosystem, whether small or large, from the world of sports or the arts, with little or gigantic commercial value.

1. Data preparation

1.1. Collecting the data

I usually start with data collection, such as writing a crawler or tapping into an API. However, for this exercise, a strikingly simple solution worked best: going to the right Wikipedia page (thanks also to Wikipedia for its open usage license), scrolling down to the Top 3 table shown below, and copy-pasting it into an Excel spreadsheet I named olimpia.xlsx.

The first few years of the Mr. Olympia Top 3. Source: https://en.wikipedia.org/wiki/Mr._Olympia

1.2. Parsing and cleaning data

Let's get to the Python terminal now and parse, clean, and display the spreadsheet as a Pandas DataFrame:

import pandas as pd

# parsing the data frame
df = pd.read_excel('olimpia.xlsx')

# getting rid of stray whitespace left over from the manual copy-pasting
df = df.apply(lambda x: x.str.strip() if x.dtype == "object" else x)

# getting some stats:
print('The start of the data set:', df['year'].min())
print('The end of the data set:', df['year'].max())

# taking a quick look:
df.head(3)

The output:

Now let's create a list of all people who ever made it to the top three:

# collect every distinct cell value outside the 'year' column
athletes = pd.unique(df.drop(columns=['year']).values.ravel())
len(set(athletes))
athletes[0:15]

The output:

Here one may notice that there is still some data cleaning to do, as in some years the same position was shared by two people in a tie, such as the 3rd place in 1976 between Mike Katz & Frank Zane.

So now I create a new, cleaned list of athletes by splitting the ties into separate names, getting rid of the NaNs, and packing everybody into a new list called athletes_clean:

athletes_clean = []
for athlete in athletes:
    # skip NaN (float) cells; split tied positions such as 'Mike Katz\xa0& Frank Zane'
    if isinstance(athlete, str):
        athletes_clean += [a.strip() for a in athlete.replace('\xa0', ' ').split('&')]

# deduplicate the names
athletes_clean = list(set(athletes_clean))
print(len(athletes_clean))
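The tie-splitting logic can be pulled into a small helper so it can be tested on its own. This is a minimal sketch: the helper name `split_tie` and the sample cells are hypothetical, and it assumes a tie is stored in a single cell of the form `Name\xa0& Name`, as in the copy-pasted table. Python's `str.strip()` already removes non-breaking spaces at the ends of a string but not inside it, which is why the explicit replace is still needed.

```python
import math


def split_tie(cell):
    """Return the athlete names stored in one table cell (hypothetical helper).

    Non-string cells (e.g. NaN for an empty position) yield an empty list;
    a tied position written as "A\xa0& B" is split into separate names.
    """
    if not isinstance(cell, str):
        return []
    # turn non-breaking spaces into normal ones, then split on the tie separator
    return [name.strip() for name in cell.replace('\xa0', ' ').split('&') if name.strip()]


# hypothetical sample cells, mimicking the copy-pasted spreadsheet
cells = ['Arnold Schwarzenegger', 'Mike Katz\xa0& Frank Zane', math.nan]
names = sorted({name for cell in cells for name in split_tie(cell)})
print(names)  # prints: ['Arnold Schwarzenegger', 'Frank Zane', 'Mike Katz']
```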

This process results in a list of 56 athletes.

1.3. Outline statistics

Next, I count how many times each athlete's name appears in the Top 3 table across all years and list the ten most frequent:

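# for each athlete, count the table cells whose text contains their name
# (DataFrame.applymap is deprecated since pandas 2.1 in favor of DataFrame.map)
athlete_cnt = {athlete: df.applymap(lambda x: athlete in str(x)).sum().sum() for athlete in athletes_clean}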

pd.DataFrame(athlete_cnt.items(), columns = ['Name', 'Number_of_Times']).sort_values(by = 'Number_of_Times', ascending = False).head(10)
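The `athlete in str(x)` test above is a substring match, so a cell would also be counted for an athlete whose name happens to be contained in another athlete's name. A stricter alternative, sketched below, splits every cell into exact names first and counts them with `collections.Counter`. It assumes each row holds one year's results; the function names and the placeholder athletes are hypothetical, not real Mr. Olympia results.

```python
from collections import Counter


def split_tie(cell):
    """Split one cell into exact names; non-strings (NaN, None, years) yield []."""
    if not isinstance(cell, str):
        return []
    return [n.strip() for n in cell.replace('\xa0', ' ').split('&') if n.strip()]


def count_podiums(rows):
    """Count how many yearly podiums each athlete appears on, by exact name."""
    counts = Counter()
    for row in rows:
        # a set, so one athlete is counted at most once per year
        names = {name for cell in row for name in split_tie(cell)}
        counts.update(names)
    return counts


# hypothetical rows: (year, 1st, 2nd, 3rd) with placeholder names
rows = [
    (2001, 'Sam Lee', 'Sam Leeds', 'Max Ray'),
    (2002, 'Sam Leeds', 'Max Ray\xa0& Tom Fox', None),
]
counts = count_podiums(rows)
print(counts['Sam Lee'], counts['Sam Leeds'])  # prints: 1 2
```

With the substring test, 'Sam Lee' would have matched three cells here, because the name is contained in 'Sam Leeds'.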

The output of this code block shows the all-time top 10, full of the sport's household names:

The 10 athletes who most frequently placed in the Top 3 of the Mr. Olympia competition of all time.

2. The Network

2.1. Building the Network

Now, I iterate through the DataFrame row by row, store all the names of a given year in a list, and add an edge with a weight of one between each pair of athletes from that year. The nodes are the athletes, and two athletes are linked if they shared a podium. Each shared podium adds one to the edge weight regardless of the positions they reached, so the more often two athletes shared the podium, the stronger their link.
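The pairwise counting described above can be sketched with `itertools.combinations` and a `Counter` keyed by sorted name pairs. Keying by a tuple is a choice of this sketch, not the author's code: it avoids joining two names into one string and splitting them again, so no separator character can clash with a name. The function name and the podium lists are hypothetical placeholders.

```python
from collections import Counter
from itertools import combinations


def podium_edges(podiums):
    """Weight each athlete pair by the number of podiums they shared."""
    weights = Counter()
    for podium in podiums:
        # every unordered pair on the same podium gets +1, whatever their places
        for a, b in combinations(sorted(set(podium)), 2):
            weights[(a, b)] += 1
    return weights


# hypothetical podiums; the second one includes a tie, so it lists 4 athletes
podiums = [
    ['Ann', 'Bob', 'Cid'],
    ['Bob', 'Ann', 'Dee', 'Eve'],
]
weights = podium_edges(podiums)
print(len(weights), weights[('Ann', 'Bob')])  # prints: 8 2
```

A three-person podium yields 3 pairs and a four-person podium yields 6, and each resulting `(a, b)` pair with weight `w` could then be added to a networkx graph with `G.add_edge(a, b, weight=w)`.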

import networkx as nx

edges = {}

for i in range(len(df)):

    # get the row values
    row = df.iloc[i].dropna().to_list()

    # keep the names (strings)
    top3 = [t for t in row if isinstance(t, str)]

    # transform the ties: a tie is stored as 'Name\xa0& Name' in a single cell
    tie = [t for t in top3 if '\xa0' in t]
    if len(tie) > 0:
        tie = [a.strip() for a in tie[0].replace('\xa0', ' ').split('&')]
        top3 = [t for t in top3 if '\xa0' not in t]
        top3 += tie

    # now link the athletes taking the top 3 positions
    for idx, t1 in enumerate(top3):
        for t2 in top3[idx+1:]:
            # one key per unordered pair: the two sorted names joined by a tab
            edge = '\t'.join(sorted([t1, t2]))
            if edge not in edges:
                edges[edge] = 1
            else:
                edges[edge] += 1

# initiate an empty graph
G = nx.Graph()

# pack the edges into the Graph
for edge, weight in edges.items():
    e1, e2 = edge.split('\t')
    G.add_edge(e1, e2, weight=weight)

# Check the basic stats of the graph - number of nodes and links
G.number_of_nodes(), G.number_of_edges()

The output of this code section shows that the network we built contains 56 nodes (the athletes) connected by 120 links in total.
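A quick sanity check on such a network is each athlete's weighted degree: the sum of the weights of their links, which here equals the total number of co-athletes they shared podiums with, counted with repetition. Gephi also offers this as a statistic. The sketch below assumes, as in the loop above, that edge keys are two names joined by a tab character; the function name and the sample dictionary are hypothetical.

```python
from collections import defaultdict


def weighted_degree(edges):
    """Sum edge weights per node; keys are 'name1<TAB>name2' strings."""
    strength = defaultdict(int)
    for edge, weight in edges.items():
        a, b = edge.split('\t')
        strength[a] += weight
        strength[b] += weight
    return dict(strength)


# hypothetical edge dictionary in the same tab-joined format
edges = {'Ann\tBob': 2, 'Ann\tCid': 1, 'Bob\tCid': 1}
print(weighted_degree(edges))  # prints: {'Ann': 3, 'Bob': 3, 'Cid': 2}
```

Because every link contributes its weight to both endpoints, the weighted degrees always sum to twice the total edge weight, which is an easy invariant to assert on the real data.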

2.2. Visualizing the Network

While I did the visualization itself in Gephi, here are the few lines I used to export and prepare the input files; the final visualization is shown in the figure below.

# export the graph
nx.write_gexf(G, 'olympia.gexf')

# export the count table
df_out = pd.DataFrame(athlete_cnt.items(), columns = ['Id', 'Cnt']).set_index('Id')
df_out.to_csv('cnt.csv')

This final visual vividly tells the story of how bodybuilding evolved. First, one may notice that the glory days, the old-school era, form a completely separate component centered on Schwarzenegger and Frank Zane. We can further pinpoint other iconic names of the golden era, especially in the silver community, such as Franco Columbu and Sergio Oliva.
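The observation that the old-school era forms a separate component can also be checked without a drawing: two athletes belong to the same connected component if some chain of shared podiums links them. Below is a minimal breadth-first search sketch over an edge list with hypothetical placeholder names; on the real graph, networkx provides the same result via `nx.connected_components(G)`.

```python
from collections import defaultdict, deque


def connected_components(edge_list):
    """Group nodes linked by any chain of edges, using breadth-first search."""
    neighbors = defaultdict(set)
    for a, b in edge_list:
        neighbors[a].add(b)
        neighbors[b].add(a)

    seen, components = set(), []
    for start in list(neighbors):
        if start in seen:
            continue
        # explore everything reachable from `start`
        queue, component = deque([start]), set()
        seen.add(start)
        while queue:
            node = queue.popleft()
            component.add(node)
            for nxt in neighbors[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        components.append(component)
    return components


# hypothetical edges: two eras whose athletes never shared a podium
edge_list = [('Ann', 'Bob'), ('Bob', 'Cid'), ('Dee', 'Eve')]
print(sorted(sorted(c) for c in connected_components(edge_list)))
# prints: [['Ann', 'Bob', 'Cid'], ['Dee', 'Eve']]
```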

Then, in the second network community, we clearly see how time moves on, starting with the 80s ruled by Lee Haney and moving on to the giants of the modern era, with Dorian Yates, Kevin Levrone, and then three long-reigning champions: Ronnie Coleman, Jay Cutler, and Phil Heath.

This figure, and the article in general, also serve as an example of how we can use data science and network visualization to map out various niche social structures, whether from the world of sports or fantasy, while the same methods and tools are just as applicable to banking or HR.

Tags: Bodybuilding Data Data Science Data Visualization Sports
