Predict the Limits of Human Performance with Python

Author:Murphy | View: 28101 | Time: 2025-03-23 18:45:45

An expressive oil painting of an Olympic runner bursting through the finish line tape depicted as the explosion of a nebula (by DALL-E2)

Will a human ever outrun a Ferrari? Of course not. Human performance is inherently limited, and many factors restrict our speed, including how quickly our blood can deliver oxygen and how fast our muscles can twitch. Unless we undergo significant genetic engineering, we're about as fast as we'll ever be.

So how do we know this? Well, human performance, like many other traits, follows a _bell curve distribution_ [1]. This means that most people fall within the average range, near the peak of the curve, with only a small percentage being exceptionally slow or fast. As we move farther away from the peak, the number of individuals with that level of performance drops off exponentially. In the case of sprinting, this means that the fastest sprinters have already reached the flattened, tapered part of the curve. As a result, making significant speed improvements will become increasingly difficult.
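To get a feel for how quickly the tails of a bell curve thin out, here is a minimal sketch. It assumes performance is exactly normally distributed, which is a simplification, and the helper name tail_fraction() is made up for this illustration.

```python
import math

def tail_fraction(z):
    """Fraction of a normal population more than z standard deviations above the mean."""
    return 0.5 * math.erfc(z / math.sqrt(2))

# Each extra standard deviation leaves far fewer people in the tail:
for z in range(1, 6):
    print(f"{z} sd above the mean: {tail_fraction(z):.2e} of the population")
```

Under this assumption, about 16% of people sit more than one standard deviation above the mean, but only about 0.13% sit more than three above it.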

Graph of 100 m trial times below 9.93 seconds recorded since 2005 [2]. (after McCormick School of Engineering (2016))

This is not to say that there's no room for improvement. Training, nutrition, equipment, and other factors can help individuals improve their performance within their genetic limits. However, it's important to recognize that we cannot fundamentally alter our physical limitations, especially if regulatory authorities continue to limit the use of advanced biotechnologies in track and field.

Because human performance is bounded, we can forecast future outcomes in many sports [3]. For example, the reduction in world record times for the 100 m dash, like many other natural phenomena, appears to follow a pattern of _exponential decay_ [4]. As a result, we can model it with an exponential equation:

$$y = a\,e^{-bx} + c$$

An exponential equation (image by author)

In this equation, y represents a prediction of a world record sprint time, in seconds; x represents the number of years since the first record was set; and a, b, and c represent curve-fitting parameters:

a is the scale factor or amplitude. It determines the vertical stretch or compression of the exponential function.
b is the decay constant. It represents how quickly the function decays as x increases.
c is the amount of vertical shift. It determines the y value of the horizontal asymptote, which is a horizontal line that the exponential function approaches as x approaches infinity.
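The roles of the three parameters can be checked numerically. The sketch below uses made-up parameter values (they are not fitted to any data) to show that the curve starts at a + c, that the gap above c halves every ln(2)/b years, and that c is the floor the curve approaches.

```python
import math

def expo(x, a, b, c):
    """Exponential decay toward the asymptote c."""
    return a * math.exp(-b * x) + c

# Hypothetical parameter values, chosen only for illustration:
a, b, c = 1.0, 0.02, 9.5

start = expo(0, a, b, c)         # At x = 0 the curve sits a above c.
half_life = math.log(2) / b      # Years for the gap above c to halve.
gap_after_half_life = expo(half_life, a, b, c) - c

print(f"y(0) = {start:.2f} s")
print(f"gap above c halves every {half_life:.1f} years")
print(f"gap after one half-life = {gap_after_half_life:.2f} s")
```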

These three parameters are found by fitting an exponential function to a set of data points with a curve-fitting algorithm. The best-fitting values are those that minimize the sum of the squared differences between the predicted and actual values of y at each value of x (a least-squares fit).
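As a rough illustration of the quantity being minimized, the sketch below computes the sum of squared errors for two candidate parameter sets. The data are synthetic and noise-free, generated from hypothetical "true" parameters, and sum_squared_error() is a helper written for this example rather than a SciPy function.

```python
import numpy as np

def expo(x, a, b, c):
    """Exponential decay toward the asymptote c."""
    return a * np.exp(-b * x) + c

def sum_squared_error(params, x, y):
    """Sum of squared differences between predicted and actual y."""
    residuals = expo(x, *params) - y
    return float(np.sum(residuals ** 2))

# Synthetic, noise-free data from hypothetical "true" parameters:
true_params = (1.0, 0.02, 9.5)
x = np.arange(0, 100, 10)
y = expo(x, *true_params)

sse_true = sum_squared_error(true_params, x, y)      # Perfect fit.
sse_off = sum_squared_error((1.0, 0.03, 9.5), x, y)  # Decay constant too large.
print(f"SSE with true parameters: {sse_true:.4f}")
print(f"SSE with b off by 0.01:   {sse_off:.4f}")
```

A curve-fitting routine searches for the parameter set that drives this number as low as possible.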

Of course, if you're not a mathematician, finding the correct parameters might be a bit daunting. Fortunately for the rest of us, there's a Python library that makes curve optimization a breeze.

The SciPy Library

The open-source SciPy library expands on NumPy by providing physical constants, conversion factors, and numerical routines for mathematics, science, and engineering use [5]. These include optimization routines for curve fitting, which is just what we need for this project.

To install SciPy with conda use:

conda install scipy

To install with pip use:

pip install scipy
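After installing, a quick check along these lines confirms that the library and its curve-fitting routine can be imported in the current environment:

```python
import scipy
from scipy.optimize import curve_fit

# Print the installed version and confirm the fitting routine is available:
print(f"SciPy version: {scipy.__version__}")
print(f"curve_fit available: {callable(curve_fit)}")
```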

The Code

The Python code for this project was written in JupyterLab. If you want to download the notebook, you can find it at this Gist.

Importing Libraries and Setting RC Parameters

The following cell imports libraries and sets runtime configuration (rc) parameters for matplotlib figures. Setting these parameters upfront is not strictly necessary but reduces code later when plotting multiple figures.

import warnings
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import scipy.optimize

# Suppress all warnings, including the overflow warnings that np.exp()
# can raise on large values during optimization:
warnings.filterwarnings('ignore')

# Set default runtime configuration (rc) for plots:
plt.rcParams['figure.figsize'] = (6, 4)
plt.rc('font', size=12)
plt.rc('axes', titlesize=14) 
plt.rc('axes', labelsize=12) 
plt.rc('xtick', labelsize=11) 
plt.rc('ytick', labelsize=11) 
plt.rc('legend', fontsize=11)
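The settings above apply to every figure created afterward. If a single figure needs different settings, one option is matplotlib's rc_context() context manager, which overrides the defaults temporarily. This is a minimal sketch with arbitrary values, not part of the original notebook.

```python
import matplotlib.pyplot as plt

plt.rcParams['figure.figsize'] = (6, 4)  # Global default, as set above.

# Temporarily override the defaults inside the with block only:
with plt.rc_context({'figure.figsize': (8, 3), 'font.size': 10}):
    inside = tuple(plt.rcParams['figure.figsize'])

after = tuple(plt.rcParams['figure.figsize'])
print(f"inside the block: {inside}, after the block: {after}")
```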

Loading the Data

World records don't get broken very often, so there are fewer than two dozen to fit. We'll use the records list at the Topend Sports site and enter it as a dictionary [6]. We'll then turn the dictionary into a pandas DataFrame for ease of use. Most of our analysis will use the records (in seconds) versus the number of years since the first record, so we'll add a column for the number of years.

# Input men's 100 m world records in seconds.
# If two records were set in the same year, list only the latest (lowest):
records = {2009: 9.58, 2008: 9.69, 2007: 9.74, 2005: 9.77, 2002: 9.78,
           1999: 9.79, 1996: 9.84, 1994: 9.85, 1991: 9.86, 1988: 9.92, 
           1983: 9.93, 1968: 9.95, 1960: 10, 1956: 10.1, 1936: 10.2, 
           1930: 10.3, 1921: 10.4, 1912: 10.6} 

# Turn dictionary into a DataFrame:
df = pd.DataFrame(records.items(), columns=['year', 'time'])
df['years'] = df['year'] - 1912  # Years since first record.
df = df.sort_values('year').reset_index(drop=True)
display(df)
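Outside a notebook, display() is not defined, so a plain print() is a reasonable substitute. The sketch below rebuilds the same table and adds two columns that are not in the original notebook (gap_years and gain_s are names invented here) to show how long each record stood and how much time the next one shaved off.

```python
import pandas as pd

records = {2009: 9.58, 2008: 9.69, 2007: 9.74, 2005: 9.77, 2002: 9.78,
           1999: 9.79, 1996: 9.84, 1994: 9.85, 1991: 9.86, 1988: 9.92,
           1983: 9.93, 1968: 9.95, 1960: 10, 1956: 10.1, 1936: 10.2,
           1930: 10.3, 1921: 10.4, 1912: 10.6}

# Sort by year while building the DataFrame:
df = pd.DataFrame(sorted(records.items()), columns=['year', 'time'])
df['years'] = df['year'] - df['year'].min()  # Years since first record.
df['gap_years'] = df['year'].diff()          # Years since previous record.
df['gain_s'] = -df['time'].diff()            # Seconds shaved off.

print(df.to_string(index=False))
```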

The men's 100 m world records DataFrame (image by the author)

Graphing the World Records

It's always a good idea to look at your data as soon as possible, so let's make a stem plot from the DataFrame. This should make it easy to see trends and outliers in the data.

# Graph the world records:
plt.stem(df.year, df.time)
plt.title("Men's 100 m Sprint World Records")
plt.ylabel("Time (secs)")
plt.ylim(9.5, 10.8)
plt.grid(True);

Stem plot of men's 100 m world records by year (image by author)

As you might expect for exponential decay, the record times decrease fairly quickly at first but then start to flatten out, like an airplane approaching a runway. While modern athletes are professionals with optimized training, nutrition, and equipment, gains get harder to come by as you approach the limit of human performance. Or do they?

Look at the last two data points on the right. They look like they fell off a cliff. This is no gentle curve coming in for a soft landing. This is weird. This is Usain Bolt.

Jamaican runner montage by DALL-E2 (prompt: A dramatic oil painting of a Jamaican Olympic runner wearing yellow shirt and green shorts and bursting through the finish line tape depicted as the explosion of a nebula)

The Insane Story of Usain Bolt

Usain Bolt is a Jamaican runner and holder of the "world's fastest man" title [6][7]. In 2008, he won an Olympic gold medal in the men's 100 m sprint with a time of 9.69 seconds. This set a new world record, despite the fact that he slowed down to celebrate early (you can watch it here).

A year later Bolt stayed focused and crossed the finish line with a time of 9.58 seconds and a peak speed of 44.72 km/hr (27.79 mph). This record was decades earlier than biostatisticians expected, based on the mathematical models of the time.

Today, thanks to Bolt, predictions for the ultimate time for the 100 m come with a lot of humility and uncertainty. Among the numbers bandied around are a relatively high 9.44 seconds and a very low 9.27 [8][9].

To judge the impact Bolt had on 100 m dash predictions, let's make some of our own. These will be purely mathematical, and based only on previous world records, rather than on the results of all professional races. We'll first forecast the outcomes without Bolt, then repeat the process with Bolt. Since we'll be doing this more than once, we'll start by writing functions for creating and optimizing exponential functions.

Defining Functions for Exponential Decay

The first function will take the x values (years since the first record) and curve-fitting parameters a, b, and c, and return predicted y values (times). The second function will accept the first function as an argument, along with the x and y data, and use SciPy's [optimize.curve_fit()](https://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.curve_fit.html#scipy.optimize.curve_fit) function to automatically choose the best-fitting parameters. We'll set the p0 parameter, which holds the initial guesses for a, b, and c, to None. With no guesses provided, curve_fit() starts each parameter at 1 and iterates from there.

def expo(x, a, b, c):
    """Return y values for exponential decay curve."""
    return a * np.exp(-b * x) + c

def optimize_curve_fit(a_func, x, y):
    """Return optimized parameters for curve fit."""
    params, covar = scipy.optimize.curve_fit(a_func, x, y, p0=None)
    return params
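The function above discards the covariance matrix that curve_fit() also returns. As a hedged variation, the sketch below supplies an explicit starting guess through p0 and uses the diagonal of the covariance matrix to estimate one-standard-deviation uncertainties for the parameters. The starting values are assumptions picked to be roughly the right size for this dataset, not values from the original notebook.

```python
import numpy as np
from scipy.optimize import curve_fit

def expo(x, a, b, c):
    """Return y values for exponential decay curve."""
    return a * np.exp(-b * x) + c

# Years since 1912 and record times, taken from the records dictionary:
years = np.array([0, 9, 18, 24, 44, 48, 56, 71, 76, 79, 82, 84, 87,
                  90, 93, 95, 96, 97], dtype=float)
times = np.array([10.6, 10.4, 10.3, 10.2, 10.1, 10.0, 9.95, 9.93, 9.92,
                  9.86, 9.85, 9.84, 9.79, 9.78, 9.77, 9.74, 9.69, 9.58])

# Hypothetical starting guess for (a, b, c):
p0 = (1.0, 0.01, 9.5)
params, covar = curve_fit(expo, years, times, p0=p0)

# Square roots of the covariance diagonal approximate parameter uncertainties:
perr = np.sqrt(np.diag(covar))
for name, value, err in zip('abc', params, perr):
    print(f"{name} = {value:.4f} +/- {err:.4f}")
```

With so few data points, the uncertainties are a useful reminder of how loosely the asymptote c is constrained.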

Optimizing the Curve Fitting Parameters

Before calling our functions, we need to build datasets of the world records with Bolt's times (_all suffix) and without them (_nB suffix). We'll pass these, along with our expo() function, to the optimize_curve_fit() function. This function returns the optimized a, b, and c fitting parameters as a NumPy array.

# Generate datasets with and without Bolt's times (nB = No Bolt):
x_all, y_all = df.years, df.time
x_nB, y_nB = x_all[:-2], y_all[:-2]

# Find optimized parameters for fitting the curve to the points:
params_nB = optimize_curve_fit(expo, x_nB, y_nB)
params_all = optimize_curve_fit(expo, x_all, y_all)
print(f"Parameters without Bolt (a, b, c) = {params_nB}") 
print(f"   Parameters with Bolt (a, b, c) = {params_all}")

Parameters without Bolt (a, b, c) = [0.98795896 0.01631187 9.57391395]
   Parameters with Bolt (a, b, c) = [1.34836526 0.00941746 9.18654695]

Plotting the Results

To plot our predictive curves, we'll evaluate our exponential decay function (expo()) with the optimized fitting parameters and pass the resulting y values to matplotlib's plot() function.

# Plot exponential curves for data with and without Bolt's times:
plt.plot(x_all, y_all, '.', label='measured data', c='k')
plt.plot(x_nB, expo(x_nB, *params_nB), 
         '-', label='fitted without Bolt')
plt.plot(x_all, expo(x_all, *params_all), '--', 
         label='fitted with Bolt', c='red')
plt.title("Men's 100 m World Record")
plt.xlabel('Years Since First Record (1912)')
plt.ylabel('Time (s)')
plt.grid(True)
plt.legend(framealpha=1);

The two exponential curves fitted to the world record data (image by author)

Wow. Usain Bolt is truly off the curve. This is because, physically, he's an outlier. Although he's taller than most sprinters and has a longer stride, he's still able to maintain a similar stride frequency, which means he can cover the same distance in fewer steps. This may be because he combines the fast-twitch muscle fibers of smaller sprinters with the mechanical advantages of a taller man's body [8].
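To put a number on "off the curve," we can compare Bolt's times with what the no-Bolt curve predicts for the same years. This sketch reuses the fitted parameters from the printed output above, so the results depend on those rounded values.

```python
import math

def expo(x, a, b, c):
    """Return y value for exponential decay curve."""
    return a * math.exp(-b * x) + c

# Fitted parameters without Bolt, copied from the printed output:
params_nB = (0.98795896, 0.01631187, 9.57391395)

bolt_times = {2008: 9.69, 2009: 9.58}
gaps = {}
for year, actual in bolt_times.items():
    expected = expo(year - 1912, *params_nB)  # Curve value for that year.
    gaps[year] = expected - actual            # How far Bolt beat the curve.
    print(f"{year}: curve {expected:.2f} s, Bolt {actual:.2f} s, "
          f"gap {gaps[year]:.2f} s")
```

By this measure, the 2009 record beat the pre-Bolt trend by roughly two-tenths of a second.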

A valid question here is whether he broke the curve or merely accelerated our progress along it. To investigate, we'll need to extrapolate the two curves into the future.

Predicting Future Performance

The following code first extrapolates the exponential curves over 570 years (from 20 years before the first record in 1912 to 550 years after). After plotting the curves, it marks Bolt's data points so that we can see where they intersect the curves in the future. Finally, it prints the minimum time predicted for each curve. Note that these values match the c parameter from the curve-fitting exercise to two decimal places, because c is the asymptote that each curve approaches.

# Extrapolate exponential curves to predict future performance:
x_extrap = np.arange(-20, 550)
y_nB_extrap = expo(x_extrap, *params_nB)  # Without Bolt.
y_B_extrap = expo(x_extrap, *params_all)  # With Bolt.

# Create a plot of the world record times and the extrapolated curves.
fig, ax = plt.subplots()
ax.plot(x_all, y_all, '.', label='data', c='k')
ax.plot(x_extrap, y_nB_extrap, '-', label='fitted without Bolt')
ax.plot(x_extrap, y_B_extrap, '--', c='red', label='fitted with Bolt')
ax.set(title="Men's 100 m World Record Extrapolated",
       xlabel='Years Since First Record (1912)',
       ylabel='Time (s)',
       yticks=np.arange(9.0, 11.0, 0.2))
ax.grid(True)
ax.legend(framealpha=1)

# Add a dotted horizontal line for each of Bolt's world record times.
bolt_times = {2009: 9.58, 2008: 9.69}
for year, time in bolt_times.items():
    ax.axhline(time, ls=':', linewidth=1.3, color='red')
    ax.text(0, time + 0.01, f"Bolt {year}", color='red',
            horizontalalignment='left', size=9)

# Define function and inverse function to permit a secondary x-axis for year:
axis_transform = lambda x_extrap: x_extrap + 1912
axis_inverse = lambda x_extrap: x_extrap - 1912
ax2 = ax.secondary_xaxis('top', functions=(axis_transform, axis_inverse))

print(f"\nMinimum predicted time without Bolt data = {min(y_nB_extrap):.2f} sec.")
print(f"Minimum predicted time with Bolt data =    {min(y_B_extrap):.2f} sec.\n")

Minimum predicted time without Bolt data = 9.57 sec.
Minimum predicted time with Bolt data =    9.19 sec.

The two curves extrapolated into the future (image by author)

Technically, both curves allow for Bolt's current record of 9.58 seconds. If we assume that the red curve, which includes Bolt's data, is providing a valid prediction, then Bolt's accomplishment was decades ahead of schedule.

The red curve predicts that the ultimate human limit in the 100 m dash is 9.19 seconds and will be reached in about 400 years. Although 9.19 seconds is surely on the fast side, it's not out of line with other published predictions, such as 9.27, 9.26, and 9.09 seconds [9][10][11].
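Rather than reading these dates off the plot, we can invert the exponential equation and solve for the year at which a curve reaches a given time. The sketch below is one way to do that. The helper year_reached() is hypothetical, the parameters are copied from the printed fit that includes Bolt, and treating "reached" as "within 0.01 seconds of the asymptote" is an assumption made here, since an exponential curve never touches its asymptote.

```python
import math

def year_reached(target, a, b, c, first_year=1912):
    """Return the year the fitted curve falls to target, or None if it never does."""
    if target <= c:
        return None  # The curve never drops to or below its asymptote.
    # Solve target = a * exp(-b * x) + c for x, then convert to a calendar year:
    return first_year + math.log(a / (target - c)) / b

# Fitted parameters with Bolt, copied from the printed output:
params_all = (1.34836526, 0.00941746, 9.18654695)

bolt_year = year_reached(9.58, *params_all)
limit_year = year_reached(params_all[2] + 0.01, *params_all)
never = year_reached(9.0, *params_all)

print(f"Curve reaches 9.58 s around {bolt_year:.0f}")
print(f"Curve gets within 0.01 s of its limit around {limit_year:.0f}")
print(f"Year the curve reaches 9.00 s: {never}")
```

With these parameters, the curve does not reach 9.58 seconds until the 2040s, consistent with Bolt's 2009 run being decades ahead of schedule.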

And while 400 years is a long time, some researchers believe Bolt's current record will stand for another 230 years [2]! At any rate, our values of 9.57 and 9.19 seconds are plausible and have a good chance of bracketing the ultimate value. Isn't math (and Python) wonderful?

Summary

Many natural phenomena, such as radioactive decay, fracturing in rocks, and population growth, can be modeled mathematically with tools like exponential equations, power laws, and logistic functions. In addition to matching existing data, these models can also forecast future behavior. In this Quick Success Data Science project, we used an exponential equation to predict the ultimate run time for the men's 100 m race.

Fitting curves to data requires the manipulation of multiple parameters. The goal is to minimize the errors between actual data points and predicted data points. Python's SciPy library includes functions that automate this process and make curve-fitting accessible to everyone.

Sources

[1] Normal distribution. (2023, April 16). In Wikipedia. https://en.wikipedia.org/wiki/Normal_distribution.

[2] Northwestern University, McCormick School of Engineering: How Long Will it Take to Break Usain Bolt's 100-meter Dash Record? Professor Luis Amaral calculates the odds (2016).

[3] Little, Brown and Company, The Formula: The Universal Laws of Success by Albert-László Barabási (2018).

[4] _Exponential decay_. (2023, March 11). In Wikipedia.

[5] Scipy: https://scipy.org/ (2023).

[6] Robert Wood, "100m World Records." Topend Sports Website, 2008, https://www.topendsports.com/sport/athletics/record-100m.htm, Accessed 1 May 2023.

[7] _Usain Bolt_. (2023, April 15). In Wikipedia.

[8] Wired: Bolt is Freaky Fast, But Nowhere Near Human Limits by Alexis Madrigal (2008).

[9] Runner's World: Ultimate 100-Meter Time: 9.27 Seconds? by Amby Burfoot (2014).

[10] Idea & Issac: Femto Essays: World Records for Men's 100 m Defy Simple Curve Fitting by Tatsuo Tabata (2008).

[11] Idea & Issac: Femto Essays: Bolt's World Record Changes Empirical Prediction Again by Tatsuo Tabata (2009).

Thanks!

Thanks for reading and be sure to follow me for more Quick Success Data Science projects in the future.

