Public Transport Accessibility in Python

Author:Murphy | View: 30057 | Time: 2025-03-22 21:55:53

Accessible public transport is essential to any livable neighborhood and should be the focus of local governments and urban planners. In this article, I use Budapest as an example, relying on its publicly available GTFS (General Transit Feed Specification) data and link to various tools introduced in my previous tutorials published on Towards Data Science, such as Quantifying Transportation Patterns Using GTFS Data and Urban Accessibility – How to Reach Defibrillators on Time. Namely, in this tutorial, I will study the accessibility of the different types of public transportation modes, such as subway and tram, based on the walking time needed to reach the nearest stops from every location in the city.

In today's urban planning, the usage of such fine-grained, data-driven analytics is essential to ensure equality and accessibility in public transport as well as planning future-proof and green public transport ecosystems.

1. Data

For this article, I downloaded public transport data from Transitfeeds.com, an online aggregator website for public transport data covering the city of Budapest and dating back to September 2023.

2. Stop-accessibility

First, we stop at stops to do several data parsing and preprocessing steps. Namely, I parse the .txt data file part of the GTFS batch called stops.txt, which contains the spatial locations of each public transport stop:

import pandas as pd

city= 'Budapest'
root = 'data/Budapest/23 September 2023'
df_stops = pd.read_csv(root + '/stops.txt')
display(df_stops.head(5))

This spreadsheet contains the _stopid (which we will be using later on) along with a number of other information characteristics to each step, such as whether they are accessible via wheelchair, the name of the stops (in the local language), and the longitude and latitude coordinates of that particular location. Next, I will use these coordinates to turn the Pandas spreadsheet into a GeoPandas data structure encoding the spatial dimension, where I also use Shapely to create geometry objects, and Matplotlib to visualize the results:

import geopandas as gpd
from shapely.geometry import Point
import matplotlib.pyplot as plt

geometry = [Point(xy) for xy in zip(df_stops['stop_lon'], df_stops['stop_lat'])]
gdf_stops = gpd.GeoDataFrame(df_stops, geometry=geometry)
df_stops.crs = 'EPSG:4326'

f, ax = plt.subplots(1,1,figsize=(10,10))

ax.set_title('Public transport stops in Budapest', fontsize = 18)
gdf_stops.plot(ax=ax, alpha = 0.2)
plt.tight_layout()

Let's extend this visual by adding the administrative boundaries of the city for a better view using OSMNx:

import osmnx as ox

admin = ox.geocode_to_gdf(city)
admin.crs = 4326

f, ax = plt.subplots(1,1,figsize=(10,10))
ax.set_title('Public transport stops in Budapest', fontsize = 18)
admin.plot(ax=ax, color = 'none', edgecolor = 'k')
gdf_stops.plot(ax=ax, alpha = 0.2)
plt.tight_layout()
ax.axis('off')

The result of this codeblock:

3. Different modes of transport

Now we have the location of each public transport stop – let's differentiate between them based on the means of transportation they serve. This information is usually stored in the routes.txt, under the column _routetype. This, [this](https://www.transit.land/routes/r-u0wj-717), and this mapping can encode transportation mode codes to their English names.

Based on some manual mapping from the previous sources, I created the official map and then a simplified version of it, which I will use later. In the simplified version, I renamed, for instance, both the category ‘Tram, Streetcar, Light rail' (code 0) and the category ‘Tram Service' (code 900) to just ‘Tram'. The dictionary encoding each means of public transport in a unified, simplified terminology:

map_complete = { 0   : 'Tram, Streetcar, Light rail', 
                 1   : 'Subway, Metro', 
                 2   : 'Rail',
                 3   : 'Bus', 
                 4   : 'Ferry', 
                 11  : 'Trolleybus', 
                 100 : 'Railway Service',
                 109 : 'Suburban Railway',
                 400 : 'Urban Railway Service', 
                 401 : 'Metro Service', 
                 700 : 'Bus Service', 
                 717 : 'Regional Bus',
                 900 : 'Tram Service', 
                 1000: 'Water Transport Service'}

map_simple = { 0    : 'Tram', 
               1    : 'Subway', 
               2    : 'Railway',
               3    : 'Bus', 
               4    : 'Ferry', 
               11   : 'Trolleybus', 
               100  : 'Railway',
               109  : 'Railway',
               400  : 'Railway', 
               401  : 'Subway', 
               700  : 'Bus',
               717  : 'Bus',
               900  : 'Tram', 
               1000 : 'Ferry', }

Let's take a quick look at the prevalence of the different types of transportation means:

from collections import Counter
import matplotlib.pyplot as plt

# this function does some nice formatting on the axis and labels
def format_axis(ax):   
    for pos in ['right', 'top']:   ax.spines[pos].set_edgecolor('w')    
    for pos in ['bottom', 'left']: ax.spines[pos].set_edgecolor('k')         
    ax.tick_params(axis='x', length=6, width=2, colors='k')
    ax.tick_params(axis='y', length=6, width=2, colors='k') 
    for tick in ax.xaxis.get_major_ticks():  tick.label.set_fontsize(12) 
    for tick in ax.yaxis.get_major_ticks():  tick.label.set_fontsize(12)

f, ax = plt.subplots(1,1,figsize = (8,6))

# get the data, map the english names
df_route = pd.read_csv(root + '/routes.txt')
df_route['route_type'] = df_route['route_type'].astype(int)
df_route['route_type_en'] = df_route['route_type'].map(map_simple)
D = dict(Counter(df_route.route_type_en))

# define my color palette
transport_colors = {'Railway': '#A65C47',  
                    'Bus': '#0BB3D9',      
                    'Tram': '#F2B705',     
                    'Ferry': '#997CA6'   ,  
                    'Trolleybus' : '#D91818',
                    'Subway' : '#0869A6'}

# create the bar charts
labels = D.keys()
values = D.values()
colors = [transport_colors[l] for l in labels]

ax.bar(labels, values, color = colors)
ax.set_xticklabels(labels, fontsize = 10, rotation = 60, ha = 'right')
format_axis(ax)
ax.set_title(city, fontsize = 26)
ax.set_ylabel('Number of routes', fontsize = 15)
ax.set_yscale('log')

plt.tight_layout()
#plt.savefig('figure6.png', dpi = 150, bbox_inches = 'tight')

The result of this codeblock:

This graph clearly shows the superiority of bus lines in Budapest, followed by tram and trolleybuses.

4. Map transport modes to stops

Now we have all the pieces; we just need to stitch them together. Let's parse two more data tables, one containing the stopping times and the other the trips described as a series of stops.

df_stop_times = pd.read_csv(root + '/stop_times.txt')
df_trips = pd.read_csv(root + '/trips.txt')

display(df_route.head(3))
display(df_trips.head(3))
display(df_stop_times.head(3))

The output of this cell:

The previous quick-view of the data tables at hand shows how we can start merging them together. First, _dfroute provides a mapping between _route_typeen and the _routeid. Second, _dftrips connects _routeid to _tripid.Third, _df_stoptimes contains the information on how _tripid encapculsates _stopid. Finally, _gdfstops connects the _stopid to geolocation. Let's do the merges:

df_merged = df_route[['route_type_en', 'route_id']].merge(df_trips[['route_id', 'trip_id']], left_on = 'route_id', right_on = 'route_id')[['trip_id', 'route_type_en']].drop_duplicates()
df_merged = df_merged.merge(df_stop_times, left_on = 'trip_id', right_on = 'trip_id')[['route_type_en', 'stop_id']].drop_duplicates()
df_merged = gdf_stops.merge(df_merged, left_on = 'stop_id', right_on = 'stop_id')[['stop_id', 'stop_name', 'route_type_en', 'geometry']].drop_duplicates()
df_merged

The resulting table:

Quick check on the number of stops per transport modes:

Counter(df_merged.route_type_en)

The final counts from this cell:

5. Stop Location Accessibility

Let's compute the accessibility of each location using a 5km/h walking speed and an upper limit of stop-distance 2500m for the most popular transport types relying on earlier work and the library Pandana:

Note: here you will need osmnet 0.1.7 installed as a pandana dependency

import os
import pandana # version: 0.7
import pandas as pd # 
import numpy as np # 
from shapely.geometry import Point 
from pandana.loaders import osm

def get_city_accessibility(admin, POIs):

    # walkability parameters
    walkingspeed_kmh = 5
    walkingspeed_mm  = walkingspeed_kmh * 1000 / 60
    distance = 2500
    POIs['lon'] = POIs.geometry.apply(lambda g: g.x)
    POIs['lat'] = POIs.geometry.apply(lambda g: g.y)

    # bounding box as a list of llcrnrlat, llcrnrlng, urcrnrlat, urcrnrlng
    minx, miny, maxx, maxy = admin.bounds.T[0].to_list()
    bbox = [miny, minx, maxy, maxx]

    # setting the input params, going for the nearest POI
    num_pois = 1
    num_categories = 1
    bbox_string = '_'.join([str(x) for x in bbox])
    net_filename = 'data/network_{}.h5'.format(bbox_string)
    if not os.path.exists('data'): os.makedirs('data')

    # precomputing nework distances
    if os.path.isfile(net_filename):
        # if a street network file already exists, just load the dataset from that
        network = pandana.network.Network.from_hdf5(net_filename)
        method = 'loaded from HDF5'
    else:
        # otherwise, query the OSM API for the street network within the specified bounding box
        network = osm.pdna_network_from_bbox(bbox[0], bbox[1], bbox[2], bbox[3])
        method = 'downloaded from OSM'

        # identify nodes that are connected to fewer than some threshold of other nodes within a given distance
        lcn = network.low_connectivity_nodes(impedance=1000, count=10, imp_name='distance')
        network.save_hdf5(net_filename, rm_nodes=lcn) #remove low-connectivity nodes and save to h5

    # precomputes the range queries (the reachable nodes within this maximum distance)
    # so, as long as you use a smaller distance, cached results will be used
    #network.precompute(distance + 1)

    # compute accessibilities on POIs
    pois = POIs.copy()
    pois = pois.drop(columns = ['geometry'])
    network.init_pois(num_categories=num_categories, max_dist=distance, max_pois=num_pois)
    network.set_pois(category='all', x_col=pois['lon'], y_col=pois['lat'])

    # searches for the n nearest amenities (of all types) to each node in the network
    all_access = network.nearest_pois(distance=distance, category='all', num_pois=num_pois)

    # transform the results into a geodataframe
    nodes = network.nodes_df
    nodes_acc = nodes.merge(all_access[[1]], left_index = True, right_index = True).rename(columns = {1 : 'distance'})
    nodes_acc['time'] = nodes_acc.distance / walkingspeed_mm
    xs = list(nodes_acc.x)
    ys = list(nodes_acc.y)
    nodes_acc['geometry'] = [Point(xs[i], ys[i]) for i in range(len(xs))]
    nodes_acc = gpd.GeoDataFrame(nodes_acc)
    nodes_acc = gpd.overlay(nodes_acc, admin)

    return nodes_acc[['time', 'geometry']]

types = ['Subway', 'Bus', 'Tram', 'Trolleybus']

accessibilities = {}

for t in types:
    gdf_t = gpd.GeoDataFrame(df_merged[df_merged.route_type_en == t])
    print(t, len(gdf_t))
    accessibilities[t] = get_city_accessibility(admin, gdf_t)

Output of this cell, including network data downloading:

Now visualize the maps using color maps designed for the different transport means:

from matplotlib.colors import LinearSegmentedColormap
import matplotlib.colors as mcolors

def create_colormap(color):
    color = mcolors.hex2color(color)
    cmap_dict = {
        'red':   ((0.0, 1.0, 1.0), (1.0, color[0], color[0])),
        'green': ((0.0, 1.0, 1.0), (1.0, color[1], color[1])),
        'blue':  ((0.0, 1.0, 1.0), (1.0, color[2], color[2])),
    }
    cmap = LinearSegmentedColormap('custom_colormap', cmap_dict)
    return cmap

for k, v in accessibilities.items():

    f, ax = plt.subplots(1,1,figsize=(12,7))
    color = transport_colors[k]
    custom_cmap = create_colormap(color)

    v.crs = 4326
    admin.to_crs(23700).plot(ax=ax, color = 'k', edgecolor = 'k', linewidth = 3)
    v.to_crs(23700).plot(column = 'time', ax=ax, cmap = custom_cmap, legend = True,  markersize = 2, alpha = 0.5)

    ax.set_title('Accessibility in minutesn' + k, pad = 40, fontsize = 24)    
    ax.axis('off')
    plt.savefig(k + '.png', bbox_inches = 'tight', dpi = 200)

Output figures for tram, bus, trolleybus, and subway transport accessibility:

6. Overall Public Transport Accessibility

Now, let's use the widely popular spatial indexing method, Uber's H3, and aggregate the availability of different means of transportation by computing the type and accessibility of the nearest stop for each hexagon. We will use hexagon level 9. First, let's get the hexagon split:

import h3
from shapely.geometry import Polygon

# use this function to split an admin polygon into a hexagon grid
def split_admin_boundary_to_hexagons(admin_gdf, resolution):
    coords = list(admin_gdf.geometry.to_list()[0].exterior.coords)
    admin_geojson = {"type": "Polygon",  "coordinates": [coords]}
    hexagons = h3.polyfill(admin_geojson, resolution, geo_json_conformant=True)
    hexagon_geometries = {hex_id : Polygon(h3.h3_to_geo_boundary(hex_id, geo_json=True)) for hex_id in hexagons}
    return gpd.GeoDataFrame(hexagon_geometries.items(), columns = ['hex_id', 'geometry'])

# let's test two resolutions for Budapest
gdf_hexagons = split_admin_boundary_to_hexagons(admin, 9)
gdf_hexagons.plot()

Now merge each DataFrame into the hexagon grid by attaching the average time needed to reach a certain type of transport within a hexagon grid cell:

gdf_hexagons_agg = gdf_hexagons.copy()

for t in types:
    hexagons_dict = gpd.sjoin(gdf_hexagons, accessibilities[t]).groupby(by = 'hex_id').mean().to_dict()['time']
    gdf_hexagons_agg[t] = gdf_hexagons_agg.hex_id.map(hexagons_dict)

gdf_hexagons_agg

The resulting data table:

Finally, add two more columns – one computing the walking time needed to reach the most accessible public transport time within that hexagon grid, while the other column encodes which type of transportation that is.

gdf_hexagons_agg['nearest_time'] = gdf_hexagons_agg[[k for k in gdf_hexagons_agg.keys() if k in types]].min(axis=1)
gdf_hexagons_agg['nearest_type'] = gdf_hexagons_agg[[k for k in gdf_hexagons_agg.keys() if k in types]].idxmin(axis=1)
gdf_hexagons_agg = gdf_hexagons_agg.dropna()
gdf_hexagons_agg.crs = 4326
gdf_hexagons_agg.head(3)

The resulting data table:

The final visual shows us the types of most accessible means of transportation for each hexagon grid, highlighting an ultimate hegemony of bus transport within the city and the accessibility of the different public transport means in Budapest. Certain owntown regions on the northeast side are densely knitted by the Trolley service, while the Tram network complements that of the rest of the inner city.

from matplotlib.colors import ListedColormap
import contextily as ctx

colors0 = [transport_colors[value] for value in gdf_hexagons_agg['nearest_type']]
custom_cmap = ListedColormap(colors0)

f, ax = plt.subplots(1,1,figsize=(15,15))

gdf_hexagons_agg.to_crs(23700).plot(column = 'nearest_type', color = colors0, ax = ax, legend = True)
ax.legend(loc = 'best', fontsize = 10)
ctx.add_basemap(ax, alpha = 0.9, crs = 23700, url = ctx.providers.CartoDB.DarkMatterNoLabels)

The output figure of this cell:

7. Conclusion

This case study highlights how Python-based spatial analytics and fine-grained Geospatial data can contribute to urban planning and transportation in various ways, such as measuring public transport accessibility as the walking time needed to reach different types of transport modes. Such analytics can further be used by planners to identify, for instance, transportation blindspots and design more accessible development plans in the future.

All images created by the author.

Tags: Data Science Data Visualization Geospatial Transportation Urban Planning