Easy Map Boundary Extraction with GeoPandas

Author:Murphy  |  View: 26822  |  Time: 2025-03-22 19:03:37

QUICK SUCCESS DATA SCIENCE

Extracting a border imagined by DALL-E3

I recently came across a geography game where you're shown a segment of border between countries and asked to name them. Here's an example; can you guess the two countries that share this border?

Guess this famous "open" border (by the author)

Here's another "open" border. What's the country to the north?

Guess the border (by the author)

What countries share this super long border:

Name that border! (by the author)

Older Boomers may be familiar with this infamous border region:

Guess the border (by the author)

In order these were the US-Mexico, US-Canada, Argentina-Chile, and Vietnam-Laos-Cambodia borders.

Besides testing your geographic knowledge, you can use extracted borders to make interesting infographics. Here, I show the length of the four longest shared borders:

The top 4 longest shared borders (by the author)

In this Quick Success Data Science project, we'll use Python's open-source Geopandas geospatial library to extract, plot, and calculate the length of political borders. This process will also work with other borders stored in a shapefile, such as province, state, county, and municipal boundaries.


The Process

Here's the high-level process we'll use:

  1. Load the shapefile: use GeoPandas to load a shapefile containing country borders. A shapefile is a Geospatial vector data format for geographic information system (GIS) software.
  2. Filter to relevant countries: use GeoPandas to filter the countries of interest into separate Geopandas GeoDataFrames.
  3. Extract country boundaries: Locate the country boundaries in the "Geometry" column of the GeoDataFrames. This column contains the coordinates used to draw the boundaries.
  4. Find the shared borders: Identify shared borders by computing the intersection of the primary country's boundary with the boundaries of the adjacent countries.
  5. Create a list of border coordinates: Convert the border data from a geometry object (e.g., LineString or MultiLineString) to a list.
  6. Plot the borders: Plot the borders as a GeoPandas GeoSeries.

We'll also look at how to calculate the length of a border.

For an overview of shapefiles and GeoPandas, including installation instructions, please refer to this article:

Comparing Country Sizes with GeoPandas


Download the Shapefile

Natural Earth is a public domain dataset that includes shapefiles. You can download the country boundaries shapefile here (see figure below). This is Natural Earth's most detailed dataset at a scale of 1:10,000,000 (1 cm = 100 km).

Click the Download link highlighted in yellow (Natural Earth webpage)

The link highlighted in the previous figure will download the _ne_10m_admin_0countries.zip file. There's no need to unzip it as GeoPandas can handle zipped data. Store it in the same folder as your Python script or notebook for convenience.


Full Code

The full annotated code follows. I'll describe it in more detail in the sections that follow.

import matplotlib.pyplot as plt
import geopandas as gpd

# Define the primary country name:
# (Must match names in shapefile).
COUNTRY1_NAME = 'Venezuela'

# Define the name(s) of adjacent countries:
# (Must match names in shapefile).
COUNTRY2_NAME = ['Colombia', 'Brazil', 'Guyana']

# Load the shapefile as a GeoDataFrame:
gdf = gpd.read_file('ne_10m_admin_0_countries.zip')
gdf = gdf.rename(columns={'ADMIN': 'name'})

# Make separate GDF for country1 and adjacent countries:
country1 = gdf[gdf['name'] == COUNTRY1_NAME]
others = gdf[gdf['name'].isin(COUNTRY2_NAME)]

# Extract the boundaries:
country1_border = country1.boundary.unary_union
others_border = others.boundary.unary_union

# Find intersections
shared_borders = country1_border.intersection(others_border)

# Ensure geometry is a MultiLineString or LineString:
if shared_borders.geom_type == 'GeometryCollection':
    borders = [geom for geom in shared_borders.geoms if 
               geom.geom_type in ['LineString', 'MultiLineString']]
elif shared_borders.geom_type in ['MultiLineString', 'LineString']:
    borders = [shared_borders]
else:
    borders = []

# Create a dynamic title:
countries_in_title = f"{COUNTRY1_NAME} and {', '.join(COUNTRY2_NAME)}"
title = f"Shared Borders Between {countries_in_title}"

# Set up the figure:
fig, ax = plt.subplots(figsize=(10, 10))
ax.set_title(title, fontsize=14)
# ax.set_xticks([])
# ax.set_yticks([])
# ax.set_xticklabels([])
# ax.set_yticklabels([])

# Plot the borders:
gpd.GeoSeries(borders).plot(ax=ax, 
                            color='firebrick', 
                            linewidth=2)

# plt.savefig(f"{COUNTRY1_NAME}_border_segment.png", dpi=600)

plt.show()

This code plots the shared border of Venezuela, Colombia, and Guyana:

The border between Venezuela, Colombia, and Guyana (by the author)

Importing Libraries and Assigning Constants

Start by importing Matplotlib (for plotting) and GeoPandas (for working with the shapefiles). Next, we assign constants for the name of the primary country (COUNTRY1_NAME) and the names of the adjoining countries (COUNTRY2_NAME) that share its border. The latter variable uses a Python list of one or more country names.

import matplotlib.pyplot as plt
import geopandas as gpd

# Define the primary country name:
# (Must match names in shapefile).
COUNTRY1_NAME = 'Venezuela'

# Define the name(s) of adjacent countries:
# (Must match names in shapefile).
COUNTRY2_NAME = ['Colombia', 'Brazil', 'Guyana']

NOTE: The country names used as constants must match the names in the shapefile. We'll look at this more in the next section.

Loading and Preparing the Shapefiles

Now we load the downloaded shapefile as a GeoDataFrame named "gdf." Note how we don't need to unzip the shapefile.

# Load the shapefile as a GeoDataFrame:
gdf = gpd.read_file('ne_10m_admin_0_countries.zip')
gdf = gdf.rename(columns={'ADMIN': 'name'})

# Make separate GDF for country1 and adjacent countries:
country1 = gdf[gdf['name'] == COUNTRY1_NAME]
others = gdf[gdf['name'].isin(COUNTRY2_NAME)]

The Natural Earth shapefile uses the nonsensical name "ADMIN" for the country name column. We can fix this using the GeoPandas rename() method.

Next, we create two new GeoDataFrames using the newly minted "name" column. For the first (country1) we filter with COUNTRY1_NAME. As the COUNTRY2_NAME variable is a list, we add a twist and use the isin() method. That way, any matching names in the list are used for filtering.

Speaking of matching names, if you're unsure how a country name is spelled or stored in the GeoDataFrame (such as, "United States of America" vs. "USA"), you can print all the names using this command:

print(gdf['name'].unique())

Finding the Shared Borders

With the data loaded in GeoDataFrames, we now extract the borders for each country and find where they intersect.

# Extract the boundaries:
country1_border = country1.boundary.unary_union
others_border = others.boundary.unary_union

# Find intersections
shared_borders = country1_border.intersection(others_border)

# Create list from geometry object:
if shared_borders.geom_type == 'GeometryCollection':
    borders = [geom for geom in shared_borders.geoms if 
               geom.geom_type in ['LineString', 'MultiLineString']]
elif shared_borders.geom_type in ['MultiLineString', 'LineString']:
    borders = [shared_borders]
else:
    borders = []

To extract a border, we chain GeoPandas' boundary operation with Shapely's unary_union property (Shapely is a library that GeoPandas uses under the hood for geometric operations). The first part produces a GeoSeries of Shapely LineString or MultiLineString objects. The second part aggregates multiple geometries by performing a union operation across all the geometries in the GeoSeries (e.g., from country1.boundary). It returns a single Shapely Geometry object (e.g., a LineString, MultiLineString, or GeometryCollection).

The next step identifies the shared border using Shapely's intersection() method. This method calculates the overlapping part of the two geometries, such as lines or polygons. It returns one of many Shapely geometry objects depending on the input geometries and their spatial relationship. Examples include line strings, points, polygons, and more.

Because of the potentially complex output from the intersection() method, we must check that the datatypes are valid for working with lines. We accomplish this with a block of conditionals that returns the final border coordinates as a Python list of Shapely geometries.

Plotting the Results

The remainder of the code plots the border. After setting up the figure, we convert borders from a list to a GeoSeries (gpd.GeoSeries(borders)) and call plot() on this object.

# Create a dynamic title:
countries_in_title = f"{COUNTRY1_NAME} and {', '.join(COUNTRY2_NAME)}"
title = f"Shared Borders Between {countries_in_title}"

# Set up the figure:
fig, ax = plt.subplots(figsize=(10, 10))
ax.set_title(title, fontsize=14)

# Uncomment to hide axis labels and ticks:
# ax.set_xticks([])
# ax.set_yticks([])
# ax.set_xticklabels([])
# ax.set_yticklabels([])

# Plot the borders:
gpd.GeoSeries(borders).plot(ax=ax, 
                            color='firebrick', 
                            linewidth=2)

# plt.savefig(f"{COUNTRY1_NAME}_border_segment.png", dpi=600)

plt.show()

Using a GeoSeries for plotting is preferred because GeoPandas' plotting functionality integrates with geographic projections and allows for more advanced handling of spatial data.

Calculating the Length of the Border

In theory, we can calculate the length of a border after extracting it. The first step is to convert the GeoDataFrame's coordinate reference system (CRS) to an equal area projection (e.g., EPSG:2163) after loading the data:

gdf = gdf.to_crs('EPSG:2163')

Then, after extracting the border as a list of Shapely geometries, we could run the following code:

# Calculate the total length of the shared border:
total_length = sum(line.length for line in borders)
print(f"Total shared border length (in projected CRS units): ")
print(f"{total_length / 1000} km")

Each geometry in borders (e.g., LineString or MultiLineString) has a .length property provided by Shapely. This attribute is the calculated length of the geometry in the units of the current CRS of the data.

Unfortunately, we won't get accurate results with our Natural Earth dataset. According to their website, it's designed to "show the world on a large wall poster." This means it smooths out details for simplicity.

Calculating the true length of a country's border is tricky business. You need a very high-resolution shapefile to handle rugose borders. For example, consider the boundary of Russia and Kazakhstan, which in places follows a meandering river:

Part of the Kazakhstan-Russia border (source: Google Maps)

Most shapefiles won't capture every twist and turn of this border, so we'll underestimate its length. Of course, we'll never know the true length of any border, as we'll never have a shapefile with infinite resolution.

In addition to hi-res shapefiles, you may need to use multiple CRSs. Latitude-dependent distortions in geographic CRSs make lengths appear shorter or longer depending on proximity to the poles. Borders spanning more than six degrees of latitude or longitude should be divided into segments and reprojected using an appropriate local CRS that minimizes distortion in each segment. Options include UTM zones or other locally optimized projections.

The method of measurement is also critical. Length calculations for long borders should be done using geodesic methods, which account for the curvature of the Earth. GIS tools like GeoPandas support geodesic length calculations when using a geographic CRS.


Summary

GeoPandas has tools that help you find and extract boundary lines shared between map regions. Once extracted, the boundaries can be plotted and measured. The accuracy of these measurements depends on the resolution of the shapefile and the projection and measurement methods used.


Thanks!

Thanks for reading and please follow me for more Quick Success Data Science projects in the future.

Tags: borders Data Visualization Geopandas Geospatial Hands On Tutorials

Comment