The Floyd-Warshall Algorithm From Graph Theory, Applied to Parsing Molecular Structures

Author:Murphy  |  View: 28188  |  Time: 2025-03-23 11:45:10

The Floyd-Warshall algorithm is essential in graph theory as it provides an efficient means to compute the shortest paths between all pairs of nodes in a graph. This classic dynamic programming algorithm is widely applicable beyond its traditional use in theoretical network analysis. It works by reading in a matrix that describes which pairs of nodes are connected by a single edge, and outputting the minimal number of edges connecting every possible pair of nodes:

A set of nodes (red) linked by edges (light green lines) and then 2 examples of the distances (numbers of links between pairs of nodes, in dark green with dashed lines) calculated by the Floyd-Warshall algorithm. The direct links are encoded in a matrix called the "adjency matrix" whose elements are 1 (nodes linked) or 0 (not linked). The output from the algorithm if a matrix of equal size but containing the distances between all nodes as the minimal numbers of edges separating any two of them. This and all other figures were produced by the author.

This is precious information about connectivity within the graph, that finds tons of applications; for example in optimizing communication networks, analyzing contacts in social networks, or, as I will cover here, in parsing molecular structures – at the core of many tasks in cheminformatics and structural bioinformatics.

In this post I will show you how the Floyd-Warshall algorithm can be employed to compute bond distances between atoms in a molecule, or said in a different way, the minimal number of bonds separating every possible pair of atoms. Hopefully, my example will not only showcase the algorithm's application in Chemistry but also inspire you to use it for other applications in different fields.

And as in most of my projects and tutorials, since I provide all code in JavaScript, you can use it directly in web apps and web-based software. In fact, I extracted and adapted the code shown here from a web-based molecular graphics program that we are developing in my lab, that you can know about here:

The Future of Chemistry Education is just around the corner with HandMol

In the past I have also described other components of this software that involve quite interesting mathematics and Geometry with broader applications beyond chemistry, as with the Kabsch algorithm:

The Definitive Procedure for Aligning Two Sets of 3D Points with the Kabsch Algorithm

The Floyd-Warshall Algorithm in Chemistry

The Floyd-Warshall algorithm is designed to find the shortest paths between every pair of nodes in a graph, making it a powerful tool for network analysis. The algorithm works by systematically considering all possible paths through intermediate nodes and updating the shortest path estimates accordingly. Its efficiency and completeness make it ideal for dense graphs where direct path calculations are needed.

Application to parsing bonds and molecular structure

In chemistry, the concept of graph theory is particularly useful for visualizing and analyzing molecular structures. Atoms can be represented as nodes, and bonds as edges in a graph. By employing the Floyd-Warshall algorithm, we can compute the shortest path, or the number of bonds, separating any two atoms in a molecule, information central to many tasks in cheminformatics and in general in the handling of molecular structures by computer programs:

A molecule of butanol as an example, made of 4 C atoms in grey, one O atom in red, and 10 H atoms in white (spheres) connected by bonds (sticks). The atoms are parsed as nodes in a graph, the bonds as edges, and the number of bonds separating any two atoms corresponds to the distance computed by the Floyd-Warshall algorithm (exemplified with the green dashed curves).

The procedure starts from a list of atoms that are connected directly by a single bond, information provided by a matrix of size n x n where n is the number of atoms (nodes) in the system, and whose elements are 1 if the two atoms are bonded directly (linked by an edge) or 0 otherwise. Such matrix, that I call the "bonds matrix" but corresponds to the "adjacency matrix" in Graph Theory terms (because it describes which nodes are adjacent, i.e. they are linked by just one edge), can be computed from the molecular structure with simple chemistry rules: if two atoms are close enough, they are bonded, just like you can see me forming bonds in this video using the free tool called HandMol:

Let's now walk through the JavaScript code that applies the Floyd-Warshall algorithm to determine the number of bonds between atoms in a molecule.

Code Walkthrough

1. Generating the Bonds MatrixThe function getBondsMatrix() creates a matrix that represents the presence or lack of (direct, chemical) bonds between atoms based on their positions in 3D space. This is the "adjency matrix" in Graph Theory terminology, as it is the matrix that describes pairs of nodes that are adjacent i.e. connected by a direct edge.

The input to getBondsMatrix is simply an array that contains the x, y, z positions for all the atoms of the molecules, and nothing else.

function getBondsMatrix(atomPositions) {
  let bondsMatrix = []; //Initialize empty matrix

  //Now get the distances for all possible pairs of atoms, check
  //if they are within bonding distance

  for (let i = 0; i < atomPositions.length; i++) {
    let distsqr;
    let bondsMatrixRow = []; //Initialize empty row for atom i
    for (let j = i+1; j < atomPositions.length; j++) {
      let newBondsMatrixElement = 0;
      //Get distance squared
      distsqr =
      Math.pow((atomPositions[i].x - atomPositions[j].x), 2) +
      Math.pow((atomPositions[i].y - atomPositions[j].y), 2) +
      Math.pow((atomPositions[i].z - atomPositions[j].z), 2);
      //If distance squared is less than 1.2 x the sum of the
      //atomic radii squared, then atoms i and j are bonded
      const radSum = radiiSum(elementradii[i], elementradii[j]);
      if (distsqr < radSum && chains[i] === chains[j]) {
        newBondsMatrixElement = 1;
      }
      bondsMatrixRow.push(newBondsMatrixElement);
    }
    bondsMatrix.push(bondsMatrixRow);
  }
  return bondsMatrix;
}

Note the very small improvement trick that instead of comparing distances we compare squared distances, in order to skip one sqrt() calculation.

By the way, the threshold for bond determination based on atomic radii is handled by a function that reads in element names and takes their radii from a table:

function radiiSum(elementA, elementB) {
  return 1.2 * Math.pow(elementA.radius + elementB.radius, 2);
}

2. Creating the Distance Matrix via the Floyd-Warshall algorithmThe createDistanceMatrix() function whose code I show you below initializes and populates the distance matrix based on the matrix of bonds produced above and passed as input, using the Floyd-Warshall algorithm.

function createDistanceMatrix(bondsMatrix) {
  adjMatrix = bondsMatrix;
  const n = adjMatrix.length;
  const distMatrix = Array.from({ length: n }, () => Array(n).fill(Infinity));

  // Initialize the distance matrix distMatrix which will at the end
  //of the procedure contain the matrix of bond distances
  for (let i = 0; i < n; i++) {
    for (let j = 0; j < n; j++) {
      if (i === j) {
        distMatrix[i][j] = 0;
      } else if (adjMatrix[i][j] === 1) {
        distMatrix[i][j] = 1;
      }
    }
  }

  // Apply the Floyd-Warshall algorithm on the distance matrix setup above
  for (let k = 0; k < n; k++) {
    for (let i = 0; i < n; i++) {
      for (let j = 0; j < n; j++) {
        if (distMatrix[i][k] < Infinity && distMatrix[k][j] < Infinity) {
          distMatrix[i][j] = Math.min(distMatrix[i][j], distMatrix[i][k] + distMatrix[k][j]);
        }
      }
    }
  }

  return distMatrix;
}

The first pair of nested for loops sets the distance between nodes to 0 for self-loops and 1 for direct bonds, the latter taken from the input adjacency matrix.

Then, the block with 3 nested for loops is the core of the algorithm itself. It updates the distance matrix by considering intermediate nodes to find the shortest paths.

The three nested for loops already advance that the algorithm might not scale very well, and indeed you are right; in fact, it works well in our system for molecular graphics in AR/VR because it is limited to rather small molecules with a few hundreds to low thousands atoms. But let's explore next the algorithm's exact limits and what alternatives are out there.

Scalability of the Floyd-Warshall algorithm

As you saw in my code, the algorithm is rather easy to follow and understand; but as we advanced on closing the previous section, it is not very efficient. It is straightforward and effective for dense graphs or smaller graphs where all pairs of shortest paths need to be calculated. However, its cubic time complexity is feasible only for graphs with a relatively small number of nodes (e.g., hundreds to a few thousand as we use in our molecular graphics software).

As the number of nodes increases, the algorithm's performance deteriorates rapidly given its O(n3) scaling. For large graphs such as those encountered in large-scale social networks, road networks, or molecular systems with thousands of atoms, the time complexity becomes prohibitively expensive. Running time grows cubically with the number of nodes, making it impractical for very large graphs.

In chemistry this might not always be a problem, because many molecular systems that look large or even huge, with tens of thousands to even millions of atoms, are actually made of repetitive elements whose substructures are well-defined upfront. For example, proteins are made of 20 amino acids whose structures are clear and don't need to be computed, and which connect with each other via well-defined rules and positions (the "peptide" bonds). Same for nucleic acids, made of bases of well-known structure connected via simple well-known connections (phosphate groups). In materials science, most materials are regular arrangements of smaller units, that can again be treated easily.

Still, for some applications in chemistry more efficient methods for computing the matrix of bond distances is needed, and researchers are working in developing such kinds of more advanced methods, for example here:

Connectivity stepwise derivation (CSD) method: a generic chemical structure information extraction…

Are There More Efficient Alternatives?

But more broadly in Graph Theory, depending on the nature of the graph and the specific problem, there are more efficient alternatives to the Floyd-Warshall algorithm. I will only overview them briefly here:

  • Dijkstra's Algorithm (Single-Source Shortest Path) uses a priority queue based on Fibonacci heaps to provide a more efficient way to find the shortest path from a single "source" node to all other nodes, especially in sparse graphs. It is faster than Floyd-Warshall for single-source queries, but it does not compute all-pairs shortest paths.
  • Johnson's Algorithm (All-Pairs Shortest Paths) is efficient for sparse graphs when one needs all-pairs shortest paths. It is a variation of Dijkstra's algorithm but for all pairs, still more efficient than Floyd-Warshall for large, sparse graphs.
  • The *A Algorithm** (Single-Pair Shortest Path) solves for single-pair shortest paths using heuristics to try to speed up the search. Its scaling is variable, and in the best-case it can be better than Dijkstra's algorithm. In principle it is the best for finding the shortest path between a specific pair of nodes especially in graphs where heuristics (such as geographical distance, possibly molecular structure in cases of well-defined building blocks) can guide the search efficiently.

The Power of Maths Exemplified in one Algorithm

The Floyd-Warshall algorithm is rather simple, as you saw in the code and this article, but finds tons of applications and thus it stands as a great example of how math impacts technology – and hence our lives directly.

By leveraging this algorithm, researchers and analysts can gain valuable insights into complex systems, from chemical interactions as I focused here because I work with molecular structures everyday, to network optimization in various fronts. In network analysis, the Floyd-Warshall and other algorithms presented above serve to optimize routes in communication networks and data flow. In urban planning, these algorithms help to enhance route planning and traffic management in transportation systems. Applied to social networks, they help to find the shortest paths in social connections and influence networks, central to algorithms that recommend you connections in communities like LinkedIn or Facebook.

If you, like me, are amazed by how important Mathematics is, check this out:

Ensure Your Dose of Math (and Science and Tech) Fix With This YouTube Channel


www.lucianoabriata.com I write about everything that lies in my broad sphere of interests: nature, science, technology, Programming, etc. Subscribe to get my new stories by email. To consult about small jobs check my services page here. You can contact me here. You can tip me here.

Tags: Chemistry Geometry Hands On Tutorials Mathematics Programming

Comment