Floyd-Warshall Algorithm — Junior Level¶
One-line summary: Floyd-Warshall computes the shortest path between every pair of vertices in a weighted graph using a three-line dynamic-programming triple loop in
O(V³)time andO(V²)space, handling negative edges and detecting negative cycles.
Table of Contents¶
- Introduction
- Prerequisites
- Glossary
- Core Concepts
- Big-O Summary
- Real-World Analogies
- Pros & Cons
- Step-by-Step Walkthrough
- Code Examples
- Coding Patterns
- Error Handling
- Performance Tips
- Best Practices
- Edge Cases & Pitfalls
- Common Mistakes
- Cheat Sheet
- Visual Animation
- Summary
- Further Reading
Introduction¶
The Floyd-Warshall algorithm solves the all-pairs shortest path (APSP) problem: given a weighted directed graph, it finds the shortest distance from every vertex to every other vertex. Where Dijkstra answers "how far is everything from this one source?", Floyd-Warshall answers "how far is everything from everything?" — and it does so in a single, astonishingly compact algorithm.
The algorithm was published independently by Robert Floyd (1962), building on Stephen Warshall's (1962) work on transitive closure, and Bernard Roy (1959) earlier still. Its central idea is one of the cleanest examples of dynamic programming you will ever meet:
Consider paths that are only allowed to pass through the first
kvertices as intermediate stops. Growkfrom 0 up toV. At each step, ask: "Does routing through vertexkgive me a shorter trip fromitojthan what I already had?"
That single question, asked for every pair (i, j) and every intermediate k, is the entire algorithm. It is literally three nested loops:
The beauty is that you operate directly on a V × V distance matrix, you never build an explicit graph traversal, and the code handles negative edge weights for free (something Dijkstra cannot do). It even detects negative cycles: if any vertex ends up with a negative distance to itself, a negative cycle exists.
The cost is O(V³) time — fine for dense graphs with a few hundred vertices, a poor choice for sparse graphs with thousands. Knowing when to reach for it is as important as knowing how it works.
Prerequisites¶
Before reading this file you should be comfortable with:
- 2D arrays / matrices — the algorithm lives in a
dist[i][j]grid. - Nested loops — three of them, and their order matters.
- Graphs basics — vertices, directed/undirected edges, edge weights (see sibling topic
01-representation). - The idea of "shortest path" — a path minimizing the sum of edge weights.
- Big-O notation —
O(V³),O(V²). - Infinity as a sentinel — representing "no known path yet" with a large value.
Optional but helpful:
- Brief exposure to single-source shortest path (
04-dijkstra,05-bellman-ford). - Familiarity with dynamic programming — overlapping subproblems and optimal substructure.
Glossary¶
| Term | Meaning |
|---|---|
| All-pairs shortest path (APSP) | The shortest distance between every ordered pair of vertices (i, j). |
Distance matrix dist | A V × V grid where dist[i][j] holds the best known shortest distance from i to j. |
| Intermediate vertex | A vertex a path is allowed to pass through (not an endpoint). |
dp[k][i][j] | Shortest i→j distance using only vertices {0,…,k-1} as intermediates. |
| Relaxation | The update dist[i][j] = min(dist[i][j], dist[i][k] + dist[k][j]). |
| INF (infinity) | A sentinel marking "no path known yet." Must be large but guarded against overflow. |
| Transitive closure | The boolean version: is j reachable from i at all? (Warshall's algorithm.) |
| Negative cycle | A cycle whose edge weights sum to a negative value — makes "shortest path" undefined. |
| Path reconstruction | Recovering the actual sequence of vertices, via a next or predecessor matrix. |
| Dense graph | A graph with many edges (close to V²); Floyd-Warshall's sweet spot. |
| Sparse graph | A graph with few edges (close to V); prefer running Dijkstra V times instead. |
Core Concepts¶
1. The Distance Matrix¶
Floyd-Warshall stores everything in one V × V matrix called dist. Initialize it like this:
dist[i][i] = 0 // zero cost to stay put
dist[i][j] = weight(i, j) // if there is a direct edge i→j
dist[i][j] = INF // otherwise (no known path yet)
By the time the algorithm finishes, dist[i][j] holds the true shortest distance from i to j.
2. The Intermediate-Vertex Idea (the DP)¶
Define a subproblem indexed by k:
dp[k][i][j]= the length of the shortest path fromitojthat is only allowed to use vertices{0, 1, …, k-1}as intermediate stops.
When k = 0, no intermediates are allowed, so dp[0][i][j] is just the direct edge (or INF). When k = V, every vertex is allowed, so dp[V][i][j] is the true answer.
The recurrence grows k by one each step. To compute dp[k+1], consider the shortest i→j path allowed to use vertex k. Either:
- It does not use
k: the answer isdp[k][i][j](unchanged), or - It does use
k: the path goesi → … → k → … → j, costingdp[k][i][k] + dp[k][k][j].
Take the smaller:
3. The In-Place O(V²) Optimization¶
The 3D dp[k][i][j] table is wasteful — we only ever need the previous k layer. It turns out you can drop the k dimension entirely and update a single 2D dist matrix in place:
for k in 0..V:
for i in 0..V:
for j in 0..V:
if dist[i][k] + dist[k][j] < dist[i][j]:
dist[i][j] = dist[i][k] + dist[k][j]
Why is overwriting safe? Because when processing layer k, the row dist[k][*] and column dist[*][k] are not changed by this layer's updates (dist[k][j] only improves via dist[k][k] + dist[k][j], and dist[k][k] is 0 for graphs without negative cycles). The full proof lives in professional.md, but the practical takeaway is: one matrix, no third dimension.
4. The Loop Order — k Must Be Outermost¶
This is the single most important rule. The k loop must be the outermost loop:
If you nest them in any other order (e.g. i outside k), you compute garbage, because you would be using updated distances before the intermediate set is correct. Memorize: k, then i, then j.
5. Negative Cycles¶
Floyd-Warshall handles negative edges perfectly. If the graph has a negative cycle, "shortest path" is undefined (you could loop forever and keep shrinking the cost). After running the algorithm, check the diagonal:
A vertex with a negative distance to itself means there is a negative-weight loop reachable from it.
Big-O Summary¶
| Aspect | Complexity | Notes |
|---|---|---|
| Time | O(V³) | Three nested loops, each running V times. |
| Space (in-place) | O(V²) | One distance matrix. |
| Space (with path reconstruction) | O(V²) | One extra next (or pred) matrix. |
| Space (naive 3D DP) | O(V³) | Wasteful; always optimize to 2D. |
| Negative edges | Supported | Unlike Dijkstra. |
| Negative cycle detection | O(V) post-check | Scan the diagonal for dist[i][i] < 0. |
| Query a single pair after running | O(1) | Just read dist[i][j]. |
Compare: running Dijkstra V times is O(V · E log V), which beats O(V³) on sparse graphs but loses on dense ones (and Dijkstra cannot handle negative edges).
Real-World Analogies¶
| Concept | Analogy |
|---|---|
| All-pairs distance matrix | The mileage chart printed on the back of an old road atlas — every city's distance to every other city, in one grid. |
Intermediate vertex k | A flight-connections table where you ask, "If I'm allowed to connect through hub city k, can I get from A to B cheaper than my current best route?" |
Growing k from 0 to V | Slowly unlocking more and more allowed layover airports, recomputing the cheapest itineraries each time you add one. |
| Relaxation step | "Is flying A → hub → B cheaper than the direct A → B fare I already found?" |
| Transitive closure | A "can I even reach city B from city A by any sequence of flights?" reachability table, ignoring price. |
| Negative cycle | A bizarre frequent-flyer loop that pays you every time you fly it — making "the cheapest trip" meaningless. |
Where the analogy breaks: real flight networks are sparse, so airlines would run Dijkstra per source, not Floyd-Warshall. The mileage-chart analogy is best when nearly every city connects to nearly every other.
Pros & Cons¶
| Pros | Cons |
|---|---|
| Computes all pairs in one pass. | O(V³) is too slow for large V (thousands+). |
| Handles negative edge weights. | Cannot handle negative cycles (but detects them). |
| Detects negative cycles via the diagonal. | O(V²) memory — a million-vertex graph needs a trillion cells. |
| Tiny code — three lines of core logic. | Wasteful on sparse graphs vs V × Dijkstra or Johnson's. |
O(1) queries after the single precomputation. | Recomputing on graph changes means re-running from scratch. |
| Same template solves transitive closure, min-max paths, counting. | No early termination; always does the full V³ work. |
When to use: dense graphs, small V (≤ ~400–500), you need every pair, the graph has negative edges, or you want the simplest possible APSP code.
When NOT to use: sparse graphs (use V × Dijkstra or Johnson's algorithm), very large V, or when you only need shortest paths from one source (use 04-dijkstra or 05-bellman-ford).
Step-by-Step Walkthrough¶
Take a 4-vertex directed graph (vertices 0, 1, 2, 3):
Initial matrix (INF shown as ∞, diagonal is 0):
k = 0 (allow paths through vertex 0). Look for i → 0 → j improvements:
dist[1][3] = min(∞, dist[1][0]+dist[0][3]) = min(∞, 8+7) = 15dist[2][1] = min(∞, dist[2][0]+dist[0][1]) = min(∞, 5+3) = 8dist[3][1] = min(∞, dist[3][0]+dist[0][1]) = min(∞, 2+3) = 5dist[3][3] = min(0, dist[3][0]+dist[0][3]) = min(0, 2+7) = 0
k = 1 (now also allow vertex 1):
dist[0][2] = min(∞, dist[0][1]+dist[1][2]) = min(∞, 3+2) = 5dist[2][2],dist[3][2]improve via 1→2:dist[2][2] = min(0, dist[2][1]+dist[1][2]) = min(0, 8+2) = 0(no change)dist[3][2] = min(∞, dist[3][1]+dist[1][2]) = min(∞, 5+2) = 7
k = 2 (also allow vertex 2):
dist[0][3] = min(7, dist[0][2]+dist[2][3]) = min(7, 5+1) = 6dist[1][0] = min(8, dist[1][2]+dist[2][0]) = min(8, 2+5) = 7dist[1][3] = min(15, dist[1][2]+dist[2][3]) = min(15, 2+1) = 3
k = 3 (allow all vertices). Relaxing through 3:
dist[1][0] = min(7, dist[1][3]+dist[3][0]) = min(7, 3+2) = 5dist[1][1] = min(0, dist[1][3]+dist[3][1]) = min(0, 3+5) = 0(no change)dist[2][1] = min(8, dist[2][3]+dist[3][1]) = min(8, 1+5) = 6dist[0][0] = min(0, dist[0][3]+dist[3][0]) = min(0, 6+2) = 0(no change)
Final matrix — every cell is the true shortest distance:
No diagonal entry is negative, so there is no negative cycle.
Code Examples¶
Example: Floyd-Warshall with Path Reconstruction¶
We compute the distance matrix and a next matrix so we can rebuild the actual path.
Go¶
package main
import (
"fmt"
"math"
)
const INF = math.MaxInt64 / 4 // guard: never let INF+INF overflow
// floydWarshall returns dist and next matrices.
// next[i][j] = the first vertex to step to when going from i to j.
func floydWarshall(n int, edges [][3]int) ([][]int, [][]int) {
dist := make([][]int, n)
next := make([][]int, n)
for i := 0; i < n; i++ {
dist[i] = make([]int, n)
next[i] = make([]int, n)
for j := 0; j < n; j++ {
if i == j {
dist[i][j] = 0
} else {
dist[i][j] = INF
}
next[i][j] = -1
}
}
for _, e := range edges {
u, v, w := e[0], e[1], e[2]
dist[u][v] = w
next[u][v] = v
}
for k := 0; k < n; k++ {
for i := 0; i < n; i++ {
for j := 0; j < n; j++ {
// guard against using an unreachable intermediate
if dist[i][k] < INF && dist[k][j] < INF &&
dist[i][k]+dist[k][j] < dist[i][j] {
dist[i][j] = dist[i][k] + dist[k][j]
next[i][j] = next[i][k]
}
}
}
}
return dist, next
}
// path reconstructs i→j using the next matrix.
func path(next [][]int, i, j int) []int {
if next[i][j] == -1 {
return nil // unreachable
}
p := []int{i}
for i != j {
i = next[i][j]
p = append(p, i)
}
return p
}
func main() {
edges := [][3]int{
{0, 1, 3}, {0, 3, 7}, {1, 0, 8}, {1, 2, 2},
{2, 0, 5}, {2, 3, 1}, {3, 0, 2},
}
dist, next := floydWarshall(4, edges)
fmt.Println("dist[1][3] =", dist[1][3]) // 3
fmt.Println("path 1->3 =", path(next, 1, 3)) // [1 2 3]
}
Java¶
import java.util.ArrayList;
import java.util.List;
public class FloydWarshall {
static final int INF = Integer.MAX_VALUE / 4; // overflow guard
static int[][] dist;
static int[][] next;
static void run(int n, int[][] edges) {
dist = new int[n][n];
next = new int[n][n];
for (int i = 0; i < n; i++) {
for (int j = 0; j < n; j++) {
dist[i][j] = (i == j) ? 0 : INF;
next[i][j] = -1;
}
}
for (int[] e : edges) {
dist[e[0]][e[1]] = e[2];
next[e[0]][e[1]] = e[1];
}
for (int k = 0; k < n; k++) {
for (int i = 0; i < n; i++) {
for (int j = 0; j < n; j++) {
if (dist[i][k] < INF && dist[k][j] < INF
&& dist[i][k] + dist[k][j] < dist[i][j]) {
dist[i][j] = dist[i][k] + dist[k][j];
next[i][j] = next[i][k];
}
}
}
}
}
static List<Integer> path(int i, int j) {
List<Integer> p = new ArrayList<>();
if (next[i][j] == -1) return p; // unreachable
p.add(i);
while (i != j) {
i = next[i][j];
p.add(i);
}
return p;
}
public static void main(String[] args) {
int[][] edges = {
{0, 1, 3}, {0, 3, 7}, {1, 0, 8}, {1, 2, 2},
{2, 0, 5}, {2, 3, 1}, {3, 0, 2}
};
run(4, edges);
System.out.println("dist[1][3] = " + dist[1][3]); // 3
System.out.println("path 1->3 = " + path(1, 3)); // [1, 2, 3]
}
}
Python¶
INF = float("inf")
def floyd_warshall(n, edges):
dist = [[0 if i == j else INF for j in range(n)] for i in range(n)]
nxt = [[None] * n for _ in range(n)]
for u, v, w in edges:
dist[u][v] = w
nxt[u][v] = v
for k in range(n):
for i in range(n):
# small speed-up: skip a whole row if i->k is unreachable
if dist[i][k] == INF:
continue
for j in range(n):
if dist[i][k] + dist[k][j] < dist[i][j]:
dist[i][j] = dist[i][k] + dist[k][j]
nxt[i][j] = nxt[i][k]
return dist, nxt
def path(nxt, i, j):
if nxt[i][j] is None:
return [] # unreachable
p = [i]
while i != j:
i = nxt[i][j]
p.append(i)
return p
if __name__ == "__main__":
edges = [
(0, 1, 3), (0, 3, 7), (1, 0, 8), (1, 2, 2),
(2, 0, 5), (2, 3, 1), (3, 0, 2),
]
dist, nxt = floyd_warshall(4, edges)
print("dist[1][3] =", dist[1][3]) # 3
print("path 1->3 =", path(nxt, 1, 3)) # [1, 2, 3]
What it does: builds the all-pairs distance matrix and a next matrix, then reconstructs the path 1 → 2 → 3 of cost 3. Run: go run main.go | javac FloydWarshall.java && java FloydWarshall | python fw.py
Coding Patterns¶
Pattern 1: Plain All-Pairs Distances¶
Intent: You just need the distance matrix, no paths.
for k in range(n):
for i in range(n):
for j in range(n):
if dist[i][k] + dist[k][j] < dist[i][j]:
dist[i][j] = dist[i][k] + dist[k][j]
Pattern 2: Transitive Closure (Reachability)¶
Intent: Boolean "can i reach j?" Replace min/+ with OR/AND:
for k in range(n):
for i in range(n):
for j in range(n):
reach[i][j] = reach[i][j] or (reach[i][k] and reach[k][j])
This is Warshall's algorithm — same triple loop, boolean semiring.
Pattern 3: Negative Cycle Detection¶
Intent: After running, scan the diagonal.
Error Handling¶
| Error | Cause | Fix |
|---|---|---|
Integer overflow on dist[i][k] + dist[k][j] | Adding two INF sentinels (e.g. MaxInt + MaxInt wraps negative). | Use INF = MaxInt / 4, or guard with if dist[i][k] < INF && dist[k][j] < INF. |
| Wrong answers, all garbage | Loop order is not k, i, j. | Put k outermost. Always. |
| Negative cycle silently produces nonsense | Forgot to check the diagonal afterward. | After the loops, scan dist[i][i] < 0. |
IndexError building the matrix | V mismatched with the largest vertex id. | Size the matrix to max(vertex_id) + 1 or the declared V. |
| Path reconstruction loops forever | A negative cycle corrupts the next matrix. | Detect cycles first; refuse to reconstruct if one exists. |
Performance Tips¶
- Use a flat 1D array instead of a 2D nested array for the matrix; index as
dist[i*n + j]. This improves cache locality and can be 2–3× faster. - Hoist
dist[i][k]into a local inside theiloop — it is constant acrossj. - Skip the row when
dist[i][k] == INF: nothing throughkcan improve rowi. - Avoid the overflow guard inside the hottest loop by using a moderate
INFthat cannot overflow; the guard branch hurts the branch predictor. - Bitset the inner loop for transitive closure — pack
reach[i]as a bitset and OR whole rows at once for a ~64× speedup.
Best Practices¶
- Always initialize the diagonal to
0and self-loops carefully (min(0, self-edge weight)). - Pick
INFas a value comfortably below half your integer max soINF + INFnever overflows. - Keep a separate
next(orpred) matrix if you ever need the actual path, not just its cost. - Document your loop order in a comment — future readers must not "tidy" it into the wrong order.
- For sparse graphs, write a brief comment explaining why you did not use Floyd-Warshall (and point to Dijkstra/Johnson's).
Edge Cases & Pitfalls¶
- Loop order — the
kloop must be outermost. This is the classic Floyd-Warshall bug. - INF overflow —
INF + INFwrapping to a negative number can create phantom shortest paths. Guard or use a small INF. - Self-loops with negative weight —
dist[i][i]could go negative immediately; that is a (length-1) negative cycle. - Disconnected graph — unreachable pairs stay
INF. Don't treatINFas a real distance. - Undirected graphs — add both
(u,v)and(v,u)with the same weight. A single negative undirected edge is automatically a negative cycle. - Parallel edges — keep only the minimum weight when initializing
dist[u][v].
Common Mistakes¶
- Wrong loop nesting — putting
iorjoutsidek. Produces wrong distances on most inputs. - Forgetting the overflow guard —
INF(asMaxInt) plus a weight overflows and looks "shorter." - Not detecting negative cycles — reporting distances that are meaningless because a negative loop exists.
- Treating it as single-source — running the whole
O(V³)algorithm when you only needed one source's distances. - Using it on a huge sparse graph —
O(V³)andO(V²)memory explode; Dijkstra-per-source or Johnson's is far better. - Mutating the wrong cell during reconstruction — set
next[i][j] = next[i][k](not= k) when you relax.
Cheat Sheet¶
| Item | Value |
|---|---|
| Time | O(V³) |
| Space | O(V²) |
| Negative edges | Yes |
| Negative cycles | Detect via dist[i][i] < 0; cannot solve |
| Query after run | O(1) |
| Loop order | k outer, then i, then j |
Core relaxation:
Initialization:
Variants (same triple loop, different operators):
| Variant | Combine | Accumulate |
|---|---|---|
| Shortest path | min | + |
| Transitive closure | OR | AND |
| Widest / bottleneck path | max | min |
| Counting paths | + | × |
Visual Animation¶
See
animation.htmlfor an interactive visual animation of the Floyd-Warshall algorithm.The animation demonstrates: - The
distmatrix evolving asksweeps from0toV-1- The currentkrow and column highlighted - The celldist[i][j]being relaxed viadist[i][k] + dist[k][j]- The graph drawn alongside the matrix - Step / Run / Reset controls
Summary¶
Floyd-Warshall is the simplest correct way to compute all-pairs shortest paths. Its engine is one dynamic-programming recurrence — "is routing through vertex k cheaper?" — applied in a three-line triple loop with k on the outside. It runs in O(V³) time and O(V²) space, handles negative edges, and detects negative cycles via a negative diagonal. Use it for dense graphs and small V; reach for Dijkstra-per-source or Johnson's on sparse graphs. Master the loop order and the overflow guard, and you have mastered the algorithm.
Further Reading¶
- Introduction to Algorithms (CLRS), Chapter 25 — "All-Pairs Shortest Paths" — the canonical treatment.
- Robert W. Floyd, Algorithm 97: Shortest Path, CACM 5(6):345, 1962.
- Stephen Warshall, A Theorem on Boolean Matrices, JACM 9(1):11–12, 1962.
- cp-algorithms.com — "Floyd-Warshall algorithm" and "Number of paths of fixed length."
- Sibling topics:
04-dijkstra,05-bellman-ford(single-source),01-representation. - visualgo.net — "Single-Source / All-Pairs Shortest Paths" interactive visualization.