Floyd-Warshall Algorithm — Mathematical Foundations and Complexity Theory¶
Table of Contents¶
- Formal Definition
- Correctness Proof — the Intermediate-Set DP Invariant
- In-Place Optimization Correctness Proof
- Complexity — O(V³) Time, O(V²) Space
- Closed Semirings and Kleene's Algorithm (Min-Plus Semiring)
- Cache-Blocked Complexity
- Average-Case and Path-Counting Analysis
- Space-Time Trade-offs
- Comparison with Alternatives (asymptotics + constants)
- Open Problems — Subcubic APSP
- Worked k-Step Trace and Matrix Evolution
- Reference Implementations
- 12.1 Go — blocked/tiled Floyd-Warshall
- 12.2 Java — path reconstruction via the next-matrix
- 12.3 Python — bitset transitive closure
- Summary
1. Formal Definition¶
Let G = (V, E, w) be a finite directed graph with vertex set V = {0, 1, …, n-1} and weight function w : E → ℝ. Extend w to a complete weight matrix W ∈ (ℝ ∪ {∞})^{n×n}:
(If parallel edges exist, take W[i][j] = min of their weights.)
Definition 1.1 (Shortest-path distance). For i, j ∈ V, let δ(i, j) be the infimum of w(P) over all walks P from i to j, where w(P) is the sum of edge weights. If a negative-weight cycle is reachable on some i ⇝ j walk, δ(i, j) = -∞.
Definition 1.2 (Restricted distance). For k ∈ {0, 1, …, n}, let
A vertex is intermediate if it is neither the source i nor the destination j. By convention d_0(i, j) = W[i][j] (no intermediates permitted) and the target quantity is d_n(i, j) = δ(i, j) when no negative cycle is reachable.
Definition 1.3 (Floyd-Warshall recurrence).
The algorithm evaluates this recurrence for k = 0, …, n-1, collapsing the k index onto a single matrix D.
2. Correctness Proof — the Intermediate-Set DP Invariant¶
We prove that the recurrence computes d_k correctly by induction on k. Assume throughout that G contains no negative-weight cycle (we handle detection in §3.4); under that assumption every shortest walk may be taken to be a simple path on its intermediate vertices.
Theorem 2.1. For all k ∈ {0, …, n} and all i, j ∈ V, the value d_k(i, j) defined by the recurrence equals the minimum weight of any i→j walk whose intermediate vertices lie in S_k = {0, …, k-1}.
Proof. Induction on k.
Base case k = 0. S_0 = ∅, so the only admissible walks are the empty walk (i = j, weight 0) and single edges i → j (weight W[i][j]). This is exactly d_0 = W. ✓
Inductive step. Assume the claim for k; prove it for k+1. Fix i, j. Let P* be a minimum-weight i→j walk with intermediates in S_{k+1} = S_k ∪ {k}. Two cases:
-
P*does not use vertexkas an intermediate. Then all intermediates ofP*lie inS_k, so by the inductive hypothesis its weight is≥ d_k(i, j). Converselyd_k(i, j)is achievable with intermediates inS_k ⊆ S_{k+1}. Hence the optimum equalsd_k(i, j). -
P*uses vertexkas an intermediate. Because there is no negative cycle, we may assumeP*is simple on its intermediates, so it passes throughkexactly once. SplitP* = P_1 · P_2at that occurrence, whereP_1 : i ⇝ kandP_2 : k ⇝ j. All intermediates ofP_1andP_2lie inS_{k+1} \ {k} = S_k(the only occurrence ofkis the split point, which is an endpoint of each sub-walk, not an intermediate). By the inductive hypothesis,w(P_1) ≥ d_k(i, k)andw(P_2) ≥ d_k(k, j), and both bounds are achievable. Hence the optimum over case 2 equalsd_k(i, k) + d_k(k, j).
Taking the minimum over the two cases yields precisely
which is the recurrence. ∎
Corollary 2.2. After the loop completes (k = n), D[i][j] = d_n(i, j) = δ(i, j) for all i, j, since S_n = V allows every vertex as an intermediate.
The crux is the exactly once argument in case 2: simplicity on intermediates (guaranteed by the absence of negative cycles) is what lets us split at the single occurrence of k and apply the hypothesis to two S_k-restricted sub-walks.
3. In-Place Optimization Correctness Proof¶
The implemented algorithm does not keep separate d_k and d_{k+1} arrays; it overwrites a single matrix D:
We must show this in-place update computes the same values as the layered recurrence, even though D[i][k] and D[k][j] may already be partially updated during the k-th pass.
3.1 The pivot row and column are invariant during pass k¶
Lemma 3.1. During the iteration with fixed k, the entries D[k][j] (pivot row) and D[i][k] (pivot column) are never changed.
Proof. Consider any update to D[k][j] (i.e. i = k): it would set D[k][j] = min(D[k][j], D[k][k] + D[k][j]). We claim D[k][k] = 0 throughout (given no negative cycle): D[k][k] starts at 0 and an update could only make it min(0, D[k][m] + D[m][k]) for prior m; that value is ≥ 0 precisely because there is no negative cycle through k. Hence D[k][k] + D[k][j] = D[k][j], so the min leaves D[k][j] unchanged. The pivot column D[i][k] is symmetric. ∎
3.2 In-place values never undershoot the layered values¶
Lemma 3.2. At every point during pass k, every entry D[i][j] satisfies d_{k+1}(i, j) ≤ D[i][j] ≤ d_k(i, j).
Proof sketch. Every value stored in D corresponds to the weight of some actual walk with intermediates in S_{k+1}, so it is ≥ d_{k+1}(i,j) (lower bound: no value can be smaller than the true restricted optimum). The upper bound holds because the only update applied is the S_{k+1} relaxation, which can only decrease entries from their d_k value. Monotonic decrease plus a valid lower bound pins each entry between the two layers. ∎
3.3 Conclusion¶
Theorem 3.3. After pass k, D[i][j] = d_{k+1}(i, j) for all i, j.
Proof. By Lemma 3.1 the relaxation reads the correct, unmodified pivot-row and pivot-column values D[i][k] = d_k(i, k) and D[k][j] = d_k(k, j) (these equal both their pre-pass and post-pass values, since they do not change). By Lemma 3.2, D[i][j] before the relaxation is sandwiched, and the relaxation min(D[i][j], d_k(i,k) + d_k(k,j)) reaches exactly d_{k+1}(i, j). Whether D[i][j] had already been touched by an earlier (i, j) in this pass is irrelevant: the relaxation is idempotent and the operands are pivot-row/column values that are fixed. Induction over k gives correctness. ∎
This is why no double buffering is needed and why the loop order (k outermost) is essential: Lemma 3.1 requires that the entire pivot row and column for k be settled relative to d_k before any k+1 relaxation begins.
3.4 Negative-cycle detection¶
Proposition 3.4. After the algorithm, G has a negative cycle reachable from i (returning to i) iff D[i][i] < 0.
Proof. D[i][i] = δ(i, i) is the minimum weight of a closed walk at i using all vertices as intermediates. If some negative cycle is reachable from i and back, that closed walk has negative weight, so D[i][i] < 0. Conversely, D[i][i] < 0 exhibits a negative-weight closed walk at i, which contains a negative-weight cycle. ∎
If any diagonal entry is negative, δ is -∞ for affected pairs; a second O(V³) pass can propagate -∞ to every pair (i, j) with a negative-cycle vertex t on some i ⇝ t ⇝ j route.
4. Complexity — O(V³) Time, O(V²) Space¶
Time. Three nested loops over {0,…,n-1}, with O(1) work (one addition, one comparison, one conditional store) per innermost iteration:
There is no data-dependent early termination: the bound is Θ(n³) in the best, average, and worst case alike. This is unlike Dijkstra or Bellman-Ford, whose running times depend on E.
Space. The in-place algorithm uses one n × n matrix: Θ(n²). Path reconstruction adds a second n × n next (or pred) matrix: still Θ(n²). The naive layered DP would use Θ(n³) (one matrix per k); §3 eliminates that.
Word/RAM model. Assuming O(1) arithmetic on machine words, the bounds above hold. With arbitrary-precision weights (path counts, big integers), multiply by the cost of the arithmetic.
5. Closed Semirings and Kleene's Algorithm (Min-Plus Semiring)¶
Floyd-Warshall is a special case of Kleene's algorithm for computing the closure of a matrix over a closed semiring.
Definition 5.1 (Semiring). A semiring (S, ⊕, ⊗, 0̄, 1̄) has a commutative monoid (S, ⊕, 0̄), a monoid (S, ⊗, 1̄), distributivity of ⊗ over ⊕, and 0̄ absorbing under ⊗. A closed semiring additionally supports a well-defined countable ⊕-sum (a closure / star operation a* = ⊕_{i≥0} a^i).
The min-plus (tropical) semiring is (ℝ ∪ {+∞}, min, +, +∞, 0): - ⊕ = min (combine alternative paths), - ⊗ = + (concatenate path weights), - additive identity 0̄ = +∞, multiplicative identity 1̄ = 0.
Over this semiring, the shortest-path matrix is the closure W* of the weight matrix W, where (W*)[i][j] = δ(i, j). The Floyd-Warshall recurrence is exactly Kleene's incremental closure, where the k-th pass folds in the star of the k-th pivot.
Specializing the semiring yields the variants from middle.md:
| Problem | Semiring (S, ⊕, ⊗) | ⊕ (combine) | ⊗ (extend) |
|---|---|---|---|
| Shortest path | min-plus (tropical) | min | + |
| Transitive closure | Boolean ({0,1}, ∨, ∧) | OR | AND |
| Widest / bottleneck | max-min | max | min |
| Minimax | min-max | min | max |
| Most-reliable path | ([0,1], max, ×) | max | × |
| Counting walks | (ℕ, +, ×) | + | × |
Proposition 5.2. For any closed semiring, Kleene's algorithm (the Floyd-Warshall triple loop with (⊕, ⊗) in place of (min, +)) computes the matrix closure W* in Θ(n³) semiring operations. Correctness follows from the same intermediate-set induction as §2, replacing min with ⊕ and + with ⊗ and using semiring distributivity in place of the additivity of path weights.
This semiring abstraction is the deep reason one triple loop solves so many problems: they are all the same closure computation in different algebras.
6. Cache-Blocked Complexity¶
In the external-memory (I/O) model with cache size M and block (line) size B, naive Floyd-Warshall has poor locality once n² > M: each pass streams the whole matrix, giving Θ(n³ / B) cache misses (no temporal reuse across passes).
Blocked Floyd-Warshall partitions the matrix into (n/β) × (n/β) tiles of side β with β² ≤ M (or, for cache-obliviousness, recursively). Each tile's β³ updates touch Θ(β²) data reused β times.
Proposition 6.1. Blocked Floyd-Warshall (Venkataraman, Sahni & Mukhopadhyaya 2003) with tile side β = Θ(√M) incurs
cache misses, an Θ(√M) improvement over the naive Θ(n³ / B). A cache-oblivious recursive divide-and-conquer formulation (Park, Penner & Prasanna 2004) achieves the same Θ(n³ / (B√M)) bound without knowing M or B.
The work stays Θ(n³); only the memory-traffic constant improves — but on modern hardware that constant dominates wall-clock time for n beyond a few hundred.
7. Average-Case and Path-Counting Analysis¶
7.1 No average-case speedup¶
Because the triple loop is oblivious to edge presence and weights, the running time is Θ(n³) for every input — there is no average-case improvement of the leading term. (Contrast Dijkstra, whose cost depends on E.) The only input-dependent savings are micro-optimizations: skipping a row when D[i][k] = ∞ removes O(n) work per such (i, k) but does not change the asymptotic class.
7.2 Counting shortest paths¶
Augment with a count matrix C, initialized C[i][j] = 1 for edges and the diagonal, 0 otherwise. On relaxation:
if D[i][k] + D[k][j] < D[i][j]: D[i][j] = D[i][k]+D[k][j]; C[i][j] = C[i][k]*C[k][j]
if D[i][k] + D[k][j] == D[i][j]: C[i][j] += C[i][k]*C[k][j]
C[i][j] then counts distinct shortest i→j paths. Counts can be exponential in n, so use modular arithmetic for a fixed-width answer. Correctness follows from the same intermediate-set induction, with C accumulating over the disjoint "uses k" / "does not use k" decompositions.
7.3 Walks of fixed length¶
For counting walks (not shortest paths) of exactly L edges, Floyd-Warshall is the wrong tool; that is min-plus / (+,×) matrix power W^L via fast exponentiation in O(n³ log L) (see cp-algorithms "Number of paths of fixed length / shortest paths of fixed length"). Floyd-Warshall computes the closure (unbounded length), not a fixed power.
8. Space-Time Trade-offs¶
| Variant | Time | Space | Notes |
|---|---|---|---|
| Layered 3D DP | Θ(n³) | Θ(n³) | Pedagogical only; never implement. |
| In-place 2D | Θ(n³) | Θ(n²) | Standard. |
In-place + next matrix | Θ(n³) | Θ(n²) (×2 constant) | Adds path reconstruction. |
| Blocked / cache-oblivious | Θ(n³) work, Θ(n³/(B√M)) I/O | Θ(n²) | Same work, far less memory traffic. |
| Bitset transitive closure | Θ(n³ / w) (word size w) | Θ(n² / w) | Pack boolean rows; ~64× constant speedup. |
| Distance + count | Θ(n³) | Θ(n²) (×2) | Number of shortest paths. |
The Θ(n³) time is invariant across all classical variants; the levers are the space (3D → 2D) and the constant factor (blocking, bitsets, SIMD min-plus).
9. Comparison with Alternatives (asymptotics + constants)¶
9.1 All-pairs shortest path methods¶
| Method | Time | Negatives | Best regime |
|---|---|---|---|
| Floyd-Warshall | Θ(V³) | edges ✓, cycles detect | Dense, small V, simple. |
V × Dijkstra (binary heap) | O(V E log V) | ✗ | Sparse, non-negative. |
V × Dijkstra (Fibonacci heap) | O(V² log V + V E) | ✗ | Dense, non-negative (theory). |
| Johnson's | O(V E + V E log V) | ✓ (reweight) | Sparse with negatives. |
V × Bellman-Ford | O(V² E) | ✓ | Rarely optimal; baseline. |
| Min-plus matrix mult (naive) | Θ(V³ log V) | ✓ | Theoretical building block. |
For a sparse graph (E = Θ(V)), Johnson's O(V² log V) crushes Floyd-Warshall's Θ(V³). For a dense graph (E = Θ(V²)), V × Dijkstra is O(V³ log V) — worse than Floyd-Warshall by a log factor, plus Floyd-Warshall handles negatives and is simpler. This is the precise regime boundary.
9.2 Constant factors¶
Floyd-Warshall's inner loop is two arithmetic ops and a branch, fully predictable and vectorizable, on a contiguous matrix — one of the most cache-friendly Θ(n³) kernels in practice. A tuned blocked/SIMD implementation reaches a large fraction of memory bandwidth. Johnson's and V × Dijkstra carry heap overhead and pointer-chasing, so for V up to a few thousand on dense graphs Floyd-Warshall often wins wall-clock despite the log-factor asymptotic disadvantage being on the other side.
10. Open Problems — Subcubic APSP¶
The Θ(n³) bound of Floyd-Warshall has resisted improvement to genuinely polynomial-factor-faster algorithms for general real-weighted graphs.
-
The APSP conjecture. It is conjectured that APSP requires
n^{3-o(1)}time in the worst case (noO(n^{3-ε})truly subcubic algorithm for constantε > 0). APSP is a central problem of fine-grained complexity: it is subcubic-equivalent to a cluster of problems (min-plus matrix product, negative triangle detection, second-shortest paths, replacement paths, radius/median), so a subcubic algorithm for any one would break all. -
Williams' bound. Ryan Williams (2014) gave the first improvement over
n³ / polylog, an APSP algorithm running in
time. This is not truly subcubic (it is n^{3 - o(1)}), but it beats every n³ / log^c n bound for fixed c. It uses circuit-complexity tools (the polynomial method, fast evaluation of low-degree polynomials) rather than algebraic matrix multiplication.
-
Min-plus vs ring matrix product. Ordinary matrix multiplication is
O(n^{ω})withω < 2.372(Williams, Alman–Vassilevska Williams, and successors). But min-plus (tropical) product, which APSP reduces to, has no knownO(n^{ω})algorithm — the tropical semiring lacks subtraction, blocking Strassen-style cancellation. Whether min-plus product is truly subcubic is open and equivalent to subcubic APSP. -
Special cases that are faster.
- Integer weights bounded by
M: APSP inÕ(M^{ω} n^{ω})-style bounds via (max,min) and rectangular matrix multiplication (Zwick);Õ(n^{2.5})-type results for smallM. - Unweighted / undirected: Seidel's algorithm computes undirected unweighted APSP in
O(n^{ω} log n)using real matrix multiplication and a parity trick — genuinely subcubic. -
Planar graphs:
O(n²)APSP exists, exploiting planar separators. -
Approximate APSP.
(1+ε)-approximate APSP admits truly subcubic algorithms via scaling + fast matrix multiplication, sidestepping the tropical barrier.
Floyd-Warshall remains the practical champion for exact, dense, general-weight APSP precisely because the theoretical improvements (Williams') have enormous constants and the truly-subcubic results require special structure.
11. Worked k-Step Trace and Matrix Evolution¶
Concretely tracing the closure makes Lemma 3.1 (pivot row/column invariance) tangible. Take the 4-vertex directed graph with weight matrix W (rows = from, columns = to, ∞ for no edge, 0 on the diagonal):
to: 0 1 2 3
from 0: [ 0, 3, ∞, 7 ]
from 1: [ 8, 0, 2, ∞ ]
from 2: [ 5, ∞, 0, 1 ]
from 3: [ 2, ∞, ∞, 0 ]
We compute d_{k+1} = min(d_k, d_k(·,k) + d_k(k,·)) four times. Each table below is D after the pass for that k. The pivot row and column for the pass are framed with |…| to show they do not change during that pass (Lemma 3.1).
Pass k = 0 (route through vertex 0; pivot = row 0 and column 0). Updated cells: D[1][3] = D[1][0]+D[0][3] = 8+7 = 15; D[3][1] = D[3][0]+D[0][1] = 2+3 = 5; D[2][1] = D[2][0]+D[0][1] = 5+3 = 8; D[3][3] stays 0.
k=0 pivot col
to: 0 1 2 3
from 0: | 0| 3, ∞, 7 <- pivot row 0 (unchanged)
from 1: | 8| 0, 2, 15
from 2: | 5| 8, 0, 1
from 3: | 2| 5, ∞, 0
Pass k = 1 (route through vertex 1). D[0][2] = D[0][1]+D[1][2] = 3+2 = 5; D[3][2] = D[3][1]+D[1][2] = 5+2 = 7; D[2][2] stays 0.
to: 0 1 2 3
from 0: 0, 3, 5, 7
from 1: -- pivot row 1 (unchanged) ----
from 2: 5, 8, 0, 1
from 3: 2, 5, 7, 0
Pass k = 2 (route through vertex 2). D[0][3] = min(7, D[0][2]+D[2][3]) = min(7, 5+1) = 6; D[1][3] = min(15, D[1][2]+D[2][3]) = min(15, 2+1) = 3; D[3][3] stays 0.
to: 0 1 2 3
from 0: 0, 3, 5, 6
from 1: 8, 0, 2, 3
from 2: -- pivot row 2 (unchanged) ----
from 3: 2, 5, 7, 0
Pass k = 3 (route through vertex 3). D[2][0] = min(5, D[2][3]+D[3][0]) = min(5, 1+2) = 3; D[1][0] = min(8, D[1][3]+D[3][0]) = min(8, 3+2) = 5; D[0][0] stays 0.
This final matrix is W* = δ. Verify one entry by hand: δ(1,0) = 5 realized by the walk 1 → 2 → 3 → 0 with weight 2 + 1 + 2 = 5, beating the direct edge 1 → 0 of weight 8. The diagonal stayed 0 throughout, confirming Proposition 3.4 found no negative cycle.
The monotone-decrease picture (Lemma 3.2) is visible column by column: D[1][3] went ∞ → 15 → 3 → 3 → 3, never rising; D[2][0] went 5 → 5 → 5 → 5 → 3. No entry ever increased — every relaxation is a min that can only tighten.
12. Reference Implementations¶
All three implementations target the same Θ(n³) work with the constant-factor and reconstruction techniques discussed above. Code order is Go, Java, Python.
12.1 Go — blocked/tiled Floyd-Warshall¶
The blocked variant (Proposition 6.1) processes the matrix in β × β tiles in three dependency phases per pivot block, reusing each tile β times in cache before eviction.
package main
import "fmt"
const INF = 1 << 30
// blockedFW computes APSP in place on a flat n*n matrix using tile side b.
// n must be a multiple of b (pad with INF rows/cols otherwise).
func blockedFW(dist []int, n, b int) {
at := func(i, j int) int { return i*n + j }
// relaxTile relaxes destination tile rooted at (di,dj) using pivots
// drawn from k in [kk, kk+b). Reads tile (di,kk..) and (kk.., dj).
relaxTile := func(di, dj, kk int) {
for k := kk; k < kk+b; k++ {
for i := di; i < di+b; i++ {
dik := dist[at(i, k)]
if dik >= INF {
continue
}
krow := k * n
irow := i * n
for j := dj; j < dj+b; j++ {
if v := dik + dist[krow+j]; v < dist[irow+j] {
dist[irow+j] = v
}
}
}
}
}
blocks := n / b
for bk := 0; bk < blocks; bk++ {
kk := bk * b
// Phase 1: the pivot diagonal tile depends only on itself.
relaxTile(kk, kk, kk)
// Phase 2: pivot row and pivot column tiles depend on the pivot tile.
for bj := 0; bj < blocks; bj++ {
if bj != bk {
relaxTile(kk, bj*b, kk) // pivot row block
relaxTile(bj*b, kk, kk) // pivot column block
}
}
// Phase 3: every remaining tile depends on its row & column pivots.
for bi := 0; bi < blocks; bi++ {
if bi == bk {
continue
}
for bj := 0; bj < blocks; bj++ {
if bj != bk {
relaxTile(bi*b, bj*b, kk)
}
}
}
}
}
func main() {
n, b := 4, 2
d := []int{
0, 3, INF, 7,
8, 0, 2, INF,
5, INF, 0, 1,
2, INF, INF, 0,
}
blockedFW(d, n, b)
fmt.Println(d[1*n+0]) // 5 (1->2->3->0)
}
The three-phase ordering encodes the layer dependency made explicit: Phase 2 may begin only after Phase 1, and Phase 3 only after Phase 2, but all tiles within a phase are independent and parallelizable.
12.2 Java — path reconstruction via the next-matrix¶
To recover the actual path (not just its cost), keep a next[i][j] matrix: the first hop on a shortest i→j path. On each improving relaxation, inherit next[i][j] = next[i][k].
import java.util.*;
public final class FloydWarshallPaths {
static final int INF = 1 << 30;
// dist and next are n*n; next[i][j] = first vertex after i on a shortest i->j path, or -1.
static void run(int[] dist, int[] next, int n) {
for (int i = 0; i < n; i++) {
for (int j = 0; j < n; j++) {
next[i * n + j] = (dist[i * n + j] < INF && i != j) ? j
: (i == j ? i : -1);
}
}
for (int k = 0; k < n; k++) {
int krow = k * n;
for (int i = 0; i < n; i++) {
int dik = dist[i * n + k];
if (dik >= INF) continue;
int irow = i * n;
for (int j = 0; j < n; j++) {
int v = dik + dist[krow + j];
if (v < dist[irow + j]) {
dist[irow + j] = v;
next[irow + j] = next[irow + k]; // route i->...->k->...->j
}
}
}
}
}
static List<Integer> path(int[] next, int n, int u, int v) {
if (next[u * n + v] == -1) return Collections.emptyList();
List<Integer> p = new ArrayList<>();
p.add(u);
while (u != v) {
u = next[u * n + v];
p.add(u);
}
return p;
}
public static void main(String[] args) {
int n = 4;
int[] dist = {
0, 3, INF, 7,
8, 0, 2, INF,
5, INF, 0, 1,
2, INF, INF, 0
};
int[] next = new int[n * n];
run(dist, next, n);
System.out.println(path(next, n, 1, 0)); // [1, 2, 3, 0]
}
}
Reconstruction is O(path length) per query and the next matrix doubles memory but keeps both build and query within the Θ(n²) space class (Table in §8). After a negative cycle the next chain can loop forever, so reconstruction must be gated on D[i][i] >= 0 (Proposition 3.4).
12.3 Python — bitset transitive closure¶
For pure reachability, pack each row of the Boolean matrix into a Python integer (an arbitrary-width bitset). The inner j loop collapses to one bitwise OR over the whole word, giving the Θ(n³ / w) constant-factor win from §8.
def transitive_closure_bitset(adj):
"""adj: list of n integers; bit j of adj[i] set iff edge i->j exists.
Returns reachability rows (i can reach itself by convention)."""
n = len(adj)
reach = [adj[i] | (1 << i) for i in range(n)] # include self
for k in range(n):
bit_k = 1 << k
row_k = reach[k]
for i in range(n):
# If i reaches k, then i reaches everything k reaches.
if reach[i] & bit_k:
reach[i] |= row_k
return reach
def reaches(reach, i, j):
return (reach[i] >> j) & 1 == 1
if __name__ == "__main__":
# 0 -> 1 -> 2 -> 3 (a chain)
adj = [
0b0010, # 0 -> 1
0b0100, # 1 -> 2
0b1000, # 2 -> 3
0b0000, # 3 -> (none)
]
r = transitive_closure_bitset(adj)
print(reaches(r, 0, 3)) # True (0 reaches 3 via the chain)
print(reaches(r, 3, 0)) # False
The whole for j dimension vanishes into one machine-word OR per (i, k) pair: the dominant cost becomes n² word operations, each covering w columns, so the effective work is Θ(n³ / w) with w the word width (64 on a typical machine, or arbitrary for Python big-ints amortized at n/64 words). This is the standard way to push transitive closure on n in the low thousands into the millisecond range, and is the Boolean-semiring specialization of Kleene's algorithm from §5.
13. Summary¶
- Definition. Floyd-Warshall computes the min-plus closure
W*of the weight matrix;δ(i,j) = (W*)[i][j]. - Correctness. Induction over the intermediate set
S_k = {0,…,k-1}: the recurrenced_{k+1}(i,j) = min(d_k(i,j), d_k(i,k)+d_k(k,j))is exact because, absent negative cycles, a shortest walk useskat most once and splits into twoS_k-restricted sub-walks. - In-place. Overwriting one matrix is correct because the pivot row/column
kare invariant during passk(D[k][k]=0), and entries decrease monotonically towardd_{k+1}. This forceskto be the outermost loop. - Complexity.
Θ(V³)time in all cases,Θ(V²)space; no average-case speedup. - Semirings. It is Kleene's closure algorithm; swapping
(min,+)for(OR,AND),(max,min),(+,×), etc. yields closure, bottleneck, counting, and reliability variants. - Cache. Blocked / cache-oblivious variants cut I/O to
Θ(n³/(B√M))without changing theΘ(n³)work. - Negative cycles. Detected by
D[i][i] < 0; results are-∞on affected pairs. - Open problems. APSP is conjectured to need
n^{3-o(1)}; Williams'n³/2^{Ω(√log n)}is the best known general bound, and tropical (min-plus) matrix product has no known subcubic algorithm — the core obstacle.
Roy (1959), Warshall (1962, transitive closure), and Floyd (1962, shortest paths) introduced the method; CLRS Chapter 25 is the canonical reference; Kleene's closure framework (Aho–Hopcroft–Ullman) explains the semiring generality; Williams (2014) and fine-grained complexity frame the open questions. The algorithm is over sixty years old, fits in three lines, and remains optimal to within sub-polynomial factors for its problem — a rare standing in algorithm design.