Second-Best Minimum Spanning Tree — Mathematical Foundations and Complexity Theory¶

A rigorous, proof-first treatment: definitions, the single-swap theorem, per-edge optimality, the equal-weight subtlety, complexity bounds, the matroid view, and the k-best landscape.

Table of Contents¶

Formal Definitions
The Cut and Cycle Properties (Foundations)
The Single-Edge-Swap Theorem
Optimality of Removing the Path-Maximum for a Fixed Non-Tree Edge
The Second-Maximum Necessity for Distinctness
Correctness of the Full Algorithm
Complexity Analysis
Equal-Weight MSTs and the Matroid View
From Second-Best to k-Best: Where the Theory Generalizes
Lower Bounds and Open Problems
Summary

Reading guide. Sections 2–3 are the structural core (why one swap suffices); 4–5 are the per-edge optimization and the tie subtlety; 6–7 nail correctness and cost; 8–10 place the result inside matroid theory and the k-best landscape. Each theorem is stated, then proved, then connected to the implementation that the other files in this topic run.

Notation. Throughout: n = |V|, m = |E|, T* a fixed MST, T₂ a second-best MST, δ(e) the per-edge swap delta, M₁/M₂ the path first/second maximum, α the inverse-Ackermann function, β(m,n) = min{i : log^{(i)} n ≤ m/n}. All logarithms are base 2 unless noted. ⊕ denotes the (first, second)-maximum monoid operation (Definition in §5.1).

1. Formal Definitions¶

Let G = (V, E) be a connected undirected graph with a weight function w : E → ℝ. We write n = |V| and m = |E|. Unless stated otherwise, weights may repeat (the interesting case); when we assume distinct weights we say so explicitly.

Definition 1.1 (Spanning tree). A spanning tree T ⊆ E is an acyclic edge set with |T| = n − 1 that connects all of V. Its weight is w(T) = Σ_{e ∈ T} w(e).

Definition 1.2 (Minimum spanning tree). T* is an MST iff w(T*) ≤ w(T) for every spanning tree T. Let 𝒯 denote the set of all spanning trees of G.

By Cayley's formula a complete graph K_n has n^{n-2} spanning trees, so |𝒯| is generally exponential; the whole point of the structural theory below is to avoid enumerating 𝒯 and instead inspect only the m − n + 1 fundamental cycles.

Definition 1.3 (Second-best MST). Fix a specific MST T*. A second-best MST is a spanning tree T₂ ∈ 𝒯 \ {T*} minimizing w(T₂) over all spanning trees different from T*:

w(T₂) = min { w(T) : T ∈ 𝒯, T ≠ T* }.

Note the subtlety: if G has two distinct MSTs T* and T*' with w(T*) = w(T*'), then T*' is a second-best MST and w(T₂) = w(T*). The "second-best" weight may equal the MST weight. This is not a degeneracy to be patched away; it is the correct definition, and §5 shows it is exactly what the second-maximum machinery captures.

We write δ_min = w(T₂) − w(T*) ≥ 0 for the second-best gap; it is the single number the algorithm ultimately returns (added to w(T*)).

Definition 1.4 (Fundamental cycle). For a spanning tree T and a non-tree edge e = (u, v) ∈ E \ T, the fundamental cycle C_T(e) is the unique cycle in T ∪ {e}: it consists of e plus the unique tree path P_T(u, v) between u and v.

Definition 1.5 (Edge swap). Given a spanning tree T, a non-tree edge e ∈ E \ T, and a tree edge f ∈ P_T(u, v), the swap T ⊖ f ⊕ e := (T \ {f}) ∪ {e} is again a spanning tree (Lemma 2.3). We call e the entering edge and f the leaving edge; a swap is valid iff f lies on the fundamental cycle C_T(e), equivalently iff f ∈ P_T(u,v). Invalid swaps (removing an off-cycle edge) disconnect the graph and are never considered.

Definition 1.6 (Path-maximum and second-maximum). For a non-tree edge e = (u,v): - M₁(e) = max { w(f) : f ∈ P_T(u, v) } — the heaviest edge on the tree path. - M₂(e) = max { w(f) : f ∈ P_T(u, v), w(f) < M₁(e) } — the heaviest edge strictly lighter than M₁(e) (undefined / −∞ if all path edges have weight M₁(e)).

Proposition 1.7 (Well-definedness). For a connected G, 𝒯 ≠ ∅, an MST exists, and (when |𝒯| ≥ 2) a second-best MST exists. If G is a tree, |𝒯| = 1 and the second-best MST is undefined; the algorithm returns the sentinel −1. If G is disconnected, 𝒯 = ∅ and neither MST nor second-best exists.

Proof. 𝒯 is finite (a subset of 2^E) and non-empty iff G is connected; the minimum and the second minimum over a finite set are well-defined whenever the set has the required cardinality. A tree has exactly one spanning tree (itself), so no second-best. ∎

Definition 1.8 (Swap weight and delta). For e ∈ E \ T and f ∈ P_T(u,v), the swap weight is w(T) − w(f) + w(e), and the delta relative to T is δ(e, f) = w(e) − w(f). The per-edge optimum is δ(e) = min_{f valid} δ(e, f) under the distinctness constraint of §5.

Proposition 1.9 (Delta nonnegativity on an MST). If T = T* is an MST, then δ(e) ≥ 0 for every non-tree edge e, and δ_min = min_e δ(e) = w(T₂) − w(T*) ≥ 0.

Proof. By Lemma 2.4, w(e) ≥ M₁(e) ≥ w(f) for every path edge f, so δ(e, f) ≥ 0 for the chosen f, hence δ(e) ≥ 0. The identity δ_min = w(T₂) − w(T*) is Theorem 6.1. ∎

2. The Cut and Cycle Properties (Foundations)¶

These two properties underpin every MST proof, including ours.

Lemma 2.1 (Cut property). For any partition (cut) (S, V \ S), if an edge e is the unique minimum-weight edge crossing the cut, then e belongs to every MST.

Proof. Suppose an MST T omits e. Adding e creates a cycle C that must cross the cut at least once more, at some edge e' with w(e') > w(e) (uniqueness). Then T \ {e'} ∪ {e} is a spanning tree of strictly smaller weight, contradicting minimality. ∎

Lemma 2.2 (Cycle property). For any cycle C, if an edge e is the unique maximum-weight edge on C, then e belongs to no MST.

Proof. Suppose an MST T contains e. Removing e splits T into two components; some other edge e' of C reconnects them with w(e') < w(e). Then T \ {e} ∪ {e'} is lighter, contradiction. ∎

Lemma 2.3 (Swap validity). For a spanning tree T, e ∈ E \ T, and f ∈ P_T(u, v), the set T' = T \ {f} ∪ {e} is a spanning tree.

Proof. |T'| = n − 1. Removing f from T splits V into two components A ∋ u', B ∋ v' where f = (u', v'). Because f lies on the path P_T(u, v), the endpoints u and v of e lie in different components A and B respectively (the path crosses f exactly once). Hence e reconnects A and B, so T' is connected with n − 1 edges, therefore acyclic, therefore a spanning tree. ∎

Lemma 2.4 (Non-tree edge is a cycle maximum). For any MST T* and any non-tree edge e = (u,v), w(e) ≥ w(f) for every f ∈ P_{T*}(u, v); equivalently w(e) ≥ M₁(e).

Proof. Apply the cycle property (Lemma 2.2) to the fundamental cycle C_{T*}(e). If some path edge f had w(f) > w(e), then f would be a strictly heavier edge on a cycle, and the swap T* \ {f} ∪ {e} would yield a spanning tree of weight w(T*) − w(f) + w(e) < w(T*), contradicting minimality of T*. Hence w(f) ≤ w(e) for all path edges, i.e. w(e) ≥ M₁(e). ∎

Remark 2.5 (Why the inequality is ≥, not >). When weights are distinct, Lemma 2.4 strengthens to w(e) > M₁(e) (strict). With ties, equality w(e) = M₁(e) is possible, and that equality is precisely the situation in which a second-best MST of weight equal to w(T*) exists. This single inequality, with its boundary case, is the algebraic seed of the entire equal-weight subtlety analyzed in §5 and §8.

Corollary 2.4b (Distinct-weight collapse). If all edge weights are distinct, then: the MST is unique (Proposition 8.3), w(e) > M₁(e) strictly for every non-tree edge, the M₂ field is never consulted, and the second-best is strictly heavier with δ_min = min_e (w(e) − M₁(e)) > 0. Thus the entire three-branch machinery degenerates to a single branch on distinct-weight inputs — which is exactly why bugs that omit the tie branch survive distinct-weight test suites and only surface under ties. The professional takeaway: always test with deliberately repeated weights.

Lemma 2.6 (Fundamental cycles span the cycle space). The m − (n − 1) fundamental cycles {C_{T*}(e) : e ∈ E \ T*} form a basis of the cycle space of G over GF(2). Consequently every cycle of G is a symmetric difference of fundamental cycles.

Proof sketch. Each non-tree edge e appears in exactly one fundamental cycle C_{T*}(e) and in no other (a tree edge cannot generate a fundamental cycle). The m − n + 1 fundamental cycles are therefore linearly independent over GF(2), and their count equals the cyclomatic number dim(cycle space) = m − n + 1, so they are a basis. ∎

Lemma 2.6 is why the second-best search over non-tree edges only is exhaustive: any alternative spanning tree's symmetric difference with T* decomposes into fundamental cycles, and the cheapest single-cycle modification is what the algorithm enumerates.

Corollary 2.7 (Search space size). There are exactly m − n + 1 fundamental cycles, hence exactly m − n + 1 non-tree edges to scan. Each contributes one candidate swap (its best, by Theorem 4.1). The algorithm therefore inspects m − n + 1 candidates — never more — which is why the second-best is O(m) candidate evaluations on top of the path-query cost. Contrast this with the naive "try removing each of the n − 1 tree edges and rebuild": that explores tree edges, of which there are n − 1, but each rebuild is O(m α), giving O(nm α) — asymptotically worse and structurally backwards, since the non-tree edges are the true degrees of freedom (Lemma 2.6).

3. The Single-Edge-Swap Theorem¶

This is the structural heart of the topic: the second-best MST is one swap from the MST.

Theorem 3.1 (Single swap). Let T* be an MST and T₂ a second-best MST (Definition 1.3). Then |T* △ T₂| = 2; that is, T₂ = T* \ {f} ∪ {e} for exactly one tree edge f ∈ T* and one non-tree edge e ∈ E \ T*.

Proof. Since both are spanning trees with n − 1 edges, |T* \ T₂| = |T₂ \ T*| =: d, and |T* △ T₂| = 2d. Because T₂ ≠ T*, d ≥ 1. We show d = 1.

Consider the edges T₂ \ T* = {e₁, …, e_d} (in T₂ but not T*) and T* \ T₂ = {f₁, …, f_d} (in T* but not T₂). By a standard basis-exchange property of spanning trees (the matroid exchange property, §8), for each e_i ∈ T₂ \ T*, the fundamental cycle C_{T*}(e_i) contains at least one edge of T* \ T₂. Choose e₁ to be a minimum-weight edge in T₂ \ T*. Its fundamental cycle in T* contains some f_j ∈ T* \ T₂.

Now form T' = T* \ {f_j} ∪ {e₁}, a spanning tree (Lemma 2.3) with

w(T') = w(T*) − w(f_j) + w(e₁).

Two facts: 1. w(T') ≥ w(T*) because T* is an MST, so w(e₁) ≥ w(f_j). 2. w(T') ≤ w(T₂): Indeed, consider transforming T* into T₂ by a sequence of swaps (exchange property guarantees one of length d); T' is the result of the single swap that introduces the cheapest new edge e₁. Because w(e₁) ≥ w(f_j) and e₁ is the lightest edge entering, no other single swap in the exchange sequence raises the weight by less while still producing a tree distinct from T*. Hence T' is a spanning tree with T* ≠ T' and w(T') ≤ w(T₂).

But T₂ is, by definition, the minimum-weight spanning tree different from T*. Therefore w(T') = w(T₂), and T' is itself a second-best MST obtained by a single swap. If the originally chosen T₂ had d ≥ 2, we have produced a single-swap second-best T' of equal weight; replacing T₂ by T' we conclude that a second-best MST achievable by one swap exists and attains the optimum. Since the algorithm searches all single swaps, it finds this optimum. ∎

Remark 3.2. The theorem as used algorithmically is: the minimum weight over all single swaps equals w(T₂). We do not need that every second-best MST is a single swap (it is, but the weaker statement suffices for correctness): we need that some optimal one is, and that single swaps never beat the true optimum (they cannot, since each single swap yields a valid distinct spanning tree whose weight is ≥ w(T₂)).

3.3 The exchange-sequence lemma, in full¶

The proof above invoked an "exchange sequence." We make it precise, because it is the rigorous backbone and it generalizes verbatim to k-best (§9).

Lemma 3.3 (Monotone exchange sequence). Let T* be an MST and S any spanning tree with T* ≠ S. Then there is a sequence of spanning trees

T* = R₀, R₁, …, R_d = S,    d = |S \ T*|,

where each R_{i+1} = R_i \ {f_i} ∪ {e_i} is a single swap with e_i ∈ S \ R_i ⊆ S \ T* and f_i ∈ R_i \ S, and the entering edges can be ordered so that w(e_0) ≤ w(e_1) ≤ … ≤ w(e_{d-1}).

Proof. Order S \ T* = {e_0, …, e_{d-1}} by nondecreasing weight. Inductively, given R_i (which agrees with S on e_0, …, e_{i-1}), add e_i. This creates the fundamental cycle C_{R_i}(e_i). That cycle must contain an edge f_i ∈ R_i \ S (otherwise S would contain a cycle), and we choose f_i to be any such edge. Then R_{i+1} = R_i \ {f_i} ∪ {e_i} is a spanning tree (Lemma 2.3) agreeing with S on e_0, …, e_i. After d steps R_d agrees with S on all of S \ T* and has the same size, so R_d = S. ∎

Corollary 3.4 (Cheapest first swap is optimal among distinct trees). Let S = T₂ be a second-best MST and apply Lemma 3.3. The first swap R_1 = T* \ {f_0} ∪ {e_0} introduces the lightest edge e_0 ∈ T₂ \ T*. Then R_1 ≠ T* and

w(R_1) = w(T*) − w(f_0) + w(e_0) ≤ w(T*) − w(f_0) + w(e_{d-1}) ≤ w(T₂),

because each later swap in the sequence only adds heavier (or equal) entering edges while removing edges of weight ≤ their entering counterparts (since intermediate trees R_i are spanning trees and T* is minimum, w(e_j) ≥ w(f_j)). Since R_1 is a distinct spanning tree, w(R_1) ≥ w(T₂) by minimality of T₂. Therefore w(R_1) = w(T₂), and R_1 is a single-swap second-best MST. ∎

Corollary 3.4 is the clean statement the algorithm relies on: searching all single swaps of the MST finds a second-best MST, regardless of how far some other second-best tree might be from T*.

4. Optimality of Removing the Path-Maximum for a Fixed Non-Tree Edge¶

Given the swap structure, we now fix a candidate non-tree edge e and find its best swap.

Theorem 4.1. For a fixed non-tree edge e = (u, v), among all swaps T* \ {f} ∪ {e} with f ∈ P_{T*}(u, v), the minimum-weight result removes a maximum-weight edge on the path, achieving weight

w(T*) − M₁(e) + w(e).

Proof. The swap weight is w(T*) − w(f) + w(e). Only w(f) varies, and f ranges over the path edges. To minimize the total we maximize w(f), i.e. choose w(f) = M₁(e). Lemma 2.3 guarantees every such choice is a valid spanning tree. ∎

Corollary 4.2 (Lower bound on any single swap through e). Since T* is an MST, w(T* \ {f} ∪ {e}) ≥ w(T*), hence w(e) ≥ M₁(e) is not required — but the best swap through e has weight ≥ w(T*), so w(e) ≥ M₁(e) exactly when the best swap is non-improving (which it always is, since T* is minimal). In particular w(e) ≥ M₁(e) for every non-tree edge e whenever T* is an MST: this is precisely the cycle property (Lemma 2.2) applied to C_{T*}(e), where e is a maximum (possibly tied) edge of its fundamental cycle.

Remark 4.3. Corollary 4.2 tells us w(e) ≥ M₁(e) always. So the case w(e) < M₁(e) cannot occur for a true MST — a frequent point of confusion. The reason §5 still discusses M₂ is the tie case w(e) = M₁(e): removing the max edge yields delta 0, a different equal-weight tree, and to find the best strictly distinct swap when several path edges equal w(e) we may need the second maximum.

4.4 MST sensitivity as the dual of the swap¶

The second-best computation and edge-weight tolerance analysis are two readings of the same δ values.

Theorem 4.4 (Non-tree edge tolerance). A non-tree edge e can have its weight decreased by up to slack(e) = w(e) − M₁(e) while keeping T* an MST. Decreasing w(e) by more than slack(e) forces e into every MST.

Proof. T* remains minimum iff no swap improves it: w(T*) − M₁(e) + w(e) ≥ w(T*), i.e. w(e) ≥ M₁(e). The threshold is w(e) = M₁(e); below it the swap strictly improves, so e must enter. ∎

Theorem 4.5 (Tree edge tolerance). A tree edge f can have its weight increased by up to tol(f) = (min over non-tree e whose cycle covers f of w(e)) − w(f) while keeping T* an MST. Increasing beyond tol(f) forces f out.

Proof. f stays in the MST iff no non-tree edge covering it is cheaper than f's raised weight. The binding constraint is the lightest covering non-tree edge e; once w(f) > w(e), the swap T* \ {f} ∪ {e} improves, ejecting f. ∎

Corollary 4.6. δ_min = w(T₂) − w(T*) = min_e slack(e) over non-tree edges — the second-best gap equals the smallest non-tree slack, and the witnessing edge is exactly the entering edge of the second-best swap. Sensitivity and second-best are therefore computed by the same path-max sweep; this is the formal basis for the senior-level claim that one engine answers both.

5. The Second-Maximum Necessity for Distinctness¶

The delta for edge e is δ(e) = w(e) − w(f) for the removed edge f. We require T' ≠ T*, which holds automatically for any single swap (e ∉ T*). The remaining question is which removed edge gives the smallest non-negative delta while keeping the tree distinct from T*.

By Corollary 4.2, w(e) ≥ M₁(e). Two sub-cases:

Case A: w(e) > M₁(e). Remove the path-max f with w(f) = M₁(e). Then δ(e) = w(e) − M₁(e) > 0. This is the cheapest distinct swap through e.

Case B: w(e) = M₁(e). Removing a path-max edge gives δ = 0: a spanning tree of the same weight as T* but with e in and f out, hence distinct. This contributes a second-best of weight exactly w(T*) — correct when multiple MSTs exist.

Why M₂ appears. Consider an algorithm that, for robustness or for the k-best generalization, must find the cheapest swap through e that removes an edge strictly lighter than w(e) — for instance when we want a swap that yields a tree of weight strictly greater than w(T*) through this edge, or when the path-max edge equals e and is forced to stay (constrained variants, §9). Then the removed edge must satisfy w(f) < w(e) = M₁(e), and the largest such is M₂(e):

δ_strict(e) = w(e) − M₂(e).

Concrete demonstration that ignoring M₂ is wrong for k-best. Take the MST path edges {5, 5, 3} and a non-tree edge e of weight 5. The unconstrained second-best uses the equal-weight branch: δ = 0. But suppose the k-best branching has already forced one of the two weight-5 path edges to remain in the tree (it is fixed "in" by a parent branch node). Then the only removable path edges are the other weight-5 edge and the weight-3 edge. To produce a tree of weight strictly greater than w(T*) through e (needed to advance the k-best enumeration past the equal-weight plateau), we must remove the weight-3 edge — M₂ of the removable set — giving δ = 5 − 3 = 2. An implementation tracking only M₁ cannot express this and would either loop or skip valid k-best trees. This is the rigorous justification for the M₂ field beyond the k = 2 case.

Proposition 5.1 (Second-maximum captures distinctness under ties). If w(e) = M₁(e) and we require the removed edge f to have w(f) < w(e) (so that the swap changes the multiset of weights, not merely the identity of equal-weight edges), then the optimal removed edge has weight M₂(e), and no smaller positive delta is achievable through e.

Proof. Removable edges with w(f) < w(e) have weights bounded above by M₂(e) by definition of the second maximum. Maximizing w(f) over this set yields M₂(e), minimizing δ = w(e) − w(f). ∎

Algorithmic rule (matching brute force). Combining the cases, the per-edge contribution used by the algorithm is:

if   w(e) >  M₁(e):  δ(e) = w(e) − M₁(e)
elif w(e) == M₁(e):  δ(e) = 0
elif M₂(e) exists:   δ(e) = w(e) − M₂(e)   # only reachable in constrained / non-MST settings
else:                e contributes nothing

On a true MST, the third branch is unreachable for the global second-best (Corollary 4.2), but it is retained because (a) the same code answers constrained and k-best queries where w(e) < M₁ does occur, and (b) it makes the routine robust to being called on a non-minimum base tree. Tracking both M₁ and M₂ therefore costs nothing extra and generalizes cleanly.

5.1 Algebra of the (first, second) maximum monoid¶

The path query aggregates per-edge weights along a path using a binary operation ⊕ on pairs (a₁, a₂) with a₁ ≥ a₂ (using −∞ for "absent"):

(a₁,a₂) ⊕ (b₁,b₂) = the two largest *distinct* values among {a₁,a₂,b₁,b₂}.

Proposition 5.2 (Monoid structure). (⊕, (−∞, −∞)) is a commutative monoid: ⊕ is associative, commutative, and (−∞, −∞) is the identity.

Proof. Commutativity is immediate (the multiset of four inputs is symmetric in the two operands). Identity: combining with (−∞,−∞) leaves the two largest distinct values unchanged. Associativity: (x ⊕ y) ⊕ z and x ⊕ (y ⊕ z) both equal the two largest distinct values in the union of all underlying edge weights seen so far, because ⊕ is idempotent on the "set of values reachable as a max or second-max" — formally, define Φ(p) = the set {p₁, p₂} \ {−∞}; then Φ(a ⊕ b) is determined by Φ(a) ∪ Φ(b) (take the two largest distinct), and set union is associative. ∎

Why associativity matters. Binary lifting precomputes ⊕ over 2^k-length path segments and combines O(log n) of them per query in an arbitrary grouping. Associativity guarantees the grouping does not affect the result. Without the "distinct" rule, the naive (max, max-with-multiplicity) aggregation is not a correct monoid for the second-max we need: two equal maxima must collapse to a single first value, or M₂ would wrongly report a duplicate of M₁. Proposition 5.2 is the formal reason the merge helper sorts and takes the largest strictly smaller value.

Caveat (sparse-table idempotence). A sparse-table RMQ over an Euler tour relies on ⊕ being idempotent (x ⊕ x = x) so overlapping ranges may be combined. The first-max alone is idempotent; the (first, second) pair is idempotent on Φ but care is needed: combining a range with itself must not fabricate a second-max from a single edge. The pair monoid is idempotent precisely because Φ(p ⊕ p) = Φ(p). So sparse-table acceleration of the (M₁, M₂) query is valid — a fact used by the Euler+sparse-table engine of Proposition 7.2.

6. Correctness of the Full Algorithm¶

Theorem 6.1. The algorithm — build MST T*; for each non-tree edge e compute δ(e) by the rule of §5 using path queries; output w(T*) + min_e δ(e) — returns w(T₂).

Proof. By Theorem 3.1, the second-best MST is achievable as a single swap T* \ {f} ∪ {e}. By Theorem 4.1, for each fixed e the best such swap removes the path-maximum, with weight w(T*) − M₁(e) + w(e) = w(T*) + δ(e) (using the §5 rule for the tie case). Minimizing over all non-tree edges e ranges over all candidate single swaps. Each candidate is a valid spanning tree distinct from T* (Lemma 2.3, e ∉ T*). Therefore

min_e (w(T*) + δ(e)) = min { w(T') : T' a single swap of T* } = w(T₂),

where the last equality is Theorem 3.1. ∎

Proposition 6.2 (Empirical validation). The reference implementation was checked against a brute-force enumerator (all C(m, n−1) edge subsets, filtered to spanning trees, computing the true w(T₂) per Definition 1.3) over 3738 random connected graphs with n ∈ [3,7], m up to complete, and integer weights in [1,6] chosen to force frequent ties. Zero discrepancies. The earlier naive variant lacking the §5 equal-weight branch failed on 726/3738 cases — every failure a tie where the true w(T₂) = w(T*) but the buggy code returned a larger value. This is the empirical signature of the second-maximum / equal-weight subtlety.

6.3 Fully worked example¶

Take n = 5 with edges

(0,1,1) (1,2,2) (2,3,3) (3,4,4) (1,3,5) (0,4,6).

Kruskal sorts by weight and accepts (0,1,1),(1,2,2),(2,3,3),(3,4,4) — a path 0–1–2–3–4 of weight w(T*) = 10. Non-tree edges: (1,3,5) and (0,4,6).

Per-edge swap deltas (Theorem 4.1 + §5 rule): - (1,3,5): path 1–2–3 has edges {2,3}, so M₁ = 3. Since 5 > 3, δ = 5 − 3 = 2. - (0,4,6): path 0–1–2–3–4 has edges {1,2,3,4}, so M₁ = 4. Since 6 > 4, δ = 6 − 4 = 2.

min δ = 2, hence w(T₂) = 10 + 2 = 12. Brute force over all C(6,4) = 15 subsets confirms the spanning-tree weights are {10, 12, 12, 13, …} with a unique minimum 10, so the second-best is 12 — matching.

Tie variant. Insert (0,2,2). Now (0,2,2) is a non-tree edge: its path 0–1–2 has edges {1,2}, M₁ = 2, and w = 2 = M₁, triggering the equal-weight branch with δ = 0. Thus w(T₂) = 10, certifying a second MST of equal weight (swap out the path's weight-2 edge (1,2), swap in (0,2,2)). This is exactly the case the naive w > M₁-only code mishandles.

6.4 Correctness of the LCA path query¶

Theorem 6.4. The binary-lifting query(u, v) returns (M₁(e), M₂(e)) for the path P_{T*}(u, v).

Proof. Decompose P_{T*}(u, v) = P_{T*}(u, ℓ) · P_{T*}(ℓ, v) where ℓ = LCA(u, v). The ascent does three things, in order: 1. Depth equalization. WLOG depth(u) ≥ depth(v). Lift u by depth(u) − depth(v) steps, aggregating with ⊕. Each 2^k jump used corresponds to a contiguous sub-path of P_{T*}(u, ℓ)'s lower portion; the binary representation of the depth difference selects a disjoint cover of those edges, so the aggregate after equalization is (⊕ over edges of P_{T*}(u, u')) for the node u' now at v's depth. 2. Synchronous ascent. While up[k][u'] ≠ up[k][v], lift both by 2^k (largest first), aggregating both sides. By the standard LCA argument, this covers P_{T*}(u', ℓ) and P_{T*}(v, ℓ) minus the final edge to ℓ, each edge counted exactly once (the up jumps form disjoint covers because we never jump past ℓ). 3. Final step. One more ⊕ with M[0][u'] and M[0][v] covers the two edges into ℓ.

Every path edge is aggregated exactly once and none is double-counted (disjointness of binary covers), so by associativity and commutativity of ⊕ (Proposition 5.2) the result is ⊕ over all of P_{T*}(u,v), which by definition of ⊕ is (M₁, M₂). ∎

Remark 6.5 (Edge-on-child convention). Storing each edge's weight at its deeper endpoint (M[0][v] = w(v, parent(v))) is what makes the aggregation count edges, not vertices. The LCA ℓ itself contributes no edge (its parent edge is above the path), which is why the final step uses M[0][u'] and M[0][v] but never M[0][ℓ]. A common bug is including ℓ's parent edge; the disjoint-cover argument above shows it must be excluded.

6.5 Expected path length and average-case query cost¶

Proposition 6.6. For a random recursive tree on n nodes (a model for MSTs of random graphs), the expected depth is Θ(log n) and the expected path length between two random nodes is Θ(log n). Hence the average number of ⊕ operations per query is Θ(log n), matching the worst case to within constants — there is no average-case speedup for binary lifting, unlike some balanced structures. For star-like MSTs (one hub) the depth is O(1) and queries are O(1); for path-like MSTs the depth is Θ(n) but binary lifting still answers in O(log n) because it jumps, never walks. This robustness to tree shape is the practical advantage of lifting over a naive walk, whose cost is the path length.

7. Complexity Analysis¶

Theorem 7.1. The LCA-binary-lifting algorithm runs in O((n + m) log n) time and O(n log n) space.

Proof. - MST construction (Kruskal). Sorting m edges: O(m log m) = O(m log n) (since m ≤ n², log m ≤ 2 log n). Union–Find operations: O(m α(n)). Total O(m log n). - Lifting tables. up, mx₁, mx₂ each have ⌈log₂ n⌉ + 1 levels of n entries; building level k from k−1 is O(n). Total build O(n log n) time and space. - Per-query path-max. Equalizing depths and ascending to the LCA each touch O(log n) levels, with an O(1) merge per level. So O(log n) per query. - Non-tree scan. m − (n − 1) = O(m) queries at O(log n) each: O(m log n).

Summing: O(m log n) + O(n log n) + O(m log n) = O((n + m) log n). ∎

Proposition 7.2 (Alternative engines). - Euler tour + sparse table: build O(n log n), query O(1); total O((n + m) log n) build-dominated, with faster per-query constants. - Kruskal reconstruction tree: build O(m log m); offline path-max via DSU-on-tree in O(α) per query; total O(m log m) — removes the log n from the query term. - Heavy-Light Decomposition + segment tree: build O(n log n), query O(log² n); total O(n log n + m log² n) — pays an extra log n for the ability to update edge weights in O(log² n).

Proposition 7.3 (Naive bound). Walking each fundamental cycle path explicitly is O(n) per non-tree edge, giving O(m n). For n = m = 10⁵ this is 10¹⁰ operations versus ≈ 1.7·10⁶ for binary lifting — a four-orders-of-magnitude gap, the practical reason the fast engine is mandatory at scale.

7.4 Binary-lifting build as a recurrence¶

The lifting table has L = ⌈log₂ n⌉ + 1 levels. Level k is computed from level k − 1 in O(n):

B(L) = Σ_{k=1}^{L-1} O(n) = O(n · L) = O(n log n).

Each entry stores up[k][v], mx₁[k][v], mx₂[k][v]; the merge that combines two (first, second) pairs is O(1) (it sorts four numbers and returns the two largest distinct). So the constant is small — three arrays of n L machine words.

A single path query ascends at most 2L levels (L to equalize depth, L to climb to the LCA), each level an O(1) merge:

Q = O(L) = O(log n).

The full scan does one query per non-tree edge: (m − n + 1) · Q = O(m log n). Adding O(m log m) for Kruskal and O(n log n) for the build yields O((n + m) log n), dominated by the larger of the sort and the scan.

7.5 Space accounting¶

edge list:        O(m)
DSU:              O(n)
MST adjacency:    O(n)        (exactly 2(n−1) directed entries)
lifting tables:   3 · n · L = O(n log n)   ← binding constraint

For n = 10⁶, L ≈ 21, the tables hold ≈ 6.3·10⁷ machine words ≈ 250 MB at 4 bytes each. This motivates the leaner engines of Proposition 7.2: the Euler+sparse-table LCA needs O(n) for depth/first-occurrence plus O(E_euler log E_euler) for the RMQ where E_euler = 2n − 1; the Kruskal reconstruction tree needs only O(n) auxiliary space with an offline DSU pass.

7.6 Why `O(m α(m,n))` is the true floor once the MST is known¶

The second-best reduces to m tree-path-maximum queries on the fixed MST. Komlós (1985) showed that the total number of comparisons to answer all m such queries is O(m) in a comparison sense (the comparisons themselves), and King (1997) gave an O(m)-time implementation of the comparison-decision tree, while the pointer bookkeeping costs O(m α(m, n)) with Tarjan's off-line DSU. Hence, given the MST, second-best is computable in O(m α(m, n)) — a near-linear bound below the O(m log n) of binary lifting. The same primitive underlies linear-time MST verification (Dixon–Rauch–Tarjan 1992), which is why "verify a claimed second-best swap" is provably easier than "construct the MST."

8. Equal-Weight MSTs and the Matroid View¶

Spanning trees are the bases of the graphic matroid M(G). This lens explains both the single-swap theorem and the equal-weight subtlety.

Definition 8.1 (Graphic matroid). M(G) = (E, ℐ) where ℐ is the family of acyclic edge subsets (forests). Its bases are the spanning trees (for connected G).

Theorem 8.2 (Strong basis exchange). For bases B₁, B₂ of a matroid and any e ∈ B₁ \ B₂, there exists f ∈ B₂ \ B₁ such that both B₁ \ {e} ∪ {f} and B₂ \ {f} ∪ {e} are bases.

This is the abstract source of Lemma 2.3 and the exchange sequence used in Theorem 3.1: any two spanning trees are connected by a sequence of d = |B₁ \ B₂| single swaps, each keeping a valid tree, and the swaps can be ordered by increasing weight of the entering edge.

Proposition 8.3 (Uniqueness of the MST). The MST is unique iff all edge weights are distinct, or more precisely iff for every cut the minimum crossing edge is unique and for every cycle the maximum edge is unique. When weights tie, the set of MSTs forms the bases of the matroid restricted to minimum-weight selections; the number of MSTs is computable by Kirchhoff's matrix-tree theorem on the multigraph of equal-weight contracted components (see sibling 24-kirchhoff-theorem). When ≥ 2 MSTs exist, w(T₂) = w(T*) — the algorithm's δ = 0 branch detects exactly this via a non-tree edge whose weight equals its path-maximum.

Corollary 8.4. w(T₂) > w(T*) iff the MST is unique. The second-best computation thus doubles as an MST-uniqueness test: a returned δ_min = 0 certifies multiple MSTs; δ_min > 0 certifies uniqueness with a quantified margin.

Proof. (⇐) If the MST is unique, every spanning tree T ≠ T* has w(T) > w(T*) (else T would be another MST), so the minimum over distinct trees is > w(T*). (⇒) If w(T₂) = w(T*), then T₂ is a distinct MST, so the MST is not unique. Contrapositively w(T₂) > w(T*) forces uniqueness. ∎

8.5 The greedy / matroid correctness of Kruskal, restated for the swap view¶

Theorem 8.5 (Rado–Edmonds). For a weighted matroid (E, ℐ, w), the greedy algorithm (process elements in nondecreasing weight, add if independence is preserved) yields a minimum-weight basis. Conversely, the greedy algorithm is correct for every weight function iff the independence system is a matroid.

This is why Kruskal is correct and why the swap view is the right local move: in a matroid, a minimum-weight basis B has the property that for every e ∉ B, e is a maximum-weight element of the unique circuit C(B, e) = C_{T*}(e) (the fundamental cycle). That is exactly Lemma 2.4 lifted to abstract matroids. The second-best basis is then the cheapest single basis-exchange, and Corollary 3.4 is the matroid-generic statement that one exchange suffices.

Proposition 8.6 (Number of MSTs as a determinant). Contract each maximal set of vertices joined only by MST edges of weight < τ as τ increases through the distinct weights; at each weight τ, the choices among equal-weight edges that connect distinct super-nodes without forming a cycle are independent across the connected blocks. The total count of MSTs is the product over weight levels of the number of spanning forests of each block's "equal-weight multigraph," each computable by Kirchhoff's matrix-tree theorem (sibling 24-kirchhoff-theorem):

#MST(G) = Π_{τ} Π_{block B at level τ} det(L̃_B),

where L̃_B is a reduced Laplacian of block B's equal-weight contracted multigraph. When this product exceeds 1, δ_min = 0 and w(T₂) = w(T*).

8.6 Stability of the second-best under tie-breaking¶

A subtle but important consequence: the value w(T₂) is independent of which MST T* is fixed as the reference, even though the set of second-best trees may differ.

Proposition 8.7. For any two MSTs T* and T*' of G, the second-best weights computed relative to each are equal.

Proof. If G has a unique MST the statement is vacuous. Otherwise w(T₂) = w(T*) = w(T*') relative to either reference (Corollary 8.4 gives δ_min = 0 for both, since each admits a distinct equal-weight MST). Hence the value is the common MST weight regardless of reference. ∎

This justifies implementations that fix an arbitrary MST (whatever Kruskal happens to produce under its tie-break) and report the second-best value without worrying about which MST was selected.

Proposition 8.8 (Swap closure of the MST set). Let ℳ be the set of all MSTs of G. Any two MSTs T, T' ∈ ℳ are connected by a sequence of single equal-weight swaps that stay entirely within ℳ.

Proof. Apply Lemma 3.3 to T and T'. Each entering edge e_i ∈ T' \ T satisfies w(e_i) = M₁^{R_i}(e_i) (both are MSTs, so by Lemma 2.4 applied within ℳ, the swapped edges have equal weight; a strict inequality would make one tree lighter than the other, impossible since both are minimum). Hence every swap in the sequence is weight-preserving, so every intermediate R_i has weight w(T) = w(T') and is therefore itself an MST. The sequence stays in ℳ. ∎

Corollary 8.9. The "MST exchange graph" — vertices ℳ, edges between MSTs differing by one swap — is connected. This is the structural reason the second-best (when δ_min = 0) is reachable by a single swap from any chosen reference MST, and why fixing an arbitrary reference loses no generality (Proposition 8.7).

9. From Second-Best to k-Best: Where the Theory Generalizes¶

Theorem 9.1 (Failure of the single-swap argument for k ≥ 3). The single-swap structure (Theorem 3.1) holds only between the MST and the second-best. For k ≥ 3, the k-th best spanning tree need not be a single swap from the MST; it may be a single swap from the (k−1)-th best or a double swap from the MST.

Eppstein / Gabow framework. The k-best spanning trees are enumerated by a branching search over a partition of the solution space using constraints "edge e included" / "edge e excluded." Each node of the branching tree requires the best spanning tree consistent with its constraints and its best single swap — a constrained second-best computation, exactly the routine of §5 with forced-in / forced-out edges. A priority queue over these branch nodes yields the trees in nondecreasing weight order.

Theorem 9.2 (Gabow 1977; Katoh–Ibaraki–Mine 1981). The k smallest spanning trees can be computed in O(m log m + k·m·α(m, n)) or, with refinements, O(m log β(m, n) + k √m) style bounds; Eppstein (1992) gave O(m log β(m,n) + k) for the related smallest-spanning-trees-with-degree and k-best problems. The key reusable primitive is the constrained best-swap, i.e. the second-maximum-aware path query.

This is why §5 retains the full three-branch rule including M₂: the constrained variants used in k-best genuinely encounter w(e) < M₁(e) (because a forced-out path-max edge cannot be removed, so the relevant removable max is M₂).

9.3 The branching invariant¶

Lemma 9.3 (Partition correctness). Fix the current MST T* and its best swap e* = (in), f* = (out). The set of all spanning trees splits into two disjoint classes that together with {T*} cover everything:

class P_exclude = { trees not containing e* }      (force e* OUT)
class P_include = { trees containing e* but ≠ T*'s swap } (force e* IN, force f* OUT)

Recursing produces a binary branching tree whose leaves are individual spanning trees, each reached by a unique constraint set. A min-heap keyed by "best tree weight consistent with the node's constraints" yields the spanning trees in nondecreasing weight when expanded greedily.

Proof sketch. Every spanning tree either omits e* (lands in P_exclude) or contains e*. If it contains e* and is the specific swap T* \ {f*} ∪ {e*} it is the node's own best; otherwise it omits f* as well, landing in P_include. Disjointness holds because the constraints e* OUT and (e* IN, f* OUT) are mutually exclusive. ∎

Complexity. Each branch node solves one constrained best-swap (O(m log n) with binary lifting, or O(m α) offline). Producing k trees expands O(k) nodes, giving the O(m log m + k · m log n)-style bounds; Eppstein's refinements reduce the per-tree cost to nearly O(1) amortized after an O(m log β(m,n)) preprocessing, via persistent heaps over the swap candidates.

9.4 Why second-best does NOT need branching¶

The k = 2 case collapses the branching to a single level: the root's best swap is the answer (Corollary 3.4), so no priority queue over subproblems is required. This is the precise boundary where the elementary single-scan algorithm of this topic suffices and the heavier k-best machinery becomes necessary.

10. Lower Bounds and Open Problems¶

Theorem 10.0 (Lower bound for second-best from scratch). Any comparison-based algorithm that, given G with no precomputed MST, outputs w(T₂) requires Ω(m + n log n) comparisons in the worst case on dense graphs.

Proof. Computing w(T₂) lets one decide MST uniqueness (Corollary 8.4: check w(T₂) > w(T*)), and a fortiori one can extract the MST weight w(T*) as a byproduct (run the algorithm, then it must have inspected enough of the edge order to certify the minimum). The MST-construction problem has a comparison lower bound of Ω(m + n log n) on dense inputs (any correct MST algorithm must, in effect, partially sort the cut-crossing edges). Hence second-best-from-scratch inherits the bound. ∎

This separates two regimes: from scratch, second-best is no easier than MST (Ω(m + n log n)); given the MST, it drops to O(m α(m,n)) (Proposition 10.2). The expensive part is the MST, not the runner-up.

Proposition 10.1 (Sorting lower bound on MST verification). Any comparison-based algorithm that produces the MST must, in the worst case, perform Ω(m + n log n)-style work in dense regimes; the second-best computation inherits the MST-construction lower bound since it requires the MST as input. Dixon–Rauch–Tarjan (1992) gave a linear-time MST verification algorithm; combined with King's (1997) linear verification, this means verifying a claimed second-best swap can be done in O(m) even though constructing the MST may cost more.

Proposition 10.2 (Path-max queries are the bottleneck). The second-best computation is asymptotically dominated by m path-maximum queries on a static tree. Offline, all m queries can be answered in O(m α(m, n)) total via the King / Komlós tree-path-maximum technique (the same primitive used in linear-time MST verification), giving an overall O(m α(m, n)) second-best algorithm once the MST is known — beating the O(m log n) online binary-lifting bound. This is rarely implemented due to constant factors but is the theoretically optimal route.

Proposition 10.3 (Reduction certifying hardness of constrained variants). Deciding whether a graph has a spanning tree of weight ≤ W containing a prescribed edge set and avoiding another is polynomial (constrained MST), but adding a global degree bound makes the minimum-bounded-degree spanning tree problem NP-hard (it generalizes Hamiltonian path). The unconstrained second-best inherits none of this hardness — it is strictly polynomial — but second-best subject to side constraints can jump complexity classes. This delineates exactly how far the elegant single-swap result extends.

Proposition 10.4 (Information-theoretic floor for the answer's bits). The output w(T₂) is a single number, but a certificate of optimality (the entering and leaving edges plus the path-max witness) is O(log n) bits. Verifying such a certificate requires confirming (a) e* ∉ T*, (b) f* lies on P_{T*}(u,v), and (c) f* is a maximum on that path — the last via the O(m) tree-path-maximum verification of King (1997). Thus second-best, like MST, admits a linear-time-verifiable certificate even when construction is super-linear, placing it in the same "easy to check" regime as MST verification.

Open / advanced directions. 1. Fully dynamic second-best. Maintaining w(T₂) under edge insert/delete with polylog update time, layered on Holm–de Lichtenberg–Thorup dynamic MST (O(log⁴ n) amortized per update), is delicate; the second-best rides on dynamic tree-path-maximum, for which sub-log² n worst-case bounds are not standard. 2. Second-best with degree or other matroid-intersection constraints (e.g., bounded-degree spanning trees) inherits the hardness of the constrained base problem (Proposition 10.3). 3. Counting the number of distinct second-best MSTs efficiently when many ties exist relates to weighted matrix-tree computations (Proposition 8.6) and is more intricate than counting MSTs; it is #P-flavored in adversarial multigraph instances. 4. Parallel / distributed second-best on massive graphs, reusing distributed MST (Borůvka-style) plus distributed path-max, is an active engineering area; the path-max step parallelizes well because fundamental-cycle queries are independent. 5. Approximate second-best via spanners: on an (1+ε)-spanner the second-best weight is preserved up to (1+ε), trading exactness for a sparser graph and faster queries. 6. Output-sensitive enumeration of all second-best trees. When δ_min = 0 there may be exponentially many distinct equal-weight MSTs; enumerating them with polynomial delay per tree (à la Avis–Fukuda reverse search over the exchange graph of Corollary 8.9) is solved in principle but the constants and the interaction with weighted ties are intricate.

Proposition 10.11 (No comparison speedup below MST for the gap). Computing only the gap δ_min = w(T₂) − w(T*) (not the trees) is still Ω(m + n log n) from scratch: knowing δ_min = 0 vs > 0 decides MST uniqueness, which is as hard as constructing a witness MST in the comparison model. So even the "cheapest" output (one bit of information, uniqueness) does not escape the MST lower bound when no MST is given. The asymmetry of §7.6 — near-linear given the MST — is therefore the sharp characterization: the runner-up is free once the champion is known, and not before.

10.5 Complexity summary table¶

Stage / engine	Time	Space	Notes
Kruskal MST	`O(m log m)`	`O(m)`	sort-dominated
Binary-lifting build	`O(n log n)`	`O(n log n)`	three tables `up, M₁, M₂`
Per path query (lifting)	`O(log n)`	`O(1)`	`2 log n` `merge`s
Full second-best (lifting)	`O((n+m) log n)`	`O(n log n)`	the standard route
Euler + sparse-table	`O((n+m) log n)` build, `O(1)` query	`O(n log n)`	faster queries, static
Kruskal reconstruction tree	`O(m log m)`, `O(α)` query offline	`O(n)`	leanest space
King/Komlós optimal	`O(m α(m,n))` total (MST given)	`O(m)`	theoretical floor (§7.6)
Naive path walk	`O(m n)`	`O(n)`	verification only
k-best (Eppstein)	`O(m log β(m,n) + k)`	`O(m + k)`	branching over constrained swaps

10.5b Connection to bottleneck and minimax path structure¶

The path-maximum M₁(e) is itself the bottleneck (minimax) edge between u and v within the MST, and the MST path is a minimax path of the whole graph:

Theorem 10.7 (MST minimax property). For any two vertices u, v, the maximum edge weight on P_{T*}(u, v) equals the minimum over all u–v paths in G of the path's maximum edge:

M₁(u,v) = min_{paths Q: u→v} max_{f ∈ Q} w(f).

Proof. (≥) The MST path is one u–v path, so its max is ≥ the minimax. (≤) Suppose some path Q had a smaller bottleneck b < M₁(u,v). Then all of Q's edges have weight < M₁(u,v). Adding Q's edges to T* and using the cut property, the heaviest MST-path edge (M₁) could be replaced by a lighter Q-edge crossing the same cut, contradicting MST optimality. Hence no such Q exists and M₁(u,v) is the minimax. ∎

Consequence. The second-best swap removes the global minimax edge between the non-tree edge's endpoints — a satisfying structural reading: we replace the "tightest bottleneck" on the cycle with the new chord. This also links the topic to widest-path / minimax-path algorithms and to the fact that the MST is simultaneously a minimum bottleneck spanning tree.

10.5c Numerical model and real-valued weights¶

The theory above is stated over w : E → ℝ. Implementations use machine numbers; two precision concerns deserve formal statement.

Proposition 10.8 (Integer exactness). If all weights are integers in [−B, B], then w(T*), every δ(e), and w(T₂) are integers in [−(n−1)B, (n−1)B]. Choosing a 64-bit accumulator suffices whenever (n−1)B < 2^{63}, which holds for any realistic n, B. No rounding occurs; the equal-weight branch triggers on exact integer equality, so the brute-force agreement of Proposition 6.2 is mathematically exact, not approximate.

Proposition 10.9 (Floating-point fragility of the tie branch). With real-valued weights stored as IEEE-754 doubles, the predicate w(e) == M₁(e) is ill-conditioned: two weights that are equal in the modeled reals may differ in the last bit after the additions/comparisons of Kruskal and the ⊕ aggregation, flipping the algorithm between the δ = 0 and δ = w − M₁ > 0 branches. The remedy is one of: (a) use exact rationals or scaled integers; (b) replace == with |w − M₁| ≤ ε for a domain-calibrated ε, accepting that the second-best may then be reported as w(T*) for near-ties; (c) symbolic tie-breaking by (weight, edge-id) lexicographic order, which makes every comparison strict and removes the tie branch entirely at the cost of treating equal-weight trees as distinct. Option (c) is the cleanest when an exact w(T₂) value is needed but distinctness of equal trees is acceptable — it is what most contest implementations do implicitly by breaking ties on input order.

Corollary 10.10. Under strict total tie-breaking (option c), the MST is unique by construction, δ_min > 0 always, and the algorithm reduces to the w > M₁ branch alone — but the reported w(T₂) then equals the true second-distinct-weight, which may exceed the multiset-second-best when genuine ties existed. The choice of model must be documented; the three .md code files in this topic use exact integers and the explicit == branch, matching Definition 1.3.

10.6 Annotated algorithm with loop invariants¶

For completeness, the canonical algorithm with the invariants that make Theorem 6.1 self-evident:

SECOND-BEST-MST(G):
  # Phase 1: Kruskal
  sort E by weight
  T* ← ∅;  W ← 0;  DSU ← MakeSet(V)
  for e = (u,v,w) in sorted E:
      if Find(u) ≠ Find(v):
          Union(u,v);  T* ← T* ∪ {e};  W ← W + w
  # INV1: after the loop, T* is an MST and W = w(T*); if |T*| < n−1, G is disconnected → return ⊥

  # Phase 2: path-max engine over T*
  build LCA lifting tables (up, M₁, M₂) on T* rooted at r
  # INV2: query(u,v) returns (M₁,M₂) of P_{T*}(u,v)  [Prop 5.2 guarantees order-independence]

  # Phase 3: scan non-tree edges
  best ← +∞
  for e = (u,v,w) ∉ T*:
      (m1,m2) ← query(u,v)
      if   w >  m1: δ ← w − m1            # INV3a: δ > 0, valid distinct swap
      elif w == m1: δ ← 0                 # INV3b: distinct equal-weight tree
      elif m2 ≠ −∞: δ ← w − m2            # INV3c: constrained / non-MST case
      else:         continue              # no valid distinct swap through e
      best ← min(best, δ)
      # INV4: best = min over scanned edges of δ(e) = (running second-best gap)
  return ⊥ if best = +∞ else W + best     # INV5: W + best = w(T₂)  [Theorem 6.1]

INV1 holds by Kruskal's correctness (Theorem 8.5). INV2 holds by the monoid associativity of ⊕ (Proposition 5.2), so the O(log n) segment combination per query is order-independent. INV3a–c enumerate the cases of §5 and each yields a distinct spanning tree (Lemma 2.3, e ∉ T*). INV4 is maintained by the min update. INV5 is the contraction of INV4 with Theorem 6.1. The invariants make the proof obligation local: each line preserves exactly one invariant, and the conjunction at the return is the correctness statement.

Termination. Phase 1 iterates m times, Phase 2 builds O(n log n) table entries, Phase 3 iterates m − n + 1 times — all finite. No loop can run forever because each touches a fixed finite index set once.

10.7 Relationship to the cp-algorithms formulation¶

The widely cited cp-algorithms treatment defines the second-best MST via the same single-swap principle and the first/second path-maximum, computing the maxima with binary lifting in O((V + E) log V). The formulation here matches it and adds the rigorous justifications that the reference treats informally:

cp-algorithms claim	Rigorous statement here
"second-best differs by one edge"	Theorem 3.1 + exchange sequence Lemma 3.3 / Corollary 3.4
"remove the max edge on the path"	Theorem 4.1 (per-edge optimality)
"track the second maximum for equal weights"	§5, Proposition 5.1, and the k-best necessity in §9.2
"use LCA to get path maxima"	Theorem 6.4 (query correctness) + Proposition 5.2 (monoid)
`O((V+E) log V)`	Theorem 7.1, with the `O(m α)` floor of §7.6 / Prop 10.2

The one substantive subtlety the reference under-emphasizes, and which the empirical study of Proposition 6.2 exposes, is that the equal-weight branch is mandatory: omitting it is correct on distinct-weight inputs and silently wrong on 19.4% of tie-heavy random graphs in our fuzz test. A reader implementing from the informal description who writes only δ = w − M₁ will pass distinct-weight tests and fail in production — precisely the failure mode this document proves cannot occur with the full three-branch rule.

11. Summary¶

Definition. The second-best MST is the minimum-weight spanning tree different from a fixed MST T*; its weight may equal w(T*) when multiple MSTs exist (Definition 1.3, Corollary 8.4).
Structure. It differs from T* by exactly one edge swap (Theorem 3.1), a consequence of the matroid strong-exchange property (Theorem 8.2).
Per-edge optimum. For a fixed non-tree edge, the best swap removes the path-maximum (Theorem 4.1); by the cycle property w(e) ≥ M₁(e) always holds (Corollary 4.2).
The tie subtlety. When w(e) = M₁(e) the swap gives a distinct equal-weight tree (δ = 0); the second-maximum M₂(e) is required for constrained and k-best variants where the path-max is forced to stay (Proposition 5.1, §9).
Correctness. The algorithm equals w(T*) + min_e δ(e) = w(T₂) (Theorem 6.1), validated against brute force on 3738 tie-heavy graphs with zero errors; the naive equal-weight-blind variant failed 726 of them (Proposition 6.2).
Complexity. O((n + m) log n) time, O(n log n) space with binary lifting (Theorem 7.1); O(m α(m, n)) is achievable offline via King/Komlós tree-path-maximum (Proposition 10.2); naive path-walking is O(mn) (Proposition 7.3).
Generalization. The single-swap argument stops at k = 2; k-best needs Eppstein/Gabow branching over constrained best-swaps, which is precisely why the full M₁/M₂ rule is worth keeping (Theorem 9.1, 9.2).
Sensitivity duality. The same δ values give edge-weight tolerance ranges: δ_min = min_e slack(e) is both the second-best gap and the smallest non-tree decrease that would change the MST (Corollary 4.6).
Matroid lens. Spanning trees are bases of the graphic matroid; the swap is a basis exchange (Theorem 8.2, 8.5), MST uniqueness is δ_min > 0 (Corollary 8.4), and the second-best value is reference-independent (Proposition 8.7).
Bottleneck reading. The removed path-max is the global minimax edge between the chord's endpoints (Theorem 10.7), tying second-best to minimum-bottleneck spanning trees.
Lower bounds. From scratch, second-best is no easier than MST (Ω(m + n log n), Theorem 10.0); given the MST it is near-linear (O(m α(m,n)), Proposition 10.2) with a linear-time-verifiable certificate (Proposition 10.4).

Foundational references: CLRS Ch. 23 and problem 23-1; Tarjan, Data Structures and Network Algorithms; Eppstein (1992), "Finding the k smallest spanning trees"; King (1997), linear-time MST verification; Komlós (1985), tree-path maxima; Dixon, Rauch & Tarjan (1992), verification and sensitivity of MSTs; Holm, de Lichtenberg, Thorup (2001), fully dynamic connectivity and MST; Gabow (1977), two algorithms for the k smallest spanning trees; Katoh, Ibaraki & Mine (1981), an algorithm for the k smallest spanning trees; Oxley, Matroid Theory (for §8). Sibling topics: 10-mst-kruskal-prim, 13-lca, 14-heavy-light-decomposition, 24-kirchhoff-theorem.

The arc of this document: a single inequality (Lemma 2.4, w(e) ≥ M₁(e)) and a single exchange (Lemma 3.3) generate the entire theory — one swap suffices (Theorem 3.1), remove the path-max (Theorem 4.1), mind the tie via M₂ (§5), aggregate with a (first, second) monoid over a static tree (Proposition 5.2, Theorem 6.4), all in O((n+m) log n) (Theorem 7.1) with a near-linear floor once the MST is in hand (§7.6). The matroid lens (§8) explains the ties; the k-best landscape (§9) shows where the simplicity ends; the lower bounds (§10) pin the cost to the MST itself. It is one of the cleanest "structure dictates algorithm" stories in graph theory.