Tarjan's SCC — Middle Level¶
Focus: Why the lowlink test is correct, how Tarjan compares with Kosaraju and Gabow, and how SCCs compose into the condensation DAG that powers 2-SAT, topological processing, and dependency analysis.
Table of Contents¶
- Introduction
- Deeper Concepts
- Worked Trace — disc/low on a DFS Tree
- Comparison with Alternatives
- Advanced Patterns
- Graph and Tree Applications
- Algorithmic Integration
- Code Examples
- Iterative Tarjan + Condensation (Go/Java/Python)
- Error Handling
- Performance Analysis
- Best Practices
- Visual Animation
- Summary
Introduction¶
At junior level Tarjan's algorithm is "track disc, low, a stack, and pop when low[u] == disc[u]." At middle level you start asking why it works and what to do with the answer:
- Why does
low[v] == disc[v]precisely identify an SCC root, and never a false one? - Why must back/cross edges use
disc[v]and notlow[v]? - When is Kosaraju's two-pass approach preferable to Tarjan, and when is Gabow's stack-pair variant the cleanest?
- How do I turn raw SCCs into the condensation DAG so I can topologically process them?
- How does this become a 2-SAT solver, a deadlock detector, or a build-cycle analyzer?
These are the questions that separate "I copied a Tarjan template" from "I can adapt it."
Deeper Concepts¶
Lowlink semantics — the precise definition¶
For a min-heap we had one invariant; for Tarjan the key quantity is the lowlink. Define, for each vertex v:
low[v] = min over:
disc[v]
disc[w] for every w that is on the DFS stack and reachable from v
using tree edges of v's subtree plus exactly ONE non-tree
edge into an on-stack vertex
Intuitively, low[v] is the smallest discovery index of any vertex that v's subtree can "reach back to" while staying inside the currently open part of the DFS. The single back/cross edge is the one that closes a cycle.
Why low[u] == disc[u] marks an SCC root¶
Claim: when DFS finishes u, low[u] == disc[u] iff u is the first-discovered vertex of its SCC (its root), and the vertices still on the stack above u are exactly the rest of that SCC.
Sketch of the reasoning:
- Every vertex
winu's SCC is reachable fromu, and since they are mutually reachable,uis reachable fromw. So during DFS fromu, all of them are discovered insideu's subtree and pushed afteru. They remain on the stack until the SCC closes. - If
uwere not the root, there would be an ancestorain the same SCC withdisc[a] < disc[u]. Mutual reachability gives a path fromu's subtree back toavia a back edge to an on-stack vertex, which would drivelow[u]down todisc[a] < disc[u]. Solow[u] < disc[u]. - Conversely, if
uis the root, nothing in its subtree can reach a strictly-older on-stack vertex (that would putuin an older SCC), solow[u]cannot drop belowdisc[u]; it stays equal.
Hence low[u] == disc[u] exactly characterizes roots. Popping everything above u (inclusive) yields the component. A full invariant-style proof is in the Professional file.
Why back/cross edges use disc[v], not low[v]¶
This is the single most common Tarjan bug. Suppose u → v is an edge and v is on the stack (older). We want to record that u can reach back to v. The correct contribution is disc[v] — the actual discovery time of v.
If instead you used low[v], you would be claiming that u can reach back to whatever v can reach back to. But v's lowlink may reflect a vertex in a different, deeper part of the DFS that u cannot independently reach without going through v. Worse, low[v] can still be changing. Using it can merge two genuinely-separate SCCs into one. The discovery index disc[v] is a stable, conservative fact: "v is reachable from u and v is older." That is exactly what lowlink needs.
(For tree edges it is the opposite: you recurse first, low[v] is finalized for v's subtree, and inheriting min(low[u], low[v]) is both correct and necessary.)
The on-stack predicate¶
A vertex leaves the stack only when its whole SCC is popped. So "on the stack" means "discovered but its SCC is not yet closed." An edge into an already-popped vertex points into a finished SCC and must be ignored — using it would corrupt lowlinks. This is why we keep the explicit onStack[] flag instead of merely "visited."
Three edge kinds, three actions¶
Every directed edge u → v examined during DFS falls into exactly one of three buckets. Memorize this table — it is the algorithm:
| Edge kind | Test at examination time | Action on low[u] | Why |
|---|---|---|---|
| Tree edge | v unvisited (disc[v] unset) | recurse, then low[u] = min(low[u], low[v]) | v's subtree is finished; its lowlink is final and fully belongs to u's subtree |
| Back / cross to open SCC | v visited and onStack[v] | low[u] = min(low[u], disc[v]) | v is older and still open → u can close a cycle through v; record the stable fact disc[v] |
| Edge to closed SCC | v visited and not onStack[v] | do nothing | v's SCC already emitted; it is a strict predecessor in the condensation, irrelevant to u's lowlink |
The third row is as important as the second: forgetting the onStack[v] guard and updating low[u] for every visited v pulls lowlinks down across already-closed components and silently merges SCCs.
low is not "the smallest reachable disc"¶
A frequent misconception: "low[v] is the minimum disc of anything reachable from v." That is false — that quantity would always be disc[root-of-whole-graph] for a connected graph, which is useless. The lowlink restricts to one non-tree edge and to vertices inside v's own subtree plus one hop back onto the open stack. That bounded definition is what makes low[u] == disc[u] a local test the algorithm can check the instant u finishes, with no global recomputation.
Worked Trace — disc/low on a DFS Tree¶
Take this graph (the same one used in the code section):
Adjacency: 0:[1] 1:[2] 2:[0,3] 3:[4] 4:[5] 5:[3]. DFS starts at vertex 0, counter t begins at 0.
ASCII DFS tree (solid │ = tree edge, dashed ‑‑> = non-tree edge to an on-stack vertex):
0 disc=0 low=0
│
1 disc=1 low=0
│
2 disc=2 low=0
╱ ╲
(‑‑>0) 3 tree edge 2→3
│ (2‑‑>0 is a back edge to on-stack 0: low[2]=min(2,disc[0]=0)=0)
4 disc=4 low=3
│
5 disc=5 low=3
╲
‑‑>3 (5‑‑>3 back edge to on-stack 3: low[5]=min(5,disc[3]=3)=3)
Step-by-step (push order, then unwind):
| Step | Event | disc | low | Stack (bottom→top) | onStack |
|---|---|---|---|---|---|
| 1 | visit 0 | d0=0 | l0=0 | [0] | {0} |
| 2 | visit 1 | d1=1 | l1=1 | [0,1] | {0,1} |
| 3 | visit 2 | d2=2 | l2=2 | [0,1,2] | {0,1,2} |
| 4 | edge 2→0, 0 on stack | — | l2=min(2,d0=0)=0 | [0,1,2] | — |
| 5 | edge 2→3, 3 unvisited → recurse | — | — | — | — |
| 6 | visit 3 | d3=3 | l3=3 | [0,1,2,3] | +3 |
| 7 | visit 4 | d4=4 | l4=4 | [0,1,2,3,4] | +4 |
| 8 | visit 5 | d5=5 | l5=5 | [0,1,2,3,4,5] | +5 |
| 9 | edge 5→3, 3 on stack | — | l5=min(5,d3=3)=3 | … | — |
| 10 | 5 finishes: l5=3 ≠ d5=5 → not a root | — | — | … | — |
| 11 | back in 4: l4=min(4,l5=3)=3; 4 finishes l4=3≠d4=4 → not root | — | l4=3 | … | — |
| 12 | back in 3: l3=min(3,l4=3)=3; l3=3 == d3=3 → ROOT | — | l3=3 | pop 5,4,3 → SCC {3,4,5} | −5,−4,−3 |
| 13 | back in 2: l2=min(0,?) already 0; 2 finishes l2=0≠d2=2 → not root | — | l2=0 | [0,1,2] | — |
| 14 | back in 1: l1=min(1,l2=0)=0; not root | — | l1=0 | [0,1,2] | — |
| 15 | back in 0: l0=min(0,l1=0)=0; l0=0 == d0=0 → ROOT | — | l0=0 | pop 2,1,0 → SCC {0,1,2} | empty |
Output order: {3,4,5} then {0,1,2}. The condensation edge runs {0,1,2} → {3,4,5} (because of 2→3), so the algorithm emitted the sink-side SCC first — that is the reverse topological order property in action. Notice low[3] stayed at 3 (never reached an older on-stack vertex), which is exactly why it was correctly flagged as a root, while low[2] dropped to 0 because of the back edge 2→0, correctly demoting it.
Comparison with Alternatives¶
| Attribute | Tarjan (1972) | Kosaraju (1978) | Gabow / Cheriyan–Mehlhorn (path-based) |
|---|---|---|---|
| DFS passes | 1 | 2 | 1 |
| Transpose graph needed | No | Yes | No |
| Extra arrays | disc, low, onStack | visited, finish-order stack | two stacks (vertices + boundaries), disc |
| Lowlink bookkeeping | Yes (low[]) | None | None (uses a second stack instead) |
| Time | O(V + E) | O(V + E) | O(V + E) |
| Space | O(V) | O(V + E) (transpose) | O(V) |
| Conceptual difficulty | Medium (lowlink rules) | Low (just two DFS) | Medium |
| Output order | reverse topo of condensation | topo-ish via finish stack | reverse topo |
| Practical speed | Fast (one pass) | Slower (builds transpose) | Fast, fewer comparisons |
Choose Tarjan when: you want one pass, no transpose, minimal memory, and you can get the lowlink rules right.
Choose Kosaraju when: clarity matters more than constants, you already have the transpose, or you are teaching/learning. Two simple DFS passes are hard to get wrong.
Choose Gabow's path-based algorithm when: you want a single pass like Tarjan but find the lowlink arithmetic error-prone. Gabow replaces low[] with a second stack that stores potential SCC boundaries; when a back edge to an on-stack vertex is found, it pops the boundary stack down to that vertex's position. Same asymptotics, arguably simpler to reason about, slightly fewer comparisons.
Gabow in one paragraph¶
Gabow keeps the same vertex stack S as Tarjan plus a boundary stack B. On visiting v, push v onto both. On edge v → w: if w is unvisited, recurse; if w is on S, pop B while disc[B.top] > disc[w] (collapse the tentative boundaries past w). When v finishes and B.top == v, pop S down to v to form the SCC and pop v off B. No low[] array at all — the boundary stack is the lowlink, represented structurally.
Advanced Patterns¶
Pattern: Build the condensation DAG¶
Once every vertex has an SCC id, contract each SCC to a node and add an edge comp[u] → comp[v] whenever u → v crosses components. Deduplicate to keep it a simple DAG.
def condensation(adj, comp_id, num_comps):
dag = [set() for _ in range(num_comps)]
for u in range(len(adj)):
for v in adj[u]:
if comp_id[u] != comp_id[v]:
dag[comp_id[u]].add(comp_id[v])
return [sorted(s) for s in dag]
The condensation is always acyclic — if it had a cycle, those SCCs would be mutually reachable and hence one larger SCC, a contradiction.
Pattern: Topological order of SCCs for free¶
Tarjan emits SCCs in reverse topological order of the condensation. So if you number components 0, 1, 2, ... in the order Tarjan produces them, every condensation edge goes from a higher id to a lower id. To process the DAG in forward topological order, iterate components in decreasing Tarjan-output order (or reverse the emitted list). No separate topological sort needed.
Pattern: 2-SAT preview¶
A 2-SAT formula is a conjunction of clauses (a ∨ b). Build an implication graph over 2n literals: clause (a ∨ b) adds edges ¬a → b and ¬b → a. The formula is satisfiable iff no variable x has x and ¬x in the same SCC. If satisfiable, assign each variable true when its positive literal's SCC appears later in topological order than its negation's SCC. Tarjan gives both the SCCs and (via output order) the topological order in one pass. The dedicated 2-SAT topic is a forward-reference sibling.
Pattern: Iterative Tarjan (overflow-proof)¶
Recursion depth can reach O(V). On a path graph of 10⁶ vertices, recursive Tarjan overflows. The iterative version simulates the call stack explicitly, tracking which child index each frame is on. (Full code in the Senior file.) Keep the same disc/low/onStack semantics; only the control flow changes.
Graph and Tree Applications¶
Deadlock detection¶
Model a wait-for graph: edge T1 → T2 means "transaction T1 waits for a lock held by T2." A deadlock is a cycle, which is exactly an SCC of size ≥ 2. Run Tarjan; any non-trivial SCC is a set of mutually-blocked transactions to abort.
Dependency-cycle analysis¶
Module/package import graphs, build target graphs, microservice call graphs — all directed. A circular dependency is an SCC with more than one node. Reporting the SCC tells the user the exact cycle to break.
Algorithmic Integration¶
Once you have the condensation DAG, many problems become DAG-DP:
- Reachability counting: number of SCCs reachable from each SCC.
- Longest chain of dependencies: longest path in the condensation (DAG longest path is linear).
- Minimum edges to make strongly connected: count source SCCs and sink SCCs in the condensation; the answer is
max(sources, sinks)(for a condensation with > 1 node).
SCC + condensation is a standard reduction: "directed graph with cycles" → "DAG", after which DAG algorithms apply unchanged.
Code Examples¶
Kosaraju's two-pass SCC + condensation¶
Go¶
package main
import "fmt"
func kosaraju(n int, adj [][]int) []int {
visited := make([]bool, n)
order := []int{} // vertices by finish time
var dfs1 func(u int)
dfs1 = func(u int) {
visited[u] = true
for _, v := range adj[u] {
if !visited[v] {
dfs1(v)
}
}
order = append(order, u) // push on finish
}
for u := 0; u < n; u++ {
if !visited[u] {
dfs1(u)
}
}
// transpose
radj := make([][]int, n)
for u := 0; u < n; u++ {
for _, v := range adj[u] {
radj[v] = append(radj[v], u)
}
}
compID := make([]int, n)
for i := range compID {
compID[i] = -1
}
var dfs2 func(u, c int)
dfs2 = func(u, c int) {
compID[u] = c
for _, v := range radj[u] {
if compID[v] == -1 {
dfs2(v, c)
}
}
}
c := 0
for i := n - 1; i >= 0; i-- { // reverse finish order
u := order[i]
if compID[u] == -1 {
dfs2(u, c)
c++
}
}
return compID
}
func condensation(n int, adj [][]int, compID []int, numComps int) [][]int {
seen := make(map[[2]int]bool)
dag := make([][]int, numComps)
for u := 0; u < n; u++ {
for _, v := range adj[u] {
cu, cv := compID[u], compID[v]
if cu != cv && !seen[[2]int{cu, cv}] {
seen[[2]int{cu, cv}] = true
dag[cu] = append(dag[cu], cv)
}
}
}
return dag
}
func main() {
adj := [][]int{{1}, {2}, {0, 3}, {4}, {5}, {3}}
compID := kosaraju(len(adj), adj)
numComps := 0
for _, c := range compID {
if c+1 > numComps {
numComps = c + 1
}
}
fmt.Println("compID:", compID)
fmt.Println("DAG:", condensation(len(adj), adj, compID, numComps))
}
Java¶
import java.util.*;
public class Kosaraju {
static List<List<Integer>> adj;
static boolean[] visited;
static List<Integer> order = new ArrayList<>();
static int[] compID;
static void dfs1(int u) {
visited[u] = true;
for (int v : adj.get(u)) if (!visited[v]) dfs1(v);
order.add(u); // finish
}
static void dfs2(int u, int c, List<List<Integer>> radj) {
compID[u] = c;
for (int v : radj.get(u)) if (compID[v] == -1) dfs2(v, c, radj);
}
public static int[] run(List<List<Integer>> g) {
adj = g;
int n = g.size();
visited = new boolean[n];
compID = new int[n];
Arrays.fill(compID, -1);
for (int u = 0; u < n; u++) if (!visited[u]) dfs1(u);
List<List<Integer>> radj = new ArrayList<>();
for (int i = 0; i < n; i++) radj.add(new ArrayList<>());
for (int u = 0; u < n; u++)
for (int v : g.get(u)) radj.get(v).add(u);
int c = 0;
for (int i = n - 1; i >= 0; i--) {
int u = order.get(i);
if (compID[u] == -1) { dfs2(u, c, radj); c++; }
}
return compID;
}
public static void main(String[] args) {
List<List<Integer>> adj = List.of(
List.of(1), List.of(2), List.of(0, 3),
List.of(4), List.of(5), List.of(3));
System.out.println(Arrays.toString(run(adj)));
}
}
Python¶
import sys
def kosaraju(adj):
sys.setrecursionlimit(1 << 25)
n = len(adj)
visited = [False] * n
order = []
def dfs1(u):
visited[u] = True
for v in adj[u]:
if not visited[v]:
dfs1(v)
order.append(u) # finish order
for u in range(n):
if not visited[u]:
dfs1(u)
radj = [[] for _ in range(n)]
for u in range(n):
for v in adj[u]:
radj[v].append(u)
comp_id = [-1] * n
def dfs2(u, c):
comp_id[u] = c
for v in radj[u]:
if comp_id[v] == -1:
dfs2(v, c)
c = 0
for u in reversed(order): # reverse finish order
if comp_id[u] == -1:
dfs2(u, c)
c += 1
return comp_id, c
def condensation(adj, comp_id, num_comps):
dag = [set() for _ in range(num_comps)]
for u in range(len(adj)):
for v in adj[u]:
if comp_id[u] != comp_id[v]:
dag[comp_id[u]].add(comp_id[v])
return [sorted(s) for s in dag]
if __name__ == "__main__":
adj = [[1], [2], [0, 3], [4], [5], [3]]
comp_id, k = kosaraju(adj)
print("comp_id:", comp_id)
print("DAG:", condensation(adj, comp_id, k))
Iterative Tarjan + Condensation (Go/Java/Python)¶
The Kosaraju code above is the easy-to-audit oracle. Here is Tarjan itself, written iteratively so it survives O(V) recursion depth, paired with a condensation builder. Each frame remembers which neighbor index it is processing (pc, a "program counter"), which is how we resume after a child returns.
Go¶
package main
import "fmt"
func tarjanSCC(adj [][]int) (compID []int, numComps int) {
n := len(adj)
disc := make([]int, n)
low := make([]int, n)
onStack := make([]bool, n)
compID = make([]int, n)
for i := range disc {
disc[i] = -1
compID[i] = -1
}
timer := 0
var stack []int // the SCC stack
type frame struct{ v, pc int } // pc = next neighbor index to scan
for s := 0; s < n; s++ {
if disc[s] != -1 {
continue
}
callStack := []frame{{s, 0}}
for len(callStack) > 0 {
f := &callStack[len(callStack)-1]
v := f.v
if f.pc == 0 { // first entry into v: discover it
disc[v] = timer
low[v] = timer
timer++
stack = append(stack, v)
onStack[v] = true
}
recursed := false
for f.pc < len(adj[v]) {
w := adj[v][f.pc]
f.pc++
if disc[w] == -1 { // tree edge: descend
callStack = append(callStack, frame{w, 0})
recursed = true
break
} else if onStack[w] { // back/cross to open SCC
if disc[w] < low[v] {
low[v] = disc[w]
}
}
}
if recursed {
continue
}
// v is finished
if low[v] == disc[v] { // v is an SCC root
for {
w := stack[len(stack)-1]
stack = stack[:len(stack)-1]
onStack[w] = false
compID[w] = numComps
if w == v {
break
}
}
numComps++
}
callStack = callStack[:len(callStack)-1]
if len(callStack) > 0 { // fold low into parent (tree-edge rule)
p := &callStack[len(callStack)-1]
if low[v] < low[p.v] {
low[p.v] = low[v]
}
}
}
}
return compID, numComps
}
func condensation(adj [][]int, compID []int, numComps int) [][]int {
seen := make(map[[2]int]bool)
dag := make([][]int, numComps)
for u := range adj {
for _, v := range adj[u] {
cu, cv := compID[u], compID[v]
if cu != cv && !seen[[2]int{cu, cv}] {
seen[[2]int{cu, cv}] = true
dag[cu] = append(dag[cu], cv)
}
}
}
return dag
}
func main() {
adj := [][]int{{1}, {2}, {0, 3}, {4}, {5}, {3}}
compID, k := tarjanSCC(adj)
fmt.Println("compID:", compID, "numComps:", k)
fmt.Println("condensation:", condensation(adj, compID, k))
}
Java¶
import java.util.*;
public class TarjanIterative {
public static int[] tarjan(List<List<Integer>> adj, int[] outNumComps) {
int n = adj.size();
int[] disc = new int[n], low = new int[n], compID = new int[n], pc = new int[n];
boolean[] onStack = new boolean[n];
Arrays.fill(disc, -1);
Arrays.fill(compID, -1);
int timer = 0, numComps = 0;
Deque<Integer> stack = new ArrayDeque<>(); // SCC stack
Deque<Integer> call = new ArrayDeque<>(); // explicit call stack (vertices)
for (int s = 0; s < n; s++) {
if (disc[s] != -1) continue;
call.push(s);
while (!call.isEmpty()) {
int v = call.peek();
if (pc[v] == 0) { // discover v
disc[v] = low[v] = timer++;
stack.push(v);
onStack[v] = true;
}
boolean recursed = false;
while (pc[v] < adj.get(v).size()) {
int w = adj.get(v).get(pc[v]++);
if (disc[w] == -1) { call.push(w); recursed = true; break; }
else if (onStack[w]) low[v] = Math.min(low[v], disc[w]);
}
if (recursed) continue;
if (low[v] == disc[v]) { // root: pop component
int w;
do {
w = stack.pop();
onStack[w] = false;
compID[w] = numComps;
} while (w != v);
numComps++;
}
call.pop();
if (!call.isEmpty()) { // fold low into parent
int p = call.peek();
low[p] = Math.min(low[p], low[v]);
}
}
}
outNumComps[0] = numComps;
return compID;
}
public static List<List<Integer>> condensation(
List<List<Integer>> adj, int[] compID, int numComps) {
List<List<Integer>> dag = new ArrayList<>();
for (int i = 0; i < numComps; i++) dag.add(new ArrayList<>());
Set<Long> seen = new HashSet<>();
for (int u = 0; u < adj.size(); u++)
for (int v : adj.get(u)) {
int cu = compID[u], cv = compID[v];
if (cu != cv && seen.add((long) cu << 32 | cv)) dag.get(cu).add(cv);
}
return dag;
}
public static void main(String[] args) {
List<List<Integer>> adj = List.of(
List.of(1), List.of(2), List.of(0, 3),
List.of(4), List.of(5), List.of(3));
int[] k = new int[1];
int[] comp = tarjan(adj, k);
System.out.println("compID: " + Arrays.toString(comp) + " numComps=" + k[0]);
System.out.println("condensation: " + condensation(adj, comp, k[0]));
}
}
Python¶
def tarjan_scc(adj):
n = len(adj)
disc = [-1] * n
low = [0] * n
on_stack = [False] * n
comp_id = [-1] * n
timer = 0
num_comps = 0
scc_stack = []
for start in range(n):
if disc[start] != -1:
continue
call = [(start, 0)] # (vertex, next-neighbor-index)
while call:
v, pc = call[-1]
if pc == 0: # discover v
disc[v] = low[v] = timer
timer += 1
scc_stack.append(v)
on_stack[v] = True
recursed = False
while pc < len(adj[v]):
w = adj[v][pc]
pc += 1
if disc[w] == -1: # tree edge
call[-1] = (v, pc)
call.append((w, 0))
recursed = True
break
elif on_stack[w]: # back/cross to open SCC
low[v] = min(low[v], disc[w])
if recursed:
continue
call[-1] = (v, pc)
if low[v] == disc[v]: # root: pop the component
while True:
w = scc_stack.pop()
on_stack[w] = False
comp_id[w] = num_comps
if w == v:
break
num_comps += 1
call.pop()
if call: # fold low into parent (tree-edge rule)
p = call[-1][0]
low[p] = min(low[p], low[v])
return comp_id, num_comps
def condensation(adj, comp_id, num_comps):
dag = [set() for _ in range(num_comps)]
for u in range(len(adj)):
for v in adj[u]:
if comp_id[u] != comp_id[v]:
dag[comp_id[u]].add(comp_id[v])
return [sorted(s) for s in dag]
if __name__ == "__main__":
adj = [[1], [2], [0, 3], [4], [5], [3]]
comp_id, k = tarjan_scc(adj)
print("comp_id:", comp_id, "num_comps:", k)
print("condensation:", condensation(adj, comp_id, k))
All three print two SCCs — {0,1,2} and {3,4,5} — with the condensation edge from the {0,1,2} component to the {3,4,5} component. The component ids match the reverse topological order: the sink-side SCC {3,4,5} is assigned id 0 (emitted first), the source-side {0,1,2} id 1.
Error Handling¶
| Scenario | What goes wrong | Correct approach |
|---|---|---|
Used low[v] for an on-stack edge in Tarjan | Two distinct SCCs get merged silently | Always min(low[u], disc[v]) for non-tree edges |
| Kosaraju forgets to reverse finish order | Components computed in wrong order; some merged | Process the finish stack from last-finished to first |
| Transpose built with wrong direction | SCCs come out wrong | For each u → v, add v → u to the transpose |
| Recursion overflow on deep graph | Crash on large input | Iterative DFS for both Tarjan and Kosaraju |
| Self-loops treated as cycles incorrectly | Vertex with v→v reported as size-2 SCC | Self-loop is still a size-1 SCC; it only makes the graph cyclic |
| Condensation has duplicate edges | DAG bloated, DP slow | Deduplicate cross-component edges with a set |
Performance Analysis¶
Both Tarjan and Kosaraju are O(V + E). The practical gap:
| Aspect | Tarjan | Kosaraju |
|---|---|---|
| DFS traversals | 1 | 2 |
| Builds transpose | no | yes (one full edge copy) |
| Extra memory | O(V) | O(V + E) |
| Typical wall-clock | ~1.0× (baseline) | ~1.6–2.0× |
| Cache behavior | better (no transpose) | worse (random writes to transpose) |
For a graph with E = 5·10⁶, building the transpose alone costs a full pass over all edges plus the memory to store them again — that is what makes Kosaraju noticeably slower despite identical asymptotics.
import time, random, sys
def gen(n, m):
return [[random.randrange(n) for _ in range(m // n)] for _ in range(n)]
# Tarjan vs Kosaraju on the same random graph — Tarjan is typically
# ~1.5-2x faster because it avoids constructing and traversing the transpose.
The recursion-depth cost is the real-world differentiator: both recursive versions can hit O(V) depth, so on adversarial chains you must go iterative regardless of which algorithm you pick.
Best Practices¶
- Prefer Tarjan in production for one-pass efficiency; keep a Kosaraju reference implementation as an easy-to-audit oracle for tests.
- Always compute
comp_id[v], not just the list of components — almost every downstream use needs the per-vertex label. - Build the condensation immediately if you plan any DAG processing; it is the natural interface.
- Exploit the reverse-topo output order instead of running a separate topological sort.
- Switch to iterative the moment input sizes could push recursion past ~10⁴ depth.
- Test against brute force (Floyd–Warshall reachability) on small random graphs:
u, vsame SCC iffreach[u][v] && reach[v][u].
Visual Animation¶
See
animation.htmlfor an interactive view.The middle-level animation highlights: -
disc/lowupdating per edge, distinguishing tree edges from on-stack back edges - The explicit stack and the on-stack flag in real time - The exact momentlow[u] == disc[u]fires and an SCC is popped and colored - SCCs appearing in reverse topological order
Summary¶
Tarjan's algorithm is correct because low[u] == disc[u] precisely identifies the first-discovered vertex of each SCC — no false roots, no missed ones — and the explicit stack lets you peel off exactly that component. The crucial subtlety is the lowlink update: inherit low[v] across tree edges, but use disc[v] for edges to on-stack vertices. Kosaraju (two passes, transpose) trades speed for simplicity; Gabow (two stacks) trades the lowlink array for a boundary stack. Whichever you pick, the real payoff is the condensation DAG: contract each SCC and you have turned a cyclic directed graph into a DAG, unlocking 2-SAT, deadlock detection, dependency analysis, and DAG dynamic programming.