Path Compression — Junior Level¶
One-line summary: Path compression is the Find-side optimization of a Disjoint Set (Union-Find): while walking from a node up to its root, you re-point the nodes you touch directly (or closer) to the root, so the tree gets flatter and every future
findon those nodes is much faster.
Table of Contents¶
- Introduction
- Prerequisites
- Glossary
- Core Concepts
- Big-O Summary
- Real-World Analogies
- Pros & Cons
- Step-by-Step Walkthrough
- Code Examples
- Coding Patterns
- Error Handling
- Performance Tips
- Best Practices
- Edge Cases & Pitfalls
- Common Mistakes
- Cheat Sheet
- Visual Animation
- Summary
- Further Reading
Introduction¶
A Disjoint Set (Union-Find) keeps a collection of elements partitioned into groups. Each group is a tree, and each tree is identified by its root — the representative of the group. Two operations matter:
find(x)— followparent[x]pointers upward until you reach the root, and return that root.union(a, b)— link the root ofa's tree to the root ofb's tree, merging the two groups.
The basic structure (covered in the sibling topic 01-union-find) already works. But there is a problem hiding in find. If your unions build a long, skinny chain — 0 → 1 → 2 → 3 → ... → n — then a single find(0) has to walk the entire chain to reach the root. That is O(n) per call. Do it a million times and you have an O(n²) disaster.
Path compression fixes this with one simple idea: every time you walk a path to the root, flatten it. Re-point the nodes you visited so they hang directly off the root (or at least closer to it). The first find still pays the full walk, but every later find on those nodes is nearly instant — they now point straight at the root.
This single optimization turns Union-Find from a slow structure into one of the fastest data structures in all of computer science. Combined with union by rank/size (the sibling topic 03-union-by-rank), the amortized cost per operation becomes O(α(n)) — the inverse Ackermann function — which is less than 5 for any input that fits in the observable universe. Effectively constant.
In this file we focus on the Find side: the three classic ways to compress, with full code in Go, Java, and Python.
Prerequisites¶
Before you read this file you should be comfortable with:
- Arrays and zero-based indexing — the DSU is stored as a
parent[]array. - The basic Disjoint Set —
findandunionwithout optimizations (01-union-find). - Recursion basics — full path compression is naturally recursive.
- Loops and
whileconditions — path halving and splitting are iterative. - Big-O notation —
O(n),O(log n), and the idea of amortized cost.
Optional but helpful:
- The idea of a tree drawn with the root at the top (here we draw roots on top, children below).
- The notion of a self-loop: a root is its own parent,
parent[root] == root.
Glossary¶
| Term | Meaning |
|---|---|
| Disjoint Set / Union-Find / DSU | A structure that maintains a partition of elements into groups, each group a tree. |
parent[] | The array where parent[x] is the node directly above x. For a root, parent[root] == root. |
| Root / representative | The top of a tree; the unique node r with parent[r] == r. Identifies the whole group. |
find(x) | Returns the root of x's tree by following parents upward. |
union(a, b) | Links the root of one tree under the root of the other. |
| Path | The sequence of nodes from x up to its root: x, parent[x], parent[parent[x]], …, root. |
| Path compression | Re-pointing nodes on the find-path closer to (or directly to) the root, flattening the tree. |
| Full path compression | Re-point every node on the path directly to the root (two-pass, or the recursive one-liner). |
| Path halving | During the walk, point each node to its grandparent, halving the path length. Single pass. |
| Path splitting | Like halving but also advance one node each step; points each node to its grandparent. Single pass. |
| Rank / size | A per-root tag used by union-by-rank/size to decide which tree hangs under which. |
| α(n) | The inverse Ackermann function — grows so slowly it is ≤ 5 for all practical n. |
| Amortized cost | Average cost per operation over a long sequence, even if a single op is occasionally expensive. |
Core Concepts¶
1. The problem: naive find walks the whole path¶
A Disjoint Set is an array parent[]. A root points to itself. find walks up:
If a tree is a long chain, this walk is O(height). With arbitrary union and no compression, the height can grow to O(n):
Chain after a bad sequence of unions:
4 (root, parent[4]=4)
|
3 (parent[3]=4)
|
2 (parent[2]=3)
|
1 (parent[1]=2)
|
0 (parent[0]=1)
find(0) visits 0 → 1 → 2 → 3 → 4 : five hops.
Every find(0) pays the full walk. That is the cost we want to kill.
2. Full path compression (the recursive one-liner)¶
The cleanest version. As recursion unwinds, set each visited node's parent to the returned root:
find(x):
if parent[x] != x:
parent[x] = find(parent[x]) # re-point x straight to the root
return parent[x]
After find(0) on the chain above, every node on the path now points directly at 4:
One find flattened the whole path. This is two-pass in spirit: the recursion goes up to find the root (pass 1), then unwinds re-pointing every node (pass 2).
3. Path halving (single pass, iterative)¶
Recursion can be elegant but on a chain of a million nodes it can overflow the call stack. Path halving avoids recursion entirely. As you walk up, point each node to its grandparent, then jump to that grandparent:
find(x):
while parent[x] != x:
parent[x] = parent[parent[x]] # skip a generation
x = parent[x]
return x
Each step the path length is roughly halved. No stack, no recursion, one pass. The tree is not perfectly flat after one call, but it gets flat fast over repeated calls — and the asymptotics are identical to full compression.
4. Path splitting (single pass, iterative)¶
Almost the same as halving, but you re-point a node and then advance to the node's old parent, not the new one:
find(x):
while parent[x] != x:
next = parent[x]
parent[x] = parent[parent[x]] # x now points to its grandparent
x = next # advance to the OLD parent
return x
The difference is subtle: halving re-points every other node on the path; splitting re-points every node to its grandparent. Both are single-pass, both flatten the tree, both give the same O(α(n)) when combined with union by rank.
5. They are all the same asymptotically¶
| Variant | Passes | Recursion? | What each node ends up pointing to |
|---|---|---|---|
| Full compression | two (recursion up, re-point down) | yes (or explicit two loops) | the root, exactly |
| Path halving | one | no | its former grandparent (every other node) |
| Path splitting | one | no | its former grandparent (every node) |
All three give amortized O(α(n)) per operation when paired with union by rank/size. Path compression alone (with arbitrary union) gives amortized O(log n) — still a massive win over O(n). Use halving or splitting in production to dodge stack overflow.
Big-O Summary¶
| Scenario | find cost | Notes |
|---|---|---|
Naive find, no compression, arbitrary union | O(n) worst case | A chain forces a full walk. |
| Path compression alone, arbitrary union | O(log n) amortized | Flattening pays off over many finds. |
| Compression + union by rank/size | O(α(n)) amortized | Tarjan 1975; effectively constant (α ≤ 5). |
First find after building a deep tree | up to O(height) | You still pay once to walk, then flatten. |
find on an already-flat node | O(1) | Node points straight at the root. |
| Space | O(n) | One parent[] array (plus optional rank[]). |
The amortized proofs live in
professional.md. TheO(α(n))bound for the combination with union by rank is detailed in the sibling03-union-by-rank.
Real-World Analogies¶
| Concept | Analogy |
|---|---|
| Naive find on a chain | Asking "who is the CEO?" by asking your boss, who asks their boss, all the way up a 10-level org chart. Every single employee re-walks the whole ladder. |
| Full path compression | After you finally learn the CEO's name, you tell everyone you passed on the way up: "the CEO is Dana." Next time any of them is asked, they answer instantly. |
| Path halving | You don't tell everyone — you just tell every other person to "skip to your grandboss." The ladder shrinks each time someone asks. |
| Path splitting | Everyone you pass gets bumped up to report to their grandboss, and you keep climbing the old ladder. The org chart flattens steadily. |
| Why halving/splitting beats recursion | The recursive "tell everyone" version stacks up one held thought per level; on a 1,000,000-level ladder your brain (the call stack) overflows. The iterative versions hold nothing. |
Where the analogy breaks: in a real org chart you cannot instantly re-point reporting lines; in parent[] it is a single array write.
Pros & Cons¶
| Pros | Cons |
|---|---|
Turns O(n) finds into amortized O(log n) (alone) or O(α(n)) (with rank). | The first find on a deep node still pays the full walk. |
| Trivial to add: one line for full compression. | Full (recursive) compression can overflow the stack on huge near-linear trees. |
| Halving/splitting are iterative — no stack risk. | Mutates parent[] during a read-like find, which surprises beginners and breaks persistence. |
| Excellent cache behavior after flattening (short paths). | Cannot easily "undo" a compressed find — bad for rollback/persistent DSU (see senior.md). |
| Pairs perfectly with union by rank/size. | Concurrent compression needs care (races on parent[] writes — see senior.md). |
When to use: essentially always in a DSU. It is the default. Use halving or splitting for production safety.
When NOT to use: when you need a persistent or rollback-able DSU (compression destroys old structure), or in some lock-free concurrent settings where you must reason about writes carefully.
Step-by-Step Walkthrough¶
Start with a chain of 5 nodes. parent = [1, 2, 3, 4, 4] — that is 0→1→2→3→4, and 4 is the root.
Full compression: find(0)¶
Recursion climbs 0 → 1 → 2 → 3 → 4, finds root 4, then unwinds re-pointing each node:
parent before: [1, 2, 3, 4, 4]
parent after : [4, 4, 4, 4, 4]
4
/ | | \
0 1 2 3 every node now points directly at the root
A later find(0) is now a single hop. Done.
Path halving: find(0) on the same fresh chain¶
parent before: [1, 2, 3, 4, 4]
step: x=0, parent[0] = parent[parent[0]] = parent[1] = 2 ; x = 2
step: x=2, parent[2] = parent[parent[2]] = parent[3] = 4 ; x = 4
x == parent[x]=4 → stop, return 4
parent after : [2, 2, 4, 4, 4]
Node 0 now skips to 2, node 2 skips to 4. The path got shorter (length 4 → effectively length 2 from node 0). A second find(0) finishes the flattening.
Path splitting: find(0) on the same fresh chain¶
parent before: [1, 2, 3, 4, 4]
x=0: next=1; parent[0]=parent[1]=2; x=1
x=1: next=2; parent[1]=parent[2]=3; x=2
x=2: next=3; parent[2]=parent[3]=4; x=3
x=3: next=4; parent[3]=parent[4]=4; x=4
stop, return 4
parent after : [2, 3, 4, 4, 4]
Every node on the path now points to its old grandparent. One pass, no recursion, tree noticeably flatter.
Code Examples¶
All three variants below assume a DSU initialized with parent[i] = i.
Variant A — Full path compression (recursive)¶
Go¶
package main
import "fmt"
type DSU struct {
parent []int
}
func NewDSU(n int) *DSU {
p := make([]int, n)
for i := range p {
p[i] = i
}
return &DSU{parent: p}
}
// Find with full path compression (recursive one-liner).
func (d *DSU) Find(x int) int {
if d.parent[x] != x {
d.parent[x] = d.Find(d.parent[x]) // re-point x straight to root
}
return d.parent[x]
}
func (d *DSU) Union(a, b int) {
ra, rb := d.Find(a), d.Find(b)
if ra != rb {
d.parent[ra] = rb // arbitrary union (no rank here)
}
}
func main() {
d := NewDSU(5)
d.parent = []int{1, 2, 3, 4, 4} // build chain 0->1->2->3->4
fmt.Println(d.Find(0)) // 4
fmt.Println(d.parent) // [4 4 4 4 4] — fully flattened
}
Java¶
public class DSUFull {
private final int[] parent;
public DSUFull(int n) {
parent = new int[n];
for (int i = 0; i < n; i++) parent[i] = i;
}
// Full path compression (recursive).
public int find(int x) {
if (parent[x] != x) {
parent[x] = find(parent[x]); // re-point straight to root
}
return parent[x];
}
public void union(int a, int b) {
int ra = find(a), rb = find(b);
if (ra != rb) parent[ra] = rb;
}
public static void main(String[] args) {
DSUFull d = new DSUFull(5);
int[] chain = {1, 2, 3, 4, 4};
System.arraycopy(chain, 0, d.parent, 0, 5);
System.out.println(d.find(0)); // 4
System.out.println(java.util.Arrays.toString(d.parent)); // [4, 4, 4, 4, 4]
}
}
Python¶
import sys
class DSUFull:
def __init__(self, n: int):
self.parent = list(range(n))
def find(self, x: int) -> int:
# Full path compression (recursive). Watch the stack on deep chains.
if self.parent[x] != x:
self.parent[x] = self.find(self.parent[x])
return self.parent[x]
def union(self, a: int, b: int) -> None:
ra, rb = self.find(a), self.find(b)
if ra != rb:
self.parent[ra] = rb
if __name__ == "__main__":
sys.setrecursionlimit(1_000_000) # needed for deep chains!
d = DSUFull(5)
d.parent = [1, 2, 3, 4, 4] # chain 0->1->2->3->4
print(d.find(0)) # 4
print(d.parent) # [4, 4, 4, 4, 4]
Variant B — Path halving (iterative)¶
Go¶
func (d *DSU) FindHalving(x int) int {
for d.parent[x] != x {
d.parent[x] = d.parent[d.parent[x]] // point x to its grandparent
x = d.parent[x] // jump up to the new parent
}
return x
}
Java¶
public int findHalving(int x) {
while (parent[x] != x) {
parent[x] = parent[parent[x]]; // point to grandparent
x = parent[x]; // advance to new parent
}
return x;
}
Python¶
def find_halving(self, x: int) -> int:
while self.parent[x] != x:
self.parent[x] = self.parent[self.parent[x]] # grandparent
x = self.parent[x]
return x
Variant C — Path splitting (iterative)¶
Go¶
func (d *DSU) FindSplitting(x int) int {
for d.parent[x] != x {
next := d.parent[x]
d.parent[x] = d.parent[d.parent[x]] // x -> its grandparent
x = next // advance to the OLD parent
}
return x
}
Java¶
public int findSplitting(int x) {
while (parent[x] != x) {
int next = parent[x];
parent[x] = parent[parent[x]]; // x -> grandparent
x = next; // advance to old parent
}
return x;
}
Python¶
def find_splitting(self, x: int) -> int:
while self.parent[x] != x:
nxt = self.parent[x]
self.parent[x] = self.parent[self.parent[x]] # grandparent
x = nxt # old parent
return x
What they do: all three return the root of x's tree and flatten the path on the way. Full compression flattens it completely in one call; halving and splitting flatten it progressively without recursion. Run: go run main.go | javac DSUFull.java && java DSUFull | python dsu.py
Coding Patterns¶
Pattern 1: Default DSU = iterative compression + union by rank¶
In real code, never ship naive find. Pair iterative halving with union by rank/size:
class DSU:
def __init__(self, n):
self.parent = list(range(n))
self.rank = [0] * n
def find(self, x): # path halving
while self.parent[x] != x:
self.parent[x] = self.parent[self.parent[x]]
x = self.parent[x]
return x
def union(self, a, b):
ra, rb = self.find(a), self.find(b)
if ra == rb:
return
if self.rank[ra] < self.rank[rb]:
ra, rb = rb, ra
self.parent[rb] = ra
if self.rank[ra] == self.rank[rb]:
self.rank[ra] += 1
Pattern 2: Connectivity query¶
def connected(self, a, b):
return self.find(a) == self.find(b) # both finds compress as a side effect
Pattern 3: Counting components¶
Error Handling¶
| Error | Cause | Fix |
|---|---|---|
RecursionError / stack overflow | Recursive full compression on a very deep chain. | Use iterative halving/splitting, or raise the recursion limit (Python) only as a stopgap. |
IndexOutOfRange in find | x is not a valid element index. | Validate 0 <= x < n before calling, or size the array correctly. |
| Wrong root returned | You forgot to update x after re-pointing in the iterative loop. | After parent[x] = parent[parent[x]], set x = parent[x] (halving) or x = next (splitting). |
Infinite loop in find | parent[] has a cycle that does not include a self-loop (a corrupt DSU). | Only union should write parents; a root must satisfy parent[r] == r. |
| Compression "lost" between calls | You copied the DSU by value and compressed the copy. | Pass the DSU by reference/pointer so parent[] mutations persist. |
Performance Tips¶
- Prefer iterative halving or splitting in production. They are one pass, no recursion, no stack risk, and just as fast asymptotically.
- Always combine with union by rank/size. Compression alone is
O(log n)amortized; with rank it isO(α(n)). - Don't pre-flatten. You cannot flatten a node before you walk it; the first walk is unavoidable. Compression makes the next walks cheap.
- Batch queries when possible. If you will query the same nodes repeatedly, the first compressed
findmakes all later ones near-O(1). - Use a flat
int[]/slice/list, not objects with pointers, for cache locality.
Best Practices¶
- Initialize
parent[i] = ifor alli; a root is its own parent. - Make
findthe only place compression happens; keepunioncallingfindfirst. - Document which variant you use at the top of the file — halving and splitting look almost identical and reviewers will ask.
- In Python, if you must use recursive full compression, call
sys.setrecursionlimitand know the depth; otherwise switch to iterative. - Test the invariant: after any sequence of operations,
find(x)for two elements in the same group must return the same root.
Edge Cases & Pitfalls¶
- Single element —
find(x)on a root returnsximmediately; no compression needed. - Already flat — compression on a node that already points at the root is a no-op extra read.
- Self-find —
find(root)must returnroot; ensureparent[root] == root. - Deep chain in Python — recursive compression on
n = 10^6will crash without a raised limit. Use iterative. - Compression during iteration — if you loop
for i in range(n): find(i)you are mutatingparent[]as you go; that is fine and intended, but be aware roots may shift representative only viaunion, notfind. - No union by rank — compression alone still leaves you at
O(log n); that is acceptable but not optimal.
Common Mistakes¶
- Forgetting
x = parent[x]after re-pointing in halving — the loop then re-reads the same node and may loop forever or return the wrong node. - Mixing up halving and splitting — in splitting you advance to the old parent (
next), not the new one. Getting this wrong still works but changes the flattening pattern. - Returning
xfrom the recursive version instead ofparent[x]— after compressionparent[x]is the root; returning the pre-compressionxis just wrong. - Using recursive compression on huge inputs and getting a stack overflow in production.
- Compressing in a persistent/rollback DSU — compression overwrites old parents and you can no longer undo (see
senior.md). - Calling
unionwithoutfindfirst — you must link roots, not arbitrary nodes, or you corrupt the forest.
Cheat Sheet¶
| Variant | One-liner idea | Passes | Recursion | Amortized (with rank) |
|---|---|---|---|---|
| Full compression | parent[x] = find(parent[x]) | 2 | yes | O(α(n)) |
| Path halving | parent[x] = parent[parent[x]]; x = parent[x] | 1 | no | O(α(n)) |
| Path splitting | next = parent[x]; parent[x] = parent[parent[x]]; x = next | 1 | no | O(α(n)) |
Naive find (no compression) : O(n) worst case
Compression alone (arbitrary union) : O(log n) amortized
Compression + union by rank/size : O(α(n)) amortized (≈ constant)
Root test: parent[r] == r. Initialize: parent[i] = i.
Visual Animation¶
See
animation.htmlfor an interactive visual animation of path compression.The animation demonstrates: - A tall DSU chain flattened by a single
find- Toggle between full compression / halving / splitting modes - The traversed path highlighted node by node - Theparent[]array view updating live as nodes are re-pointed - A Reset button and step-by-step narration
Summary¶
Path compression is the Find-side optimization that makes Union-Find fast. The naive find walks O(height) and degrades to O(n) on a chain. By re-pointing visited nodes closer to the root, you flatten the tree so future finds are near-instant. There are three classic variants — full compression (recursive, flattens completely), path halving, and path splitting (both iterative, point each node to its grandparent). All three are asymptotically equal; halving and splitting avoid recursion and the stack-overflow risk. Compression alone gives amortized O(log n); combined with union by rank/size it gives O(α(n)) — effectively constant. In production, default to iterative halving plus union by rank.
Further Reading¶
- Introduction to Algorithms (CLRS), Chapter 21 — "Data Structures for Disjoint Sets."
- R. E. Tarjan, "Efficiency of a Good But Not Linear Set Union Algorithm," JACM 22(2), 1975 — the
O(α(n))result. - Sibling topic
01-union-find— the basic DSU this builds on. - Sibling topic
03-union-by-rank— the union-side optimization and the combined amortized proof. - cp-algorithms.com — "Disjoint Set Union" — practical implementations and variants.