Prüfer Code — Middle Level¶
One-line summary: Beyond "peel the smallest leaf," the Prüfer code is a bijection you can compute in linear time with a pointer trick, generalize to forests and prescribed-degree trees (multinomial coefficients), and exploit for uniform random tree sampling and tree compression.
Table of Contents¶
- Recap and Mental Model
- Comparison with Alternatives
- The O(n) Pointer Trick
- Advanced Patterns
- Generalizations
- Code Examples
- Complexity Deep-Dive
- Testing the Bijection
- Pitfalls at This Level
- Practice Checklist
- Summary
Recap and Mental Model¶
A Prüfer code is a sequence of n − 2 labels that encodes a labeled tree on vertices 1..n. Two ideas drive everything:
- Encode removes the smallest leaf
n − 2times, recording each removed leaf's unique neighbor. - Decode reverses it using the invariant
appearances(v) = degree(v) − 1.
At the junior level we used a heap (O(n log n)). At this level we care about three upgrades:
- a linear-time
O(n)algorithm, - the generalizations (forests, prescribed degrees, complete bipartite trees),
- the applications (uniform sampling, counting, compression) and how to test them.
Keep this invariant table in mind:
encode: code[i] = neighbor of i-th smallest-leaf removal
decode: degree[v] = count(v in code) + 1
# of times v appears = degree(v) - 1
vertices never in code = leaves of the tree
last vertex n = ALWAYS one of the final two (never the
smallest leaf if it has high degree); in
the standard algorithm n is never recorded
as a leaf removal because we stop at 2 nodes
Comparison with Alternatives¶
How does a Prüfer code stack up against other ways to represent or count labeled trees?
| Representation | Size | Canonical? | Reversible? | Random-uniform draw | Notes |
|---|---|---|---|---|---|
| Edge list | n − 1 pairs | No (order/labeling varies) | Yes | Hard (rejection) | Natural but verbose |
| Adjacency matrix | n² bits | No | Yes | Hard | Wasteful for sparse trees |
| Prüfer code | n − 2 ints | Yes | Yes | O(n) trivial | The canonical compact form |
| Parent array (rooted) | n − 1 ints | Depends on root | Yes | Easy if root fixed | Encodes rooted trees; n^(n−1) of them |
| Nested parentheses | O(n) chars | For ordered trees | Yes | — | For ordered/unlabeled trees, not labeled |
Two counting facts worth contrasting:
- Labeled trees on
nvertices:n^(n−2)(Cayley) — Prüfer code lengthn − 2. - Labeled rooted trees:
n^(n−1)(choose a root × Cayley) — matches a parent-array of lengthn − 1.
And vs. the sibling 24-kirchhoff-theorem: Kirchhoff counts spanning trees of any graph via a determinant; Prüfer only handles the complete graph K_n but does so constructively (it builds the trees, not just counts them).
The O(n) Pointer Trick¶
The heap is O(n log n). We can get O(n) by observing a subtle monotonicity.
Encoding in O(n). Maintain degree[]. Walk a pointer ptr from 1 upward to find the smallest leaf. Key insight:
When you remove a leaf and decrement its neighbor
u's degree,ubecomes a new leaf only ifdegree[u]hits 1. Ifu < ptr(already passed), process it immediately in a tight inner loop. Otherwise the outerptrwalk will reach it naturally. The pointer never moves backward overall, so total work isO(n).
ptr = 1
leaf = 1
repeat n-2 times:
# find next leaf
while degree[leaf] != 1:
ptr += 1
leaf = ptr
u = unique remaining neighbor of leaf # via adjacency or sum trick
code.append(u)
degree[leaf] = 0
degree[u] -= 1
if degree[u] == 1 and u < ptr:
leaf = u # process the small new leaf right away
else:
ptr += 1
leaf = ptr
Finding "the unique remaining neighbor" in O(1) without scanning is the trick: keep a per-vertex running structure (set, or for a rooted tree the parent pointer). A common competitive-programming shortcut for decoding avoids adjacency entirely (degrees only), and the same pointer monotonicity gives O(n) there too.
Decoding in O(n). Same idea — the smallest available leaf index only moves forward except when a freshly-created leaf is smaller than the current pointer, and that happens at most once per code entry, so amortized O(n).
degree[v] = count(v in code) + 1
ptr = smallest v with degree==1
leaf = ptr
for x in code:
edge(leaf, x)
degree[x] -= 1
if degree[x] == 1 and x < ptr:
leaf = x
else:
ptr += 1 until degree[ptr]==1; leaf = ptr
connect last two degree-1 vertices
Advanced Patterns¶
Pattern 1 — Uniform random labeled tree¶
Because the map is a bijection, sampling a code uniformly samples a tree uniformly.
random_labeled_tree(n):
code = [ random int in 1..n for _ in range(n-2) ]
return prufer_decode(code)
No rejection, no bias, O(n). This is the canonical answer to "generate a random spanning tree of K_n."
Pattern 2 — Counting trees with prescribed degrees¶
The number of labeled trees where vertex i has degree d_i (with Σ d_i = 2(n−1)) equals the multinomial coefficient
because such a tree's Prüfer code is exactly a sequence where label i appears d_i − 1 times. Counting those sequences is "arrange a multiset" → multinomial.
Pattern 3 — Spanning trees of complete bipartite graphs¶
K_{m,n} has m^(n−1) · n^(m−1) spanning trees — derivable via a bipartite Prüfer-style code. (The general multipartite count follows similarly.) Use this as a closed form instead of running Kirchhoff on a huge matrix.
Pattern 4 — Tree compression / canonical hashing¶
Store or hash a labeled tree by its Prüfer code (n − 2 ints). Two trees are equal iff their codes are equal — O(n) equality and a natural hash key.
Generalizations¶
| Object | Code length / form | Count |
|---|---|---|
Labeled tree on n | n − 2 over {1..n} | n^(n−2) |
| Labeled rooted tree | parent array, n − 1 | n^(n−1) |
Tree with degrees d_i | code with i repeated d_i−1 | multinomial above |
Labeled forest with k trees, roots r_1..r_k | generalized Prüfer of length n − k − 1 | k · n^(n−k−1) |
Spanning trees of K_{m,n} | bipartite Prüfer | m^(n−1) n^(m−1) |
The forest result: the number of labeled forests on n vertices consisting of k trees, where k specified vertices are the roots (one per tree), is k · n^(n−k−1). Setting k = 1 recovers Cayley.
Code Examples¶
O(n) pointer-trick decode (degrees only, no adjacency)¶
package main
import "fmt"
type Edge struct{ U, V int }
// DecodeLinear rebuilds a tree from a Prüfer code in O(n).
func DecodeLinear(code []int) []Edge {
n := len(code) + 2
degree := make([]int, n+1)
for v := 1; v <= n; v++ {
degree[v] = 1
}
for _, x := range code {
degree[x]++
}
// ptr: smallest index that is currently a leaf
ptr := 1
for degree[ptr] != 1 {
ptr++
}
leaf := ptr
edges := make([]Edge, 0, n-1)
for _, x := range code {
edges = append(edges, Edge{leaf, x})
degree[leaf]--
degree[x]--
if degree[x] == 1 && x < ptr {
leaf = x
} else {
ptr++
for degree[ptr] != 1 {
ptr++
}
leaf = ptr
}
}
// last edge: the two remaining degree-1 vertices
u := -1
for v := 1; v <= n; v++ {
if degree[v] == 1 {
if u == -1 {
u = v
} else {
edges = append(edges, Edge{u, v})
break
}
}
}
return edges
}
func main() {
fmt.Println(DecodeLinear([]int{4, 4, 6, 3}))
}
import java.util.*;
public class DecodeLinear {
public static int[][] decode(int[] code) {
int n = code.length + 2;
int[] degree = new int[n + 1];
Arrays.fill(degree, 1, n + 1, 1);
for (int x : code) degree[x]++;
int ptr = 1;
while (degree[ptr] != 1) ptr++;
int leaf = ptr;
int[][] edges = new int[n - 1][2];
int e = 0;
for (int x : code) {
edges[e][0] = leaf;
edges[e][1] = x;
e++;
degree[leaf]--;
degree[x]--;
if (degree[x] == 1 && x < ptr) {
leaf = x;
} else {
ptr++;
while (degree[ptr] != 1) ptr++;
leaf = ptr;
}
}
int u = -1;
for (int v = 1; v <= n; v++) {
if (degree[v] == 1) {
if (u == -1) u = v;
else { edges[e][0] = u; edges[e][1] = v; break; }
}
}
return edges;
}
public static void main(String[] args) {
for (int[] ed : decode(new int[]{4, 4, 6, 3}))
System.out.println(Arrays.toString(ed));
}
}
def decode_linear(code):
n = len(code) + 2
degree = [0] + [1] * n
for x in code:
degree[x] += 1
ptr = 1
while degree[ptr] != 1:
ptr += 1
leaf = ptr
edges = []
for x in code:
edges.append((leaf, x))
degree[leaf] -= 1
degree[x] -= 1
if degree[x] == 1 and x < ptr:
leaf = x
else:
ptr += 1
while degree[ptr] != 1:
ptr += 1
leaf = ptr
rem = [v for v in range(1, n + 1) if degree[v] == 1]
edges.append((rem[0], rem[1]))
return edges
if __name__ == "__main__":
print(decode_linear([4, 4, 6, 3]))
Counting trees with prescribed degrees¶
package main
import "fmt"
func factorial(n int) int {
r := 1
for i := 2; i <= n; i++ {
r *= i
}
return r
}
// CountTreesWithDegrees: # labeled trees with deg[i] for i=1..n.
// Requires sum(deg) == 2*(n-1); else 0.
func CountTreesWithDegrees(deg []int) int {
n := len(deg)
sum := 0
for _, d := range deg {
if d < 1 {
return 0
}
sum += d
}
if sum != 2*(n-1) {
return 0
}
res := factorial(n - 2)
for _, d := range deg {
res /= factorial(d - 1)
}
return res
}
func main() {
// n=4, degrees [1,2,2,1]: a path. Multinomial 2!/(0!1!1!0!) = 2.
fmt.Println(CountTreesWithDegrees([]int{1, 2, 2, 1})) // 2
}
public class CountTrees {
static long fact(int n) {
long r = 1;
for (int i = 2; i <= n; i++) r *= i;
return r;
}
static long countTreesWithDegrees(int[] deg) {
int n = deg.length, sum = 0;
for (int d : deg) { if (d < 1) return 0; sum += d; }
if (sum != 2 * (n - 1)) return 0;
long res = fact(n - 2);
for (int d : deg) res /= fact(d - 1);
return res;
}
public static void main(String[] args) {
System.out.println(countTreesWithDegrees(new int[]{1, 2, 2, 1})); // 2
}
}
from math import factorial
def count_trees_with_degrees(deg):
n = len(deg)
if any(d < 1 for d in deg) or sum(deg) != 2 * (n - 1):
return 0
res = factorial(n - 2)
for d in deg:
res //= factorial(d - 1)
return res
if __name__ == "__main__":
print(count_trees_with_degrees([1, 2, 2, 1])) # 2
Uniform random labeled tree¶
package main
import (
"fmt"
"math/rand"
)
func RandomLabeledTree(n int) []Edge {
if n == 1 {
return nil
}
if n == 2 {
return []Edge{{1, 2}}
}
code := make([]int, n-2)
for i := range code {
code[i] = rand.Intn(n) + 1 // 1..n
}
return DecodeLinear(code)
}
func main() {
rand.Seed(1)
fmt.Println(RandomLabeledTree(6))
}
import java.util.*;
public class RandomTree {
static int[][] randomLabeledTree(int n, Random rng) {
if (n == 2) return new int[][]{{1, 2}};
int[] code = new int[n - 2];
for (int i = 0; i < code.length; i++) code[i] = rng.nextInt(n) + 1;
return DecodeLinear.decode(code);
}
public static void main(String[] args) {
for (int[] e : randomLabeledTree(6, new Random(1)))
System.out.println(Arrays.toString(e));
}
}
import random
def random_labeled_tree(n, rng=random):
if n == 1:
return []
if n == 2:
return [(1, 2)]
code = [rng.randint(1, n) for _ in range(n - 2)]
return decode_linear(code)
if __name__ == "__main__":
random.seed(1)
print(random_labeled_tree(6))
Complexity Deep-Dive¶
| Variant | Time | Space | When to use |
|---|---|---|---|
| Heap encode/decode | O(n log n) | O(n) | Clear, default; fine to n ≈ 10^6 |
| Pointer-trick encode/decode | O(n) | O(n) | Tight loops, very large n, contests |
| Count w/ degrees | O(n) (with big-int for large n) | O(1) | Closed-form counting |
| Random tree | O(n) | O(n) | Uniform sampling, fuzz tests |
Why the pointer trick is truly O(n): the pointer ptr is non-decreasing across the whole run and bounded by n; the only "extra" leaf processed out of order (x < ptr) is paid for by the code entry that created it — at most one per entry. Amortized total: O(n).
Testing the Bijection¶
The single most valuable test is the round-trip property (see sibling skill property-based-testing):
def test_round_trip(n, trials=2000):
import random
for _ in range(trials):
# build a random tree directly, encode, decode, compare edge sets
code = [random.randint(1, n) for _ in range(n - 2)]
edges = decode_linear(code)
# re-encode and check we get the same code back
adj = {v: [] for v in range(1, n + 1)}
for u, w in edges:
adj[u].append(w); adj[w].append(u)
assert prufer_encode(n, adj) == code
This checks encode(decode(seq)) == seq for random sequences, which (because decode is onto trees) also validates decode(encode(tree)) == tree.
A second test: count check. Enumerate all n^(n−2) codes for small n (e.g. n = 4 → 16 codes), decode each, and assert all decoded trees are distinct and all are valid trees. This empirically confirms the bijection and Cayley's count.
Pitfalls at This Level¶
- Big factorials overflow.
CountTreesWithDegreesusesint; forn > 20use big integers (Python is safe by default; Go needsmath/big; JavaBigInteger). rand.Intn(n) + 1— forgetting the+1yields label 0, which is out of range.- Pointer trick correctness hinges on the
x < ptrcondition — using<=or omitting it reintroduces wrong leaves. - Forest variant indexing — the length is
n − k − 1, notn − 2; mixing them silently corrupts results. - Equality by edge set, not edge list — order and endpoint order differ; normalize unordered pairs.
Practice Checklist¶
- Implement both heap and pointer-trick decode; assert they agree on random inputs.
- Write
count_trees_with_degreesand verifyΣ over all degree sequences = n^(n−2)forn = 5. - Generate 10⁵ random trees on
n = 7; check the empirical distribution of, say, vertex-1 degree against the theoretical(n−1 choose d−1)(n−1)^...ratio. - Derive
K_{2,3}spanning-tree count2^(3−1)·3^(2−1) = 4·3 = 12and verify by brute force. - Handle
n = 1andn = 2in every function.
Summary¶
At the middle level, the Prüfer code becomes a toolkit: a linear-time encode/decode via the pointer trick, a family of generalizations (rooted trees n^(n−1), prescribed-degree multinomials, forests k·n^(n−k−1), bipartite m^(n−1)n^(m−1)), and concrete applications (uniform random trees in O(n), canonical hashing, compression). Test everything with the round-trip property and small-n exhaustive counts. Compared to alternatives, the Prüfer code is the uniquely canonical, compact, reversible, and uniformly-samplable representation of labeled trees — and the constructive counterpart to the determinant-based 24-kirchhoff-theorem.