Prüfer Code — Junior Level¶
One-line summary: A Prüfer code (or Prüfer sequence) is a unique sequence of
n − 2numbers that encodes any labeled tree onnvertices, and from that sequence you can rebuild the exact same tree — a perfect, reversible "fingerprint" of the tree.
Table of Contents¶
- Introduction
- Prerequisites
- Glossary
- Core Concepts
- Big-O Summary
- Real-World Analogies
- Pros & Cons
- Step-by-Step Walkthrough
- Code Examples
- Coding Patterns
- Error Handling
- Performance Tips
- Best Practices
- Edge Cases & Pitfalls
- Common Mistakes
- Cheat Sheet
- Visual Animation
- Summary
- Further Reading
Introduction¶
Imagine you have a tree — not a binary tree, just a connected graph with no cycles — drawn on a whiteboard, with each of its n vertices wearing a distinct number badge from 1 to n. These are called labeled trees, because the vertices have identities (labels), not just shapes.
Now ask a strange-sounding question: can I write down a short list of numbers that completely captures this tree, so that a friend who never saw my drawing can reproduce it exactly?
The answer, discovered by Heinz Prüfer in 1918, is a resounding yes. Every labeled tree on n vertices corresponds to exactly one sequence of n − 2 numbers, each between 1 and n. And every such sequence corresponds to exactly one tree. This perfect one-to-one pairing is called a bijection.
The encoding rule is almost suspiciously simple:
Repeatedly find the leaf with the smallest label, write down the label of its only neighbor, and delete that leaf. Stop when only two vertices remain.
That's it. After n − 2 deletions you have written down n − 2 numbers — the Prüfer code.
Why should anyone care? Three big reasons:
- Counting. Because the bijection is exact, the number of labeled trees on
nvertices equals the number of sequences of lengthn − 2over{1, …, n}, which isn^(n−2). This is the famous Cayley's formula — and the Prüfer code is its most beautiful proof. - Random generation. Want a uniformly random labeled tree? Just pick
n − 2random numbers and decode. Done. No rejection sampling, no bias. - Compression. A tree on
nvertices normally needsn − 1edges (pairs of numbers). The Prüfer code stores it inn − 2single numbers — a compact, canonical form.
A sibling topic, 24-kirchhoff-theorem, proves Cayley's formula a completely different way (via a determinant). The Prüfer code gives you the constructive, hands-on proof.
Prerequisites¶
Before reading this file you should be comfortable with:
- What a graph is — vertices and edges.
- What a tree is — a connected graph with no cycles. A tree on
nvertices always has exactlyn − 1edges. - Leaf — a vertex with exactly one edge (degree 1).
- Degree — the number of edges touching a vertex.
- Arrays / lists and basic loops.
- Adjacency lists — storing a graph as "for each vertex, the list of its neighbors."
Optional but helpful:
- A min-heap / priority queue (used to make encoding fast) — see sibling section 10-heaps.
- Light exposure to combinatorics (
n^(n−2), multinomial coefficients) for the counting applications.
Glossary¶
| Term | Meaning |
|---|---|
| Labeled tree | A tree whose n vertices carry distinct labels 1..n. Two labeled trees differ if any edge differs, even if the drawings look the same. |
| Prüfer code / Prüfer sequence | The length n − 2 sequence produced by repeatedly removing the smallest-labeled leaf and recording its neighbor. |
| Leaf | A vertex of degree 1 (exactly one neighbor). |
| Degree | Number of neighbors of a vertex. |
| Encode | Tree → sequence direction. Peel leaves. |
| Decode | Sequence → tree direction. Rebuild edges. |
| Bijection | A perfect one-to-one pairing: every tree ↔ exactly one sequence, no tree or sequence left out, none shared. |
| Cayley's formula | The count n^(n−2) of labeled trees on n vertices. |
| 1-indexed | Labels run 1..n (the classic convention). Watch out: arrays in Go/Java/Python are 0-indexed. |
Core Concepts¶
1. A tree is more than its shape — the labels matter¶
Consider these two trees on 3 vertices:
As shapes they are identical (a path of 3 nodes). But as labeled trees they are different, because the middle vertex is 2 in the first and 1 in the second. Prüfer codes distinguish labeled trees, so these two get different codes.
For n = 3 the code length is n − 2 = 1. The code is simply the label of the center vertex, because that center is the neighbor of the smallest leaf. There are exactly 3^(3−2) = 3^1 = 3 labeled trees on 3 vertices, and indeed there are 3 possible single-number codes: [1], [2], [3].
2. Encoding: peel the smallest leaf¶
The encode algorithm is a loop that runs n − 2 times:
while more than 2 vertices remain:
leaf = the leaf with the smallest label
code.append(the single neighbor of leaf)
remove leaf (and its edge)
Every step removes one vertex, so after n − 2 steps exactly 2 vertices are left — and we stop. We never record anything about those final two; the last edge is implied.
3. Decoding: degree counting¶
To rebuild the tree we use a key fact:
In the final tree, vertex
vappears in the Prüfer code exactlydegree(v) − 1times.
So if we count appearances, we recover every degree: degree(v) = (appearances of v in code) + 1. A vertex that never appears in the code has degree 1 — it is a leaf of the original tree.
Decoding then mirrors encoding:
compute degree[v] = count(v in code) + 1 for all v
for each value x in the code (left to right):
leaf = smallest vertex with degree == 1
add edge (leaf, x)
degree[leaf] -= 1 # now 0, removed
degree[x] -= 1 # x may become a leaf
finally, connect the two remaining vertices with degree 1
4. The bijection¶
Encoding and decoding are exact inverses: decode(encode(tree)) == tree and encode(decode(seq)) == seq. Because each direction is deterministic and total, the trees and the sequences line up one-for-one. That is the whole magic.
Big-O Summary¶
| Operation | Naive | Optimized | Space |
|---|---|---|---|
| Encode (find smallest leaf each step) | O(n²) linear scan | O(n log n) with a min-heap, or O(n) with a pointer trick | O(n) |
| Decode | O(n²) naive scan | O(n log n) heap / O(n) pointer | O(n) |
| Count trees (Cayley) | — | O(1) formula n^(n−2) | O(1) |
| Random tree | — | encode-free: O(n) random ints + decode | O(n) |
The O(n) linear variants exist because the smallest leaf only ever moves forward when you process the code left to right — that's the pointer trick explained in middle.md.
Real-World Analogies¶
- A recipe vs. the finished dish. The Prüfer code is the recipe (a short ordered list of instructions); the tree is the dish. Same recipe → same dish, every time.
- A ZIP file for trees. Encoding compresses; decoding decompresses. It is lossless — you get the exact same tree back.
- A lottery for trees. Pick
n − 2random numbers, decode, and you have drawn a tree perfectly uniformly at random from alln^(n−2)of them — like a fair lottery where every ticket (tree) is equally likely. - Library call numbers. Two physically identical-looking books are still distinct items with distinct call numbers. Labels make trees distinct even when their shapes match.
Pros & Cons¶
Pros
- Compact: stores a tree in
n − 2integers instead ofn − 1edges. - Canonical: each tree has exactly one code — great for hashing, deduplication, equality checks.
- Reversible: lossless round-trip.
- Enables uniform random tree sampling in
O(n)with zero rejection. - Proves Cayley's formula constructively.
Cons
- Only works for labeled trees — not unlabeled (shape-only) trees, not general graphs, not forests directly (though forests have a variant).
- Loses local structure in a non-intuitive way — neighboring code entries don't correspond to neighboring vertices.
- Edge cases at
n = 1andn = 2need special handling (the code is empty). - Off-by-one bugs are easy: 1-indexed labels vs 0-indexed arrays.
Step-by-Step Walkthrough¶
Let's encode this tree on n = 6 vertices (labels 1..6):
Adjacency (neighbors):
Leaves (degree 1) are: 1, 2, 5. We always peel the smallest leaf.
Step 1. Leaves = {1, 2, 5}. Smallest = 1. Its neighbor is 4. Record 4. Delete vertex 1. Code so far: [4]. Remaining degrees drop: 4 now has neighbors [2, 6].
Step 2. Leaves now = {2, 5}. Smallest = 2. Neighbor = 4. Record 4. Delete vertex 2. Code: [4, 4]. Now 4 has only neighbor [6] → 4 becomes a leaf.
Step 3. Leaves = {4, 5}. Smallest = 4. Neighbor = 6. Record 6. Delete vertex 4. Code: [4, 4, 6].
Step 4. Leaves = {5} (and 6 is now also a leaf). Vertices remaining: 6, 3, 5. Leaves = {5, 6}. Smallest = 5. Neighbor = 3. Record 3. Delete 5. Code: [4, 4, 6, 3].
Stop. Only 2 vertices remain (6 and 3). We have written n − 2 = 4 numbers.
Final Prüfer code: [4, 4, 6, 3].
Sanity check via degrees: count appearances → 3 appears 1×, 4 appears 2×, 6 appears 1×, others 0×. So degree(3) = 2, degree(4) = 3, degree(6) = 2, and degree(1) = degree(2) = degree(5) = 1. Compare with the original picture — exactly right.
Decoding it back¶
Start from code [4, 4, 6, 3], n = 6.
Degrees: deg = [_, 1, 1, 2, 3, 1, 2] (index by label; deg[v] = appearances + 1).
| Step | Code value x | Smallest leaf (deg=1) | Edge added | deg updates |
|---|---|---|---|---|
| 1 | 4 | 1 | (1,4) | deg[1]→0, deg[4]→2 |
| 2 | 4 | 2 | (2,4) | deg[2]→0, deg[4]→1 |
| 3 | 6 | 4 | (4,6) | deg[4]→0, deg[6]→1 |
| 4 | 3 | 5 | (5,3) | deg[5]→0, deg[3]→1 |
Two vertices still have degree 1: 3 and 6. Add the final edge (3, 6).
Edges produced: (1,4), (2,4), (4,6), (5,3), (3,6) — identical to the original tree. The round-trip works.
Code Examples¶
Encode: smallest-leaf peeling (heap-based, clear version)¶
package main
import (
"container/heap"
"fmt"
)
// IntHeap is a min-heap of ints.
type IntHeap []int
func (h IntHeap) Len() int { return len(h) }
func (h IntHeap) Less(i, j int) bool { return h[i] < h[j] }
func (h IntHeap) Swap(i, j int) { h[i], h[j] = h[j], h[i] }
func (h *IntHeap) Push(x interface{}) { *h = append(*h, x.(int)) }
func (h *IntHeap) Pop() interface{} {
old := *h
n := len(old)
x := old[n-1]
*h = old[:n-1]
return x
}
// PruferEncode takes adjacency for vertices 1..n and returns the code (length n-2).
func PruferEncode(n int, adj map[int][]int) []int {
degree := make([]int, n+1)
nb := make(map[int]map[int]bool) // neighbor sets, mutable
for v := 1; v <= n; v++ {
nb[v] = map[int]bool{}
}
for v, list := range adj {
for _, u := range list {
nb[v][u] = true
degree[v]++
}
}
h := &IntHeap{}
heap.Init(h)
for v := 1; v <= n; v++ {
if degree[v] == 1 {
heap.Push(h, v)
}
}
code := make([]int, 0, n-2)
for len(code) < n-2 {
leaf := heap.Pop(h).(int)
// leaf has exactly one remaining neighbor
var neighbor int
for u := range nb[leaf] {
neighbor = u
}
code = append(code, neighbor)
// remove the edge
delete(nb[neighbor], leaf)
delete(nb[leaf], neighbor)
degree[neighbor]--
if degree[neighbor] == 1 {
heap.Push(h, neighbor)
}
}
return code
}
func main() {
adj := map[int][]int{
1: {4}, 2: {4}, 3: {6, 5},
4: {1, 2, 6}, 5: {3}, 6: {4, 3},
}
fmt.Println(PruferEncode(6, adj)) // [4 4 6 3]
}
import java.util.*;
public class PruferEncode {
// adj: 1-indexed adjacency lists for vertices 1..n
public static int[] encode(int n, List<List<Integer>> adj) {
int[] degree = new int[n + 1];
// mutable neighbor sets
List<Set<Integer>> nb = new ArrayList<>();
for (int i = 0; i <= n; i++) nb.add(new HashSet<>());
for (int v = 1; v <= n; v++) {
for (int u : adj.get(v)) {
nb.get(v).add(u);
degree[v]++;
}
}
PriorityQueue<Integer> pq = new PriorityQueue<>();
for (int v = 1; v <= n; v++) {
if (degree[v] == 1) pq.add(v);
}
int[] code = new int[Math.max(0, n - 2)];
int idx = 0;
while (idx < n - 2) {
int leaf = pq.poll();
int neighbor = nb.get(leaf).iterator().next();
code[idx++] = neighbor;
nb.get(neighbor).remove(leaf);
nb.get(leaf).remove(neighbor);
if (--degree[neighbor] == 1) pq.add(neighbor);
}
return code;
}
public static void main(String[] args) {
List<List<Integer>> adj = new ArrayList<>();
for (int i = 0; i <= 6; i++) adj.add(new ArrayList<>());
adj.get(1).add(4); adj.get(2).add(4);
adj.get(3).addAll(List.of(6, 5));
adj.get(4).addAll(List.of(1, 2, 6));
adj.get(5).add(3);
adj.get(6).addAll(List.of(4, 3));
System.out.println(Arrays.toString(encode(6, adj))); // [4, 4, 6, 3]
}
}
import heapq
def prufer_encode(n, adj):
"""adj: dict {vertex(1..n): [neighbors]}. Returns code of length n-2."""
nb = {v: set(adj.get(v, [])) for v in range(1, n + 1)}
degree = {v: len(nb[v]) for v in range(1, n + 1)}
leaves = [v for v in range(1, n + 1) if degree[v] == 1]
heapq.heapify(leaves)
code = []
while len(code) < n - 2:
leaf = heapq.heappop(leaves)
neighbor = next(iter(nb[leaf])) # the only remaining neighbor
code.append(neighbor)
nb[neighbor].discard(leaf)
nb[leaf].discard(neighbor)
degree[neighbor] -= 1
if degree[neighbor] == 1:
heapq.heappush(leaves, neighbor)
return code
if __name__ == "__main__":
adj = {1: [4], 2: [4], 3: [6, 5], 4: [1, 2, 6], 5: [3], 6: [4, 3]}
print(prufer_encode(6, adj)) # [4, 4, 6, 3]
Decode: degree-counting reconstruction¶
package main
import (
"container/heap"
"fmt"
)
type IntHeap []int
func (h IntHeap) Len() int { return len(h) }
func (h IntHeap) Less(i, j int) bool { return h[i] < h[j] }
func (h IntHeap) Swap(i, j int) { h[i], h[j] = h[j], h[i] }
func (h *IntHeap) Push(x interface{}) { *h = append(*h, x.(int)) }
func (h *IntHeap) Pop() interface{} {
old := *h
x := old[len(old)-1]
*h = old[:len(old)-1]
return x
}
type Edge struct{ U, V int }
// PruferDecode rebuilds the tree (n = len(code)+2) as a list of edges.
func PruferDecode(code []int) []Edge {
n := len(code) + 2
degree := make([]int, n+1)
for v := 1; v <= n; v++ {
degree[v] = 1
}
for _, x := range code {
degree[x]++
}
h := &IntHeap{}
for v := 1; v <= n; v++ {
if degree[v] == 1 {
heap.Push(h, v)
}
}
edges := make([]Edge, 0, n-1)
for _, x := range code {
leaf := heap.Pop(h).(int)
edges = append(edges, Edge{leaf, x})
degree[leaf]--
degree[x]--
if degree[x] == 1 {
heap.Push(h, x)
}
}
// final edge between the two remaining degree-1 vertices
u := heap.Pop(h).(int)
v := heap.Pop(h).(int)
edges = append(edges, Edge{u, v})
return edges
}
func main() {
fmt.Println(PruferDecode([]int{4, 4, 6, 3}))
// [{1 4} {2 4} {4 6} {5 3} {3 6}]
}
import java.util.*;
public class PruferDecode {
public static int[][] decode(int[] code) {
int n = code.length + 2;
int[] degree = new int[n + 1];
Arrays.fill(degree, 1, n + 1, 1);
for (int x : code) degree[x]++;
PriorityQueue<Integer> pq = new PriorityQueue<>();
for (int v = 1; v <= n; v++) {
if (degree[v] == 1) pq.add(v);
}
int[][] edges = new int[n - 1][2];
int e = 0;
for (int x : code) {
int leaf = pq.poll();
edges[e][0] = leaf;
edges[e][1] = x;
e++;
degree[leaf]--;
if (--degree[x] == 1) pq.add(x);
}
edges[e][0] = pq.poll();
edges[e][1] = pq.poll();
return edges;
}
public static void main(String[] args) {
int[][] edges = decode(new int[]{4, 4, 6, 3});
for (int[] ed : edges) System.out.println(Arrays.toString(ed));
// [1,4] [2,4] [4,6] [5,3] [3,6]
}
}
import heapq
def prufer_decode(code):
"""Returns list of edges for the tree on n = len(code)+2 vertices."""
n = len(code) + 2
degree = [0] + [1] * n # degree[1..n] = 1 initially
for x in code:
degree[x] += 1
leaves = [v for v in range(1, n + 1) if degree[v] == 1]
heapq.heapify(leaves)
edges = []
for x in code:
leaf = heapq.heappop(leaves)
edges.append((leaf, x))
degree[leaf] -= 1
degree[x] -= 1
if degree[x] == 1:
heapq.heappush(leaves, x)
u = heapq.heappop(leaves)
v = heapq.heappop(leaves)
edges.append((u, v))
return edges
if __name__ == "__main__":
print(prufer_decode([4, 4, 6, 3]))
# [(1, 4), (2, 4), (4, 6), (5, 3), (3, 6)]
Coding Patterns¶
- Degree array, not a full graph copy. For decoding you never need adjacency — just
degree[v] = count(v) + 1and a min-heap of current leaves. - Min-heap for "smallest leaf." Both directions repeatedly ask for the smallest degree-1 vertex; a priority queue answers that in
O(log n). - Push a vertex into the heap the moment its degree drops to 1, never before. Pushing too early breaks the smallest-leaf rule.
- Last two vertices are implicit on encode, explicit on decode. Encode stops with 2 vertices unrecorded; decode finishes by connecting the final two degree-1 vertices.
- Keep labels 1-indexed inside the algorithm, convert at the boundary if your input is 0-indexed.
Error Handling¶
n < 2: A tree on 1 vertex has an empty code, and there is no edge. A tree on 2 vertices also has an empty code (n − 2 = 0) and one implicit edge(1, 2). Guard these before the main loop.- Code values out of range: every entry must be in
1..nwheren = len(code) + 2. Reject anything else — it cannot be a valid Prüfer code. - Disconnected input to encode: the algorithm assumes a tree (connected,
n − 1edges). If you feed a forest or a graph with a cycle, results are meaningless. Validate edge count= n − 1and connectivity first if input is untrusted. - Multiple neighbors when popping a leaf: if a popped "leaf" has more than one neighbor, your degree bookkeeping is wrong — likely you pushed it too early.
def prufer_decode_safe(code):
n = len(code) + 2
if n < 2:
raise ValueError("n must be >= 2")
for x in code:
if not (1 <= x <= n):
raise ValueError(f"code value {x} out of range 1..{n}")
return prufer_decode(code)
Performance Tips¶
- Use the heap version for clarity (
O(n log n)); switch to the pointer trick (O(n), inmiddle.md) only whennis large and profiling shows the heap matters. - For random tree generation, skip encoding entirely: generate
n − 2random ints in1..nand decode. That alone is uniform. - Avoid rebuilding sets repeatedly during encode; mutate one neighbor structure in place.
- For decode, an
intdegree array beats a hash map for dense label ranges.
Best Practices¶
- Always test the round-trip
decode(encode(T)) == Ton small trees first — it catches almost every bug. - Document your indexing convention (1-indexed labels) at the top of the function.
- Sort edges before comparing two trees for equality (treat each edge as an unordered pair).
- Special-case
n = 1andn = 2explicitly and return early.
Edge Cases & Pitfalls¶
| Case | What happens | Fix |
|---|---|---|
n = 1 | Code length −1? No — define it as empty []. No edges. | Return [], no loop. |
n = 2 | Code is empty []; one implicit edge (1, 2). | Skip loop, emit final edge. |
| 0-indexed labels | Off-by-one: vertex n ignored or array overrun. | Convert to 1-indexed inside, back out. |
| Picked a non-smallest leaf | Produces a different (wrong) code; no longer a bijection. | Always use the min-heap / smallest rule. |
| Pushed vertex to heap with degree > 1 | Pops a "leaf" with several neighbors. | Push only when degree hits exactly 1. |
Common Mistakes¶
- Recording the leaf instead of its neighbor. The code stores neighbors, never the leaves themselves.
- Forgetting the final implicit edge on decode — you'll be one edge short (
n − 2instead ofn − 1). - Using "any leaf" instead of the smallest leaf. Any-leaf still gives a valid tree on decode, but it is not the canonical Prüfer code and breaks the bijection/round-trip.
- Counting degree as appearances instead of appearances + 1. Leaves appear 0 times but have degree 1.
- Mixing label bases — passing 0-indexed edges to a 1-indexed decoder.
Cheat Sheet¶
ENCODE (tree -> code, length n-2):
degree[v] = number of neighbors
push all degree-1 vertices into min-heap
repeat n-2 times:
leaf = pop smallest leaf
code.append(its one neighbor u)
degree[u]--; if degree[u]==1 push u
DECODE (code -> tree, n = len(code)+2):
degree[v] = count(v in code) + 1
push all degree-1 vertices into min-heap
for x in code:
leaf = pop smallest leaf
edge(leaf, x); degree[leaf]--; degree[x]--
if degree[x]==1 push x
connect final two degree-1 vertices
KEY FACTS:
code length = n - 2
appearances of v = degree(v) - 1
# labeled trees = n^(n-2) (Cayley)
Visual Animation¶
Open animation.html in this folder. It lets you:
- Encode a small tree step by step — watch the smallest leaf get peeled and its neighbor recorded.
- Decode a code back into the tree — watch edges appear as the heap pops leaves.
- Use Step / Run / Reset to control the pace, with a running log and the partial code/degree table shown live.
Summary¶
A Prüfer code is a length n − 2 integer sequence that uniquely encodes any labeled tree on n vertices. Encode by repeatedly peeling the smallest-labeled leaf and recording its neighbor; decode by counting appearances to recover degrees, then connecting the smallest remaining degree-1 vertex to each code entry, finishing with the last two leftover vertices. The encode/decode pair is a bijection, which directly proves Cayley's formula (n^(n−2) labeled trees) and gives a clean way to generate uniformly random trees, count trees with prescribed degrees, and store trees compactly. Mind the edge cases (n = 1, 2), the smallest-leaf rule, and 1- vs 0-indexing.
Further Reading¶
- Sibling topic 24-kirchhoff-theorem — Cayley's formula via the Matrix-Tree theorem (a determinant), a complementary proof.
- Sibling topic 10-mst-kruskal-prim — spanning trees of weighted graphs, a different lens on trees.
- Sibling section 10-heaps — the min-heap that powers fast encoding/decoding.
- The
middle.mdfile here — theO(n)pointer trick and degree-multinomial generalizations. - The
professional.mdfile here — a full proof of the bijection and the Cayley corollary.