Huffman Coding — Mathematical Foundations and Complexity Theory¶
Table of Contents¶
- Formal Definitions — Codes, Prefix-Freeness, Kraft
- The Optimality Theorem (statement)
- Proof of Optimality — Exchange Argument + Induction
- The Kraft–McMillan Inequality
- Shannon Entropy and the Source-Coding Lower Bound
- How Close Huffman Gets: H ≤ L < H + 1
- Length-Limited Codes — Package-Merge Correctness
- k-ary Huffman and the Count Condition
- Complexity Analysis
- Comparison with Arithmetic Coding, ANS, Shannon–Fano, Hu–Tucker
- Open Directions and Summary
- Reference Code (Optimality Verifier + Entropy)
1. Formal Definitions — Codes, Prefix-Freeness, Kraft¶
Let the source alphabet be S = {s₁, …, s_n} with positive weights (frequencies or probabilities) w₁, …, w_n, w_i > 0. A binary code is a map C : S → {0,1}* assigning each symbol a finite binary string C(sᵢ) of length ℓᵢ = |C(sᵢ)|.
Definitions.
Cis prefix-free (a prefix code) iff no codeword is a prefix of another: fori ≠ j,C(sᵢ)is not a prefix ofC(sⱼ).Cis uniquely decodable iff the induced map on concatenationsS* → {0,1}*is injective. Every prefix-free code is uniquely decodable (decode greedily left to right).- The weighted path length (a.k.a. cost, total code length) is If
wᵢare probabilities (Σ wᵢ = 1),L(C)is the average codeword length.
A prefix code corresponds bijectively to a binary tree with the n symbols at its leaves: the codeword C(sᵢ) is the sequence of left(0)/right(1) edges from root to leaf sᵢ, so ℓᵢ = depth(sᵢ). Thus
Optimal code. C* is optimal for (S, w) iff L(C*) ≤ L(C) for every prefix code C over S. We may restrict attention to full binary trees (every internal node has two children): a unary internal node could be contracted to shorten a codeword, never increasing cost.
2. The Optimality Theorem (statement)¶
Theorem (Huffman, 1952). The greedy algorithm — repeatedly replace the two minimum-weight symbols
x, yby a single symbolzwith weightw(z) = w(x) + w(y), recurse, then splitzback into the pair(x, y)as children — produces a prefix code of minimum weighted path lengthL. Equivalently, the Huffman tree minimizesΣᵢ wᵢ · depth(sᵢ)over all binary trees with thensymbols at the leaves.
Two structural facts drive the proof:
- (A) Greedy-choice property. There exists an optimal tree in which the two minimum-weight symbols are sibling leaves of maximum depth.
- (B) Optimal substructure. Merging
x, yintozyields an instance of sizen − 1whose optimal solution, expanded, is optimal for the size-ninstance.
3. Proof of Optimality — Exchange Argument + Induction¶
3.1 Lemma A (greedy choice)¶
Let x, y be two symbols of smallest weight (w(x) ≤ w(y) ≤ w(s) for all other s). Let T be any optimal tree.
First, in a full optimal tree the two deepest leaves are siblings. (If the deepest leaf a had no sibling leaf, its sibling subtree could be lifted, strictly reducing cost — contradicting optimality; fullness forces a sibling, and the sibling is a leaf else a is not deepest.) Call these deepest siblings b, c, at depth d_max.
Now build T' from T by swapping x ↔ b and y ↔ c. The change in cost from swapping x and b is
ΔL = (w(x)·depth(b) + w(b)·depth(x)) − (w(x)·depth(x) + w(b)·depth(b))
= (w(b) − w(x)) · (depth(x) − depth(b)).
Since w(x) ≤ w(b) (x is minimum) and depth(x) ≤ depth(b) = d_max (b is deepest), we have (w(b) − w(x)) ≥ 0 and (depth(x) − depth(b)) ≤ 0, so ΔL ≤ 0. The same holds for swapping y and c. Hence L(T') ≤ L(T); as T was optimal, L(T') = L(T) and T' is also optimal, with x, y as the deepest sibling pair. ∎
3.2 Lemma B (optimal substructure)¶
Form the reduced instance: replace x, y by a new symbol z with w(z) = w(x) + w(y), alphabet S' = (S ∖ {x,y}) ∪ {z} of size n − 1. For any tree T' over S' in which z is a leaf, expand z into an internal node with children x, y to get a tree T over S. Then
Proof of the identity: every symbol other than x, y has the same depth in T and T'. The symbol z at depth depth(z) in T' contributes w(z)·depth(z); in T, x and y sit at depth depth(z)+1, contributing (w(x)+w(y))·(depth(z)+1) = w(z)·depth(z) + w(x)+w(y). The difference is exactly w(x)+w(y), independent of T'. ∎
3.3 The induction¶
We prove the Huffman tree is optimal by induction on n.
- Base
n = 1: the single symbol gets the empty codeword (or, by convention, the 1-bit code0); trivially optimal. - Base
n = 2: codes0and1; costw₁ + w₂, clearly minimal. - Inductive step: assume Huffman is optimal for all alphabets of size
n − 1. Given sizen, letx, ybe the two minimum-weight symbols and letHbe the Huffman tree (which mergesx, yfirst). LetH'be the Huffman tree of the reduced instanceS'. By the induction hypothesisH'is optimal forS', andHisH'withzexpanded into(x, y), so by Lemma BL(H) = L(H') + w(x)+w(y).
Now take any optimal tree T for S. By Lemma A we may assume x, y are sibling deepest leaves in T; contract them to a leaf z, yielding a tree T' for S'. By Lemma B, L(T) = L(T') + w(x)+w(y). Since H' is optimal for S', L(H') ≤ L(T'), hence
So L(H) ≤ L(T) for the optimal T, forcing L(H) = L(T): Huffman is optimal. ∎
The proof is the canonical demonstration that a greedy algorithm is correct because the problem has both the greedy-choice property (Lemma A) and optimal substructure (Lemma B).
4. The Kraft–McMillan Inequality¶
Codeword lengths are constrained, independent of the actual bit patterns.
Kraft inequality. A binary prefix code with lengths
ℓ₁, …, ℓ_nexists iffMcMillan's theorem. The same inequality is necessary for any uniquely decodable code — not just prefix codes. Consequence: prefix codes lose nothing in achievable length over the broader class of uniquely decodable codes; the optimal prefix code is optimal among all uniquely decodable codes.
Sketch (Kraft, existence direction). Order lengths ℓ₁ ≤ … ≤ ℓ_n. Assign codewords greedily: the codeword of length ℓᵢ is the next available binary string that is not an extension of a previously assigned one. The count of forbidden length-ℓᵢ prefixes is Σ_{j<i} 2^(ℓᵢ − ℓⱼ) ≤ 2^{ℓᵢ} − 1 < 2^{ℓᵢ} when the Kraft sum is ≤ 1, so a free codeword always remains. (McMillan's direction bounds (Σ 2^{−ℓᵢ})^k by counting decodable strings and lets k → ∞.) ∎
Why it matters here. Optimization over prefix codes is equivalent to: minimize Σ wᵢ ℓᵢ over integer ℓᵢ ≥ 1 subject to Σ 2^{−ℓᵢ} ≤ 1. Huffman solves this integer program exactly. The continuous relaxation (allowing real ℓᵢ) is solved by ℓᵢ = −log₂ pᵢ — which is precisely the entropy, the next section.
5. Shannon Entropy and the Source-Coding Lower Bound¶
For a probability distribution p₁, …, p_n (pᵢ > 0, Σ pᵢ = 1), the Shannon entropy is
Source-coding lower bound. Every uniquely decodable binary code has average length
L = Σ pᵢ ℓᵢ ≥ H(p).
Proof. Let qᵢ = 2^{−ℓᵢ} / Z with Z = Σ 2^{−ℓⱼ} ≤ 1 (Kraft/McMillan). Then
L − H = Σ pᵢ ℓᵢ + Σ pᵢ log₂ pᵢ
= Σ pᵢ log₂( pᵢ · 2^{ℓᵢ} )
= Σ pᵢ log₂( pᵢ / qᵢ ) − log₂ Z
= D(p ‖ q) − log₂ Z
≥ 0,
D(p‖q) ≥ 0 (Gibbs' inequality) and log₂ Z ≤ 0 (since Z ≤ 1). Equality requires qᵢ = pᵢ and Z = 1, i.e. ℓᵢ = −log₂ pᵢ — achievable only when every pᵢ is a power of two. ∎ This is the hard floor: no code beats entropy. Huffman, optimal among prefix codes, comes as close as integer lengths allow.
6. How Close Huffman Gets: H ≤ L < H + 1¶
Theorem. The Huffman code
C*for distributionpsatisfies
Proof. The lower bound H ≤ L(C*) is Section 5. For the upper bound, construct the Shannon code with lengths ℓᵢ = ⌈−log₂ pᵢ⌉. These satisfy Kraft because Σ 2^{−⌈−log₂ pᵢ⌉} ≤ Σ 2^{log₂ pᵢ} = Σ pᵢ = 1, so a prefix code with these lengths exists. Its average length is
L(C*) ≤ L_Shannon < H + 1. ∎ Tightness. Both bounds are achievable in the limit. L = H exactly when all pᵢ are powers of two (e.g. p = (1/2, 1/4, 1/8, 1/8) gives L = H = 1.75). The gap approaches 1 for a near-deterministic source: with p = (1−ε, ε/(n−1), …), the dominant symbol "wants" −log₂(1−ε) → 0 bits but is forced to ≥ 1.
Closing the gap by blocking. Coding k-symbol blocks as super-symbols of a product source pᵏ gives H(pᵏ) = k·H(p), and the per-symbol Huffman length satisfies
nᵏ). Arithmetic coding achieves the same convergence without the alphabet blow-up (Section 10). 7. Length-Limited Codes — Package-Merge Correctness¶
The length-limited problem: minimize Σ wᵢ ℓᵢ subject to 1 ≤ ℓᵢ ≤ L and the Kraft constraint Σ 2^{−ℓᵢ} ≤ 1.
Coin-collector reformulation (Larmore–Hirschberg). Each symbol i may be placed at any depth d ∈ {1, …, L}; placing it at depth d is a coin of width 2^{−d} and cost wᵢ. A valid code uses a multiset of coins whose total width is exactly (n − 1) units of 1 … precisely, one must select coins so that for each symbol the chosen depths form a valid full tree; equivalently choose the cheapest set of coins of total width n − 1 (the numismatic total), with 2(n − 1) coins selected. The optimal selection is found by the package-merge sweep:
At each width level (from finest 2^{−L} up to 2^{−1}):
(1) PACKAGE adjacent pairs of the current coin list into coins of the next coarser width;
(2) MERGE those packages with the original coins available at the next level, keep sorted by cost.
Select the cheapest 2(n−1) coins overall; symbol i's length = number of selected coins it appears in.
Theorem (Larmore–Hirschberg, 1990). Package-merge returns code lengths minimizing
Σ wᵢ ℓᵢsubject tomax ℓᵢ ≤ L, inO(n · L)time andO(n)working space (with linked structures).
Why correct (sketch). The Kraft budget of 1 is partitioned into "coins" of dyadic widths; minimizing cost under a fixed total width is a matroid-like greedy on sorted coins, and the package step exactly enforces that a node at depth d can only exist if its two children at depth d+1 were "paid for." The selection of the cheapest 2(n−1) coins corresponds to the cheapest full binary tree of depth ≤ L. When L ≥ ⌈log₂ n⌉ and large enough, the result coincides with ordinary Huffman. ∎
The price of the cap: L(C_limited) ≥ L(C_Huffman), with the excess → 0 as L → n − 1.
8. k-ary Huffman and the Count Condition¶
For an output alphabet of size k (a k-ary prefix code, leaves of a k-ary tree), each merge combines the k smallest weights. A full k-ary tree with m leaves satisfies m ≡ 1 (mod k − 1) (each internal node converts k nodes into 1, removing k − 1). Therefore the number of merges that exactly consume the alphabet requires
If not, pad with ((k−1) − ((n−1) mod (k−1))) dummy symbols of weight 0; these sink to the deepest level and cost nothing. The greedy "merge the k smallest" is then optimal by the same exchange + substructure argument generalized to k children: the k smallest weights can be made sibling deepest leaves, and merging them reduces the instance by k − 1. Binary Huffman is k = 2, where the condition (n−1) ≡ 0 (mod 1) is vacuous — no padding ever needed.
9. Complexity Analysis¶
| Stage | Time | Space | Justification |
|---|---|---|---|
| Count frequencies | Θ(N) | Θ(n) | Single pass over N input symbols, n distinct. |
| Build min-heap | Θ(n) | Θ(n) | Bottom-up heapify of n leaves. |
n − 1 merges | Θ(n log n) | Θ(n) | Each merge: two O(log n) extract-mins + one O(log n) insert. |
| Derive lengths/codes | Θ(n) | Θ(n) | One tree traversal. |
| Canonical assignment | Θ(n log n) | Θ(n) | Sort by (length, symbol). |
| Construction total | Θ(n log n) | Θ(n) | Heap merges dominate. |
| Length-limited (package-merge) | Θ(n·L) | Θ(n) | L levels, O(n) coins each. |
| Encode message | Θ(N) (output Θ(Σ wᵢ ℓᵢ) bits) | Θ(output) | Concatenate codewords. |
| Decode (tree walk) | Θ(output bits) | Θ(1) | One step per bit. |
Decode (table, MAX_BITS table) | Θ(#symbols) | Θ(2^{MAX_BITS}) | One table read per symbol. |
Sorted-input refinement. If the n weights are already sorted, Huffman runs in Θ(n) using two queues (one of leaves in sorted order, one of merged nodes, which is automatically sorted because merged weights are non-decreasing): each step takes the two smallest fronts of the two queues — no heap, no log n. This is the optimal Θ(n) construction when input is pre-sorted, and Θ(n log n) collapses to the sort.
Lower bound. Building the optimal code requires distinguishing relative orders of weights; in the comparison model, Ω(n log n) is necessary for unsorted input (reduction from sorting), so Θ(n log n) is optimal there, and Θ(n) is optimal for sorted input.
9.1 Cache behavior and constant factors¶
The merge loop touches O(n) heap nodes with near-random access (heap reorders break locality), but for n ≤ 256 the entire working set fits in L1, so the build is bound by branch prediction in the Less comparator, not memory. The two costly real-data passes — encode and decode — are the cache story that matters:
- Encode reads the input sequentially (streaming, prefetcher-friendly) and indexes a small code table (256 entries → fits L1). It is essentially memory-bandwidth bound.
- Decode (tree walk) chases child pointers — pointer-chasing with poor locality and a data-dependent branch per bit; this is the slow path.
- Decode (table) reads one entry from a
2^{MAX_BITS}table per symbol. WithMAX_BITS ≤ 15the table is ≤ 64–128 KB (L2); multi-level tables keep the hot root table in L1. Branch-light and cache-friendly — an order of magnitude faster than the walk.
9.2 Average-case length under a random source¶
For a memoryless source with distribution p, the expected Huffman codeword length equals the weighted path length of the constructed tree and satisfies H(p) ≤ E[ℓ] < H(p) + 1 (Section 6). A sharper bound by Gallager (1978) gives E[ℓ] ≤ H(p) + p_max + 0.086 where p_max is the largest symbol probability — so when no symbol dominates, Huffman is provably within a fraction of a bit of entropy, and the worst-case full-bit slack only appears for near-deterministic sources.
9.3 Space–time trade-offs¶
| Strategy | Time | Space | Trade-off |
|---|---|---|---|
| Tree-walk decode | Θ(bits), slow constant | Θ(n) | Minimal memory, poor throughput. |
| Single-level table | Θ(symbols) | Θ(2^{MAX_BITS}) | Fast; large table for big MAX_BITS. |
| Multi-level table | Θ(symbols), 1–2 reads | Θ(2^{root} + Σ sub) | Small hot set; slight indirection for long codes. |
| Canonical (lengths only) | rebuild Θ(n log n) | Θ(n) transmit | Tiny header; rebuild cost amortized over the block. |
Block (k-grams) | Θ(n^k log n^k) build | Θ(n^k) table | Approaches entropy; alphabet explodes. |
10. Comparison with Arithmetic Coding and ANS¶
| Property | Huffman | Arithmetic coding | ANS (tANS/rANS) |
|---|---|---|---|
Avg length vs H | [H, H+1) | ≤ H + ε (ε ~ a couple bits total) | ≈ H |
| Codeword granularity | integer bits | fractional (effective) | fractional (effective) |
| Per-symbol decode | table read | multiply/divide + renorm | table read / shift |
| Build cost | Θ(n log n) | model only | Θ(n) table |
| Strength | simple, fast, optimal prefix code | exact, handles any pᵢ | arithmetic-quality at table speed |
| Used in | DEFLATE, JPEG, MP3 | JPEG2000, CABAC (H.264/265) | Zstandard, LZFSE, modern codecs |
The fundamental gap: Huffman is optimal among integer-length prefix codes, but the true optimum is fractional (−log₂ pᵢ). Arithmetic coding and ANS realize fractional lengths by encoding the whole message as a single number (arithmetic) or via state transitions on a normalized frequency table (ANS), eliminating the < H + 1 slack at the cost of a state machine instead of a static table. Huffman remains the default where the slack is acceptable and patent-free simplicity is valued.
10.1 Why Huffman beats Shannon–Fano¶
Shannon–Fano coding (1948–49) splits the sorted symbol set top-down into two groups of as-equal-as-possible probability, recursing. It is near-optimal but not guaranteed minimal: the top-down split can mis-assign a length by a bit. Example distribution p = (0.35, 0.17, 0.17, 0.16, 0.15) — Shannon–Fano produces a code strictly longer than Huffman's. Huffman's bottom-up merging of the two least-likely symbols is what makes it provably optimal (Section 3), and is the reason it superseded Shannon–Fano universally. The lesson: greedy from the bottom (combine cheapest) is optimal here, while greedy from the top (split balanced) is not.
10.2 Optimal alphabetic codes (Hu–Tucker)¶
A plain Huffman code may permute symbols at the leaves arbitrarily. Some applications (searchable codes, order-preserving compression) require the leaves to appear in symbol order left to right — an alphabetic prefix code. The minimum-cost alphabetic code is not produced by Huffman; the Hu–Tucker algorithm (1971) solves it in Θ(n log n) by a more intricate "compatible merge" rule that never crosses the symbol order, then re-levels. Its cost is at most that of Huffman plus a small additive term, and equals Huffman only when the optimal Huffman tree happens to be order-preserving.
11. Open Directions and Summary¶
11.1 Advanced / related directions¶
- Optimal alphabetic codes (Hu–Tucker) — minimum-cost prefix codes that additionally preserve symbol order at the leaves (for searchable codes);
O(n log n)and not solved by plain Huffman. - Optimal codes with unequal letter costs (Varn / Karp) — when output symbols cost different amounts (e.g. Morse dot vs dash); the greedy merge no longer suffices, and it becomes an integer program / LP.
- Adaptive optimality — FGK and Vitter's algorithms maintain an optimal code online under the sibling property, with provable bounds on extra bits vs the static optimum.
- Context modeling — Huffman codes a memoryless model; PPM, CM, and CABAC exploit context, where the model, not the coder, captures the redundancy.
11.2 Historical and complexity placement¶
Huffman (1952) gave the first provably optimal prefix code, superseding Shannon–Fano (which is near-optimal but not guaranteed minimal). Kraft (1949) and McMillan (1956) pinned the length constraint; Shannon (1948) set the entropy floor. Larmore–Hirschberg (1990) solved the length-limited case; Vitter (1987) the adaptive case; Duda (2009) introduced ANS, which now displaces Huffman in many modern codecs for its near-entropy performance at comparable speed.
11.3 Summary¶
Huffman coding is the integer-length-optimal prefix code, proven optimal by an exchange argument (the two lightest symbols can be made deepest siblings) plus optimal substructure (merge then recurse), the textbook greedy correctness pair. The Kraft–McMillan inequality characterizes achievable lengths and shows prefix codes lose nothing to uniquely decodable codes; Shannon entropy H = −Σ pᵢ log₂ pᵢ is the unbeatable floor, and Huffman lands in the tight band H ≤ L < H + 1, with the up-to-1-bit slack arising from integer rounding of the ideal length −log₂ pᵢ. Construction is Θ(n log n) (or Θ(n) on sorted input via two queues), length-limited construction is Θ(n·L) via package-merge, and the residual sub-bit slack is what arithmetic coding and ANS exist to remove.
12. Reference Code (Optimality Verifier + Entropy)¶
A verifier that (1) builds the Huffman code, (2) computes L, H, and asserts H ≤ L < H + 1, (3) checks the Kraft equality Σ 2^{−ℓ} = 1 for a complete code, and (4) brute-forces optimality on tiny alphabets by trying all length vectors satisfying Kraft. All three print the same verdicts.
Go¶
package main
import (
"container/heap"
"fmt"
"math"
"sort"
)
type nd struct {
w, ord int
leaf bool
id int
left, right *nd
}
type pq []*nd
func (p pq) Len() int { return len(p) }
func (p pq) Less(i, j int) bool { if p[i].w != p[j].w { return p[i].w < p[j].w }; return p[i].ord < p[j].ord }
func (p pq) Swap(i, j int) { p[i], p[j] = p[j], p[i] }
func (p *pq) Push(x any) { *p = append(*p, x.(*nd)) }
func (p *pq) Pop() any { o := *p; n := len(o); x := o[n-1]; *p = o[:n-1]; return x }
func lengths(w []int) []int {
n := len(w)
if n == 1 {
return []int{1}
}
h := &pq{}
for i, wi := range w {
heap.Push(h, &nd{w: wi, ord: i, leaf: true, id: i})
}
ord := n
for h.Len() > 1 {
a := heap.Pop(h).(*nd)
b := heap.Pop(h).(*nd)
heap.Push(h, &nd{w: a.w + b.w, ord: ord, left: a, right: b})
ord++
}
L := make([]int, n)
var walk func(x *nd, d int)
walk = func(x *nd, d int) {
if x.leaf {
L[x.id] = d
return
}
walk(x.left, d+1)
walk(x.right, d+1)
}
walk(heap.Pop(h).(*nd), 0)
return L
}
func main() {
w := []int{5, 2, 2, 1, 1} // abracadabra weights
total := 0
for _, x := range w {
total += x
}
L := lengths(w)
avg, kraft, H := 0.0, 0.0, 0.0
for i, li := range L {
p := float64(w[i]) / float64(total)
avg += p * float64(li)
kraft += math.Pow(2, -float64(li))
H += -p * math.Log2(p)
}
fmt.Printf("lengths = %v\n", L)
fmt.Printf("H = %.4f, L = %.4f, H<=L<H+1: %v\n", H, avg, H <= avg+1e-9 && avg < H+1)
fmt.Printf("Kraft sum = %.4f (== 1 for complete code: %v)\n", kraft, math.Abs(kraft-1) < 1e-9)
// Brute force optimality on this tiny alphabet.
best := bruteOptimal(w)
fmt.Printf("Huffman L*total = %d, brute min = %d, optimal: %v\n",
int(avg*float64(total)+0.5), best, int(avg*float64(total)+0.5) == best)
_ = sort.Ints
}
func bruteOptimal(w []int) int {
n := len(w)
best := math.MaxInt32
maxLen := n // codewords never need more than n-1 bits
combo := make([]int, n)
var rec func(i int)
rec = func(i int) {
if i == n {
k := 0.0
cost := 0
for j := 0; j < n; j++ {
k += math.Pow(2, -float64(combo[j]))
cost += w[j] * combo[j]
}
if k <= 1+1e-9 && cost < best {
best = cost
}
return
}
for l := 1; l <= maxLen; l++ {
combo[i] = l
rec(i + 1)
}
}
rec(0)
return best
}
Java¶
import java.util.*;
public class Verifier {
static class N { int w, ord, id; boolean leaf; N left, right;
N(int w, int o, int id, boolean l) { this.w=w; ord=o; this.id=id; leaf=l; } }
static int[] lengths(int[] w) {
int n = w.length;
if (n == 1) return new int[]{1};
PriorityQueue<N> h = new PriorityQueue<>(
(a, b) -> a.w != b.w ? a.w - b.w : a.ord - b.ord);
for (int i = 0; i < n; i++) h.add(new N(w[i], i, i, true));
int ord = n;
while (h.size() > 1) {
N a = h.poll(), b = h.poll();
N m = new N(a.w + b.w, ord++, -1, false); m.left = a; m.right = b; h.add(m);
}
int[] L = new int[n];
walk(h.poll(), 0, L);
return L;
}
static void walk(N x, int d, int[] L) {
if (x.leaf) { L[x.id] = d; return; }
walk(x.left, d + 1, L); walk(x.right, d + 1, L);
}
static int best;
static int bruteOptimal(int[] w) {
best = Integer.MAX_VALUE;
rec(w, new int[w.length], 0);
return best;
}
static void rec(int[] w, int[] combo, int i) {
int n = w.length;
if (i == n) {
double k = 0; int cost = 0;
for (int j = 0; j < n; j++) { k += Math.pow(2, -combo[j]); cost += w[j]*combo[j]; }
if (k <= 1 + 1e-9 && cost < best) best = cost;
return;
}
for (int l = 1; l <= n; l++) { combo[i] = l; rec(w, combo, i + 1); }
}
public static void main(String[] args) {
int[] w = {5, 2, 2, 1, 1};
int total = Arrays.stream(w).sum();
int[] L = lengths(w);
double avg = 0, kraft = 0, H = 0;
for (int i = 0; i < w.length; i++) {
double p = (double) w[i] / total;
avg += p * L[i]; kraft += Math.pow(2, -L[i]); H += -p * (Math.log(p) / Math.log(2));
}
System.out.println("lengths = " + Arrays.toString(L));
System.out.printf("H = %.4f, L = %.4f, H<=L<H+1: %b%n", H, avg, H <= avg + 1e-9 && avg < H + 1);
System.out.printf("Kraft sum = %.4f (==1: %b)%n", kraft, Math.abs(kraft - 1) < 1e-9);
int huff = (int) Math.round(avg * total);
System.out.printf("Huffman cost = %d, brute = %d, optimal: %b%n",
huff, bruteOptimal(w), huff == bruteOptimal(w));
}
}
Python¶
import heapq
import math
from itertools import count, product
def lengths(w):
n = len(w)
if n == 1:
return [1]
tie = count()
heap = [[w[i], next(tie), i, None, None] for i in range(n)]
heapq.heapify(heap)
while len(heap) > 1:
a = heapq.heappop(heap)
b = heapq.heappop(heap)
heapq.heappush(heap, [a[0] + b[0], next(tie), -1, a, b])
L = [0] * n
def walk(node, d):
_, _, idx, left, right = node
if idx != -1:
L[idx] = d
else:
walk(left, d + 1); walk(right, d + 1)
walk(heap[0], 0)
return L
def brute_optimal(w):
n = len(w)
best = math.inf
for combo in product(range(1, n + 1), repeat=n):
if sum(2 ** -l for l in combo) <= 1 + 1e-9:
cost = sum(wi * l for wi, l in zip(w, combo))
best = min(best, cost)
return best
if __name__ == "__main__":
w = [5, 2, 2, 1, 1]
total = sum(w)
L = lengths(w)
avg = sum((wi / total) * li for wi, li in zip(w, L))
kraft = sum(2 ** -li for li in L)
H = -sum((wi / total) * math.log2(wi / total) for wi in w)
print("lengths =", L)
print(f"H = {H:.4f}, L = {avg:.4f}, H<=L<H+1: {H <= avg + 1e-9 < H + 1 + 1e-9}")
print(f"Kraft sum = {kraft:.4f} (== 1: {abs(kraft - 1) < 1e-9})")
huff = round(avg * total)
print(f"Huffman cost = {huff}, brute = {brute_optimal(w)}, optimal: {huff == brute_optimal(w)}")
What it does: builds the Huffman lengths, reports H, L, the Kraft sum, and brute-forces the optimal integer-length code under the Kraft constraint to prove Huffman matched the minimum on a small alphabet. Run: go run main.go | javac Verifier.java && java Verifier | python verifier.py
12.1 Implementation notes¶
- The brute-force optimality check is exponential (
n^nlength vectors) — use it only as a tiny-alphabet regression oracle, exactly as the spanning-tree brute force served Kirchhoff. - The Kraft sum equals
1for a complete Huffman code (every internal node has two children); a sum< 1signals a non-full tree (a bug, since Huffman trees are always full). - Entropy
Huses probabilitieswᵢ / Σwⱼ; assertH ≤ L < H + 1as a correctness invariant in tests. - For sorted weights, replace the heap with the two-queue method to reach
Θ(n)and confirm identical lengths.