Bitmask DP (DP over Subsets) — Mathematical Foundations and Complexity Theory¶
Table of Contents¶
- Formal Definitions
- The Held-Karp Recurrence (Correctness Proof)
- Complexity of Held-Karp: O(2^n · n^2)
- The Subset-Sum Identity Σ C(n,k)·k = n·2^{n−1}
- The Submask Sum: Σ 2^{popcount(mask)} = 3^n
- The Assignment-Problem DP and Its Dimension Collapse
- Counting and Inclusion-Exclusion
- Lower-Bound Context: TSP NP-Hardness and the Held-Karp Conjecture
- Sum-over-Subsets (SOS) Transform and Möbius Inversion
- Profile DP as a Transfer Matrix
- Summary
1. Formal Definitions¶
Let [n] = {0, 1, …, n−1} be the ground set of n items. Subsets of [n] are identified with integers in [0, 2^n) via the characteristic vector: integer S represents the subset {i : bit i of S is 1}.
Definition 1.1 (Bitmask / characteristic integer). For S ⊆ [n], its bitmask is χ(S) = Σ_{i∈S} 2^i ∈ [0, 2^n). The map χ is a bijection between subsets of [n] and [0, 2^n). Set operations correspond to bit operations: union ↔ |, intersection ↔ &, symmetric difference ↔ ^, membership of i ↔ S & (1<<i) ≠ 0, complement (within [n]) ↔ S ^ ((1<<n)−1).
Definition 1.2 (Popcount). popcount(S) = |S|, the number of set bits, equivalently the cardinality of the subset.
Definition 1.3 (Submask). T is a submask of S, written T ⊑ S, iff T & S = T, equivalently T ⊆ S as sets. Each S has exactly 2^{popcount(S)} submasks.
Definition 1.4 (TSP instance). A complete directed graph on [n] with cost matrix dist : [n]×[n] → ℝ_{≥0} ∪ {∞}. A tour is a cyclic permutation visiting every vertex exactly once; its cost is the sum of consecutive dist entries including the closing edge. The TSP asks for a minimum-cost tour.
Definition 1.5 (Held-Karp state). For S ⊆ [n] with 0 ∈ S and last ∈ S, define
dp[S][last] = minimum cost of a path that starts at vertex 0,
visits exactly the vertices in S, and ends at `last`.
min_{last≠0} dp[[n]][last] + dist[last][0]. Remark (walks vs simple paths). Held-Karp computes shortest simple paths (each vertex once) by encoding the visited set S in the state. This is exactly the state the polynomial walk-counting recurrence (sibling topic in 11-graphs) lacks — and it is why TSP is exponential while walk counting is polynomial: the visited-set state has 2^n values.
Notation conventions. Throughout, n = |V| is the ground-set size (mask width), S ⊆ [n] a subset with bitmask χ(S), popcount(S) = |S|, T ⊑ S the submask relation, [P] the Iverson bracket (1 if P holds else 0), p a prime modulus, and ⊎ a disjoint union. We freely identify a subset with its characteristic integer and write mask for either. "Path" means simple path (no repeated vertex) unless stated; "walk" (used only in cross-references) permits repeats. The semiring is (min, +) for optimisation, (+, ×) for counting, and (∨, ∧) for feasibility, with identities (∞, 0), (0, 1), (false, true) respectively — the same identity-per-semiring discipline as the matrix-power topic in 11-graphs/32-paths-fixed-length.
2. The Held-Karp Recurrence (Correctness Proof)¶
Theorem 2.1. For all S ⊆ [n] with 0 ∈ S and last ∈ S,
dp[{0}][0] = 0,
dp[S][last] = min_{prev ∈ S∖{last}, prev≠last} ( dp[S∖{last}][prev] + dist[prev][last] ) for |S| ≥ 2.
0 visiting exactly S and ending at last. Moreover the optimal tour cost is min_{last∈[n]∖{0}} ( dp[[n]][last] + dist[last][0] ). Proof (induction on |S|).
Base case |S| = 1. The only such S with 0 ∈ S is S = {0}, last = 0. The unique path visiting exactly {0} and ending at 0 is the trivial one-vertex path of cost 0. Thus dp[{0}][0] = 0, as defined.
Inductive step. Assume the claim for all sets of size m−1; take |S| = m ≥ 2 and last ∈ S. Any simple path P from 0 visiting exactly S and ending at last has length m−1 edges and visits m distinct vertices. Let prev be the vertex immediately before last on P. Then prev ∈ S∖{last}, and deleting the final edge (prev, last) leaves a simple path P' from 0 visiting exactly S∖{last} and ending at prev. Conversely, any simple path from 0 visiting S∖{last} ending at some prev ∈ S∖{last}, extended by the edge (prev, last), is a simple path visiting exactly S ending at last (it remains simple because last ∉ S∖{last}). This is a bijection between
{ simple paths 0 ⇝ last visiting exactly S }
≅ ⊎_{prev ∈ S∖{last}} { simple paths 0 ⇝ prev visiting exactly S∖{last} } × {edge (prev,last)}.
cost(P) = cost(P') + dist[prev][last]. Minimising over the (disjoint) choices of prev and, within each, over the optimal P' (given by the inductive hypothesis as dp[S∖{last}][prev]), yields which is the recurrence. The disjointness holds because prev is uniquely determined by P (it is the penultimate vertex), so no path is counted under two different prev. Tour. A tour is a path from 0 visiting all of [n] and ending at some last ≠ 0, plus the closing edge (last, 0). Its cost is dp[[n]][last] + dist[last][0]; minimising over last gives the optimum. ∎
Corollary 2.2 (Dependency order). dp[S][·] depends only on dp[S∖{last}][·], and χ(S∖{last}) < χ(S) (clearing a bit strictly decreases the integer). Hence processing masks in increasing numeric order guarantees every right-hand-side state is already computed — the iteration order is correct without any topological sort.
Proof of Corollary 2.2. Clearing bit last from S removes the term 2^{last} > 0 from the integer χ(S), so χ(S∖{last}) = χ(S) − 2^{last} < χ(S). The recurrence reads only dp[S∖{last}][·], whose mask index is strictly smaller. An ascending loop for mask = 0 .. 2^n−1 therefore computes every dependency before its dependent. This replaces an explicit topological sort of the 2^n-vertex dependency DAG with a trivial integer loop — a structural gift of the bitmask encoding: the integer order on masks is a linear extension of the subset-inclusion partial order. ∎
Worked dependency-order example. For n = 3, the masks containing city 0 in ascending order are 001, 011, 101, 111. When the loop reaches 011 = {0,1}, its only dependency 001 = {0} (clearing bit 1) has already been processed (1 < 3). When it reaches 111 = {0,1,2}, its dependencies 011 and 101 (each clearing one bit) are both < 7 and already computed. No state is ever read before it is written — the ascending integer loop is a valid evaluation order. This is why bitmask DP needs no explicit topological sort even though the state-dependency graph has 2^n vertices: subset inclusion T ⊂ S ⟹ χ(T) < χ(S) makes integer order a topological order automatically.
Corollary 2.3 (Path vs tour). Replacing the final step with min_{last} dp[[n]][last] (no closing edge) gives the shortest Hamiltonian path from 0; allowing any start by initialising dp[{s}][s] = 0 for all s gives the shortest Hamiltonian path over all start vertices.
2.1 Worked Verification¶
Take the symmetric instance on [3] = {0,1,2} with dist = [[0,10,15],[10,0,20],[15,20,0]]. Trace the recurrence:
Base: dp[{0}][0] = 0.
Size 2: dp[{0,1}][1] = dp[{0}][0] + dist[0][1] = 0 + 10 = 10.
dp[{0,2}][2] = dp[{0}][0] + dist[0][2] = 0 + 15 = 15.
Size 3: dp[{0,1,2}][1] = dp[{0,2}][2] + dist[2][1] = 15 + 20 = 35.
dp[{0,1,2}][2] = dp[{0,1}][1] + dist[1][2] = 10 + 20 = 30.
Close: via last=1: 35 + dist[1][0] = 35 + 10 = 45.
via last=2: 30 + dist[2][0] = 30 + 15 = 45.
Optimum = 45.
Both tours 0→1→2→0 and 0→2→1→0 cost 45, matching the brute-force enumeration of all (3−1)! = 2 tours. The DP reused dp[{0,2}][2] and dp[{0,1}][1] rather than re-summing whole tours — the source of the speedup.
2.2 The Bijection, Made Explicit¶
The inductive step rests on a disjoint-union bijection over the penultimate vertex prev:
{ simple paths 0 ⇝ last visiting exactly S }
≅ ⊎_{prev ∈ S∖{last}} ( { simple paths 0 ⇝ prev visiting exactly S∖{last} } × { edge (prev,last) } ).
Disjointness: prev is the uniquely determined penultimate vertex of the path, so no path appears under two values of prev. Surjectivity: deleting the last edge of any qualifying path yields a path on the left of one summand. Injectivity: the (prefix, last edge) pair reconstructs the path uniquely. Taking minimum costs (rather than counting cardinalities) and using that + dist[prev][last] is monotone in the prefix cost gives the min-recurrence. The same bijection, with cardinalities instead of costs, gives the counting recurrence (Section 7) — one combinatorial fact, two semiring readings.
3. Complexity of Held-Karp: O(2^n · n^2)¶
Theorem 3.1. The Held-Karp DP runs in Θ(2^n · n^2) time and Θ(2^n · n) space.
Proof. The table has one entry per (S, last) pair with last ∈ S. The number of such pairs is
Θ(2^n · n) cells — the space bound. Each cell dp[S][last] is computed by minimising over prev ∈ S∖{last}, at most n−1 candidates, i.e. O(n) work per cell. Total time Σ_{(S,last)} O(n) = O(n · 2^{n−1} · n) = O(2^n · n^2). The lower bound Ω(2^n · n^2) follows because the dense transition examines Θ(n) predecessors for each of Θ(2^n · n) cells when the graph is complete. ∎ Comparison to brute force. Enumerating all (n−1)! tours costs Θ((n−1)! · n). The ratio (n−1)! / (2^n n^2) grows super-exponentially; e.g. at n = 15, brute force is ~1.2×10^{12} operations versus Held-Karp's ~7.4×10^6 — a factor of ~160 000. Held-Karp converts a factorial into an exponential, which is the entire point.
Worked complexity at small n. The exact table-size identity n·2^{n−1} and transition count n(n−1)2^{n−2} give concrete numbers:
n | cells n·2^{n−1} | push transitions n(n−1)2^{n−2} | (n−1)! tours |
|---|---|---|---|
| 5 | 80 | 240 | 24 |
| 10 | 5 120 | 46 080 | 362 880 |
| 15 | 245 760 | 3 440 640 | ~8.7×10^{10} |
| 18 | 2 359 296 | ~40 M | ~3.6×10^{14} |
| 20 | 10 485 760 | ~199 M | ~1.2×10^{17} |
The push-transition column is the true operation count; it crosses below the factorial column around n = 7 and then the gap widens without bound. Past n = 20 the cells column (memory) is what fails first.
3.1 Pull and Push Are the Same Bound¶
The pull form computes dp[S][last] by reading all dp[S∖{last}][prev]; the push form, from each dp[S][last], writes all dp[S∪{next}][next]. They visit the same set of (smaller-set, larger-set, transition-vertex) triples in different orders. Both are Θ(2^n n^2) by the identities above (n·2^{n−1} cells × Θ(n) work, or n(n−1)2^{n−2} triples directly). The choice is purely about memory-access locality and whether you initialise targets to ∞ (push) or fold predecessors into a fresh cell (pull); the asymptotics are identical. Implementations usually prefer push for TSP (it skips dead source states cheaply) and pull for assignment (the single target cell dp[mask] is written once).
3.2 Optimality Within the State Model¶
No algorithm using the (visited set, current vertex) state can beat Θ(2^n n) cells, because all n·2^{n−1} such cells are reachable and distinct for a complete graph: each corresponds to a genuinely different (set, endpoint) pair with its own optimal cost. Within this state model Θ(2^n n^2) time is therefore essentially optimal; improvements (Björklund's O^*(1.657^n)) come only from changing the state model (algebraic / determinant encodings), not from a smarter fill of the Held-Karp table.
Remark (the push formulation). The implementation in the junior file uses a push: for each (mask, last) it iterates over unvisited next and updates dp[mask|(1<<next)][next]. This visits each (mask, last, next) triple once; the triple count is Σ_S popcount(S)·(n − popcount(S)), still O(2^n n^2), matching the pull bound.
Exact triple count. The push transition count is
UsingΣ_k C(n,k) k (n−k) = n(n−1) 2^{n−2} (differentiate (1+x)^n twice, or count ordered triples (i, j, S) with i ∈ S, j ∉ S: n choices for i, n−1 for j ≠ i, 2^{n−2} for the rest of S), the push form does exactly n(n−1)2^{n−2} = Θ(2^n n^2) constant-work steps. The constant 1/4 (from 2^{n−2} vs 2^n) is why a tuned implementation has a smaller hidden constant than the naive 2^n n^2 bound suggests. 4. The Subset-Sum Identity Σ C(n,k)·k = n·2^{n−1}¶
This identity gives the exact size of the Held-Karp table (Σ_S popcount(S)), so it is worth a clean derivation.
Theorem 4.1. Σ_{S ⊆ [n]} popcount(S) = Σ_{k=0}^{n} C(n,k)·k = n · 2^{n−1}.
Proof 1 (double counting). Count pairs (i, S) with i ∈ S ⊆ [n]. Summing over S first gives Σ_S |S| = Σ_S popcount(S). Summing over i first: fix item i; the subsets containing i are determined by the other n−1 items freely, so there are 2^{n−1} of them. With n choices of i, the total is n · 2^{n−1}. The two counts are equal. ∎
Proof 2 (binomial / calculus). Differentiate (1+x)^n = Σ_k C(n,k) x^k to get n(1+x)^{n−1} = Σ_k C(n,k) k x^{k−1}; set x = 1: n·2^{n−1} = Σ_k C(n,k) k. ∎
Proof 3 (symmetry). Pair each subset S with its complement [n]∖S; the two have sizes summing to n. Averaging, the mean size over all 2^n subsets is n/2, so the total is 2^n · n/2 = n·2^{n−1}. ∎
Consequence. The Held-Karp table has exactly n·2^{n−1} reachable (S, last) cells (those with last ∈ S), confirming the Θ(2^n n) space and feeding the Θ(2^n n^2) time bound of Theorem 3.1. The same identity bounds the work of any "for each subset, for each element of the subset" loop at Θ(n 2^n).
Worked check at n = 3. Σ_S popcount(S) over the 8 subsets of {0,1,2}:
∅:0, {0}:1, {1}:1, {2}:1, {0,1}:2, {0,2}:2, {1,2}:2, {0,1,2}:3
sum = 0+1+1+1+2+2+2+3 = 12 = 3·2^{3−1} = 3·4. ✓
2^{3−1} = 4 subsets, so 3·4 = 12. Both views agree, anchoring Theorem 4.1 concretely. The same number, 12, is the count of reachable (S, last) Held-Karp cells for n = 3 (each subset contributes one cell per element it contains), which is why a careful implementation allocates exactly n·2^{n−1} meaningful cells, not the full n·2^n — half the array is the unreachable last ∉ S region and can be skipped with the mask & (1<<last) guard. 5. The Submask Sum: Σ 2^{popcount(mask)} = 3^n¶
The defining cost of partition / set-cover DPs is the total work of enumerating, for every mask, all of its submasks.
Theorem 5.1. Σ_{S ⊆ [n]} 2^{popcount(S)} = Σ_{S} |{T : T ⊑ S}| = 3^n.
Proof 1 (three states per item). The left side counts pairs (S, T) with T ⊑ S ⊆ [n]. For each item i ∈ [n], exactly one of three mutually exclusive cases holds in the pair (S, T): - i ∉ S (hence i ∉ T), - i ∈ S but i ∉ T, - i ∈ S and i ∈ T. (The fourth combination i ∈ T, i ∉ S is forbidden by T ⊑ S.) The n items choose independently among 3 states, so there are 3^n pairs. ∎
Proof 2 (binomial). Group masks by popcount: there are C(n,k) masks of size k, each with 2^k submasks. Hence
x = 2. ∎ Corollary 5.2. The submask-enumeration DP dp[S] = best over T ⊑ S of f(combine(value(T), dp[S∖T])), with O(1) work per (S, T) pair, runs in Θ(3^n) time. The canonical inner loop for (T = S; T; T = (T-1)&S) visits each nonempty submask once: correctness because (T−1)&S is the largest submask of S strictly below T, so the sequence is strictly decreasing and hits every submask exactly once.
Quantitative gap. 3^n / 2^n = 1.5^n grows without bound, so the submask DP is meaningfully more expensive than a per-mask loop: 3^{18} ≈ 3.9×10^8 (feasible), 3^{20} ≈ 3.5×10^9 (borderline), 3^{22} ≈ 3.1×10^{10} (too slow). The exponent base 3 (not 4) is precisely the content of Theorem 5.1 — a careless analyst who bounds "all masks × all masks" gets 4^n and a wrong ceiling.
5.1 Correctness of the Submask Idiom¶
Claim. Starting from T = S and repeating T ← (T − 1) & S until T = 0 visits every submask of S exactly once, in strictly decreasing order.
Proof. Let T be a current submask with T ⊑ S, T > 0. The integer T − 1 flips T's lowest set bit to 0 and sets all lower bits to 1. Intersecting with S (& S) keeps only the bits that belong to S, producing the largest integer T' with T' ⊑ S and T' < T. Indeed, any submask U ⊑ S with U < T satisfies U ≤ T − 1 and U ⊑ S, so U = U & S ≤ (T−1) & S = T'; thus T' is the immediate predecessor among submasks. The sequence therefore enumerates submasks strictly downward with no gaps, terminating at 0 (handled after the loop if the empty submask is needed). Each submask appears once because the sequence is strictly decreasing. ∎
5.2 Worked Submask Sum at n = 2¶
For n = 2 the (mask, submask) pairs T ⊑ S are:
S=00: T ∈ {00} (1 pair)
S=01: T ∈ {01, 00} (2 pairs)
S=10: T ∈ {10, 00} (2 pairs)
S=11: T ∈ {11, 10, 01, 00} (4 pairs)
total = 1 + 2 + 2 + 4 = 9 = 3^2. ✓
3 × 3 = 9, confirming Theorem 5.1 concretely. 5.3 The Partition DP Cost in Practice¶
A canonical O(3^n) use is the set-partition DP dp[S] = min over T ⊑ S of (cost(T) + dp[S∖T]). The total work is Σ_S 2^{popcount(S)} = 3^n submask visits, provided cost(T) is precomputed in O(2^n) beforehand. A subtle but common mistake is computing cost(T) inside the submask loop in O(|T|), which inflates the bound to Σ_S Σ_{T⊑S} popcount(T). By an analogous three-state count (now weighting the "in-both" state by its contribution), this sum is n·3^{n−1} — a factor n worse. The fix is to precompute all cost(T) once (O(2^n) or O(2^n n) depending on cost), keeping the partition DP at a clean O(3^n). The discipline mirrors the assignment-DP lesson: hoist any per-mask quantity out of the inner loop.
Derivation of Σ_S Σ_{T⊑S} popcount(T) = n·3^{n−1}. Count triples (i, T, S) with i ∈ T ⊑ S. Fix i: it is in both T and S; each other item independently chooses among the three states, giving 3^{n−1}. With n choices of i, the total is n·3^{n−1}. This is the 3^n-analogue of the n·2^{n−1} identity of Theorem 4.1, and it quantifies exactly the penalty of not hoisting cost(T).
6. The Assignment-Problem DP and Its Dimension Collapse¶
Setup. n workers, n jobs, cost c[i][j] for worker i on job j; find a minimum-cost perfect matching. Define
A[mask] = minimum cost of assigning workers 0..(popcount(mask)−1)
to exactly the set of jobs in `mask` (one worker per job).
Theorem 6.1. A[0] = 0 and for mask ≠ 0, with i = popcount(mask) − 1,
A[(1<<n)−1] is the optimal assignment cost. Proof. A matching of workers 0..i to the jobs in mask (where |mask| = i+1) assigns worker i to some job j ∈ mask; deleting that pair leaves a matching of workers 0..i−1 to mask∖{j}. This is a bijection between the assignments and ⊎_{j∈mask} (assignments of 0..i−1 to mask∖{j}), with additive cost c[i][j]. Minimising gives the recurrence; induction on popcount(mask) (using A[0]=0 as the base) establishes correctness. ∎
Why the worker dimension collapses (the key structural fact). A naive formulation would index dp[i][mask] (worker i, jobs mask). But a reachable state always satisfies i = popcount(mask): after assigning i+1 workers we have used exactly i+1 jobs. The worker count is a function of the mask, not an independent coordinate, so the [i] dimension is redundant. Dropping it changes O(n · 2^n) cells to O(2^n) cells and the transition stays O(n), giving O(2^n · n) total — a clean factor-n and n×-memory improvement. This dimension-collapse-by-invariant is a recurring theme: whenever one coordinate is determined by another, eliminate it.
Lemma 6.2 (Reachability invariant). Every mask with A[mask] < ∞ (reachable) is reached only from masks of popcount popcount(mask) − 1, and the transition assigns worker popcount(mask) − 1.
Proof. A[0] = 0 has popcount 0. Inductively, the recurrence sets A[mask] from A[mask∖{j}] by assigning one worker to job j, so popcount(mask) = popcount(mask∖{j}) + 1. Hence the popcount equals the number of workers assigned, and the worker just placed is indexed popcount(mask) − 1. The popcount is thus a derived quantity, justifying its elimination from the state. ∎
Contrast with TSP. The same collapse does not apply to Held-Karp: after visiting a set S, the current vertex last can be any element of S, not a function of S. That genuine freedom is why TSP keeps the [last] dimension (O(2^n n) cells) while assignment does not (O(2^n) cells). Recognising which coordinates are free versus derived is the core of optimal state design.
Relation to the permanent. Replacing min with + and c[i][j] by an indicator M[i][j] counts perfect matchings, i.e. computes the permanent of the 0/1 matrix M. This is Ryser's-formula territory (Section 7): the same O(2^n n) DP computes perm(M), and the permanent is #P-complete in general (Valiant 1979), which is why no polynomial algorithm is expected.
6.1 Worked Assignment Trace¶
For cost = [[9,2,7],[6,4,3],[5,8,1]] the DP fills A[mask] with worker i = popcount(mask):
A[000] = 0 (worker 0 next)
A[001] = A[000] + cost[0][0] = 9 A[010] = 2 A[100] = 7
worker 1 next for popcount-1 masks:
A[011] = min(A[001]+cost[1][1], A[010]+cost[1][0]) = min(9+4, 2+6) = 8
A[101] = min(A[001]+cost[1][2], A[100]+cost[1][0]) = min(9+3, 7+6) = 12
A[110] = min(A[010]+cost[1][2], A[100]+cost[1][1]) = min(2+3, 7+4) = 5
worker 2 next:
A[111] = min(A[011]+cost[2][2], A[101]+cost[2][1], A[110]+cost[2][0])
= min(8+1, 12+8, 5+5) = 9
A[111] = 9 is achieved by worker0→job1 (2), worker1→job2 (3), worker2→job0 (5)? That sums to 10; the DP's 9 comes from worker0→job0(9)? No — trace the argmin: A[111]=8+cost[2][2]=8+1=9 via A[011] = worker0→job1(2)+worker1→job0(6)=8, then worker2→job2(1). So assignment 0→1, 1→0, 2→2 costs 2+6+1 = 9. The brute-force minimum over all 3! = 6 permutations confirms 9 is optimal — a vivid reminder to trust the argmin path, not a hand-guessed assignment. 6.2 Counting Variant on the Same Trace¶
If instead each job is either allowed (1) or forbidden (0) and we count perfect matchings, the same dp[mask] becomes a count: dp[mask] = Σ_{j ∈ mask, allowed[i][j]} dp[mask∖{j}] with i = popcount(mask)−1. For the all-ones 3×3 allowance matrix this yields dp[111] = 3! = 6, the permanent of the all-ones matrix. The optimisation DP (min, returns 9) and the counting DP (Σ, returns 6) share the identical state space and transition structure — only the combine operator differs. This is the assignment-problem instance of the "one bijection, three semirings" theme: the (min, +) reading optimises, the (+, ×) reading counts, and a (∨, ∧) reading would decide feasibility of a perfect matching (Hall's condition), all from the same dp[mask] recurrence.
7. Counting and Inclusion-Exclusion¶
Switching the optimisation semiring (min, +) to the counting semiring (+, ×) (or (+) with indicator transitions) turns the same DPs into counting algorithms.
Hamiltonian path counting. dp[S][last] counts simple paths from a fixed start visiting exactly S ending at last; the recurrence sums predecessors (Section 2 with min → Σ). Total Hamiltonian paths = Σ_{last} dp[[n]][last], in O(2^n n^2). The semiring switch is total: the same penultimate-vertex bijection (Section 2.2) reads as cardinalities (|A ⊎ B| = |A| + |B|, |A × B| = |A|·|B|) instead of minima, so correctness is inherited from the same combinatorial fact — no separate proof needed.
Theorem 7.1 (Inclusion-exclusion for Hamiltonian paths, Karp/Kohn-Gottlieb-Kohn / Bax). The number of Hamiltonian paths can be computed in O(2^n · n) (better than the O(2^n n^2) DP by a factor of n) via inclusion-exclusion over the set of forbidden vertices. Sketch: let walks_k(S) count walks of length n−1 that avoid all vertices outside S (this is a fast matrix/vector recurrence over the induced subgraph). By inclusion-exclusion,
#Hamiltonian = Σ_{S ⊆ [n]} (−1)^{n − |S|} · (number of length-(n−1) walks from s to t staying within S),
n vertices is exactly a Hamiltonian path, and the alternating sum sieves out walks missing at least one vertex. Each term's walk count is computed by a (min n)-step recurrence in O(n) amortised, and there are 2^n subsets S, giving O(2^n n). ∎ (sketch) The permanent and Ryser's formula. For counting perfect matchings (perm of a 0/1 matrix),
O(2^n · n) (or O(2^n) amortised using Gray-code submask updates). This is the counting analogue of the assignment DP and the canonical #P-complete problem. Connection between the two views. The subset DP and the inclusion-exclusion sum are two evaluations of the same generating identity: the DP sums over paths grouped by visited-set; inclusion-exclusion sums over vertex subsets with alternating signs to enforce "all vertices used." Both are 2^n-flavoured; inclusion-exclusion often shaves the n^2 to n by replacing the per-state last bookkeeping with an algebraic sieve.
7.1 Worked Hamiltonian-Path Count¶
On the complete directed graph K_3 (adj[i][j] = 1 for all i ≠ j), the counting DP gives:
Base: dp[{0}][0] = dp[{1}][1] = dp[{2}][2] = 1.
Size 2 (sum predecessors):
dp[{0,1}][1] = dp[{0}][0] = 1 ; dp[{0,1}][0] = dp[{1}][1] = 1 ; (and the other pairs)
Size 3, e.g. ending at 2:
dp[{0,1,2}][2] = dp[{0,1}][0] + dp[{0,1}][1] = 1 + 1 = 2.
By symmetry dp[full][0] = dp[full][1] = dp[full][2] = 2.
Total = 2 + 2 + 2 = 6 = 3!. ✓
3! (every ordering of the three vertices is a valid path in the complete graph), confirming the summation recurrence. 7.2 Ryser's Formula Worked at n = 2¶
For M = [[1,1],[1,1]] (the 2×2 all-ones, whose permanent is 2):
perm(M) = (−1)^2 Σ_{S ⊆ {1,2}} (−1)^{|S|} Π_{i=1}^{2} ( Σ_{j∈S} M[i][j] ).
S = ∅: (−1)^0 · (0)(0) = 0
S = {1}: (−1)^1 · (M[1][1])(M[2][1]) = −(1)(1) = −1
S = {2}: (−1)^1 · (M[1][2])(M[2][2]) = −(1)(1) = −1
S = {1,2}: (−1)^2 · (M[1][1]+M[1][2])(M[2][1]+M[2][2]) = (2)(2) = 4
Σ = 0 − 1 − 1 + 4 = 2 = perm(M). ✓
2^n subsets and O(n) product per subset give the O(2^n n) evaluation; the alternating sign is the inclusion-exclusion sieve removing assignments that miss a column. 8. Lower-Bound Context: TSP NP-Hardness and the Held-Karp Conjecture¶
Theorem 8.1 (TSP NP-hardness). The decision version of TSP ("is there a tour of cost ≤ B?") is NP-complete; the optimisation version is NP-hard.
Proof sketch. Reduce from Hamiltonian Cycle (Karp 1972, one of the original 21 NP-complete problems). Given a graph G, build a complete TSP instance with dist[i][j] = 1 if (i,j) ∈ E(G) and dist[i][j] = 2 otherwise; G has a Hamiltonian cycle iff the TSP optimum is exactly n. The reduction is polynomial, so a polynomial TSP solver would solve Hamiltonian Cycle, hence P = NP. ∎
Consequence for bitmask DP. Since TSP is NP-hard, no polynomial algorithm is expected; the O(2^n n^2) Held-Karp is, up to the poly(n) factor, among the best exact worst-case bounds known.
The Held-Karp lower-bound question (a major open problem). Is there an exact TSP algorithm running in O((2−ε)^n) for some ε > 0? For general TSP this is open and tied to the Strong Exponential Time Hypothesis (SETH) circle of conjectures. For symmetric undirected TSP and Hamiltonicity there are randomized improvements (Björklund 2010: Hamiltonicity in O^*(1.657^n) via algebraic / determinant-of-the-graph-polynomial methods), but no general deterministic (2−ε)^n exact TSP algorithm is known. The practical takeaway: 2^n is, to current knowledge, essentially the exact-TSP barrier, and improvements are subtle and problem-specific.
8.1 The Reduction in Detail¶
Given an undirected graph G = (V, E) with |V| = n, build a complete TSP instance on V with
G uses n edges all of weight 1, giving a tour of cost exactly n. Conversely, any tour has n edges each of weight ≥ 1, so cost ≥ n, with equality iff every tour edge is a G-edge — i.e. iff the tour is a Hamiltonian cycle of G. Thus G is Hamiltonian ⟺ TSP-optimum = n. The construction is O(n^2) time, hence polynomial; a polynomial TSP solver would decide Hamiltonian Cycle in polynomial time, forcing P = NP. This is the metric (triangle-inequality-respecting, since weights 1 and 2 satisfy 2 ≤ 1 + 1) version, so even metric TSP is NP-hard. ∎ 8.2 Why the Bitmask State Is Forced¶
The NP-hardness explains why the DP state must be exponential. The Held-Karp state (visited set, current vertex) is, in a precise sense, the minimal sufficient statistic: to extend a partial tour optimally you must know which vertices remain (to forbid repeats) and where you are (to price the next edge). The "which vertices remain" component cannot be compressed below 2^n distinct values in general, because distinguishing partial tours by their unvisited sets is exactly what a correct algorithm must do — and that is the combinatorial reason no poly(n)-state DP can exist unless P = NP. Contrast walk counting (sibling 11-graphs/32-paths-fixed-length), whose memoryless state is just "current vertex" (n values), placing it in P.
Held-Karp's other meaning. "Held-Karp" also names the Held-Karp lower bound (the 1-tree LP relaxation), a different result by the same authors used inside branch-and-bound to prune. It is a strong, polynomially computable lower bound (within a factor ~2/3 to the optimum empirically), not to be confused with the DP. Both come from Held & Karp (1962–1971).
9. Sum-over-Subsets (SOS) Transform and Möbius Inversion¶
A foundational tool that generalises submask enumeration: the sum-over-subsets (SOS) DP, also called the zeta transform over the subset lattice.
Definition 9.1 (Zeta / SOS transform). Given f : 2^{[n]} → ℝ, its subset-sum transform is
O(3^n) (Theorem 5.1). The SOS DP computes all F[S] in O(2^n · n): Theorem 9.2 (SOS correctness and complexity). The above computes F[S] = Σ_{T ⊑ S} f[T] in Θ(2^n · n) time.
Proof. Loop invariant: after processing bits 0..i−1, F[S] equals the sum of f[T] over all T that agree with S on bits ≥ i and are arbitrary submasks on bits < i. Processing bit i adds, for masks S with bit i set, the contribution from T with bit i cleared, merging the two halves. After all n bits, F[S] sums over all submasks. Each of n passes does O(2^n) work. ∎
Möbius inversion (the inverse transform). The transform is invertible:
computed by the same DP with+= replaced by −=. This subset-lattice Möbius inversion is the algebraic backbone of inclusion-exclusion algorithms (Section 7) and of subset-convolution (the Björklund-Husfeldt-Kaski-Koivisto O(2^n · n^2) algorithm), which computes (f * g)[S] = Σ_{T ⊑ S} f[T] g[S∖T] — the operation behind fast set-partition and graph-colouring DPs. The reduction from a naive O(3^n) submask convolution to O(2^n n^2) via SOS + the "rank/popcount" trick is one of the deepest results in subset DP. Practical note. Many DPs phrased as O(3^n) submask loops can be sped to O(2^n n) or O(2^n n^2) with SOS / subset convolution when the combine operation is a sum or a (min,+) over a bounded additive structure. Recognising this is the difference between n = 18 and n = 22 feasibility.
9.1 Worked SOS Transform at n = 2¶
Let f = [f_{00}, f_{01}, f_{10}, f_{11}] = [1, 2, 3, 4] (indexed by mask). Apply the SOS DP:
Initial F = [1, 2, 3, 4].
Bit 0: for masks with bit 0 set, add the bit-0-cleared sibling.
F[01] += F[00] → 2 + 1 = 3
F[11] += F[10] → 4 + 3 = 7
F = [1, 3, 3, 7].
Bit 1: for masks with bit 1 set, add the bit-1-cleared sibling.
F[10] += F[00] → 3 + 1 = 4
F[11] += F[01] → 7 + 3 = 10
F = [1, 3, 4, 10].
F[11] should be f_{00}+f_{01}+f_{10}+f_{11} = 1+2+3+4 = 10. ✓ F[10] = f_{00}+f_{10} = 1+3 = 4. ✓ Two passes of O(2^n) work, no submask loop — the O(2^n n) zeta transform in action. 9.2 Möbius Inversion Check¶
Inverting F = [1,3,4,10] with −= recovers f:
Bit 1: F[10] −= F[00] → 4−1 = 3 ; F[11] −= F[01] → 10−3 = 7 ⇒ [1,3,3,7]
Bit 0: F[01] −= F[00] → 3−1 = 2 ; F[11] −= F[10] → 7−3 = 4 ⇒ [1,2,3,4] = f. ✓
+=/−= swapped, confirming the subset-lattice zeta/Möbius pair operationally — the algebraic engine behind inclusion-exclusion counting (Section 7). 9.3 Subset Convolution in O(2^n · n^2)¶
Definition 9.3. The subset (covering) convolution of f, g : 2^{[n]} → ℝ is
O(3^n) (sum over submasks of every S). The Björklund-Husfeldt-Kaski-Koivisto (2007) algorithm achieves O(2^n · n^2). The popcount-ranking trick. Define the ranked function f̂_r[S] = f[S]·[popcount(S) = r] for each rank r ∈ {0,…,n}. The key observation: in (f * g)[S], the parts T and S∖T have sizes summing to |S|, so the convolution decomposes by rank as a sum of (n+1)^2 ordinary subset-sum (zeta) products, each computable in O(2^n n). Concretely: 1. Zeta-transform each ranked layer f̂_r and ĝ_r (O(2^n n) each, O(2^n n^2) total). 2. For each S and target rank q = popcount(S), combine Σ_{a+b=q} F̂_a[S]·Ĝ_b[S] in the transformed domain (O(2^n n^2)). 3. Möbius-invert the result rank by rank. The rank constraint |T| + |S∖T| = |S| is what lets the ordinary (overlapping) zeta product be sieved back into the disjoint-cover convolution. This O(2^n n^2) algorithm powers fast graph-colouring and set-partition counting, replacing many O(3^n) DPs — the single deepest result in subset-DP theory.
Gray-code Ryser. Ryser's formula (Section 7) can be evaluated in O(2^n) arithmetic (not O(2^n n)) by visiting subsets in Gray-code order: consecutive subsets differ in one element, so the inner row-sums Σ_{j∈S} M[i][j] update in O(n) amortised O(1) per step via a single add/subtract per row. This is the standard production implementation of the permanent for n ≤ ~28, beyond which even 2^n becomes infeasible — the permanent's #P-completeness asserting itself.
10. Profile DP as a Transfer Matrix¶
Broken-profile / column-by-column grid DP (senior §5) has a clean algebraic form.
Setup. Fill an m × k grid; the state between columns is a profile mask P ∈ [0, 2^m) recording which cells protrude into the next column. Let T[P][P'] = number of ways to fill one column given incoming profile P and producing outgoing profile P'.
Theorem 10.1. The number of tilings is (T^k)[0][0] (start and end with empty profile), where T is the 2^m × 2^m transfer matrix and the product is the counting (+, ×) matrix power.
Proof. Filling proceeds column by column; the ways to go from an all-empty start profile to an all-empty end profile in k columns is, by the matrix-power walk-counting theorem (sibling topic 11-graphs/32-paths-fixed-length), the (0,0) entry of T^k. Each column transition is independent given the incoming profile, so the per-column ways multiply and sum exactly as matrix multiplication prescribes. ∎
Consequence. For fixed width m and huge height k, the tiling count is computable in O((2^m)^3 · log k) by binary exponentiation of the 2^m × 2^m transfer matrix — connecting bitmask profile DP to fast matrix exponentiation. For moderate k, the linear O(k · 2^m · transitions) column sweep is simpler and usually preferred; the transfer-matrix view is what enables the log k speedup when k is astronomically large.
Why width must be small. The transfer matrix is 2^m × 2^m; both the sweep (O(k 2^m) with O(m) transition work) and the matrix power (O(2^{3m} log k)) are exponential in the width m only. This is the formal reason senior §5 insists on choosing the narrow grid dimension as the profile.
10.1 Worked Tiling Count: 2 × k Dominoes¶
For a board of width m = 2, the profile is a 2-bit mask. Let D(k) be the number of domino tilings of a 2 × k board. The classic recurrence (placing a vertical domino in the last column, or two horizontals spanning the last two columns) is
D = 1, 1, 2, 3, 5, 8, … — the Fibonacci numbers. The transfer-matrix view recovers this: the 2^2 × 2^2 = 4×4 column-transition matrix T, restricted to the reachable profiles, has characteristic polynomial whose dominant root is the golden ratio φ, so D(k) ∼ φ^{k}/√5. The general broken-profile DP reproduces D(2)=2, D(3)=3, D(4)=5, and for an m×k board with larger m the count grows like λ_max(T)^k where λ_max(T) is the spectral radius of the width-m transfer matrix — the same λ_max^k growth law as the matrix-power walk theorem. This is the precise sense in which profile DP is fixed-length walk counting on the graph of profiles. 10.2 Why the Profile Graph Is the Right Abstraction¶
Each profile mask is a vertex; T[P][P'] is the number of ways one column transitions from incoming profile P to outgoing profile P'. A full tiling is then a walk of length k on this profile graph from the empty profile back to the empty profile — exactly (T^k)[0][0]. The bitmask (profile) is the state; the column index is the "time"; and the whole apparatus is the transfer-matrix method of analytic combinatorics. Recognising a grid-DP as a walk on the profile graph is what unlocks the O(2^{3m} log k) matrix-exponentiation speedup for astronomically tall boards, and explains why tiling counts satisfy linear recurrences (the characteristic polynomial of T).
10.3 Transfer-Matrix Exponentiation Worked¶
For the 2 × k domino board, the reachable profiles between columns are {empty} and {top-protruding}-style states; reducing to the two essential states gives the 2×2 transfer matrix T = [[1,1],[1,0]] (the Fibonacci matrix). Then the number of tilings of width 2, length k is (T^k)[0][0]-style entry, computed by binary exponentiation:
T^1 = [[1,1],[1,0]]
T^2 = [[2,1],[1,1]]
T^4 = (T^2)^2 = [[5,3],[3,2]]
For k = 5: T^5 = T^4 · T^1 = [[8,5],[5,3]], reading D(5) = 8. ✓
⌊log₂ k⌋ + popcount(k) matrix multiplies are needed, so D(10^{18}) mod a prime is computed in ~60 2×2 multiplies — the O(2^{3m} log k) bound with m = 2. This is the bridge from bitmask profile DP to the matrix-power machinery of 11-graphs/32-paths-fixed-length. 10b. Parameterized and Comparative Landscape¶
Bitmask DP sits in a landscape of exact and parameterized methods. The boundaries are sharp and worth memorising.
| Problem | Best known exact | Parameterized / note |
|---|---|---|
| TSP / shortest Hamiltonian path | O(2^n n^2) Held-Karp | NP-hard; (2−ε)^n open |
| Hamiltonicity (decision) | O^*(1.657^n) (Björklund 2010, randomized) | beats 2^n for the decision version |
| Min-cost assignment | O(2^n n) bitmask, O(n^3) Hungarian | Hungarian polynomial — preferred for large n |
| Count Hamiltonian paths | O(2^n n) inclusion-exclusion | #P-complete |
| Permanent / count matchings | O(2^n) Gray-code Ryser | #P-complete (Valiant 1979) |
| Set partition / colouring | O(2^n n^2) subset convolution | improves naive O(3^n) |
| Graph colouring (chromatic number) | O(2^n n) via inclusion-exclusion | counts proper colourings |
Steiner tree (k terminals) | O(3^k n + 2^k n^2 + ...) Dreyfus-Wagner / DW-with-merge | bitmask over terminal set, not all vertices |
The Steiner-tree subtlety. A frequent and instructive case: the minimum Steiner tree connecting k specified terminals in an n-vertex graph is solved by the Dreyfus-Wagner DP with a bitmask over the k terminals, not all n vertices — dp[S][v] = cheapest tree connecting terminal-subset S with v in the tree. The submask merge step (Σ_{T ⊑ S} dp[T][v] + dp[S∖T][v]) makes it O(3^k n + 2^k n^2). The lesson: choose the ground set whose size is genuinely small — here the terminals, not the vertices — so the 2^{(·)} factor stays tame. Misidentifying the ground set is the single most common modelling error, turning a feasible 2^k into an infeasible 2^n.
Color-coding bridge. For detecting (not optimally weighting) a length-k simple path, color-coding (Alon-Yuster-Zwick 1995) randomly k-colours the vertices and runs a bitmask DP over the 2^k colour-subset — turning a 2^n-vertex-subset DP into a 2^k-colour-subset DP, parameterized by k rather than n. This is the same "shrink the ground set" principle applied via randomization, and it is why path detection parameterized by length is FPT while exact-weight Hamiltonicity is not.
The unifying principle. Every entry in the table above is some bitmask DP whose ground set was chosen as small as the problem permits: all n vertices (Held-Karp, forced because the visited set is intrinsic), the k terminals (Steiner), the k colours (color-coding), the m-cell frontier (profile DP), or the universe of m requirements (set cover). The art of the field is reading a problem and identifying the smallest set whose subsets form a sufficient state — because that set's size is the exponent, and the exponent is destiny. Once the ground set is fixed, the recurrence, the dependency order (ascending masks), and the semiring (min / Σ / ∨) follow mechanically from the templates proved above.
10c. Worked End-to-End: Why n = 20 Is the Wall¶
Combining the identities gives a single, memorable picture of the feasibility ceiling for Held-Karp:
- Cells:
n · 2^{n−1}. Atn = 20:20 · 2^{19} ≈ 1.05 × 10^7cells. - Bytes (8 B each):
≈ 84 MB— fits, but tight with overhead. - Transitions:
n(n−1) 2^{n−2} ≈ 10^8— sub-second in Go/Java, a few seconds in CPython. - At
n = 24: cells≈ 2 × 10^8, bytes≈ 1.6 GB, transitions≈ 1.6 × 10^9— memory becomes the binding constraint. - At
n = 30: cells≈ 1.6 × 10^{10}, bytes≈ 128 GB— infeasible on commodity hardware.
The exponential 2^n term dominates everything: each +1 to n roughly doubles both time and memory. The n^2 polynomial factor is a rounding error by comparison. This is the precise quantitative content of the senior-level advice "n ≤ 20" — not a rule of thumb but a direct reading of n · 2^{n−1} against a few-GB memory budget. The same arithmetic, with the cell count divided by n (the assignment DP's 2^n cells), pushes the ceiling to n ≈ 24; the 3^n partition DP drops it back to n ≈ 18.
11. Summary¶
- Held-Karp recurrence.
dp[S][last]is the cheapest simple path from0visiting exactlySending atlast; proved by a penultimate-vertex bijection (Theorem 2.1). The optimal tour adds the closing edge. Dependency order is automatic because clearing a bit decreases the mask. - Complexity.
Θ(2^n n^2)time,Θ(2^n n)space (Theorem 3.1), the table size beingΣ_S popcount(S) = n·2^{n−1}(Theorem 4.1, proved three ways). This converts the factorial brute force into an exponential. - Submask sum.
Σ_S 2^{popcount(S)} = 3^n(Theorem 5.1), because each item is independently out / in-mask-only / in-both — the exact cost of partition / set-cover submask DPs, and the reason the base is3, not4. - Dimension collapse. The assignment DP needs only
dp[mask]because the current worker= popcount(mask)is determined by the mask; eliminating the redundant coordinate givesO(2^n n)(Theorem 6.1). - Counting and inclusion-exclusion. The same DPs count Hamiltonian paths and perfect matchings under the
(+,×)semiring; inclusion-exclusion (Karp/Bax for paths, Ryser for the permanent) often shavesn^2tonvia an alternating subset sieve. - Hardness. TSP is NP-hard (reduction from Hamiltonian Cycle, Karp 1972);
2^nis essentially the exact-TSP barrier, with only subtle randomized improvements (Björklund'sO^*(1.657^n)Hamiltonicity) known. Whether a deterministic(2−ε)^nexact TSP exists is open. - SOS / Möbius. The subset-lattice zeta transform computes all subset sums in
O(2^n n), its Möbius inverse drives inclusion-exclusion, and subset convolution (O(2^n n^2)) speeds many naiveO(3^n)DPs — the deepest algorithmic content of the area. - Profile DP. Broken-profile grid DP is a transfer-matrix power on a
2^m × 2^mmatrix; for huge height it powers inO(2^{3m} log k), linking subset DP to matrix exponentiation, and forcing the narrow grid dimension to be the mask width.
Complexity Cheat Table¶
| Task | Recurrence / method | Complexity |
|---|---|---|
| Shortest Hamiltonian path / TSP tour | dp[S][last], min | O(2^n · n^2) time, O(2^n · n) space |
| Assignment / min-cost matching | dp[mask], i = popcount(mask) | O(2^n · n) |
| Count Hamiltonian paths (DP) | dp[S][last], Σ | O(2^n · n^2) |
| Count Hamiltonian paths (sieve) | inclusion-exclusion | O(2^n · n) |
| Permanent / count matchings | Ryser's formula | O(2^n · n) |
| Partition / set-cover | submask DP | O(3^n) |
| Sum-over-subsets (zeta) | SOS DP | O(2^n · n) |
| Subset convolution | SOS + popcount rank | O(2^n · n^2) |
Broken-profile grid, width m, height k | transfer-matrix power | O(2^{3m} log k) |
| TSP exact lower bound (state of the art) | open / SETH-conjectured | ~2^n barrier |
Steiner tree, k terminals | Dreyfus-Wagner, bitmask over terminals | O(3^k n + 2^k n^2) |
Detect length-k simple path | color-coding (randomized) | O^*(2^k) |
| Chromatic number / proper colourings | inclusion-exclusion | O(2^n · n) |
Closing Notes¶
- State design is the discipline. Eliminate any coordinate that is a function of another (the assignment dimension collapse, Lemma 6.2); choose the smallest ground set (terminals for Steiner, colours for color-coding) so the
2^{(·)}exponent stays tame. Misidentifying the ground set is the dominant modelling error. - One bijection, three semirings. The penultimate-vertex decomposition (Section 2.2) proves the
min(shortest),Σ(count), and∨(feasibility) variants simultaneously — only the semiring reading changes. This is the same identity-per-semiring structure as matrix-power walk counting in11-graphs/32-paths-fixed-length. - The integer order is a topological sort. Ascending mask order is a linear extension of subset inclusion (Corollary 2.2), so no explicit topological sort of the
2^n-state DAG is ever needed. - Hoist per-mask quantities. Computing
cost(T)inside a submask loop inflatesO(3^n)toO(n·3^{n−1})(Section 5.3); precompute it once. The analogue of then·2^{n−1}table identity for the3^nworld. - Hardness is structural. The visited-set state cannot be compressed below
2^nwithout changing the model (Section 3.2, 8.2); the memoryless walk recurrence (nstates, inP) and the visited-set simple-path recurrence (2^nstates, NP-hard) differ by exactly the "no repeats" constraint.
Foundational references: Held & Karp (1962), A Dynamic Programming Approach to Sequencing Problems; Bellman (1962) for the independent DP discovery; Karp (1972) for NP-completeness; Valiant (1979) for #P-completeness of the permanent; Ryser (1963) for the permanent formula; Yates (1937) / Kennes for the fast zeta (SOS) transform; Björklund-Husfeldt-Kaski-Koivisto (2007) for subset convolution; Björklund (2010) for O^*(1.657^n) Hamiltonicity. Sibling topics: 11-graphs/32-paths-fixed-length (matrix-power walk counting), and other formulations in 13-dynamic-programming.