Sum over Subsets DP — Mathematical Foundations and Complexity Theory¶

Table of Contents¶

Formal Definitions
The Zeta Transform on the Subset Lattice
2.1 Why the Kronecker Factorization Exists
2.2 Standard-Basis vs Transformed-Basis Picture
Correctness of the SOS Recurrence (Induction over Bits)
3.1 Worked Verification of the Invariant
3.2 The Recurrence as a Layer DP
Complexity of the SOS Recurrence
The Möbius Function and Inversion on the Boolean Lattice
5.1 Spectral View of the Elementary Block
5.2 The Möbius Function via the Binomial Transform
Möbius Inversion Correctness (The Inverse Transform)
Inclusion-Exclusion as Möbius Inversion
7.1 Numeric Inclusion-Exclusion Example
7.2 The Convolution Identity for Möbius
The Superset (Up-Zeta) Transform
8.1 The Complement Bijection, Made Precise
8.2 Worked Superset Trace
Set Convolutions and Convolution Theorems
9.1 Worked OR-Convolution
9.2 Why Three Convolutions Need Three Transforms
Yates' Algorithm and the Multidimensional FFT Connection
10.1 Worked Yates Trace
10.2 The Inverse as the Same Yates with Inverted Butterfly
Subset-Sum Convolution and the Ranked-Möbius Algebra
11.1 Worked Disjoint Convolution
11.2 The Ranked Algebra as a Polynomial-in-z Lift
11.5 Modular Correctness of the Transforms
Summary

1. Formal Definitions¶

Let [n] = {0, 1, …, n-1} and identify each subset S ⊆ [n] with its characteristic bitmask mask(S) = Σ_{i∈S} 2^i ∈ {0, …, 2^n - 1}. We freely conflate a subset and its mask, writing i ⊆ j for the containment of the corresponding sets, i.e. (i & j) = i in bitwise terms.

Definition 1.1 (Boolean / subset lattice). The set 2^{[n]} ordered by inclusion ⊆ is a lattice B_n with meet ∧ = ∩ (bitwise AND), join ∨ = ∪ (bitwise OR), bottom ∅ (mask 0), and top [n] (mask 2^n - 1). It is isomorphic to the product of n two-element chains, B_n ≅ C_2^n — the source of the "dimension-by-dimension" structure.

Definition 1.2 (Input function). Let A : 2^{[n]} → R be a function from masks to a commutative ring R (typically ℤ, ℤ_p, or ℤ/2^{64}). We store it as the array A[0 … 2^n - 1].

Definition 1.3 (Down-zeta / Sum over Subsets). The subset-sum (down-zeta) transform ζ_↓ A : 2^{[n]} → R is

(ζ_↓ A)[mask] = Σ_{sub ⊆ mask} A[sub].

We write F = ζ_↓ A. By symmetry, the superset-sum (up-zeta) transform is (ζ_↑ A)[mask] = Σ_{sup ⊇ mask} A[sup].

Definition 1.4 (popcount). |mask| = popcount(mask) is the number of set bits; a mask has exactly 2^{|mask|} submasks. Equivalently, the interval [∅, mask] in B_n is itself a Boolean lattice B_{|mask|}, a self-similarity we use repeatedly (every principal down-set is a smaller Boolean lattice).

Remark. F[∅] = A[∅], and F[[n]] = Σ_{all sub} A[sub] (the grand total), because ∅ is below and [n] above everything. The empty set is a submask of every mask, so A[∅] is a summand of every F[mask] — the lattice's bottom element leaks into all down-sums.

Definition 1.5 (Down-set / order ideal). For a mask m, its down-set ↓m = {sub : sub ⊆ m} is the principal order ideal generated by m; |↓m| = 2^{|m|}. The down-zeta sums A over ↓m. Dually ↑m = {sup : sup ⊇ m} is the principal filter, with |↑m| = 2^{n - |m|}, and the up-zeta sums over ↑m. These two families are the only ones the SOS sweep can aggregate in O(n · 2^n); an arbitrary family has no compatible per-axis factorization.

Definition 1.6 (Graded poset). B_n is graded by popcount: the rank function rk(m) = |m| satisfies rk(∅) = 0, rk([n]) = n, and covers increase rank by exactly 1 (a cover adds one bit). This grading is what Section 11 exploits to upgrade aggregation into disjoint convolution, and the rank levels \binom{n}{k} are the binomial coefficients whose alternating sum drives the Möbius collapse (Section 7.2).

2. The Zeta Transform on the Subset Lattice¶

The zeta transform is a standard construction in the incidence algebra of a locally finite poset (Rota, 1964).

Definition 2.1 (Incidence algebra). For a finite poset (P, ≤), the incidence algebra I(P, R) is the set of functions f : {(x, y) : x ≤ y} → R with convolution (f * g)(x, y) = Σ_{x ≤ z ≤ y} f(x, z) g(z, y). Its identity is the Kronecker delta δ(x, y) = [x = y].

Definition 2.2 (Zeta element). The zeta function of the poset is ζ(x, y) = [x ≤ y] (1 if x ≤ y, else 0). The down-zeta transform of a function A is the action (ζ_↓ A)[y] = Σ_x ζ(x, y) A[x] = Σ_{x ≤ y} A[x], exactly Definition 1.3.

Proposition 2.3. On B_n, the zeta transform is a linear automorphism of R^{2^n} (the free R-module of functions). Its matrix Z in the standard basis has Z[mask][sub] = [sub ⊆ mask], a lower-triangular (under any linear extension of ⊆) 0/1 matrix with unit diagonal, hence invertible over any ring.

Proof. Z is lower-triangular with respect to any total order refining ⊆ (since sub ⊆ mask ⟹ |sub| ≤ |mask|, order by popcount then arbitrarily), and its diagonal entries are Z[mask][mask] = [mask ⊆ mask] = 1. A triangular matrix with unit diagonal over a commutative ring is invertible (its determinant is 1). ∎

The inverse Z^{-1} is the Möbius matrix (Section 5). The entire SOS story is: Z and Z^{-1} both factor as products of n commuting "elementary" transforms, one per bit — and that factorization is the O(n · 2^n) algorithm.

2.1 Why the Kronecker Factorization Exists¶

B_n ≅ C_2^n (Definition 1.1), and the zeta function of a product poset is the tensor product of the factor zeta functions: ζ_{P×Q} = ζ_P ⊗ ζ_Q as operators. The zeta matrix of the single chain C_2 is [[1,0],[1,1]] (Z[0][0]=Z[1][0]=Z[1][1]=1, Z[0][1]=0). Therefore

Z = [[1,0],[1,1]] ⊗ [[1,0],[1,1]] ⊗ ⋯ ⊗ [[1,0],[1,1]]   (n factors).

A Kronecker product of n matrices of size 2 is applied to a vector by Yates' algorithm in n · 2^n operations (Section 10), not by forming the 2^n × 2^n matrix (which would be 4^n). This single algebraic fact — "the zeta of a product is the product of the zetas" — is the whole reason SOS is efficient; everything else is bookkeeping.

2.2 The Standard-Basis vs Transformed-Basis Picture¶

In the standard basis, multiplication of functions is pointwise but OR-convolution is a complicated Σ_{i∨j=m}. In the zeta-transformed basis, the roles swap: OR-convolution becomes pointwise multiplication (Theorem 9.2) while pointwise multiplication becomes complicated. This is the exact analogue of "convolution in the time domain = multiplication in the frequency domain" for the Fourier transform. The zeta transform is the change of basis that diagonalizes the OR-convolution operator, just as the DFT diagonalizes cyclic convolution. The Möbius transform is the inverse change of basis.

3. Correctness of the Dimension-by-Dimension SOS Recurrence (Induction over Bits)¶

We now prove the central algorithmic claim: the in-place bit sweep computes ζ_↓ A.

The algorithm.

F := A                                   # copy
for i := 0 to n-1:                       # dimension / bit i
  for mask := 0 to 2^n - 1:
    if mask has bit i set:
      F[mask] := F[mask] + F[mask without bit i]

Let F^{(i)} denote the array F after the outer iteration for bit i has completed (and F^{(-1)} = A, the initial copy). Write mask = (b_{n-1} … b_1 b_0)_2.

Definition 3.1 (Partial-sum invariant). For −1 ≤ i ≤ n−1, define

P_i[mask] = Σ_{ sub ⊆ mask,  sub agrees with mask on bits i+1, …, n-1 } A[sub].

In words: P_i[mask] sums A[sub] over submasks sub that may differ from mask only within the low bits 0 … i, and must equal mask on the high bits i+1 … n-1.

Lemma 3.2. P_{-1}[mask] = A[mask] and P_{n-1}[mask] = (ζ_↓ A)[mask].

Proof. For i = -1 the constraint "agree on bits 0 … n-1" forces sub = mask, so the sum is the single term A[mask]. For i = n-1 the "agree on bits n … n-1" constraint is vacuous, so sub ranges over all submasks of mask, giving Σ_{sub ⊆ mask} A[sub] = (ζ_↓ A)[mask]. ∎

Theorem 3.3 (SOS correctness). For all −1 ≤ i ≤ n−1 and all masks, F^{(i)}[mask] = P_i[mask]. In particular F^{(n-1)} = ζ_↓ A.

Proof (induction on i).

Base i = -1. F^{(-1)} = A = P_{-1} by Lemma 3.2 and the initial copy.

Inductive step. Assume F^{(i-1)} = P_{i-1}. The outer iteration for bit i updates only masks with bit i set, via F[mask] := F[mask] + F[mask \ {i}] (where mask \ {i} is mask with bit i cleared). We must show the result is P_i, splitting into two cases by bit i of mask.

Case A: bit i of mask is 0. This mask is not written during pass i, so F^{(i)}[mask] = F^{(i-1)}[mask] = P_{i-1}[mask]. We must show P_{i-1}[mask] = P_i[mask]. By definition P_{i-1}[mask] sums over sub agreeing with mask on bits i, i+1, …, n-1; P_i[mask] sums over sub agreeing on bits i+1, …, n-1 (one fewer constraint, on bit i). But since bit i of mask is 0 and sub ⊆ mask, bit i of any submask sub is also forced to 0 — so the bit-i agreement constraint is automatically satisfied and dropping it changes nothing. Hence P_{i-1}[mask] = P_i[mask]. ✓

Case B: bit i of mask is 1. During pass i, F^{(i)}[mask] = F^{(i-1)}[mask] + F^{(i-1)}[mask \ {i}]. Crucially, the read F^{(i-1)}[mask \ {i}] is the value before pass i updates anything, because mask \ {i} has bit i equal to 0 and (Case A) such masks are untouched during pass i; so reading it mid-pass yields its F^{(i-1)} value regardless of mask iteration order. By the inductive hypothesis:

F^{(i)}[mask] = P_{i-1}[mask] + P_{i-1}[mask \ {i}].

Now decompose P_i[mask] (sum over sub agreeing with mask on bits i+1 … n-1, free on bits 0 … i) by the value of bit i of sub:

sub with bit i = 1: these are exactly the submasks agreeing with mask on bits i … n-1 (since bit i of mask is 1), i.e. Σ = P_{i-1}[mask].
sub with bit i = 0: these agree with mask \ {i} on bits i … n-1 (both have bit i = 0, and agree on i+1 … n-1), and range freely on bits 0 … i-1, i.e. Σ = P_{i-1}[mask \ {i}].

Therefore P_i[mask] = P_{i-1}[mask] + P_{i-1}[mask \ {i}] = F^{(i)}[mask]. ✓

Both cases give F^{(i)} = P_i, completing the induction. By Lemma 3.2, F^{(n-1)} = P_{n-1} = ζ_↓ A. ∎

Corollary 3.4 (In-place safety, order-independence). Within pass i, every write target (mask with bit i = 1) reads only mask \ {i} (bit i = 0), which is never written during pass i. Hence there is no read-after-write hazard, the result is independent of the inner-loop iteration order, and the transform is correct fully in place. This also proves the inner loop is data-parallel across masks.

Corollary 3.5 (Commuting elementary transforms). Let D_i be the linear map "add the bit-i-cleared neighbor to each bit-i-set mask" (the single pass i). The proof shows ζ_↓ = D_{n-1} ∘ ⋯ ∘ D_1 ∘ D_0. Each D_i acts only on the bit-i axis and the D_i commute (they touch independent coordinates of the product lattice C_2^n), so the passes may be applied in any order — a fact we use again for the Möbius inverse.

3.1 Worked Verification of the Invariant¶

Take n = 3, A = [a_0, …, a_7] indexed by masks 000 … 111. Trace F^{(i)}[110] (mask = 6) through the invariant P_i:

P_{-1}[110] = a_6 (only sub = 110 agrees with 110 on all bits). ✓ matches F = A.
After bit 0 (P_0[110]): sub may vary on bit 0, must agree on bits 1,2. Since bit 0 of 110 is 0, submasks must also have bit 0 = 0 (Case A), so P_0[110] = a_6 still. The cell is not written during pass 0 because 110 has bit 0 cleared. ✓
After bit 1 (P_1[110]): sub may vary on bits 0,1, agree on bit 2. Bit 1 of 110 is 1, so this is Case B: F^{(1)}[110] = F^{(0)}[110] + F^{(0)}[100] = a_6 + a_4. Submasks of 110 agreeing with it on bit 2: 100, 110 → a_4 + a_6. ✓
After bit 2 (P_2[110]): bit 2 of 110 is 1, Case B: F^{(2)}[110] = F^{(1)}[110] + F^{(1)}[010] = (a_6 + a_4) + (a_2 + a_0). Submasks of 110: 000, 010, 100, 110 → a_0 + a_2 + a_4 + a_6. ✓ equals ζ_↓ A [110].

Each pass exactly enlarges the set of free bits by one, and the read neighbor (100, then 010) is always a mask the pass does not write — the operational content of Corollary 3.4. A full brute-force expansion of all 8 cells against Σ_{sub ⊆ mask} a_{sub} confirms the invariant for every mask.

3.2 The Recurrence as a Layer DP¶

An equivalent non-in-place phrasing makes the layer structure explicit. Define F_i[mask] = sum over submasks differing only in bits < i:

F_0[mask] = A[mask]
F_{i+1}[mask] = F_i[mask] + [bit i set in mask] · F_i[mask without bit i]

Then F_n = ζ_↓ A. This is a textbook DP over n+1 layers, each of size 2^n, with an O(1) transition — O(n · 2^n). The in-place algorithm of Section 3 is the observation that layer i+1 can overwrite layer i because the only cells it reads (mask without bit i, bit i cleared) are exactly the cells it does not write in that layer (Corollary 3.4). The two formulations compute identical values; the in-place one saves a factor of n in memory.

4. Complexity of the SOS Recurrence¶

Theorem 4.1. The SOS recurrence runs in Θ(n · 2^n) ring operations and Θ(2^n) space.

Proof. The outer loop executes n times. For each i, the inner loop visits all 2^n masks; each visit performs one bit test and, for the 2^{n-1} masks with bit i set, one ring addition. So pass i costs Θ(2^n) and the total is Σ_{i=0}^{n-1} Θ(2^n) = Θ(n · 2^n). Space is one array of 2^n ring elements (in place), Θ(2^n). ∎

Comparison with naive submask enumeration.

Corollary 4.1b (Space-time tradeoff). The non-in-place layer formulation (Section 3.2) uses Θ(2^n) extra space for one scratch layer but is otherwise identical in time; the in-place version eliminates that scratch. Neither can drop below Θ(2^n) space because the output alone has 2^n entries. There is no o(2^n)-space algorithm for the dense transform — you must at least hold the answer.

Proposition 4.2. Summing over submasks independently for every mask costs Σ_{mask} 2^{|mask|} = 3^n ring additions.

Proof. Σ_{mask=0}^{2^n-1} 2^{popcount(mask)} = Σ_{k=0}^{n} \binom{n}{k} 2^k = (1 + 2)^n = 3^n by the binomial theorem (choosing k set bits, each mask with k set bits has 2^k submasks). ∎

Thus SOS improves the exponential base from 3 to 2 (with an n factor), the asymptotic ratio being 3^n / (n 2^n) = (3/2)^n / n → ∞. The improvement comes entirely from the factorization ζ_↓ = ∏ D_i (Corollary 3.5): the naive method recomputes each D_i's contribution redundantly, whereas the sweep applies each elementary transform once.

Counting the redundancy precisely. In the naive method, the entry A[sub] is added once into F[mask] for each mask ⊇ sub, i.e. 2^{n-|sub|} times across the whole computation — and each of those additions is performed independently. In the sweep, A[sub] flows into all 2^{n-|sub|} of its supersets through a binary-tree of n - |sub| doublings (one per bit not in sub), so each value participates in only O(n) additions on its path, not 2^{n-|sub|}. Summing the path lengths over all values recovers the O(n · 2^n) total. This "share the partial sums along the lattice edges instead of recomputing them per target" is the same algorithmic move as turning a naive prefix-sum-per-query into a single linear scan.

Lower bound (informal). Any algorithm that reads all 2^n inputs and writes all 2^n outputs needs Ω(2^n). The n factor reflects the n-fold "depth" of the lattice (the longest chain ∅ ⊂ … ⊂ [n] has length n); under the natural "each output is a sum of inputs reached by adding one bit at a time" circuit model, Θ(n 2^n) additions are optimal, matching the algorithm.

Remark (why not Strassen-style sub-n factor?). One might hope to shave the n factor as fast matrix multiplication shaves the cube. The zeta matrix Z is 2^n × 2^n, so a generic multiply is O(4^n) — far worse. The O(n · 2^n) comes precisely from the tensor-product (Kronecker) factorization Z = ⊗_{i} [[1,0],[1,1]] (Section 10): applying a Kronecker product of n matrices of size 2 costs n · 2^n, not 4^n. There is no known way to beat the n factor for the dense zeta transform, and the chain-depth argument suggests n · 2^n is essentially tight for the additive-circuit model. The genuinely faster regime is when A is sparse (few nonzero masks), where one can sometimes avoid touching every cell.

5. The Möbius Function and Inversion on the Boolean Lattice¶

Definition 5.1 (Möbius function). For a locally finite poset, the Möbius function μ(x, y) (defined for x ≤ y) is the inverse of ζ in the incidence algebra: Σ_{x ≤ z ≤ y} μ(x, z) ζ(z, y) = δ(x, y). Equivalently it is defined recursively by μ(x, x) = 1 and μ(x, y) = − Σ_{x ≤ z < y} μ(x, z) for x < y.

Theorem 5.2 (Möbius function of B_n). For sub ⊆ mask in the Boolean lattice,

μ(sub, mask) = (-1)^{|mask| - |sub|}.

Proof. B_n is the product of n copies of the 2-element chain C_2 = {0 < 1}. The Möbius function of a product poset is the product of the factor Möbius functions: μ_{P×Q}((x,x'),(y,y')) = μ_P(x,y) μ_Q(x',y'). For the chain C_2, μ(0,0)=μ(1,1)=1 and μ(0,1) = −μ(0,0) = −1. Taking the product over the n bit-coordinates: a bit where sub and mask agree contributes μ = 1; a bit where they differ (necessarily sub-bit 0, mask-bit 1, since sub ⊆ mask) contributes μ = −1. There are exactly |mask| − |sub| differing bits, so μ(sub, mask) = (−1)^{|mask|−|sub|}. ∎

Theorem 5.3 (Möbius inversion formula). For functions on B_n,

F[mask] = Σ_{sub ⊆ mask} A[sub]      ⟺      A[mask] = Σ_{sub ⊆ mask} (-1)^{|mask|-|sub|} F[sub].

Proof. This is the general Möbius inversion theorem (Rota): since μ = ζ^{-1} in the incidence algebra, F = ζ_↓ A ⟺ A = μ_↓ F, where (μ_↓ F)[mask] = Σ_{sub ⊆ mask} μ(sub, mask) F[sub]. Substituting Theorem 5.2 gives the stated alternating sum. Concretely, Σ_{sub ⊆ mask} (-1)^{|mask|-|sub|} Σ_{t ⊆ sub} A[t] = Σ_{t ⊆ mask} A[t] Σ_{t ⊆ sub ⊆ mask} (-1)^{|mask|-|sub|}, and the inner sum is Σ_{sub' ⊆ (mask \ t)} (-1)^{|mask\t| - |sub'|} (reindex sub' = sub \ t) = [t = mask] by the binomial identity Σ_k \binom{m}{k}(-1)^{m-k} = 0 for m > 0 and = 1 for m = 0. Hence the double sum collapses to A[mask]. ∎

5.1 Spectral View of the Elementary Block¶

Each axis block D = [[1,0],[1,1]] has characteristic polynomial (λ-1)^2, so its only eigenvalue is 1 (with algebraic multiplicity 2 but geometric multiplicity 1 — D is a single Jordan block). Consequently Z = ⊗_i D has every eigenvalue equal to 1: Z is unipotent (Z - I is nilpotent). This re-proves invertibility (det Z = 1) and explains why the inverse is itself a finite integer matrix with no eigenvalue issues — unlike the WHT, whose block [[1,1],[1,-1]] has eigenvalues ±√2 and whose inverse therefore carries the 1/2^n normalization. The unipotence of Z is the structural reason the zeta/Möbius pair works over ℤ with no denominators, whereas Fourier-type transforms require division.

5.2 The Möbius Function via the Binomial Transform¶

Restricting attention to ranks, the down-zeta acts on the rank-generating data as the binomial transform: if a_k = Σ_{|sub|=k, sub ⊆ mask} A[sub] then F[mask] = Σ_k a_k is a plain sum, but the signed recovery weights \binom{|mask|-|t|}{·} are binomial coefficients, and μ(t, mask) = (-1)^{|mask|-|t|} is the alternating binomial kernel. This is why the Möbius inverse of B_n is "the binomial transform per chain": along each maximal chain ∅ ⊂ … ⊂ [n], zeta is the partial-sum operator and Möbius is the finite-difference operator, and the (-1)^{Δ} sign is the discrete derivative's alternating coefficient.

6. Möbius Inversion Correctness (The Inverse Transform)¶

The algorithmic Möbius inverse is the SOS sweep with subtraction:

G := F
for i := 0 to n-1:
  for mask := 0 to 2^n - 1:
    if mask has bit i set:
      G[mask] := G[mask] - G[mask without bit i]

Theorem 6.1. If F = ζ_↓ A, then the subtraction sweep produces G = A.

Proof. Recall ζ_↓ = D_{n-1} ∘ ⋯ ∘ D_0 where D_i adds the bit-i-cleared neighbor (Corollary 3.5). Each D_i acts on the single bit-i axis as the 2×2 block [[1,0],[1,1]] (the bit-i-set row gets the bit-i-cleared row added). Its inverse on that axis is [[1,0],[-1,1]] — "subtract the bit-i-cleared neighbor" — call it D_i^{-1}. Because the D_i commute (independent axes), ζ_↓^{-1} = (D_{n-1} ∘ ⋯ ∘ D_0)^{-1} = D_0^{-1} ∘ ⋯ ∘ D_{n-1}^{-1}, and since they commute this equals D_{n-1}^{-1} ∘ ⋯ ∘ D_0^{-1} in any order — in particular the natural i = 0, 1, …, n-1 order of the sweep. Applying ζ_↓^{-1} to F = ζ_↓ A yields A. ∎

Corollary 6.2. The subtraction sweep computes exactly the alternating-sign Möbius sum of Theorem 5.3, but in O(n · 2^n) rather than the O(3^n) of the explicit signed double sum. The per-axis [[1,0],[-1,1]] blocks multiply out to the global sign (-1)^{|mask|-|sub|} on each (sub, mask) pair — the bit-product structure of Theorem 5.2 is precisely the commuting-block factorization.

Remark (invertibility over any ring). The block [[1,0],[1,1]] has determinant 1, so it is invertible over any commutative ring, including ℤ, ℤ_p for any p (prime or not), and ℤ/2^{64}. No division is needed — only subtraction — which is why the SOS Möbius inverse works mod a power of two where modular inverses of arbitrary elements do not exist. (Contrast the Walsh-Hadamard inverse, which divides by 2^n and so needs 2 invertible.)

6.1 Worked Möbius Round Trip (`n = 2`)¶

Start A = [3, 5, 2, 7]. Down-zeta: F = [3, 8, 5, 17] (F[01]=3+5, F[10]=3+2, F[11]=3+5+2+7). Now Möbius (subtract bit-cleared neighbor, bit 0 then bit 1):

bit 0:  F[01] -= F[00] → 8-3=5 ;  F[11] -= F[10] → 17-5=12 ;  F=[3,5,5,12]
bit 1:  F[10] -= F[00] → 5-3=2 ;  F[11] -= F[01] → 12-5=7  ;  F=[3,5,2,7]

Recovers A = [3, 5, 2, 7] exactly. The two passes apply D_0^{-1} then D_1^{-1}; because they commute, the reverse order works too. This round trip is the single most valuable runtime test (Section 9 of the senior file): a sign error surfaces immediately as a failure to recover A.

7. Inclusion-Exclusion as Möbius Inversion¶

Proposition 7.1. The principle of inclusion-exclusion is Möbius inversion on the Boolean lattice.

Proof / illustration. Let B_i be events (or sets) and define, for a mask S ⊆ [n], the "at least these" quantity g[S] = |⋂_{i ∈ S} B_i| (intersection over the indexed events, with g[∅] = size of the universe). The "exactly these" quantity is f[S] = |{x : x ∈ B_i ⟺ i ∈ S}|. Then g[S] = Σ_{T ⊇ S} f[T] (an up-zeta), and inclusion-exclusion recovers f from g by the up-Möbius: f[S] = Σ_{T ⊇ S} (-1)^{|T|-|S|} g[T]. The classic union formula |⋃ B_i| = Σ_{∅ ≠ S} (-1)^{|S|+1} |⋂_{i∈S} B_i| is the S = ∅ instance after rearrangement. ∎

Consequence. Whenever an inclusion-exclusion must be evaluated for every mask (not just one), the SOS Möbius transform computes all 2^n signed sums in O(n · 2^n), replacing the O(3^n) naive signed double sum. This is the algorithmic payoff of recognizing IE as lattice Möbius inversion: many 2^n · poly(n) algorithms (graph coloring, counting Hamiltonian paths, set cover counting) hinge on exactly this speedup.

Worked instance (covering). The number of ways to cover [n] by exactly choosing some sets from a family, counted via "ways using only sets inside T" = h[T], satisfies "ways covering exactly T" = Σ_{S ⊆ T} (-1)^{|T|-|S|} h[S] — a down-Möbius — giving covers of the full universe at T = [n] directly from one SOS pass over h.

7.1 Numeric Inclusion-Exclusion Example¶

Count integers in [1, 30] divisible by none of {2, 3, 5} (bits 0,1,2). Let A[mask] = ⌊30 / (product of primes in mask)⌋ (count divisible by all primes in mask):

A[000]=30  A[001]=15 (÷2)  A[010]=10 (÷3)  A[100]=6 (÷5)
A[011]=5 (÷6)  A[101]=3 (÷10)  A[110]=2 (÷15)  A[111]=1 (÷30)

The count divisible by none is the full-mask Möbius value Σ_{S} (-1)^{|S|} A[S] = 30 - (15+10+6) + (5+3+2) - 1 = 30 - 31 + 10 - 1 = 8. The eight survivors are {1,7,11,13,17,19,23,29}. A superset-Möbius sweep over the array A produces this signed sum at mask 000 (and the analogous "divisible by none of the primes outside T" counts at every other mask) in one O(n · 2^n) pass — the lattice realization of the sieve.

7.2 The Convolution Identity for Möbius¶

The Möbius function satisfies the defining relation Σ_{sub ⊆ t ⊆ mask} μ(sub, t) ζ(t, mask) = δ(sub, mask), i.e. μ * ζ = δ in the incidence algebra. For B_n this is Σ_{sub ⊆ t ⊆ mask} (-1)^{|t|-|sub|} = [sub = mask]. Setting sub = ∅ and mask = [n]: Σ_{t} (-1)^{|t|} = Σ_{k=0}^{n} \binom{n}{k}(-1)^k = (1-1)^n = 0 for n > 0, the algebraic core of "the empty intersection cancels". This identity is what Theorem 5.3's collapse relied on, and it is the reason inclusion-exclusion telescopes to a clean answer.

8. The Superset (Up-Zeta) Transform¶

Definition 8.1. (ζ_↑ A)[mask] = Σ_{sup ⊇ mask} A[sup].

Theorem 8.2. ζ_↑ is computed by the sweep that, for each bit i, adds the bit-i-set neighbor to each bit-i-cleared mask:

for i: for mask: if bit i clear: F[mask] += F[mask | (1<<i)]

and its inverse subtracts the same neighbor.

Proof. ζ_↑ is the down-zeta of the dual lattice (reverse all containments), which is again a Boolean lattice B_n via complementation mask ↦ ¬mask. Formally, (ζ_↑ A)[mask] = (ζ_↓ A')[¬mask] where A'[m] = A[¬m], because sup ⊇ mask ⟺ ¬sup ⊆ ¬mask. Applying Theorem 3.3 to A' and translating the bit operations through complementation turns "set-bit pulls from cleared neighbor" into "cleared-bit pulls from set neighbor". The same commuting-block argument (Theorem 6.1) gives the subtraction inverse. ∎

Möbius for supersets. μ_↑(mask, sup) = (-1)^{|sup|-|mask|}, by the identical product-lattice computation on the dual.

8.1 The Complement Bijection, Made Precise¶

Let c(m) = m ⊕ (2^n - 1) be bitwise complement (an involution, c(c(m)) = m). Define (P A)[m] = A[c(m)] (the permutation that reverses every bit). Then:

ζ_↑ = P ∘ ζ_↓ ∘ P,

i.e. complement, run the subset transform, complement back. Proof. (P ζ_↓ P A)[m] = (ζ_↓ P A)[c(m)] = Σ_{sub ⊆ c(m)} (PA)[sub] = Σ_{sub ⊆ c(m)} A[c(sub)]. Substituting sup = c(sub), the condition sub ⊆ c(m) becomes c(sup) ⊆ c(m), i.e. sup ⊇ m (complement reverses containment). So the sum is Σ_{sup ⊇ m} A[sup] = (ζ_↑ A)[m]. ∎ Since P is an involution and P^{-1} = P, this conjugation also instantly gives ζ_↑^{-1} = P ∘ ζ_↓^{-1} ∘ P, deriving the superset Möbius from the subset one for free.

8.2 Worked Superset Trace (`n = 2`)¶

A = [a_{00}, a_{01}, a_{10}, a_{11}]. Up-zeta sweep (clear-bit masks pull from the set-bit neighbor). Axis 0: masks 00, 10 (bit 0 clear) absorb 01, 11:

after axis 0:  [a_{00}+a_{01}, a_{01}, a_{10}+a_{11}, a_{11}]

Axis 1: masks 00, 01 (bit 1 clear) absorb 10, 11:

after axis 1:  [(a_{00}+a_{01})+(a_{10}+a_{11}),  a_{01}+a_{11},  a_{10}+a_{11},  a_{11}]

So G[00] = Σ A (supersets of ∅ = everything), G[01] = a_{01} + a_{11} (supersets of 01: 01, 11), G[11] = a_{11} (only itself). Matches Definition 8.1 at every mask.

9. Set Convolutions and Convolution Theorems¶

Definition 9.1 (OR / AND / XOR convolution). For A, B : 2^{[n]} → R,

(A ⊛_∨ B)[m] = Σ_{i ∨ j = m} A[i] B[j],
(A ⊛_∧ B)[m] = Σ_{i ∧ j = m} A[i] B[j],
(A ⊛_⊕ B)[m] = Σ_{i ⊕ j = m} A[i] B[j].

Theorem 9.2 (OR convolution theorem). ζ_↓(A ⊛_∨ B) = (ζ_↓ A) ⊙ (ζ_↓ B) (pointwise product). Hence A ⊛_∨ B = ζ_↓^{-1}( ζ_↓A ⊙ ζ_↓B ).

Proof. For any mask m,

(ζ_↓ A)[m] · (ζ_↓ B)[m] = (Σ_{i ⊆ m} A[i])(Σ_{j ⊆ m} B[j]) = Σ_{i ⊆ m, j ⊆ m} A[i] B[j].

The condition i ⊆ m ∧ j ⊆ m is equivalent to (i ∨ j) ⊆ m. Reindexing by t = i ∨ j,

= Σ_{t ⊆ m} ( Σ_{i ∨ j = t} A[i] B[j] ) = Σ_{t ⊆ m} (A ⊛_∨ B)[t] = (ζ_↓(A ⊛_∨ B))[m].

Applying ζ_↓^{-1} (Möbius) to both sides recovers A ⊛_∨ B. ∎

Theorem 9.3 (AND convolution theorem). ζ_↑(A ⊛_∧ B) = (ζ_↑ A) ⊙ (ζ_↑ B). (Identical proof on the dual: i ⊇ m ∧ j ⊇ m ⟺ (i ∧ j) ⊇ m.)

Theorem 9.4 (XOR convolution / Walsh-Hadamard). Let H be the Walsh-Hadamard transform: (H A)[m] = Σ_{k} (-1)^{|m ∧ k|} A[k]. Then H(A ⊛_⊕ B) = (H A) ⊙ (H B), and H^{-1} = 2^{-n} H.

Proof. (HA)[m](HB)[m] = Σ_{i,j} (-1)^{|m∧i| + |m∧j|} A[i]B[j]. Since (-1)^{|m∧i|+|m∧j|} = (-1)^{|m ∧ (i ⊕ j)|} (because |m∧i| + |m∧j| ≡ |m ∧ (i⊕j)| (mod 2): bits where both i,j meet m cancel in parity), reindex t = i ⊕ j to get Σ_t (-1)^{|m∧t|} (A ⊛_⊕ B)[t] = (H(A ⊛_⊕ B))[m]. The transform H is its own inverse up to 2^n because H^2 = 2^n I (the rows of H are orthogonal with norm 2^n). ∎

Each of ζ_↓, ζ_↑, H is computed by a bit sweep with a fixed 2×2 "butterfly", so each convolution is O(n · 2^n).

9.1 Worked OR-Convolution¶

Take n = 1, A = [a_0, a_1], B = [b_0, b_1]. By definition (A ⊛_∨ B)[0] = a_0 b_0 (only 0 ∨ 0 = 0) and (A ⊛_∨ B)[1] = a_0 b_1 + a_1 b_0 + a_1 b_1. Via the transform:

ζ_↓ A = [a_0, a_0 + a_1]          ζ_↓ B = [b_0, b_0 + b_1]
pointwise product P = [a_0 b_0, (a_0+a_1)(b_0+b_1)]
ζ_↓^{-1} P:  C[0] = P[0] = a_0 b_0
             C[1] = P[1] - P[0] = (a_0+a_1)(b_0+b_1) - a_0 b_0
                  = a_0 b_1 + a_1 b_0 + a_1 b_1.   ✓

The Möbius subtraction P[1] - P[0] is exactly the inclusion-exclusion that removes the 0 ∨ 0 term double-counted by the product of zetas. This three-line derivation generalizes verbatim to any n by Theorem 9.2.

9.2 Why the Three Convolutions Need Three Different Transforms¶

The transform that diagonalizes a set convolution is the character table of the underlying combining operation. OR/AND make 2^{[n]} a (join/meet) semilattice, whose "characters" are the indicators [mask ⊆ ·] / [mask ⊇ ·] — giving the zeta transforms. XOR makes {0,1}^n the group (ℤ/2)^n, whose characters are m ↦ (-1)^{|m∧k|} — giving Walsh-Hadamard. There is no single transform diagonalizing all three because they impose different algebraic structures on the same index set; the unifying fact is only that all three character systems factor as n-fold tensor products (Section 10), hence all three transforms are Yates sweeps.

10. Yates' Algorithm and the Multidimensional FFT Connection¶

Definition 10.1 (Yates' algorithm). Given an n-fold tensor-product linear map M = M_1 ⊗ M_2 ⊗ ⋯ ⊗ M_n (each M_k a 2×2 matrix acting on bit-axis k), Yates' algorithm applies M to a length-2^n vector in O(n · 2^n) by sweeping each axis and applying the corresponding M_k to all 2^{n-1} aligned pairs.

Theorem 10.2. The down-zeta, up-zeta, and Walsh-Hadamard transforms are all instances of Yates' algorithm with the per-axis matrices

ζ_↓ :  [[1,0],[1,1]]       ζ_↑ :  [[1,1],[0,1]]       H :  [[1,1],[1,-1]].

Proof. Section 3's D_i is precisely the application of [[1,0],[1,1]] to axis i (the bit-i-set output = bit-i-set input + bit-i-cleared input; bit-i-cleared output unchanged). The full transform is the composition over axes = the tensor product ⊗_i M_i. The same holds for ζ_↑ with [[1,1],[0,1]] and for H with [[1,1],[1,-1]]. Yates' sweep applies exactly these 2×2 blocks axis by axis, which is the SOS loop. ∎

Connection to multidimensional FFT. The DFT of size 2^n over a vector indexed by {0,1}^n factors as the tensor product of n size-2 DFTs — and the size-2 DFT matrix is [[1,1],[1,-1]], identical to the Walsh-Hadamard butterfly. Thus the WHT is the DFT over the group (ℤ/2)^n, and the FFT-style Yates factorization is the same O(n · 2^n) "apply each axis once" recursion. The zeta/Möbius transforms are the analogous "group-algebra-of-the-lattice" transforms: the OR-convolution is the product in the subset (join-)semilattice algebra, diagonalized by ζ_↓, exactly as the cyclic convolution is the product in the group algebra ℂ[ℤ_N], diagonalized by the DFT. SOS DP, AND/OR convolution, XOR convolution, and the FFT are one family: Yates' multidimensional transform with a per-axis butterfly chosen to diagonalize the relevant convolution.

10.1 Worked Yates Trace (`n = 2`, down-zeta)¶

Apply Z = [[1,0],[1,1]] ⊗ [[1,0],[1,1]] to A = [a_{00}, a_{01}, a_{10}, a_{11}] axis by axis. Axis 0 (low bit): pair (00,01) and (10,11), replace (x, y) → (x, x+y):

after axis 0:  [a_{00}, a_{00}+a_{01}, a_{10}, a_{10}+a_{11}]

Axis 1 (high bit): pair (00,10) and (01,11), replace (x, y) → (x, x+y):

after axis 1:  [a_{00},
                a_{00}+a_{01},
                a_{00}+a_{10},
                (a_{00}+a_{01})+(a_{10}+a_{11})]

The last entry is a_{00}+a_{01}+a_{10}+a_{11} = sum over all submasks of 11, and each entry is the submask sum of its index. The full transform was 2 = n axis passes, each touching all 4 = 2^n cells via 2 = 2^{n-1} butterflies — exactly n · 2^{n-1} butterfly applications, Θ(n 2^n). Writing out the 4×4 matrix Z and multiplying Z A directly gives the identical vector, confirming the Kronecker factorization.

10.2 The Inverse as the Same Yates with Inverted Butterfly¶

Z^{-1} = ([[1,0],[1,1]])^{-1} ⊗ ⋯ = [[1,0],[-1,1]] ⊗ ⋯ because (M_1 ⊗ M_2)^{-1} = M_1^{-1} ⊗ M_2^{-1}. So the Möbius inverse is the same Yates sweep with the butterfly (x, y) → (x, y - x). This makes the forward/inverse pair structurally identical — one sign flip — which is why a single parameterized routine implements both, and why an accidental sign flip is the most common SOS bug (it computes the inverse where the forward was wanted, or vice versa).

11. Subset-Sum Convolution and the Ranked-Möbius Algebra¶

The OR-convolution counts pairs (i, j) with i ∨ j = m including overlapping pairs (i ∧ j ≠ ∅). The subset-sum convolution restricts to disjoint pairs.

Definition 11.1 (Subset-sum / disjoint convolution). (A ◇ B)[m] = Σ_{i ∨ j = m, i ∧ j = ∅} A[i] B[j] = Σ_{i ⊆ m} A[i] B[m \ i].

The ranked-Möbius theorem (Björklund-Husfeldt-Kaski-Koivisto, 2007). Disjointness i ∧ j = ∅ together with i ∨ j = m is equivalent to i ∨ j = m and |i| + |j| = |m|. Track popcount as an extra graded dimension:

Algorithm 11.2. Define Â[r][m] = A[m] if |m| = r else 0 (and likewise B̂). Apply ζ_↓ to each rank slice Â[r][·]. Form Ĉ[r][m] = Σ_{p+q=r} (ζ_↓Â[p])[m] · (ζ_↓B̂[q])[m]. Apply the Möbius ζ_↓^{-1} to each Ĉ[r][·]. Then (A ◇ B)[m] = Ĉ[|m|][m].

Theorem 11.3 (Correctness and complexity). Algorithm 11.2 computes A ◇ B correctly in O(n^2 · 2^n) time and O(n · 2^n) space.

Proof (correctness). After the rank-graded zeta and pointwise rank-convolution, Ĉ[r][m] = Σ_{p+q=r} Σ_{i ⊆ m, |i|=p} Σ_{j ⊆ m, |j|=q} A[i]B[j] = Σ_{i,j ⊆ m, |i|+|j|=r} A[i]B[j] — the OR-convolution restricted to total popcount r. The Möbius on rank r then extracts, at mask m, the pairs with i ∨ j = m and |i|+|j| = r. Selecting r = |m| enforces |i| + |j| = |m|, which combined with i ∨ j = m forces i ∧ j = ∅ (overlap would make |i|+|j| > |i∨j| = |m|). Hence Ĉ[|m|][m] = (A ◇ B)[m]. (Complexity) There are n+1 rank slices; each zeta/Möbius is O(n · 2^n), giving O(n^2 · 2^n); the rank-convolution of the pointwise products is O(n^2) per mask, O(n^2 · 2^n) total. ∎

Significance. This O(n^2 · 2^n) disjoint convolution is the engine behind fast covering/partition counting — e.g. counting set partitions, graph colorings, and the O(2^n n^2) chromatic-polynomial and Steiner-tree DPs. It generalizes SOS from "aggregate over containment" to "combine disjoint pieces", and the grading by popcount is the precise algebraic device that turns the non-disjoint OR-convolution into the disjoint one.

11.1 Worked Disjoint Convolution (`n = 2`)¶

Let A = [a_{00}, a_{01}, a_{10}, a_{11}], B = [b_{00}, b_{01}, b_{10}, b_{11}]. The disjoint convolution at m = 11 is Σ_{i|j=11, i&j=0} A[i]B[j] — the partitions of {0,1} into an ordered pair:

(i,j) ∈ {(00,11),(11,00),(01,10),(10,01)}
C[11] = a_{00}b_{11} + a_{11}b_{00} + a_{01}b_{10} + a_{10}b_{01}.

The pairs (01,01) (overlap on bit 0) and (11,11) (overlap on both) are excluded — exactly what plain OR-convolution would wrongly include. The popcount grading enforces this: at m = 11, |m| = 2, so only pairs with |i| + |j| = 2 survive, and |i|+|j| = |i∨j| = 2 forces i & j = ∅. Reading rank 2 of the Möbius-inverted graded product at mask 11 yields precisely the four-term C[11] above.

11.2 The Ranked Algebra Is a Polynomial-in-`z` Lift¶

A cleaner way to see Algorithm 11.2: attach a formal variable z tracking popcount, lifting each A[m] to A[m] · z^{|m|}. The OR-convolution in this lifted ring multiplies the z-powers, so a term survives the disjointness filter iff its z-degree equals |m|. Truncating each cell's polynomial at degree n (only n+1 coefficients matter) gives the n+1 rank slices, and the rank-convolution is polynomial multiplication mod z^{n+1}. This z-lift is the conceptual origin of the extra n factor: each cell now carries a degree-≤ n polynomial instead of a scalar, and the zeta/Möbius act coefficient-wise. The "Fourier meets Möbius" result is exactly the statement that this lifted product is computed by graded zeta/Möbius transforms.

11.5 Modular Correctness of the Transforms¶

The transforms are linear over the chosen ring R, so all results carry verbatim from ℤ to ℤ_p (or ℤ/2^{64}) by a ring homomorphism argument.

Proposition 11.5.1. Let φ : ℤ → R be a ring homomorphism (e.g. reduction mod p). Then φ commutes with the zeta, Möbius, and Yates transforms applied entrywise: φ(ζ_↓ A) = ζ_↓(φ A), and likewise for μ and H.

Proof. Each transform is a fixed integer linear combination of input entries (entries of the 0/±1 matrices Z, Z^{-1}, H). A ring homomorphism preserves sums and integer scalar multiples, so applying φ before or after the linear map gives the same result. ∎

Corollary 11.5.2. Running the entire SOS pipeline in ℤ_p (reduce after each +/-) yields exactly (ζ_↓ A) mod p. Hence the CRT pipeline — run independently mod p_1, …, p_m, recombine — recovers the exact integer transform whenever it fits below ∏ p_k. The Möbius inverse needs no division (Section 6 remark), so it is valid mod any p, prime or not, and mod 2^{64}.

Caveat (Walsh-Hadamard). The inverse WHT divides by 2^n. Over ℤ_p with odd p, replace the division by multiplication by (2^n)^{-1} mod p; over ℤ/2^{64} the division by 2^n is not invertible (2 is a zero divisor), so XOR-convolution mod a power of two must be handled by clearing the 2^n factor differently (e.g. carry exact integers if they fit, or use an odd modulus). This is the one place where the "any ring" comfort of zeta/Möbius does not extend to the WHT.

12. Summary¶

Zeta transform. (ζ_↓ A)[m] = Σ_{sub ⊆ m} A[sub] is the down-zeta of the Boolean lattice B_n, a linear automorphism whose matrix is unit-lower-triangular (hence invertible over any ring).
SOS correctness (Section 3). The bit sweep computes ζ_↓ by maintaining the invariant F^{(i)}[m] = Σ_{sub ⊆ m, sub ≡ m on bits > i} A[sub], proved by induction over bits; the key is that within pass i the read neighbor (bit i cleared) is never written, so in-place updates are hazard-free and order-independent.
Complexity (Section 4). Θ(n · 2^n) operations, Θ(2^n) space, versus 3^n for naive submask enumeration (Σ_m 2^{|m|} = 3^n). The improvement is the factorization ζ_↓ = ∏_i D_i into n commuting elementary axis transforms.
Möbius (Sections 5–6). μ(sub, m) = (-1)^{|m|-|sub|} (product of chain Möbius functions). The Möbius inverse is the same sweep with subtraction; correctness follows because each axis block [[1,0],[1,1]] inverts to [[1,0],[-1,1]] and the blocks commute. Invertibility needs no division, so it holds over ℤ, ℤ_p, and ℤ/2^{64}.
Inclusion-exclusion (Section 7) is exactly Möbius inversion on B_n; SOS evaluates all 2^n IE sums in O(n · 2^n) rather than O(3^n).
Superset transform (Section 8) is the down-zeta of the dual lattice via complementation.
Convolution theorems (Section 9). ζ_↓ diagonalizes OR-convolution, ζ_↑ diagonalizes AND-convolution, and the Walsh-Hadamard H diagonalizes XOR-convolution; each is O(n · 2^n).
Yates / FFT (Section 10). Down-zeta, up-zeta, and WHT are Yates' algorithm with per-axis matrices [[1,0],[1,1]], [[1,1],[0,1]], [[1,1],[1,-1]]; the WHT is the DFT over (ℤ/2)^n, placing SOS, set convolutions, and the FFT in one tensor-product family.
Subset-sum convolution (Section 11). Grading by popcount turns OR-convolution into the disjoint subset-sum convolution in O(n^2 · 2^n) — the basis of fast partition/covering DPs.

Complexity Cheat Table¶

Task	Transform	Complexity
Subset sum (zeta)	`ζ_↓` Yates `[[1,0],[1,1]]`	`O(n · 2^n)`
Superset sum (up-zeta)	`ζ_↑` Yates `[[1,1],[0,1]]`	`O(n · 2^n)`
Recover `A` (Möbius)	`ζ_↓^{-1}` (subtract)	`O(n · 2^n)`
All `2^n` inclusion-exclusion sums	Möbius	`O(n · 2^n)`
OR-convolution	`ζ_↓`, ⊙, `ζ_↓^{-1}`	`O(n · 2^n)`
AND-convolution	`ζ_↑`, ⊙, `ζ_↑^{-1}`	`O(n · 2^n)`
XOR-convolution	`H`, ⊙, `2^{-n}H`	`O(n · 2^n)`
Disjoint subset-sum convolution	ranked `ζ_↓`	`O(n^2 · 2^n)`
Naive submask sum (all masks)	none	`O(3^n)`
Min/max over submasks	min/max sweep (no inverse)	`O(n · 2^n)`
Zeta over a `q`-ary product lattice	`q`-prefix Yates	`O(n · q · q^n)`
Divisor-lattice GCD/LCM convolution	number-theoretic Möbius	`O(N log log N)` (sieve form)

Transform-Algebra Cheat Table¶

Transform	Per-axis butterfly	Eigenvalues	Inverse	Diagonalizes
Down-zeta `ζ_↓`	`[[1,0],[1,1]]`	`1, 1` (unipotent)	subtract neighbor	OR-convolution
Up-zeta `ζ_↑`	`[[1,1],[0,1]]`	`1, 1` (unipotent)	subtract neighbor	AND-convolution
Walsh-Hadamard `H`	`[[1,1],[1,-1]]`	`±√2`	`2^{-n} H`	XOR-convolution
DFT (size 2)	`[[1,1],[1,-1]]`	`±√2`	`2^{-n} \overline{H}`	cyclic conv. over `(ℤ/2)^n`

Closing Notes¶

Incidence-algebra view (Sections 2, 5): ζ and μ = ζ^{-1} are inverse elements of the incidence algebra of B_n; SOS is their efficient evaluation via the product-lattice factorization. The transform is the change of basis that turns OR-convolution into pointwise multiplication (Section 2.2), exactly as the Fourier transform does for cyclic convolution.
Commuting butterflies (Sections 3, 6): the n per-axis maps commute because they act on independent coordinates of C_2^n, which is why the sweep is order-independent, in-place, and trivially invertible. The zeta block is unipotent (all eigenvalues 1), so the inverse is an integer matrix with no denominators — the structural reason the pair works over any ring (Section 5.1).
One algorithm, many transforms (Section 10): zeta, Möbius, AND/OR/XOR convolution, and the FFT are all Yates' multidimensional transform with different butterflies — the deepest unifying theorem of the topic. The choice of butterfly is dictated by the character system of the combining operation (Section 9.2): semilattice characters for OR/AND, group characters for XOR.
Grading (Section 11): adding a popcount dimension upgrades aggregation-over-containment to disjoint combination, the algebraic heart of 2^n poly(n) covering algorithms. The grading is a polynomial-in-z lift (Section 11.2) where z tracks rank, and the extra n factor is the cost of carrying a degree-≤ n polynomial per cell.
Modular safety (Section 11.5): zeta and Möbius commute with any ring homomorphism, so the pipeline runs verbatim mod p or 2^{64}; only the WHT's 1/2^n normalization breaks over a power-of-two modulus.

Foundational references: Rota (1964), On the foundations of combinatorial theory I: theory of Möbius functions (incidence algebra and lattice Möbius functions); Yates (1937), The design and analysis of factorial experiments (the original multidimensional "Yates' algorithm"); Björklund-Husfeldt-Kaski-Koivisto (2007), Fourier meets Möbius: fast subset convolution (the O(n^2 2^n) disjoint convolution and ranked-Möbius algebra); Stanley, Enumerative Combinatorics Vol. 1, Ch. 3 (incidence algebras, Möbius inversion, the product theorem for μ); Knuth, TAOCP Vol. 4A (bitwise tricks and subset enumeration); the cp-algorithms.com "Sum over Subsets" and "Submask enumeration" articles, and the usaxena95 Codeforces SOS DP tutorial. The Walsh-Hadamard / (ℤ/2)^n Fourier perspective is standard in coding theory and analysis of Boolean functions (O'Donnell, Analysis of Boolean Functions). Sibling topics: 12-bitmask-dp and the prefix-sum / DP-on-subsets material in 13-dynamic-programming.

A Note on Generalizations Beyond `B_n`¶

The same machinery applies to any finite product of chains, not just C_2^n. If the universe has n "slots" each taking one of q values, the index space is [q]^n and the zeta transform of the product lattice is computed by a Yates sweep with the q × q lower-triangular all-ones block (cost O(n · q · q^n) if done per-axis with the q-prefix-sum). The Boolean case q = 2 is the most common because subsets are ubiquitous, but the divisor lattice (each prime's exponent is a chain) and the multiset lattice are the same theory — Möbius inversion there is the number-theoretic Möbius function and the "GCD/LCM convolutions" are the divisor-lattice analogues of AND/OR convolutions. Recognizing SOS as "zeta on a product of chains" is what lets you transfer it to these settings.

One-Paragraph Recap¶

The Sum-over-Subsets transform is the down-zeta of the Boolean lattice B_n ≅ C_2^n. Because zeta factors as the n-fold Kronecker product of the single-chain block [[1,0],[1,1]], it is evaluated by Yates' algorithm — one in-place bit sweep per dimension — in Θ(n · 2^n), beating the 3^n naive submask enumeration by sharing partial sums along lattice edges. The block is unipotent, so the inverse (Möbius) transform is the same sweep with subtraction, valid over any ring, and it realizes inclusion-exclusion. Changing the per-axis butterfly to [[1,1],[0,1]] gives the superset transform (AND-convolution), and to [[1,1],[1,-1]] gives the Walsh-Hadamard transform (XOR-convolution); grading by popcount lifts aggregation to the disjoint subset-sum convolution at an extra n factor. SOS DP, set convolutions, and the FFT are one tensor-product family.

The proofs in this document are deliberately layered so each can be cited independently: the induction over bits (Section 3) is the algorithmic correctness; the incidence-algebra / product-lattice argument (Sections 2, 5) is the structural why; the convolution theorems (Section 9) are the applications; and the Yates / FFT unification (Section 10) is the conceptual summit. A reader who internalizes only "SOS is the multidimensional prefix sum, and its inverse is finite differences with alternating signs" already holds the load-bearing intuition; the formalism above justifies every operational rule the senior file relies on.