Binomial Coefficients C(n, k) — Middle Level¶

Focus: Why the modular method works and when to choose each approach. We go deep on computing C(n, k) mod p via precomputed factorials and modular inverses, the linear inverse-factorial trick, and the core identities — Pascal's identity, symmetry, the hockey-stick identity, and Vandermonde's identity — that turn brute-force counting into closed forms. We also map out when O(n²) DP beats the factorial method and vice versa.

Introduction¶

At junior level you learned three routes to C(n, k): Pascal's triangle, the multiplicative formula, and factorials-plus-inverse mod a prime. At middle level you turn those routes into engineering decisions and prove to yourself why they are correct:

Why does fact[n] · invFact[k] · invFact[n−k] (mod p) give the right answer, and why must p be prime?
How do you precompute all inverse factorials with a single modular inverse instead of n of them?
When does the O(n²) Pascal DP beat the O(N)-precompute factorial method, and when is it the reverse?
Which identity — Pascal, symmetry, hockey-stick, Vandermonde — collapses a counting problem into a formula?
What breaks when the modulus is not prime or n is enormous, and which sibling topic (24-lucas-theorem, 33-factorial-mod-p) takes over?

These are the questions that separate "I can call comb()" from "I can choose the right method, size the precompute, and prove the answer."

Deeper Concepts¶

Why the modular factorial method needs a prime¶

The closed form is C(n, k) = n! / (k! (n−k)!). Under a modulus there is no "÷"; you multiply by a modular inverse. The inverse of a mod m is the x with a·x ≡ 1 (mod m), and it exists iff gcd(a, m) = 1. When m = p is prime, every a in 1 … p−1 is coprime to p, so every inverse exists and Fermat's little theorem gives it cheaply: a⁻¹ ≡ a^(p−2) (mod p) (sibling 05-fermat-euler).

If p is composite, some factor of k! or (n−k)! may share a factor with p, the inverse may not exist, and the formula collapses. That is exactly why a non-prime modulus pushes you to Lucas' theorem (prime modulus, huge n) or factorial-mod-prime-power + CRT (33-factorial-mod-p, 06-crt). The prime requirement is not a convenience — it is what makes the inverse well-defined for the whole range 1 … p−1.

Why the inverse-factorial recurrence works¶

Computing n separate inverses (invFact[i] = pow(fact[i], p−2, p)) costs O(n log p). There is a much better way using one inverse:

invFact[n] = (n!)⁻¹ mod p          # one Fermat inverse
invFact[i−1] = invFact[i] · i mod p

Why is the recurrence valid? Because (i−1)! = i! / i, so (i−1)!⁻¹ = (i!⁻¹) · i under the modulus. Concretely: invFact[i] = (i!)⁻¹, and multiplying by i cancels the i factor in i!, leaving ((i−1)!)⁻¹. This turns O(n log p) into O(n + log p) — one inverse plus n cheap multiplies. It is the single most important efficiency idea in modular combinatorics.

Pascal's identity — the DP recurrence¶

C(n, k) = C(n−1, k−1) + C(n−1, k),    with C(n,0)=C(n,n)=1.

This is a textbook DP recurrence: each value depends on two values in the previous row, computed with addition only. The cost to fill rows 0 … n is 1 + 2 + … + (n+1) = O(n²), and it works without any division or modular inverse — even modulo a composite number, because addition is always well-defined. That property (no division) is what makes the DP the right tool when p is not prime but n is small enough for an O(n²) table.

Multiplicative formula — integer exactness¶

C(n, k) = ∏_{i=1}^{k} (n − k + i) / i

The non-obvious fact: the partial product after step i equals C(n−k+i, i), which is always an integer, so the running division by i is always exact. This is why you must do result * (n−k+i) / i in that order — multiply first, then divide — never result / i * (n−k+i).

Comparison with Alternatives¶

Method selection¶

Method	Time	Space	Modulus	Best when
Pascal DP (full table)	O(n²)	O(n²)	any (prime or not)	Need all `C(i,j)` for `i,j ≤ n`; `n ≲ 5000`.
Pascal DP (one row)	O(n²) total	O(n)	any	Need one full row `n`.
Multiplicative formula	O(min(k,n−k))	O(1)	none (exact/BigInt)	A few values, no modulus.
Factorials + inverse (mod p)	O(N) setup, O(1)/query	O(N)	prime	Many queries mod a prime, large `n`.
Single C mod p (no table)	O(k + log p)	O(1)	prime	One query, no room for a table.
Lucas' theorem	O(p + log_p n)	O(p)	prime, huge n	`n` exceeds precompute range (sibling `24`).
Factorial mod p^e + CRT	varies	varies	prime power / composite	Non-prime modulus (sibling `33`, `06`).

Pascal DP vs factorial method — the decision¶

Criterion	Pascal DP	Factorial + inverse
Modulus must be prime?	No (addition only)	Yes
Cost to set up	O(n²)	O(n)
Cost per query	O(1) (lookup)	O(1) (lookup)
Memory	O(n²) full / O(n) one-row	O(n)
Max practical `n`	~5,000 (table)	~10⁷–10⁸
Needs modular inverse?	No	Yes

Choose Pascal DP when: n is small (≲ 5000), you need many (n,k) pairs, or the modulus is composite (no inverse needed). Choose factorials + inverse when: n is large (≥ 10⁴), the modulus is a prime, and you have many queries — the O(N) setup crushes O(n²).

Advanced Patterns¶

Pattern: Linear inverse-factorial precompute¶

Go¶

package main

import "fmt"

const MOD = 1_000_000_007

var fact, invFact []int64

func powMod(a, e, m int64) int64 {
    a %= m
    r := int64(1)
    for e > 0 {
        if e&1 == 1 {
            r = r * a % m
        }
        a = a * a % m
        e >>= 1
    }
    return r
}

func build(n int) {
    fact = make([]int64, n+1)
    invFact = make([]int64, n+1)
    fact[0] = 1
    for i := 1; i <= n; i++ {
        fact[i] = fact[i-1] * int64(i) % MOD
    }
    invFact[n] = powMod(fact[n], MOD-2, MOD) // ONE inverse
    for i := n; i > 0; i-- {
        invFact[i-1] = invFact[i] * int64(i) % MOD
    }
}

func C(n, k int) int64 {
    if k < 0 || k > n {
        return 0
    }
    return fact[n] * invFact[k] % MOD * invFact[n-k] % MOD
}

func main() {
    build(1_000_000)
    fmt.Println(C(5, 2))              // 10
    fmt.Println(C(1_000_000, 3))      // big mod p
}

Java¶

public class CombMod {
    static final long MOD = 1_000_000_007L;
    static long[] fact, invFact;

    static long powMod(long a, long e, long m) {
        a %= m; long r = 1;
        while (e > 0) {
            if ((e & 1) == 1) r = r * a % m;
            a = a * a % m;
            e >>= 1;
        }
        return r;
    }

    static void build(int n) {
        fact = new long[n + 1];
        invFact = new long[n + 1];
        fact[0] = 1;
        for (int i = 1; i <= n; i++) fact[i] = fact[i - 1] * i % MOD;
        invFact[n] = powMod(fact[n], MOD - 2, MOD);   // ONE inverse
        for (int i = n; i > 0; i--) invFact[i - 1] = invFact[i] * i % MOD;
    }

    static long C(int n, int k) {
        if (k < 0 || k > n) return 0;
        return fact[n] * invFact[k] % MOD * invFact[n - k] % MOD;
    }

    public static void main(String[] args) {
        build(1_000_000);
        System.out.println(C(5, 2));            // 10
        System.out.println(C(1_000_000, 3));    // big mod p
    }
}

Python¶

MOD = 1_000_000_007

def build(n):
    fact = [1] * (n + 1)
    for i in range(1, n + 1):
        fact[i] = fact[i - 1] * i % MOD
    inv_fact = [1] * (n + 1)
    inv_fact[n] = pow(fact[n], MOD - 2, MOD)   # ONE inverse
    for i in range(n, 0, -1):
        inv_fact[i - 1] = inv_fact[i] * i % MOD
    return fact, inv_fact


def C(n, k, fact, inv_fact):
    if k < 0 or k > n:
        return 0
    return fact[n] * inv_fact[k] % MOD * inv_fact[n - k] % MOD


if __name__ == "__main__":
    fact, inv_fact = build(1_000_000)
    print(C(5, 2, fact, inv_fact))         # 10
    print(C(1_000_000, 3, fact, inv_fact)) # big mod p

Pattern: One-row Pascal in O(n) space (composite modulus safe)¶

def pascal_row_mod(n, mod):
    """Row n of Pascal's triangle mod any modulus (no inverse needed)."""
    row = [1] * (n + 1)
    for i in range(1, n + 1):
        for k in range(i - 1, 0, -1):
            row[k] = (row[k] + row[k - 1]) % mod   # addition only
    return row

Because this uses only addition, it works for any modulus — prime or composite — making it the fallback when the factorial method's inverse does not exist.

Pattern: Single C(n, k) mod p without a table¶

def comb_single(n, k, mod):
    """One C(n,k) mod prime p in O(k + log p), no precompute."""
    if k < 0 or k > n:
        return 0
    k = min(k, n - k)
    num = 1
    for i in range(k):
        num = num * (n - i) % mod        # n·(n-1)···(n-k+1)
    den = 1
    for i in range(1, k + 1):
        den = den * i % mod              # k!
    return num * pow(den, mod - 2, mod) % mod   # one Fermat inverse

The Identities Toolbox¶

These four identities turn many counting problems into closed forms. Each is proved formally in professional.md; here we state, illustrate, and use them.

1. Pascal's identity (recurrence + sum rule)¶

C(n, k) = C(n−1, k−1) + C(n−1, k)

Combinatorial proof: fix one element. Subsets of size k either contain it (C(n−1,k−1) ways) or do not (C(n−1,k) ways). This is the engine of the triangle.

2. Symmetry¶

C(n, k) = C(n, n − k)

Choosing k to keep ⟺ choosing n−k to discard. Use it to compute with the smaller index.

3. Hockey-stick identity (column sums)¶

∑_{i=r}^{n} C(i, r) = C(n + 1, r + 1)

Sum a "diagonal" of the triangle starting at C(r, r) going down, and you land on the entry one step down-and-right — drawing a hockey-stick shape. Example with r = 2:

C(2,2) + C(3,2) + C(4,2) + C(5,2) = 1 + 3 + 6 + 10 = 20 = C(6, 3).

Use: collapses a running sum of binomials into a single binomial — common in counting "at least"/cumulative problems.

4. Vandermonde's identity (convolution)¶

C(m + n, r) = ∑_{k=0}^{r} C(m, k) · C(n, r − k)

Combinatorial proof: to pick r from a group of m reds plus n blues, choose k reds and r−k blues, summed over all splits k. Example m = n = 2, r = 2:

C(4, 2) = C(2,0)C(2,2) + C(2,1)C(2,1) + C(2,2)C(2,0)
        = 1·1 + 2·2 + 1·1 = 6.   ✓

Use: convolutions of binomials, polynomial coefficient products, and as a stepping stone to generating functions (sibling 22-polynomial-operations).

Other handy facts¶

Identity	Statement	Use
Row sum	`∑_{k=0}^{n} C(n,k) = 2ⁿ`	Total number of subsets of an `n`-set.
Alternating sum	`∑_{k=0}^{n} (−1)^k C(n,k) = 0` (`n ≥ 1`)	Inclusion-exclusion (sibling `26`).
Absorption	`k·C(n,k) = n·C(n−1,k−1)`	Re-index sums; derive expected values.
Binomial theorem	`(x+y)ⁿ = ∑_k C(n,k) xᵏ y^{n−k}`	Coefficient extraction; why "binomial."

Modular C(n,k): The Complete Recipe¶

Putting it together, here is the canonical contest/production setup:

GIVEN: prime modulus p (e.g. 10^9 + 7), max n = N, many queries.

ONE-TIME PRECOMPUTE  (O(N)):
  fact[0] = 1
  for i in 1..N:  fact[i] = fact[i-1] * i mod p
  invFact[N] = fact[N]^(p-2) mod p            # single Fermat inverse
  for i in N..1:  invFact[i-1] = invFact[i] * i mod p

PER QUERY  (O(1)):
  if k < 0 or k > n: return 0
  return fact[n] * invFact[k] % p * invFact[n-k] % p

Why each line: - fact[i] accumulates i! mod p with one multiply per step. - The single pow(fact[N], p−2, p) is the only O(log p) inverse you ever compute. - The downward sweep fills every invFact[i] using (i−1)!⁻¹ = i!⁻¹ · i. - The query divides n! by k! and (n−k)! via their precomputed inverses.

This is O(N) setup, O(1) per query, and correct for any 0 ≤ k, n ≤ N because the boundary guard handles out-of-range k.

graph LR A["fact[0..N] in O(N)"] --> B["invFact[N] = fact[N]^(p-2)<br/>ONE inverse"] B --> C["invFact[i-1]=invFact[i]*i<br/>sweep down O(N)"] C --> D["query: fact[n]*invFact[k]*invFact[n-k]<br/>O(1)"]

Code Examples¶

Hockey-stick and Vandermonde verified against brute force¶

Python¶

import math

def hockey_stick(n, r):
    """Verify sum_{i=r}^{n} C(i, r) == C(n+1, r+1)."""
    lhs = sum(math.comb(i, r) for i in range(r, n + 1))
    rhs = math.comb(n + 1, r + 1)
    return lhs, rhs


def vandermonde(m, n, r):
    """Verify C(m+n, r) == sum_k C(m, k) C(n, r-k)."""
    lhs = math.comb(m + n, r)
    rhs = sum(math.comb(m, k) * math.comb(n, r - k) for k in range(r + 1))
    return lhs, rhs


if __name__ == "__main__":
    print(hockey_stick(5, 2))   # (20, 20)
    print(vandermonde(2, 2, 2)) # (6, 6)
    print(vandermonde(5, 4, 3)) # (84, 84)

Go¶

package main

import "fmt"

func choose(n, k int64) int64 {
    if k < 0 || k > n {
        return 0
    }
    if k > n-k {
        k = n - k
    }
    r := int64(1)
    for i := int64(1); i <= k; i++ {
        r = r * (n - k + i) / i
    }
    return r
}

func main() {
    // hockey-stick: sum_{i=2}^{5} C(i,2) == C(6,3)
    var s int64
    for i := int64(2); i <= 5; i++ {
        s += choose(i, 2)
    }
    fmt.Println(s, choose(6, 3)) // 20 20

    // vandermonde: C(4,2) == sum_k C(2,k)C(2,2-k)
    var v int64
    for k := int64(0); k <= 2; k++ {
        v += choose(2, k) * choose(2, 2-k)
    }
    fmt.Println(v, choose(4, 2)) // 6 6
}

Java¶

public class Identities {
    static long choose(long n, long k) {
        if (k < 0 || k > n) return 0;
        if (k > n - k) k = n - k;
        long r = 1;
        for (long i = 1; i <= k; i++) r = r * (n - k + i) / i;
        return r;
    }

    public static void main(String[] args) {
        long s = 0;
        for (long i = 2; i <= 5; i++) s += choose(i, 2);
        System.out.println(s + " " + choose(6, 3));   // 20 20

        long v = 0;
        for (long k = 0; k <= 2; k++) v += choose(2, k) * choose(2, 2 - k);
        System.out.println(v + " " + choose(4, 2));   // 6 6
    }
}

Error Handling¶

Scenario	What goes wrong	Correct approach
Inverse with composite modulus	`fact[k]⁻¹` may not exist	Use Pascal DP, or Lucas (`24`) / prime-power (`33`).
Query `n > N` (precompute size)	Index out of bounds	Size `N` to the true max `n`; or compute single-value.
`n` enormous (`> 10⁹`)	Can't precompute `n!` at all	Use Lucas' theorem (`24-lucas-theorem`).
Overflow before `% p`	`fact[i-1] * i` exceeds 64-bit	Reduce mod `p` immediately after each multiply.
Wrong `invFact` chain	Built down without seeding `invFact[N]`	Seed `invFact[N]` with one inverse, then sweep.
Negative intermediate (Go/Java)	`n − k + i` or a subtraction goes negative	Guard `k > n`; normalize residues to `[0, p)`.
Hockey-stick off by one	Summed `C(i,r)` over wrong range	Range is `i = r … n`, result is `C(n+1, r+1)`.

Performance Analysis¶

Task	Scale	Cost
Precompute `fact`, `invFact` up to `N`	`N ≤ 10⁷`	`O(N)` — well under a second
One query `C(n,k) mod p` after precompute	any `n ≤ N`	`O(1)`
Full Pascal table	`n ≤ 5000`	`O(n²)` ≈ 1.25·10⁷ cells
One-row Pascal	`n ≤ 10⁶`	`O(n²)` total — slow for big `n`
Single `C(n,k) mod p`, no table	any	`O(k + log p)`
Lucas' theorem, huge `n`	`n` astronomically large	`O(p + log_p n)` (sibling `24`)

The decisive numbers: the O(N) precompute plus O(1) queries dominate everything once you have more than a handful of queries. The O(n²) Pascal table is only competitive when n is small and you need many (n, k) pairs or the modulus is composite.

import time

def bench_precompute(N):
    start = time.perf_counter()
    build(N)
    return time.perf_counter() - start

# Typical: N = 10^7 precomputes in well under a second in CPython.

Worked Examples¶

Example A — Why `n < p` matters (small prime)¶

Take p = 7. The factorial method is valid only while n < 7. Build the tables:

fact   = [1, 1, 2, 6, 24%7=3, 120%7=1, 720%7=6]      (indices 0..6)
       = [1, 1, 2, 6, 3, 1, 6]

Now C(6, 3) = 20, and 20 mod 7 = 6. Check the formula:

fact[6]·invFact[3]·invFact[3] mod 7

invFact[3] = (6)⁻¹ mod 7. Since 6·6 = 36 ≡ 1 (mod 7), invFact[3] = 6. So

6 · 6 · 6 = 216 ≡ 216 − 30·7 = 216 − 210 = 6 (mod 7).   ✓

It matches 20 mod 7 = 6. But the moment n = 7, fact[7] = 5040 ≡ 0 (mod 7) (because 7 | 7!), and the inverse of 0 does not exist — the method breaks, and you must switch to Lucas' theorem (sibling 24). For p = 10⁹+7 you essentially never hit n ≥ p in practice, which is why the straight method is the contest default.

Example B — A query trace mod 10⁹+7¶

Compute C(10, 4) mod (10⁹+7). The exact value is 210. With tables built to N ≥ 10:

C(10,4) = fact[10] · invFact[4] · invFact[6] mod p.

Each factor is a residue in [0, p). Because 210 < p, the result is exactly 210 — small binomials mod a big prime are just the binomials themselves. The modular machinery only "kicks in" (wraps around) once C(n,k) ≥ p.

Example C — Hockey-stick collapses a loop¶

Suppose a problem asks for ∑_{i=2}^{n} C(i, 2) for many n. Instead of an O(n) loop per query, hockey-stick gives a closed form: the sum is C(n+1, 3), an O(1) query after precompute. This is the practical payoff of knowing the identities — they turn O(n) sums into O(1) lookups.

Best Practices¶

Precompute once if you will answer many queries; never recompute factorials per query.
Compute all inverse factorials from one Fermat inverse via invFact[i−1] = invFact[i]·i.
Use Pascal DP for composite moduli — it needs no inverse and is correct mod anything.
Pick the method by n and modulus: small n or composite mod → DP; large n, prime mod → factorials.
Reduce mod p after every multiply to avoid overflow in Go/Java.
Verify identities against math.comb / a brute force on small inputs before relying on a derived formula.
Escalate to Lucas (24) or prime-power (33) the moment n exceeds your precompute range or the modulus is not prime.

Visual Animation¶

See animation.html for an interactive view.

The middle-level animation highlights: - Pascal's triangle filling via C(n,k)=C(n−1,k−1)+C(n−1,k), with the two parent cells highlighted. - The fact / invFact table being built — including the single Fermat-inverse seed and the downward sweep. - A live C(n,k) mod p query lighting up the three cells fact[n], invFact[k], invFact[n−k]. - Editable n, k, p so you can watch the same value appear in both the triangle and the factorial formula.

Summary¶

The modular method C(n,k) ≡ fact[n] · invFact[k] · invFact[n−k] (mod p) works because, under a prime p, every factorial has a modular inverse (Fermat: a⁻¹ ≡ a^(p−2)), and the inverse-factorial recurrence invFact[i−1] = invFact[i]·i computes all of them from a single inverse — giving O(N) setup and O(1) queries. When the modulus is composite or you need values mod anything, the division-free Pascal DP (O(n²), addition only) is the safe alternative. The four core identities — Pascal (the recurrence), symmetry (C(n,k)=C(n,n−k)), hockey-stick (∑C(i,r)=C(n+1,r+1)), and Vandermonde (C(m+n,r)=∑C(m,k)C(n,r−k)) — turn brute-force counting into closed forms and underlie Catalan numbers, stars-and-bars, and inclusion-exclusion (siblings 25, 28, 26). When n outgrows your precompute or p is not prime, hand off to Lucas' theorem (24) and factorial-mod-prime-power (33).

Next step: continue to senior.md for high-throughput query systems, precompute sizing, overflow engineering, and when Lucas / CRT become mandatory.

Binomial Coefficients C(n, k) — Middle Level¶

Table of Contents¶

Introduction¶

Deeper Concepts¶

Why the modular factorial method needs a prime¶

Why the inverse-factorial recurrence works¶

Pascal's identity — the DP recurrence¶

Multiplicative formula — integer exactness¶

Comparison with Alternatives¶

Method selection¶

Pascal DP vs factorial method — the decision¶

Advanced Patterns¶

Pattern: Linear inverse-factorial precompute¶

Go¶

Java¶

Python¶

Pattern: One-row Pascal in O(n) space (composite modulus safe)¶

Pattern: Single C(n, k) mod p without a table¶

The Identities Toolbox¶

1. Pascal's identity (recurrence + sum rule)¶

2. Symmetry¶

3. Hockey-stick identity (column sums)¶

4. Vandermonde's identity (convolution)¶

Other handy facts¶

Modular C(n,k): The Complete Recipe¶

Code Examples¶

Hockey-stick and Vandermonde verified against brute force¶

Python¶

Go¶

Java¶

Error Handling¶

Performance Analysis¶

Worked Examples¶

Example A — Why n < p matters (small prime)¶

Example B — A query trace mod 10⁹+7¶

Example C — Hockey-stick collapses a loop¶

Best Practices¶

Visual Animation¶

Summary¶

Example A — Why `n < p` matters (small prime)¶