Modular Arithmetic — Middle Level¶
Focus: The working algorithms you reach for daily — overflow-safe
mulmod, modular exponentiation, modular inverse by both Fermat (prime modulus) and extended Euclid (any coprime modulus), modular division, and precomputed factorials + inverse factorials fornCr mod pin O(1) per query. Plus a clear-eyed account of why prime moduli are so much friendlier than composite ones.
Table of Contents¶
- Introduction
- Deeper Concepts
- Overflow-Safe Multiplication
- Modular Exponentiation
- Modular Inverse: Fermat vs Extended Euclid
- Modular Division
- Binomial Coefficients nCr mod p
- Comparison with Alternatives
- Code Examples
- Error Handling
- Performance Analysis
- Best Practices
- Visual Animation
- Summary
Introduction¶
At junior level the message was: keep numbers in [0, m) and respect the three pitfalls (negatives, overflow, division). At middle level you assemble those primitives into the toolkit that real problems demand:
- How do I multiply two near-
mvalues whenmitself is near10^18and the product overflows 64 bits? - How do I raise a number to a huge power mod
mwithout computing the (astronomical) exact power? - How do I "divide" mod
m— and what is a modular inverse, when does it exist, and which of the two standard ways to compute it should I use? - How do I compute binomial coefficients
nCr mod pfornup to10^6and answer thousands of queries inO(1)each?
The unifying theme is the modular inverse: it is what makes division possible, what powers nCr mod p, and the place where the prime-vs-composite distinction bites hardest. We cover both standard inverse algorithms — Fermat's little theorem (a one-line modpow, but prime modulus only) and the extended Euclidean algorithm (works for any coprime modulus) — and explain exactly when to use each.
Deeper Concepts¶
Why a prime modulus is so friendly¶
When m = p is prime, every nonzero residue is coprime to p, so every nonzero element has an inverse. The structure Z/pZ is then a field (sibling professional.md proves this): you can add, subtract, multiply, and divide freely, exactly like rational arithmetic, with the single exception that you cannot divide by 0. This is why 10^9 + 7 and 998244353 are the canonical competitive-programming moduli — both are prime, so Fermat's little theorem gives inverses in one modpow and nothing ever lacks an inverse except 0.
When m is composite, Z/mZ is only a ring. The invertible elements (the units) are exactly those coprime to m, and there are φ(m) of them (Euler's totient). Any element sharing a factor with m — like 2 mod 4, or 6 mod 9 — has no inverse. You cannot divide by it, and Fermat's formula a^(m−2) is simply wrong (it relies on m being prime). For composite m you must use the extended Euclidean inverse, and even then only for coprime a.
Concrete contrast. Mod the prime 7: every nonzero element {1,2,3,4,5,6} has an inverse (1↔1, 2↔4, 3↔5, 6↔6), so any "fraction" mod 7 is well-defined. Mod the composite 8: only {1,3,5,7} (the odds, coprime to 8) are units — φ(8) = 4 — while 2, 4, 6 have no inverse. Trying 2⁻¹ mod 8 fails: no x gives 2x ≡ 1 (mod 8), because 2x is always even and 1 is odd. This is the single most important practical difference between prime and composite moduli, and the reason competitive problems almost always pick a prime.
The inverse as the engine of division¶
Modular division a / b (mod m) is defined as a · b⁻¹ (mod m), where b⁻¹ is the modular inverse of b. There is no other notion of division here — you never use the / operator on residues. Everything reduces to: can I invert b? (yes iff gcd(b, m) = 1), and how do I invert it? (Fermat for prime m, extended Euclid otherwise).
Fermat's little theorem (the prime shortcut)¶
For a prime p and any a with p ∤ a:
So the inverse is a single modular exponentiation, O(log p). Sibling 04-fermat-euler proves this; here we just use it. (The generalization to composite m is Euler's theorem: a^(φ(m)) ≡ 1 (mod m) for coprime a, giving a⁻¹ ≡ a^(φ(m)−1), but you need φ(m), i.e. m's factorization.)
Worked Fermat inverse. Invert 2 mod the prime p = 10^9 + 7. By Fermat, 2⁻¹ ≡ 2^(p−2) (mod p). There is a shortcut for inverting 2: since p is odd, (p+1)/2 is an integer and 2·(p+1)/2 = p+1 ≡ 1 (mod p), so 2⁻¹ = (p+1)/2 = 500 000 004. Check via modpow: 2^(p−2) mod p also yields 500 000 004, and 2·500 000 004 = 1 000 000 008 ≡ 1 (mod p). Both routes agree, illustrating that the Fermat formula and the closed form give the same unique inverse (a useful unit test).
Extended Euclid (the general method)¶
The extended Euclidean algorithm finds integers x, y with a·x + m·y = gcd(a, m). When gcd(a, m) = 1, reducing a·x ≡ 1 (mod m) shows x (normalized to [0, m)) is the inverse. This works for any modulus, prime or composite, as long as a is coprime to m. Full derivation in sibling 06-extended-euclidean-modular-inverse.
Worked extended-Euclid inverse. Invert a = 3 mod m = 26 (composite, so Fermat does not apply). Run the gcd tracking coefficients:
| step | equation | gcd-of |
|---|---|---|
| 1 | 26 = 8·3 + 2 | (26, 3) |
| 2 | 3 = 1·2 + 1 | (3, 2) |
| 3 | 2 = 2·1 + 0 | gcd = 1 |
Back-substitute: 1 = 3 − 1·2 = 3 − 1·(26 − 8·3) = 9·3 − 1·26. So 9·3 ≡ 1 (mod 26), and the inverse is 9. Check: 3·9 = 27 = 26 + 1 ≡ 1 (mod 26). Because 26 is composite, only extended Euclid (not Fermat) gives this — a concrete reason to keep the general algorithm in your toolkit.
Overflow-Safe Multiplication¶
mulmod(a, b, m) must return (a·b) mod m even when a·b exceeds the integer width. Three production-grade strategies:
__int128(C/C++ only):(__int128)a * b % m. One instruction-level 128-bit multiply, then reduce. Fastest and cleanest form < 2^63. Go and Java lack a native 128-bit int, so they use the alternatives below (ormath/bits.Mul64in Go,Math.multiplyHighin Java 9+).- Russian-peasant (binary)
mulmod: double-and-add.O(log b)additions modm, each safe because a sum of two values< mis< 2m < 2^63. Works in any language, for anym < 2^62. - Floating-point trick (
mulmodà la Schrage): computea*b - floor(a*b/m)*musinglong doubleto estimate the quotient. Fast but needs careful rounding analysis; used in some Miller-Rabin / Pollard-rho implementations (siblings08,09).
For the everyday m = 10^9 + 7, none of this is needed — a 64-bit (a*b) % m is exact because the product is < 10^18 < 2^63. The machinery matters when m approaches 10^18.
Worked overflow check. With m = 10^9 + 7, the largest operands are m − 1 ≈ 10^9, so (m−1)² ≈ 10^18, safely below 2^63 − 1 ≈ 9.2 × 10^18. A plain (a*b) % m is correct. Now take m ≈ 5 × 10^18: operands up to 5 × 10^18 give a product up to 2.5 × 10^37, which overflows 64-bit by 18+ orders of magnitude — the naive multiply silently wraps. Russian-peasant trace for (3·5) mod 7 shows the safe path: start result = 0, a = 3, b = 5 = 101₂; bit0=1 → result = 3, double a = 6; bit1=0 → skip, double a = 12 ≡ 5; bit2=1 → result = (3 + 5) % 7 = 1. And 15 mod 7 = 1 ✓. Every intermediate stayed below 2m.
Modular Exponentiation¶
Computing a^b mod m by multiplying a into an accumulator b times is O(b) — hopeless for b = 10^18. Binary exponentiation (square-and-multiply) does it in O(log b) by reading the bits of b:
powMod(a, b, m):
result = 1
a = a mod m
while b > 0:
if b is odd: result = (result · a) mod m
a = (a · a) mod m # square
b = b >> 1
return result
Each squaring doubles the exponent represented; multiplying in when a bit is set assembles a^b from a^(2^0), a^(2^1), …. This is the workhorse behind Fermat inverses and is treated fully in sibling 26-binary-exponentiation. Note: use a safe mulmod inside if m is large.
Worked modpow. Compute 2^10 mod 1000. Binary: 10 = 1010₂. Powers: 2^1=2, 2^2=4, 2^4=16, 2^8=256 (all mod 1000). Bits set at positions 1 and 3 (value 2 and 8): result = 2^2 · 2^8 = 4 · 256 = 1024 ≡ 24 (mod 1000). Indeed 2^10 = 1024, and 1024 mod 1000 = 24. The loop did 3 squarings and 2 multiplies — O(log b) — versus 10 multiplications for the naive approach. For b = 10^18 this is ~60 operations instead of a quintillion.
Modular Inverse: Fermat vs Extended Euclid¶
| Aspect | Fermat (a^(m−2)) | Extended Euclid |
|---|---|---|
| Modulus requirement | m must be prime | any m with gcd(a, m) = 1 |
| Method | one modpow, O(log m) | one gcd run, O(log m) |
| Code size | tiny (reuses powMod) | a bit longer (recursion/loop) |
| Fails when | m composite, or a ≡ 0 | gcd(a, m) ≠ 1 |
| Typical use | competitive nCr mod p, prime moduli | CRT, composite moduli, RSA key math |
| Constant factor | log m multiplications | ~log m divisions (often a touch faster) |
Decision rule: if the modulus is a known prime, Fermat is the one-liner. If the modulus is composite, unknown, or you cannot guarantee primality, use extended Euclid. For inverting many numbers 1..n mod a prime, neither beats the batch inverse-factorial trick below (O(n) total).
Modular Division¶
Two ways to get inv(b):
inv(b) = powMod(b, m−2, m)whenmis prime (Fermat).inv(b) = extgcd-based inversefor any coprimem.
Always guard b ≢ 0 (mod m) first — 0 has no inverse, and dividing by it is undefined. If b is non-coprime to a composite m, division is genuinely undefined (no unique answer); that is a modeling error, not something to paper over.
Worked modular division. Compute (7 / 3) mod 11. First 3⁻¹ mod 11: by Fermat 3⁻¹ = 3^9 mod 11 = 4 (since 3·4 = 12 ≡ 1). Then 7/3 ≡ 7·4 = 28 ≡ 6 (mod 11). Sanity: 6·3 = 18 ≡ 7 (mod 11), confirming 6 is indeed 7/3. Note that 7/3 is not an integer in ordinary arithmetic, yet modulo a prime it has a perfectly well-defined value — because every nonzero element of Z/11Z is invertible. This is the field property at work: "fractions" become concrete residues.
Linear inverses of 1..n in O(n)¶
When you need the inverses of all of 1, 2, …, n mod a prime p, neither Fermat-per-element (O(n log p)) nor extended-Euclid-per-element is optimal. A linear recurrence computes them all in O(n):
The recurrence comes from writing p = (p/i)·i + (p mod i), reducing mod p to 0 ≡ (p/i)·i + (p mod i), and multiplying through by inv[i]·inv[p mod i]. This is the batch trick behind fast inverse-factorial alternatives and any problem needing many small inverses at once — O(n) total instead of O(n log p).
Binomial Coefficients nCr mod p¶
The classic application that ties everything together. We want C(n, r) = n! / (r! (n−r)!) mod p for a prime p, possibly for many (n, r) pairs. You cannot divide factorials directly — you multiply by inverse factorials.
Precompute (once, O(n)):
fact[0] = 1
fact[i] = fact[i−1] · i mod p for i = 1..N
invfact[N] = powMod(fact[N], p−2, p) # one Fermat inverse
invfact[i] = invfact[i+1] · (i+1) mod p for i = N−1 .. 0
The clever step is computing invfact backwards from invfact[N] using the identity invfact[i] = invfact[i+1] · (i+1). This needs only one modular inverse for the whole table, not N of them — turning O(N log p) into O(N).
Why the backward identity holds. By definition invfact[i] = (i!)⁻¹ and invfact[i+1] = ((i+1)!)⁻¹. Since (i+1)! = (i+1)·i!, taking inverses gives ((i+1)!)⁻¹ = (i+1)⁻¹·(i!)⁻¹, i.e. invfact[i+1] = (i+1)⁻¹·invfact[i]. Multiplying both sides by (i+1): invfact[i+1]·(i+1) = invfact[i]. So each step multiplies by the integer (i+1) — no inverse needed after the first. This is the single most important micro-optimization in competitive combinatorics: it removes a factor of log p from the precompute and makes million-query nCr solutions feasible.
Each query (O(1)):
This is the standard combinatorics-mod-p engine; see also sibling 21-binomial-coefficients. (For n larger than the prime, you need Lucas' theorem — out of scope here, covered in sibling 21.)
Worked nCr trace. Compute C(5, 2) mod (10^9 + 7). Precomputed: fact[5] = 120, fact[2] = 2, fact[3] = 6. Inverse factorials: invfact[5] = 120⁻¹, and via the backward pass invfact[2] and invfact[3]. The query is fact[5]·invfact[2]·invfact[3] = 120·(2⁻¹)·(6⁻¹) mod p. Since 120/(2·6) = 120/12 = 10, the modular product collapses to 10 — matching C(5,2) = 10. Concretely, 2⁻¹ = 500 000 004, 6⁻¹ = 166 666 668, and 120·500000004·166666668 ≡ 10 (mod 10^9+7). The factorials and inverse factorials were each computed once; this query touched three array slots and did two multiplies — O(1).
Validation tip. Always cross-check nCr mod p against an exact big-integer comb(n, r) % p (Python math.comb, Java BigInteger) for small (n, r). A mismatch usually means a missing % p in the factorial loop, an off-by-one in the table size N, or a forgotten r > n → 0 guard.
Comparison with Alternatives¶
Inverting one number vs many¶
| Need | Best tool | Cost |
|---|---|---|
One inverse, prime m | Fermat a^(m−2) | O(log m) |
One inverse, composite coprime m | extended Euclid | O(log m) |
Inverses of 1..n, prime m | linear-sieve inverse recurrence inv[i] = −(m/i)·inv[m mod i] mod m | O(n) total |
Inverse factorials 0..n | one Fermat inverse + backward pass | O(n) total |
The "invert everything in O(n)" tricks (the inv[i] recurrence and the inverse-factorial backward pass) are the reason nCr mod p queries are O(1) after an O(n) precompute, rather than O(log p) per query.
Prime vs composite modulus¶
| Property | Prime p | Composite m |
|---|---|---|
| Every nonzero element invertible? | yes (field) | no (only the φ(m) units) |
| Inverse method | Fermat or extended Euclid | extended Euclid only (Fermat invalid) |
nCr mod m directly | yes | needs CRT over prime-power factors (sibling 05-crt) |
| Division by any nonzero | always defined | only when coprime to m |
Code Examples¶
Example: modpow, both inverses, division, and nCr mod p¶
Go¶
package main
import "fmt"
const MOD = 1_000_000_007
func powMod(a, b, m int64) int64 {
a %= m
if a < 0 {
a += m
}
var result int64 = 1
for b > 0 {
if b&1 == 1 {
result = result * a % m
}
a = a * a % m
b >>= 1
}
return result
}
// Fermat inverse: requires prime modulus.
func invFermat(a, p int64) int64 {
return powMod(a, p-2, p)
}
// Extended Euclid inverse: any coprime modulus. Returns inverse in [0, m).
func invEuclid(a, m int64) int64 {
g, x := extgcd(((a%m)+m)%m, m)
if g != 1 {
panic("no inverse: a and m not coprime")
}
return ((x % m) + m) % m
}
func extgcd(a, b int64) (g, x int64) {
if b == 0 {
return a, 1
}
g, x1 := extgcd(b, a%b)
return g, x1 - (a/b)*xHelper(b, a%b)
}
// helper to keep extgcd readable: recompute the y coefficient
func xHelper(a, b int64) int64 {
if b == 0 {
return 0
}
_, x1 := extgcd(b, a%b)
return x1
}
func divMod(a, b, m int64) int64 {
return a % m * invFermat(b, m) % m
}
func main() {
fmt.Println(powMod(3, 13, 7)) // 3
fmt.Println(invFermat(2, MOD)) // (MOD+1)/2
fmt.Println(invEuclid(3, 26)) // 9, since 3*9 = 27 ≡ 1 (mod 26)
fmt.Println(divMod(10, 2, MOD)) // 5
fmt.Println(nCr(10, 3)) // 120
}
const N = 1000
var fact, invfact [N + 1]int64
func initFact() {
fact[0] = 1
for i := 1; i <= N; i++ {
fact[i] = fact[i-1] * int64(i) % MOD
}
invfact[N] = invFermat(fact[N], MOD)
for i := N - 1; i >= 0; i-- {
invfact[i] = invfact[i+1] * int64(i+1) % MOD
}
}
func nCr(n, r int) int64 {
if fact[1] == 0 {
initFact()
}
if r < 0 || r > n {
return 0
}
return fact[n] * invfact[r] % MOD * invfact[n-r] % MOD
}
Java¶
public class ModToolkit {
static final long MOD = 1_000_000_007L;
static final int N = 1000;
static long[] fact = new long[N + 1], invfact = new long[N + 1];
static long powMod(long a, long b, long m) {
a %= m; if (a < 0) a += m;
long result = 1;
while (b > 0) {
if ((b & 1) == 1) result = result * a % m;
a = a * a % m;
b >>= 1;
}
return result;
}
static long invFermat(long a, long p) { return powMod(a, p - 2, p); }
// Extended Euclid: returns {gcd, x, y} with a*x + b*y = gcd.
static long[] extgcd(long a, long b) {
if (b == 0) return new long[]{a, 1, 0};
long[] r = extgcd(b, a % b);
return new long[]{r[0], r[2], r[1] - (a / b) * r[2]};
}
static long invEuclid(long a, long m) {
a = ((a % m) + m) % m;
long[] r = extgcd(a, m);
if (r[0] != 1) throw new ArithmeticException("no inverse");
return ((r[1] % m) + m) % m;
}
static long divMod(long a, long b, long m) {
return a % m * invFermat(b, m) % m;
}
static void initFact() {
fact[0] = 1;
for (int i = 1; i <= N; i++) fact[i] = fact[i - 1] * i % MOD;
invfact[N] = invFermat(fact[N], MOD);
for (int i = N - 1; i >= 0; i--) invfact[i] = invfact[i + 1] * (i + 1) % MOD;
}
static long nCr(int n, int r) {
if (r < 0 || r > n) return 0;
return fact[n] * invfact[r] % MOD * invfact[n - r] % MOD;
}
public static void main(String[] args) {
initFact();
System.out.println(powMod(3, 13, 7)); // 3
System.out.println(invEuclid(3, 26)); // 9
System.out.println(divMod(10, 2, MOD)); // 5
System.out.println(nCr(10, 3)); // 120
}
}
Python¶
MOD = 1_000_000_007
def pow_mod(a, b, m):
return pow(a % m, b, m) # Python's built-in 3-arg pow is binary exponentiation
def inv_fermat(a, p):
return pow(a, p - 2, p) # prime p only
def ext_gcd(a, b):
if b == 0:
return a, 1, 0
g, x1, y1 = ext_gcd(b, a % b)
return g, y1, x1 - (a // b) * y1
def inv_euclid(a, m):
a %= m
g, x, _ = ext_gcd(a, m)
if g != 1:
raise ValueError("no inverse: not coprime")
return x % m
def div_mod(a, b, m):
return a % m * inv_fermat(b, m) % m
N = 1000
fact = [1] * (N + 1)
for i in range(1, N + 1):
fact[i] = fact[i - 1] * i % MOD
invfact = [1] * (N + 1)
invfact[N] = inv_fermat(fact[N], MOD)
for i in range(N - 1, -1, -1):
invfact[i] = invfact[i + 1] * (i + 1) % MOD
def nCr(n, r):
if r < 0 or r > n:
return 0
return fact[n] * invfact[r] % MOD * invfact[n - r] % MOD
if __name__ == "__main__":
print(pow_mod(3, 13, 7)) # 3
print(inv_euclid(3, 26)) # 9
print(div_mod(10, 2, MOD)) # 5
print(nCr(10, 3)) # 120
Run: go run main.go | javac ModToolkit.java && java ModToolkit | python toolkit.py
Error Handling¶
| Scenario | What goes wrong | Correct approach |
|---|---|---|
invFermat on composite m | a^(m−2) is not the inverse. | Use extended Euclid for composite m. |
Inverting a with gcd(a, m) ≠ 1 | No inverse exists; extgcd returns g ≠ 1. | Check the gcd and raise/handle. |
nCr with r > n or r < 0 | Out-of-range factorial index. | Return 0 for invalid (n, r). |
powMod with negative base | a % m negative in Go/Java. | Normalize a to [0, m) first. |
mulmod overflow for big m | a*b exceeds 64 bits. | Use __int128 or Russian-peasant. |
Forgetting to precompute fact | nCr reads zeros. | Initialize the tables once at startup. |
Performance Analysis¶
| Operation | Cost | Notes |
|---|---|---|
powMod(a, b, m) | O(log b) mults | ~60 iterations for b = 10^18. |
| Fermat inverse | O(log m) | one powMod(a, m−2, m). |
| Extended Euclid inverse | O(log m) | often a slightly smaller constant. |
Precompute fact + invfact 0..N | O(N) | one inverse total (backward pass). |
Each nCr query | O(1) | three table lookups + two mults. |
Russian-peasant mulmod | O(log b) adds | only when __int128 unavailable and m large. |
The decisive optimization is the single-inverse factorial table: inverting fact[N] once and walking backwards gives all inverse factorials in O(N), so a million nCr queries cost a million O(1) lookups, not a million modpows.
# Quick sanity benchmark sketch
import time
def bench(num_queries=10**6):
start = time.perf_counter()
s = 0
for i in range(num_queries):
s = (s + nCr(1000, i % 1001)) % MOD
return time.perf_counter() - start
# Typically well under a second: each query is O(1).
Best Practices¶
- Pick a prime modulus when you control it — Fermat inverses and field structure simplify everything.
- One inverse for the whole factorial table — never invert each factorial separately.
- Guard coprimality before any inverse;
gcd ≠ 1means no inverse, full stop. - Use a safe
mulmod(or__int128) whenevermcan exceed ~3 × 10^9. - Reuse
powModfor Fermat inverses; keepmulmod,powMod,invas small tested units. - Validate against brute force on small inputs (compute
nCrexactly with big integers and compare modp). - Handle the
r > nandr < 0cases innCrby returning0, before indexing the factorial tables. - Normalize negative bases in
powMod(a %= m; if (a < 0) a += m;) so Go/Java do not feed a negative into the loop. - Prefer the
O(n)inverse recurrence over per-element modpow when you need many small inverses at once.
Common middle-level bugs and their tells¶
| Symptom | Likely cause | Fix |
|---|---|---|
nCr off by a huge amount | missing % p in factorial loop | reduce every multiply |
nCr returns 0 unexpectedly | tables not initialized | call initFact() once at startup |
inverse wrong for composite m | used Fermat on composite modulus | switch to extended Euclid |
mulmod wrong for big m | product overflowed 64-bit | use __int128 / Russian-peasant |
| negative result in Go/Java | base or subtraction went negative | normalize with ((x % m) + m) % m |
These five account for the overwhelming majority of middle-level modular bugs; the brute-force oracle catches every one. A good habit is to keep a tiny exact-arithmetic checker (Python pow(a, -1, m), math.comb) beside the modular code and assert agreement on random small inputs in tests.
Visual Animation¶
See
animation.htmlfor an interactive view.The middle-relevant parts of the animation show: - The square-and-multiply ladder of
a^b mod m, with each squaring and each "multiply-in" highlighted as the bits ofbare read. - The wrap-around of products on the number wheel, motivating whymulmodmust reduce as it goes. - Editablea,b,mso you can watch a Fermat inversea^(m−2)assemble step by step.
Summary¶
The middle-level modular toolkit is built on the modular inverse. Division is multiplication by an inverse; binomial coefficients are products of factorials and inverse factorials; and the inverse itself comes from either Fermat's little theorem (a^(m−2), a single O(log m) modpow, valid only for prime m) or the extended Euclidean algorithm (valid for any coprime m). Overflow in mulmod is handled by __int128 or a Russian-peasant loop; exponentiation is always binary (O(log b)). For nCr mod p, precompute factorials and — with a single inverse plus a backward pass — inverse factorials in O(n), then answer each query in O(1). The recurring lesson: prime moduli make Z/pZ a field where everything nonzero is invertible, while composite moduli leave non-invertible elements that break Fermat and forbid division.
Quick decision guide¶
- Need one inverse, prime modulus → Fermat
a^(p−2), onepowMod. - Need one inverse, composite/unknown modulus → extended Euclid (check
gcd = 1). - Need inverses of
1..nmod a prime → theO(n)linear recurrence. - Need
nCr mod pfor many queries → precomputefact+invfact(one inverse),O(1)per query. - Large modulus (≈
10^18) → overflow-safemulmod(__int128or Russian-peasant) inside every multiply. - Composite modulus, need division/
nCr→ factor and use CRT (senior level), not the prime recipe.
Internalize these six cases and you can route almost any middle-level modular task to the right tool without rederiving it each time.