Number-Theoretic Transform (NTT) — Junior Level¶
One-line summary: The NTT is the FFT done in a finite field
Z/pZinstead of with complex numbers. By replacing the complex root of unitye^{2πi/n}with a primitiven-th root of unityginsideZ/pZ(which only exists whenpis an "NTT-friendly" prime like998244353 = 119·2^23 + 1), you get exact integer convolution modpwith no floating-point rounding error at all.
Table of Contents¶
- Introduction
- Prerequisites
- Glossary
- Core Concepts
- Big-O Summary
- Real-World Analogies
- Pros & Cons
- Step-by-Step Walkthrough
- Code Examples
- Coding Patterns
- Error Handling
- Performance Tips
- Best Practices
- Edge Cases & Pitfalls
- Common Mistakes
- Cheat Sheet
- Visual Animation
- Summary
- Further Reading
Introduction¶
Suppose you have to multiply two polynomials, or equivalently compute the convolution of two integer arrays, and the answer must be exact modulo a prime such as 998244353. The schoolbook method costs O(n²): every coefficient of one polynomial multiplies every coefficient of the other. For n = 200000 that is 4·10^10 operations — far too slow.
The Fast Fourier Transform (FFT) speeds polynomial multiplication up to O(n log n) by evaluating both polynomials at the n-th complex roots of unity, multiplying the evaluations pointwise, and interpolating back. But classic FFT uses complex numbers (double floating point), so the final coefficients come out as values like 41999.9999998 that you must round. For large coefficients or large n, that rounding can flip the answer by 1 and silently corrupt an exact modular result.
The Number-Theoretic Transform (NTT) fixes this. It runs the exact same butterfly algorithm as FFT, but it works entirely inside the finite field Z/pZ — the integers 0, 1, …, p-1 with addition and multiplication taken mod p. The trick is that, for the right prime p, the field Z/pZ contains a number g that behaves exactly like the complex root of unity e^{2πi/n}: raising it to the n-th power gives 1, and its powers g^0, g^1, …, g^{n-1} are all distinct. Such a g is called a primitive n-th root of unity mod p.
Big idea: NTT = FFT where complex roots of unity are replaced by a primitive
n-th root of unity inZ/pZ. The arithmetic is exact integers modp, so there is zero rounding error.
This is the number-theory view of the transform. A complex-FFT topic already exists at 15-divide-and-conquer/05-fft, which derives the Cooley-Tukey butterfly in detail and mentions NTT briefly. We will not re-derive the butterfly here — we point you there for the shared recursive/iterative structure. This topic focuses on the number-theory questions that are unique to NTT: why you need a special prime, how you find the root, and how to get exact convolution for an arbitrary modulus by combining several NTT primes with the Chinese Remainder Theorem (CRT, sibling 05-crt and 15-garner-algorithm).
For finding the primitive root that NTT depends on, see sibling 12-primitive-root-discrete-root.
Prerequisites¶
Before reading this file you should be comfortable with:
- Modular arithmetic —
(a + b) mod p,(a · b) mod p, and modular inversea^{-1} mod p. See sibling02-modular-arithmetic. - Fast (binary) exponentiation — computing
g^e mod pinO(log e). See sibling26-binary-exponentiation. - Modular inverse via Fermat —
a^{-1} ≡ a^{p-2} (mod p)for primep. See sibling04-fermat-euler. - Polynomials and convolution —
c[k] = Σ_{i+j=k} a[i]·b[j]; multiplying two polynomials is convolving their coefficient arrays. - The FFT idea (evaluate → pointwise multiply → interpolate) — see
15-divide-and-conquer/05-fft. You do not need to memorize it, but you should know the three-step shape. - Big-O notation —
O(n log n)vsO(n²).
Optional but helpful:
- A glance at primitive roots (sibling
12) — the existence ofgis exactly a primitive-root statement. - Familiarity with CRT (sibling
05) for the multi-prime extension.
Glossary¶
| Term | Meaning |
|---|---|
| Convolution | c[k] = Σ_{i+j=k} a[i]·b[j]. Multiplying polynomials = convolving coefficient arrays. |
| DFT / FFT | Discrete Fourier Transform; FFT is the O(n log n) algorithm to compute it, using complex roots of unity. |
| NTT | Number-Theoretic Transform: the DFT computed in Z/pZ using a primitive root instead of complex numbers. |
Z/pZ | The finite field of integers {0, …, p-1} under addition and multiplication mod a prime p. |
| NTT-friendly prime | A prime of the form p = c·2^k + 1, so that Z/pZ contains a primitive 2^k-th root of unity. E.g. 998244353 = 119·2^23 + 1. |
Primitive root g of p | A generator: its powers g^0, …, g^{p-2} produce every nonzero residue mod p. |
Primitive n-th root of unity ω | A residue with ω^n ≡ 1 and ω^m ≢ 1 for 0 < m < n. Built as ω = g^{(p-1)/n} mod p. |
| Forward NTT | Evaluate a coefficient vector at the powers of ω: maps coefficients → point values. |
| Inverse NTT | The reverse map: point values → coefficients. Uses ω^{-1} and multiplies by n^{-1} mod p. |
| Butterfly | The size-2 combine step (u + ω·v, u − ω·v) shared by FFT and NTT (see 05-fft). |
| Pointwise product | Multiply two evaluation vectors entry-by-entry; this is where convolution becomes cheap. |
Core Concepts¶
1. Polynomial multiplication is convolution is the bottleneck¶
Multiplying A(x) = Σ a_i x^i by B(x) = Σ b_j x^j gives C(x) whose coefficient of x^k is c[k] = Σ_{i+j=k} a[i]·b[j]. That double sum is the convolution of arrays a and b. Done directly it is O(n²). NTT computes the same c[] in O(n log n).
2. The three-step transform recipe (shared with FFT)¶
Both FFT and NTT use the evaluate / multiply / interpolate strategy:
1. Forward transform a → â and b → b̂ (evaluate at the n roots of unity)
2. Pointwise multiply: ĉ[i] = â[i] · b̂[i] (cheap: O(n))
3. Inverse transform ĉ → c (interpolate back to coefficients)
The magic is that a product of polynomials in coefficient space becomes a pointwise product in evaluation space. The transform and its inverse cost O(n log n) each; the pointwise step is O(n).
3. Why complex roots, and what NTT swaps in¶
FFT evaluates at the complex n-th roots of unity ω = e^{2πi/n}. These satisfy two properties the algorithm relies on:
ω^n = 1(it is ann-th root of unity), andω^{n/2} = -1(the "halving" property that makes the divide-and-conquer butterfly work).
NTT needs a number with the same two properties but living in Z/pZ instead of C. That number is a primitive n-th root of unity mod p, written ω. Once you have it, every line of the FFT becomes a modular operation, and the output is exact integers — no double, no rounding.
4. NTT-friendly primes: p = c·2^k + 1¶
A primitive n-th root of unity mod p exists only if n divides p − 1 (proved in professional.md; it follows from the cyclic structure of Z/pZ*). For NTT we want n to be a power of two (so the butterfly can halve repeatedly), so we need 2^k | p − 1 for k as large as we will ever need. Primes of the form
have exactly this: p − 1 = c · 2^k, so every power of two up to 2^k divides p − 1. These are the NTT-friendly primes. The most famous is:
Its primitive root is g = 3. From g you build the n-th root: ω = g^{(p-1)/n} mod p.
5. Building ω from the primitive root g¶
If g is a primitive root of p (a generator of Z/pZ*) and n | p − 1, then
is a primitive n-th root of unity. Quick check: ω^n = g^{(p-1)} ≡ 1 by Fermat's little theorem, and ω has order exactly n. For the inverse transform you also need ω^{-1} = g^{(p-1) − (p-1)/n} mod p, or just pow(ω, p-2, p) by Fermat.
6. Inverse NTT: same butterfly, then divide by n¶
The inverse transform is the forward transform run with ω^{-1} instead of ω, followed by multiplying every entry by n^{-1} mod p. That single scaling by n^{-1} is the modular analogue of FFT's "divide by n". Because n is a power of two and p is odd, n^{-1} mod p always exists.
Big-O Summary¶
| Operation | Time | Space | Notes |
|---|---|---|---|
| Schoolbook convolution | O(n²) | O(n) | The baseline NTT beats. |
Forward NTT (size n) | O(n log n) | O(n) | Same butterfly as FFT. |
Inverse NTT (size n) | O(n log n) | O(n) | Forward with ω^{-1}, then scale by n^{-1}. |
| Pointwise multiply | O(n) | O(n) | Entry-by-entry product mod p. |
Full polynomial multiply mod p | O(n log n) | O(n) | Two forward + pointwise + one inverse. |
Multiply for arbitrary modulus M | O(n log n) | O(n) | 3 NTT primes + CRT (see middle.md). |
Building ω from g | O(log p) | O(1) | One modular exponentiation. |
The headline number is O(n log n) for exact modular convolution, with the input size n rounded up to the next power of two that is at least len(a) + len(b) − 1.
Real-World Analogies¶
| Concept | Analogy |
|---|---|
| Complex roots of unity (FFT) | Measuring a sound with continuous, analog tuning forks — accurate but with measurement noise (rounding). |
Primitive root in Z/pZ (NTT) | A set of digital tuning forks that snap exactly to integer pitches — no measurement noise at all. |
| NTT-friendly prime | A specially manufactured ruler whose markings line up perfectly with the lengths you need (2^k divisions). |
| Evaluate → multiply → interpolate | Converting two recipes to "ingredient totals", adding them cheaply, then converting back to a recipe. |
| Pointwise product | In evaluation space the polynomials don't tangle: each sample point multiplies only its own counterpart. |
| Multi-prime + CRT | Three witnesses each see the answer "mod their own prime"; CRT cross-examines them to reconstruct the true number. |
Where the analogy breaks: tuning forks measure frequency; the NTT does not actually measure anything physical — it is a pure algebraic change of basis that happens to make multiplication cheap.
Pros & Cons¶
| Pros | Cons |
|---|---|
Exact: integer arithmetic mod p, zero floating-point rounding error. | Requires an NTT-friendly prime; you cannot pick an arbitrary modulus directly. |
Same O(n log n) speed as FFT, same butterfly structure. | The transform length n must be a power of two dividing p − 1. |
| No precision worries even for large coefficients (the killer FFT failure mode). | Modular multiplication is a bit slower per operation than double multiply. |
| Works for any modulus via multiple primes + CRT. | Arbitrary-modulus convolution needs 3 transforms + CRT — more code, more constants. |
| Deterministic, reproducible across machines (no FP nondeterminism). | You must know (p, g) pairs or compute the primitive root (sibling 12). |
When to use: exact convolution / polynomial multiply modulo a prime (especially 998244353 or 10^9 + 7 via multi-prime), big-integer multiplication, counting problems that reduce to convolution, string matching via convolution.
When NOT to use: you only need an approximate real-valued convolution (FFT with double may be simpler), n is tiny (schoolbook O(n²) wins), or your modulus is fixed and not NTT-friendly and you cannot afford the multi-prime overhead (then bluestein/FFT-with-splitting may be considered — see senior.md).
Step-by-Step Walkthrough¶
Let us multiply two tiny polynomials modulo a small NTT-friendly prime so every number stays readable.
Pick the prime p = 17. Note 17 − 1 = 16 = 2^4, so 17 is NTT-friendly and supports transform sizes up to n = 16. A primitive root of 17 is g = 3.
We multiply A(x) = 1 + 2x and B(x) = 3 + 4x. The exact product is
so the answer we expect is c = [3, 10, 8] (and 8 mod 17 = 8, all small).
Step 1 — choose the transform size n. The result has 2 + 2 − 1 = 3 coefficients, and n must be a power of two ≥ 3, so n = 4. Pad both inputs to length 4:
Step 2 — build the root. A primitive n = 4-th root of unity is
Check: ω^4 = 13^4 mod 17. 13^2 = 169 = 169 − 9·17 = 169 − 153 = 16 ≡ −1, and (−1)^2 = 1. Good — ω = 13 has order 4, and ω^{n/2} = ω^2 = 16 ≡ −1, the halving property NTT needs.
Step 3 — forward NTT of a and b. The forward transform evaluates the polynomial at ω^0, ω^1, ω^2, ω^3 = 1, 13, 16, 4. For a = [1,2,0,0], â[t] = a[0] + a[1]·ω^t = 1 + 2·ω^t:
For b = [3,4,0,0], b̂[t] = 3 + 4·ω^t:
b̂[0] = 3 + 4·1 = 7
b̂[1] = 3 + 4·13 = 55 ≡ 4
b̂[2] = 3 + 4·16 = 67 ≡ 67 − 51 = 16
b̂[3] = 3 + 4·4 = 19 ≡ 2
Step 4 — pointwise multiply (mod 17).
ĉ[0] = 3·7 = 21 ≡ 4
ĉ[1] = 10·4 = 40 ≡ 6
ĉ[2] = 16·16= 256 ≡ 256 − 15·17 = 256 − 255 = 1
ĉ[3] = 9·2 = 18 ≡ 1
Step 5 — inverse NTT of ĉ. Use ω^{-1}. Since ω = 13 and ω^4 = 1, ω^{-1} = ω^3 = 4. The inverse transform evaluates ĉ at powers of ω^{-1} and then divides by n = 4, i.e. multiplies by 4^{-1} mod 17. Because 4·13 = 52 = 51 + 1 ≡ 1, we have 4^{-1} = 13.
Evaluating Ĉ(y) = Σ ĉ[t]·y^t at y = (ω^{-1})^s = 4^s for s = 0,1,2,3, then scaling by 13, recovers exactly:
Dropping the trailing zero gives c = [3, 10, 8] — matching the hand product 3 + 10x + 8x². The whole pipeline used only integers mod 17; nothing rounded.
That is the entire mechanism: pad to a power of two, forward-transform both, multiply pointwise, inverse-transform, scale by n^{-1}.
Code Examples¶
Example: Polynomial multiplication mod 998244353 (recursive NTT for clarity)¶
This junior version uses a simple recursive transform so the structure is obvious. The iterative, in-place version (faster, the one you ship) is in middle.md. The butterfly itself is explained in 15-divide-and-conquer/05-fft.
Go¶
package main
import "fmt"
const MOD = 998244353 // = 119 * 2^23 + 1
const G = 3 // primitive root of MOD
func power(a, e int64) int64 {
a %= MOD
r := int64(1)
for e > 0 {
if e&1 == 1 {
r = r * a % MOD
}
a = a * a % MOD
e >>= 1
}
return r
}
func inv(a int64) int64 { return power(a, MOD-2) } // Fermat inverse
// ntt transforms p in place; invert=true does the inverse transform.
func ntt(p []int64, invert bool) {
n := len(p)
if n == 1 {
return
}
even := make([]int64, n/2)
odd := make([]int64, n/2)
for i := 0; i < n/2; i++ {
even[i] = p[2*i]
odd[i] = p[2*i+1]
}
ntt(even, invert)
ntt(odd, invert)
// primitive n-th root (or its inverse)
w := power(G, (MOD-1)/int64(n))
if invert {
w = inv(w)
}
cur := int64(1)
for i := 0; i < n/2; i++ {
t := cur * odd[i] % MOD
p[i] = (even[i] + t) % MOD
p[i+n/2] = (even[i] - t + MOD) % MOD
cur = cur * w % MOD
}
}
func multiply(a, b []int64) []int64 {
need := len(a) + len(b) - 1
n := 1
for n < need {
n <<= 1
}
fa := make([]int64, n)
fb := make([]int64, n)
copy(fa, a)
copy(fb, b)
ntt(fa, false)
ntt(fb, false)
for i := 0; i < n; i++ {
fa[i] = fa[i] * fb[i] % MOD
}
ntt(fa, true)
ninv := inv(int64(n))
for i := 0; i < n; i++ {
fa[i] = fa[i] * ninv % MOD
}
return fa[:need]
}
func main() {
a := []int64{1, 2} // 1 + 2x
b := []int64{3, 4} // 3 + 4x
c := multiply(a, b)
fmt.Println(c) // [3 10 8] = 3 + 10x + 8x^2
}
Java¶
import java.util.*;
public class NttJunior {
static final long MOD = 998244353L; // 119 * 2^23 + 1
static final long G = 3; // primitive root
static long power(long a, long e) {
a %= MOD;
long r = 1;
while (e > 0) {
if ((e & 1) == 1) r = r * a % MOD;
a = a * a % MOD;
e >>= 1;
}
return r;
}
static long inv(long a) { return power(a, MOD - 2); }
static void ntt(long[] p, boolean invert) {
int n = p.length;
if (n == 1) return;
long[] even = new long[n / 2], odd = new long[n / 2];
for (int i = 0; i < n / 2; i++) {
even[i] = p[2 * i];
odd[i] = p[2 * i + 1];
}
ntt(even, invert);
ntt(odd, invert);
long w = power(G, (MOD - 1) / n);
if (invert) w = inv(w);
long cur = 1;
for (int i = 0; i < n / 2; i++) {
long t = cur * odd[i] % MOD;
p[i] = (even[i] + t) % MOD;
p[i + n / 2] = (even[i] - t + MOD) % MOD;
cur = cur * w % MOD;
}
}
static long[] multiply(long[] a, long[] b) {
int need = a.length + b.length - 1;
int n = 1;
while (n < need) n <<= 1;
long[] fa = Arrays.copyOf(a, n), fb = Arrays.copyOf(b, n);
ntt(fa, false);
ntt(fb, false);
for (int i = 0; i < n; i++) fa[i] = fa[i] * fb[i] % MOD;
ntt(fa, true);
long ninv = inv(n);
for (int i = 0; i < n; i++) fa[i] = fa[i] * ninv % MOD;
return Arrays.copyOf(fa, need);
}
public static void main(String[] args) {
long[] a = {1, 2}, b = {3, 4};
System.out.println(Arrays.toString(multiply(a, b))); // [3, 10, 8]
}
}
Python¶
MOD = 998244353 # 119 * 2**23 + 1
G = 3 # primitive root
def power(a, e):
a %= MOD
r = 1
while e > 0:
if e & 1:
r = r * a % MOD
a = a * a % MOD
e >>= 1
return r
def inv(a):
return power(a, MOD - 2) # Fermat inverse
def ntt(p, invert):
n = len(p)
if n == 1:
return p
even = ntt(p[0::2], invert)
odd = ntt(p[1::2], invert)
w = power(G, (MOD - 1) // n)
if invert:
w = inv(w)
out = [0] * n
cur = 1
for i in range(n // 2):
t = cur * odd[i] % MOD
out[i] = (even[i] + t) % MOD
out[i + n // 2] = (even[i] - t) % MOD
cur = cur * w % MOD
return out
def multiply(a, b):
need = len(a) + len(b) - 1
n = 1
while n < need:
n <<= 1
fa = a + [0] * (n - len(a))
fb = b + [0] * (n - len(b))
fa = ntt(fa, False)
fb = ntt(fb, False)
fc = [x * y % MOD for x, y in zip(fa, fb)]
fc = ntt(fc, True)
ninv = inv(n)
return [x * ninv % MOD for x in fc[:need]]
if __name__ == "__main__":
print(multiply([1, 2], [3, 4])) # [3, 10, 8]
What it does: multiplies 1 + 2x by 3 + 4x and prints [3, 10, 8], the exact coefficients of 3 + 10x + 8x², all mod 998244353. Run: go run main.go | javac NttJunior.java && java NttJunior | python ntt.py
Coding Patterns¶
Pattern 1: Schoolbook oracle (test NTT against this)¶
Intent: Before trusting NTT, convolve the slow obvious way on small inputs and compare.
def conv_bruteforce(a, b, mod):
c = [0] * (len(a) + len(b) - 1)
for i, ai in enumerate(a):
for j, bj in enumerate(b):
c[i + j] = (c[i + j] + ai * bj) % mod
return c
This is O(n²), too slow for big n, but a perfect correctness oracle. NTT must match it on every random small case.
Pattern 2: Pad-to-power-of-two helper¶
Intent: The transform size must be a power of two ≥ len(a)+len(b)-1. Always compute it the same way.
Pattern 3: Build the root from (p, g)¶
Intent: Get a primitive n-th root and its inverse from the prime and its primitive root.
Error Handling¶
| Error | Cause | Fix |
|---|---|---|
| Garbage / wrong coefficients | Used a non-NTT-friendly prime (no 2^k-th root). | Use p = c·2^k + 1 with 2^k ≥ n; e.g. 998244353. |
| Transform of wrong length | n not a power of two, or smaller than len(a)+len(b)-1. | Pad up to the next power of two ≥ the result length. |
Off-by-n (all entries n× too big) | Forgot to multiply by n^{-1} after the inverse transform. | Scale every entry by inv(n) exactly once. |
| Negative values | even[i] − t went negative before % p. | Add MOD then % MOD. |
| Wrong root used in inverse | Reused ω instead of ω^{-1}. | Inverse transform uses ω^{-1} (or pow(ω, p-2)). |
Overflow in a*b | Product of two near-p values overflows 32-bit. | Use 64-bit (int64/long) before % MOD. |
Performance Tips¶
- Round
nup to a power of two, not just any value — the butterfly needs powers of two. - Precompute roots
ω^0, ω^1, …once per size rather than re-exponentiating inside the loop. - Use the iterative in-place NTT (with bit-reversal) for real workloads; the recursive version here is for understanding only. See
middle.md. - Reuse buffers: forward-transform
bonce if you multiply many polynomials by the sameb. - Skip a transform: to square a polynomial, transform once and pointwise-square, saving one forward NTT.
Best Practices¶
- Always state
(MOD, G)once as named constants; never sprinkle magic primes. - Test NTT against the schoolbook oracle (Pattern 1) on random small arrays before trusting it.
- Confirm
2^k ≥ nfor your chosen prime and your largest input —998244353caps atn = 2^23. - Keep
multiply,ntt, andpower/invas small, separately testable functions. - Document whether coefficients are already reduced mod
pon input; normalize them if not.
Edge Cases & Pitfalls¶
- Empty input — multiplying by an empty array should return empty; guard
len == 0. n = 1— the transform of a single element is itself; the recursion base case.- Result length — it is exactly
len(a) + len(b) − 1; trim the padded zeros at the end. - Coefficients larger than
p— reduce inputs modpfirst, or they corrupt the transform. - Prime too small for
n— if2^k < n, then-th root does not exist; you get nonsense. Use a bigger prime. - Arbitrary modulus — if the required modulus is not NTT-friendly (e.g.
10^9 + 7), you cannot NTT directly; use the multi-prime + CRT method (middle.md). - Floating point — never mix
doubleinto NTT; the whole point is exact integers.
Common Mistakes¶
- Picking a random prime instead of an NTT-friendly one — there is no
2^k-th root, so the transform is meaningless. - Forgetting the
n^{-1}scaling after the inverse transform — every coefficient comes outntimes too large. - Not padding to a power of two — the butterfly assumes power-of-two length.
- Padding too short —
n < len(a)+len(b)-1causes wraparound (cyclic convolution), giving wrong coefficients. - Using
ω(notω^{-1}) in the inverse — produces a scrambled result. - 32-bit overflow —
a*bfor residues near998244353overflowsint; use 64-bit before% MOD. - Confusing the prime's primitive root
gwith then-th rootω—ω = g^{(p-1)/n}, notgitself.
Cheat Sheet¶
| Question | Tool | Formula |
|---|---|---|
Multiply polynomials mod prime p | NTT | forward, pointwise, inverse |
Build n-th root of unity | from primitive root g | ω = g^{(p-1)/n} mod p |
| Inverse root | Fermat or g | ω^{-1} = pow(ω, p-2) |
| Scale after inverse | n^{-1} | multiply all entries by pow(n, p-2) |
| Transform size | next power of two | n ≥ len(a)+len(b)-1, n = 2^t |
| Common NTT prime / root | constants | p = 998244353, g = 3 |
| Arbitrary modulus | multi-prime + CRT | see middle.md, 15-garner-algorithm |
Core algorithm:
multiply(a, b, p, g):
n = next_pow2(len(a) + len(b) - 1)
pad a, b to length n
NTT(a); NTT(b) # forward, root ω = g^((p-1)/n)
c[i] = a[i] * b[i] mod p # pointwise
INTT(c) # inverse, root ω^-1
c[i] = c[i] * inv(n) mod p # scale
return c[0 : len(a)+len(b)-1]
# cost: O(n log n), exact integers mod p
Visual Animation¶
See
animation.htmlfor an interactive visualization.The animation demonstrates: - Bit-reversal permutation of the input over a small NTT-friendly prime - Each butterfly stage combining pairs with powers of the root
ω- The pointwise product of the two transformed vectors - The inverse transform plus then^{-1}scaling that recovers exact coefficients - Play / Pause / Step controls and editable input arrays
Summary¶
The Number-Theoretic Transform is the FFT performed in a finite field Z/pZ rather than with complex numbers. By replacing the complex root of unity with a primitive n-th root of unity built from a prime's primitive root (ω = g^{(p-1)/n} mod p), NTT computes exact convolution modulo p in O(n log n) with no floating-point rounding. The root only exists when n | p − 1, which is why you need an NTT-friendly prime p = c·2^k + 1 such as 998244353 = 119·2^23 + 1 with primitive root g = 3. The recipe is exactly FFT's: pad to a power of two, forward-transform both inputs, multiply pointwise, inverse-transform, and scale by n^{-1}. For an arbitrary (non-friendly) modulus, run NTT under several friendly primes and recombine with CRT — the subject of middle.md.
Further Reading¶
- Sibling
15-divide-and-conquer/05-fft— the Cooley-Tukey butterfly (shared structure; we did not re-derive it here). - Sibling
12-primitive-root-discrete-root— finding the primitive rootgthat NTT depends on. - Sibling
02-modular-arithmetic,04-fermat-euler,26-binary-exponentiation— the modular toolkit. - Sibling
05-crtand15-garner-algorithm— combining multiple NTT primes for an arbitrary modulus. - cp-algorithms.com — "Number Theoretic Transform" and "Fast Fourier Transform".
- Modern Computer Algebra (von zur Gathen & Gerhard) — fast multiplication and finite-field transforms.