Extended Euclidean Algorithm & Modular Inverse — Mathematical Foundations and Complexity Theory¶

Table of Contents¶

Formal Definitions
Bezout's Identity (Proof)
The Extended Euclidean Algorithm (Correctness by Induction)
Modular Inverse Existence (iff gcd = 1)
Uniqueness and the Group of Units
The Linear Congruence a·x ≡ b (mod m)
The O(n) Inverse-Table Derivation
Fermat, Euler, and the Order of an Element
Complexity: The O(log) Bound, Binary GCD, Half-GCD
Bezout, CRT, Continued Fractions, and Applications
Summary

1. Formal Definitions¶

Let ℤ be the integers and m ≥ 1 an integer modulus.

Definition 1.1 (Divisibility). For integers d, a, write d | a ("d divides a") if a = d·k for some integer k.

Definition 1.2 (Greatest common divisor). For integers a, b not both zero, gcd(a, b) is the largest integer d with d | a and d | b. By convention gcd(a, 0) = |a| and gcd(0, 0) = 0.

Definition 1.3 (Coprime). Integers a, b are coprime (relatively prime) if gcd(a, b) = 1.

Definition 1.4 (Congruence). a ≡ b (mod m) means m | (a - b). The residue classes mod m form the ring ℤ_m = {0, 1, …, m-1} under addition and multiplication mod m.

Definition 1.5 (Modular inverse). An element a ∈ ℤ_m has a multiplicative inverse a⁻¹ if there exists x ∈ ℤ_m with a·x ≡ 1 (mod m). When it exists it is unique (Theorem 5.1) and denoted a⁻¹.

Definition 1.6 (Bezout coefficients). Integers x, y with a·x + b·y = gcd(a, b) are Bezout coefficients. They are not unique (Section 5).

Definition 1.7 (Linear congruence). An equation a·x ≡ b (mod m) in the unknown x, with given a, b ∈ ℤ and modulus m ≥ 1. Its solution set (Section 6) is either empty or a coset of a subgroup of (ℤ_m, +).

Notation conventions. Throughout, g = gcd(a, m); ⌊·⌋ is floor; q_k, r_k are the quotients and remainders of the Euclidean sequence; φ is Euler's totient; (ℤ_m)^× is the group of units (invertible residues) of ℤ_m; ord(a) is the multiplicative order of a; and [P] is the Iverson bracket (1 if P holds, else 0). "Bezout pair" means a specific (x, y) solving a·x + b·y = g.

2. Bezout's Identity (Proof)¶

Theorem 2.1 (Bezout's identity). For integers a, b not both zero, there exist integers x, y with

a·x + b·y = gcd(a, b).

Moreover gcd(a, b) is the smallest positive integer expressible as a·x + b·y.

Proof. Consider the set S = { a·u + b·v : u, v ∈ ℤ, a·u + b·v > 0 }. S is nonempty (e.g., a·a + b·b > 0 since not both are zero, or take signs appropriately). By the well-ordering principle, S has a least element d = a·x + b·y > 0. We claim d = gcd(a, b).

d divides both a and b. Divide a by d: a = q·d + r with 0 ≤ r < d. Then

r = a - q·d = a - q(a·x + b·y) = a·(1 - q·x) + b·(-q·y),

so r is an integer combination of a and b. If r > 0, then r ∈ S and r < d, contradicting minimality of d. Hence r = 0, so d | a. Symmetrically d | b. Thus d is a common divisor.

d is the greatest common divisor. Let c be any common divisor of a and b. Then c | (a·x + b·y) = d, so c ≤ d. Therefore d is the greatest common divisor, and d = a·x + b·y. ∎

Corollary 2.2. An integer n is expressible as a·x + b·y for some integers x, y iff gcd(a, b) | n. (The set of all such combinations is exactly the set of multiples of gcd(a, b) — the ideal (a, b) = (gcd(a, b)) in ℤ.)

Proof of Corollary 2.2. If n = a·x + b·y, then since g = gcd(a, b) divides both a and b, it divides the combination, so g | n. Conversely, if g | n, write n = g·t; by Theorem 2.1 there are x₀, y₀ with a·x₀ + b·y₀ = g, so a·(t·x₀) + b·(t·y₀) = g·t = n. ∎

Remark. Bezout's identity is the existence statement; the extended Euclidean algorithm (Section 3) is the constructive proof, computing one such (x, y) explicitly in O(log) steps. The non-constructive well-ordering proof above guarantees existence; the algorithm exhibits a witness.

Ideal-theoretic restatement. In the language of ring theory, the set (a, b) = {a·u + b·v : u, v ∈ ℤ} is the ideal generated by a and b in the principal ideal domain ℤ. Bezout's theorem says this ideal equals (g), the principal ideal generated by g = gcd(a, b). That ℤ is a PID — every ideal is principal — is precisely why every two-generator ideal collapses to a single generator, the gcd. The extended Euclidean algorithm is the constructive proof that ℤ is a Euclidean domain (it has a division algorithm), and Euclidean domains are PIDs.

Worked Bezout. a = 1234, b = 54. Forward: gcd(1234, 54) = gcd(54, 46) = gcd(46, 8) = gcd(8, 6) = gcd(6, 2) = 2 (quotients 22, 1, 5, 1, 3). Running the iterative recurrence of Section 3 yields 1234·x + 54·y = 2 with x = -7, y = 160: check 1234·(-7) + 54·160 = -8638 + 8640 = 2 ✓. The coefficient x = -7 already satisfies the bound |x| ≤ b/(2g) = 54/4 = 13.5 of Theorem 5.5. The lesson: Bezout coefficients are easy to verify by substitution but should be computed by the algorithm, never guessed — the recurrence mechanically produces the minimal witness.

3. The Extended Euclidean Algorithm (Correctness by Induction)¶

Algorithm (recursive).

EXTGCD(a, b):
  if b == 0: return (a, 1, 0)            # a·1 + 0·0 = a
  (g, x1, y1) = EXTGCD(b, a mod b)
  return (g, y1, x1 - ⌊a/b⌋·y1)

Theorem 3.1 (Correctness). EXTGCD(a, b) returns (g, x, y) with g = gcd(a, b) and a·x + b·y = g, for all integers a ≥ 0, b ≥ 0 not both zero.

Proof (strong induction on b).

Base case b = 0. Returns (a, 1, 0). Indeed gcd(a, 0) = a and a·1 + 0·0 = a. ✓

Inductive step b > 0. Let q = ⌊a/b⌋ and r = a mod b = a - q·b, with 0 ≤ r < b. By the inductive hypothesis applied to (b, r) (note r < b), the recursive call returns (g, x₁, y₁) with g = gcd(b, r) and

b·x₁ + r·y₁ = g.

First, gcd(a, b) = gcd(b, a mod b) = gcd(b, r) = g (the Euclidean identity), so the returned g is correct. Now substitute r = a - q·b:

g = b·x₁ + (a - q·b)·y₁ = a·y₁ + b·(x₁ - q·y₁).

Thus setting x = y₁ and y = x₁ - q·y₁ gives a·x + b·y = g, which is exactly what the algorithm returns. By induction the theorem holds. ∎

Theorem 3.2 (Iterative form correctness). The iterative algorithm maintaining triples (r, s, t) with the invariant r = s·a + t·b, updated by (r, s, t) ← (r_{prev} - q·r, s_{prev} - q·s, t_{prev} - q·t), terminates with the previous row equal to (g, x, y).

Proof. Initialize row 0 as (a, 1, 0) and row 1 as (b, 0, 1); both satisfy r = s·a + t·b (a = 1·a + 0·b, b = 0·a + 1·b). The update is a linear combination of two rows each satisfying the invariant, so the new row satisfies it too (the invariant is preserved by the linear update because r_{new} = r_{prev} - q·r = (s_{prev} - q·s)a + (t_{prev} - q·t)b). The r column is exactly the Euclidean remainder sequence, which reaches 0; the row before the zero holds r = g with its (s, t) = (x, y). ∎

Worked verification. EXTGCD(240, 46):

EXTGCD(240,46): q=5, r=10  → EXTGCD(46,10)
EXTGCD(46,10):  q=4, r=6   → EXTGCD(10,6)
EXTGCD(10,6):   q=1, r=4   → EXTGCD(6,4)
EXTGCD(6,4):    q=1, r=2   → EXTGCD(4,2)
EXTGCD(4,2):    q=2, r=0   → EXTGCD(2,0) → (2,1,0)
climb: (4,2):(2,0,1) (6,4):(2,1,-1) (10,6):(2,-1,2) (46,10):(2,2,-9) (240,46):(2,-9,47)

Check the top: 240·(-9) + 46·47 = -2160 + 2162 = 2 = gcd(240, 46). ✓

3.1 Iterative Trace with the Invariant¶

The iterative algorithm (Theorem 3.2) on (a, b) = (240, 46), showing every (r, s, t) row with r = 240s + 46t:

row  r     s     t      check 240·s + 46·t
 0   240   1     0      240·1  + 46·0   = 240  ✓
 1   46    0     1      240·0  + 46·1   = 46   ✓     q = 240/46 = 5
 2   10    1    -5      240·1  + 46·(-5)= 240-230 = 10  ✓   q = 46/10 = 4
 3   6    -4    21      240·(-4)+46·21  = -960+966 = 6  ✓   q = 10/6 = 1
 4   4     5   -26      240·5  +46·(-26)= 1200-1196= 4  ✓   q = 6/4 = 1
 5   2    -9    47      240·(-9)+46·47  = -2160+2162= 2  ✓   q = 4/2 = 2
 6   0    23  -120      (remainder 0 — stop; previous row is the answer)

Row 5 holds (g, x, y) = (2, -9, 47), matching the recursive trace. The invariant r = 240s + 46t holds on every row, which is the inductive content of Theorem 3.2: each row is a fixed linear combination of a and b, and the subtraction step preserves that. Note the t column (0, 1, -5, 21, -26, 47) is (up to sign) the denominator-convergent sequence of the continued fraction of 240/46 — the connection made precise in Section 10.3.

4. Modular Inverse Existence (iff gcd = 1)¶

Theorem 4.1 (Existence criterion). For a ∈ ℤ_m, the inverse a⁻¹ exists if and only if gcd(a, m) = 1.

Proof.

(⇐) Coprime ⟹ inverse exists. If gcd(a, m) = 1, Bezout (Theorem 2.1) gives integers x, y with a·x + m·y = 1. Reduce mod m: since m·y ≡ 0 (mod m),

a·x ≡ 1 (mod m),

so x mod m is an inverse of a. The extended Euclidean algorithm constructs this x in O(log m).

(⇒) Inverse exists ⟹ coprime. Suppose a·x ≡ 1 (mod m), i.e. a·x - 1 = m·k for some integer k, so a·x - m·k = 1. Let d = gcd(a, m). Then d | a and d | m, hence d | (a·x - m·k) = 1, forcing d = 1. Therefore a and m are coprime. ∎

Corollary 4.2. 0 is never invertible mod m > 1, since gcd(0, m) = m ≠ 1. More generally, any a sharing a prime factor with m is non-invertible.

Corollary 4.3 (Prime modulus). If p is prime, every nonzero a ∈ ℤ_p is invertible (since gcd(a, p) = 1 for 1 ≤ a ≤ p-1). Thus ℤ_p is a field: the foundation of "arithmetic mod a prime".

Geometric/lattice view. The set {a·x mod m : x ∈ ℤ} equals the set of multiples of g = gcd(a, m) in {0, 1, …, m-1} (a consequence of Corollary 2.2). It contains 1 iff g = 1. This is the same fact as Theorem 4.1, viewed as "the cyclic subgroup generated by a under multiplication-by-a reaches 1 iff a is a unit."

Worked non-existence. Take a = 6, m = 9, g = gcd(6, 9) = 3. The reachable products {6·x mod 9 : x = 0,1,2,…} = {0, 6, 3, 0, 6, 3, …} = {0, 3, 6} — exactly the multiples of 3 mod 9. The value 1 never appears, so 6 has no inverse mod 9. The orbit visits only m/g = 3 distinct residues, confirming Theorem 4.1 concretely: non-coprimality confines the orbit to the multiples-of-g sublattice, which excludes the unit 1.

Counting the units. The number of invertible residues mod m is φ(m) (Corollary 5.4). For m = 9, φ(9) = 6, the units being {1, 2, 4, 5, 7, 8} — precisely the residues coprime to 9, and 6 ∉ this set. So "fraction of invertible residues" is φ(m)/m, which for a prime is (p-1)/p ≈ 1 (almost everything invertible) and for a number with small prime factors can be much smaller.

5. Uniqueness and the Group of Units¶

Theorem 5.1 (Uniqueness of the inverse). If a⁻¹ exists mod m, it is unique in ℤ_m.

Proof. Suppose a·x ≡ 1 and a·x' ≡ 1 (mod m). Then a·x ≡ a·x', so a(x - x') ≡ 0 (mod m). Multiplying by a⁻¹ (which exists by hypothesis) gives x - x' ≡ 0, i.e. x ≡ x' (mod m). ∎

Theorem 5.2 (Non-uniqueness of Bezout coefficients). If (x, y) satisfies a·x + b·y = g, then all solutions are

(x + k·(b/g),  y - k·(a/g)),   k ∈ ℤ.

Proof. Each such pair satisfies a(x + k·b/g) + b(y - k·a/g) = a·x + b·y + k(ab/g - ab/g) = g. Conversely, if a·x' + b·y' = g, subtract to get a(x' - x) = -b(y' - y); dividing by g and using gcd(a/g, b/g) = 1 forces (b/g) | (x' - x), giving the stated family. ∎

So the inverse (a residue mod m) is unique, but the Bezout coefficient x (an integer) is determined only mod m/g = m (when g = 1); the canonical inverse is x mod m.

Theorem 5.3 (Group of units). The invertible residues form a group (ℤ_m)^× under multiplication, of order φ(m) (Euler's totient). It is closed (product of units is a unit: (ab)⁻¹ = b⁻¹a⁻¹), associative, has identity 1, and every element has an inverse by definition.

Corollary 5.4. |(ℤ_m)^×| = φ(m), and for prime p, φ(p) = p - 1, so (ℤ_p)^× has p - 1 elements — every nonzero residue.

5.1 Bezout Coefficient Bounds¶

Theorem 5.5 (Coefficient size bound). When a, b > 0 are not both zero and g = gcd(a, b), the extended Euclidean algorithm produces Bezout coefficients satisfying

|x| ≤ b / (2g)   and   |y| ≤ a / (2g)

(the "minimal" Bezout pair). In particular, for the modular inverse case (b = m, g = 1), the returned x satisfies |x| ≤ m/2, so a single reduction ((x % m) + m) % m is enough and the intermediate x never exceeds m in magnitude.

Proof idea. The coefficient sequences (s_k) and (t_k) produced by the algorithm are strictly increasing in magnitude and alternate in sign, with |s_k| ≤ b/(2g) and |t_k| ≤ a/(2g) at the terminal step — a standard induction on the Euclidean rows using |s_{k+1}| = |s_{k-1}| + q_k|s_k| and the relation s_k t_{k+1} - s_{k+1} t_k = ±1. The full induction appears in Knuth, TAOCP Vol. 2, §4.5.2. ∎

Practical consequence. Because |x| ≤ m/2, the inverse computation needs no big-integer arithmetic for the coefficient magnitude itself — only the product q·s inside the loop can exceed m, which is the overflow concern of the senior file, not a growth in x.

Determinant invariant. Throughout the run the 2×2 matrix [[s_k, t_k], [s_{k+1}, t_{k+1}]] has determinant ±1 (it is a product of unimodular elementary row operations [[0,1],[1,-q_k]], each of determinant -1). This unimodularity is what guarantees the coefficient pairs stay integers and bounded, and it is the algebraic reason the terminal coefficients are the minimal Bezout pair: a unimodular transform of (a, b) cannot inflate the coefficients beyond the bounds of Theorem 5.5.

5.2 The Inverse as a Group Power¶

In (ℤ_m)^×, the inverse of a is a^(φ(m)-1) (Theorem 8.2). Combined with Lagrange's theorem (the order of a divides φ(m)), this means the inverse is a^(ord(a)-1) where ord(a) is the multiplicative order — often much smaller than φ(m). While extended Euclid is the practical tool, the group-theoretic view explains why the inverse is unique (Theorem 5.1): in any finite group, inverses are unique.

Worked order-based inverse. Mod 7, ord(2) = 3 (2, 4, 1). So 2⁻¹ = 2^(3-1) = 2² = 4 (mod 7), agreeing with 2·4 = 8 ≡ 1. Using the full φ(7) = 6 instead gives 2⁵ = 32 ≡ 4 — the same answer, but a larger exponent. Knowing the order shortens the exponent, though in practice extended Euclid sidesteps exponentiation entirely. The point is conceptual: the inverse is a power of the element, living in the cyclic subgroup it generates.

6. The Linear Congruence a·x ≡ b (mod m)¶

Theorem 6.1 (Solvability and solution count). Let g = gcd(a, m). The congruence a·x ≡ b (mod m): 1. has a solution iff g | b; 2. when solvable, has exactly g solutions modulo m, of the form x₀ + t·(m/g) for t = 0, 1, …, g-1, where x₀ is any particular solution.

Proof.

Solvability. a·x ≡ b (mod m) means a·x - b = m·y for some y, i.e. a·x - m·y = b has an integer solution (x, y). By Corollary 2.2 this Diophantine equation is solvable iff gcd(a, m) = g divides b.

Construction of x₀. When g | b, extended Euclid gives x' with a·x' + m·y' = g. Multiply by b/g:

a·(x'·b/g) + m·(y'·b/g) = b,

so x₀ = x'·(b/g) is a particular solution; reduce it mod m.

Counting solutions. If x₀ and x₁ are both solutions, a(x₁ - x₀) ≡ 0 (mod m), i.e. m | a(x₁ - x₀). Dividing by g: (m/g) | (a/g)(x₁ - x₀), and since gcd(a/g, m/g) = 1, we get (m/g) | (x₁ - x₀). So all solutions are ≡ x₀ (mod m/g), which gives exactly g distinct residues mod m: x₀, x₀ + m/g, …, x₀ + (g-1)m/g. ∎

Corollary 6.2 (Unique-solution case). If gcd(a, m) = 1, the congruence has a unique solution x ≡ a⁻¹·b (mod m) — the inverse times the right-hand side. This is the everyday case and reduces "solve the congruence" to "multiply by the inverse."

Worked unique-solution example. 7·x ≡ 3 (mod 26): gcd(7, 26) = 1, so a unique solution. From Section 8.0, 7⁻¹ ≡ 15 (mod 26), so x ≡ 15·3 = 45 ≡ 19 (mod 26). Check: 7·19 = 133 = 5·26 + 3 ≡ 3 (mod 26) ✓. The whole congruence collapses to one multiplication by the precomputed inverse — the reason inverses are precomputed once and reused across many right-hand sides b.

Worked example. 14·x ≡ 30 (mod 100): g = gcd(14, 100) = 2, and 2 | 30, so 2 solutions. Extended Euclid: 14·(-7) + 100·1 = 2, so x' = -7. Then x₀ = -7·(30/2) mod (100/2) = -105 mod 50 = 45. Solutions: 45 and 45 + 50 = 95. Check 14·45 = 630 ≡ 30 (mod 100) ✓ and 14·95 = 1330 ≡ 30 (mod 100) ✓.

6.1 The Solution Set as a Coset¶

Geometric interpretation. The solution set of a·x ≡ b (mod m) is a coset of the subgroup H = (m/g)·ℤ_m = {0, m/g, 2m/g, …, (g-1)m/g} of (ℤ_m, +). Concretely, if x₀ is one solution, the full set is x₀ + H. This is why the solutions are evenly spaced m/g apart and number exactly g = |H|: cosets of a subgroup partition the group into equal-size blocks. The map x ↦ a·x (mod m) is a group homomorphism on (ℤ_m, +) with kernel H, so its fibers (preimages of a value) are exactly the cosets of H.

Theorem 6.3 (Reduction to the coprime case). The congruence a·x ≡ b (mod m) with g = gcd(a, m) | b is equivalent to the coprime congruence

(a/g)·x ≡ (b/g)  (mod m/g),

which has the unique solution x₀ ≡ (a/g)⁻¹·(b/g) (mod m/g) because gcd(a/g, m/g) = 1. Lifting back to mod m produces the g solutions x₀, x₀ + m/g, …, x₀ + (g-1)m/g.

Proof. Dividing the congruence a·x - b = m·y through by g (valid since g | a, g | m, g | b) gives (a/g)·x - (b/g) = (m/g)·y, i.e. the reduced congruence mod m/g. Now gcd(a/g, m/g) = 1 (dividing out the full common factor), so (a/g)⁻¹ exists mod m/g and the reduced congruence has a unique solution there. Each residue mod m/g lifts to exactly g residues mod m. ∎

This reduction is the cleanest implementation strategy: divide out g, invert in the smaller modulus, then enumerate the g lifts. It avoids the scaling subtlety of x₀ = x'·(b/g) mod (m/g) by working in the coprime world directly.

7. The O(n) Inverse-Table Derivation¶

Theorem 7.1. For a prime p and all 2 ≤ i < p,

i⁻¹ ≡ -⌊p/i⌋ · (p mod i)⁻¹  (mod p).

Proof. Write p = q·i + r with q = ⌊p/i⌋ and r = p mod i, 0 ≤ r < i. Since 1 ≤ i < p and p is prime, 0 < r (because i ∤ p) and i, r are both coprime to p, so i⁻¹ and r⁻¹ exist. Reduce the equation mod p:

q·i + r ≡ 0 (mod p).

Multiply both sides by i⁻¹ r⁻¹:

q·r⁻¹ + i⁻¹ ≡ 0 (mod p)   ⟹   i⁻¹ ≡ -q·r⁻¹ = -⌊p/i⌋·(p mod i)⁻¹ (mod p). ∎

Corollary 7.2 (Linear-time table). With inv[1] = 1 and the recurrence applied for i = 2, …, n, all of inv[1..n] are computed in O(n) ring operations, because r = p mod i < i means (p mod i)⁻¹ = inv[p mod i] was already computed earlier in the increasing-i pass.

inv[1] = 1
for i = 2 .. n:  inv[i] = ( -(p/i) * inv[p mod i] ) mod p   # normalized to [0, p)

Necessity of p prime and p > n. The recurrence requires (p mod i)⁻¹ to exist for every 2 ≤ i ≤ n; equivalently every i ≤ n must be a unit. This holds iff gcd(i, p) = 1 for all such i, guaranteed by p prime and p > n. For composite m, some i ≤ n shares a factor with m and has no inverse, so the table is undefined for those indices.

Related identity (factorial inverses). A companion linear-time trick inverts all factorials: compute fact[i] forward, invert fact[n] once via Fermat/extended Euclid, then walk backward invfact[i-1] = invfact[i]·i (mod p). Combined with inv[i] = invfact[i]·fact[i-1], this yields all element inverses and all factorial inverses in O(n) with a single inversion — the standard binomial-coefficient precomputation.

7.1 Worked Table for p = 11¶

Running inv[1] = 1, inv[i] = -(11/i)·inv[11 % i] (mod 11):

i=2:  11/2 = 5,  11%2 = 1,  inv = -5·inv[1] = -5 ≡ 6   (2·6 = 12 ≡ 1 ✓)
i=3:  11/3 = 3,  11%3 = 2,  inv = -3·inv[2] = -3·6 = -18 ≡ 4   (3·4 = 12 ≡ 1 ✓)
i=4:  11/4 = 2,  11%4 = 3,  inv = -2·inv[3] = -2·4 = -8 ≡ 3   (4·3 = 12 ≡ 1 ✓)
i=5:  11/5 = 2,  11%5 = 1,  inv = -2·inv[1] = -2 ≡ 9   (5·9 = 45 ≡ 1 ✓)
i=6:  11/6 = 1,  11%6 = 5,  inv = -1·inv[5] = -9 ≡ 2   (6·2 = 12 ≡ 1 ✓)
...   yielding [_, 1, 6, 4, 3, 9, 2, 8, 7, 5, 10]

Every dependency inv[11 % i] has index 11 % i < i, hence is already filled — the structural reason the single increasing pass is O(n).

7.2 Failure on a Composite Modulus¶

For m = 12 and i = 4: 4 is not invertible mod 12 (gcd(4, 12) = 4), so inv[4] is undefined, and any inv[j] whose recurrence references it inherits garbage. The recurrence is well-defined only when every i ≤ n is a unit, i.e. gcd(i, m) = 1 for all 1 ≤ i ≤ n — guaranteed precisely by m prime and m > n. This is the formal boundary of the table trick's applicability.

7.3 Comparison of Batch-Inversion Complexities¶

Method	Inputs	Inversions	Multiplications	Total
`n` separate extended-Euclid	any coprime	`n`	—	`O(n log m)`
O(n) table recurrence	`1..n`, prime `p > n`	`0`	`O(n)`	`O(n)`
Montgomery batch trick	arbitrary `a₁..aₙ`	`1`	`3(n-1)`	`O(n)` + one `O(log m)`
factorial-inverse sweep	`0!..n!`, prime	`1`	`O(n)`	`O(n)` + one `O(log m)`

The table recurrence is the only method needing zero inversions, but it is restricted to the contiguous range 1..n mod a prime. Montgomery's trick is the general-purpose O(n) batch method for arbitrary elements, trading the table's structure for one unavoidable inversion. Both crush the naive O(n log m) of independent inversions when n is large.

8. Fermat, Euler, and the Order of an Element¶

8.0 Worked Inverse via Continued-Fraction Convergents¶

Compute 7⁻¹ mod 26 three ways to see them coincide:

Extended Euclid: 26 = 3·7 + 5; 7 = 1·5 + 2; 5 = 2·2 + 1; 2 = 2·1.
  back: 1 = 5 - 2·2 = 5 - 2(7 - 5) = 3·5 - 2·7 = 3(26 - 3·7) - 2·7 = 3·26 - 11·7.
  so 7·(-11) ≡ 1 (mod 26),  -11 mod 26 = 15.   →  7⁻¹ = 15.
Continued fraction of 7/26 = [0; 3, 1, 2, 2]; penultimate convergent denominator = 15. ✓
Fermat does NOT apply (26 is composite); brute check: 7·15 = 105 = 4·26 + 1 ≡ 1. ✓

All three (where applicable) give 15. The composite modulus is exactly the case where Fermat fails and extended Euclid (= continued-fraction convergent) is mandatory.

Theorem 8.1 (Fermat's little theorem). If p is prime and p ∤ a, then a^(p-1) ≡ 1 (mod p), hence a⁻¹ ≡ a^(p-2) (mod p).

Proof. The nonzero residues {1, 2, …, p-1} form the group (ℤ_p)^× of order p-1. By Lagrange's theorem, the order of any element divides p-1, so a^(p-1) ≡ 1. Equivalently, the map x ↦ a·x permutes {1, …, p-1}, so ∏ x ≡ ∏ (a·x) = a^(p-1) ∏ x, and cancelling ∏ x (a unit) gives a^(p-1) ≡ 1. Multiplying by a⁻¹ yields a^(p-2) ≡ a⁻¹. ∎

Theorem 8.2 (Euler's theorem). If gcd(a, m) = 1, then a^(φ(m)) ≡ 1 (mod m), hence a⁻¹ ≡ a^(φ(m)-1) (mod m).

Proof. a is an element of the group (ℤ_m)^× of order φ(m); by Lagrange, a^(φ(m)) ≡ 1. ∎

Worked Euler inversion. For a = 5, m = 12: φ(12) = φ(4)φ(3) = 2·2 = 4, so 5⁻¹ ≡ 5^(4-1) = 5³ = 125 ≡ 125 - 10·12 = 5 (mod 12). Check: 5·5 = 25 ≡ 1 (mod 12) ✓ (5 is self-inverse here). This required knowing φ(12) = 4; for a modulus whose factorization is unknown, computing φ(m) is as hard as factoring, which is why extended Euclid — needing no factorization — is the universal practical inverse despite Euler's elegance.

Why Euler does not help cryptanalysis. RSA security rests on φ(n) = (p-1)(q-1) being unknown without p, q. An attacker who could compute φ(n) could derive the private exponent d = e⁻¹ mod φ(n) and break the system. Computing φ(n) is equivalent to factoring n, so Euler-based inversion of e is exactly the infeasible step. Legitimate key generation knows p, q (hence φ(n)) and uses extended Euclid to get d cheaply — the asymmetry that makes RSA work.

When each inverse method applies.

Method	Requires	Cost	Detects non-existence
Extended Euclid	`gcd(a, m) = 1` (any `m`)	`O(log m)`	yes (`g ≠ 1`)
Fermat `a^(p-2)`	`m = p` prime, `p ∤ a`	`O(log p)`	no
Euler `a^(φ(m)-1)`	`gcd(a, m) = 1`, know `φ(m)`	`O(log m)` + factoring	no

Euler subsumes Fermat (φ(p) = p-1) but needs φ(m), whose computation requires factoring m — generally harder than extended Euclid. So extended Euclid is the universal practical inverse; Fermat is the clean prime-modulus shortcut.

Cost comparison, concretely. For a prime p, Fermat's a^(p-2) does ≈ log₂ p squarings plus up to log₂ p multiplications — about 2 log₂ p modular multiplications. Extended Euclid does ≈ 0.84 ln p ≈ 0.58 log₂ p divisions on average (Section 9.1). Since a division and a multiplication are comparable in cost, extended Euclid typically wins on operation count by a factor of ~3, in addition to detecting non-existence. Fermat's advantages are code simplicity (one modpow call) and constant-time behavior with a fixed ladder — the reasons cryptographic code over a fixed prime field often prefers it despite the higher operation count.

Order and primitive roots (aside). The multiplicative order of a mod m is the least e > 0 with a^e ≡ 1; it divides φ(m) (Lagrange). When (ℤ_m)^× is cyclic (e.g., m prime), a generator is a primitive root, and inverses are a⁻¹ = g^(φ(m) - log_g a) in terms of discrete logs — a connection to the discrete-log problem underlying much of cryptography.

8.1 Wilson's Theorem and Self-Inverse Elements¶

Theorem 8.3 (Self-inverse elements). Mod a prime p, the only elements equal to their own inverse are 1 and p-1: a² ≡ 1 (mod p) iff p | (a-1)(a+1) iff a ≡ ±1 (mod p).

Proof. a² ≡ 1 means (a-1)(a+1) ≡ 0 (mod p); since ℤ_p is a field (no zero divisors), one factor is 0, so a ≡ 1 or a ≡ -1 ≡ p-1. ∎

Corollary (Wilson's theorem). (p-1)! ≡ -1 (mod p) for prime p. The product (p-1)! = ∏_{a=1}^{p-1} a pairs each a with its distinct inverse a⁻¹ (product 1), except the two self-inverse elements 1 and p-1, whose product is p-1 ≡ -1. This is a clean application of inverse uniqueness and pairing — every non-self-inverse unit cancels against its inverse partner.

9. Complexity: The O(log) Bound¶

Theorem 9.1 (Lamé's theorem). The number of division steps in the Euclidean algorithm on (a, b) with a > b ≥ 1 is at most 5 times the number of decimal digits of b, i.e. O(log b). More precisely, the worst case is achieved by consecutive Fibonacci numbers.

Proof sketch. If the algorithm takes k steps on (a, b), then b ≥ F_{k+1} and a ≥ F_{k+2}, where F is the Fibonacci sequence. Each step's quotient is ≥ 1, and the remainders satisfy r_{i-1} = q_i r_i + r_{i+1} ≥ r_i + r_{i+1}, the Fibonacci recurrence inequality. Since F_k ≈ φ^k/√5 with φ = (1+√5)/2, having b ≥ F_{k+1} forces k = O(log_φ b) = O(log b). ∎

Theorem 9.2 (Extended Euclid complexity). EXTGCD(a, b) runs in O(log min(a, b)) arithmetic operations. With fixed-width (e.g. 64-bit) integers each operation is O(1), giving O(log m) total. With L-bit big integers each division/multiplication is O(L²) schoolbook (or O(L log L) fast), so the bit-complexity is O(L²) to O(L³) naively, improved to O(L log² L) by the half-GCD divide-and-conquer.

Proof. The step count is O(log) by Lamé (Theorem 9.1); each step does a constant number of divisions, multiplications, and subtractions on the running (r, s, t) triples. Multiply per-step cost by the step count. ∎

Worked worst case (Fibonacci). EXTGCD(F_{10}, F_9) = EXTGCD(55, 34) realizes the bound: every quotient is 1, so the remainders march down the Fibonacci sequence 55, 34, 21, 13, 8, 5, 3, 2, 1, 0 — 9 steps for inputs of magnitude 55. Since F_k ≈ φ^k/√5, the step count k ≈ log_φ(F_k·√5) is the maximum possible for inputs of that size; any non-Fibonacci pair of the same magnitude terminates in fewer steps. This is exactly why Lamé's bound is stated in terms of Fibonacci numbers: they are the extremal inputs.

Corollary 9.3. The modular inverse and the linear-congruence solver are both O(log m) over fixed-width integers — the same asymptotic class as a single fast exponentiation, but typically with a smaller constant (no log m modular multiplications, just O(log m) divisions).

9.1 The Average Number of Steps¶

Beyond the worst case, the average number of Euclidean steps over uniformly random pairs (a, b) with b ≤ N is (12 ln 2 / π²) ln N + O(1) ≈ 0.843 ln N, a result of the Heilbronn-Dixon analysis tied to the Gauss-Kuzmin distribution of continued-fraction quotients. So in expectation the algorithm is even faster than the Fibonacci worst case suggests. This matters when inverting many random residues: the amortized cost is firmly logarithmic with a small constant, never approaching the 1/log φ worst-case slope except on adversarially chosen Fibonacci-like inputs.

9.2 Binary (Stein) GCD and Its Extension¶

The plain extended Euclid uses division, which is expensive for big integers. Stein's binary GCD replaces division with shifts and subtractions:

- both even:  gcd(a,b) = 2·gcd(a/2, b/2)
- one even:   gcd(a,b) = gcd(a/2, b)        (the even one halved)
- both odd:   gcd(a,b) = gcd(|a-b|/2, min(a,b))

Its extended variant tracks coefficients through these shift/subtract steps (with care: halving a coefficient requires the operand to be even, handled by conditionally adding the modulus first). The binary extended GCD runs in O(L²) bit operations for L-bit inputs but with only shifts/adds — often faster than division-based extended Euclid on hardware where division is slow, and the basis of the constant-time safegcd (Bernstein-Yang) used in cryptographic libraries.

Worked binary GCD. gcd(48, 18): both even → 2·gcd(24, 9); 24 even, 9 odd → gcd(12, 9); → gcd(6, 9) → gcd(3, 9); both odd → gcd(|9-3|/2, 3) = gcd(3, 3) → both odd → gcd(0, 3) = 3. Times the factored-out 2: gcd(48, 18) = 6. Every step is a shift or subtraction — no division — which is the practical appeal for big integers and the structural reason the iteration count is fixed (one bit removed per step), enabling a constant-time implementation.

9.3 Half-GCD: Quasi-Linear Bit Complexity¶

For very large operands, the half-GCD algorithm computes the gcd (and the transformation matrix giving Bezout coefficients) in O(M(L) log L) bit operations, where M(L) is the cost of L-bit multiplication. With FFT-based multiplication M(L) = O(L log L), this is O(L log² L) — quasi-linear, versus the O(L²) of the schoolbook approach. Half-GCD recursively reduces the operands by processing the high-order halves with 2×2 matrix products, the same divide-and-conquer that powers fast multiplication and is implemented in high-performance libraries like GMP. For cryptographic-size inverses, this is the algorithm actually running inside BigInteger.modInverse and friends.

10. Bezout, CRT, and Diophantine Connections¶

The extended algorithm is the computational heart of two sibling topics.

Theorem 10.1 (Linear Diophantine, sibling 07). a·x + b·y = c has integer solutions iff gcd(a, b) | c, and the general solution is (x₀ + k·b/g, y₀ - k·a/g) for k ∈ ℤ, where (x₀, y₀) comes from scaling Bezout coefficients by c/g. This is Corollary 2.2 plus Theorem 5.2 made explicit; extended Euclid produces the base solution.

Theorem 10.2 (CRT via inverses, sibling 08). For pairwise-coprime moduli m₁, …, m_k with product M, the system x ≡ r_i (mod m_i) has a unique solution mod M:

x ≡ Σ_i r_i · (M/m_i) · ((M/m_i)⁻¹ mod m_i)   (mod M).

Proof. Each term (M/m_i)·((M/m_i)⁻¹ mod m_i) is ≡ 1 (mod m_i) and ≡ 0 (mod m_j) for j ≠ i (since m_j | M/m_i). Hence the sum is ≡ r_i (mod m_i) for every i. The inverse (M/m_i)⁻¹ mod m_i exists because gcd(M/m_i, m_i) = 1 (pairwise coprimality), and it is computed by extended Euclid. Uniqueness follows because ℤ_M ≅ ℤ_{m₁} × … × ℤ_{m_k} (the CRT ring isomorphism). ∎

Theorem 10.3 (Rational reconstruction). Given r ≡ p/q (mod m) with bounds |p|, q ≤ √(m/2), the unique rational p/q is recovered by running extended Euclid on (m, r) and stopping when the remainder drops below √(m/2); the coefficients yield p and q. This "stopping early" use of extended Euclid underlies recovering exact fractions from modular images — a technique in exact linear algebra and CRT-based algorithms.

Synthesis. Bezout coefficients are the common currency: the modular inverse is the coprime special case (g = 1, read off x), the linear congruence scales them, CRT combines several inverses, and Diophantine equations expose them directly. One O(log) algorithm underlies all four.

10.1 Worked CRT Reconstruction¶

Solve the system x ≡ 2 (mod 3), x ≡ 3 (mod 5), x ≡ 2 (mod 7) (M = 105):

M/3 = 35,  35⁻¹ mod 3 = 2   (35 ≡ 2 mod 3, 2·2 = 4 ≡ 1)   → term 2·35·2 = 140
M/5 = 21,  21⁻¹ mod 5 = 1   (21 ≡ 1 mod 5)                → term 3·21·1 = 63
M/7 = 15,  15⁻¹ mod 7 = 1   (15 ≡ 1 mod 7)                → term 2·15·1 = 30
x ≡ 140 + 63 + 30 = 233 ≡ 233 - 2·105 = 23  (mod 105)

Check: 23 mod 3 = 2 ✓, 23 mod 5 = 3 ✓, 23 mod 7 = 2 ✓. Each of the three inverses came from a tiny extended-Euclid call; CRT is just "one inverse per modulus, then a weighted sum."

Garner's incremental form. An alternative to the symmetric formula above is Garner's algorithm, which builds the answer one congruence at a time in mixed-radix form: x = r₁ + m₁(c₂ + m₂(c₃ + …)), where each c_i is computed with a single inverse (m₁⋯m_{i-1})⁻¹ mod m_i. Garner's form avoids the large product M in intermediate steps (better for big integers) and is the method of choice in CRT-based exact arithmetic. It still rests on one modular inverse per modulus — extended Euclid remains the workhorse beneath both forms.

10.2 The Ring Isomorphism Underpinning CRT¶

Theorem 10.4 (CRT isomorphism). For pairwise-coprime m₁, …, m_k with product M, the map

ψ : ℤ_M → ℤ_{m₁} × … × ℤ_{m_k},   ψ(x) = (x mod m₁, …, x mod m_k)

is a ring isomorphism. Consequently (ℤ_M)^× ≅ (ℤ_{m₁})^× × … × (ℤ_{m_k})^×, and φ is multiplicative: φ(M) = ∏ φ(m_i).

Proof. ψ is a ring homomorphism (reduction commutes with + and ×). It is injective: ψ(x) = 0 means every m_i | x, so M = lcm = ∏ m_i | x (pairwise coprimality), hence x ≡ 0 (mod M). Both sides have M = ∏ m_i elements, so injectivity forces bijectivity. The unit-group factorization follows because units map to units, and φ(M) = |(ℤ_M)^×| = ∏ |(ℤ_{m_i})^×| = ∏ φ(m_i). ∎

The constructive inverse of ψ is exactly the CRT formula of Theorem 10.2, and the modular inverses (M/m_i)⁻¹ mod m_i are the structure constants of that inverse map — another reason extended Euclid sits at the heart of CRT.

10.3 Modular Inverse and Continued Fractions¶

The Bezout coefficients are precisely the convergents of the continued-fraction expansion of a/m. If a/m = [q₀; q₁, q₂, …] with convergents p_k/Q_k, then the second-to-last convergent gives the inverse: a · (±Q_{n-1}) ≡ ±1 (mod m). This is the same (s, t) sequence the iterative algorithm tracks (the t column is the denominator-convergent sequence up to sign). So "compute the modular inverse," "run extended Euclid," and "expand a/m as a continued fraction and take the penultimate convergent" are three names for one computation — the unifying view behind rational reconstruction (Theorem 10.3).

10.5 Application Survey: Where the Inverse Appears¶

The modular inverse and extended Euclid are not abstractions — they are the computational core of many named results and systems. A consolidated map:

Domain	Use of the inverse / extended Euclid	Modulus type
RSA key generation	`d = e⁻¹ mod φ(n)` (private exponent)	composite `φ(n)` → extended Euclid
RSA CRT decryption	`qinv = q⁻¹ mod p`	prime `p`
Diffie-Hellman / ECC	nonce inverse `k⁻¹` in signing	prime field order
CRT reconstruction	`(M/m_i)⁻¹ mod m_i` per modulus	each `m_i`
Binomial coefficients mod `p`	factorial inverses via one inversion	prime `p`
Rational reconstruction	early-stopped extended Euclid	any `m`
Linear Diophantine `ax + by = c`	scaled Bezout coefficients	—
Affine cipher decryption	invert the multiplier mod alphabet size	composite (e.g. 26)
Hashing / fraction-free arithmetic	divide by `a` via `a⁻¹`	prime

The unifying principle. Every one of these is "I have a·x + m·y = 1 (or = g) and I want to either invert a, combine residues, or recover an integer relation." Extended Euclid is the single O(log m) primitive that produces the Bezout witness underlying all of them. Mastering it — its correctness (Section 3), its existence law (Section 4), its solution-set structure (Section 6), and its complexity (Section 9) — is mastering the entire family.

A note on polynomial analogues. Every result in this file transfers verbatim from ℤ to the polynomial ring F[x] over a field F, which is also a Euclidean domain. The "extended Euclidean algorithm for polynomials" computes Bezout polynomials a(x)·u(x) + b(x)·v(x) = gcd(a, b), and the "inverse mod a polynomial" is how one inverts elements of a finite field F_{p^k} = F_p[x]/(irreducible) — the arithmetic behind AES's GF(2⁸), Reed-Solomon codes, and elliptic curves over extension fields. The integer story you have learned is one instance of a general Euclidean-domain phenomenon.

10.6 The Affine Cipher Worked Example¶

The affine cipher encrypts c = (a·x + b) mod 26; decryption requires a⁻¹ mod 26, which exists iff gcd(a, 26) = 1 (so a ∈ {1,3,5,7,9,11,15,17,19,21,23,25}). For a = 5: extended Euclid gives 5·21 + 26·(-4) = 1, so 5⁻¹ ≡ 21 (mod 26), and decryption is x = 21·(c - b) mod 26. A key with a = 13 (gcd(13, 26) = 13) is invalid — no inverse, so decryption is impossible. This is the existence law of Theorem 4.1 surfacing as a concrete key-validity constraint, and a composite modulus (26) where Fermat is inapplicable.

11. Summary¶

Bezout's identity (Theorem 2.1): a·x + b·y = gcd(a, b) always has integer solutions; gcd is the least positive combination, and the achievable combinations are exactly the multiples of gcd (Corollary 2.2).
Extended Euclid (Theorem 3.1) constructs Bezout coefficients in O(log) by the recursion x = y₁, y = x₁ - ⌊a/b⌋·y₁, proved correct by induction on b; the iterative form maintains the invariant r = s·a + t·b.
Inverse existence (Theorem 4.1) holds iff gcd(a, m) = 1: coprimality gives Bezout's a·x ≡ 1, and any inverse forces gcd = 1. For prime p, every nonzero residue is invertible, so ℤ_p is a field (Corollary 4.3).
Uniqueness (Theorem 5.1): the inverse is unique mod m; Bezout coefficients are not (Theorem 5.2), differing by multiples of (b/g, -a/g). Units form the group (ℤ_m)^× of order φ(m).
Linear congruence (Theorem 6.1): a·x ≡ b (mod m) is solvable iff gcd(a, m) | b, with exactly g = gcd(a, m) solutions spaced m/g apart; the coprime case has a unique solution a⁻¹·b.
O(n) inverse table (Theorem 7.1): i⁻¹ ≡ -⌊p/i⌋·(p mod i)⁻¹ (mod p) computes all of 1..n in linear time, requiring p prime and p > n.
Fermat/Euler (Theorems 8.1–8.2): a⁻¹ = a^(p-2) for prime p, a⁻¹ = a^(φ(m)-1) in general — both via Lagrange's theorem on (ℤ_m)^×; Euler needs φ(m) (factoring), so extended Euclid stays the universal tool.
Complexity (Theorems 9.1–9.2): O(log min(a, b)) steps (Lamé, worst case Fibonacci), O(L log² L) bit-complexity with half-GCD for L-bit inputs.
Connections (Section 10): Bezout coefficients power linear Diophantine equations (07), CRT reconstruction (08), and rational reconstruction — one algorithm, four applications.

Theorem Cheat Table¶

Statement	Condition	Result
Bezout	not both zero	`a·x + b·y = gcd(a, b)` solvable
`a·x + b·y = n`	—	solvable iff `gcd(a,b) \| n`
`a⁻¹ mod m` exists	—	iff `gcd(a, m) = 1`
inverse uniqueness	inverse exists	unique in `ℤ_m`
`a·x ≡ b (mod m)`	—	solvable iff `g \| b`; then `g` solutions
inverse table `1..n`	`p` prime, `p > n`	`O(n)` via `-⌊p/i⌋·inv[p%i]`
Fermat inverse	`p` prime, `p ∤ a`	`a⁻¹ = a^(p-2)`
Euclid step count	`a > b ≥ 1`	`O(log b)` (Lamé)
coefficient size	`g = gcd(a,b)`	`\|x\| ≤ b/(2g)`, `\|y\| ≤ a/(2g)`
self-inverse mod `p`	`p` prime	only `1` and `p-1`
CRT isomorphism	`m_i` pairwise coprime	`ℤ_M ≅ ∏ ℤ_{m_i}`, `φ` multiplicative

Closing Notes¶

One witness, many theorems (Section 10.5): the Bezout pair from a single O(log m) extended-Euclid call is the structure constant behind RSA's private exponent, CRT reconstruction, binomial-coefficient precomputation, rational reconstruction, and Diophantine solving. The proofs differ; the primitive does not.
Existence is geometry (Section 4): a⁻¹ exists iff 1 lies in the lattice {a·x mod m}, which is the multiples of gcd(a, m); coprimality is the statement that this lattice is all of ℤ_m.
Solutions are cosets (Section 6.1): the solution set of a·x ≡ b (mod m) is a coset of the subgroup (m/g)·ℤ_m, explaining both the count g and the even spacing m/g with one group-theoretic fact.
Continued fractions unify (Section 10.3): extended Euclid, the modular inverse, and continued-fraction convergents are three names for the same (s, t) recurrence — the conceptual glue tying this topic to rational approximation.
Complexity is logarithmic and tight (Section 9): O(log m) steps by Lamé, ≈ 0.84 ln N on average, with half-GCD giving quasi-linear O(L log² L) bit-complexity at cryptographic sizes — the reason the inverse is never the bottleneck.
The same machinery over polynomials (Section 10.5): extended Euclid in F[x] inverts elements of finite extension fields F_{p^k}, powering AES's GF(2⁸), Reed-Solomon codes, and curve arithmetic — the integer case is one instance of a Euclidean-domain pattern.

Foundational references: any number-theory text (Hardy & Wright, An Introduction to the Theory of Numbers); CLRS Ch. 31 for the algorithmic treatment; Knuth TAOCP Vol. 2 §4.5.2–4.5.3 for coefficient bounds, average step count, and half-GCD; Shoup, A Computational Introduction to Number Theory and Algebra, for ℤ_m, units, and inverses; Bernstein-Yang for constant-time gcd; and sibling topics 05-gcd-lcm-euclidean, 07-linear-diophantine, and 08-chinese-remainder-theorem.