Pollard's Rho Factorization — Junior Level¶
One-line summary: Pollard's rho finds a nontrivial factor of a composite number
nby walking a pseudo-random sequencex ← x² + c (mod n), watching for a repeat (a cycle), and computinggcd(|x − y|, n). When that gcd lands strictly between1andn, you have splitn— in expectedO(n^{1/4})time, far faster than trial-dividing up to√n.
Table of Contents¶
- Introduction
- Prerequisites
- Glossary
- Core Concepts
- Big-O Summary
- Real-World Analogies
- Pros & Cons
- Step-by-Step Walkthrough
- Code Examples
- Coding Patterns
- Error Handling
- Performance Tips
- Best Practices
- Edge Cases & Pitfalls
- Common Mistakes
- Cheat Sheet
- Visual Animation
- Summary
- Further Reading
Introduction¶
Suppose someone hands you a 19-digit number like n = 1234567891011121314 and asks: "What are its prime factors?" The obvious approach — try dividing by 2, 3, 5, 7, … up to √n — is hopeless: √n is about 1.1 × 10⁹, so a billion trial divisions per number. For a 64-bit number √n reaches 4.3 × 10⁹. That is too slow when you must factor many numbers.
There is a clever shortcut discovered by John Pollard in 1975, the rho (ρ) algorithm. It does not search for a factor directly. Instead it generates a sequence of "random-looking" numbers modulo n:
and looks for a moment when two of these values become congruent modulo a hidden prime factor p of n. When xᵢ ≡ xⱼ (mod p) but xᵢ ≠ xⱼ (mod n), the difference xᵢ − xⱼ is a multiple of p but not of n. So:
gcd(|xᵢ − xⱼ|, n)is a multiple ofpthat is smaller thann— a nontrivial factor ofn.
The deep reason this is fast is the birthday paradox. The sequence x mod p lives in a set of only p values, and a repeating value (a collision) appears after about √p steps — not p steps. Since p ≤ √n for the smallest prime factor, we expect to find a factor in roughly √(√n) = n^{1/4} iterations. For a 64-bit number that is around 65000 steps instead of billions.
This single idea connects three things you may have met separately:
- Random walks and cycles — the sequence eventually loops, tracing a shape like the Greek letter ρ (a tail leading into a circle), which is where the name comes from.
- The gcd / Euclidean algorithm (sibling
01-gcd-euclidean) — the workhorse that turns a collision into an actual factor. - Primality testing (sibling
08-miller-rabin) — Pollard's rho only splitsn; to fully factornyou recurse, and at each step you ask Miller-Rabin "is this piece already prime?"
One caveat from minute one: Pollard's rho is a factoring tool for composite numbers. Before running it you should (a) strip small factors by quick trial division, and (b) check with Miller-Rabin whether n is already prime — running rho on a prime never terminates usefully.
Prerequisites¶
Before reading this file you should be comfortable with:
- Modular arithmetic —
(a + b) mod n,(a · b) mod n, and why products can overflow 64-bit integers (this matters a lot here). - The gcd and the Euclidean algorithm —
gcd(a, b)via repeated remainder. See sibling01-gcd-euclidean. - What a prime factor is — every integer
> 1factors uniquely into primes (the Fundamental Theorem of Arithmetic). - Big-O notation —
O(√n),O(n^{1/4}),O(log n). - Loops and recursion — rho is a loop; full factorization is a small recursion.
Optional but helpful:
- A glance at the birthday paradox (a collision among random draws from
pitems appears after≈ √pdraws). - Familiarity with Miller-Rabin primality testing (sibling
08-miller-rabin), used to decide when to stop recursing.
Glossary¶
| Term | Meaning |
|---|---|
| Factor / divisor | An integer d > 0 with n mod d == 0. A nontrivial factor satisfies 1 < d < n. |
| Prime | An integer > 1 whose only divisors are 1 and itself. |
| Composite | An integer > 1 that is not prime (has a nontrivial factor). |
The map f | The pseudo-random iteration f(x) = (x² + c) mod n driving the rho sequence. |
c (the constant) | A small added constant in f; if one c fails you retry with another. Avoid c = 0 and c = −2. |
| Cycle / collision | The point where the sequence x mod p repeats a value, closing a loop. |
| Tortoise & hare (Floyd) | Two pointers, one moving 1 step and one moving 2 steps, used to detect the cycle. |
gcd(|x − y|, n) | The greatest common divisor that turns a collision into a factor. |
| Nontrivial gcd | A gcd d with 1 < d < n; this is the factor we want. |
n^{1/4} | The fourth root of n, the expected number of rho steps to find a factor. |
| mulmod | An overflow-safe routine computing (a · b) mod n when a·b exceeds 64 bits. |
Core Concepts¶
1. The Pseudo-Random Sequence¶
Pollard's rho generates values modulo n by iterating a simple polynomial:
This sequence is deterministic (no real randomness), but x² + c (mod n) scrambles values enough to behave like a random walk for our purposes. We pick a starting seed (often 2) and a constant c (often 1).
2. Why the Sequence Forms a "Rho" Shape¶
The values live in {0, 1, …, n−1}, a finite set, so the sequence must eventually repeat — once a value reappears, the sequence cycles forever. Drawing the trajectory, you get a tail leading into a loop, exactly the shape of the Greek letter ρ:
The same shape appears when we look at the sequence modulo a hidden prime p that divides n. Because p is much smaller than n, that smaller cycle closes much sooner.
3. The Collision That Reveals a Factor¶
Let p be an (unknown) prime factor of n. Consider the sequence reduced mod p: x₀ mod p, x₁ mod p, …. This sequence lives in only p values, so it collides after about √p steps: there exist indices i < j with
If at the same time xᵢ ≠ xⱼ (mod n) (which is usually true since n is much bigger), then xⱼ − xᵢ is not a multiple of n. Therefore:
That gcd is the factor we extract. We never know p in advance — the gcd finds it for us.
4. Cycle Detection: Floyd's Tortoise and Hare¶
We cannot store all past values to check for collisions (there could be √p ≈ n^{1/4} of them, but we want O(1) memory). Floyd's trick uses two pointers:
tortoise: x ← f(x) (1 step per round)
hare: y ← f(f(y)) (2 steps per round)
each round: d = gcd(|x − y|, n)
Because the hare moves twice as fast, it "laps" the tortoise inside the cycle; the difference x − y will hit a multiple of p (a collision mod p) and the gcd catches it. We check the gcd every round.
5. Three Outcomes of the gcd¶
Each round the gcd d = gcd(|x − y|, n) is one of:
1 < d < n— success!dis a nontrivial factor; return it.d == 1— no collision yet; keep iterating.d == n— the tortoise and hare collided mod n before splitting (the rare bad case). Restart with a differentc(and/or seed). The whole sequence collapsed to a trivial result.
6. From Splitting to Fully Factoring¶
Pollard's rho gives one factor d. To get the complete prime factorization you combine it with primality testing:
factor(n):
if n == 1: return
if isPrime(n): record n; return # Miller-Rabin (sibling 08)
d = pollardRho(n) # one nontrivial factor
factor(d) # recurse on the factor
factor(n / d) # recurse on the cofactor
You first strip out small primes (trial division by 2, 3, 5, …), then run rho on what remains, recursing until every piece is prime.
Big-O Summary¶
| Operation | Time | Space | Notes |
|---|---|---|---|
Trial division up to √n | O(√n) | O(1) | The slow baseline; fine only for small n. |
| One Pollard rho step | O(log n) | O(1) | One mulmod (a few 64-bit ops) plus a gcd. |
| Pollard rho to find one factor | O(n^{1/4}) expected | O(1) | Birthday paradox on the smallest prime p ≤ √n. |
| Miller-Rabin primality test | O(log³ n) | O(1) | Decides prime vs composite (sibling 08). |
| Full factorization (recurse + test) | O(n^{1/4} · polylog n) expected | O(log n) | Dominated by rho on the largest prime factor. |
Worst case (rho unlucky / n prime) | can loop | O(1) | Mitigated by Miller-Rabin first and c-restarts. |
The headline number is O(n^{1/4}) expected to split a composite — the square root of the trial-division bound O(√n). For 64-bit n that is the difference between ~65000 and ~4 billion operations.
Real-World Analogies¶
| Concept | Analogy |
|---|---|
The sequence x ← x² + c | A pinball bouncing around a finite board — chaotic but deterministic; eventually it retraces a path. |
| The rho (ρ) shape | A runner on a track who jogs in from the parking lot (the tail) and then circles the loop forever. |
| Birthday paradox | In a room of just 23 people two share a birthday with >50% chance; collisions happen far sooner than the 365-size set suggests — here the "room" is the p-sized residue set. |
| Tortoise and hare | Two runners on a circular track at different speeds: the faster one always laps the slower, so they meet. |
gcd(|x − y|, n) | A metal detector: it cannot see the buried prime p, but it beeps (returns a factor) the moment two positions differ by a multiple of p. |
Restart with new c | Re-rolling the dice when a board layout traps you in a dead loop — a fresh constant reshapes the walk. |
Where the analogy breaks: a real pinball is physical and random; our sequence is exact integer arithmetic mod n, fully reproducible from the same seed and c.
Pros & Cons¶
| Pros | Cons |
|---|---|
Expected O(n^{1/4}) — square-root faster than trial division for finding a factor. | Expected, not guaranteed: a bad c can fail and force a restart. |
Tiny O(1) memory — just a few integer variables. | Returns one factor, not the full factorization; needs recursion + primality test. |
| Simple to code once you have gcd and a safe mulmod. | Requires an overflow-safe (a·b) mod n (__int128 or Montgomery) for 64-bit n. |
| Finds small prime factors especially fast (cost scales with the smallest prime). | Useless against cryptographic semiprimes (RSA-2048): n^{1/4} is still astronomically large. |
| Combines cleanly with Miller-Rabin for full factorization of 64-bit numbers in microseconds. | Must special-case n even, n prime, and perfect powers, or it misbehaves. |
When to use: factoring 64-bit (or up to ~10^{18}–10^{19}) integers quickly; number-theory problems needing the factorization of one or many medium-sized numbers; computing things like the number/sum of divisors, Euler's totient, or the largest prime factor.
When NOT to use: breaking real cryptography (the moduli are hundreds of digits; n^{1/4} is hopeless), or when n is tiny (trial division is simpler and the constant factors favor it).
Step-by-Step Walkthrough¶
Let us factor n = 8051 by hand-ish reasoning. (8051 = 83 × 97; the smallest prime is p = 83, so we expect a collision after about √83 ≈ 9 steps.)
Use f(x) = (x² + 1) mod 8051, seed x = y = 2. Each round: advance tortoise once, hare twice, then gcd(|x − y|, 8051).
Round 1:
tortoise: x = f(2) = (4 + 1) = 5
hare: y = f(f(2)) = f(5) = (25 + 1) = 26
gcd(|5 − 26|, 8051) = gcd(21, 8051) = 1 → keep going
Round 2:
x = f(5) = 26
y = f(f(26)) = f(677) = (677² + 1) mod 8051 = 7474
gcd(|26 − 7474|, 8051) = gcd(7448, 8051) = 1 → keep going
Round 3:
x = f(26) = 677
y = f(f(7474)) = … = 871
gcd(|677 − 871|, 8051) = gcd(194, 8051) = 97 → SUCCESS
The gcd returned 97, a nontrivial factor of 8051. The cofactor is 8051 / 97 = 83. Both 97 and 83 are prime (Miller-Rabin would confirm), so the full factorization is 8051 = 83 × 97.
What happened mod p = 83? Look at the sequence mod 83: 2, 5, 26, 677 mod 83 = 13, …. At some point two of these coincided mod 83 (a collision), making their difference a multiple of 83. The gcd then surfaced a factor — in this trace it surfaced 97 first because the difference 677 − 871 = −194 = −2 × 97 happened to be a multiple of 97. Either prime is a valid nontrivial factor; rho is happy to return whichever collision it hits first.
Each round did exactly one mulmod for the tortoise, two for the hare, and one gcd — a handful of cheap operations, and it split 8051 in 3 rounds.
Code Examples¶
Example: Pollard's Rho to Find One Factor¶
This implements Floyd's tortoise-and-hare rho with a safe mulmod, returning one nontrivial factor of a composite n.
Go¶
package main
import (
"fmt"
"math/bits"
)
// mulmod computes (a*b) % m without overflowing 64 bits, using 128-bit math.
func mulmod(a, b, m uint64) uint64 {
hi, lo := bits.Mul64(a, b)
_, rem := bits.Div64(hi%m, lo, m) // (hi:lo) mod m
return rem
}
func gcd(a, b uint64) uint64 {
for b != 0 {
a, b = b, a%b
}
return a
}
func abssub(a, b uint64) uint64 {
if a > b {
return a - b
}
return b - a
}
// pollardRho returns a nontrivial factor of composite n (n even handled outside).
func pollardRho(n uint64) uint64 {
if n%2 == 0 {
return 2
}
for c := uint64(1); ; c++ {
f := func(x uint64) uint64 { return (mulmod(x, x, n) + c) % n }
x, y, d := uint64(2), uint64(2), uint64(1)
for d == 1 {
x = f(x) // tortoise: 1 step
y = f(f(y)) // hare: 2 steps
d = gcd(abssub(x, y), n)
}
if d != n { // success: nontrivial factor
return d
}
// d == n: collapsed; retry with the next c
}
}
func main() {
fmt.Println(pollardRho(8051)) // 83 or 97
fmt.Println(pollardRho(1234567891011121314)) // a nontrivial factor
}
Java¶
import java.math.BigInteger;
public class PollardRho {
static long mulmod(long a, long b, long m) {
// Use 128-bit via BigInteger for clarity/safety in Java.
return BigInteger.valueOf(a).multiply(BigInteger.valueOf(b))
.mod(BigInteger.valueOf(m)).longValue();
}
static long gcd(long a, long b) {
while (b != 0) { long t = a % b; a = b; b = t; }
return a;
}
static long pollardRho(long n) {
if (n % 2 == 0) return 2;
for (long c = 1; ; c++) {
long x = 2, y = 2, d = 1;
while (d == 1) {
x = (mulmod(x, x, n) + c) % n; // tortoise
y = (mulmod(y, y, n) + c) % n;
y = (mulmod(y, y, n) + c) % n; // hare (2 steps)
d = gcd(Math.abs(x - y), n);
}
if (d != n) return d; // success
// d == n: retry with next c
}
}
public static void main(String[] args) {
System.out.println(pollardRho(8051L)); // 83 or 97
System.out.println(pollardRho(1234567891011121314L));
}
}
Python¶
from math import gcd
def pollard_rho(n):
if n % 2 == 0:
return 2
c = 1
while True:
f = lambda x: (x * x + c) % n # Python ints never overflow
x, y, d = 2, 2, 1
while d == 1:
x = f(x) # tortoise: 1 step
y = f(f(y)) # hare: 2 steps
d = gcd(abs(x - y), n)
if d != n: # success: nontrivial factor
return d
c += 1 # d == n: retry with new c
if __name__ == "__main__":
print(pollard_rho(8051)) # 83 or 97
print(pollard_rho(1234567891011121314)) # a nontrivial factor
What it does: walks the rho sequence with tortoise/hare, computes the gcd each round, returns the first nontrivial factor, and restarts with a new c if the gcd hits n. Run: go run main.go | javac PollardRho.java && java PollardRho | python rho.py
Coding Patterns¶
Pattern 1: Trial Division Oracle (the slow truth you test against)¶
Intent: Before trusting rho, factor the slow, obvious way and check the products match on small inputs.
def factor_trial(n):
factors = []
d = 2
while d * d <= n:
while n % d == 0:
factors.append(d)
n //= d
d += 1
if n > 1:
factors.append(n)
return factors # always correct, but O(sqrt(n))
This is O(√n). Too slow for big n, but a perfect correctness oracle: factor_rho(n) must return the same multiset.
Pattern 2: Strip Small Factors First¶
Intent: Quick trial division by small primes removes easy factors and the even case, so rho only runs on the hard remainder.
def strip_small(n, factors, limit=1000):
d = 2
while d <= limit and d * d <= n:
while n % d == 0:
factors.append(d)
n //= d
d += 1
return n # the remaining (possibly prime) cofactor
Pattern 3: Recurse with Miller-Rabin to Fully Factor¶
Intent: rho splits; primality decides when to stop.
Error Handling¶
| Error | Cause | Fix |
|---|---|---|
Wrong / overflowed factor on 64-bit n | x*x overflowed before % n. | Use an overflow-safe mulmod (__int128 / bits.Mul64 / BigInteger / Python big ints). |
| Infinite loop | Ran rho on a prime n (no nontrivial factor exists). | Test primality with Miller-Rabin first; never feed rho a prime. |
Always returns n (no progress) | Bad constant c (e.g. c = 0 makes the sequence degenerate). | Retry with a different c; never use c = 0 or c = n − 2. |
Misses factor 2 | Even n not handled; rho with c can stall on small factors. | Strip 2 (and other small primes) by trial division first. |
| Recursion never terminates | Forgot the isPrime base case in factor. | Always check isPrime(n) before recursing. |
| Negative difference / wrong gcd | Computed x − y as a signed value that wrapped. | Use abs(x − y) (or a branch) on unsigned types. |
Performance Tips¶
- Use a real
mulmod. On 64-bitn,a*boverflows;__int128(C/C++),bits.Mul64+bits.Div64(Go), or Montgomery multiplication makes each step fast and correct. - Strip small primes first. A short trial-division sieve up to ~1000 removes most factors instantly and shrinks the number rho must crack.
- Batch the gcd (Brent). Computing a gcd every step is wasteful; accumulate the product of several
|x − y|values modnand take one gcd per batch (covered inmiddle.md). - Prefer Brent's variant. Brent's cycle detection does fewer
fevaluations than Floyd's tortoise/hare for the same work — typically ~25% faster (seemiddle.md). - Pick the smallest factor early. rho's cost scales with the smallest prime, so it naturally finds small factors first; stripping them keeps the remaining
nsmall.
Best Practices¶
- Always run Miller-Rabin first to avoid looping forever on a prime, and to decide when recursion stops.
- Always use an overflow-safe mulmod for any
nbeyond ~2^{31}; never write a plain(a*b) % nfor bign. - Handle
c-restarts: wrap rho in a loop that retries with a newc(and reseed) when the gcd returnsn. - Strip small primes by trial division before invoking rho; it is faster and dodges rho's weakness on tiny factors.
- Test the full factorizer against a trial-division oracle on many random small
nso you trust it on largen. - Keep
pollardRho,mulmod,gcd, andisPrimeas small, separately testable functions — most bugs hide inmulmod.
Edge Cases & Pitfalls¶
nis prime — rho cannot split it and loops; you must detect primality first.nis even — return2immediately; do not waste rho on the factor2.n = 1— has no prime factors; the recursion base case must handle it.nis a perfect power (e.g.p²,p³) — rho can struggle to split a prime power; some implementations check for perfect powers first (seesenior.md).- Small
n(< ~100) — just trial-divide; rho's overhead is not worth it. - The gcd returns
n— the rare collapse; restart with a differentc. Do not treatnas a factor. - Overflow — the single most common correctness killer:
x*xfor 64-bitnoverflows a 64-bit product. Always use a safe mulmod.
Common Mistakes¶
- Plain
(a*b) % nfor 64-bitn— silently overflows and returns garbage factors. Use a 128-bit / Montgomery mulmod. - Running rho on a prime — infinite loop. Always Miller-Rabin first.
- Using
c = 0— makesf(x) = x²degenerate (it can get stuck at fixed points like0and1). Start atc = 1. - Never restarting on
gcd == n— the algorithm gets stuck; you must retry with a newc. - Treating the gcd
== nas a factor —nis not a nontrivial factor; this corrupts the factorization. - Forgetting to strip the factor
2— rho is awkward with small factors; handle evennup front. - No primality base case in recursion —
factorrecurses forever; always stop whenisPrime(n). - Assuming rho cracks RSA — for cryptographic
n,n^{1/4}is still impossibly large; rho is a medium-size tool only.
Cheat Sheet¶
| Question | Tool | Note |
|---|---|---|
Find one factor of composite n | Pollard's rho | O(n^{1/4}) expected |
Is n prime? | Miller-Rabin (sibling 08) | needed before & during factoring |
| Turn a collision into a factor | gcd(|x − y|, n) | nontrivial iff 1 < gcd < n |
| Detect the cycle | Floyd (tortoise/hare) or Brent | O(1) memory |
gcd returned n | restart | new c, new seed |
| Strip easy factors | trial division to ~1000 | do this first |
Fully factor n | recurse + Miller-Rabin | split, then recurse on both parts |
Core algorithm:
pollardRho(n):
if n even: return 2
for c = 1, 2, 3, …:
x = y = 2; d = 1
f(v) = (v*v + c) mod n # use a SAFE mulmod
while d == 1:
x = f(x) # tortoise (1 step)
y = f(f(y)) # hare (2 steps)
d = gcd(|x − y|, n)
if d != n: return d # nontrivial factor
# else d == n → next c
# expected O(n^{1/4}); pair with Miller-Rabin to fully factor
Visual Animation¶
See
animation.htmlfor an interactive visualization.The animation demonstrates: - The sequence
xᵢ = (x² + c) mod ntracing out the "rho" shape (a tail leading into a cycle). - The tortoise (1 step) and hare (2 steps) pointers chasing each other around the loop. - Thegcd(|x − y|, n)being computed each round, with1, the factor, ornhighlighted. - A factor popping out when the gcd lands strictly between1andn, plus editablenandcand Step/Run/Reset controls.
Summary¶
Pollard's rho factors a composite n by walking the pseudo-random sequence x ← x² + c (mod n) and watching for a collision modulo a hidden prime p: when xᵢ ≡ xⱼ (mod p) the difference is a multiple of p, so gcd(|xᵢ − xⱼ|, n) is a nontrivial factor. Floyd's tortoise-and-hare finds the cycle in O(1) memory, and the birthday paradox makes the whole search take only ≈ √p ≤ n^{1/4} steps — the square root of trial division's √n. The one rule never to forget: rho splits a composite into one factor; to fully factor you strip small primes by trial division, test primality with Miller-Rabin (sibling 08), and recurse on the pieces. Get the overflow-safe mulmod right and you can factor 64-bit numbers in microseconds.
Further Reading¶
- J. M. Pollard, A Monte Carlo method for factorization (1975) — the original two-page paper.
- Sibling topic
01-gcd-euclidean— the gcd that turns a collision into a factor. - Sibling topic
08-miller-rabin— the primality test that decides when to stop recursing. - R. Brent, An improved Monte Carlo factorization algorithm (1980) — the faster cycle-detection variant (
middle.md). - cp-algorithms.com — "Integer factorization" and "Pollard's rho algorithm".
- Introduction to Algorithms (CLRS), number-theoretic algorithms chapter — Pollard's rho and gcd.