Quicksort (Divide and Conquer by Partitioning) — Middle Level¶
Focus: The two classic partition schemes (Lomuto vs Hoare) and why they differ, the full menu of pivot-selection strategies, three-way partitioning for duplicate-heavy data, quickselect for order statistics, tail-call elimination to bound stack depth, and a head-to-head comparison with merge sort.
Table of Contents¶
- Introduction
- Deeper Concepts
- Comparison with Alternatives
- Advanced Patterns
- Three-Way Partitioning (Dutch National Flag)
- Quickselect: k-th Order Statistic
- Code Examples
- Error Handling
- Performance Analysis
- Best Practices
- Visual Animation
- Summary
Introduction¶
At junior level the message was a single idea: partition around a pivot, recurse on both sides, no merge. At middle level you confront the engineering decisions that make quicksort correct and fast in practice:
- There are two famous partition schemes — Lomuto and Hoare. They behave differently on duplicates and do different numbers of swaps. Which one, and why?
- Pivot selection is the whole ballgame. First, last, random, median-of-three, median-of-medians — what are the trade-offs?
- When the array is full of duplicate keys, plain quicksort degrades. The Dutch National Flag three-way partition fixes it. How?
- Quicksort recurses into both sides; quickselect recurses into only one and finds the k-th smallest in expected
O(n). Why does dropping one recursion drop the complexity fromn log nton? - Bad splits make recursion
O(n)deep. Tail-call elimination (recurse small side, loop large side) caps it atO(log n). How?
These are the questions that separate "I can write quicksort" from "I can write a quicksort that survives sorted input, duplicate-heavy input, and a deep call stack."
Deeper Concepts¶
Lomuto vs Hoare partition¶
Both schemes rearrange a subarray around a pivot in O(n), but they work differently.
Lomuto (single index, pivot at the end):
partition_lomuto(a, lo, hi): # pivot = a[hi]
pivot = a[hi]
i = lo - 1
for j = lo to hi - 1:
if a[j] <= pivot:
i++; swap(a[i], a[j])
swap(a[i+1], a[hi])
return i + 1 # pivot's FINAL index
Hoare (two pointers moving inward, pivot at the start — the original 1961 scheme):
partition_hoare(a, lo, hi): # pivot = a[lo]
pivot = a[lo]
i = lo - 1; j = hi + 1
loop:
do i++ while a[i] < pivot
do j-- while a[j] > pivot
if i >= j: return j # a SPLIT POINT, not the pivot's index
swap(a[i], a[j])
The crucial differences:
| Aspect | Lomuto | Hoare |
|---|---|---|
| Pointers | one moving index i plus scan j | two pointers i, j moving toward each other |
| Pivot position | last element; ends in final sorted slot | first element; not placed in final slot |
| Return value | pivot's final index p | a split point j; recurse [lo,j] and [j+1,hi] |
| Swaps (typical) | more swaps | ~3× fewer swaps on average |
| Equal elements | sends == pivot to one side; degrades to O(n²) on all-equal | splits equal elements across both sides — stays balanced on all-equal |
| Recursion call | (lo, p-1), (p+1, hi) | (lo, j), (j+1, hi) — note j, not j-1 |
The classic Lomuto bug: on an all-equal array, every element is ≤ pivot, so i advances every step, the pivot lands at the far right, and you recurse on n-1 elements — O(n²). Hoare's scheme handles all-equal arrays gracefully because its two pointers meet in the middle, splitting equal elements evenly. This is why production code prefers Hoare (or three-way) over naive Lomuto. Lomuto's compensating virtue is that it is simpler to get right and returns the pivot's exact final index, which makes quickselect cleaner.
Watch the recursion bounds. Hoare returns a split point
jsuch thata[lo..j] ≤ a[j+1..hi](loosely), so you recurse on(lo, j)and(j+1, hi)— never(lo, j-1), which would drop an element and can loop. Mixing up the bounds is the #1 Hoare bug.
Why pivot quality controls the recurrence¶
The running time obeys the recurrence T(n) = T(|left|) + T(|right|) + O(n), where the O(n) is the partition pass:
- Balanced split (
|left| ≈ |right| ≈ n/2):T(n) = 2T(n/2) + O(n) = O(n log n). - Unbalanced split (
|left| = 0,|right| = n-1):T(n) = T(n-1) + O(n) = O(n²).
A constant-fraction split — say always 90/10 — is still O(n log n) (just with a larger constant), because log_{10/9} n is still logarithmic. The worst case requires splits that are unbalanced every single time, which is exactly what a fixed pivot on sorted input produces and what randomization prevents.
Tail-call elimination caps stack depth¶
Plain recursion on both sides can reach depth O(n) if the smaller side is always tiny. The fix: recurse on the smaller partition, and loop (iterate) on the larger one.
quicksort(a, lo, hi):
while lo < hi:
p = partition(a, lo, hi)
if (p - lo) < (hi - p): # left side smaller
quicksort(a, lo, p - 1) # recurse on the SMALL side
lo = p + 1 # loop on the LARGE side
else:
quicksort(a, p + 1, hi)
hi = p - 1
Because each recursive call now handles at most half the elements, the recursion depth is O(log n) even in the worst case for time. (The time can still be O(n²), but the stack stays O(log n).) This is essential for sorting large arrays without a stack overflow.
Comparison with Alternatives¶
Quicksort vs Merge Sort¶
| Property | Quicksort | Merge Sort (01-merge-sort) |
|---|---|---|
| Average time | O(n log n) | O(n log n) |
| Worst time | O(n²) | O(n log n) guaranteed |
| Extra space | O(log n) (in place) | O(n) (needs a scratch array) |
| Stable? | No | Yes |
| Cache behavior | Excellent (sequential partition scan) | Good, but extra array hurts |
| Where work happens | the divide (partition) | the combine (merge) |
| Linked lists | awkward | natural fit |
| Typical constant factor | smaller — usually wins wall-clock | larger |
Rule of thumb: quicksort for in-memory arrays where speed and memory matter and stability does not; merge sort when you need stability or a worst-case guarantee, or when sorting linked lists or external (on-disk) data.
Quicksort vs Heapsort¶
| Property | Quicksort | Heapsort |
|---|---|---|
| Average time | O(n log n) (smaller constant) | O(n log n) |
| Worst time | O(n²) | O(n log n) guaranteed |
| Space | O(log n) | O(1) |
| Stable? | No | No |
| Cache behavior | Excellent | Poor (jumps around the heap) |
Heapsort's guaranteed O(n log n) makes it the fallback in introsort: start with quicksort, but if recursion gets too deep (a sign of bad pivots), switch to heapsort to guarantee O(n log n). (Full treatment in senior.md.)
Pivot strategies compared¶
| Strategy | Extra cost per call | Sorted-input behavior | Worst case |
|---|---|---|---|
| First or last element | O(1) | O(n²) — the classic trap | O(n²) |
| Random element | O(1) (one RNG call) | expected O(n log n) | O(n²) but vanishingly unlikely |
| Median-of-three | O(1) (3 comparisons) | O(n log n) on sorted/reverse | O(n²) on crafted "median-of-three killer" inputs |
| Ninther (median of 3 medians-of-3) | O(1) (more comparisons) | very robust | O(n²) extremely rarely |
| Median-of-medians | O(n) per call | guaranteed good | O(n log n) worst — but large constant |
Median-of-three picks the median of a[lo], a[mid], a[hi]. It is the practical sweet spot: nearly free, kills the sorted-input trap, and as a bonus sorting those three elements lets you skip them in the partition. Median-of-medians guarantees a good pivot (and O(n log n) worst case) but its O(n) overhead per call makes it slower in practice — it is mostly used to give quickselect a worst-case O(n) guarantee, not for sorting.
Advanced Patterns¶
Pattern: Hoare partition with correct recursion bounds¶
Go¶
package main
import "fmt"
func hoarePartition(a []int, lo, hi int) int {
pivot := a[lo] // pivot = first element
i, j := lo-1, hi+1
for {
for {
i++
if a[i] >= pivot {
break
}
}
for {
j--
if a[j] <= pivot {
break
}
}
if i >= j {
return j // split point
}
a[i], a[j] = a[j], a[i]
}
}
func quicksortHoare(a []int, lo, hi int) {
if lo < hi {
p := hoarePartition(a, lo, hi)
quicksortHoare(a, lo, p) // note: p, NOT p-1
quicksortHoare(a, p+1, hi)
}
}
func main() {
a := []int{3, 7, 8, 5, 2, 1, 9, 5, 4}
quicksortHoare(a, 0, len(a)-1)
fmt.Println(a) // [1 2 3 4 5 5 7 8 9]
}
Java¶
public class HoareQuicksort {
static int hoarePartition(int[] a, int lo, int hi) {
int pivot = a[lo];
int i = lo - 1, j = hi + 1;
while (true) {
do { i++; } while (a[i] < pivot);
do { j--; } while (a[j] > pivot);
if (i >= j) return j; // split point
int t = a[i]; a[i] = a[j]; a[j] = t;
}
}
static void quicksort(int[] a, int lo, int hi) {
if (lo < hi) {
int p = hoarePartition(a, lo, hi);
quicksort(a, lo, p); // p, not p-1
quicksort(a, p + 1, hi);
}
}
public static void main(String[] args) {
int[] a = {3, 7, 8, 5, 2, 1, 9, 5, 4};
quicksort(a, 0, a.length - 1);
System.out.println(java.util.Arrays.toString(a));
}
}
Python¶
def hoare_partition(a, lo, hi):
pivot = a[lo] # pivot = first element
i, j = lo - 1, hi + 1
while True:
i += 1
while a[i] < pivot:
i += 1
j -= 1
while a[j] > pivot:
j -= 1
if i >= j:
return j # split point
a[i], a[j] = a[j], a[i]
def quicksort_hoare(a, lo, hi):
if lo < hi:
p = hoare_partition(a, lo, hi)
quicksort_hoare(a, lo, p) # p, not p-1
quicksort_hoare(a, p + 1, hi)
if __name__ == "__main__":
a = [3, 7, 8, 5, 2, 1, 9, 5, 4]
quicksort_hoare(a, 0, len(a) - 1)
print(a)
Pattern: Median-of-three pivot¶
Intent: Choose the median of a[lo], a[mid], a[hi] as the pivot to dodge the sorted-input worst case nearly for free.
def median_of_three(a, lo, hi):
mid = (lo + hi) // 2
# sort the three so a[lo] <= a[mid] <= a[hi]
if a[mid] < a[lo]:
a[lo], a[mid] = a[mid], a[lo]
if a[hi] < a[lo]:
a[lo], a[hi] = a[hi], a[lo]
if a[hi] < a[mid]:
a[mid], a[hi] = a[hi], a[mid]
# now a[mid] is the median; move it to hi-1 (or hi) for Lomuto
a[mid], a[hi] = a[hi], a[mid]
return a[hi] # the pivot value, now at the end
After sorting the three sentinels, the median sits in the middle; swapping it to hi lets the standard Lomuto partition run unchanged.
Pattern: Insertion-sort cutoff for small subarrays¶
Intent: Recursion overhead dominates for tiny ranges. Switch to insertion sort below a threshold (typically 10–16). Every production hybrid does this.
CUTOFF = 16
def quicksort_cutoff(a, lo, hi):
if hi - lo + 1 <= CUTOFF:
insertion_sort(a, lo, hi) # cheap on small/nearly-sorted ranges
return
p = partition(a, lo, hi)
quicksort_cutoff(a, lo, p - 1)
quicksort_cutoff(a, p + 1, hi)
Three-Way Partitioning (Dutch National Flag)¶
When the array has many duplicate keys, two-way partitioning is wasteful: equal elements get shuffled into one side and re-partitioned repeatedly. Edsger Dijkstra's Dutch National Flag algorithm partitions into three regions in a single pass — < pivot, == pivot, > pivot — and then recurses only on the < and > regions. Elements equal to the pivot are placed and never touched again.
The invariant uses three pointers lt, i, gt:
three_way(a, lo, hi):
pivot = a[lo]
lt = lo; i = lo; gt = hi
while i <= gt:
if a[i] < pivot:
swap(a[lt], a[i]); lt++; i++
elif a[i] > pivot:
swap(a[i], a[gt]); gt-- # do NOT advance i: new a[i] is unscanned
else: # a[i] == pivot
i++
quicksort(a, lo, lt - 1)
quicksort(a, gt + 1, hi)
The key subtlety: when a[i] > pivot you swap it to the > region and do not advance i, because the element just swapped in from gt has not been examined yet.
On an array of all-equal keys, three-way partition finishes the whole range in one O(n) pass (the == region swallows everything), turning the catastrophic O(n²) Lomuto case into O(n). This is the foundation of the production "3-way quicksort" used for keys with few distinct values.
Quickselect: k-th Order Statistic¶
Quickselect finds the k-th smallest element (the k-th order statistic) without fully sorting. It uses the same partition step, but after partitioning it recurses into only one side — the side that contains rank k.
quickselect(a, lo, hi, k): # find element of rank k (0-indexed) in a[lo..hi]
if lo == hi: return a[lo]
p = partition(a, lo, hi) # pivot lands at p (its final rank)
if k == p: return a[p] # pivot IS the answer
elif k < p: return quickselect(a, lo, p - 1, k)
else: return quickselect(a, p + 1, hi, k)
Why expected O(n) and not O(n log n)? Quicksort recurses into both sides, costing T(n) = 2T(n/2) + O(n) = O(n log n). Quickselect recurses into one side, costing T(n) = T(n/2) + O(n). That recurrence solves to O(n) (1 + 1/2 + 1/4 + …) = O(2n) = O(n) — the geometric series of partition costs converges to a constant times n. Dropping one of the two recursive calls collapses the log n factor.
Quickselect's worst case is still O(n²) (bad pivots), but median-of-medians pivot selection guarantees a constant-fraction split and gives quickselect a worst-case O(n) — the famous BFPRT algorithm (Blum-Floyd-Pratt-Rivest-Tarjan, 1973). In practice a random pivot is used (expected O(n), much smaller constant); median-of-medians is reserved for when a worst-case guarantee is required. The full expected-time proof is in professional.md.
Code Examples¶
Three-way quicksort + quickselect (random pivot)¶
Go¶
package main
import (
"fmt"
"math/rand"
)
// Three-way (Dutch National Flag) quicksort: robust on duplicate keys.
func quicksort3(a []int, lo, hi int) {
if lo >= hi {
return
}
r := lo + rand.Intn(hi-lo+1)
a[lo], a[r] = a[r], a[lo] // random pivot to front
pivot := a[lo]
lt, i, gt := lo, lo, hi
for i <= gt {
if a[i] < pivot {
a[lt], a[i] = a[i], a[lt]
lt++
i++
} else if a[i] > pivot {
a[i], a[gt] = a[gt], a[i]
gt--
} else {
i++
}
}
quicksort3(a, lo, lt-1)
quicksort3(a, gt+1, hi)
}
// Quickselect: k-th smallest (0-indexed), expected O(n).
func quickselect(a []int, k int) int {
lo, hi := 0, len(a)-1
for lo < hi {
r := lo + rand.Intn(hi-lo+1)
a[r], a[hi] = a[hi], a[r]
pivot := a[hi]
i := lo - 1
for j := lo; j < hi; j++ {
if a[j] <= pivot {
i++
a[i], a[j] = a[j], a[i]
}
}
i++
a[i], a[hi] = a[hi], a[i]
if k == i {
return a[i]
} else if k < i {
hi = i - 1
} else {
lo = i + 1
}
}
return a[lo]
}
func main() {
a := []int{5, 5, 2, 8, 5, 1, 9, 5, 4}
quicksort3(a, 0, len(a)-1)
fmt.Println(a) // [1 2 4 5 5 5 5 8 9]
b := []int{7, 2, 9, 4, 1, 8, 3}
fmt.Println(quickselect(b, 2)) // 3rd smallest (0-indexed rank 2) = 3
}
Java¶
import java.util.Random;
public class ThreeWayAndSelect {
static final Random rng = new Random();
static void quicksort3(int[] a, int lo, int hi) {
if (lo >= hi) return;
int r = lo + rng.nextInt(hi - lo + 1);
int t = a[lo]; a[lo] = a[r]; a[r] = t; // random pivot to front
int pivot = a[lo];
int lt = lo, i = lo, gt = hi;
while (i <= gt) {
if (a[i] < pivot) {
int s = a[lt]; a[lt] = a[i]; a[i] = s; lt++; i++;
} else if (a[i] > pivot) {
int s = a[i]; a[i] = a[gt]; a[gt] = s; gt--;
} else {
i++;
}
}
quicksort3(a, lo, lt - 1);
quicksort3(a, gt + 1, hi);
}
static int quickselect(int[] a, int k) {
int lo = 0, hi = a.length - 1;
while (lo < hi) {
int r = lo + rng.nextInt(hi - lo + 1);
int t = a[r]; a[r] = a[hi]; a[hi] = t;
int pivot = a[hi], i = lo - 1;
for (int j = lo; j < hi; j++)
if (a[j] <= pivot) { i++; int s = a[i]; a[i] = a[j]; a[j] = s; }
i++;
int s = a[i]; a[i] = a[hi]; a[hi] = s;
if (k == i) return a[i];
else if (k < i) hi = i - 1;
else lo = i + 1;
}
return a[lo];
}
public static void main(String[] args) {
int[] a = {5, 5, 2, 8, 5, 1, 9, 5, 4};
quicksort3(a, 0, a.length - 1);
System.out.println(java.util.Arrays.toString(a));
int[] b = {7, 2, 9, 4, 1, 8, 3};
System.out.println(quickselect(b, 2)); // 3
}
}
Python¶
import random
def quicksort3(a, lo, hi):
"""Three-way (Dutch National Flag) quicksort; robust on duplicates."""
if lo >= hi:
return
r = random.randint(lo, hi)
a[lo], a[r] = a[r], a[lo] # random pivot to front
pivot = a[lo]
lt, i, gt = lo, lo, hi
while i <= gt:
if a[i] < pivot:
a[lt], a[i] = a[i], a[lt]
lt += 1
i += 1
elif a[i] > pivot:
a[i], a[gt] = a[gt], a[i]
gt -= 1
else:
i += 1
quicksort3(a, lo, lt - 1)
quicksort3(a, gt + 1, hi)
def quickselect(a, k):
"""k-th smallest (0-indexed), expected O(n)."""
lo, hi = 0, len(a) - 1
while lo < hi:
r = random.randint(lo, hi)
a[r], a[hi] = a[hi], a[r]
pivot = a[hi]
i = lo - 1
for j in range(lo, hi):
if a[j] <= pivot:
i += 1
a[i], a[j] = a[j], a[i]
i += 1
a[i], a[hi] = a[hi], a[i]
if k == i:
return a[i]
elif k < i:
hi = i - 1
else:
lo = i + 1
return a[lo]
if __name__ == "__main__":
a = [5, 5, 2, 8, 5, 1, 9, 5, 4]
quicksort3(a, 0, len(a) - 1)
print(a) # [1, 2, 4, 5, 5, 5, 5, 8, 9]
b = [7, 2, 9, 4, 1, 8, 3]
print(quickselect(b, 2)) # 3
Error Handling¶
| Scenario | What goes wrong | Correct approach |
|---|---|---|
| Hoare returns wrong bounds | Recursing (lo, p-1) instead of (lo, p) drops the split element. | Hoare recurses on (lo, p) and (p+1, hi); Lomuto on (lo, p-1) and (p+1, hi). |
Three-way i advanced after > swap | Skips an unscanned element. | After swapping into the > region, do not advance i. |
All-equal array is O(n²) | Using two-way Lomuto. | Use three-way partition; equal keys finish in one pass. |
| Quickselect returns wrong rank | Off-by-one between 0-indexed k and pivot rank p. | Compare k == p, recurse k < p left, k > p right; keep both 0-indexed. |
| Stack overflow on bad splits | Recursing both sides without small-side-first. | Recurse smaller side, loop larger (tail-call elimination). |
| Median-of-three on size-1/2 ranges | mid index coincides with lo/hi. | Guard tiny ranges (cutoff to insertion sort handles this). |
Performance Analysis¶
| Input pattern | Two-way Lomuto (fixed pivot) | Two-way (random pivot) | Three-way (random pivot) |
|---|---|---|---|
| Random distinct | O(n log n) | O(n log n) | O(n log n) |
| Already sorted | O(n²) | O(n log n) | O(n log n) |
| Reverse sorted | O(n²) | O(n log n) | O(n log n) |
| All equal | O(n²) | O(n²) | O(n) |
| Few distinct values | poor | mediocre | O(n·#distinct) ≈ O(n log n) |
The table makes the engineering decisions concrete: randomize the pivot to fix the sorted-input columns, and three-way partition to fix the duplicate-heavy columns. A production quicksort does both.
Swap counts: Hoare does roughly n/6 swaps per partition pass on random data, versus Lomuto's n/2 — a constant-factor win that adds up. This, plus Hoare's graceful handling of duplicates, is why the original Hoare scheme (or three-way) underlies most library implementations rather than the textbook Lomuto.
# Quick empirical check of the all-equal pathology
import time
def time_sort(sort_fn, a):
b = a[:]
t = time.perf_counter()
sort_fn(b, 0, len(b) - 1)
return time.perf_counter() - t
# all-equal: two-way Lomuto is O(n^2); three-way is O(n)
alleq = [7] * 5000
# time_sort(lomuto_quicksort, alleq) -> slow
# time_sort(quicksort3, alleq) -> fast
Best Practices¶
- Pick the partition for your data: Hoare or three-way for general/duplicate-heavy input; Lomuto only when you want the pivot's exact index (e.g. for quickselect) or for teaching.
- Always randomize or median-of-three the pivot. Fixed pivots are an
O(n²)time bomb on sorted input. - Three-way partition when keys repeat. It turns the all-equal
O(n²)disaster intoO(n). - Recurse small side first, loop the large side to keep the stack at
O(log n). - Cut off to insertion sort for subarrays of size ≤ ~16.
- Reuse one tested
partitionand parameterize the pivot choice; most bugs live in the partition and its recursion bounds. - Use quickselect, not a full sort, when you only need an order statistic — expected
O(n)beatsO(n log n).
Visual Animation¶
See
animation.htmlfor an interactive view.The middle-level animation highlights: - The Lomuto partition scan with the
i(boundary) andj(scan) pointers, and each swap that grows the≤ pivotregion. - The pivot landing in its final position and the array splitting into left and right subproblems. - The recursion tree forming as each partition spawns two smaller partitions, and how a balanced split (good pivot) keeps the tree shallow.
Summary¶
The two partition schemes — Lomuto (one index, pivot last, returns the pivot's final position) and Hoare (two pointers, pivot first, returns a split point, ~3× fewer swaps and graceful on duplicates) — are the two ways to do the O(n) divide step. Pivot selection is the dominant performance lever: random or median-of-three keeps splits balanced and dodges the sorted-input O(n²) trap; median-of-medians guarantees it at a higher constant. Three-way partitioning (Dutch National Flag) turns the catastrophic all-equal case from O(n²) into O(n) by isolating keys equal to the pivot in one pass. Quickselect reuses the partition but recurses into only one side, finding the k-th order statistic in expected O(n) instead of O(n log n). Tail-call elimination (recurse small, loop large) bounds the stack at O(log n). Compared to merge sort, quicksort trades a guaranteed worst case for in-place sorting and a smaller constant — the right trade for most in-memory array sorting.