Quick Sort — Optimize¶
1. Random Pivot — Avoid O(n²)¶
Before¶
After¶
Win: Eliminates O(n²) on adversarial input.2. Median-of-Three Pivot¶
After¶
def median_of_three(arr, lo, hi):
mid = (lo + hi) // 2
if arr[lo] > arr[mid]: arr[lo], arr[mid] = arr[mid], arr[lo]
if arr[lo] > arr[hi]: arr[lo], arr[hi] = arr[hi], arr[lo]
if arr[mid] > arr[hi]: arr[mid], arr[hi] = arr[hi], arr[mid]
arr[mid], arr[hi-1] = arr[hi-1], arr[mid]
return arr[hi-1]
3. Insertion Sort Cutoff¶
After¶
CUTOFF = 16
def quick_sort(arr, lo, hi):
if hi - lo < CUTOFF:
insertion_sort(arr, lo, hi)
return
# ...
4. Tail-Recursion Optimization (Bound Stack)¶
Before¶
def quick_sort(arr, lo, hi):
if lo < hi:
p = partition(arr, lo, hi)
quick_sort(arr, lo, p-1)
quick_sort(arr, p+1, hi)
After¶
def quick_sort(arr, lo, hi):
while lo < hi:
p = partition(arr, lo, hi)
if p - lo < hi - p:
quick_sort(arr, lo, p-1)
lo = p + 1
else:
quick_sort(arr, p+1, hi)
hi = p - 1
5. 3-Way Partition for Duplicates¶
Before¶
Vanilla Quick Sort on [5,5,5,5,5,...] → O(n²).
After¶
def quick_sort_3way(arr, lo, hi):
if lo >= hi: return
pivot = arr[lo]; lt, gt, i = lo, hi, lo + 1
while i <= gt:
if arr[i] < pivot:
arr[lt], arr[i] = arr[i], arr[lt]; lt += 1; i += 1
elif arr[i] > pivot:
arr[i], arr[gt] = arr[gt], arr[i]; gt -= 1
else: i += 1
quick_sort_3way(arr, lo, lt-1)
quick_sort_3way(arr, gt+1, hi)
6. Block Partition (Modern Pdqsort Trick)¶
Batch comparisons for cache-friendly partitioning.
For 64 elements at a time:
1. Compute compare results into a bitmap
2. Apply all swaps in a tight loop
7. Dual-Pivot Quicksort¶
Two pivots → 3 partitions. Win: ~5% faster than single-pivot; used in Java for primitives.
8. Hybrid Introsort (Quick + Heap)¶
import math, heapq
def introsort(arr):
max_depth = 2 * int(math.log2(len(arr)))
_sort(arr, 0, len(arr)-1, max_depth)
def _sort(arr, lo, hi, depth):
if hi - lo < 16:
insertion_sort(arr, lo, hi); return
if depth == 0:
heap_sort_range(arr, lo, hi); return
p = partition(arr, lo, hi)
_sort(arr, lo, p-1, depth-1)
_sort(arr, p+1, hi, depth-1)
9. Iterative (No Recursion)¶
def quick_sort_iter(arr):
stack = [(0, len(arr)-1)]
while stack:
lo, hi = stack.pop()
if lo < hi:
p = partition(arr, lo, hi)
stack.append((lo, p-1))
stack.append((p+1, hi))
10. Branchless Partition¶
// Pseudocode for branchless update
int cmp = (arr[j] <= pivot);
arr[i + cmp] = arr[j]; // conditional write via cmov
i += cmp;
11. Parallel Sort¶
func ParallelQuickSort(arr []int) {
if len(arr) < 10000 { QuickSort(arr); return }
p := partition(arr, 0, len(arr)-1)
var wg sync.WaitGroup
wg.Add(2)
go func() { defer wg.Done(); ParallelQuickSort(arr[:p]) }()
go func() { defer wg.Done(); ParallelQuickSort(arr[p+1:]) }()
wg.Wait()
}
12. Use Pdqsort¶
Win: All optimizations above already implemented; no work for you.Summary¶
| # | Optimization | Win |
|---|---|---|
| 1 | Random pivot | Eliminates O(n²) |
| 2 | Median-of-three | 10% faster |
| 3 | Insertion cutoff | 1.5× |
| 4 | Tail-recursion | O(log n) stack |
| 5 | 3-way partition | O(n) on duplicates |
| 6 | Block partition | 30-50% on random |
| 7 | Dual-pivot | 5% |
| 8 | Introsort | Worst-case O(n log n) |
| 9 | Iterative | No stack overflow |
| 10 | Branchless | 10-20% in C/Go |
| 11 | Parallel | 3-5× on 8 cores |
| 12 | Use Pdqsort | All of the above for free |
Final benchmark (Go n=10⁶ random):