Skip to content

Quick Sort — Optimize

1. Random Pivot — Avoid O(n²)

Before

pivot = arr[hi]  # O(n²) on sorted

After

import random
rnd = random.randint(lo, hi)
arr[rnd], arr[hi] = arr[hi], arr[rnd]
pivot = arr[hi]
Win: Eliminates O(n²) on adversarial input.

2. Median-of-Three Pivot

After

def median_of_three(arr, lo, hi):
    mid = (lo + hi) // 2
    if arr[lo] > arr[mid]: arr[lo], arr[mid] = arr[mid], arr[lo]
    if arr[lo] > arr[hi]:  arr[lo], arr[hi]  = arr[hi], arr[lo]
    if arr[mid] > arr[hi]: arr[mid], arr[hi] = arr[hi], arr[mid]
    arr[mid], arr[hi-1] = arr[hi-1], arr[mid]
    return arr[hi-1]
Win: ~10% faster than first/last pivot; immune to common adversarial patterns.

3. Insertion Sort Cutoff

After

CUTOFF = 16
def quick_sort(arr, lo, hi):
    if hi - lo < CUTOFF:
        insertion_sort(arr, lo, hi)
        return
    # ...
Win: ~1.5× faster overall.

4. Tail-Recursion Optimization (Bound Stack)

Before

def quick_sort(arr, lo, hi):
    if lo < hi:
        p = partition(arr, lo, hi)
        quick_sort(arr, lo, p-1)
        quick_sort(arr, p+1, hi)

After

def quick_sort(arr, lo, hi):
    while lo < hi:
        p = partition(arr, lo, hi)
        if p - lo < hi - p:
            quick_sort(arr, lo, p-1)
            lo = p + 1
        else:
            quick_sort(arr, p+1, hi)
            hi = p - 1
Win: Stack bounded to O(log n) even on bad pivots.

5. 3-Way Partition for Duplicates

Before

Vanilla Quick Sort on [5,5,5,5,5,...] → O(n²).

After

def quick_sort_3way(arr, lo, hi):
    if lo >= hi: return
    pivot = arr[lo]; lt, gt, i = lo, hi, lo + 1
    while i <= gt:
        if arr[i] < pivot:
            arr[lt], arr[i] = arr[i], arr[lt]; lt += 1; i += 1
        elif arr[i] > pivot:
            arr[i], arr[gt] = arr[gt], arr[i]; gt -= 1
        else: i += 1
    quick_sort_3way(arr, lo, lt-1)
    quick_sort_3way(arr, gt+1, hi)
Win: O(n) on all-equal input.

6. Block Partition (Modern Pdqsort Trick)

Batch comparisons for cache-friendly partitioning.

For 64 elements at a time:
  1. Compute compare results into a bitmap
  2. Apply all swaps in a tight loop
Win: 30-50% faster on random data due to fewer branch mispredictions.

7. Dual-Pivot Quicksort

Two pivots → 3 partitions. Win: ~5% faster than single-pivot; used in Java for primitives.

8. Hybrid Introsort (Quick + Heap)

import math, heapq
def introsort(arr):
    max_depth = 2 * int(math.log2(len(arr)))
    _sort(arr, 0, len(arr)-1, max_depth)

def _sort(arr, lo, hi, depth):
    if hi - lo < 16:
        insertion_sort(arr, lo, hi); return
    if depth == 0:
        heap_sort_range(arr, lo, hi); return
    p = partition(arr, lo, hi)
    _sort(arr, lo, p-1, depth-1)
    _sort(arr, p+1, hi, depth-1)
Win: Worst case O(n log n) guaranteed. Used in C++ STL.

9. Iterative (No Recursion)

def quick_sort_iter(arr):
    stack = [(0, len(arr)-1)]
    while stack:
        lo, hi = stack.pop()
        if lo < hi:
            p = partition(arr, lo, hi)
            stack.append((lo, p-1))
            stack.append((p+1, hi))
Win: No stack overflow risk; explicit memory.

10. Branchless Partition

// Pseudocode for branchless update
int cmp = (arr[j] <= pivot);
arr[i + cmp] = arr[j];  // conditional write via cmov
i += cmp;
Win: 10-20% in C/Go; Pdqsort uses this.

11. Parallel Sort

func ParallelQuickSort(arr []int) {
    if len(arr) < 10000 { QuickSort(arr); return }
    p := partition(arr, 0, len(arr)-1)
    var wg sync.WaitGroup
    wg.Add(2)
    go func() { defer wg.Done(); ParallelQuickSort(arr[:p]) }()
    go func() { defer wg.Done(); ParallelQuickSort(arr[p+1:]) }()
    wg.Wait()
}
Win: 3-5× on 8 cores for large n.

12. Use Pdqsort

sort.Slice(arr, func(i, j int) bool { return arr[i] < arr[j] })
Win: All optimizations above already implemented; no work for you.


Summary

# Optimization Win
1 Random pivot Eliminates O(n²)
2 Median-of-three 10% faster
3 Insertion cutoff 1.5×
4 Tail-recursion O(log n) stack
5 3-way partition O(n) on duplicates
6 Block partition 30-50% on random
7 Dual-pivot 5%
8 Introsort Worst-case O(n log n)
9 Iterative No stack overflow
10 Branchless 10-20% in C/Go
11 Parallel 3-5× on 8 cores
12 Use Pdqsort All of the above for free

Final benchmark (Go n=10⁶ random):

Vanilla Quick (random pivot)     :  90 ms
+ median-of-three                :  80 ms
+ insertion cutoff               :  60 ms
+ 3-way partition                :  55 ms
Pdqsort (sort.Ints)              :  38 ms ← winner
Vanilla on sorted (first as pivot): TIMEOUT (O(n²))