Shell Sort — Middle Level¶

Table of Contents¶

Introduction
Deeper Concepts
Gap Sequence Deep Dive
Comparison with Alternatives
Advanced Patterns
Benchmark Methodology
Code Examples
Error Handling
Performance Analysis
Best Practices
Visual Animation
Summary

Introduction¶

Focus: "Why does Shell Sort work, why is its complexity tied to the gap sequence, and when do I choose it over Insertion / Merge / Quick?"

At the middle level we move past "Shell Sort is insertion sort with a gap" and ask three sharper questions:

What invariants do h-sorting passes preserve? Knowing this lets us reason about why later passes need less work.
Why does the gap sequence dominate complexity? It is the only knob we have, and it changes the algorithm from O(n²) (Shell's original) to O(n log² n) (Pratt).
When do we pick Shell Sort over insertion / merge / quick sort? Middle engineers need to make this call quickly based on n, memory budget, stack depth, and whether stability is required.

This document assumes you have read the junior introduction.

Deeper Concepts¶

Invariant: After `h`-sorting, the Array Stays `h`-sorted Forever¶

After we finish a pass at gap h, the array is h-sorted: for every valid index i, arr[i] <= arr[i + h].

Key theorem (Knuth, TAOCP Vol. 3): if an array is h-sorted and we then run a pass at gap k < h, the array remains h-sorted after the k-sort.

This is why later passes do less work: each new pass adds a new ordering constraint without destroying the previous ones. By the time g = 1 runs, the array satisfies many ordering constraints simultaneously — every gap in the sequence has been applied — so the final insertion sort is nearly linear.

Invariant: Final Pass Is Plain Insertion Sort¶

The final pass at g = 1 is insertion sort. Therefore Shell Sort is guaranteed correct: insertion sort over a near-sorted array always produces a fully sorted array. Everything before the final pass is "preparation" — it doesn't have to produce a sorted array, only one that the final pass can finish cheaply.

Recurrence Sketch — Why Gap Choice Matters¶

Plain insertion sort: T(n) = O(n²) because each element may travel O(n) positions, and there are n elements.

Shell Sort at gap g: each subsequence has length n / g, and h-sorting a subsequence costs O((n/g)²) in the worst case. There are g subsequences, so one pass costs g · (n/g)² = n² / g.

Sum over all gaps g₁ > g₂ > ... > g_t = 1:

T_total ≤ Σ n² / g_i

If gaps are powers of 2: T = n² · (1/n + 1/(n/2) + ... + 1) = n² · (2 - 1/n) ≈ 2n². Not great.

But if the h-sorted invariants compound, later passes do much less than n² / g work. The exact bound depends on how the gaps share common factors. Hibbard's 2^k - 1 chooses pairwise coprime gaps so that each pass adds independent ordering constraints, leading to O(n^(3/2)).

Why Shell's `n/2, n/4, ...` Sequence Is Bad¶

Shell's original sequence has the property that even-indexed elements never compare with odd-indexed elements until gap = 1. If you place small values at all even positions and large values at all odd positions, every pass at even gap sorts these two interleaved sets independently — the merging only happens at g = 1, forcing the final insertion sort to do O(n²) work.

Adversarial example for Shell's sequence on n = 16:

input: [1, 9, 2, 10, 3, 11, 4, 12, 5, 13, 6, 14, 7, 15, 8, 16]

After g = 8, 4, 2: array is unchanged (already h-sorted for all even h). At g = 1: insertion sort runs in O(n²).

Hibbard's 2^k - 1 sequence avoids this because odd and even indices mix at every pass.

Why Pratt's O(n log² n) Works¶

Pratt's sequence uses all integers of the form 2^a · 3^b: 1, 2, 3, 4, 6, 8, 9, 12, 16, 18, .... There are O(log² n) such integers below n, and each pass costs O(n). Total: O(n log² n).

The proof relies on a "3-smooth" property: any inversion is fixed within O(log² n) passes because the gap sequence covers every meaningful distance.

Trade-off: Pratt's sequence has the best provable bound but most passes in practice (~log² n vs. ~log n for Knuth/Hibbard), so the constant factor is high. It's almost never the fastest in benchmarks.

Why Ciura's Empirical Sequence Wins in Practice¶

Ciura ran exhaustive Monte-Carlo benchmarks over millions of random inputs of various sizes and searched for the gap sequence that minimized average runtime. The result: 1, 4, 10, 23, 57, 132, 301, 701, ....

No proof of asymptotic complexity exists. The sequence is "magic" — it just empirically beats every closed-form sequence on n ≤ a few million. Conjectured: between O(n^(5/4)) and O(n^(4/3)).

For n > 701, extrapolate with g_{k+1} = ⌊2.25 · g_k⌋.

Gap Sequence Deep Dive¶

Sequence	Generator	Asymptotic	Pass count	Notes
Shell `n/2`	`g = n/2`; halve	O(n²)	log₂ n	Avoid; only educational
Hibbard	`2^k − 1`	O(n^(3/2))	~log₂ n	Coprime gaps mix odd/even
Knuth `(3^k−1)/2`	`g = 3g + 1`	~O(n^(3/2)) practical	~log₃ n	Simple, decent, widely used
Sedgewick (1986)	Mixed two formulas	O(n^(4/3))	~log₂ n	Best closed-form among proven
Pratt	All `2^a · 3^b` ≤ n	O(n log² n)	O(log² n)	Best proven bound; many passes
Ciura	Empirical: 1,4,10,23,57,132,301,701 + ×2.25 extension	unproven (~O(n^(5/4)))	O(log_{2.25} n)	Fastest in practice

Choosing a Sequence in Production¶

graph TD A[Need Shell Sort?] --> B{n known?} B -->|n ≤ 10^4| C[Ciura — fastest] B -->|n unknown, embedded| D[Knuth — simple closed form] B -->|need proof| E[Pratt — provable n log² n] B -->|need O(n^4/3) guarantee| F[Sedgewick] C --> G[Done] D --> G E --> G F --> G

Adversarial Inputs¶

Each sequence has worst-case inputs:

Sequence	Adversarial input shape	Resulting time
Shell n/2	Interleaved small/large at even/odd positions	O(n²)
Hibbard	Specific permutations exploiting coprimality	O(n^(3/2))
Pratt	None known better than O(n log² n)	O(n log² n)
Ciura	Random and adversarial both fast; no known bad case below threshold	empirical

Comparison with Alternatives¶

Attribute	Shell Sort	Insertion Sort	Merge Sort	Quick Sort	Heap Sort
Time (best)	O(n log n)	O(n)	O(n log n)	O(n log n)	O(n log n)
Time (avg)	O(n^(4/3))–O(n^(3/2))	O(n²)	O(n log n)	O(n log n)	O(n log n)
Time (worst)	O(n^(3/2))–O(n log² n) depending on gaps	O(n²)	O(n log n)	O(n²)	O(n log n)
Space	O(1)	O(1)	O(n)	O(log n) stack	O(1)
Stable	No	Yes	Yes	No	No
Adaptive	Yes	Yes	No	Partial	No
In-place	Yes	Yes	No (array)	Yes	Yes
Recursive	No	No	Yes	Yes	No
Cache-friendly	Mostly (small gaps)	Yes	No	Mostly	No (heap structure)
Code size	Tiny	Tiniest	Medium	Medium	Medium

Choose Shell Sort when:

You need in-place + no recursion + small code size (embedded, firmware, kernel).
n is medium (a few hundred to a few thousand).
Stability is not required.

Choose Insertion Sort when:

n is very small (≤ 32) — Shell Sort's outer loop overhead doesn't pay off.
Data is nearly sorted — insertion sort is already O(n) there.

Choose Merge Sort when:

Stability is required.
Worst-case guarantee matters more than speed.
Memory is available.

Choose Quick Sort when:

Large n (n > 10^5) and stability not needed.
Cache locality matters.

Advanced Patterns¶

Pattern: Hybrid — Shell Sort + Insertion Sort Switch¶

For very small subsequences within a pass (length ≤ 8 or so), the pass is just plain insertion sort regardless of gap. Skipping the gap loop overhead for small n / g saves time.

Go¶

func ShellHybrid(arr []int) {
    n := len(arr)
    for gap := n / 2; gap > 0; gap /= 2 {
        if gap == 1 || n/gap <= 8 {
            // Small subsequences: plain insertion sort with this gap
            for i := gap; i < n; i++ {
                x := arr[i]
                j := i
                for j >= gap && arr[j-gap] > x {
                    arr[j] = arr[j-gap]
                    j -= gap
                }
                arr[j] = x
            }
        } else {
            // Larger subsequences: same kernel, branch hint stays warm
            for i := gap; i < n; i++ {
                x := arr[i]
                j := i
                for j >= gap && arr[j-gap] > x {
                    arr[j] = arr[j-gap]
                    j -= gap
                }
                arr[j] = x
            }
        }
    }
}

Java¶

public class ShellHybrid {
    public static void sort(int[] arr) {
        int n = arr.length;
        for (int gap = n / 2; gap > 0; gap /= 2) {
            for (int i = gap; i < n; i++) {
                int x = arr[i];
                int j = i;
                while (j >= gap && arr[j - gap] > x) {
                    arr[j] = arr[j - gap];
                    j -= gap;
                }
                arr[j] = x;
            }
        }
    }
}

Python¶

def shell_hybrid(arr):
    n = len(arr)
    gap = n // 2
    while gap > 0:
        for i in range(gap, n):
            x = arr[i]
            j = i
            while j >= gap and arr[j - gap] > x:
                arr[j] = arr[j - gap]
                j -= gap
            arr[j] = x
        gap //= 2

Pattern: Configurable Gap Sequence Driver¶

Go¶

type GapSequence func(n int) []int

func ShellGeneric(arr []int, seq GapSequence) {
    n := len(arr)
    for _, gap := range seq(n) {
        if gap >= n { continue }
        for i := gap; i < n; i++ {
            x := arr[i]
            j := i
            for j >= gap && arr[j-gap] > x {
                arr[j] = arr[j-gap]
                j -= gap
            }
            arr[j] = x
        }
    }
}

func KnuthGaps(n int) []int {
    out := []int{}
    g := 1
    for g < n {
        out = append([]int{g}, out...) // prepend for descending order
        g = 3*g + 1
    }
    return out
}

func CiuraGapsFn(n int) []int {
    base := []int{701, 301, 132, 57, 23, 10, 4, 1}
    // Extend upward for large n
    out := []int{}
    g := base[0]
    for g < n {
        out = append([]int{g}, out...)
        g = int(float64(g) * 2.25)
    }
    return append(out, base...)
}

Java¶

import java.util.*;
import java.util.function.IntFunction;

public class ShellGeneric {
    public static void sort(int[] arr, IntFunction<int[]> seq) {
        int n = arr.length;
        for (int gap : seq.apply(n)) {
            if (gap >= n) continue;
            for (int i = gap; i < n; i++) {
                int x = arr[i];
                int j = i;
                while (j >= gap && arr[j - gap] > x) {
                    arr[j] = arr[j - gap];
                    j -= gap;
                }
                arr[j] = x;
            }
        }
    }

    public static int[] knuthGaps(int n) {
        List<Integer> out = new ArrayList<>();
        int g = 1;
        while (g < n) { out.add(0, g); g = 3 * g + 1; }
        return out.stream().mapToInt(Integer::intValue).toArray();
    }
}

Python¶

def shell_generic(arr, sequence_fn):
    n = len(arr)
    for gap in sequence_fn(n):
        if gap >= n:
            continue
        for i in range(gap, n):
            x = arr[i]
            j = i
            while j >= gap and arr[j - gap] > x:
                arr[j] = arr[j - gap]
                j -= gap
            arr[j] = x

def knuth_gaps(n):
    out = []
    g = 1
    while g < n:
        out.insert(0, g)
        g = 3 * g + 1
    return out

def ciura_gaps(n):
    base = [701, 301, 132, 57, 23, 10, 4, 1]
    out = []
    g = base[0]
    while g < n:
        out.insert(0, g)
        g = int(g * 2.25)
    return out + base

Pattern: Counting Operations for Empirical Analysis¶

Wrap comparisons and shifts in counters; plot against n on a log-log scale to estimate the empirical exponent.

class Counter:
    def __init__(self):
        self.compares = 0
        self.shifts = 0

def shell_count(arr, gaps):
    c = Counter()
    n = len(arr)
    for gap in gaps:
        for i in range(gap, n):
            x = arr[i]
            j = i
            while j >= gap:
                c.compares += 1
                if arr[j - gap] > x:
                    arr[j] = arr[j - gap]; c.shifts += 1
                    j -= gap
                else:
                    break
            arr[j] = x
    return c

Benchmark Methodology¶

To meaningfully compare Shell Sort variants:

Run on multiple input shapes: random, sorted, reverse-sorted, partially sorted (95% sorted), many duplicates.
Run on multiple sizes: n ∈ {10³, 10⁴, 10⁵, 10⁶}.
Warm up the JIT (Java) / type cache (Python) with 100 throwaway iterations.
Take the median of 50 runs to filter GC noise.
Clone the input before each run — sorting is destructive.

A representative result on random integers (numbers approximate, depend on hardware):

n	Shell (n/2)	Hibbard	Knuth	Ciura	Quick Sort
1,000	0.7 ms	0.45 ms	0.4 ms	0.30 ms	0.15 ms
10,000	9.5 ms	6.5 ms	5.5 ms	4.0 ms	1.5 ms
100,000	220 ms	130 ms	95 ms	65 ms	15 ms
1,000,000	5400 ms	2800 ms	1900 ms	1300 ms	180 ms

Takeaway: Ciura > Knuth > Hibbard > Shell; Quick Sort dominates all of them at large n.

Code Examples¶

Full Implementation: Shell Sort with All Three Sequences¶

Go¶

package main

import "fmt"

func shellPass(arr []int, gap int) {
    n := len(arr)
    for i := gap; i < n; i++ {
        x := arr[i]
        j := i
        for j >= gap && arr[j-gap] > x {
            arr[j] = arr[j-gap]
            j -= gap
        }
        arr[j] = x
    }
}

func ShellOriginal(arr []int) {
    for gap := len(arr) / 2; gap > 0; gap /= 2 {
        shellPass(arr, gap)
    }
}

func ShellKnuth(arr []int) {
    n := len(arr)
    gap := 1
    for gap < n/3 { gap = 3*gap + 1 }
    for ; gap > 0; gap = (gap - 1) / 3 {
        shellPass(arr, gap)
    }
}

var ciuraGaps = []int{701, 301, 132, 57, 23, 10, 4, 1}

func ShellCiura(arr []int) {
    n := len(arr)
    for _, gap := range ciuraGaps {
        if gap < n {
            shellPass(arr, gap)
        }
    }
}

func main() {
    data := []int{12, 34, 54, 2, 3, 1, 7, 19, 8, 25, 11, 4, 100, 50, 70}
    ShellCiura(data)
    fmt.Println(data)
}

Java¶

import java.util.Arrays;

public class ShellSortAll {
    static void shellPass(int[] arr, int gap) {
        for (int i = gap; i < arr.length; i++) {
            int x = arr[i];
            int j = i;
            while (j >= gap && arr[j - gap] > x) {
                arr[j] = arr[j - gap];
                j -= gap;
            }
            arr[j] = x;
        }
    }

    public static void shellOriginal(int[] arr) {
        for (int gap = arr.length / 2; gap > 0; gap /= 2) shellPass(arr, gap);
    }

    public static void shellKnuth(int[] arr) {
        int n = arr.length, gap = 1;
        while (gap < n / 3) gap = 3 * gap + 1;
        for (; gap > 0; gap = (gap - 1) / 3) shellPass(arr, gap);
    }

    private static final int[] CIURA = {701, 301, 132, 57, 23, 10, 4, 1};

    public static void shellCiura(int[] arr) {
        for (int gap : CIURA) if (gap < arr.length) shellPass(arr, gap);
    }

    public static void main(String[] args) {
        int[] data = {12, 34, 54, 2, 3, 1, 7, 19, 8, 25, 11, 4, 100, 50, 70};
        shellCiura(data);
        System.out.println(Arrays.toString(data));
    }
}

Python¶

def shell_pass(arr, gap):
    n = len(arr)
    for i in range(gap, n):
        x = arr[i]
        j = i
        while j >= gap and arr[j - gap] > x:
            arr[j] = arr[j - gap]
            j -= gap
        arr[j] = x

def shell_original(arr):
    gap = len(arr) // 2
    while gap > 0:
        shell_pass(arr, gap)
        gap //= 2

def shell_knuth(arr):
    n = len(arr)
    gap = 1
    while gap < n // 3:
        gap = 3 * gap + 1
    while gap > 0:
        shell_pass(arr, gap)
        gap = (gap - 1) // 3

CIURA_GAPS = [701, 301, 132, 57, 23, 10, 4, 1]

def shell_ciura(arr):
    n = len(arr)
    for gap in CIURA_GAPS:
        if gap < n:
            shell_pass(arr, gap)

if __name__ == "__main__":
    data = [12, 34, 54, 2, 3, 1, 7, 19, 8, 25, 11, 4, 100, 50, 70]
    shell_ciura(data)
    print(data)

Error Handling¶

Scenario	What goes wrong	Correct approach
Gap never reaches 1	Final pass missing → only partially sorted	Ensure formula terminates at exactly 1; assert in tests
Gap stays ≥ n forever	All passes skipped → unsorted output	Always include `gap = 1` in sequence
Integer overflow in `g = 3g + 1`	UB / wrong index on 32-bit	Use 64-bit indices for huge arrays
Negative gap (signed overflow)	Inner loop goes wild	Validate sequence before use
Empty array	Some sequences compute `g = n/2 = 0` and exit early	Test with `[]` explicitly
Concurrent mutation	Race conditions	Shell Sort is single-threaded; clone before sharing

Performance Analysis¶

Empirical Comparison Across Languages¶

Go¶

package main

import (
    "fmt"
    "math/rand"
    "time"
)

func benchmark(fn func([]int), name string, n int) {
    rand.Seed(42)
    data := make([]int, n)
    for i := range data { data[i] = rand.Intn(n) }
    runs := 20
    start := time.Now()
    for i := 0; i < runs; i++ {
        tmp := append([]int{}, data...)
        fn(tmp)
    }
    elapsed := time.Since(start)
    fmt.Printf("%-15s n=%6d : %.2f ms/run\n", name, n, float64(elapsed.Milliseconds())/float64(runs))
}

Java¶

import java.util.Random;

public class ShellBench {
    public static void benchmark(java.util.function.Consumer<int[]> fn, String name, int n) {
        Random rng = new Random(42);
        int[] data = new int[n];
        for (int i = 0; i < n; i++) data[i] = rng.nextInt(n);
        int runs = 20;
        long start = System.nanoTime();
        for (int i = 0; i < runs; i++) fn.accept(data.clone());
        long elapsed = System.nanoTime() - start;
        System.out.printf("%-15s n=%6d : %.2f ms/run%n", name, n, elapsed / runs / 1_000_000.0);
    }
}

Python¶

import random
import time

def benchmark(fn, name, n):
    random.seed(42)
    data = [random.randint(0, n) for _ in range(n)]
    runs = 20
    start = time.perf_counter()
    for _ in range(runs):
        tmp = data.copy()
        fn(tmp)
    elapsed = time.perf_counter() - start
    print(f"{name:<15} n={n:>6} : {elapsed / runs * 1000:.2f} ms/run")

Theoretical Bounds Recap¶

Sequence	Proven worst	Average (empirical)
Shell n/2	O(n²)	~O(n^(3/2))
Hibbard	O(n^(3/2))	O(n^(5/4))
Knuth	unproven	~O(n^(5/4))
Sedgewick	O(n^(4/3))	O(n^(7/6))
Pratt	O(n log² n)	O(n log² n)
Ciura	unproven	empirically the best

Best Practices¶

Pick Ciura for max speed, Knuth for simple production code, Pratt when you must prove the bound.
Always terminate the gap sequence at 1. Make this part of your unit tests.
For embedded code, hardcode the gap array as a const int[] — avoids runtime computation.
For kernel / interrupt context, profile carefully — stack and code size budgets are tight.
Document why Shell Sort and not Merge/Quick. A future maintainer will ask.
Add a debug_assert! / assert that the input is sorted after the final pass during development. Remove for production hot paths.
Do not use Shell Sort when stability matters; switch to TimSort or Merge Sort.

Visual Animation¶

See animation.html for interactive visualization.

Middle-level animation includes: - Color-coded subsequences for the current gap, each in a distinct hue. - Live highlighting of the gapped-insertion kernel (shift / insert). - Pass counter, current gap, and operation counters (comparisons, shifts). - Drop-down to switch gap sequence (Shell, Knuth, Ciura) and watch the resulting pass count change. - "Adversarial input" button to see how Shell's original sequence degrades to O(n²) on interleaved input.

Summary¶

At the middle level, Shell Sort becomes a study in how the gap sequence drives complexity:

The algorithm preserves h-sorted invariants across passes, so later passes do less work.
Shell's original n/2, n/4, ... is O(n²) worst-case because even/odd indices never mix until g = 1.
Hibbard's 2^k − 1 gives O(n^(3/2)) by using coprime gaps.
Sedgewick's mixed sequence improves to O(n^(4/3)).
Pratt's 2^a · 3^b gives the best provable O(n log² n) at the cost of many passes.
Ciura's empirically tuned sequence is the fastest in practice despite no proven bound.

When to choose Shell Sort: in-place, no-recursion, medium n, stability not required, code-size matters. Otherwise reach for Quick Sort / Merge Sort / TimSort.

The next level — senior.md — explores Shell Sort's role in embedded firmware, kernel context, cache behavior, and modern relevance.

Shell Sort — Middle Level¶

Table of Contents¶

Introduction¶

Deeper Concepts¶

Invariant: After h-sorting, the Array Stays h-sorted Forever¶

Invariant: Final Pass Is Plain Insertion Sort¶

Recurrence Sketch — Why Gap Choice Matters¶

Why Shell's n/2, n/4, ... Sequence Is Bad¶

Why Pratt's O(n log² n) Works¶

Why Ciura's Empirical Sequence Wins in Practice¶

Gap Sequence Deep Dive¶

Choosing a Sequence in Production¶

Adversarial Inputs¶

Comparison with Alternatives¶

Advanced Patterns¶

Pattern: Hybrid — Shell Sort + Insertion Sort Switch¶

Go¶

Java¶

Python¶

Pattern: Configurable Gap Sequence Driver¶

Go¶

Java¶

Python¶

Pattern: Counting Operations for Empirical Analysis¶

Benchmark Methodology¶

Code Examples¶

Full Implementation: Shell Sort with All Three Sequences¶

Go¶

Java¶

Python¶

Error Handling¶

Performance Analysis¶

Empirical Comparison Across Languages¶

Go¶

Java¶

Python¶

Theoretical Bounds Recap¶

Best Practices¶

Visual Animation¶

Summary¶

Invariant: After `h`-sorting, the Array Stays `h`-sorted Forever¶

Why Shell's `n/2, n/4, ...` Sequence Is Bad¶