Bubble Sort — Mathematical Foundations and Complexity Theory¶
Table of Contents¶
- Formal Definition
- Correctness Proof — Loop Invariants
- Lower Bound for Comparison Sorts
- Inversion Counting and Optimality of Adjacent-Swap Sorts
- Amortized Analysis (Why It Doesn't Help Here)
- Average-Case Analysis via Permutations
- Sorting Networks: 0–1 Principle
- Cache-Oblivious Analysis
- Parallel Complexity
- Comparison with Alternatives
- Summary
Formal Definition¶
Definition: A sorting algorithm SORT for arrays over a totally ordered set (S, ≤)
is a function SORT : S* → S* such that for every input array A:
1. Permutation property: SORT(A) is a permutation of A.
2. Order property: For all i < j: SORT(A)[i] ≤ SORT(A)[j].
BUBBLE-SORT(A):
n ← length(A)
for i ← 1 to n-1:
for j ← 1 to n-i:
if A[j] > A[j+1]:
SWAP(A[j], A[j+1])
Invariant I_outer(i): At the start of iteration i+1 of the outer loop,
the subarray A[n-i+1 .. n] contains the i largest
elements of A in non-decreasing order, AND every
element in A[1 .. n-i] is ≤ every element in A[n-i+1 .. n].
This is the textbook (CLRS) definition with 1-based indexing. We use 0-based in code.
Correctness Proof — Loop Invariants¶
Inner Loop Invariant¶
Invariant I_inner(j): At the start of iteration j+1 of the inner loop in pass i, A[1 .. j] contains some elements of the original array (some may have been swapped from later positions), and A[j] is the maximum of A[1 .. j] for that pass.
Base (j=1): A[1] is trivially the maximum of {A[1]}. ✓
Inductive step: Assume A[j] is the max of A[1 .. j]. The next iteration compares A[j] with A[j+1]: - If A[j] ≤ A[j+1]: don't swap. A[j+1] is now the max of A[1 .. j+1] because either A[j+1] > A[j] (already max) or A[j+1] = A[j] (still max). - If A[j] > A[j+1]: swap. After swap, A[j+1] (formerly A[j]) holds the max.
In both cases, A[j+1] is the max of A[1 .. j+1] after the iteration. ✓
Termination: When j = n-i+1, A[n-i+1] holds the max of A[1 .. n-i+1] — i.e., the maximum of the unsorted prefix.
Outer Loop Invariant¶
Invariant I_outer(i): At the start of outer iteration i+1 (i ≥ 0), the subarray A[n-i+1 .. n] contains the i largest elements of A in sorted order, AND max(A[1 .. n-i]) ≤ min(A[n-i+1 .. n]).
Base (i=0): A[n+1 .. n] is empty; the conditions hold vacuously. ✓
Inductive step: Assume I_outer(i). At outer iteration i+1, the inner loop bubbles max(A[1 .. n-i]) to position n-i. By inductive hypothesis, this maximum is the (i+1)-th largest of A. After the inner loop: - A[n-i .. n] contains the i+1 largest in sorted order (the (i+1)-th largest is at A[n-i], and A[n-i+1 .. n] was already sorted by IH). - max(A[1 .. n-i-1]) ≤ A[n-i] because A[n-i] was the max of A[1 .. n-i] after the bubble.
So I_outer(i+1) holds. ✓
Termination: When i = n-1, A[2 .. n] contains the n-1 largest in sorted order, and A[1] ≤ A[2]. Therefore A is fully sorted.
QED.
Lower Bound for Comparison Sorts¶
Theorem (Decision Tree Lower Bound)¶
Any comparison-based sorting algorithm requires Ω(n log n) comparisons in the worst case.
Proof sketch: A comparison sort can be modeled as a binary decision tree where each internal node is a comparison A[i] : A[j] and each leaf corresponds to a permutation of the input.
- There are
n!possible input permutations. - Each must lead to a distinct leaf (the algorithm must distinguish them).
- A binary tree with L leaves has depth ≥ ⌈log₂ L⌉.
Therefore depth ≥ log₂(n!) = Θ(n log n) by Stirling's approximation.
Implication for Bubble Sort¶
Bubble Sort uses Θ(n²) comparisons in the worst case — worse than the lower bound. This is not because the lower bound is loose; it's because Bubble Sort is restricted to adjacent swaps, which makes it suboptimal among comparison sorts.
Sub-Theorem: Lower Bound for Adjacent-Swap-Only Sorts¶
Any sorting algorithm restricted to adjacent swaps requires Ω(n²) swaps in the worst case.
Proof: Each adjacent swap removes at most 1 inversion. A reverse-sorted array of size n has n(n-1)/2 inversions. So at least n(n-1)/2 = Θ(n²) swaps are required. Bubble Sort, Insertion Sort (with swaps), and Cocktail Sort all achieve this bound — they are optimal among adjacent-swap sorts.
Inversion Counting and Optimality¶
Definition¶
An inversion in array A is a pair (i, j) with i < j and A[i] > A[j]. Let inv(A) denote the number of inversions in A.
| Property | Value |
|---|---|
| Min inversions | 0 (sorted) |
| Max inversions | n(n-1)/2 (reverse sorted) |
| Expected inversions in random permutation | n(n-1)/4 |
Theorem: Bubble Sort Performs Exactly inv(A) Swaps¶
Proof: Each Bubble Sort swap exchanges adjacent elements A[j] and A[j+1] where A[j] > A[j+1] — i.e., it removes the inversion (j, j+1).
We claim no other inversion is created or destroyed: - Inversions involving indices outside {j, j+1} are unchanged (those elements don't move). - Inversions of the form (k, j) with k < j: pair becomes (k, j) referring to the new A[j] (former A[j+1]). The ≤/> status against A[k] may change either way, but those are inversions with j which are tracked separately...
Hmm, this argument is subtle. The cleaner statement: net change in inv(A) per swap = -1.
Cleaner proof: Consider the swap of A[j] (= a) and A[j+1] (= b) where a > b. After the swap: - Pairs (i, k) with i, k ∉ {j, j+1}: unchanged (no element moved). - Pairs involving exactly one of j, j+1: these are pairs (k, j) with k < j and pairs (j+1, k) with k > j+1. Each such pair compares some A[k] with one of {a, b}. Since a > b, swapping which element is at position j vs j+1 doesn't change whether A[k] is greater or less than the value — it changes which inversion-direction we count. The number of inversions involving position j (with elements outside {j, j+1}) stays the same. - Pair (j, j+1): before = inverted (since a > b); after = not inverted. Net change: -1.
Therefore inv(A) decreases by exactly 1 per swap. Bubble Sort terminates when inv(A) = 0. So total swaps = initial inv(A).
QED.
Corollary: Bubble Sort is Swap-Optimal¶
Any adjacent-swap-only sort must perform at least inv(A) swaps. Bubble Sort performs exactly inv(A). Therefore Bubble Sort is optimal in swap count among adjacent-swap-only sorts.
(Note: Insertion Sort with swap-based implementation also achieves this bound.)
Amortized Analysis¶
Bubble Sort doesn't benefit from amortized analysis — every operation is O(n) per pass, and there's no "expensive but rare" operation to amortize. The Aggregate Method just confirms the obvious:
Total cost = Σ (cost of pass i) = Σ (n-i) for i=0 to n-2 = n(n-1)/2 = O(n²)
Amortized per pass = O(n) — same as actual.
Amortized per element = O(n) — same as worst.
The Potential Method is similarly uninformative because there's no slack to redistribute. By contrast, dynamic-array push has O(1) amortized cost despite O(n) worst-case resize — that's where amortization shines.
Lesson: Amortized analysis helps when expensive operations are rare. Bubble Sort's expensive operations are constant — every pass is O(n).
Average-Case Analysis¶
Setup: Assume input is a uniformly random permutation of n distinct elements.
Expected Number of Inversions¶
Let X_{ij} (for i < j) be the indicator that (i, j) is an inversion. Then:
E[inv(A)] = Σ_{i<j} E[X_{ij}] = Σ_{i<j} Pr[A[i] > A[j]]
= Σ_{i<j} 1/2 (uniform: each pair equally likely either order)
= (1/2) · C(n, 2)
= n(n-1)/4
So expected number of swaps in Bubble Sort on random input = n(n-1)/4.
Expected Number of Comparisons¶
Without early exit: always n(n-1)/2 comparisons.
With early exit: depends on when the array becomes "passes-only-swap-free." For random input, asymptotically still Θ(n²) — early exit usually saves only a constant fraction.
Expected Number of Passes¶
The number of passes equals 1 + (length of the longest "left-to-right migration" needed). For a random permutation, the smallest element's expected position is n/2, so it needs ~n/2 passes to reach position 0 — confirming Θ(n) passes on average.
Sorting Networks: 0–1 Principle¶
The 0–1 Principle (Knuth)¶
A sorting network sorts every input correctly if and only if it sorts every binary (0–1) input correctly.
Why this matters: Verifying sorting correctness on n! inputs is intractable. The 0–1 principle reduces verification to 2^n binary inputs — still exponential, but tractable for small n.
Application to Odd-Even Transposition Sort¶
For odd-even transposition sort with n inputs and n phases (depth O(n) parallel time):
- Verify on all
2^nbinary inputs. - By the 0–1 principle, correctness follows for arbitrary inputs.
This is how parallel sorting algorithm correctness is rigorously verified.
Bubble Sort as a Sorting Network¶
Bubble Sort is not a sorting network in the strict sense because the early-exit optimization makes it data-dependent. The naive (no-early-exit) version IS a sorting network: it's a sequence of fixed comparators.
Network depth: n(n-1)/2 (sequential). Network size (total comparators): n(n-1)/2.
By contrast: - AKS sorting network: O(log n) depth, O(n log n) comparators — but constant factor is huge (impractical). - Batcher's bitonic sort: O(log² n) depth, O(n log² n) comparators — practical for GPU/FPGA. - Odd-even merge sort: similar to bitonic — O(log² n) depth.
Cache-Oblivious Analysis¶
Parameters: N = problem size, M = cache size, B = block (cache line) size
Bubble Sort cache behavior:
- One pass scans N consecutive elements: N/B cache misses.
- Total passes: N (worst case).
- Total cache misses: O(N²/B).
Compare to optimal cache-oblivious sort (e.g., Funnelsort, Lazy Funnelsort):
- Cache misses: O((N/B) · log_{M/B}(N/B))
- For typical N=10⁶, B=64, M=10⁵: ~16k misses for optimal vs. 10¹⁰/64 ≈ 1.5·10⁸ for bubble.
Ratio: Bubble Sort is ~10⁴× worse in cache misses than cache-optimal.
External sorting (data > RAM):
- Bubble Sort: O(N²/B) disk seeks — utterly impractical for any N > RAM.
- Merge Sort with B-way merge: O((N/B) · log_{M/B}(N/B)) — the foundation of database sort.
Implication: For external/disk-based sorting, even the constant factors of Bubble Sort are catastrophic. Database systems and big-data engines never use Bubble Sort or its variants.
Parallel Complexity¶
Sequential Bubble Sort¶
Sequential time: Θ(n²).
Parallel Bubble Sort (Odd-Even Transposition)¶
With p ≤ n processors:
- With
p = 1: T = Θ(n²) — back to sequential. - With
p = n: T = Θ(n) — linear time. - With
p = n²: still Θ(n) — depth is the bottleneck, not work.
PRAM model: Odd-even transposition is in NC₁ if we relax to O(n) parallel depth — though strictly NC requires polylog depth.
True NC sorts: - AKS network: O(log n) depth — theoretically optimal but huge constant. - Cole's parallel merge sort: O(log n) — practical foundation of NC sorting.
Comparison with Alternatives (Theoretical)¶
| Algorithm | Worst Time | Avg Time | Worst Space | Cache Misses | Stable | Parallel Depth |
|---|---|---|---|---|---|---|
| Bubble Sort | Θ(n²) | Θ(n²) | Θ(1) | O(n²/B) | Yes | Θ(n²) seq, Θ(n) parallel |
| Insertion Sort | Θ(n²) | Θ(n²) | Θ(1) | O(n²/B) | Yes | Θ(n²) seq |
| Selection Sort | Θ(n²) | Θ(n²) | Θ(1) | O(n²/B) | No | Θ(n²) seq |
| Merge Sort | Θ(n log n) | Θ(n log n) | Θ(n) | O((n/B)log(n/M)) | Yes | Θ(log² n) parallel |
| Quick Sort (avg) | Θ(n²) | Θ(n log n) | Θ(log n) | O((n/B)log(n/M)) | No | Θ(log² n) parallel (avg) |
| Heap Sort | Θ(n log n) | Θ(n log n) | Θ(1) | O(n log n) | No | Θ(log² n) parallel |
| Bitonic Sort | Θ(n log² n) | Θ(n log² n) | Θ(1) network | O((n/B)log²n) | Yes | Θ(log² n) |
| AKS Network | Θ(log n) | Θ(log n) | Θ(1) network | impractical | Yes | Θ(log n) |
Summary¶
At professional level, Bubble Sort is a complete, well-understood case study in algorithm analysis:
- Correctness is proven by nested loop invariants — the cleanest example of structural induction in sorting.
- Optimality within the class of adjacent-swap algorithms is proven via inversion counting (it does the minimum possible swaps for that class).
- Suboptimality in the comparison-sort hierarchy is established by the Ω(n log n) decision-tree lower bound.
- Cache behavior is O(n²/B) — bad in absolute terms but predictable.
- Parallelism redeems it: odd-even transposition sort achieves Θ(n) time on n processors and forms the basis of sorting networks that run on GPUs, FPGAs, and homomorphic-encryption pipelines.
- Sorting Networks (built from bubble-style comparators) are verified via the 0–1 principle and underpin oblivious / hardware sorting.
The algorithm's pedagogical value is permanent; its production utility is essentially zero outside the sorting-network niche.
Key takeaway: Bubble Sort is what you teach to demonstrate that O(n²) sorts exist and to motivate the search for O(n log n) algorithms. Once Insertion Sort, Merge Sort, and Quick Sort exist, Bubble Sort retires to textbooks and sorting networks.
In this topic