Two Pointers — Mathematical Foundations and Complexity Theory¶
Table of Contents¶
- Formal Setup
- Correctness Proof — Two-Sum on Sorted Array
- Correctness Proof — In-Place Partition
- Amortized Argument for Total O(n) Work
- Lower Bounds — When Two Pointers Is Optimal
- Cache-Friendliness vs Binary Search
- Parallelization Difficulty
- Summary
Formal Setup¶
Let A be an array of length n over a totally ordered domain D. A two-pointer algorithm maintains a state (l, r) in N x N and updates it via a transition function delta : (D x D) -> {move_left, move_right, terminate}. The configuration evolves as:
where step increments l, decrements r, or both, based on delta(A[l_k], A[r_k]) and the problem-specific predicate P.
For the converging variant the invariant is 0 <= l <= r <= n - 1 and the termination predicate is l >= r. For the slow/fast variant the invariant is 0 <= slow <= fast <= n and termination is fast = n. For the merge variant we have two sequences A, B and indices (i, j); the invariant is 0 <= i <= |A|, 0 <= j <= |B|; termination is i = |A| or j = |B|.
Every correctness proof in this file follows the loop-invariant discipline:
- Initialization. The invariant holds before the loop starts.
- Maintenance. If the invariant holds at the start of iteration
k, it still holds at the end. - Termination. When the loop ends, the invariant plus the negation of the loop condition imply the postcondition.
Termination is also a monovariant argument: some non-negative integer strictly decreases each iteration and is bounded below by zero.
Correctness Proof — Two-Sum on Sorted Array¶
Algorithm.
Input: A sorted in non-decreasing order, length n; target t.
Output: (i, j) with 0 <= i < j <= n - 1 and A[i] + A[j] = t, or NONE.
l <- 0; r <- n - 1
while l < r:
s <- A[l] + A[r]
if s = t: return (l, r)
if s < t: l <- l + 1
else: r <- r - 1
return NONE
Loop invariant I:
If a valid pair
(i*, j*)with0 <= i* < j* <= n - 1andA[i*] + A[j*] = texists, thenl <= i* and j* <= r.
Initialization. Before the first iteration, l = 0 and r = n - 1, so any valid (i*, j*) satisfies 0 <= i* and j* <= n - 1 trivially. I holds.
Maintenance. Assume I holds. Let s = A[l] + A[r]. Three cases:
- Case
s = t. Return(l, r).Iis not needed beyond this point; we have a valid pair. - Case
s < t. Claim: no valid pair(i*, j*)hasi* = l. Suppose for contradictioni* = l. Thenj* <= r, soA[j*] <= A[r](sorted). ThereforeA[i*] + A[j*] = A[l] + A[j*] <= A[l] + A[r] = s < t, soA[i*] + A[j*] != t, contradicting validity. Thusi* > l, which givesi* >= l + 1. Afterl <- l + 1, the invariantl <= i*still holds. - Case
s > t. By symmetric argument no valid pair hasj* = r, soj* <= r - 1. Afterr <- r - 1,j* <= rstill holds.
Termination. The quantity phi = r - l is a non-negative integer at the start of every iteration (because l < r is the loop guard, phi >= 1). On every non-returning iteration, exactly one of l or r moves toward the other, so phi strictly decreases by 1. After at most n - 1 iterations, phi = 0 and the loop exits.
Postcondition. Two ways to exit:
- Via
return (l, r)inside the loop: by constructionA[l] + A[r] = tandl < r. Output is valid. - Via
return NONEafter the loop: at that pointl >= r. The invariant says: if a valid pair existed, thenl <= i* < j* <= r. Butl >= rmakes this empty (noi*, j*withl <= i* < j* <= r). So no valid pair exists.
QED. The algorithm is correct in both branches.
Correctness Proof — In-Place Partition¶
We prove correctness of the slow/fast partition: given a predicate P, rearrange A so that A[0..k-1] contains exactly the elements satisfying P, and return k. Element order inside the satisfying prefix is preserved (stable partition); the suffix order is unspecified.
Algorithm (write-only variant).
Note this assumes "destroying" the discarded elements is acceptable (e.g., removeDuplicates semantics). For a true swap-based partition the proof is similar; we focus on the write variant for clarity.
Loop invariant I:
At the start of iteration
fast = k, the multiset{A[0], ..., A[slow - 1]}equals the multiset of elements in the originalA[0..k-1]that satisfyP, and in the same relative order.
Initialization. Before iteration k = 0, slow = 0, so the prefix is empty. The set of satisfying elements in A[0..-1] is also empty. I holds.
Maintenance. Assume I holds. Two cases:
P(A[fast])false.slowdoes not change; the prefix is unchanged. NowImust hold for the prefixA[0..fast](one more element). The new elementA[fast]does not satisfyP, so the set of satisfying elements inA[0..fast]is unchanged.Istill holds.P(A[fast])true. ExecuteA[slow] <- A[fast]; slow <- slow + 1. The new prefixA[0..slow - 1]is the old prefix withA[fast]appended. The new "should-be" set is the old should-be set withA[fast]appended (in order). Match.Istill holds.
Termination. fast strictly increases from 0 to n - 1 and is bounded; the loop runs exactly n times.
Postcondition. After the loop, fast = n. By I, A[0..slow - 1] equals the satisfying subsequence of the original A[0..n - 1]. Therefore the algorithm returns the correct partition size, and the prefix contains the correct elements in the correct order. QED.
Amortized Argument for Total O(n) Work¶
The runtime claim "two pointers is O(n)" follows from an aggregate amortized argument, not a per-iteration argument. Per iteration, the work is O(1), but the number of iterations must be bounded.
Aggregate analysis (converging). Define the potential phi(l, r) = (r - l). Initially phi = n - 1. Every iteration either decreases phi by exactly 1 (move one pointer) or terminates. Since phi >= 0 and is integer-valued, the total number of iterations is at most n - 1. Per-iteration work is O(1). Total: O(n).
Aggregate analysis (slow/fast). fast increases by exactly 1 per iteration and is bounded by n. Total iterations: n. Work per iteration: O(1). Total: O(n).
Aggregate analysis (merge). Define phi = i + j. Initially phi = 0. Every iteration increases phi by at least 1 (one pointer moves; on a match both move, but still at least 1). phi is bounded by |A| + |B|. Total iterations: at most |A| + |B|. Per-iteration work: O(1). Total: O(|A| + |B|).
Note on amortization vs worst case. This is not really "amortized" in the classical sense (no operation is occasionally expensive while others are cheap). It is a straightforward aggregate argument: the total work across the loop is O(n), even though no single iteration costs more than O(1). The terminology "amortized" is sometimes loosely applied here, but pedantically this is a worst-case bound, not an amortized one. Each iteration is O(1) in the worst case; their sum is O(n) in the worst case.
This distinguishes two pointers from genuinely amortized structures like dynamic arrays, where some operations cost O(n) and the amortization shows that the average per operation is O(1).
Lower Bounds — When Two Pointers Is Optimal¶
Sorted two-sum¶
Claim. Any algorithm that solves two-sum on a sorted array must examine Omega(log n) elements in the worst case.
Proof sketch. Consider an adversary that returns the answer "NONE" on all queries until forced to commit. The algorithm must distinguish between an array where the pair exists and one where it does not. By the comparison-tree lower bound argument, the algorithm makes at least log_2(n) comparisons in the worst case. Binary search achieves this.
However, two pointers does Theta(n) comparisons in the worst case, which is not optimal for the decision problem. So why prefer it?
- Two pointers is deterministic linear with
O(1)extra memory. Binary search needs random access; on a linked list or sequential read, you cannot use it. - For producing the pair (not just deciding existence), two pointers is
O(n)while running binary searchntimes costsO(n log n). - Cache and prefetcher behavior: two pointers reads sequentially, which is dramatically faster in practice (see next section), so even at
Theta(n)comparisons the constants beatTheta(log n)random-access comparisons for moderaten.
In summary, two pointers is not optimal in the comparison-complexity sense but is practically optimal for stream / sequential-access models.
Comparison-based merging¶
Claim. Merging two sorted sequences of total length n requires at least n - 1 comparisons in the worst case.
Proof. Each comparison A[i] vs B[j] resolves the relative order of exactly two elements. The output sequence has n elements; in the worst case (e.g., alternating: A[0] < B[0] < A[1] < B[1] < ...), every adjacent pair in the output is one A element next to one B element, requiring a comparison to decide their order. That gives n - 1 adjacencies, hence n - 1 comparisons.
The merge-style two-pointer algorithm achieves exactly <= n - 1 comparisons. It is therefore comparison-optimal.
Partitioning by a predicate¶
Lower bound: Omega(n) because every element must be examined (the predicate is opaque). Two pointers achieves this with one pass. Optimal in both comparison and access model.
Cache-Friendliness vs Binary Search¶
In the external memory model (B = block size, M = cache size, both measured in elements):
| Algorithm | Cache misses |
|---|---|
| Two pointers (sequential scan) | O(n / B) |
| Binary search | O(log_2(n / B)) per query, O(log n) total |
For a single query, binary search wins asymptotically. But for producing all answers (e.g., all triplets, the merged list, etc.), two pointers' single sequential scan often beats n random-access binary searches for any realistic n, because the constant from sequential access is roughly 1-2 nanoseconds per element vs 50-300 nanoseconds per cache miss for a true random access.
This is why production database query planners often prefer sort-merge join over hash join (with binary lookups) when the relations are already sorted or near-sorted: the sequential I/O profile is unbeatable, even though the hash join may have a lower asymptotic comparison count.
Conclusion. In the RAM model, binary search is asymptotically faster for single-query search. In the external-memory model, two pointers is competitive or better for bulk operations. Real systems pick the algorithm based on the access pattern, not just the comparison count.
Parallelization Difficulty¶
Two pointers is inherently sequential because each step's pointer movement depends on the previous step's comparison result. This creates a data dependency chain of length Theta(n), which limits parallel speedup.
Formally, the work-span model of parallel computing:
- Work
T_1 = Theta(n): total operations. - Span
T_infinity = Theta(n): critical path is the chain of pointer updates. - Parallelism
T_1 / T_infinity = 1.
There is no asymptotic parallelism in the straight algorithm.
Workarounds (partial parallelism).
-
Chunk partition (works for slow/fast). Split the array into
pchunks ofn / p, run partition independently in each, then concatenate the satisfying prefixes. Output order is preserved within chunks but the global ordering requires a final compaction step (which is itself sequential). Net speedup is bounded bypignoring the final merge. -
Pipeline (works for merge). With
ksorted streams, the merger is sequential, but thekproducers can run in parallel. Speedup is in I/O / decoding, not in the merger itself. -
Speculative execution. Pre-compute "what if
lmoves" and "what ifrmoves" in parallel, commit one branch. This wastes half the work and only helps when the comparison is itself expensive (rare). -
Map-reduce style for converging. For "count pairs with property", you can transform into prefix-sum + binary search, which parallelizes. This usually changes the algorithm rather than parallelizing two pointers per se.
The takeaway: two pointers gives you excellent serial constants, not algorithmic parallelism. If you need true parallel speedup, you must restructure the problem (e.g., divide-and-conquer merge sort splits into independent sub-merges).
Summary¶
| Aspect | Result |
|---|---|
| Correctness | Proven by loop-invariant arguments: at every step a discardable region can be named |
| Total work | O(n) by aggregate argument; phi = r - l (or i + j) bounds iterations |
| Comparison lower bound (decision) | Omega(log n) for sorted two-sum; two pointers is NOT comparison-optimal |
| Comparison lower bound (merge) | Omega(n); two pointers IS comparison-optimal |
| Cache complexity | O(n / B) sequential — beats binary search in bulk-output scenarios |
| Parallelism | Span = work = Theta(n); no asymptotic parallel speedup |
Two pointers is a tight, low-overhead pattern with provable correctness, optimal sequential I/O, and lower bounds that match its merge-style variant. Its main theoretical weakness — non-optimal for single-query decision problems and unparallelizable on the critical path — is also the reason it dominates the practical bulk-streaming domain where allocations, cache misses, and random access are the real enemy.