Linear Search — Professional Level¶
Table of Contents¶
- Formal Definition
- Correctness Proof
- Expected Comparisons — Probabilistic Analysis
- Lower Bound for Unordered Search
- Cache-Aware Complexity (External Memory Model)
- Parallel Complexity (PRAM)
- Information-Theoretic Bound
- Comparison to Other Search Algorithms
- Summary
Formal Definition¶
Input: - An array A of n elements (1-indexed for the proof; 0-indexed for code). - A target value t. - A comparison predicate eq: U × U → {0, 1} where U is the universe of values.
Output: - An index i ∈ {1, …, n} such that eq(A[i], t) = 1, and for all j < i, eq(A[j], t) = 0. - The sentinel value 0 (or -1 in 0-indexed conventions) if no such index exists.
Algorithm:
Complexity classes: - Time: Θ(n) worst case, O(n) average and best. - Space: Θ(1) auxiliary (one loop counter). - Comparisons: ≤ n in the worst case; exactly n if t ∉ A.
Correctness Proof¶
We prove correctness via loop invariant:
Invariant (P): Before the i-th iteration, for all j ∈ {1, …, i-1}, A[j] ≠ t.
Initialization (i = 1): The set {1, …, 0} is empty, so the invariant holds vacuously. ✓
Maintenance: Suppose P holds before the i-th iteration. The loop body checks A[i] == t. - If true, the algorithm returns i. By the invariant, all positions before i are non-matches; position i is a match. So the returned index satisfies the postcondition: it is the first match. - If false, A[i] ≠ t. Combined with P, we now have A[j] ≠ t for all j ∈ {1, …, i}. The invariant holds at the start of iteration i+1. ✓
Termination: The loop runs at most n iterations (i goes from 1 to n). Each iteration either returns or increments i. After iteration n, the loop exits and the algorithm returns -1. By the invariant, A[j] ≠ t for all j ∈ {1, …, n}, so t ∉ A and -1 is correct. ✓
QED. The algorithm returns the first index where A[i] = t, or -1 if no such index exists.
Expected Comparisons — Probabilistic Analysis¶
Case A: Target Always Present, Uniformly Distributed Position¶
Assume the target t is in A at a position chosen uniformly at random from {1, …, n}.
Let X = number of comparisons performed. If t is at position k, X = k.
Expected value:
So E[X] ≈ n/2 + 0.5 ≈ n/2 for large n.
Variance:
Var[X] = E[X²] - (E[X])²
= (1/n) · Σ k² - ((n+1)/2)²
= (1/n) · n(n+1)(2n+1)/6 - (n+1)²/4
= (n+1)(2n+1)/6 - (n+1)²/4
= (n+1)[(2n+1)/6 - (n+1)/4]
= (n+1)[(4n+2 - 3n - 3)/12]
= (n+1)(n-1)/12
= (n² - 1)/12
Standard deviation σ ≈ n/√12 ≈ 0.289n.
So expected comparisons is n/2, with standard deviation ~0.29n. The distribution is uniform on {1, …, n}, not concentrated.
Case B: Target Possibly Absent¶
Let p = probability target is present in A; assume position uniform when present.
E[X] = p · (n+1)/2 + (1-p) · n
For p = 1: E[X] = (n+1)/2 ≈ n/2. For p = 0: E[X] = n (always full scan). For p = 0.5: E[X] = (n+1)/4 + n/2 = (3n+1)/4 ≈ 3n/4.
This is why "absent target" is the practical worst case — and why production systems often pre-check with a Bloom filter or hash set.
Case C: Skewed Distribution (Self-Organizing)¶
If element at position k is queried with probability p_k, and the array is fixed in some order:
This is minimized when high-frequency elements are at low indices. The optimal arrangement is decreasing-frequency order. This is the foundational result behind move-to-front and transpose heuristics:
- For Zipfian distribution (p_k ∝ 1/k), MTF achieves E[X] = O(log n) instead of Θ(n).
- For uniform distribution, MTF gives no improvement (E[X] = n/2 either way).
Lower Bound for Unordered Search¶
Theorem (Lower Bound): Any comparison-based algorithm that searches an unordered array of n elements for a target t requires Ω(n) comparisons in the worst case.
Proof (adversary argument):
Suppose algorithm A makes only k < n comparisons. We construct an adversary-controlled array where:
- The adversary delays committing to A's contents.
- For each of A's queries "is A[i] == t?", the adversary answers no (claims A[i] ≠ t).
- After k queries, at least one position j ∈ {1, …, n} has not been compared. The adversary places t at position j.
- But A has not inspected position j and must return -1 (or guess) — wrong answer.
Therefore, any correct algorithm must make at least n comparisons in the worst case. ∎
Implication: Linear search is asymptotically optimal for searching unordered arrays. You cannot do better than O(n) without preprocessing (sorting, hashing) or structural assumptions (sortedness, sparsity).
This is a profoundly important result: it tells you that for raw unordered data, the simple algorithm is the best algorithm.
What Doesn't Apply¶
The Ω(n) lower bound is for comparison-based algorithms. It can be beaten by: - Hashing — uses arithmetic, not just comparisons. Achieves O(1) average. - Bit-level tricks — for small-domain values (e.g., bitmaps). Achieves O(n/word_size). - Quantum algorithms — Grover's algorithm achieves O(√n). - Algebraic structure — if values satisfy XOR / arithmetic identities, e.g. "find missing number in 1..n."
Cache-Aware Complexity (External Memory Model)¶
The Model¶
Aggarwal & Vitter's external memory model (1988) parameterizes: - M = size of fast memory (cache or RAM). - B = block (cache line / page) size. - N = total dataset size.
Cost is measured in block transfers between fast and slow memory.
Linear Search in EM Model¶
A linear scan reads each block exactly once. Block reads:
For n = 10⁶ ints (4 MB), B = 16 ints (64-byte cache line), this is 62,500 cache-line reads. At ~10 ns per cache miss, that's 625 µs — but the hardware prefetcher and sequential access pattern reduce this to near-DRAM bandwidth (~10-20 GB/s), giving actual throughput closer to 200-400 µs.
Compare to Binary Search¶
Binary search jumps between widely separated locations. For n > M, each step likely costs a cache miss:
T_io(binary_search) = O(log_2(n / B)) cache misses near the top of the tree,
then settling into block-local access for the last log_2(B) levels.
≈ log_2(n) cache misses worst case
For n = 10⁶, that's ~20 cache misses per query, vs ~62500 for linear search. Binary search is ~3000× more cache-efficient for large n.
But for small n (n < M / B), the linear scan fits in cache after one pass, and subsequent searches hit cache. In hot-cache regimes, linear search's perfect access pattern wins by constant factors.
Cache-Oblivious Algorithms¶
Linear search is naturally cache-oblivious: it works optimally for any cache hierarchy without parameter tuning. This makes it a great primitive for cache-oblivious data structures (e.g., B-tree leaves, van Emde Boas layouts).
Parallel Complexity (PRAM)¶
The PRAM Model¶
P processors, shared memory, unit-cost memory access. We measure parallel time T_p(n) and work W_p(n) = p · T_p(n).
Parallel Linear Search¶
Algorithm: Partition A into p chunks of size n/p. Each processor searches its chunk. First match wins; communicate via a shared "found" flag.
Parallel time:
T_p(n) = O(n / p) — each processor scans its chunk
+ O(log p) — reduction to find the global minimum index
= O(n / p + log p)
For p = O(n / log n), this is O(log n). For p = n (one processor per element), parallel time is O(1) for the search itself, but O(log n) for the reduction.
Work:
For p ≤ n / log n, work is O(n) — work-optimal (no asymptotic overhead vs sequential).
Speedup¶
Theoretical speedup S(p) = T_1(n) / T_p(n) = n / (n/p + log p) ≈ p for p ≪ n.
Practical speedup is limited by: - Memory bandwidth — once all cores saturate the memory bus, adding cores doesn't help. - Coordination overhead — atomic ops on the "found" flag, branch mispredictions. - Load imbalance — if the target is in chunk 0, that worker finishes before others.
Empirical: 4-8× speedup is common; 16-64× speedup requires NUMA-aware partitioning and careful contention management.
Work-Span Tradeoff (Cilk Model)¶
Span (critical path length) for parallel linear search is O(log n) — the depth of the reduction tree. So parallelism is:
This bounds the maximum useful parallelism: for n = 10⁶, you can productively use up to ~50,000 processors before the log-n reduction becomes the bottleneck.
Distributed Linear Search¶
In a distributed setting (data partitioned across machines), the cost model adds network latency:
where α is per-message latency (~µs in a datacenter, ~ms across continents) and β is per-byte cost. For globally distributed search, the constant α dominates and a 100-machine search may have ~ms-level overhead just for coordination.
Information-Theoretic Bound¶
Decision Tree Lower Bound¶
Any deterministic comparison-based search algorithm corresponds to a decision tree: each internal node is a comparison, each leaf is an answer (an index or "not found").
For a search problem with n+1 possible outputs (positions 1..n, or "not found"), the decision tree has at least n+1 leaves. The minimum height of a binary tree with n+1 leaves is ⌈log₂(n+1)⌉.
So binary search is information-theoretically optimal for sorted arrays: O(log n) comparisons.
For unsorted arrays, the decision tree must distinguish not just "where is t?" but also "is t at position k vs position k+1 vs ..." with no a priori ordering. The adversary argument above shows this requires Ω(n) — strictly worse than the log n bound for sorted data.
Probabilistic Algorithms¶
Randomization doesn't help for unordered search: even with random comparison order, the expected number of comparisons to find a target uniformly distributed in n positions is (n+1)/2, and the worst case remains Θ(n).
Quantum Lower Bound¶
Grover's algorithm achieves O(√n) using quantum superposition (probing all positions simultaneously, then amplifying matching states). Bennett, Bernstein, Brassard & Vazirani (1997) proved this is tight: no quantum algorithm can search an unstructured space in fewer than Ω(√n) queries. Even quantum can't get to O(log n) for unsorted data.
Comparison to Other Search Algorithms¶
| Algorithm | Time | Space | Preconditions | Notes |
|---|---|---|---|---|
| Linear search | Θ(n) | O(1) | None | Cache-friendly, branch-predictable |
| Binary search | Θ(log n) | O(1) | Sorted | Cache-unfriendly for large n |
| Jump search | Θ(√n) | O(1) | Sorted | Fewer cache misses than binary for huge arrays |
| Interpolation search | O(log log n) avg / O(n) worst | O(1) | Sorted, uniform distribution | Beats binary on uniform data |
| Hash table | O(1) avg / O(n) worst | O(n) | Hash function, build cost | Best for repeated lookups |
| BST search | O(log n) avg, O(n) worst | O(n) | Built tree | Self-balancing variants (AVL, RB) for guarantees |
| Skip list search | O(log n) avg | O(n) | Probabilistic structure | Lock-free variants exist |
| Trie search | O(k) (k = key length) | O(N · k) | Built trie | Best for string keys with shared prefixes |
| Bloom filter check | O(k) | O(m bits) | Built filter | False positives, no false negatives |
| Grover (quantum) | O(√n) | O(log n) qubits | Quantum hardware | Theoretical |
Key Takeaways¶
-
No comparison-based algorithm beats Θ(n) without preprocessing. Sorting (Θ(n log n)) or hashing (Θ(n)) is the fundamental cost of getting sub-linear search.
-
Linear search has the best cache behavior of any search algorithm. For data already in or near cache, it's hard to beat.
-
Information theory and parallelism set hard limits. Linear search is Θ(n) sequential, Θ(n/p + log p) parallel, and hits a √n quantum floor. These are not engineering choices — they are mathematical bounds.
Summary¶
Linear search is the asymptotic baseline for unstructured search:
- Worst-case Θ(n) — provably optimal for unordered data (adversary argument).
- Average Θ(n/2) if target uniformly placed; Θ(3n/4) with 50% miss rate.
- External memory O(n/B) — perfect sequential access pattern.
- Parallel O(n/p + log p) — work-optimal for p ≤ n / log n.
- Information-theoretically tight for unordered data; can only be beaten by structural preprocessing (sort → log n; hash → 1) or quantum (√n).
Its theoretical simplicity belies its practical importance: it is the algorithm against which every other search is measured, and the algorithm that wins by default whenever preprocessing is impossible or wasteful.
In this topic