Sliding Window — Mathematical Foundations and Complexity Theory¶
Table of Contents¶
- Formal Definition
- Correctness via Loop Invariants
- Amortized O(n) Argument
- Lower Bounds
- Monotonicity Requirement
- Streaming Model
- Parallel Sliding Window
- Comparison Summary
- Summary
Formal Definition¶
Definition. Let A = (a_0, a_1, ..., a_{n-1}) be a sequence over a domain D.
A sliding-window algorithm SW = (S, phi, init, push, pop, query) where:
S : state space
phi : D* -> Bool predicate (the invariant)
init : S initial state
push : S x D -> S add an element to the right
pop : S x D -> S remove an element from the left
query: S -> R answer function
For each step r in [0, n-1] SW maintains a window [l_r, r] satisfying
phi(a_{l_r}, ..., a_r) = true, where l_r is monotonically non-decreasing in r,
and emits answer y_r = query(state at step r).
A sliding window is FIXED-SIZE if l_r = max(0, r - k + 1) (forced).
A sliding window is VARIABLE-SIZE if l_r is chosen to be the smallest index
such that phi holds (shortest-shrink discipline).
The monotone movement of l_r is the structural property that buys the linear time bound; if l_r were allowed to move backwards the algorithm would no longer be a sliding window in any meaningful sense.
Correctness via Loop Invariants¶
We prove correctness of the canonical variable-size sliding window for the problem "longest contiguous segment of a satisfying monotone predicate phi".
ALGORITHM. LongestSlidingWindow(a, phi, push, pop, init)
state <- init
l <- 0
best <- 0
for r = 0 .. n - 1:
state <- push(state, a[r])
while not phi(state):
state <- pop(state, a[l])
l <- l + 1
best <- max(best, r - l + 1)
return best
Loop invariant I. At the top of every iteration with index r:
(i) state == fold(init, push, a[l..r-1]) # state exactly summarises window [l..r-1]
(ii) phi(state) == true # window [l..r-1] is valid
(iii) l is the smallest non-negative index for which (ii) holds
(iv) best == max over (r', l') of (r' - l' + 1) where window [l'..r'-1] is valid
Base case (r = 0). Before the first iteration, state = init, l = 0, best = 0. The window [0..-1] is empty; (i)-(iii) trivially hold because phi is assumed to hold on the empty window (standard assumption for monotone invariants); (iv) holds because no non-empty windows have been examined.
Inductive step. Assume I holds at the top of iteration r. After the line state <- push(state, a[r]):
- The state now equals
fold(init, push, a[l..r]), so (i) holds withr-1replaced byr. phi(state)may now be false (we just expanded).
The while loop pops a[l] while phi(state) is false. Each pop preserves (i) for the smaller window [l+1..r] and increments l. The loop terminates because either phi becomes true at some l <= r (possibly producing an empty window with l = r + 1 if even [r..r] violates phi) or l exceeds r, at which point the empty-window assumption restores phi.
After the while exits, phi(state) is true and l is the smallest index for which phi holds with right endpoint r (because the loop stopped at the first index that restored phi). Thus (ii) and (iii) hold.
The next line updates best <- max(best, r - l + 1), which extends (iv) by exactly the maximum-length valid window ending at r. Together with the previous best, (iv) holds.
Termination. r strictly increases by 1 each iteration of the outer loop; the loop runs exactly n times. The inner while strictly increases l, and l <= r + 1 is bounded.
Postcondition. At loop exit, r = n and (iv) holds over all r' in [0, n-1] and all windows. Therefore best equals the longest valid window length. QED.
The same skeleton applies to fixed-size windows; replace step r's push/pop pair with a forced pair that moves both pointers in lockstep.
Amortized O(n) Argument¶
We give two proofs that the variable-size template runs in O(n) time, both using the standard amortized-analysis toolkit.
Aggregate method¶
Let T(n) = total time over all n iterations.
For each iteration r:
- exactly one push (constant time, by assumption)
- some number k_r >= 0 of pops (each constant time)
- one max comparison (constant)
T(n) = sum over r of (1 + k_r + 1) = 2n + sum k_r
Sum k_r is the total number of pops over the run. Since l can only INCREASE
and l <= n (it never exceeds the rightmost index), the total number of pops
is at most n.
Therefore T(n) <= 2n + n = 3n = O(n).
Potential method¶
Define potential Phi(s) = (number of elements currently in the window)
= (r + 1 - l) at state s after iteration r.
Phi(s_0) = 0 (empty window before iteration 0).
Phi(s) >= 0 always.
For iteration r:
actual cost c_r = 1 (push) + k_r (pops) + 1 (compare)
delta Phi = +1 (push) - k_r (each pop reduces window length by 1)
Amortized cost a_r = c_r + delta Phi = (1 + k_r + 1) + (1 - k_r) = 3.
Total amortized = 3n; total actual = total amortized + Phi(s_0) - Phi(s_n)
<= 3n + 0 - 0 = 3n.
So T(n) = O(n).
Both proofs depend on two structural facts of sliding window: 1. The pointer l is monotonically non-decreasing. 2. push and pop cost O(1) (given a suitable state — hash map ops are amortized O(1); monotonic deque ops are amortized O(1) for the same reason).
If either fails — say, the state has O(log n) per-op cost (a balanced BST), or you allow l to back up — the bound degrades to O(n log n) or worse.
Lower Bounds¶
Theorem (read-once lower bound).
Any algorithm that produces query(window) for every right endpoint
r in [0, n-1] must inspect each input element at least once.
Therefore any sliding-window algorithm is Omega(n).
Proof. Suppose an algorithm A produces the correct output sequence
y_0, ..., y_{n-1} without reading a[i] for some i. Construct two
inputs that differ only in a[i] but whose correct outputs disagree
at some r >= i (always possible for nontrivial query). A produces the
same output on both inputs (since it does not read a[i]). Contradiction.
Therefore A reads every a[i]. Time is Omega(n). QED.
The matching upper bound O(n) from the amortized argument shows sliding-window is asymptotically tight: Theta(n).
Theorem (window-max lower bound).
Computing max of every length-k window of a length-n array, in the
COMPARISON model, requires Omega(n) comparisons even with k fixed.
Sketch. Each output entry is determined by inspecting at most k inputs;
an adversary can force every right shift to require at least one new
comparison. Tighter bounds (Omega(n log k)) hold without amortization;
the monotonic deque achieves O(n) amortized which is optimal.
This is what makes the monotonic deque trick from 14-monotonic-queue optimal — it matches the read-once lower bound.
Monotonicity Requirement¶
The variable-size template depends on a structural property called monotonicity of the invariant under expansion:
Definition. A predicate phi over windows is monotone (closed under shrink) if:
for all l <= l' <= r, phi(a[l..r]) = true implies phi(a[l'..r]) = true.
Equivalently: removing elements from the LEFT can only preserve or
restore validity, never break a valid window.
When phi is monotone the shrink-while loop is sound: once shrinking restores phi, every subsequent shorter window is also valid (so we have found the smallest valid l). If phi is not monotone, shrinking might overshoot a valid window and miss the answer.
Examples and counter-examples¶
| Predicate | Monotone? | Reason |
|---|---|---|
sum(window) <= K with a[i] >= 0 | Yes | Removing non-negatives can only decrease sum. |
sum(window) <= K with mixed signs | No | Removing a negative INCREASES sum, can break validity. |
count of distinct chars <= k | Yes | Removing chars can only reduce distinct count. |
count of distinct chars == k (exactly) | No | Removing chars can drop below k. |
| "all 26 letters present" | Yes | Adding letters can only restore; equivalently shrinking can only break. |
max(window) - min(window) <= K | Yes | Removing elements can only reduce the gap. |
window contains every char of target T | Yes | Removing one occurrence can break the membership requirement. |
For non-monotone invariants, sliding window is the wrong tool. The standard fallback is prefix-sum + hash map (for "exact sum"), or sorted-multiset with O(log n) operations (for == constraints), or a different algorithmic approach entirely (DP, segment tree, two-pointers with bookkeeping).
Streaming Model¶
The sliding window is the prototypical algorithm for the streaming model of computation:
Streaming model.
Input: x_1, x_2, x_3, ... arrives one element at a time.
Memory budget: poly(log n) or O(W) for window of size W.
Output: after seeing any prefix, answer queries about the recent W elements.
Classical theory results for sliding window in the streaming model:
| Problem | Best known memory | Reference |
|---|---|---|
| Count of 1s in last W bits | O(log^2 W / eps) for (1+eps)-approx | Datar-Gionis-Indyk-Motwani 2002 (exponential histograms) |
| Sum of last W reals | O(log W / eps) for (1+eps)-approx | DGIM extension |
| Distinct elements in last W | O(log^2 W * (1/eps) log(1/eps)) | Various — see Gibbons-Tirthapura |
| Heavy hitters in last W | O((1/eps) log W) | Misra-Gries variant |
| Frequency moments F_k | Sub-linear sketches | Charikar-Chen-Farach-Colton |
The takeaway: exact sliding-window aggregates over an infinite stream of count window W require Theta(W) memory. Approximate aggregates within (1 +/- eps) can use polylog memory. Production systems exploit this aggressively — see senior.md on HyperLogLog (distinct), Count-Min Sketch (heavy hitters), t-digest (quantiles), all of which are mergeable across shards.
Cross-link to 22-randomized-structures for sketch families and to 21-advanced-structures for exponential histograms.
Parallel Sliding Window¶
Sliding window resists parallelization because the answer at position r formally depends on the entire prefix up to r. Three production parallelization strategies:
Strategy 1 — Chunk with boundary fixup¶
Split A into chunks C_1, C_2, ..., C_p of size n/p each.
For each chunk in parallel, run sliding window LOCALLY.
For each boundary between C_i and C_{i+1}, recompute the windows that
straddle the boundary (at most W of them).
Cost: O(n/p) per worker + O(W p) merge -> O(n/p + W p).
Speedup is optimal at p = sqrt(n/W).
This works for monotone, decomposable aggregates (sum, count, max). It does not work for inherently sequential aggregates (running median where the boundary depends on the entire prefix's distribution).
Strategy 2 — Mergeable aggregate sketches¶
Build per-chunk sketches that merge in O(polylog) time:
- t-digest for quantiles.
- HyperLogLog for distinct count.
- Count-Min Sketch for heavy hitters.
- Reservoir samples for random samples.
Each chunk produces a sketch in O(n/p); merging p sketches takes O(p polylog); querying takes O(polylog). Total O(n/p + p polylog), optimal for p = sqrt(n / polylog).
Strategy 3 — SIMD / vectorized per-step¶
For fixed-size sum windows on numeric arrays, SIMD instructions process 4-16 elements per cycle. The sequential dependence is the prefix sum of differences; modern CPUs have vpaddq plus shuffle that compute prefix sums in parallel within a 256-bit register.
In practice the production wins come from Strategy 1 + 2 combined; SIMD wins on hot inner loops in databases and analytical engines (DuckDB, ClickHouse).
Comparison Summary¶
| Aspect | Sliding window | Prefix sum | Segment tree |
|---|---|---|---|
| Time per step | O(1) amortized | O(1) after O(n) precompute | O(log n) |
| Mutations | Streaming | Static | O(log n) |
| Range query shape | Contiguous, monotone left pointer | Arbitrary range | Arbitrary range |
| Distinct elements | Yes (with set state) | No (not composable) | Mergeable variants exist |
| Negatives in sum | No | Yes | Yes |
| Parallel friendliness | Hard (sequential pointer) | Easy (prefix scan is parallel) | Easy (tree merge) |
| Cache locality | Excellent (linear scan) | Excellent | Good (tree levels) |
| Worst-case memory | O(W) | O(n) | O(n) |
Summary¶
The sliding window achieves Theta(n) time exactly because its left pointer is monotonically non-decreasing; this single structural property powers both the loop-invariant correctness proof and the amortized aggregate/potential analysis. The matching Omega(n) lower bound comes from the read-once requirement on any algorithm that emits a per-position answer. The variable-size template demands a monotone invariant — without monotonicity, shrink can overshoot and the algorithm is incorrect, which is why mixed-sign sum constraints, exact-count equality, and order statistics all need prefix sums, hash maps, or balanced trees instead. Streaming and parallel settings extend the picture: exact aggregates over infinite streams need Omega(W) memory; approximate aggregates compress to polylog via mergeable sketches; parallelization is bounded by the sequential prefix dependence but admits chunk-with-boundary-fixup and mergeable-sketch strategies that hit O(n/p + sqrt(n)) work.