MEX -- Mathematical Foundations¶
Table of Contents¶
- Formal Definition
- The MEX <= n Bound by Pigeonhole
- Correctness Proof for the Bucket Algorithm
- Amortized Analysis -- Incremental Insert-Only MEX
- Persistent Segment Tree Range MEX
- Lower Bound for Offline Range MEX
- Sprague-Grundy Theorem
- Cache-Oblivious Analysis
- Information-Theoretic Notes
- Summary
Formal Definition¶
Let S be a finite subset of the non-negative integers Z_{>=0}. The minimum excludant of S is
The minimum is well-defined because Z_{>=0} is unbounded above, so the candidate set is non-empty (e.g., |S| + 1 always works -- see the pigeonhole bound below).
For a finite multiset M (or equivalently an array a[0..n-1]), define
Multiplicities do not affect MEX -- only set-presence matters. This is why all MEX algorithms internally collapse multisets to their support.
The MEX <= n Bound by Pigeonhole¶
Theorem. For any multiset M with |M| = n, MEX(M) <= n.
Proof. Consider the n + 1 candidate values {0, 1, ..., n}. The support of M has at most n distinct values. By the pigeonhole principle, at least one element of {0, 1, ..., n} is missing from the support. Therefore there exists k in {0, 1, ..., n} with k not in M, so MEX(M) <= k <= n. Q.E.D.
Tightness. The bound is tight: for M = {0, 1, ..., n-1}, MEX(M) = n. So no algorithm can replace the +1 with a smaller constant.
Corollary. Any value v > n in M is irrelevant to the MEX computation and may be ignored. This is the structural fact that powers the O(n)-time, O(n)-space bucket algorithm: only the marks for values in [0, n] matter.
Correctness Proof for the Bucket Algorithm¶
BucketMEX(a[0..n-1]):
let seen[0..n] = all false
for i = 0 to n-1:
if 0 <= a[i] <= n:
seen[a[i]] = true
for k = 0 to n:
if not seen[k]:
return k
return n + 1 // unreachable
Claim. BucketMEX(a) = MEX(a).
Proof.
Loop invariant of the first loop (marking). After processing a[0..i-1], for every k in [0, n], seen[k] = True iff (exists j in [0, i-1] : a[j] == k).
Base case i = 0: vacuously true.
Inductive step: in iteration i, if a[i] in [0, n] we set seen[a[i]] = True, which is the only change. For all other k, seen[k] is preserved, so the invariant carries to i+1.
After the first loop runs to i = n, the invariant gives seen[k] = True iff k in support(a) and k in [0, n].
Second loop (scan). The loop returns the smallest k in [0, n] with seen[k] = False, i.e., the smallest k in [0, n] not in support(a). By the pigeonhole bound, MEX(a) in [0, n], so this k is exactly MEX(a).
If the loop completes without returning, every k in [0, n] is in support(a), so MEX(a) > n. But by the pigeonhole bound MEX(a) <= n -- contradiction. So this path is unreachable for finite a with |a| = n. Q.E.D.
Time. Two loops of n + 1 iterations: O(n).
Space. Bucket of size n + 1: O(n).
Amortized Analysis -- Incremental Insert-Only MEX¶
When the workload is "insert-only, report MEX after each insert," we can achieve amortized O(1) per insert.
IncMEX:
seen[0..N] = all false // N is the upper bound on inserts
mex = 0
Insert(v):
if 0 <= v <= N: seen[v] = true
while seen[mex]:
mex = mex + 1
return mex
Theorem. A sequence of n inserts runs in total time O(n), giving amortized O(1) per insert.
Aggregate method¶
Count work in two buckets:
- Marking: each insert performs one O(1) write to
seen. Total marking work acrossninserts: O(n). - Pointer advance: the inner
whileloop incrementsmex. Each increment is permanent --mexnever decreases under insert-only. The pointer starts at 0 and ends at most atn(by the pigeonhole bound). Total increments across all inserts: at mostn.
Sum: O(n) total, amortized O(1) per insert. Q.E.D.
Potential method (alternative)¶
Define the potential function
Phi is non-negative because mex is monotone non-decreasing under insert-only and bounded above by n.
Amortized cost of an insert that advances mex by delta:
So the amortized cost per insert is exactly 1 -- O(1) amortized. The actual cost can be O(n) for a single insert that finally advances mex through a long pre-marked prefix, but the prepaid potential covers it. Q.E.D.
Worst-case single-insert cost¶
The amortized bound hides a worst-case single insert of O(n). Example: insert 1, 2, 3, ..., n in order, then insert 0. The first n inserts each take O(1) (mex stays at 0). The n+1-th insert advances mex from 0 to n+1, costing O(n). The amortized average is still O(1) -- but for latency-sensitive systems, the burst matters; see senior.md for instrumentation.
Persistent Segment Tree Range MEX¶
Problem. Given a static array a[0..n-1] and q online queries (l, r), compute MEX(a[l..r]) per query.
Structure¶
For each prefix a[0..i], store a persistent segment tree over the value range [0, n]. Each leaf v in [0, n] stores last[v] = the largest index j <= i such that a[j] = v, or -1 if no such index exists. Internal nodes store the min of last[v] over leaves in their subtree.
Persistent updates: when transitioning from prefix i-1 to prefix i, only the path from root to leaf a[i] changes -- O(log n) new nodes. Total memory across all n + 1 versions: O(n log n).
Query¶
To answer MEX(a[l..r]):
- Take the version for prefix
r(which capturesa[0..r]). - Descend the tree, preferring the left subtree when its
min(last[v]) < l-- that subtree contains a value whose last occurrence ina[0..r]is beforel, so that value is missing ina[l..r]. - Otherwise descend right (and the answer's leaf index has an offset added).
- Terminate at a leaf; its index is the MEX.
RangeMEX(version_r, l):
node = root(version_r)
answer = 0
while node is internal:
if min(node.left) < l:
node = node.left
else:
answer += span(node.left)
node = node.right
return answer
Theorem. Each query runs in O(log n) time.
Proof. The descent visits at most one node per level. The tree has O(log n) levels. Each per-node operation is O(1). Q.E.D.
Theorem. The structure uses O(n log n) preprocessing time and space.
Proof. n + 1 persistent versions; each adds O(log n) new nodes; each takes O(log n) work to construct (rebuild the path). Total: O(n log n) time and space. Q.E.D.
Correctness¶
Claim. A value v is missing in a[l..r] iff last_{r}[v] < l, where last_{r}[v] is the last occurrence of v in a[0..r].
Proof. If last_{r}[v] < l, the last occurrence in a[0..r] is at index < l, so no occurrence falls in [l, r]. Conversely, if v in a[l..r], then some occurrence index j in [l, r] exists, so last_{r}[v] >= j >= l. Q.E.D.
The descent's preference for "left subtree whose min < l" therefore finds the leftmost leaf that satisfies "missing in [l, r]," which is the MEX.
Lower Bound for Offline Range MEX¶
Problem. Given a static array a[0..n-1] and q offline queries (l_i, r_i), compute MEX(a[l_i..r_i]) for each.
Theorem (informal). Under natural computational models (cell-probe, comparison), any offline range MEX algorithm requires Omega((n + q) sqrt(n)) time in the worst case for q = Theta(n) queries.
Sketch of the argument¶
The lower bound comes from a reduction from offline range mode and from multi-pointer Mo's-style lower bounds. The intuition:
- Output size. Each query produces one integer, so
qqueries requireOmega(q)output time. - Information per query. A range MEX query is sensitive to the support of
a[l..r]. Adversary inputs exist where two queries(l, r)and(l, r+1)have MEX values that depend on whethera[r+1]equals the current MEX -- a comparison decision. - Cell-probe lower bound. For balanced query distributions, any data structure must probe
Omega(sqrt(n))cells per query in the worst case for the range-mode problem; range MEX inherits this lower bound through a reduction in which the mode element is the MEX of a transformed array.
The matching upper bound is Mo's algorithm combined with a sqrt-decomposed missing-min structure, achieving O((n + q) sqrt(n)) total time. This matches the lower bound up to constants and is the optimal known offline algorithm.
For online range MEX (queries arrive one at a time, each must be answered before the next), the persistent-segment-tree approach achieves O(log n) per query -- strictly better than the offline Mo's bound. This is one of the rare cases where online beats offline; the difference comes from the persistent structure paying O(n log n) preprocessing up front rather than amortizing across queries.
Reduction sketch (range-mode -> range-MEX)¶
Given an array b[0..n-1] for the range-mode problem, build a[0..n-1] by replacing each b[i] with a unique tag if it is the first occurrence of b[i], else leave it. Then MEX of a[l..r] corresponds to information about which values are absent, which transitively encodes mode information. A full formal reduction requires care; the takeaway is that range MEX is at least as hard as range mode in standard models.
Sprague-Grundy Theorem¶
The Sprague-Grundy theorem is the theoretical foundation of impartial-game analysis. MEX is its defining operation.
Theorem (Sprague 1935, Grundy 1939)¶
Let G be an impartial game (both players have the same moves available, the last player to move wins). Every game position p has a non-negative integer G(p), called its Grundy number, satisfying:
A position p is a losing position for the player to move (a P-position) iff G(p) = 0.
Sum of games¶
If H = G_1 + G_2 + ... + G_k is the sum of independent impartial games (a move in H is a move in exactly one component), then
This is the Sprague-Grundy theorem for sums: the XOR of Grundy values determines the value of the sum.
Why MEX specifically?¶
The MEX choice is essential. Consider an alternative G'(p) = 1 + max { G'(p') : ... }. With this alternative:
- The single-game P-position characterization still holds (G' = 0 iff terminal), but
- The XOR-sum formula breaks: there is no clean composition law.
MEX makes the XOR-sum equivalence work because of a clean bijection. For any non-negative integer g and target g XOR x, the player can move from a position of Grundy g to a position of Grundy g XOR x in at least one component as long as the component's Grundy set contains every value below it. The MEX guarantees this: by definition MEX(S) = k means {0, 1, ..., k-1} subset S, so from a position of Grundy k there is a move to every position of Grundy 0, 1, ..., k-1.
Proof sketch (Sprague-Grundy sum)¶
Let H = G_1 + G_2. Define f(H) = G(G_1) XOR G(G_2). We show G(H) = f(H) by induction on game depth.
Base case. Both games terminal: G(G_1) = G(G_2) = 0, f = 0, H is terminal so G(H) = 0. Match.
Inductive step. Assume the claim for all sub-positions of H.
Show: every Grundy value < f(H) is reachable from H. Pick any x < f(H). Let d = f(H) XOR x. The highest set bit of d is set in f(H), so it is set in exactly one of G(G_1), G(G_2) -- say G(G_1). Then G(G_1) XOR d < G(G_1), so by the MEX property of single-game Grundy values, G_1 has a move to a sub-position p' with G(p') = G(G_1) XOR d. Moving in G_1 to p' reaches a sum with Grundy (G(G_1) XOR d) XOR G(G_2) = G(G_1) XOR G(G_2) XOR d = f(H) XOR d = x. Reachable.
Show: f(H) itself is not reachable from H. Any move from H is a move in exactly one component, say to p' in G_1. The new Grundy is G(p') XOR G(G_2). If this equals f(H) = G(G_1) XOR G(G_2), then G(p') = G(G_1) -- but by the MEX property, G(G_1) is not reachable from G_1 in one move. Contradiction.
So G(H) = MEX(reachable) = f(H). Q.E.D.
This proof is the reason MEX is the operation. Replace it with anything else and the XOR composition breaks.
Cache-Oblivious Analysis¶
The bucket-MEX algorithm has excellent cache behavior. Parameters: N = n (array size), M = cache size, B = block size.
First loop (marking):
Read a[0..n-1] sequentially: O(N / B) misses.
Write seen[a[i]] -- random pattern: O(N / B) misses if seen fits in cache,
O(N) misses otherwise (random access).
Second loop (scan):
Read seen[0..n] sequentially: O(N / B) misses.
If seen fits in cache (i.e., N <= M), the total miss count is O(N / B) -- optimal. Otherwise (N > M), the random-write pattern in the first loop can degrade to one miss per write, giving O(N) misses.
Cache-conscious variant for large N¶
When the value range is much larger than cache, bucket the input before marking:
For very large n:
1. Partition a[0..n-1] into k = ceil(n * sizeof(int) / M) blocks
so each block's distinct values fit in cache.
2. For each block, allocate a per-block bitset, mark within it.
3. Merge bitsets sequentially, OR-ing into a global bitset.
4. Scan the global bitset for the first clear bit.
The marking phase now does O(N / B) misses (sequential writes within each cache-resident bitset). The merge phase is sequential. Total: O(N / B * log_{M/B}(N/B)) -- the same Funnelsort-like bound as cache-oblivious merging.
When bitset wins¶
For very large n (say n = 10^9), using a packed bitset (1 bit per value) instead of a bool[] (1 byte per value, or worse in Java due to boxing) reduces memory by 8x and reduces cache pressure proportionally. BitSet.nextClearBit in Java is hardware-accelerated and matches the cache-oblivious lower bound on modern hardware.
Information-Theoretic Notes¶
Output entropy¶
For a uniformly random multiset M of size n drawn from [0, n], the distribution of MEX(M) is concentrated near n/2. Standard combinatorial arguments give:
The expected MEX for random input is Theta(n / log n) -- much smaller than the worst-case n. This is why in practice the incremental insert-only MEX's mex pointer stays small for most workloads, and why production observability often shows mex << n.
MEX as a hash¶
A MEX-based scheme cannot serve as a cryptographic hash -- the output is highly structured (small integers) and adversarially predictable. But MEX-of-Grundy is essentially a canonical hash of an impartial game's position class: two positions with the same Grundy value are interchangeable in any sum-of-games context. This is why Grundy values are sometimes called Nim values.
Connection to first-missing-positive¶
LeetCode 41 (First Missing Positive) is a 1-indexed variant of MEX restricted to positive integers. The classic O(n)-time O(1)-space in-place algorithm (cycle sort with index = value swaps) is a clever realization of the bucket approach that reuses the input array as the bucket. The pigeonhole bound answer in [1, n+1] is the same argument with a shifted base.
Comparison with Alternatives¶
| Attribute | Bucket MEX | Sort + scan | Hash set | Segment tree (dynamic) | Persistent seg tree |
|---|---|---|---|---|---|
| Time | O(n) | O(n log n) | O(n) avg | O(log n) per op | O(log n) per query |
| Space | O(n) | O(1) extra | O(n) | O(n) | O(n log n) |
| Static / dynamic | Static | Static | Static | Dynamic | Static, online range queries |
| Cache I/Os | O(N/B) | O((N/B) log(N/M)) | O(N) worst | O(log n * N/B) | O(log n * N/B) |
| Deterministic | Yes | Yes | No (hash) | Yes | Yes |
Summary¶
The MEX function admits a tight MEX <= n bound by pigeonhole, an O(n) bucket algorithm provable by loop invariant, and an amortized O(1) per-insert incremental variant via the potential method. Persistent segment trees lift offline-only range MEX to online O(log n) per query, while Mo's algorithm achieves the matching offline lower bound of Omega((n + q) sqrt(n)).
The Sprague-Grundy theorem makes MEX the canonical operation of impartial-game analysis -- it is the unique choice that allows clean XOR composition of game sums. Cache-oblivious analysis favors the bucket form for arrays that fit in memory and the bitset variant for very large n. The algorithmic depth is small; the connections to game theory and to the cell-probe lower bounds are what make MEX worth proving carefully.