MEX -- Senior Level¶
Table of Contents¶
- Introduction
- Production Use Cases
- Distributed Sprague-Grundy Game Engines
- ID Allocators and Slot Recycling
- Telemetry Gap Detection
- Range MEX in Analytical Query Engines
- Distributed MEX in Sharded Systems
- Online MEX with Concurrent Inserts and Deletes
- Lock-Free MEX Structures
- Observability
- Failure Modes
- Capacity Planning
- Comparison with Alternatives
- Summary
Introduction¶
Focus: "How does MEX scale across cores, shards, and machines, and how do you observe it in production?"
At senior level MEX is no longer the one-liner "scan for the first missing index." It shows up as a load-bearing primitive in:
- ID allocators that recycle slots in long-running systems (file descriptors, session tokens, PID tables)
- Distributed game engines computing Sprague-Grundy values across sharded position tables
- Streaming telemetry gap detection (which sequence number did we never receive?)
- Range-MEX queries inside analytical engines over append-only event logs
In each of these the interesting questions are not "what is the smallest missing value" but rather "how do I compute it correctly when the data lives across N shards, and how do I observe its health when the working set drifts over hours of uptime?"
Production Use Cases¶
Distributed Sprague-Grundy Game Engines¶
Game-theoretic AIs for impartial games (Nim variants, hackenbush, sprouts, large-board subtraction games) compute Grundy numbers offline, store them in a position table, and serve them online to the move-search engine. For board sizes where the state space exceeds single-node memory (10^9+ positions), the position table is sharded by state hash across a fleet of workers.
The MEX-of-reachable computation now becomes a distributed reduction: for each position p, the worker emits (p, G(p_next)) for every successor p_next; a downstream reducer collects all reachable Grundy values for a given p and computes their MEX. The reducer cannot start until all upstream emissions for p are complete -- otherwise the MEX may exclude a Grundy value that arrives late.
Implementation strategies:
- Topological wave: sort positions by depth; process all positions at depth
d+1only after depthdis fully written. Each wave reads only the previous wave's results -- a clean BSP boundary. - Iterative until fixed point: for games with cycles in the position graph (rare for impartial games, common for partial-information variants), iterate until no Grundy value changes. Convergence is guaranteed but the per-iteration cost is the full position count.
The MEX operation per position is bounded by |moves| (the branching factor), so the per-position work is O(b). The total work is O(S * b) where S is the state-space size. For Nim-like games with b = O(polylog S), this is essentially linear in the state space. Production engines parallelize the per-position MEX trivially -- each position's reachable Grundy set is independent.
ID Allocators and Slot Recycling¶
Long-running services allocate and free numeric IDs constantly: file descriptors, session tokens, PIDs, slot indices in a thread-local cache, in-memory page numbers in a buffer pool. Two common allocation strategies:
- Monotonic counter: never recycle. Simple, but wastes IDs and eventually overflows. Used for log offsets where you never want reuse.
- MEX-based recycling: the next allocated ID is the MEX of the currently allocated set. Keeps IDs compact, reuses freed slots in bounded space.
The MEX-based version is what select()-style file-descriptor allocation does internally (allocate the smallest unused fd). Postgres uses a similar pattern for buffer pool slot reclamation. Redis uses MEX-like reasoning when assigning slot numbers to lazily-freed objects.
In multi-threaded systems the MEX-allocator must support concurrent insert/delete. A naive implementation locks a TreeSet of free IDs; high-throughput variants shard the free-list per core to avoid contention. Each core's MEX is local; on cross-core allocation the requesting core scans its peers' free lists in priority order. The global MEX is min(per-core MEX); you allocate from whichever core won.
Telemetry Gap Detection¶
A monitoring pipeline ingests events tagged with monotonically increasing sequence numbers per source. The question "which sequence numbers did we never receive?" is MEX-shaped: the answer is the smallest sequence number not present in the ingested set.
In practice the input is unbounded (event stream), so the structure must be windowed: keep a bitset of "seen sequence numbers within the last 60 seconds" and report the MEX of that window. As sequence numbers fall outside the window they're forgotten; new sequence numbers extend the window's right edge.
The MEX-pointer drift metric (described in the observability section below) tells operators whether the input stream is well-formed (MEX advances at ~the input rate) or stalled (MEX advances slowly, indicating gaps).
Range MEX in Analytical Query Engines¶
Some OLAP systems support queries like "find the smallest event type that never occurred in the time range [t1, t2]." This is range MEX over an event-type-id array indexed by time. Production implementations build a persistent segment tree at ingestion time, keyed by event-type, storing the last-occurrence timestamp. The MEX query descends the tree looking for the leftmost event type whose last occurrence falls before t1.
The complexity is O(log V) per query where V is the cardinality of event types -- typically a few thousand, so each query is microseconds. The persistent tree is rebuilt incrementally as new events arrive, which keeps the ingestion-side overhead bounded at O(log V) per event.
Distributed MEX in Sharded Systems¶
A common requirement: compute the MEX of a set whose elements live across k independent shards.
Naive: gather and recompute¶
Every shard ships its full set to a coordinator; the coordinator unions them and computes MEX. Correct but expensive: O(N) network bytes per query. Only acceptable when the set is small or the query rate is low.
Partial MEX with bound exchange¶
Each shard i computes a partial MEX -- the MEX of its local set -- and ships only (local_mex_i, presence_bitset_below_threshold_T) to the coordinator. The threshold T is chosen so the presence bitset is bounded (e.g., 64 KB = 524288 bits). The coordinator OR-merges the bitsets and computes the global MEX in one pass.
Correctness: if the global MEX M < T, the OR-merged bitset's first clear bit is M exactly. If M >= T (rare in practice), the coordinator falls back to a second round that requests more bits from each shard. Adaptive T keeps the common case fast.
Bloom-style probabilistic MEX¶
For approximate MEX in extreme-scale settings, each shard maintains a Bloom filter of its set; the coordinator checks the union of filters in order 0, 1, 2, ... and returns the first index that all filters report as absent. False positives bias the answer downward (you may return a value that is actually present); false negatives are impossible. Useful for "give me a small unused ID quickly" workloads where occasional collision is recoverable via a retry.
Pseudocode -- partial-MEX coordinator¶
function distributed_mex(shards, T):
bitsets = []
for shard in shards in parallel:
local = shard.compute_local_mex_and_bitset(threshold=T)
bitsets.append(local.bitset)
merged = OR_all(bitsets)
m = merged.first_clear_bit()
if m == T:
# threshold was too small; recurse with larger T
return distributed_mex(shards, T * 2)
return m
Go: coordinator skeleton¶
package main
import (
"context"
"math/bits"
"sync"
)
type ShardClient interface {
LocalBitset(ctx context.Context, threshold int) ([]uint64, error)
}
// DistributedMEX merges per-shard presence bitsets via OR and returns the
// global MEX. The threshold T is the bit-length of each shard's bitset.
// Time: O(T/64 * k) for k shards. Network: O(T/8) bytes per shard.
func DistributedMEX(ctx context.Context, shards []ShardClient, T int) (int, error) {
words := (T + 63) / 64
merged := make([]uint64, words)
var mu sync.Mutex
var wg sync.WaitGroup
errCh := make(chan error, len(shards))
for _, s := range shards {
wg.Add(1)
go func(sc ShardClient) {
defer wg.Done()
bs, err := sc.LocalBitset(ctx, T)
if err != nil {
errCh <- err
return
}
mu.Lock()
for i := 0; i < len(bs) && i < words; i++ {
merged[i] |= bs[i]
}
mu.Unlock()
}(s)
}
wg.Wait()
close(errCh)
if err := <-errCh; err != nil {
return 0, err
}
for i, w := range merged {
if w != ^uint64(0) {
return i*64 + bits.TrailingZeros64(^w), nil
}
}
// All bits set; recurse with larger threshold (caller decides retry policy).
return T, nil
}
Java: parallel partial-MEX merge¶
import java.util.BitSet;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
public class DistributedMEX {
interface ShardClient {
BitSet localBitset(int threshold);
}
public static int compute(List<ShardClient> shards, int threshold) {
ExecutorService pool = Executors.newFixedThreadPool(shards.size());
BitSet merged = new BitSet(threshold);
List<CompletableFuture<BitSet>> futures = shards.stream()
.map(s -> CompletableFuture.supplyAsync(() -> s.localBitset(threshold), pool))
.toList();
for (CompletableFuture<BitSet> f : futures) {
synchronized (merged) { merged.or(f.join()); }
}
pool.shutdown();
int m = merged.nextClearBit(0);
return Math.min(m, threshold); // caller retries on equality
}
}
Python: asyncio coordinator¶
import asyncio
from typing import Callable, List
async def distributed_mex(shard_fetchers: List[Callable], threshold: int) -> int:
"""
shard_fetchers: each is an async callable returning a bytearray bitset.
Returns global MEX, or threshold if all bits below threshold are set.
"""
bitsets = await asyncio.gather(*(f(threshold) for f in shard_fetchers))
merged = bytearray((threshold + 7) // 8)
for bs in bitsets:
for i in range(len(merged)):
merged[i] |= bs[i]
for i, byte in enumerate(merged):
if byte != 0xFF:
for bit in range(8):
if not (byte >> bit) & 1:
return i * 8 + bit
return threshold
Online MEX with Concurrent Inserts and Deletes¶
For high-throughput single-node MEX services (e.g., a session-ID allocator handling 200k requests per second) the structures from middle.md need to be made thread-safe without becoming serialization points.
Coarse-grained lock¶
Wrap the TreeSet-of-missing in a single mutex. Simple, correct, scales poorly above ~50k ops/sec on a typical commodity server. Use this as the baseline.
Striped locks over value buckets¶
Partition the value range [0, cap] into B buckets. Each bucket owns its own lock and a small bitset. The MEX is min(per-bucket MEX); insert/delete acquire only the bucket lock. Throughput scales near-linearly with B until contention on the global MEX-pointer dominates.
Lock-free linked list of "free ranges"¶
Maintain a sorted lock-free linked list of contiguous "free" ranges. Insert(v) removes v from its containing range, possibly splitting it into two; delete(v) inserts v into its successor or predecessor range, possibly merging. MEX is the smallest endpoint of the first range. Implementations use Harris-Michael lock-free linked lists with hazard pointers or epoch-based reclamation.
Approximate MEX with sloppy counter¶
For workloads where strict MEX is not required (e.g., "any small unused ID"), each thread maintains a local "best guess" MEX and only synchronizes with the global structure on conflict. Throughput is excellent; correctness is bounded by an O(thread-count) skew.
Go example: striped MEX allocator¶
package main
import (
"math/bits"
"sync"
)
const stripeBits = 16 // 65536 IDs per stripe
type StripedMEX struct {
stripes []*stripe
}
type stripe struct {
mu sync.Mutex
bits []uint64 // 1 = allocated
minFree int // cached MEX hint
}
func NewStripedMEX(numStripes int) *StripedMEX {
s := &StripedMEX{stripes: make([]*stripe, numStripes)}
for i := range s.stripes {
s.stripes[i] = &stripe{
bits: make([]uint64, 1<<stripeBits/64),
}
}
return s
}
func (s *StripedMEX) Allocate() int {
for sIdx, st := range s.stripes {
st.mu.Lock()
// Scan from minFree hint
for w := st.minFree / 64; w < len(st.bits); w++ {
if st.bits[w] != ^uint64(0) {
b := bits.TrailingZeros64(^st.bits[w])
st.bits[w] |= 1 << b
st.minFree = w*64 + b + 1
st.mu.Unlock()
return sIdx*(1<<stripeBits) + w*64 + b
}
}
st.mu.Unlock()
}
return -1 // all stripes full
}
func (s *StripedMEX) Free(id int) {
sIdx := id >> stripeBits
local := id & ((1 << stripeBits) - 1)
st := s.stripes[sIdx]
st.mu.Lock()
st.bits[local/64] &^= 1 << (local % 64)
if local < st.minFree {
st.minFree = local
}
st.mu.Unlock()
}
Java: ConcurrentSkipListSet-based MEX¶
import java.util.concurrent.ConcurrentSkipListSet;
import java.util.concurrent.atomic.AtomicInteger;
public class ConcurrentMEX {
// Tracks free IDs explicitly (small set). The MEX is the first element.
private final ConcurrentSkipListSet<Integer> free = new ConcurrentSkipListSet<>();
private final AtomicInteger highWater = new AtomicInteger(0);
public int allocate() {
Integer fromFree = free.pollFirst();
if (fromFree != null) return fromFree;
return highWater.getAndIncrement();
}
public void release(int id) {
if (id == highWater.get() - 1) {
highWater.decrementAndGet();
} else {
free.add(id);
}
}
public int currentMEX() {
Integer first = free.first();
return (first != null) ? first : highWater.get();
}
}
Python: asyncio-friendly MEX with optimistic concurrency¶
import asyncio
from sortedcontainers import SortedList
class AsyncMEX:
def __init__(self, cap):
self._missing = SortedList(range(cap + 1))
self._lock = asyncio.Lock()
async def insert(self, v):
async with self._lock:
self._missing.discard(v)
async def delete(self, v):
async with self._lock:
self._missing.add(v)
async def mex(self):
async with self._lock:
return self._missing[0]
The asyncio variant trades parallelism for simplicity; for true multi-thread parallelism in Python use multiprocessing with shared memory or move to a Go/Java implementation behind a service boundary.
Lock-Free MEX Structures¶
True lock-free MEX with insert/delete/query is hard. The published approaches:
| Approach | Insert/Delete | MEX query | Memory |
|---|---|---|---|
| Harris-Michael free-range list | O(log n) expected | O(1) (read head) | O(free ranges) |
| Lock-free bitmap + CAS | O(1) per bit flip | O(n / 64) | O(n/8) bytes |
| Hazard-pointer skip list of missing | O(log n) | O(1) | O(n) |
The free-range list is the most space-efficient and the simplest to reason about. The bitmap version trades memory for predictability -- every operation is wait-free, but MEX is O(n/64), unacceptable for n > 10^6.
In practice most production systems pick striped locks instead of true lock-free. The throughput gap is small once contention is well-managed, and the engineering effort to get a lock-free version correct is large. Reserve lock-free MEX for truly latency-critical paths (sub-microsecond budgets) where every cache miss matters.
Observability¶
MEX-based components are quiet until they explode. The metrics that matter:
| Metric | What it tells you | Alert threshold |
|---|---|---|
mex.value | Current MEX | n/a (sampled) |
mex.pointer_drift | Rate of MEX increase per second | Compare against ingestion rate; large gap = missed inputs |
mex.missing_set_size | Cardinality of currently-tracked missing values | > expected baseline by 3x |
mex.query_p99_us | p99 latency of MEX queries | > 100 us |
mex.contention_ratio | Lock acquisition retries / successful acquisitions | > 0.1 |
mex.recompute_count | Number of full recomputations triggered | > 0 in steady state |
mex.shard_skew | max(per-shard local MEX) - min(per-shard local MEX) | > 10000 (skewed allocation) |
mex.threshold_overflow | Distributed MEX retries because threshold T was too small | > 5% of queries |
MEX pointer drift¶
The most diagnostic single metric. For a healthy MEX-based allocator under steady-state allocation pressure, the MEX increases at roughly the allocation rate. If MEX stalls (drift = 0) while allocations continue, you have a leak in the deallocator -- IDs are being allocated and never freed. If MEX retreats faster than allocations occur, you have a double-free.
Track drift as a windowed rate over 60 seconds; alert on stall + nonzero allocation traffic.
Missing-set size¶
For TreeSet-of-missing implementations, the missing set grows as the working set's high-water mark grows. A monotonically increasing missing-set size over hours is fine. A sudden jump (e.g., from 10k to 1M in one minute) signals a flood of new allocations or a bug in the bounded-cap logic.
Go: instrumented MEX with metrics¶
package main
import (
"sync/atomic"
"time"
)
type Metrics interface {
Histogram(name string, value float64)
Counter(name string, delta int64)
}
type ObservedMEX struct {
inner *StripedMEX
metrics Metrics
lastMEX atomic.Int64
lastSampled atomic.Int64 // unix nanos
}
func (o *ObservedMEX) Allocate() int {
start := time.Now()
id := o.inner.Allocate()
o.metrics.Histogram("mex.allocate.latency_us", float64(time.Since(start).Microseconds()))
if int64(id) > o.lastMEX.Load() {
prev := o.lastMEX.Swap(int64(id))
prevTs := o.lastSampled.Swap(time.Now().UnixNano())
if prevTs > 0 {
drift := float64(int64(id)-prev) /
(float64(time.Now().UnixNano()-prevTs) / 1e9)
o.metrics.Histogram("mex.pointer_drift_per_sec", drift)
}
}
return id
}
Failure Modes¶
| Failure | Symptom | Fix |
|---|---|---|
| ID leak | MEX advances; missing-set never grows | Audit deallocators; add finalizer-based safety net |
| Double-free | MEX retreats; same ID allocated twice | Track allocated-set with bloom filter; assert on insert collision |
| Threshold overflow on distributed MEX | Retry rate > 5% | Adaptive threshold; baseline T on observed MEX history |
| Shard skew | One shard's local MEX much higher | Rebalance allocation traffic; revisit shard key |
| Coordinator OOM on partial bitsets | Memory spike during merge | Stream-merge bitsets; cap T per call |
| Stale Grundy cache | Engine returns wrong Grundy after rule change | Version the cache key; invalidate on rule version bump |
| Lost gap detection in telemetry | MEX-stream window evicts before detection | Lengthen window; alert on window-edge eviction |
| Lock contention | Allocate latency p99 spikes under load | Switch coarse lock to striped; add backoff |
| Hash collision in concurrent map | Throughput collapses on adversarial keys | Use secure hash; or switch to skip-list |
| MEX of empty multiset returns nonzero | Initialization bug | Pre-populate missing set with [0..cap] at construction |
Capacity Planning¶
For an MEX allocator with at most N concurrent live IDs:
| Structure | Memory per ID | Notes |
|---|---|---|
| TreeSet of missing | ~50 bytes (Java) | Heavy on GC pressure |
| Segment tree on missing | 8 bytes per node, 2N nodes | Compact, cache-friendly |
| BitSet | 1 bit per slot | Cheapest; O(N/64) per MEX query |
| Sorted free-list | 16 bytes per free range | Best when allocations are clustered |
| Distributed bitset (per shard) | 1 bit per slot per shard | Multiply by replica count |
For N = 10^7 live IDs: - TreeSet: ~500 MB -- impractical - Segment tree: ~160 MB -- workable on a 32 GB box - BitSet: 1.25 MB -- trivial - Sorted free-list: depends; for a well-recycled allocator with few free ranges, < 1 MB
Production choice for high-N: BitSet for static workloads, segment tree for high-update workloads. Distributed MEX adds network cost; budget 100 us per coordinator round-trip plus per-shard local-MEX cost (typically < 10 us for a striped allocator).
Comparison with Alternatives¶
| Approach | Best for | Time/op | Concurrency story |
|---|---|---|---|
| TreeSet of missing | Small-to-medium N, single-node | O(log n) | Coarse lock |
| Segment tree on missing | Medium-to-large N, single-node | O(log n) | Coarse lock or per-leaf lock |
| BitSet + nextClearBit | Static or insert-only, single-node | O(n/64) MEX | Striped CAS |
| Striped lock allocator | High-throughput allocator | O(1) avg | Per-stripe lock |
| Lock-free free-range list | Sub-microsecond latency budget | O(log n) amortized | Hazard pointers / epoch |
| Distributed bitset + coordinator | Multi-shard, large N | O(T/64 + RTT) | Per-shard local + central merge |
| Bloom-based approximate MEX | Allocation hint only | O(1) | Eventually consistent |
Pick the simplest structure that meets your latency SLO. The default for a new service should be a coarse-locked TreeSet; only graduate to striped or lock-free under measured contention.
Summary¶
At senior level MEX is a primitive embedded in allocators, distributed Sprague-Grundy engines, telemetry pipelines, and analytical query engines. The interesting axes are concurrency (coarse lock vs striped vs lock-free), distribution (partial-MEX-with-bitset coordination), and observability (pointer drift, missing-set size, shard skew). The algorithmic invariant MEX <= n from junior.md still holds locally; the production challenge is keeping it correct across shards, across cores, and across hours of uptime.
The structure is small; the failure modes are sneaky. Instrument early, alert on drift, and prefer simple locking until measured contention forces you up the complexity ladder.