Wavelet Tree — Senior Level¶
Read time: ~55 minutes · Audience: Engineers building or operating systems on top of wavelet trees — succinct text indexes (FM-index), genomics pipelines, columnar range analytics — who need to make the structure actually small and fast in production: succinct rank/select bitvectors, the wavelet matrix reorganization, memory layout, and failure modes.
At the senior level the question is no longer "what query?" but "how do I make this fit in RAM, stay in cache, and serve millions of queries per second — and where does it break?". The naive per-node int prefix[] of the earlier levels uses O(n log σ) words (32× the bits it should). To be genuinely succinct — n log σ + o(n log σ) bits — we need real rank/select on bitvectors, and to be fast we want the wavelet matrix instead of a pointer tree. This document covers both, then the systems that depend on them.
Table of Contents¶
- Introduction — Succinctness Is the Point
- Rank/Select on Bitvectors — The Real Engine
- The Wavelet Matrix — Flatter, Faster, Cache-Friendly
- Memory Engineering and Layout
- System Design — FM-Index and Text Indexing
- System Design — Columnar Range Analytics
- Comparison with Alternatives (Production View)
- Code — Succinct Bitvector + Wavelet Matrix
- Observability
- Failure Modes
- Summary
1. Introduction — Succinctness Is the Point¶
A wavelet tree's headline claim is succinct space: it stores a sequence of n symbols over alphabet σ in n⌈log₂σ⌉ bits plus a lower-order term, while supporting access, rank, select, quantile, and range count in O(log σ). That claim only holds if two things are true in the implementation:
- The bitvectors are bit-packed (one bit per element, not one int).
rank(and ideallyselect) on those bitvectors is O(1) using onlyo(n)extra bits — not anint prefix[]that doubles or 32×'s the memory.
If you skip these, you have a correct wavelet tree that is bigger than the raw data — defeating the entire purpose. Senior work on wavelet trees is mostly about getting (1) and (2) right, then choosing the wavelet matrix layout so the bits stay in cache during a query.
The payoff is dramatic: a human genome (~3.1 Gbp over σ=4) compresses into a wavelet-tree-backed FM-index of a few hundred MB that answers "where does this 100-bp read occur?" in time proportional to the read length — the foundation of read aligners like bowtie2 and BWA and of compressed search in tools like the succinct library and SDSL.
2. Rank/Select on Bitvectors — The Real Engine¶
Everything in a wavelet tree reduces to rank1(B, i) (count of 1-bits in B[0..i)) and select1(B, j) (position of the j-th 1-bit). Doing these in O(1) with o(|B|) extra bits is a classic succinct-data-structures result (Jacobson 1989, Clark 1996, Vigna 2008).
2.1 Constant-time rank — the two-level (superblock/block) index¶
Pack B into 64-bit words. Build two precomputed arrays:
- Superblocks: every
sbits (e.g.,s = 512), store the cumulative rank up to that superblock as a full 32/64-bit integer. - Blocks: within a superblock, every 64-bit word stores the cumulative rank relative to the superblock start in a small integer (fits in
log₂ sbits).
Then:
rank1(B, i):
sb = i / 512 # superblock index
blk = (i % 512) / 64 # block (word) index within superblock
base = superblock[sb] + block[sb*8 + blk]
# popcount of the partial last word, masked to i % 64 bits
word = B.words[i / 64]
partial = popcount(word & ((1 << (i % 64)) - 1))
return base + partial
Three array reads + one hardware popcount. The extra space is superblock (n/512 × 64 bits) + block (n/64 × 9 bits) ≈ 0.2 n bits — comfortably o(n). Modern CPUs have a POPCNT instruction (Go: math/bits.OnesCount64, Java: Long.bitCount, Python: int.bit_count), so the partial popcount is one cycle.
Practical note. A common engineering choice is the "rank9" layout (Vigna): 64-bit superblock counters interleaved with 9-bit block counters, all in one cache line per 512 bits, giving rank in a single cache miss. This is what high-performance libraries ship.
2.2 Constant-time select — harder, often binary-searched¶
select1(B, j) is genuinely harder. Options, in increasing sophistication:
- Binary search on rank —
O(log n)per call. Simple; fine ifselectis rare. (This is whatmiddle.mdused.) - Sampled select — store every
t-th 1-bit's position; binary-search only within a sampled gap.O(log t)typical. - Clark's select / "select9" — true
O(1)witho(n)extra bits, but intricate (dense vs sparse subranges handled differently).
Most production wavelet trees use rank9 + a sampled select, accepting that select is slightly slower than rank. Because rank dominates text-index inner loops, this is the right trade.
2.3 The space accounting¶
| Component | Bits |
|---|---|
| Bitvectors across all levels | exactly n⌈log₂σ⌉ |
| Rank index (rank9) | ≈ 0.25 × n⌈log₂σ⌉ |
| Select samples | ≈ 0.05–0.20 × n⌈log₂σ⌉ |
| Total | n log σ · (1 + o(1)) |
That (1 + o(1)) is what "succinct" means: dominant term equals the information-theoretic content of the sequence; overhead is a constant fraction (here ~30%) that you tune.
3. The Wavelet Matrix — Flatter, Faster, Cache-Friendly¶
The wavelet matrix (Claude & Navarro, 2012) is a reorganization of the wavelet tree that keeps the same query semantics and O(log σ) cost but stores, per level, a single bitvector of length n (instead of one fragmented bitvector per node). This is faster (one contiguous bitvector per level → great locality) and removes per-node bookkeeping.
3.1 Construction¶
For each level ℓ from most-significant bit down to least:
- Write
B_ℓ[i] = (bit ℓ of value at current position)— the bit that decides left/right at that level. - Stable-sort the sequence by that bit: all
0-bit elements first (preserving order), then all1-bit elements. Recordz_ℓ= number of0-bits at level ℓ.
After all levels, the sequence is fully sorted (radix-sort by value, MSB first). You store only the per-level bitvectors B_ℓ and the counts z_ℓ.
3.2 Navigation (the matrix mapping)¶
At level ℓ, a position p:
- If
B_ℓ[p] = 0: its position at level ℓ+1 isrank0(B_ℓ, p)(the 0s stay at the front). - If
B_ℓ[p] = 1: its position at level ℓ+1 isz_ℓ + rank1(B_ℓ, p)(the 1s move past allz_ℓzeros).
Compare to the tree, where the child was a different array; here the "child" is just the same level-(ℓ+1) bitvector, with the z_ℓ offset distinguishing the two halves. That single global offset is the matrix's trick.
3.3 Queries on the matrix¶
- access(i): at each level read
B_ℓ[p], append the bit to the answer, remapp. Afterlog σlevels you have all bits ofS[i]. - rank_c(i): follow the bits of
c, mappingiwith the same rule. - quantile / range count: carry
[l, r)and map both endpoints; left-subtree count at level ℓ isrank0(B_ℓ, r) − rank0(B_ℓ, l).
Same O(log σ), but each level is a single contiguous bitvector — far fewer cache misses than chasing tree nodes. Use the wavelet matrix in production. The pointer tree is for teaching.
| Aspect | Wavelet Tree | Wavelet Matrix |
|---|---|---|
| Bitvectors | one per node (fragmented) | one per level (contiguous) |
| Per-level offset | implicit in node boundaries | explicit global z_ℓ |
| Cache behavior | medium | good |
| Query cost | O(log σ) | O(log σ) (smaller constant) |
| Code complexity | tree pointers / arithmetic | flat arrays + z[] |
4. Memory Engineering and Layout¶
- Bit-pack into 64-bit words. Never store a bit as a byte or int.
n log σbits meansn log σ / 64words per the whole matrix. - Interleave rank metadata (rank9) so each
rankis one cache line. - One allocation per level (matrix) — contiguous bitvector + its rank index — to avoid pointer chasing and fragmentation.
- mmap for read-only indexes. Text indexes are built once and queried for years; memory-map the serialized structure so the OS page cache shares it across processes.
- Coordinate compression up front sets σ = #distinct, minimizing both height (
log σ) and total bits (n log σ). - Huffman-shaped wavelet tree for skewed alphabets: give frequent symbols shorter root-to-leaf paths, lowering average query cost and total bits to
nH₀(S)(zero-order empirical entropy) instead ofn log σ. This is how compressed text indexes hit entropy bounds.
Memory budget example (human genome FM-index)¶
| Quantity | Value |
|---|---|
| n (with BWT) | ~3.1 × 10⁹ |
| σ | 4 (ACGT) + sentinel → 3 levels |
| Bitvector bits | ~3.1e9 × 2 = 6.2 Gbit ≈ 775 MB |
| + rank9 (~25%) | ≈ 970 MB total wavelet structure |
| Suffix-array samples, etc. | extra, tunable |
With Huffman shaping and 2-bit packing tricks, real aligners get this well under 1 GB.
5. System Design — FM-Index and Text Indexing¶
The FM-index (Ferragina–Manzini, 2000) is the killer application. It indexes a text T for substring search using:
- The Burrows–Wheeler Transform
BWT(T)— a permutation ofT's characters. - A wavelet tree over
BWT(T)providingrank_c(i)inO(log σ). - A small C[] table:
C[c]= number of characters inTsmaller thanc.
5.1 Backward search (the LF-mapping loop)¶
To count occurrences of a pattern P[0..m), scan P right to left, maintaining a half-open suffix-array interval [sp, ep):
sp, ep = 0, n
for k = m-1 down to 0:
c = P[k]
sp = C[c] + rank_c(sp) # rank on the wavelet tree over BWT
ep = C[c] + rank_c(ep)
if sp >= ep: return 0 # pattern absent
return ep - sp # number of occurrences
Each step is one rank_c on the wavelet tree = O(log σ), so searching for a length-m pattern is O(m log σ) — independent of the text length n. That is the entire reason genomics and full-text search use this: search time scales with the query, not the corpus, in near-compressed space.
5.2 Locating and extracting¶
- Locate (where, not just how many): sample the suffix array every
tpositions; for an interval entry, LF-step until you hit a sampled position. Trade space (n/tsamples) for locate time (O(t log σ)). - Extract a substring
T[i..j): useaccesson the wavelet tree plus LF-mapping to walk the BWT.
5.3 Where wavelet trees show up in real systems¶
- Read aligners (BWA, bowtie2) — DNA/RNA short-read mapping.
- Compressed full-text search —
SDSL-lite,succinct,rust-bio,vg(variation graphs). - Self-indexes — the index is the (compressed) data; no separate copy of the text.
- Pangenomics / metagenomics — indexing huge reference collections.
6. System Design — Columnar Range Analytics¶
Outside text, wavelet trees serve immutable analytical columns:
- Quantize a numeric column (latency, price, score) into σ buckets, coordinate-compress to σ = #distinct, and build a wavelet matrix over the row order.
quantile(l, r, k)→ "p50/p95/p99 of this column over a row range (e.g., a time window)" without scanning.rangeCount(l, r, lo, hi)→ "how many rows in this window fall in[lo, hi]".rank_c(r) − rank_c(l)→ "how many rows equal valuecin this window".
Because wavelet matrices are static, the architecture is immutable snapshot + rebuild: ingest appends, periodically you seal a snapshot and build (or merge) wavelet matrices offline, then atomically swap the served version. This mirrors LSM-style segment lifecycles (../04-lsm-tree/) and persistent/immutable design (../11-persistent-segment-tree/).
7. Comparison with Alternatives (Production View)¶
| Attribute | Wavelet Matrix | Persistent Segment Tree | Merge Sort Tree |
|---|---|---|---|
| Range k-th | O(log σ) | O(log σ) | O(log² n)–O(log³ n) |
| rank_c / select_c | O(log σ) | not native | not native |
| Space | n log σ (1+o(1)) bits | O(n log σ) pointer nodes (~32× bits) | O(n log n) ints |
| Persistence | No | Yes | No |
| Mutability | rebuild snapshot | append new version | rebuild |
| Cache locality | excellent (flat) | poor (pointers) | good |
| Text indexing (FM) | yes (core) | no | no |
| Build | O(n log σ) | O(n log σ) | O(n log n) |
| Production niche | compressed indexes, immutable analytics columns | as-of/version queries, large σ | quick offline range stats |
Senior takeaways: - For compressed text / genomics, the wavelet tree/matrix is essentially mandatory (FM-index). - For immutable analytics columns with small σ, the wavelet matrix gives the best space and flat-in-n latency. - For versioned ("as of time T") order statistics or σ ≈ n, choose the persistent segment tree (../11-persistent-segment-tree/). - For a quick one-off offline range stat, a merge sort tree is least effort.
8. Code — Succinct Bitvector + Wavelet Matrix¶
A production-shaped, bit-packed succinct bitvector (block rank) plus a wavelet matrix. Rank is O(1) via word popcount over precomputed block sums; we omit a full select index for brevity (binary-search fallback).
Go¶
package wm
import "math/bits"
// BitVector is a bit-packed vector with O(1) rank via per-block (64-bit word) prefix sums.
type BitVector struct {
words []uint64
rankS []int32 // rankS[w] = #1-bits in words[0..w)
n int
}
func NewBitVector(n int) *BitVector {
nw := (n + 63) / 64
return &BitVector{words: make([]uint64, nw), rankS: make([]int32, nw+1), n: n}
}
func (b *BitVector) Set(i int) { b.words[i>>6] |= 1 << uint(i&63) }
func (b *BitVector) Get(i int) int { return int((b.words[i>>6] >> uint(i&63)) & 1) }
// Build the rank index after all Set calls.
func (b *BitVector) Build() {
acc := int32(0)
for w := range b.words {
b.rankS[w] = acc
acc += int32(bits.OnesCount64(b.words[w]))
}
b.rankS[len(b.words)] = acc
}
// Rank1 returns #1-bits in [0, i).
func (b *BitVector) Rank1(i int) int {
w := i >> 6
base := int(b.rankS[w])
if r := i & 63; r > 0 {
base += bits.OnesCount64(b.words[w] & ((1 << uint(r)) - 1))
}
return base
}
func (b *BitVector) Rank0(i int) int { return i - b.Rank1(i) }
func (b *BitVector) Zeros() int { return b.n - int(b.rankS[len(b.words)]) }
// Matrix is a wavelet matrix over alphabet [0, 1<<bitsLen).
type Matrix struct {
bv []*BitVector
z []int // z[l] = #zeros at level l
bitsLen int
n int
}
// NewMatrix builds a wavelet matrix; values in [0, 1<<bitsLen).
func NewMatrix(data []int, bitsLen int) *Matrix {
n := len(data)
m := &Matrix{bitsLen: bitsLen, n: n, bv: make([]*BitVector, bitsLen), z: make([]int, bitsLen)}
cur := append([]int(nil), data...)
for level := 0; level < bitsLen; level++ {
shift := uint(bitsLen - 1 - level) // MSB first
bv := NewBitVector(n)
for i, x := range cur {
if (x>>shift)&1 == 1 {
bv.Set(i)
}
}
bv.Build()
m.bv[level] = bv
m.z[level] = bv.Zeros()
// stable partition: zeros first, then ones
next := make([]int, 0, n)
for i, x := range cur {
if bv.Get(i) == 0 {
next = append(next, x)
}
}
for i, x := range cur {
if bv.Get(i) == 1 {
next = append(next, x)
}
}
_ = next
cur = next
}
return m
}
// Access returns data[i].
func (m *Matrix) Access(i int) int {
val := 0
for level := 0; level < m.bitsLen; level++ {
b := m.bv[level]
if b.Get(i) == 0 {
i = b.Rank0(i)
} else {
val |= 1 << uint(m.bitsLen-1-level)
i = m.z[level] + b.Rank1(i)
}
}
return val
}
// RangeCount: #elements in [l, r) with value < x (single descent).
func (m *Matrix) CountLess(l, r, x int) int {
cnt := 0
for level := 0; level < m.bitsLen; level++ {
b := m.bv[level]
bit := (x >> uint(m.bitsLen-1-level)) & 1
l0, r0 := b.Rank0(l), b.Rank0(r)
if bit == 1 {
cnt += r0 - l0 // all left (0-bit) elements are < x here
l = m.z[level] + (l - l0)
r = m.z[level] + (r - r0)
} else {
l, r = l0, r0
}
}
return cnt
}
// Quantile: k-th smallest (0-indexed) in [l, r).
func (m *Matrix) Quantile(l, r, k int) int {
val := 0
for level := 0; level < m.bitsLen; level++ {
b := m.bv[level]
l0, r0 := b.Rank0(l), b.Rank0(r)
cntLeft := r0 - l0
if k < cntLeft {
l, r = l0, r0
} else {
k -= cntLeft
val |= 1 << uint(m.bitsLen-1-level)
l = m.z[level] + (l - l0)
r = m.z[level] + (r - r0)
}
}
return val
}
Java¶
public final class WaveletMatrix {
static final class BitVector {
final long[] words;
final int[] rankS; // prefix popcount per word
final int n;
BitVector(int n) { this.n = n; words = new long[(n + 63) >> 6]; rankS = new int[words.length + 1]; }
void set(int i) { words[i >> 6] |= 1L << (i & 63); }
int get(int i) { return (int) ((words[i >> 6] >>> (i & 63)) & 1); }
void build() {
int acc = 0;
for (int w = 0; w < words.length; w++) { rankS[w] = acc; acc += Long.bitCount(words[w]); }
rankS[words.length] = acc;
}
int rank1(int i) {
int w = i >> 6, base = rankS[w], r = i & 63;
if (r > 0) base += Long.bitCount(words[w] & ((1L << r) - 1));
return base;
}
int rank0(int i) { return i - rank1(i); }
int zeros() { return n - rankS[words.length]; }
}
private final BitVector[] bv;
private final int[] z;
private final int bitsLen, n;
public WaveletMatrix(int[] data, int bitsLen) {
this.bitsLen = bitsLen; this.n = data.length;
bv = new BitVector[bitsLen]; z = new int[bitsLen];
int[] cur = data.clone();
for (int level = 0; level < bitsLen; level++) {
int shift = bitsLen - 1 - level;
BitVector b = new BitVector(n);
for (int i = 0; i < n; i++) if (((cur[i] >> shift) & 1) == 1) b.set(i);
b.build(); bv[level] = b; z[level] = b.zeros();
int[] next = new int[n]; int p = 0;
for (int i = 0; i < n; i++) if (b.get(i) == 0) next[p++] = cur[i];
for (int i = 0; i < n; i++) if (b.get(i) == 1) next[p++] = cur[i];
cur = next;
}
}
public int access(int i) {
int val = 0;
for (int level = 0; level < bitsLen; level++) {
BitVector b = bv[level];
if (b.get(i) == 0) i = b.rank0(i);
else { val |= 1 << (bitsLen - 1 - level); i = z[level] + b.rank1(i); }
}
return val;
}
public int countLess(int l, int r, int x) {
int cnt = 0;
for (int level = 0; level < bitsLen; level++) {
BitVector b = bv[level];
int bit = (x >> (bitsLen - 1 - level)) & 1;
int l0 = b.rank0(l), r0 = b.rank0(r);
if (bit == 1) { cnt += r0 - l0; l = z[level] + (l - l0); r = z[level] + (r - r0); }
else { l = l0; r = r0; }
}
return cnt;
}
public int quantile(int l, int r, int k) {
int val = 0;
for (int level = 0; level < bitsLen; level++) {
BitVector b = bv[level];
int l0 = b.rank0(l), r0 = b.rank0(r), cntLeft = r0 - l0;
if (k < cntLeft) { l = l0; r = r0; }
else { k -= cntLeft; val |= 1 << (bitsLen - 1 - level); l = z[level] + (l - l0); r = z[level] + (r - r0); }
}
return val;
}
}
Python¶
class _BitVector:
__slots__ = ("words", "rankS", "n")
def __init__(self, n):
self.n = n
self.words = [0] * ((n + 63) // 64)
self.rankS = [0] * (len(self.words) + 1)
def set(self, i):
self.words[i >> 6] |= 1 << (i & 63)
def get(self, i):
return (self.words[i >> 6] >> (i & 63)) & 1
def build(self):
acc = 0
for w in range(len(self.words)):
self.rankS[w] = acc
acc += self.words[w].bit_count()
self.rankS[len(self.words)] = acc
def rank1(self, i):
w = i >> 6
base = self.rankS[w]
r = i & 63
if r:
base += (self.words[w] & ((1 << r) - 1)).bit_count()
return base
def rank0(self, i):
return i - self.rank1(i)
def zeros(self):
return self.n - self.rankS[-1]
class WaveletMatrix:
"""Succinct wavelet matrix over alphabet [0, 1<<bits_len)."""
def __init__(self, data, bits_len):
self.bits_len = bits_len
self.n = len(data)
self.bv = []
self.z = []
cur = list(data)
for level in range(bits_len):
shift = bits_len - 1 - level
b = _BitVector(self.n)
for i, x in enumerate(cur):
if (x >> shift) & 1:
b.set(i)
b.build()
self.bv.append(b)
self.z.append(b.zeros())
zeros = [x for x in cur if not ((x >> shift) & 1)]
ones = [x for x in cur if (x >> shift) & 1]
cur = zeros + ones
def access(self, i):
val = 0
for level in range(self.bits_len):
b = self.bv[level]
if b.get(i) == 0:
i = b.rank0(i)
else:
val |= 1 << (self.bits_len - 1 - level)
i = self.z[level] + b.rank1(i)
return val
def count_less(self, l, r, x):
cnt = 0
for level in range(self.bits_len):
b = self.bv[level]
bit = (x >> (self.bits_len - 1 - level)) & 1
l0, r0 = b.rank0(l), b.rank0(r)
if bit == 1:
cnt += r0 - l0
l = self.z[level] + (l - l0)
r = self.z[level] + (r - r0)
else:
l, r = l0, r0
return cnt
def quantile(self, l, r, k):
val = 0
for level in range(self.bits_len):
b = self.bv[level]
l0, r0 = b.rank0(l), b.rank0(r)
cnt_left = r0 - l0
if k < cnt_left:
l, r = l0, r0
else:
k -= cnt_left
val |= 1 << (self.bits_len - 1 - level)
l = self.z[level] + (l - l0)
r = self.z[level] + (r - r0)
return val
9. Observability¶
| Metric | Why it matters | Alert threshold |
|---|---|---|
bits_per_symbol (structure size / n) | Detects regressions where overhead creeps above succinct | > 1.5 × log₂σ |
rank_index_overhead_ratio | Tracks o(n) term staying small | > 0.35 |
query_latency_p99 (quantile/rank) | Should be flat in n | > 2 µs |
cache_miss_per_query | Wavelet matrix should be ~log σ lines | > 2 × log σ |
build_throughput (MB/s) | Rebuild/snapshot SLOs | < target for snapshot window |
alphabet_size σ | Sudden growth (failed compression) inflates height | unexpected jump |
Because queries are flat in n, a latency that grows with data size is a red flag — usually a position-bound structure crept in, or rank degraded to linear scan.
10. Failure Modes¶
- σ explosion (no coordinate compression). If raw values are used and σ ≈ n, height becomes
log nand bits balloon ton log n. Fix: always coordinate-compress; monitor σ. - Rank degraded to O(log n) or O(n). Forgetting to build the rank index, or using binary search where O(1) rank was assumed, makes every query slower by a
logfactor. Fix: build rank9; benchmark singlerankin isolation. - Memory bigger than the data. Storing
int prefix[]per node (the teaching layout) in production. Fix: bit-pack; switch to the wavelet matrix. - Cache thrashing on the pointer tree. Fragmented per-node bitvectors miss cache repeatedly. Fix: use the matrix (one contiguous bitvector per level).
- Update attempts. Treating the structure as mutable corrupts it. Fix: immutable snapshot + offline rebuild/merge; or a dedicated dynamic wavelet tree (rare, slow).
- Select assumed O(1) but is binary search. A select-heavy workload (e.g., locate in FM-index without SA samples) tanks. Fix: add sampled/Clark select, or suffix-array samples for locate.
- Skewed alphabet wasting depth. Balanced split wastes levels on rare symbols. Fix: Huffman-shaped wavelet tree for entropy-bound space and average cost.
11. Summary¶
- Succinctness is the whole point. A wavelet tree is only worthwhile in production if its bitvectors are bit-packed and
rankis O(1) via a two-level (rank9) index usingo(n)extra bits — totaln log σ (1+o(1))bits. - The wavelet matrix is the production layout: one contiguous bitvector per level with a global
z_ℓzero-count offset, giving the sameO(log σ)queries with far better cache locality. Ship the matrix; teach the tree. - The FM-index is the headline system: a wavelet tree over
BWT(T)plus aC[]table answers substring search inO(m log σ)— time scaling with the query, not the corpus — in near-compressed space, powering BWA/bowtie2 and compressed full-text search. - For immutable analytics columns with small σ, the wavelet matrix gives flat-in-n p50/p95/p99 and range-count queries; manage it as immutable snapshots + offline rebuild.
- Choose the persistent segment tree (
../11-persistent-segment-tree/) for versioned/as-of queries or σ ≈ n; choose the merge sort tree for quick offline range stats. - Watch bits-per-symbol, rank overhead, and flat-in-n latency in production; σ explosion, un-built rank, and pointer-tree cache misses are the classic failures.
Next step: professional.md — the O(log σ) query proofs, the succinct-space theorem (n log σ + o(·) bits), rank/select lower/upper bounds, and a rigorous complexity comparison.
In this topic
- junior
- middle
- senior
- professional