Tries (Prefix Trees) for String Processing — Senior Level¶
One-line summary: At production scale, a trie is a memory-engineering problem first and an algorithm second. This file covers array vs map vs double-array tries, DAWG / MA-FSA minimization, IP longest-prefix-match routing tables, large-dictionary loading and persistence, testing strategies, and the failure modes that bite real systems.
Table of Contents¶
- Introduction
- Memory Engineering: Array vs Map vs Double-Array
- DAWG and MA-FSA: Sharing Suffixes Too
- Longest-Prefix Match for IP Routing
- Large Dictionaries: Loading, Memory, Cache
- Persistence and Serialization
- Concurrency and Immutability
- Testing Strategy
- Failure Modes and Operations
- Code Examples
- Coding Patterns
- Performance Tips
- Edge Cases & Pitfalls
- Common Mistakes
- Cheat Sheet
- Visual Animation
- Summary
- Further Reading
Introduction¶
A trie that holds a few thousand words in a demo behaves nothing like a trie that holds a 100-million-entry routing table, a 5-million-word spell-check dictionary, or an autocomplete index serving 50k queries per second. At that scale the algorithm (walk L edges) is trivially correct and fast; the hard problems are memory, load time, cache behavior, persistence, and concurrency. A pointer-per-node trie with map[char]node children can easily consume 10–20× the size of the raw key data, blowing your memory budget and trashing the CPU cache.
This file is about the engineering. We compare the three canonical node representations and introduce the double-array trie (DAT) — the representation used by production tokenizers (MeCab, Kuromoji) and IME dictionaries — which stores the whole trie in two flat integer arrays with near-zero per-node overhead and excellent cache locality. We cover the DAWG / MA-FSA (directed acyclic word graph / minimal acyclic finite-state automaton), which shares suffixes as well as prefixes, collapsing a dictionary into the minimal DFA recognizing it. We work through IP longest-prefix match, the routing-table application where the deepest matching prefix wins. We discuss loading multi-gigabyte dictionaries, memory-mapping them, persisting them, making them safe under concurrent reads, and the failure modes (alphabet assumptions, deletion corruption, unbounded memory) that take production tries down.
The senior mindset: pick the representation from the workload. Read-heavy and static? Build a DAWG or double-array trie offline and mmap it. Write-heavy and small? A plain map trie is fine. Longest-prefix routing? A bit-trie / radix structure tuned for the IP-address alphabet.
Memory Engineering: Array vs Map vs Double-Array¶
| Representation | Per-node cost | Lookup | Build | Mutable? | Best for |
|---|---|---|---|---|---|
Array [Σ]*node | Σ pointers always | O(1) index | easy | yes | small fixed alphabet, dense |
Map char→node | hash entries only | O(1) hash | easy | yes | sparse / Unicode |
| List-of-pairs | deg pairs | O(deg) scan | easy | yes | tiny degree nodes |
| Double-array | ~2 ints/node | O(1) per char | hard (offline) | painful | static, read-heavy, huge |
| DAWG / MA-FSA | minimal states | O(1) per char | medium | no (build once) | static dictionaries |
Double-array trie (DAT). Two parallel integer arrays, base[] and check[]. For a state s and input character c, the next state is t = base[s] + code(c), valid only if check[t] == s. This encodes the entire trie transition table in flat arrays — no per-node objects, no pointers, tiny memory, sequential memory access. The catch: insertion may require relocating a node's whole child block to find a non-colliding base, so DATs are built once from a sorted key set and treated as read-only. They are the workhorse of Japanese/CJK tokenizers and large IME dictionaries precisely because the alphabet is huge and the dictionary is static.
The decision is workload-driven: mutable + small → map; mutable + small alphabet + dense → array; static + huge + read-heavy → double-array or DAWG.
DAWG and MA-FSA: Sharing Suffixes Too¶
A trie shares prefixes. A DAWG (directed acyclic word graph), equivalently a minimal acyclic finite-state automaton (MA-FSA), also shares suffixes by merging equivalent subtrees. The words nation, station, ration all end in ation; a trie stores that suffix three times, a DAWG stores it once.
Construction: build the trie (or build incrementally), then minimize by merging states that have identical outgoing transitions and identical isEnd status — exactly the DFA-minimization equivalence used in automata theory (see professional.md). The result is the unique minimal DFA recognizing the finite language of your dictionary. Daciuk's algorithm builds the minimal automaton incrementally from sorted input in linear time.
| Trie | DAWG / MA-FSA | |
|---|---|---|
| Shares | prefixes | prefixes and suffixes |
| Structure | tree | DAG |
| Size | O(total chars) | minimal DFA states (often far smaller) |
| Mutable | yes | build-once |
| Use | autocomplete, prefix queries | static membership, spell-check, compression |
Trade-off: a DAWG gives the smallest membership structure but, because subtrees are shared, you lose the ability to attach distinct per-word data to a node by identity (two words sharing a suffix share the node). If you need per-word values, use a transducer variant (FST — finite-state transducer, as in Lucene) that emits an output along the path, or keep a separate value array keyed by a perfect-hash ordinal computed during the walk.
Longest-Prefix Match for IP Routing¶
A router has a forwarding table of (prefix, next-hop) rules like 10.0.0.0/8 → A, 10.1.0.0/16 → B, 10.1.2.0/24 → C. For a destination address it must choose the most specific (longest) matching prefix. This is longest-prefix match (LPM), and a binary trie over the address bits is the classic structure.
Walk the address bit by bit (MSB first) down a binary trie.
Each node may carry a next-hop (a prefix ends there).
Remember the deepest next-hop seen on the path; that is the answer.
- Each prefix
p/lenis a path oflenbits from the root; mark the node with its next-hop. - To route a 32-bit (IPv4) or 128-bit (IPv6) address, descend bit by bit; the last marked node on the descent is the longest match.
- A default route
0.0.0.0/0is the root mark — the fallback when nothing deeper matches.
The plain bit-trie is O(W) per lookup (W = address width, 32 or 128). Production routers compress it: path-compressed (Patricia) tries, multibit tries (consume several bits per node to cut depth), and LC-tries / Lulea / DIR-24-8 schemes trade memory for fewer memory accesses. This is the same path-compression idea as the radix trie (sibling 21/24), specialized to the fixed-width binary alphabet. The integer/XOR bit-trie machinery itself lives in sibling 18-bit-manipulation/06; here the point is the longest-prefix-match semantics — deepest mark wins, with a fallback default.
Large Dictionaries: Loading, Memory, Cache¶
- Build from sorted input. Inserting keys in sorted order improves locality during build and is required for incremental DAWG/DAT construction.
- Estimate memory before building. A map-trie node in a managed runtime can be 64–128 bytes plus the map's bucket array. For 50M nodes that is multiple gigabytes — measure, do not guess. Use the space bounds from
professional.md. - Prefer compact representations at scale. A double-array trie or DAWG can be 5–20× smaller than a pointer trie and lives in contiguous memory, so it stays cache-resident.
- Intern / pool nodes if you must use objects, to cut allocator overhead and fragmentation.
- Cache locality dominates. A pointer-chasing trie incurs a cache miss per character at depth; a flat double-array trie streams sequentially. At scale, cache misses, not instructions, set your latency.
- Lazy / chunked loading. For huge static dictionaries, mmap the serialized structure and let the OS page it in on demand rather than parsing it into the heap.
Persistence and Serialization¶
A trie built offline should be serialized so it can be loaded without rebuilding:
- Flat-array formats serialize trivially. A double-array trie is two integer arrays — write them to disk and
mmapthem back. No parsing, instant load, shared across processes. - Pointer tries need a layout pass. Assign each node an ordinal, write nodes in a fixed format with child references as ordinals (not pointers), then reconstruct on load. Or do a structural DFS dump.
- Versioning. Embed an alphabet/codec version and a format version in the header; a dictionary built with one character encoding must not be loaded under another.
- mmap + read-only. Memory-map the file and treat it as immutable. Multiple processes share one physical copy; the OS handles paging. This is how FST-backed search indices (Lucene) ship terms dictionaries.
- Checksums. A truncated or corrupted dictionary file should fail loudly at load (CRC/length check), not silently route packets to the wrong place.
Concurrency and Immutability¶
- Read-mostly is the common case. Build the trie once, then serve concurrent reads. An immutable trie needs no locks — readers traverse a structure that never changes.
- Copy-on-write for updates. To update an immutable trie, build new nodes along the changed path and share the rest (a persistent/functional trie), then atomically swap the root pointer. Readers see either the old or new root, never a torn state. This is the HAMT idea (hash array mapped trie) applied to string keys.
- Avoid in-place mutation under concurrent reads. Inserting into a shared mutable trie without synchronization is a data race; a reader can observe a half-linked node.
- Sharding. Partition keys by first character (or hash) across independent tries to spread write contention and parallelize builds.
- Double-buffering. Keep an active dictionary serving reads while a background thread builds the next version, then flip an atomic pointer — zero-downtime dictionary reloads.
Testing Strategy¶
- Oracle comparison. Test
search/startsWith/countPrefix/autocomplete against a brute-force list-of-strings implementation on random inputs (property-based testing). They must agree for every query. - Round-trip serialization. Build a random trie, serialize, deserialize, and assert identical query answers — catches layout/encoding bugs.
- Deletion invariants. After a delete, assert: the word is gone, all other words still present, no orphaned nodes, and
passCountequals an independent recount. - Alphabet fuzzing. Feed Unicode, empty strings, very long keys, keys that are prefixes of each other, and (for array tries) characters outside the assumed range — assert graceful handling.
- Memory regression. Track node count and bytes-per-key in CI; a refactor that doubles memory should fail the budget.
- LPM correctness. For routing tries, test that the longest matching prefix wins, the default route is the fallback, and overlapping prefixes resolve to the most specific.
Failure Modes and Operations¶
| Failure | Symptom | Mitigation |
|---|---|---|
| Array-trie alphabet overflow | crash / corruption on out-of-range char | validate input; use map for arbitrary alphabets |
| Deletion corrupts shared paths | other words vanish or search lies | prune only nodes with no children and passCount==0 |
| Unbounded memory growth | OOM as dictionary grows | use radix/DAWG/double-array; budget bytes-per-key |
| Cache thrash on deep keys | latency spikes under load | flat layout (DAT), shorten depth (multibit/radix) |
| Stale dictionary after reload | wrong answers post-deploy | atomic root/pointer swap, version header, checksum |
| Data race on concurrent insert | torn nodes, intermittent wrong reads | immutable + copy-on-write, or proper locking |
| Wrong char encoding on load | garbled keys, false negatives | embed codec version; fail on mismatch |
| LPM returns wrong route | packets misrouted | test deepest-mark-wins + default fallback explicitly |
Code Examples¶
Example: Longest-prefix match over a binary trie (IP routing core)¶
Go¶
package main
import "fmt"
type RNode struct {
child [2]*RNode
nextHop string
hasHop bool
}
type RouteTrie struct{ root *RNode }
func NewRouteTrie() *RouteTrie { return &RouteTrie{root: &RNode{}} }
// Insert prefix given as its high bits and length, with a next hop.
func (rt *RouteTrie) Insert(prefix uint32, length int, hop string) {
cur := rt.root
for i := 0; i < length; i++ {
bit := (prefix >> (31 - i)) & 1
if cur.child[bit] == nil {
cur.child[bit] = &RNode{}
}
cur = cur.child[bit]
}
cur.nextHop = hop
cur.hasHop = true
}
// Lookup returns the next hop of the longest matching prefix.
func (rt *RouteTrie) Lookup(addr uint32) (string, bool) {
cur := rt.root
best, found := "", false
if cur.hasHop { // default route 0.0.0.0/0
best, found = cur.nextHop, true
}
for i := 0; i < 32 && cur != nil; i++ {
bit := (addr >> (31 - i)) & 1
cur = cur.child[bit]
if cur != nil && cur.hasHop {
best, found = cur.nextHop, true // deepest mark wins
}
}
return best, found
}
func ip(a, b, c, d uint32) uint32 { return a<<24 | b<<16 | c<<8 | d }
func main() {
rt := NewRouteTrie()
rt.Insert(ip(10, 0, 0, 0), 8, "A")
rt.Insert(ip(10, 1, 0, 0), 16, "B")
rt.Insert(ip(10, 1, 2, 0), 24, "C")
rt.Insert(0, 0, "DEFAULT")
fmt.Println(rt.Lookup(ip(10, 1, 2, 99))) // C true (most specific)
fmt.Println(rt.Lookup(ip(10, 1, 9, 9))) // B true
fmt.Println(rt.Lookup(ip(8, 8, 8, 8))) // DEFAULT true
}
Java¶
public class RouteTrie {
static class Node {
Node[] child = new Node[2];
String nextHop;
boolean hasHop;
}
private final Node root = new Node();
public void insert(long prefix, int length, String hop) {
Node cur = root;
for (int i = 0; i < length; i++) {
int bit = (int) ((prefix >> (31 - i)) & 1);
if (cur.child[bit] == null) cur.child[bit] = new Node();
cur = cur.child[bit];
}
cur.nextHop = hop;
cur.hasHop = true;
}
public String lookup(long addr) {
Node cur = root;
String best = null;
if (cur.hasHop) best = cur.nextHop; // default route
for (int i = 0; i < 32 && cur != null; i++) {
int bit = (int) ((addr >> (31 - i)) & 1);
cur = cur.child[bit];
if (cur != null && cur.hasHop) best = cur.nextHop; // deepest wins
}
return best;
}
static long ip(long a, long b, long c, long d) {
return (a << 24) | (b << 16) | (c << 8) | d;
}
public static void main(String[] args) {
RouteTrie rt = new RouteTrie();
rt.insert(ip(10, 0, 0, 0), 8, "A");
rt.insert(ip(10, 1, 0, 0), 16, "B");
rt.insert(ip(10, 1, 2, 0), 24, "C");
rt.insert(0, 0, "DEFAULT");
System.out.println(rt.lookup(ip(10, 1, 2, 99))); // C
System.out.println(rt.lookup(ip(10, 1, 9, 9))); // B
System.out.println(rt.lookup(ip(8, 8, 8, 8))); // DEFAULT
}
}
Python¶
class RNode:
__slots__ = ("child", "next_hop", "has_hop")
def __init__(self):
self.child = [None, None]
self.next_hop = None
self.has_hop = False
class RouteTrie:
def __init__(self):
self.root = RNode()
def insert(self, prefix, length, hop):
cur = self.root
for i in range(length):
bit = (prefix >> (31 - i)) & 1
if cur.child[bit] is None:
cur.child[bit] = RNode()
cur = cur.child[bit]
cur.next_hop = hop
cur.has_hop = True
def lookup(self, addr):
cur = self.root
best = None
if cur.has_hop: # default route
best = cur.next_hop
for i in range(32):
if cur is None:
break
bit = (addr >> (31 - i)) & 1
cur = cur.child[bit]
if cur is not None and cur.has_hop:
best = cur.next_hop # deepest mark wins
return best
def ip(a, b, c, d):
return (a << 24) | (b << 16) | (c << 8) | d
if __name__ == "__main__":
rt = RouteTrie()
rt.insert(ip(10, 0, 0, 0), 8, "A")
rt.insert(ip(10, 1, 0, 0), 16, "B")
rt.insert(ip(10, 1, 2, 0), 24, "C")
rt.insert(0, 0, "DEFAULT")
print(rt.lookup(ip(10, 1, 2, 99))) # C
print(rt.lookup(ip(10, 1, 9, 9))) # B
print(rt.lookup(ip(8, 8, 8, 8))) # DEFAULT
Run: go run main.go | javac RouteTrie.java && java RouteTrie | python route.py
Capacity Planning and Memory Budgeting¶
Before you ship a trie at scale, estimate its footprint — guessing leads to OOM in production. Use concrete per-node accounting tied to the representation.
Per-node cost by representation (rough, managed runtime).
| Representation | Bytes per node (approx) | 50M nodes |
|---|---|---|
| Map children (HashMap/dict) | object header + map (buckets, entries) ≈ 80–150 B | 4–7.5 GB |
| Array[26] of pointers | header + 26 × 8 B ≈ 220 B | ~11 GB |
| List-of-pairs (small nodes) | header + deg × (1 + 8) B | varies with degree |
| Double-array trie | ~8 B (one base + one check int) | ~0.4 GB |
| DAWG / MA-FSA | minimal states × ~8–16 B | often < 0.3 GB |
The lesson is stark: a pointer/map trie can be 10–25× larger than a flat double-array trie or DAWG for the same key set. For a 5M-word English dictionary (~m ≈ 45M characters), a naive map trie may need 3–5 GB while a DAWG fits in tens of MB because English words share enormous prefix and suffix structure.
Estimating node count. From professional.md Theorem 2, character-trie nodes = |distinct prefixes|, bounded by 1 + m. Measure the real number on a sample: nodes = countDistinctPrefixes(sample), then extrapolate linearly. For a radix trie, Theorem 3 caps nodes at 2n − 1, so nodes ≈ 2n is a safe upper bound regardless of key length.
Bytes-per-key as a CI metric. Track total_bytes / n and alert on regressions. A refactor that swaps an array node for a map node, or forgets __slots__ in Python, can silently double this. Put a hard budget in the test suite so memory blowups fail the build, not production.
When the dictionary does not fit in RAM. Memory-map a serialized double-array trie or FST; the OS pages in only the touched nodes. For truly massive routing/lookup tables, shard by first character or prefix across machines and route the query to the owning shard — the same partition-by-prefix idea used for distributed key-value stores.
Real-World Systems Using Tries¶
Knowing where tries actually run in production sharpens the engineering intuition for which variant to pick.
- Redis
rax(radix tree). Redis stores its keyspace cluster routing, stream consumer-group metadata, and other internal indexes in a compact radix tree (rax.c) rather than pointer-per-char tries, precisely for theO(n)-node memory win on long keys. - Linux kernel IP routing. The kernel's FIB (forwarding information base) uses an LC-trie / level-compressed trie variant (
fib_trie) for longest-prefix match on routes — the LPM structure from this file, tuned with level compression to cut memory accesses per lookup. - Lucene / Elasticsearch terms dictionary. The inverted index's term dictionary is an FST (finite-state transducer) — a DAWG that emits an output (the term's posting-list offset) along the accepting path. This gives minimal size and per-term payloads, exactly the "DAWG with payloads" trade-off discussed above.
- CJK tokenizers (MeCab, Kuromoji, Sudachi). Morphological analyzers store their lexicons in double-array tries because the alphabet (kanji/kana) is huge, the dictionary is static, and lookups are on the hot path of every tokenization.
- Browser/IDE autocomplete and IME. Typeahead engines keep a trie (often with precomputed per-node top-k) so each keystroke yields suggestions in
O(L + k). - DNS and URL filtering. Domain blocklists and CDN routing use suffix/prefix tries over reversed domain labels to match
*.example.comrules by longest suffix.
Across all of these the pattern repeats: static + read-heavy + large → compact flat structure (double-array / DAWG / FST), mmap-loaded, atomically reloaded. Mutable, small, or write-heavy workloads keep the simple pointer/map trie.
Observability and Operating a Trie Service¶
A trie powering autocomplete or routing is a service component; treat it like one with metrics, alerts, and runbooks.
Metrics to emit.
trie_nodesandtrie_bytes— gauge the structure size; alert if it grows beyond the capacity budget (OOM precursor).trie_bytes_per_key— derived; a sudden jump signals a representation regression (array vs map, lost compression).lookup_latency_p50 / p99— autocomplete and LPM are on the request hot path; p99 spikes usually mean cache thrash from a too-large pointer trie.autocomplete_subtree_sizehistogram — a few very-short prefixes (a,s) have huge subtrees; cap the DFS and track how often the cap is hit.dictionary_versionandreload_count— confirm the served dictionary matches the deployed one after a reload.cache_miss_rate(if profiled) — the real driver of trie latency at scale, more than instruction count.
Runbook: slow autocomplete. Check autocomplete_subtree_size first — an uncapped DFS on a 1-character prefix can walk millions of nodes. Mitigate by capping output (k) and/or precomputing per-node top-k. If subtree sizes are normal but latency is high, suspect cache misses from a bloated pointer trie; switch to a flat representation.
Runbook: memory alert. Inspect trie_bytes_per_key. If it regressed, diff the node representation; if it is steady but total grew, the dictionary genuinely grew — move to a DAWG/double-array or shard. Never "fix" memory by silently dropping keys, which corrupts query correctness.
Runbook: wrong answers after deploy. Compare dictionary_version to the build artifact; a stale mmap or a failed atomic swap serves the old dictionary. Verify the checksum on load and that the reload flipped the root pointer. For routing tries, additionally assert the default route exists.
Graceful degradation. If the dictionary fails to load (corruption, checksum mismatch), keep serving the previous good version rather than starting empty — an autocomplete that returns last-known suggestions beats one that returns nothing, and a router with the last-known table beats one that drops all packets.
Migration and Versioning of Dictionaries¶
A long-lived trie service ships new dictionaries constantly (new words, new routes, new product names). Treat each dictionary like a deployable artifact with a lifecycle.
Build pipeline.
- Collect and normalize the key set (lowercasing, Unicode NFC, trimming) — the same normalization must run at query time, or you get silent false negatives.
- Sort keys (required for incremental DAWG / double-array construction).
- Build the compact structure offline; record
n,node_count,bytes, and a content checksum. - Validate against the previous version: assert membership for a regression corpus, diff the added/removed key sets, and fail the build if the diff is implausibly large (a sign of an upstream data bug).
- Publish the serialized artifact with a header carrying
format_version,codec_version,build_timestamp, andchecksum.
Rollout. Load the new artifact into memory beside the live one (double-buffer), run a smoke test against it, then flip the atomic root pointer. Keep the previous artifact resident for fast rollback. Never mutate the live structure in place to "patch" it under traffic.
Compatibility rules.
| Change | Safe? | Action |
|---|---|---|
| Add keys | yes | rebuild + atomic swap |
| Remove keys | yes | rebuild + atomic swap (verify dependents) |
| Change normalization | breaking | bump codec_version; query side must match before deploy |
| Change serialization format | breaking | bump format_version; loaders reject unknown versions |
| Change alphabet/encoding | breaking | rebuild from source under new codec; never reinterpret old bytes |
Failure containment. A loader that encounters a version or checksum mismatch must refuse the artifact and keep the prior version — silent acceptance routes packets or suggests words from a garbled structure. Emit a loud alert; a stale-but-correct dictionary is operationally far better than a fresh-but-corrupt one.
Per-environment dictionaries. Staging and production often need different key sets (test data vs real). Tag artifacts by environment and assert at load that the artifact's environment matches the host — a production node loading a staging dictionary is a classic, hard-to-diagnose incident.
Choosing the Representation: A Decision Checklist¶
Run through these questions in order; the first "yes" usually decides the structure.
- Do you need only exact membership (no prefix queries at all)?
- Yes → use a hash set, not a trie. Simpler, faster, smaller for disjoint keys.
- Is the key set mutable at runtime (frequent inserts/deletes)?
- Yes, and small alphabet, dense → array-backed trie.
- Yes, sparse / Unicode → map-backed trie.
- Yes, but mostly reads with rare writes → immutable trie + copy-on-write swaps.
- Is the dictionary static and large and read-heavy?
- Yes, need prefix queries / autocomplete → double-array trie (mmap) or FST.
- Yes, need only membership, minimize size → DAWG / MA-FSA.
- Yes, need per-key payloads on a minimal structure → FST or ordinal-indexed values.
- Are keys long with little branching (URLs, paths, IPs)?
- Yes → radix / compressed trie (
O(n)nodes). For fixed-width binary keys → multibit / Patricia trie. - Is this longest-prefix match (routing)?
- Yes → binary/multibit trie, deepest-mark-wins, with a default
/0route. - Is memory the binding constraint at extreme scale?
- Yes → succinct trie (bit-vectors with rank/select) or sharded tries across machines.
| Workload | Recommended structure |
|---|---|
| Small mutable set, prefix queries | map / array trie |
| Read-mostly, rare writes | immutable trie + COW |
| Huge static dictionary, autocomplete | double-array trie / FST (mmap) |
| Huge static set, membership only | DAWG / MA-FSA |
| Long sparse keys | radix / compressed trie |
| IP/route longest-prefix match | multibit / Patricia binary trie |
| Per-key payload, minimal size | FST (transducer) |
| Beyond single-machine RAM | succinct trie or prefix-sharded cluster |
Document the chosen representation and the reasoning in the code; the next engineer (or you in six months) needs to know why it is a double-array trie and not a map, so they do not "simplify" it back into an OOM.
Coding Patterns¶
Pattern 1: Build-once, serve-many (immutable static trie)¶
Intent: For read-heavy dictionaries, build offline from sorted keys into a compact, immutable structure (DAWG / double-array), serialize it, mmap it at startup, serve concurrent lock-free reads. Updates ship as a whole new dictionary swapped atomically.
Pattern 2: Deepest-mark-wins traversal (LPM)¶
Intent: For longest-prefix-match, carry a "best so far" as you descend and overwrite it every time you pass a marked node. The default route is the root mark and the initial value. The deepest mark on the path is the answer.
Pattern 3: Ordinal-indexed values (DAWG with payloads)¶
Intent: Because a DAWG shares nodes, you cannot attach per-word data by node identity. Instead, count keys to the left during the walk to compute a unique ordinal per word, and index a flat value array by that ordinal — restoring per-word payloads on a minimal structure.
Performance Tips¶
- At scale, optimize cache misses per lookup, not instruction count: flat arrays beat pointer chasing.
- Use a double-array trie for static, read-heavy, large-alphabet dictionaries; build from sorted keys.
- Use a DAWG/MA-FSA when you only need membership and want minimal size (shares suffixes).
- For IP routing, use multibit / path-compressed tries to cut the 32/128-depth descent to a few memory accesses.
- mmap serialized tries for instant load and cross-process sharing.
- Reload dictionaries with an atomic pointer swap and double-buffering for zero downtime.
Edge Cases & Pitfalls¶
- DAWG payloads — shared suffix nodes cannot hold distinct per-word values; use ordinals or an FST.
- Double-array insertion — mutating a DAT may relocate whole child blocks; treat it as read-only after build.
- Default route — forgetting the
/0mark makes unmatched addresses return "no route" instead of the fallback. - IPv6 width — descend 128 bits, not 32; a hardcoded 32 silently truncates IPv6.
- Encoding mismatch on load — a dictionary built as UTF-8 loaded as Latin-1 yields silent false negatives.
- Concurrent mutation — never insert into a trie being read without synchronization; prefer immutable + COW.
- Sorted-input assumption — incremental DAWG/DAT builders require sorted keys; unsorted input corrupts them.
Common Mistakes¶
- Shipping a pointer trie at scale — burning 10–20× memory and cache misses where a DAT or DAWG would fit in cache.
- Attaching values to shared DAWG nodes — two words collide on the shared node; use ordinal-indexed values.
- Forgetting the default route in LPM — unmatched packets get dropped instead of the fallback hop.
- Mutating an immutable/shared trie under reads — data races and torn reads; build new + swap atomically.
- No version/codec header on serialized dictionaries — a future load under a different encoding silently misbehaves.
- Deletion that prunes shared nodes — guard pruning with
passCount==0and no-children checks. - Hardcoding the alphabet — array tries crash on out-of-range characters; validate or use maps.
Cheat Sheet¶
| Need | Choose |
|---|---|
| Mutable, small, fixed alphabet | array[Σ] trie |
| Mutable, sparse / Unicode | map trie |
| Static, huge, read-heavy | double-array trie (mmap) |
| Minimal membership structure | DAWG / MA-FSA |
| Per-word payload on minimal structure | FST / ordinal-indexed values |
| IP routing (LPM) | binary / multibit / Patricia trie, deepest mark wins |
| Zero-downtime reload | double-buffer + atomic root swap |
| Lock-free concurrent reads | immutable trie + copy-on-write updates |
LPM core:
lookup(addr):
best = root.hop if root marked else none # default route
node = root
for bit in addr bits (MSB first):
node = node.child[bit]; if node == null: break
if node marked: best = node.hop # deepest wins
return best
Visual Animation¶
See
animation.htmlfor an interactive visualization.The animation builds a character trie and runs search/prefix/autocomplete walks. The same descend-and-track-best pattern underlies the longest-prefix-match routing trie in this file; picture marking nodes with next-hops and remembering the deepest mark as you descend.
Summary¶
At production scale a trie is a memory and systems problem. The representation must match the workload: map tries for small mutable sets, array tries for dense small alphabets, double-array tries for static huge read-heavy dictionaries (flat arrays, cache-friendly, mmap-able), and DAWG/MA-FSA when you want the minimal membership structure that shares suffixes as well as prefixes. Longest-prefix match powers IP routing: descend the address bits and keep the deepest marked node, with the default route as fallback. Persist compact tries by serializing flat arrays and memory-mapping them; reload with atomic swaps; serve concurrent reads lock-free via immutability and copy-on-write. Test against brute-force oracles, round-trip serialization, deletion invariants, and LPM correctness. The failure modes — alphabet overflow, deletion corruption, unbounded memory, encoding mismatch, missing default route — are all preventable with the discipline above.
Further Reading¶
- Aoe (1989) — "An Efficient Digital Search Algorithm by Using a Double-Array Structure."
- Daciuk et al. — incremental construction of minimal acyclic finite-state automata (DAWG/MA-FSA).
- Computer Networks (Tanenbaum) and the Lulea / DIR-24-8 papers — IP longest-prefix-match hardware/software.
- Lucene FST documentation — finite-state transducers as terms dictionaries with payloads.
- Sibling topic
21-advanced-structures/24-25— radix/Patricia tries and TSTs. - Sibling topic
18-bit-manipulation/06— binary/XOR tries over integer keys.