Suffix Automaton (SAM) — Junior Level¶
One-line summary: A suffix automaton of a string
sis the smallest deterministic finite automaton (DFA) that accepts exactly the suffixes ofs. Because every substring ofsis a prefix of some suffix, walking the automaton from the start state also recognizes every substring ofs. It is built online inO(n)by adding one character at a time, and it has at most2n-1states and3n-4transitions.
Table of Contents¶
- Introduction
- Prerequisites
- Glossary
- Core Concepts
- Big-O Summary
- Real-World Analogies
- Pros & Cons
- Step-by-Step Walkthrough
- Code Examples
- Coding Patterns
- Error Handling
- Performance Tips
- Best Practices
- Edge Cases & Pitfalls
- Common Mistakes
- Cheat Sheet
- Visual Animation
- Summary
- Further Reading
Introduction¶
Imagine someone hands you a long string s and then keeps asking: "Is p a substring of s?" for many different p. The naive answer scans s for each query in O(|s|) time. The suffix automaton turns this into a tiny O(|p|) walk: you follow one transition per character of p, and if you never fall off the automaton, p is a substring. That is the headline use, but it is only the beginning.
A suffix automaton (SAM) is a remarkable little machine. Formally it is the minimal deterministic finite automaton that recognizes the set of all suffixes of s. Two facts make it powerful:
- Every substring of
sis a prefix of some suffix ofs. So the same automaton that accepts suffixes also lets you recognize every substring by simply walking transitions from the start.- It is astonishingly small — at most
2n-1states and3n-4transitions for a string of lengthn— and it is built online in linear timeO(n)(for a fixed alphabet), adding the characters ofsone by one.
That combination — recognizes all substrings, linear size, linear online construction — is what makes the suffix automaton one of the most useful structures in all of string processing. With it you can, in time proportional to the structure (not re-scanning s each query):
- Check whether a pattern
pis a substring ofsinO(|p|). - Count the number of distinct substrings of
swith a single formula. - Count how many times each substring occurs.
- Find the longest common substring of two strings.
- Find the k-th smallest distinct substring, total length of distinct substrings, and more.
The two structures it relies on internally are the suffix link tree and the len value of each state. The whole document keeps circling back to one central idea — the endpos equivalence — which explains why states are what they are. We will build intuition for that gently and leave the full proofs to professional.md.
The suffix automaton is a sibling of the suffix array (topic 04-suffix-array) and the suffix tree (topic 13-suffix-tree). All three index every substring of s; they trade off differently in size, build difficulty, and which queries are natural. We contrast them throughout.
Prerequisites¶
Before reading this file you should be comfortable with:
- Strings and substrings — a substring is a contiguous slice
s[l..r]; a suffix is a substring ending at the last position. - Deterministic finite automata (DFA) — states, transitions on characters, a start state, accepting states. Walking a DFA means following one transition per input character.
- Graphs and trees — the suffix links form a tree; basic tree traversal (DFS/BFS) is used to aggregate values.
- Hash maps / arrays as maps — transitions are stored as
state × character → state, often a small array or a hash map. - Big-O notation —
O(n),O(|p|),O(n · alphabet).
Optional but helpful:
- A glance at tries (topic
11-trie) — a SAM is a much smaller cousin that shares prefixes and suffixes. - Familiarity with amortized analysis — the linear-time build relies on an amortized argument (each step does
O(1)work on average).
Glossary¶
| Term | Meaning |
|---|---|
| Substring | A contiguous slice of s. "ab" is a substring of "cab"; "cb" is not. |
| Suffix | A substring that ends at the last character of s. Suffixes of "aba" are "aba", "ba", "a", "". |
| DFA | Deterministic finite automaton: states + per-character transitions; at most one transition per (state, char). |
| State | A node of the automaton. Each state corresponds to an endpos equivalence class of substrings. |
len[v] | The length of the longest substring in state v's equivalence class. |
| endpos(t) | The set of ending positions of all occurrences of substring t in s. |
Suffix link link[v] | A pointer to the state holding the longest proper suffix of v's strings that lies in a different endpos class. Forms a tree. |
| Clone | A copied state created during extend to split an endpos class (the "split" case). |
extend(c) | The online step that adds one character c to the automaton. |
Initial state (t0) | The start state; represents the empty substring; len = 0, link = -1. |
| Terminal / accepting state | A state whose class contains a suffix of s; marked after the build by walking suffix links from the last state. |
Core Concepts¶
1. Each State Is a Set of Substrings (the endpos idea)¶
The single most important idea: a SAM state does not represent one substring — it represents a whole group of substrings that all occur in exactly the same places.
Define endpos(t) = the set of positions where occurrences of substring t end in s. For s = "abcbc":
endpos("bc") = {3, 5} // "bc" ends at index 3 and index 5
endpos("c") = {3, 5} // "c" ends at index 3 and index 5
endpos("cbc") = {5} // only once, ending at 5
Two substrings are endpos-equivalent if they have the same endpos set. "bc" and "c" above share {3,5}, so they live in the same state. The set of distinct endpos classes is exactly the set of states. This is why the automaton is small: many substrings collapse into one state.
2. len and the Span of a State¶
Inside one state, the substrings form a contiguous range of lengths. The state stores len[v] = the length of the longest substring it holds. The shortest length it holds is len[link[v]] + 1. So state v represents substrings of lengths in the range:
That range — len[v] - len[link[v]] distinct substrings per state — is the key to counting distinct substrings (Concept 6).
3. Suffix Links Form a Tree¶
The suffix link link[v] points to the state whose longest string is the longest proper suffix of v's longest string that belongs to a different endpos class. Following links repeatedly always reaches the initial state. These links form a tree rooted at the initial state — the suffix-link tree. It is the backbone for counting occurrences and answering "how many times does this substring appear".
4. The Automaton Recognizes All Substrings¶
Start at the initial state and feed the characters of a query string p one at a time, following transitions. If at any point there is no transition for the next character, p is not a substring of s. If you consume all of p, then p is a substring. This is an O(|p|) walk — independent of |s|.
5. Online Construction: extend One Character at a Time¶
You build the SAM by starting with just the initial state and calling extend(c) for each character c of s left to right. Each extend:
- Creates a new state
curfor the whole prefix just formed (len[cur] = len[last] + 1). - Walks suffix links from
last, adding transitions onctocurfor every state that lacks one. - Handles three cases when it finds a state
pthat already has ac-transition to some stateq: - No such
p(reached the root):link[cur] = t0. qis "solid" (len[q] == len[p] + 1):link[cur] = qdirectly.- Otherwise: clone/split — make a copy
cloneofq, fix lengths and links, and redirect transitions.
The clone/split is the only subtle part; we trace it concretely in the walkthrough. The whole build is amortized O(n).
6. Counting Distinct Substrings (one formula)¶
Every distinct substring of s lives in exactly one state, and each state v contributes exactly len[v] - len[link[v]] distinct substrings (its length range). So:
number of distinct substrings of s = Σ over all states v (except the root) of ( len[v] - len[link[v]] )
No extra traversal — just sum that over the states you already built. (Proof in professional.md.)
7. Counting Occurrences (endpos sizes via the link tree)¶
The number of times a substring (in state v) occurs equals |endpos(v)|. You compute these sizes by setting cnt = 1 on every non-clone state created as a prefix-end, then summing cnt up the suffix-link tree (children push their counts to parents). After that, cnt[v] is the occurrence count for every substring in state v.
Big-O Summary¶
| Operation | Time | Space | Notes |
|---|---|---|---|
| Build SAM (online, fixed alphabet) | O(n) | O(n · σ) array / O(n) map | σ = alphabet size; array transitions are O(σ) per state. |
One extend(c) | Amortized O(1) (×σ for array init) | — | Worst single step can be longer; amortizes to O(n) total. |
Substring check p ∈ s? | O(|p|) | O(1) | Walk transitions; map lookup per char. |
| Count distinct substrings | O(n) | O(1) extra | One pass: Σ len[v] - len[link[v]]. |
| Occurrence count of every state | O(n) | O(n) | Counting-sort the link tree by len, accumulate cnt. |
Longest common substring of s, t | O(|s| + |t|) | O(|s|) | Build SAM of s, run t through it. |
| k-th distinct substring | O(answer length · σ) | O(n) | DFS guided by per-state path counts. |
| #states / #transitions | ≤ 2n−1 / ≤ 3n−4 | — | Tight bounds (for n ≥ 2/n ≥ 3). |
The headline numbers: O(n) to build, O(|p|) to query — both independent of how many queries you ask.
Real-World Analogies¶
| Concept | Analogy |
|---|---|
| The automaton accepting all substrings | A switchboard: dial the characters of p one at a time; if every digit connects, p is a valid "number" (substring) in the directory. |
| endpos equivalence class | A mailing list: substrings that always show up at the same set of locations get grouped into one "subscriber group" (state). |
len[v] range | A shelf labeled with a span of lengths — the longest item is len[v], the shortest is len[link[v]] + 1; everything in between lives there. |
| Suffix link tree | A "drop the front letter" hierarchy: from any group, the link points to where the longest suffix lives once the group changes. |
| The clone/split during build | Reorganizing a folder when a new file arrives that only some of the old folder's items now share a property — you split off a sub-folder. |
| Counting occurrences via link tree | Tallying votes up an org chart: leaves report their single vote, managers sum the votes of their reports. |
Where the analogy breaks: real switchboards have no "suffix link"; the link tree is a structural artifact that exists purely to keep the machine minimal and to aggregate counts.
Pros & Cons¶
| Pros | Cons |
|---|---|
Linear O(n) online construction — add characters as they stream in. | The clone/split logic in extend is genuinely subtle and easy to get wrong. |
Tiny: ≤ 2n-1 states, ≤ 3n-4 transitions. | Array transitions cost O(n·σ) memory; large alphabets hurt. |
O(|p|) substring queries, independent of |s|. | Reasoning about why it works (endpos) takes effort to internalize. |
| Distinct-substring count is a one-line sum. | Not as directly suited to some suffix-array tasks (e.g., sorted suffix ranks). |
| Occurrence counts fall out of the suffix-link tree. | Generalizing to many strings (generalized SAM) needs extra care. |
| Longest common substring of two strings in linear time. | Persisting / serializing the structure is fiddlier than a flat array. |
When to use: many substring membership queries, counting distinct substrings, occurrence counting, longest common substring of two (or few) strings, online indexing of a streaming text.
When NOT to use: you need sorted suffix order or LCP arrays (prefer suffix array, 04), the alphabet is huge and memory is tight (prefer a map-based or hashing approach), or a single substring check on a short text where a plain scan is simpler.
Step-by-Step Walkthrough¶
Let us build the SAM of s = "aba" character by character. Each state stores len and link; the initial state is t0 with len=0, link=-1.
Start. One state t0 (len=0, link=-1). last = t0.
extend('a'). Create cur=1 with len=1. Walk links from last=t0: t0 has no 'a' transition, so add t0 --a--> 1. We reach the root (link=-1), so link[1] = t0. Now last = 1.
extend('b'). Create cur=2 with len=2. Walk from last=1: 1 has no 'b', add 1 --b--> 2; go to link[1]=t0; t0 has no 'b', add t0 --b--> 2; reach root, so link[2]=t0. Now last=2.
extend('a'). Create cur=3 with len=3. Walk from last=2: 2 has no 'a', add 2 --a--> 3; go to link[2]=t0; t0 does have 'a' → it goes to state 1 (call it q=1, found at p=t0).
Now the case analysis: is q "solid", i.e. len[q] == len[p] + 1? Here len[1]=1 and len[t0]+1 = 1. Yes, solid. So link[3] = q = 1. No clone needed. Now last=3.
states: t0, 1(len1, link t0), 2(len2, link t0), 3(len3, link 1)
trans: t0 --a--> 1 ; 1 --b--> 2 ; t0 --b--> 2 ; 2 --a--> 3
The automaton is built. Let us use it.
- Substring check
"ba": fromt0,'b'→2,'a'→3. Consumed all of"ba", so yes, it is a substring (s = "aba", and"ba"occurs). - Substring check
"bb": fromt0,'b'→2, then'b'from2? No transition. So"bb"is not a substring.
Distinct substrings: sum len[v] - len[link[v]] over states 1,2,3:
state 1: 1 - 0 = 1 ("a")
state 2: 2 - 0 = 2 ("b", "ab"? -> actually "b" and "ab") -> wait, recount below
state 3: 3 - 1 = 2
total = 1 + 2 + 2 = 5
The distinct substrings of "aba" are: "a", "b", "ab", "ba", "aba" — exactly 5. The formula matches. (State 2 holds "ab" and "b"; state 3 holds "aba" and "ba"; state 1 holds "a".) Each multiply of length range tallies its own distinct strings, and the sum is the total.
Now consider a case that forces a clone: s = "abab", last step extend('b') after "aba". When you reach a state p with a 'b'-transition to q where len[q] != len[p]+1, the long string "abab" and the shorter occurrences of "ab"/"b" no longer share the same endpos set, so the old state q must be split into a clone (the shared shorter part) and the original q (the longer part). The clone copies q's transitions and link, takes len[clone] = len[p]+1, and both q and cur link to it. We trace this in full in middle.md and prove it in professional.md.
Code Examples¶
Example: Build a SAM and Count Distinct Substrings¶
This builds the automaton with map-based transitions (works for any alphabet) and prints the number of distinct substrings.
Go¶
package main
import "fmt"
type State struct {
next map[byte]int
link int
length int
}
type SAM struct {
st []State
last int
}
func NewSAM() *SAM {
s := &SAM{}
s.st = append(s.st, State{next: map[byte]int{}, link: -1, length: 0})
s.last = 0
return s
}
func (s *SAM) Extend(c byte) {
cur := len(s.st)
s.st = append(s.st, State{next: map[byte]int{}, link: -1, length: s.st[s.last].length + 1})
p := s.last
for p != -1 {
if _, ok := s.st[p].next[c]; ok {
break
}
s.st[p].next[c] = cur
p = s.st[p].link
}
if p == -1 {
s.st[cur].link = 0
} else {
q := s.st[p].next[c]
if s.st[p].length+1 == s.st[q].length {
s.st[cur].link = q
} else {
clone := len(s.st)
nq := State{next: map[byte]int{}, link: s.st[q].link, length: s.st[p].length + 1}
for k, v := range s.st[q].next {
nq.next[k] = v
}
s.st = append(s.st, nq)
for p != -1 {
if s.st[p].next[c] == q {
s.st[p].next[c] = clone
p = s.st[p].link
} else {
break
}
}
s.st[q].link = clone
s.st[cur].link = clone
}
}
s.last = cur
}
func (s *SAM) DistinctSubstrings() int64 {
var total int64
for v := 1; v < len(s.st); v++ {
total += int64(s.st[v].length - s.st[s.st[v].link].length)
}
return total
}
func main() {
sam := NewSAM()
for i := 0; i < len("aba"); i++ {
sam.Extend("aba"[i])
}
fmt.Println("distinct substrings of aba =", sam.DistinctSubstrings()) // 5
}
Java¶
import java.util.*;
public class SuffixAutomaton {
static class State {
Map<Character, Integer> next = new HashMap<>();
int link = -1;
int length = 0;
}
List<State> st = new ArrayList<>();
int last;
SuffixAutomaton() {
st.add(new State());
last = 0;
}
void extend(char c) {
int cur = st.size();
State s = new State();
s.length = st.get(last).length + 1;
st.add(s);
int p = last;
while (p != -1 && !st.get(p).next.containsKey(c)) {
st.get(p).next.put(c, cur);
p = st.get(p).link;
}
if (p == -1) {
st.get(cur).link = 0;
} else {
int q = st.get(p).next.get(c);
if (st.get(p).length + 1 == st.get(q).length) {
st.get(cur).link = q;
} else {
int clone = st.size();
State nq = new State();
nq.length = st.get(p).length + 1;
nq.link = st.get(q).link;
nq.next.putAll(st.get(q).next);
st.add(nq);
while (p != -1 && st.get(p).next.getOrDefault(c, -1) == q) {
st.get(p).next.put(c, clone);
p = st.get(p).link;
}
st.get(q).link = clone;
st.get(cur).link = clone;
}
}
last = cur;
}
long distinctSubstrings() {
long total = 0;
for (int v = 1; v < st.size(); v++) {
total += st.get(v).length - st.get(st.get(v).link).length;
}
return total;
}
public static void main(String[] args) {
SuffixAutomaton sam = new SuffixAutomaton();
for (char c : "aba".toCharArray()) sam.extend(c);
System.out.println("distinct substrings of aba = " + sam.distinctSubstrings()); // 5
}
}
Python¶
class SuffixAutomaton:
def __init__(self):
self.nxt = [{}] # transitions
self.link = [-1] # suffix links
self.length = [0] # len of each state
self.last = 0
def extend(self, c):
cur = len(self.nxt)
self.nxt.append({})
self.link.append(-1)
self.length.append(self.length[self.last] + 1)
p = self.last
while p != -1 and c not in self.nxt[p]:
self.nxt[p][c] = cur
p = self.link[p]
if p == -1:
self.link[cur] = 0
else:
q = self.nxt[p][c]
if self.length[p] + 1 == self.length[q]:
self.link[cur] = q
else:
clone = len(self.nxt)
self.nxt.append(dict(self.nxt[q]))
self.link.append(self.link[q])
self.length.append(self.length[p] + 1)
while p != -1 and self.nxt[p].get(c) == q:
self.nxt[p][c] = clone
p = self.link[p]
self.link[q] = clone
self.link[cur] = clone
self.last = cur
def distinct_substrings(self):
return sum(self.length[v] - self.length[self.link[v]]
for v in range(1, len(self.nxt)))
if __name__ == "__main__":
sam = SuffixAutomaton()
for ch in "aba":
sam.extend(ch)
print("distinct substrings of aba =", sam.distinct_substrings()) # 5
What it does: builds the SAM of "aba" online and prints 5, the number of distinct substrings. Run: go run main.go | javac SuffixAutomaton.java && java SuffixAutomaton | python sam.py
Coding Patterns¶
Pattern 1: Substring Membership Walk¶
Intent: Check whether p occurs in s by walking transitions from the start state.
def contains(sam, p):
v = 0 # initial state
for ch in p:
if ch not in sam.nxt[v]:
return False
v = sam.nxt[v][ch]
return True
This is the canonical use of a SAM: O(|p|) per query after an O(n) build.
Pattern 2: Distinct-Substring Sum¶
Intent: Count all distinct substrings without enumerating them.
def distinct(sam):
return sum(sam.length[v] - sam.length[sam.link[v]]
for v in range(1, len(sam.nxt)))
Each state contributes its len-range; summing gives the total.
Pattern 3: Order States by len (topological order of the link tree)¶
Intent: Many aggregations (occurrence counts, total lengths) need states processed from longest to shortest. Because len[link[v]] < len[v], sorting by len is a valid topological order of the suffix-link tree.
Error Handling¶
| Error | Cause | Fix |
|---|---|---|
| Wrong distinct-substring count | Included the root state in the sum, or used link of the root (-1). | Sum over states 1.. only; root contributes nothing. |
IndexError / nil map on transition | Forgot to initialize the per-state transition map/array. | Always allocate next when creating a state (including clones). |
| Build never terminates / wrong links | Forgot to advance p = link[p] in a loop, or mishandled p == -1. | Mirror the reference extend exactly; the two while loops differ. |
| Clone missing transitions | Did not copy q's next map into the clone. | Clone copies q.next, q.link, and sets len = len[p]+1. |
| Occurrence counts all 1 | Forgot to propagate cnt up the suffix-link tree. | After build, accumulate cnt[link[v]] += cnt[v] in len-descending order. |
| Counting clones as occurrences | Gave clones cnt = 1. | Only original prefix-end states get cnt = 1; clones start at 0. |
Performance Tips¶
- Choose transition storage by alphabet. Small fixed alphabet (e.g. lowercase letters) → fixed-size array per state for
O(1)lookups; large/unknown alphabet → hash map to save memory. - Reserve capacity. You will create at most
2n-1states; preallocate arrays of that size to avoid reallocation during the build. - Sort by
lenwith counting sort.lenvalues are in[0, n], so a counting sort gives the topological order inO(n)instead ofO(n log n). - Avoid recursion for link-tree aggregation. Use the
len-ordered array and iterate; deep recursion can blow the stack on long strings. - Reuse the structure across queries. Build once; answer every substring query by walking — never rebuild per query.
Best Practices¶
- Keep
extendbyte-for-byte faithful to a reference implementation; it is the part everyone gets wrong. Test it against a brute-force distinct-substring counter on random short strings. - Mark terminal states after the build (walk suffix links from
last) if you need to distinguish "is a suffix" from "is a substring". - Store
lenandlinkin parallel arrays indexed by state id; it is cache-friendlier than an array of structs with maps. - Separate the build from the queries; do not interleave mutation and querying.
- Validate the size invariants (
states ≤ 2n-1,transitions ≤ 3n-4) in tests as a cheap correctness check.
Edge Cases & Pitfalls¶
- Empty string — the SAM is just the initial state; zero distinct substrings. Handle
n = 0without crashing. - Single character (
s = "a") — one extend, two states, one distinct substring. - All-equal string (
s = "aaaa") — many clones; a good stress test for the split logic. Distinct substrings =n("a", "aa", "aaa", "aaaa"). - Highly repetitive strings — exercise the clone path heavily; the bounds
2n-1/3n-4still hold. - Querying the empty pattern — it is a substring of everything; the walk consumes nothing and returns true.
- Substrings vs suffixes — the automaton accepts suffixes; substring recognition is just "did the walk survive", not "did it end in an accepting state".
Common Mistakes¶
- Summing distinct substrings over all states including the root — the root has
link = -1and must be excluded. - Forgetting to copy the clone's transitions — the clone must inherit
q'snextmap and link, only changinglen. - Giving clones an occurrence count of 1 — clones represent shorter shared substrings; they accumulate counts from below, starting at 0.
- Stopping the redirect loop too early or too late — redirect
p --c--> qtocloneonly whilenext[p][c] == q, walking up links. - Confusing "is a substring" with "ends in an accepting state" — substring membership is purely whether the transition walk survives.
- Rebuilding the SAM per query — build once, query many; the whole point is amortizing the
O(n)build. - Treating one state as one substring — a state holds a range of substrings (an endpos class), not a single one.
Cheat Sheet¶
| Question | Tool | Formula / step |
|---|---|---|
Is p a substring? | walk transitions | survive all |p| steps from t0 |
| Distinct substrings | per-state ranges | Σ (len[v] - len[link[v]]) |
| Occurrences of a substring | endpos size | cnt[v] after link-tree accumulation |
Longest substring length in v | stored | len[v] |
Shortest substring length in v | from link | len[link[v]] + 1 |
| #states / #transitions | bounds | ≤ 2n-1 / ≤ 3n-4 |
Longest common substring of s,t | run t through SAM(s) | track current match length |
Core build skeleton:
extend(c):
cur = new state, len[cur] = len[last] + 1
p = last
while p != -1 and no c-transition at p:
next[p][c] = cur; p = link[p]
if p == -1: link[cur] = root
else:
q = next[p][c]
if len[q] == len[p] + 1: link[cur] = q
else: clone q -> clone (len = len[p]+1, copy next & link),
redirect c-edges p..->q to clone,
link[q] = link[cur] = clone
last = cur
# whole build: amortized O(n)
Visual Animation¶
See
animation.htmlfor an interactive visualization.The animation demonstrates: - Adding characters one at a time via
extend, with each new state showing itslen. - The new transitions created as we walk suffix links fromlast. - The suffix links drawn as a tree growing alongside the automaton. - The clone/split event highlighted when it happens, showing how one state divides into two. - Play / Pause / Step controls and an editable string so you can build the SAM of any input.
Summary¶
A suffix automaton is the minimal DFA accepting all suffixes — and therefore recognizing all substrings — of a string s. Its states are the endpos equivalence classes of substrings; each state stores len (the longest string it holds) and a suffix link to the class of its longest differing proper suffix. The suffix links form a tree. The automaton is tiny (≤ 2n-1 states, ≤ 3n-4 transitions) and is built online in O(n) by extend-ing one character at a time, with the only subtlety being the clone/split case. Once built, you check substrings in O(|p|), count distinct substrings by the one-line sum Σ (len[v] - len[link[v]]), count occurrences via endpos sizes aggregated up the link tree, and find the longest common substring of two strings by running one through the other's SAM. Master extend and the endpos idea, and a whole family of substring problems becomes linear-time.
Further Reading¶
- cp-algorithms.com — "Suffix Automaton" (the canonical reference implementation and proofs).
- Blumer et al. (1985) — the original DAWG / suffix automaton construction.
- Crochemore & Hancart — Automata for Matching Patterns, on factor and suffix automata.
- Sibling topic
04-suffix-array— the array view of all suffixes (sorted order, LCP). - Sibling topic
13-suffix-tree— the tree view; SAM is its automaton-minimization cousin. - Sibling topic
11-trie— the simpler prefix-sharing structure the SAM generalizes. middle.md,senior.md,professional.md— endpos classes, production concerns, and full proofs.