SSA Backend — Senior¶
At this tier you should hold a correct mental model of the SSA backend: how the IR becomes SSA, the shape of the pass pipeline, how rewrite rules and rulegen work, where machine-independent optimization ends and lowering begins, and a working understanding of register allocation. The goal is to reason about codegen without dumping HTML for every question.
1. The SSA data model¶
Three types in cmd/compile/internal/ssa carry everything:
ssa.Func(func.go) — one compiled function. Owns the blocks, an entry block, and the*Configdescribing the target architecture.ssa.Block(block.go) — a basic block: a list ofValues, aKind(BlockPlain,BlockIf,BlockRet, arch branch kinds…), and successor edges. Control flow lives in the block's kind andControls.ssa.Value(value.go) — a single SSA value: anOp, a*types.Type, anAuxInt/Aux(immediates, symbols), and anArgs []*Valuelist. Each value has a uniqueID. This is the "defined once" object.
Ops are an enum (Op, generated into opGen.go). Generic ops (Add64, Load, IsInBounds, Phi, NilCheck) are architecture-neutral; arch ops (AMD64ADDQ, ARM64MOVDload) appear only after lowering. Phi is just an op whose args line up with its block's predecessor edges.
2. Construction: IR → SSA (ssagen)¶
cmd/compile/internal/ssagen/ssa.go walks the typed IR (ir.Node tree) and emits SSA. It builds blocks, threads values, and — crucially — handles the SSA-construction problem of placing phi functions and renaming variables. Go uses a variant of the classic dominance-frontier construction (Cytron et al.): a variable assigned in multiple blocks needs a phi in every block on the dominance frontier of those definitions. Dominator computation lives in dom.go. After this pass the function is valid, unoptimized SSA — what you see in the start column.
3. The pass pipeline¶
The ordered list is the passes slice in cmd/compile/internal/ssa/compile.go. Passes marked required: true run always; others can be disabled. The pipeline splits cleanly into machine-independent optimization (before lower) and machine-dependent lowering + scheduling + allocation (from lower on):
number lines → early phielim/copyelim → early deadcode → decompose user
opt (generic rewrite rules) → zcse → deadcode
generic cse → phiopt → nilcheckelim → prove ← BCE / range facts
expand calls → late opt → sccp → deadcode
branchelim → check bce → dse → memcombine → writebarrier
--- machine boundary ---
lower (generic → arch ops) → addressing modes → late lower → pair
lowered cse → tighten tuple selectors → checkLower
late phielim/copyelim → tighten → critical (split critical edges)
layout (schedule blocks) → schedule (order values)
flagalloc → regalloc → loop rotate → trim → stackalloc → ...
(Names are verbatim from the source; the list evolves between releases — always read compile.go for the version you ship.) Key passes to know by name:
| Pass | File | Purpose (one line) |
|---|---|---|
opt / late opt | rewrite*.go | apply generic rewrite rules: fold, simplify, strength-reduce |
generic cse | cse.go | merge identical pure values |
nilcheckelim | nilcheck.go | drop redundant nil checks |
prove | prove.go | derive integer ranges → bounds-check elimination |
sccp | sccp.go | sparse conditional constant propagation |
deadcode | deadcode.go | remove unreachable/unused values & blocks |
lower | generated rewrite<ARCH>.go | rewrite generic ops to arch ops |
schedule | schedule.go | linearize values within each block |
regalloc | regalloc.go | assign registers + spill/stack slots |
critical | critical.go | split critical edges so phis are placeable |
4. Rewrite rules and rulegen¶
Most optimization and all lowering is expressed as rewrite rules, not hand-written Go. Rules live in cmd/compile/internal/ssa/_gen/:
generic.rules— machine-independent rewrites (folding, simplification, BCE).<ARCH>.rules(e.g.AMD64.rules,ARM64.rules) — lowering of generic ops to that architecture's ops.genericOps.go/<ARCH>Ops.go— the op definitions (arg count, types, commutativity, flags).
Rule syntax is (pattern) [&& cond] => (replacement):
(Add64 (Const64 [c]) (Const64 [d])) => (Const64 [c+d])
(Mul64 (Const64 [c]) (Const64 [d])) => (Const64 [c*d])
(Sub64 x (Const64 <t> [c])) && x.Op != OpConst64 => (Add64 (Const64 <t> [-c]) x)
Lowercase identifiers (x, c) bind subtrees/AuxInts; <t> binds the type; && adds a Go boolean guard; _ matches anything. The program rulegen (_gen/main.go + rulegen.go) reads these .rules files and generates the Go matcher functions rewriteValuegeneric.go, rewriteValueAMD64.go, etc., which the opt/lower passes call. You never edit the generated files; you edit the .rules and run go generate in the ssa directory.
5. Lowering: the machine boundary¶
Up to lower, the function speaks generic ops with no notion of registers or addressing modes. The lower pass applies the <ARCH>.rules to replace each generic op with one or more arch ops:
Lowering also collapses address arithmetic into the target's addressing modes (addressing modes pass): a Load (ADDQ ptr (SHLQconst idx)) becomes a single MOVQloadidx8. After lowering, every value maps to (roughly) one machine instruction, which is what makes scheduling and register allocation possible.
6. Register allocation (overview)¶
regalloc.go assigns each value a physical register, inserting spills to the stack when registers run out. Go uses a fast, linear-scan-style allocator (not a graph-coloring allocator) tuned for compile speed:
- Values flagged rematerializable (e.g. a constant) are recomputed instead of spilled — cheaper than a load.
flagallocruns first to handle the condition-flags "register," which is special because most arch ops clobber flags.- The allocator respects calling-convention registers and clobbers around calls.
phivalues are resolved here into moves on the incoming edges (which is whycriticalmust split critical edges first — there must be a block to put the move in).
After regalloc come stack-frame layout, final deadcode/trim, and the obj writer that emits the actual machine encoding.
7. Why SSA enables all of this¶
Every pass above leans on the single-assignment invariant:
- Folding/CSE: equal ops with equal args are provably equal — no aliasing of names to worry about.
- Prove/BCE: a def has exactly one source, so range facts propagate cleanly down dominators.
- Dead code: a value with zero uses is trivially dead; use-counts are exact.
- Register allocation: live ranges are precise because each value has one def and well-defined uses.
The cost is phi management and edge splitting — a price the backend pays once at construction and unwinds before codegen.
Further reading¶
compile.go(thepasseslist) andREADME.mdin$(go env GOROOT)/src/cmd/compile/internal/ssa/rulegenand rules:_gen/main.go,_gen/generic.rules,_gen/AMD64.rules- Construction:
cmd/compile/internal/ssagen/ssa.go; dominators:ssa/dom.go - Cytron et al., "Efficiently Computing Static Single Assignment Form" (1991)
- Keith Randall, "Go's SSA backend" talk & design notes: https://go.dev/wiki/CompilerOptimizations