Go Nil Pointer Dereference — Optimize¶
Instructions¶
Each exercise focuses on the cost of nil checks, opportunities the compiler takes to eliminate them, and patterns that minimize both panic risk and runtime cost. Difficulty: Easy, Medium, Hard.
Exercise 1 (Easy) — Cost of an Explicit Nil Check¶
Problem:
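The exercise's code snippet is elided in this copy; a minimal stand-in showing the pattern under discussion (the `Node` type and names are illustrative, not from the original):

```go
package main

import "fmt"

type Node struct{ value int }

// valueOf returns the node's value, or a fallback for nil.
// The explicit nil check compiles to a single TEST plus a
// conditional branch.
func valueOf(n *Node, fallback int) int {
	if n == nil {
		return fallback
	}
	return n.value
}

func main() {
	fmt.Println(valueOf(&Node{value: 42}, -1)) // 42
	fmt.Println(valueOf(nil, -1))              // -1
}
```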
Question: How does the compiler implement the nil check? What is the cost?
Solution
**Discussion**: The check compiles to a single `TEST` instruction plus a conditional branch. Cost: ~1 cycle for the `TEST`, plus branch-prediction overhead; when one branch dominates, the predictor makes the effective cost approximately zero.

**Verify**: inspect the assembly with `go build -gcflags=-S`.

**Key insight**: An explicit nil check is essentially free in normal code. Don't avoid one for performance reasons.

Exercise 2 (Easy) — Implicit Nil Check on Field Access¶
Problem:
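The exercise's snippet is elided in this copy; a minimal stand-in (the `T` type is illustrative) demonstrating that a small-offset field access has no source-level check, yet a nil receiver still produces a runtime panic:

```go
package main

import "fmt"

type T struct {
	a, b int // b sits at offset 8 — well inside the protected nil page
}

// readB loads t.b with no explicit nil check in the source; if t is
// nil, the load itself faults and the runtime converts the hardware
// trap into a nil-pointer panic, which we recover here to observe it.
func readB(t *T) (v int, err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("recovered: %v", r)
		}
	}()
	return t.b, nil
}

func main() {
	v, err := readB(&T{b: 7})
	fmt.Println(v, err) // 7 <nil>
	_, err = readB(nil)
	fmt.Println(err != nil) // true
}
```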
Question: Does the compiler insert an explicit nil check?
Solution
**Discussion**: For `t.field` where `field` sits at a small offset (well below 64 KB), there is no explicit check: the load itself faults if `t == nil`, because the CPU's MMU enforces the page protection on the unmapped nil page. Cost: 0 instructions for the check; the only cost is the load itself. For very large offsets (beyond the protected region; the compiler is conservative about the exact bound), an explicit check is inserted, since `nil + offset` could reach into mapped memory.

**Verify**: compare the assembly (`go build -gcflags=-S`) for small- and large-offset field accesses.

**Key insight**: Most field accesses get free nil checking via the MMU. The "check" is just letting the load trap.

Exercise 3 (Easy) — Redundant Check Elimination¶
Problem:
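The exercise's snippet is elided in this copy; a minimal stand-in (the `P` type is illustrative) with one guard followed by three field accesses:

```go
package main

import "fmt"

type P struct{ x, y, z int }

// sum3 has one source-level nil check; the SSA nilcheckelim pass
// proves p is non-nil after the guard, so the three loads need no
// further checks.
func sum3(p *P) int {
	if p == nil {
		return 0
	}
	return p.x + p.y + p.z
}

func main() {
	fmt.Println(sum3(&P{x: 1, y: 2, z: 3})) // 6
	fmt.Println(sum3(nil))                  // 0
}
```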
Question: How many nil checks does the compiler emit?
Solution
**Discussion**: One check, at the top. The SSA pass `nilcheckelim` proves that after the `if p == nil` guard, `p` is non-nil, and all three field accesses inherit that fact. Even the implicit checks are elided where possible; on most platforms the load itself still acts as the hardware-level safety net, and the compiler relies on this.

**Verify**: compile with `go build -gcflags=-d=nil`; you should see "removed nil check" lines.

**Key insight**: The compiler's `nilcheckelim` pass eliminates redundant checks. Write clear code; the compiler handles the optimization.

Exercise 4 (Medium) — When Manual Check Beats Compiler¶
Problem:
func process(items []*Item) int {
	sum := 0
	for _, it := range items {
		// do work using it
		sum += it.Value
		sum += it.Count
		sum += it.Total
	}
	return sum
}
Question: Could the slice contain nil entries? Should you check?
Solution
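A runnable sketch of the checked loop discussed below (assuming an `Item` type with the three fields from the problem):

```go
package main

import "fmt"

type Item struct{ Value, Count, Total int }

// processChecked skips nil entries; after the guard the compiler can
// prove it != nil for all three field accesses in the loop body.
func processChecked(items []*Item) int {
	sum := 0
	for _, it := range items {
		if it == nil {
			continue
		}
		sum += it.Value
		sum += it.Count
		sum += it.Total
	}
	return sum
}

func main() {
	items := []*Item{{Value: 1, Count: 2, Total: 3}, nil, {Value: 4}}
	fmt.Println(processChecked(items)) // 10
}
```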
**Discussion**: If the slice is allowed to contain nil, every `it.X` panics. The compiler cannot prove `it` is non-nil, because the slice's contents are not statically known.

**Optimization** — filter or check: guard the body with `if it == nil { continue }`. After the check, the compiler proves `it != nil` for the rest of the loop body — no further nil checks for the three field accesses.

**Better optimization** — eliminate nils at the source, so the hot loop never sees them.

**Benchmark** (1M items, 5% nil):
- Loop with check: ~2.5 ns/iter, no panics
- No check: panics on the first nil; throughput undefined

**Key insight**: Manual nil checks restore the compiler's ability to prove non-nilness for subsequent operations in the same block.

Exercise 5 (Medium) — Nil-Safe Method vs Caller Check¶
Problem:
type Logger struct{ w io.Writer }

func (l *Logger) Log(s string) {
	if l == nil {
		return
	}
	fmt.Fprintln(l.w, s)
}

vs

func (l *Logger) Log(s string) {
	fmt.Fprintln(l.w, s) // requires non-nil receiver
}
// Caller does the check.
Question: Which is faster? Which is better?
Solution
**Discussion**: Performance is identical in the common case (non-nil receiver). The check inside `Log` is one `TEST` + `JE`; with the external approach, callers pay the same cost.

**Code-quality difference**:
- Nil-safe method: callers don't need to know whether `Log` permits nil. Cleaner call sites.
- External check: `Log` documents a non-nil precondition; callers are explicit.

**When nil-safe wins**:
- The logger is optional and frequently absent (e.g., test code).
- Many call sites; a per-site check is repetitive.

**When non-safe wins**:
- The receiver should always be valid; nil indicates a bug.
- A performance-critical inner loop where even one `TEST` per call counts (rare).

**Benchmark** (1M calls, mostly non-nil):
- Both approaches: ~3 ns/op (essentially identical)

**Key insight**: Choose based on API style, not performance. The cost is negligible either way.

Exercise 6 (Medium) — Inline Allocation vs Pointer¶
Problem:
type Container struct {
	item *Item
}

func process(c *Container) int {
	if c.item == nil {
		return 0
	}
	return c.item.value
}

vs

type Container struct {
	item    Item
	hasItem bool
}

func process(c *Container) int {
	if !c.hasItem {
		return 0
	}
	return c.item.value
}
Question: Which is more efficient? Memory layout?
Solution
**Discussion**:
- Pointer version: 8 bytes for the pointer, plus a separate heap allocation for the Item.
- Inline version: sizeof(Item) + 1 byte for `hasItem` (plus alignment padding), all in one allocation.

For small Item types (a few words), inline wins:
- A single allocation per Container.
- Better cache locality.
- No pointer dereference.

For large Items, pointer wins when there are many Containers and the Item is rarely populated: sparse Items don't waste memory.

**Cost comparison** (1M Containers, 50% populated, Item is 8 bytes):
- Pointer version: ~8 MB of pointers plus ~0.5M separate allocations (~16 bytes each after size-class rounding) ≈ 16 MB, with per-object GC overhead for every nested Item.
- Inline version: 16 bytes per Container (8-byte Item + bool + padding) = 16 MB total, one allocation per Container.

**Access cost**:
- Pointer: load the item pointer, branch, then load `item.value` — an extra dependent load.
- Inline: load `hasItem`, branch, then load `item.value` — `item` sits at a known offset, no extra indirection.

**Key insight**: Inline is cheaper for small, frequently-populated optional fields. Pointer is cheaper for large, sparse fields.

Exercise 7 (Medium) — Avoiding Nil Slices in Hot Paths¶
Problem:
func sum(xs []int) int {
	if xs == nil {
		return 0
	}
	total := 0
	for _, x := range xs {
		total += x
	}
	return total
}
Question: Is the nil check needed?
Solution
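The check-free version discussed below, runnable, showing that ranging over a nil slice is safe:

```go
package main

import "fmt"

// sum needs no nil check: a nil slice has length 0, so the
// range loop simply runs zero times.
func sum(xs []int) int {
	total := 0
	for _, x := range xs {
		total += x
	}
	return total
}

func main() {
	fmt.Println(sum(nil))            // 0
	fmt.Println(sum([]int{1, 2, 3})) // 6
}
```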
**Discussion**: No. A nil slice has length 0, and `range` over a nil slice iterates zero times, so the early return is functionally equivalent to falling through.

**Optimization**: delete the check. For nil `xs` the function still returns 0 — same as the original.

**Benchmark** (1M calls with a mix of nil and populated slices):
- With check: 2.0 ns/op for nil; 200 ns/op for a 100-element slice.
- Without check: 1.5 ns/op for nil (the loop condition fails immediately); 200 ns/op for a 100-element slice.

**Key insight**: Nil slices are valid for read operations. Don't add checks the language already handles.

Exercise 8 (Hard) — Recover Cost in Hot Path¶
Problem:
func safeProcess(p *T) (result int, err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("panic: %v", r)
		}
	}()
	return p.Compute(), nil
}
Question: What is the cost of the deferred recover? When does it dominate?
Solution
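A sketch of the "recover at a higher level" pattern discussed below: one defer guards a whole batch instead of one per call (the `T` type and `Compute` method are illustrative):

```go
package main

import "fmt"

type T struct{ v int }

func (t *T) Compute() int { return t.v * 2 }

// processAll recovers once at the batch boundary rather than paying
// a defer per element — one defer setup covers all the calls.
func processAll(ps []*T) (total int, err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("panic: %v", r)
		}
	}()
	for _, p := range ps {
		total += p.Compute() // faults (and is recovered) if p is nil
	}
	return total, nil
}

func main() {
	total, err := processAll([]*T{{v: 1}, {v: 2}})
	fmt.Println(total, err) // 6 <nil>

	_, err = processAll([]*T{{v: 1}, nil})
	fmt.Println(err != nil) // true
}
```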
**Discussion**: `defer` itself has cost (~50 ns in older Go; much less since Go 1.14's open-coded defers). `recover` is essentially free unless a panic is in flight.

**Cost decomposition**:
- defer setup: ~5 ns (open-coded) or ~50 ns (heap-allocated).
- recover with no panic in flight: ~1 ns.
- recover during an actual panic: stack unwinding, on the order of microseconds.

At 1M calls/sec, the defer overhead is 5 ns × 10^6 ≈ 5 ms of CPU per second of wall time per core — small, but it scales linearly with call rate and can dominate truly hot paths.

**Optimization** — avoid defer in hot paths: if an explicit `p == nil` check suffices, no recover is needed.

**Optimization** — recover at a higher level, not per call: one defer covering many calls.

**Benchmark** (1M calls):
- Per-call defer+recover: ~50 ns/op (Go 1.13)
- Per-call defer+recover: ~7 ns/op (Go 1.14+, open-coded)
- No defer: ~3 ns/op

**Key insight**: Recover belongs at boundaries, not per call. Open-coded defers reduce the cost dramatically in modern Go, but it is not zero.

Exercise 9 (Hard) — Compiler-Inserted Check at Large Offset¶
Problem:
type Big struct {
	pad  [70 * 1024]byte // 70 KB
	last int
}

func read(b *Big) int {
	return b.last // offset > 64 KB
}
Question: Why does this case need an explicit nil check?
Solution
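A runnable sketch of the problem's `Big` type: whether the check is explicit (large offset, compiler-inserted) or implicit (small offset, MMU trap), the observable behavior is the same nil-pointer panic, recovered here to demonstrate it:

```go
package main

import "fmt"

type Big struct {
	pad  [70 * 1024]byte // pushes last past the guaranteed-unmapped region
	last int
}

// read triggers a compiler-inserted explicit nil check, because the
// field offset is too large for the unmapped-page trick.
func read(b *Big) (v int, err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("recovered: %v", r)
		}
	}()
	return b.last, nil
}

func main() {
	v, _ := read(&Big{last: 9})
	fmt.Println(v) // 9
	_, err := read(nil)
	fmt.Println(err != nil) // true
}
```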
**Discussion**: The first 64 KB of the address space (or whatever `mmap_min_addr` reserves) is unmapped, so a field at an offset below that bound always falls in the protected region when the pointer is nil — the load itself faults. But for an offset above it (e.g., 70 KB), the address `nil + 70 KB = 0x11800` could land in mapped memory; if another mapping happened to occupy that region, the load would silently succeed and read garbage. The compiler must therefore insert an explicit check before the load.

**Verify**: inspect the assembly with `go build -gcflags=-S`; you'll see the explicit check.

**Key insight**: Large struct-field offsets force explicit checks. Small structs get MMU-level nil checking for free.

Exercise 10 (Hard) — Cache-Friendly Nil Filtering¶
Problem:
func sumValues(items []*Item) int {
	sum := 0
	for _, it := range items {
		if it != nil {
			sum += it.Value
		}
	}
	return sum
}
Question: How can you optimize for branch-prediction with mostly-non-nil data?
Solution
**Discussion**: If `items` rarely contains nil, the branch predictor handles `if it != nil` well. If nils are frequent and unpredictable (say 30%), the branch mispredicts often and costs extra cycles.

**Optimization 1** — pre-filter once, then sum without checks:

filtered := items[:0] // in-place filter, reusing the backing array
for _, it := range items {
	if it != nil {
		filtered = append(filtered, it)
	}
}
// Now sum without checks
sum := 0
for _, it := range filtered {
	sum += it.Value
}

**Key insight**: The filter costs one predictable pass; it pays off when the compacted slice is traversed more than once.
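Put together as a runnable sketch (assuming an `Item` type with a `Value` field; the filter reuses the slice's backing array):

```go
package main

import "fmt"

type Item struct{ Value int }

// compact filters nils in place, reusing the backing array, so the
// summing loop needs no per-element nil check.
func compact(items []*Item) []*Item {
	filtered := items[:0]
	for _, it := range items {
		if it != nil {
			filtered = append(filtered, it)
		}
	}
	return filtered
}

func sumValues(items []*Item) int {
	sum := 0
	for _, it := range items {
		sum += it.Value // safe: compact removed the nils
	}
	return sum
}

func main() {
	items := []*Item{{Value: 1}, nil, {Value: 2}, nil, {Value: 3}}
	fmt.Println(sumValues(compact(items))) // 6
}
```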
Bonus Exercise (Hard) — Profile and Tune Nil-Heavy Code¶
Problem: A service shows high CPU in functions doing chained pointer access. How do you investigate and improve?
Solution
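A minimal harness for the profiling step (the `Node` type and the pointer-chasing loop are hypothetical hot-path stand-ins; the profile file name is arbitrary):

```go
package main

import (
	"fmt"
	"os"
	"runtime/pprof"
)

type Node struct {
	next *Node
	val  int
}

// chainSum is the kind of pointer-chasing hot loop a CPU profile
// would flag for chained pointer access.
func chainSum(head *Node) int {
	sum := 0
	for n := head; n != nil; n = n.next {
		sum += n.val
	}
	return sum
}

func main() {
	f, err := os.Create("cpu.prof")
	if err != nil {
		panic(err)
	}
	defer f.Close()
	if err := pprof.StartCPUProfile(f); err != nil {
		panic(err)
	}
	defer pprof.StopCPUProfile()

	// Build a short chain and exercise the hot path.
	head := &Node{val: 1, next: &Node{val: 2, next: &Node{val: 3}}}
	total := 0
	for i := 0; i < 1_000_000; i++ {
		total += chainSum(head)
	}
	fmt.Println(total / 1_000_000) // 6
	// Inspect afterwards with: go tool pprof cpu.prof
}
```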
**Step 1** — profile with pprof: look for hot lines that include field accesses on possibly-nil pointers.

**Step 2** — check for unnecessary checks: identify checks the compiler couldn't elide.

**Step 3** — restructure:
- Add a single nil guard at function entry; the compiler propagates the fact.
- Replace `*T` chains with embedded structs or value types.
- Pre-filter slices of pointers.

**Step 4** — measure with `go test -bench . -benchmem`: compare allocations before and after.

**Step 5** — verify with assembly (`go build -gcflags=-S`): ensure the inner loop has minimal nil-check overhead.

**Key insight**: Profile first. The compiler is good at nil checks; the wins come from data-structure changes, not micro-optimization of individual checks.

Bonus Exercise 2 (Hard) — nilcheck.go Deep Dive¶
Problem: Read `cmd/compile/internal/ssa/nilcheck.go` in the Go source. What invariants does it preserve?