Go Functions Basics — Optimize¶
Instructions¶
Each exercise presents a slow, allocation-heavy, or otherwise wasteful use of functions. Identify the issue, write an optimized version, and explain the improvement. Always benchmark before and after — `go test -bench` is your friend. Difficulty: 🟢 Easy, 🟡 Medium, 🔴 Hard.
Exercise 1 🟢 — Indirect Call in a Hot Loop¶
Problem: A function passes a callback through a function-typed parameter and is called millions of times per second.
```go
func sumWith(xs []int, transform func(int) int) int {
	total := 0
	for _, x := range xs {
		total += transform(x)
	}
	return total
}

func double(x int) int { return x * 2 }

// In hot path:
// _ = sumWith(data, double)
```
Question: Why might this be slower than necessary, and how do you fix it without changing the API?
Solution
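A sketch of the two fixes discussed in this solution — a direct version and a generic variant (`sumDoubled` and `sumWithG` are illustrative names):

```go
package main

import "fmt"

// Direct version: the compiler can inline x * 2 straight into the loop.
func sumDoubled(xs []int) int {
	total := 0
	for _, x := range xs {
		total += x * 2
	}
	return total
}

// Generic variant keeping a flexible API; whether the call devirtualizes
// depends on the compiler version and how F is instantiated.
func sumWithG[F ~func(int) int](xs []int, transform F) int {
	total := 0
	for _, x := range xs {
		total += transform(x)
	}
	return total
}

func main() {
	data := []int{1, 2, 3, 4}
	fmt.Println(sumDoubled(data))                                 // 20
	fmt.Println(sumWithG(data, func(x int) int { return x * 2 })) // 20
}
```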
**Issue**: Each call to `transform` is an **indirect call** through a function value. The compiler cannot inline `double` because it doesn't know what `transform` is at compile time. Each iteration costs ~3-5 cycles for the indirect call instead of 0-1 for an inlined `x * 2`.

**Optimization** — write a direct version when the transform is fixed.

**Benchmark** (1M ints):

- `sumWith` (indirect callback): ~1.4 ms
- `sumDoubled` (direct): ~0.4 ms (~3.5×)

**When the API must stay generic** — devirtualize at the call site by storing the concrete function in a typed variable the compiler can track, or use generics (Go 1.18+). Go 1.21+ devirtualizes some interface and function-typed calls when the concrete type is statically knowable, and PGO (`go build -pgo=...`) extends this to hot indirect calls.

**Key insight**: Function-typed parameters are great for flexibility but defeat inlining. In hot paths, prefer direct calls or generics.

Exercise 2 🟢 — Closure Allocation in a Loop¶
Problem: A function is built fresh each iteration of a tight loop.
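The situation described looks something like this (a reconstruction — `run` and `sink` are illustrative names for an escaping consumer):

```go
package main

import "fmt"

var sink func() int // closures stored here escape to the heap

func run(f func() int) { sink = f }

func processAll(xs []int) {
	for _, x := range xs {
		run(func() int { return x * 2 }) // fresh closure every iteration
	}
}

func main() {
	processAll([]int{1, 2, 3})
	fmt.Println(sink()) // 6: the last closure captured x = 3
}
```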
Question: What is the cost, and how do you fix it?
Solution
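A sketch of both fixes discussed in this solution, assuming the consumer can take the value as a plain argument (`runWith`, `sinkVal`, and `double` are illustrative names):

```go
package main

import "fmt"

var sinkVal int

// runWith takes the data as an argument, so the callback need not
// capture anything per iteration.
func runWith(f func(int) int, x int) { sinkVal = f(x) }

// Lifted outside the loop: a single function value for all iterations.
var double = func(x int) int { return x * 2 }

func processAll(xs []int) {
	for _, x := range xs {
		runWith(double, x) // no per-iteration closure allocation
	}
}

func main() {
	processAll([]int{1, 2, 5})
	fmt.Println(sinkVal) // 10
}
```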
**Issue**: Each iteration creates a new closure value capturing `x`. If the closure escapes to `run`, it heap-allocates; even when it doesn't escape, the per-iteration allocation pressure shows up in benchmarks. Run `go build -gcflags="-m"` to confirm the escapes.

**Optimization** — pass the value as an argument instead of capturing it. Even better, lift the function literal outside the loop so it is created only once.

**Benchmark** (1M iterations, closure escapes):

- Per-iteration closure: ~80 ns/op, 24 B/op, 1 alloc/op
- Lifted closure with arg: ~10 ns/op, 0 B/op, 0 allocs/op

**Key insight**: A closure capturing per-iteration data forces an allocation per iteration when it escapes. Capture nothing — pass data as arguments — and lift the closure outside the loop.

Exercise 3 🟢 — defer in a Tight Loop¶
Problem: A `mu.Lock` / `mu.Unlock` pair is wrapped with `defer` inside a million-iteration loop.
```go
func bumpAll(items []int, mu *sync.Mutex, m map[int]int) {
	for _, k := range items {
		mu.Lock()
		defer mu.Unlock() // BUG
		m[k]++
	}
}
```
Question: There are two issues here. What are they, and how do you fix them?
Solution
**Issues**:

1. `defer mu.Unlock()` runs at **function exit**, not at the end of each iteration. After the first iteration the mutex stays locked, so the second iteration deadlocks.
2. Even if the loop didn't deadlock (hypothetically), `defer` in a loop costs ~50 ns/iter and prevents the open-coded defer optimization.

**Fix** — explicit unlock per iteration, or split the body into a helper:

```go
// Option A: explicit unlock
func bumpAll(items []int, mu *sync.Mutex, m map[int]int) {
	for _, k := range items {
		mu.Lock()
		m[k]++
		mu.Unlock()
	}
}

// Option B: helper function
func bumpAll(items []int, mu *sync.Mutex, m map[int]int) {
	for _, k := range items {
		bump(mu, m, k)
	}
}

func bump(mu *sync.Mutex, m map[int]int, k int) {
	mu.Lock()
	defer mu.Unlock() // open-coded defer; ~1 ns
	m[k]++
}
```
Exercise 4 🟡 — Returning a Pointer Forces Heap Allocation¶
Problem: A constructor returns a pointer to a small struct.
```go
type Point struct{ X, Y float64 }

func newPoint(x, y float64) *Point {
	return &Point{X: x, Y: y}
}

// In hot path:
// for i := 0; i < N; i++ {
//	p := newPoint(float64(i), float64(i*2))
//	consume(p)
// }
```
Question: Where does the Point live, and is there a cheaper alternative?
Solution
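A sketch of the value-returning constructor this solution describes (`newPointV` is an illustrative name):

```go
package main

import "fmt"

type Point struct{ X, Y float64 }

// newPointV returns the Point by value: it does not escape, so it
// lives in the caller's frame (or registers) instead of the heap.
func newPointV(x, y float64) Point {
	return Point{X: x, Y: y}
}

func main() {
	// Hot path: zero allocations per iteration.
	var sum float64
	for i := 0; i < 3; i++ {
		p := newPointV(float64(i), float64(i*2))
		sum += p.X + p.Y
	}
	fmt.Println(sum) // 0+0 + 1+2 + 2+4 = 9
}
```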
**Where it lives**: `&Point{...}` escapes to the heap because the function returns a pointer. Each call allocates ~16 bytes plus GC tracking work. Verify with `go build -gcflags="-m"`.

**Optimization** — return a value when the type is small (up to roughly 64 bytes). The `Point` then lives on the caller's stack, and Go's register-based ABI passes and returns it in registers (on amd64, the X0 and X1 registers for the two `float64` fields).

**Benchmark** (1M iterations):

- Return `*Point`: ~30 ns/op, 16 B/op, 1 alloc/op
- Return `Point`: ~3 ns/op, 0 B/op, 0 allocs/op (~10×)

**When to keep the pointer return**: when callers must mutate the shared struct, when the type embeds a mutex/lock, or when the struct is large enough that copying would dominate.

**Key insight**: "Pointer = fast, value = slow" is wrong for small types. The register ABI makes value-typed returns very cheap; returning pointers forces heap allocations.

Exercise 5 🟡 — Boxing Through interface{}¶
Problem: A logging helper takes `any` and is called in the hot path.
```go
func log(msg string, fields ...any) {
	// ... format msg with fields ...
	_ = msg
	_ = fields
}

// Hot path:
// for _, x := range data {
//	log("processed", x.ID, x.Score)
// }
```
Question: Where do allocations come from?
Solution
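A sketch of a typed API matching the names used in the benchmark figures of this solution (the `LogFields` layout is an assumption):

```go
package main

import "fmt"

// LogFields carries the hot-path fields with concrete types, so no
// value is boxed into an interface and no variadic slice is built.
type LogFields struct {
	ID    int
	Score float64
}

func logTyped(msg string, f LogFields) {
	// ... format msg with f.ID, f.Score ...
	_ = msg
	_ = f
}

func main() {
	logTyped("processed", LogFields{ID: 42, Score: 0.9})
	fmt.Println("ok")
}
```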
**Issue**: Passing a value through `any` (an alias for `interface{}`) **boxes** non-pointer values. Each `int`, `float64`, `bool`, etc. is copied to the heap and wrapped in an interface header (typically 8-16 B for scalars, modulo the runtime's small-value caches). For `log("processed", x.ID, x.Score)` with two ints, that's 2 boxing allocations + 1 slice allocation for the variadic `fields` = 3 allocs per call.

**Optimization** — typed APIs for hot paths, or a structured logger like `zap` / `zerolog` that avoids reflection for typed fields.

**Benchmark** (1M iterations):

- `log("processed", id, score)` via `any`: ~95 ns/op, 64 B/op, 3 allocs/op
- `logTyped("processed", LogFields{...})`: ~12 ns/op, 0 B/op, 0 allocs/op

**Key insight**: `any` parameters are extremely flexible but force boxing for non-pointer values. For high-frequency call sites, typed APIs eliminate the hidden allocations.

Exercise 6 🟡 — Method Value Allocation in a Loop¶
Problem: Inside a loop, the code passes a method value as a callback.
```go
type Handler struct{ counter int }

func (h *Handler) Process(x int) { h.counter += x }

func runAll(h *Handler, data []int, fn func(int)) {
	for _, x := range data {
		fn(x)
	}
}

// Caller:
// h := &Handler{}
// for i := 0; i < N; i++ {
//	runAll(h, data, h.Process)
// }
```
Question: What allocates and how do you fix it?
Solution
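A sketch of both fixes discussed in this solution; `runAllExpr` is an illustrative variant whose callback takes the receiver explicitly so the method expression can be passed without binding:

```go
package main

import "fmt"

type Handler struct{ counter int }

func (h *Handler) Process(x int) { h.counter += x }

func runAll(h *Handler, data []int, fn func(int)) {
	for _, x := range data {
		fn(x)
	}
}

// Variant whose callback takes the receiver explicitly, so the caller
// can pass the method expression (*Handler).Process — no funcval binding.
func runAllExpr(h *Handler, data []int, fn func(*Handler, int)) {
	for _, x := range data {
		fn(h, x)
	}
}

func main() {
	data := []int{1, 2, 3}
	h := &Handler{}

	// Fix 1: bind the method value once, outside the hot loop.
	process := h.Process
	for i := 0; i < 2; i++ {
		runAll(h, data, process)
	}

	// Fix 2: method expression, receiver passed explicitly.
	runAllExpr(h, data, (*Handler).Process)

	fmt.Println(h.counter) // 3 passes over {1,2,3} => 18
}
```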
**Issue**: `h.Process` is a **method value**. Creating a method value allocates a small `funcval` that captures the receiver (`h`), and the binding happens on **every iteration** of the outer loop because `h.Process` is re-evaluated at each call. Verify with `go build -gcflags="-m"`.

**Optimization** — bind the method value once outside the loop, or use a method expression and pass the receiver explicitly (no per-call binding).

**Benchmark** (1M outer iterations × 100 inner):

- `h.Process` re-bound each iteration: ~120 ns/outer-op, 16 B/outer-op, 1 alloc/outer-op
- Bound once outside: ~110 ns/outer-op, 0 allocs
- Method expression: ~108 ns/outer-op, 0 allocs

**Key insight**: Each binding of a method value is a tiny allocation. In hot loops, bind once, or use a method expression.

Exercise 7 🟡 — Variadic Slice Allocation¶
Problem: A variadic function is called with no args inside a loop.
```go
func event(name string, tags ...string) {
	// ... emit ...
	_ = name
	_ = tags
}

// Hot path:
for i := 0; i < N; i++ {
	event("tick")
}
```
Question: Does this allocate?
Solution
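The escape behavior can be checked on a small program like this (a sketch — `eventHold` and `last` are illustrative names; inspect it with `go build -gcflags="-m"`):

```go
package main

import "fmt"

var last []string // long-lived storage

func event(name string, tags ...string) {
	_ = name
	_ = tags
}

// eventHold retains the variadic slice, so its backing array must be
// heap-allocated: it outlives the call.
func eventHold(name string, tags ...string) {
	last = tags
	_ = name
}

func main() {
	event("tick")               // nil slice passed: no allocation
	event("tick", "a", "b")     // small slice; may stay on the caller's stack
	eventHold("tick", "a", "b") // slice escapes into `last`

	reuse := []string{"x", "y"}
	event("tick", reuse...) // spread: reuses the existing slice
	fmt.Println(last)       // [a b]
}
```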
**Surprisingly**: when called with **no variadic arguments**, Go passes a `nil` slice — no allocation. When called with **a few arguments**, the compiler builds a small slice whose backing array can stay on the **caller's stack** as long as escape analysis proves it doesn't escape. **The allocation appears** when the slice escapes: stored in a long-lived field, captured by an escaping closure, sent on a channel, and so on.

**Optimization** — when you already have a slice, pass it with the spread operator `...` so no new slice is built. But beware: the callee may modify or hold references to the passed slice.

**Key insight**: Variadic parameters are not inherently allocating. They allocate only when the slice's backing array escapes. Read the `-gcflags="-m"` output to verify.

Exercise 8 🔴 — Inlining Blocked by defer¶
Problem: A small function with defer won't inline, even though it looks tiny.
```go
func tryUpdate(m map[string]int, k string, v int) {
	defer func() {
		if r := recover(); r != nil {
			// log
		}
	}()
	m[k] = v
}
```
Question: Why doesn't this inline, and how do you fix it?
Solution
**Issue**: Functions containing `defer` were historically not inlinable. The inliner has grown more permissive in recent releases (roughly Go 1.20-1.22), but a `defer` of a closure that calls `recover` still typically blocks inlining. Verify:

```shell
go build -gcflags="-m -m" .
# cannot inline tryUpdate: function too complex: cost X exceeds budget 80
# (or specifically: contains a defer)
```
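One possible fix (a sketch): keep the hot path free of `defer`/`recover` so it can inline, and handle the realistic panic source — assignment to a nil map — explicitly; if `recover` is genuinely required, isolate it in a wrapper so only the wrapper pays the inlining penalty (`safeUpdate` is an illustrative name):

```go
package main

import "fmt"

// Hot path: no defer, so it is trivially inlinable.
func tryUpdate(m map[string]int, k string, v int) bool {
	if m == nil {
		return false // write to a nil map would panic; handled explicitly
	}
	m[k] = v
	return true
}

// Rare path: the defer/recover lives here, away from the hot function.
func safeUpdate(m map[string]int, k string, v int) {
	defer func() {
		if r := recover(); r != nil {
			// log
		}
	}()
	m[k] = v
}

func main() {
	m := map[string]int{}
	fmt.Println(tryUpdate(m, "a", 1), m["a"]) // true 1
	fmt.Println(tryUpdate(nil, "a", 1))       // false
	safeUpdate(nil, "a", 1)                   // panics internally, recovers
}
```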
Exercise 9 🔴 — return &local Allocates Where a Caller Stack Frame Could Suffice¶
Problem:
```go
func newPair(a, b int) *[2]int {
	return &[2]int{a, b}
}

// Hot path:
for i := 0; i < N; i++ {
	p := newPair(i, i+1)
	consume(p)
}
```
Question: Verify this allocates, then propose a non-allocating alternative.
Solution
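A sketch of both variants this solution describes (`newPairV` and `fillPair` are illustrative names):

```go
package main

import "fmt"

// newPairV returns by value: 16 B, well within the register ABI budget,
// so no heap allocation.
func newPairV(a, b int) [2]int {
	return [2]int{a, b}
}

// fillPair writes into caller-owned storage, reusable across iterations.
func fillPair(dst *[2]int, a, b int) {
	dst[0], dst[1] = a, b
}

func main() {
	p := newPairV(1, 2)

	var q [2]int
	for i := 0; i < 3; i++ {
		fillPair(&q, i, i+1) // same backing storage every iteration
	}
	fmt.Println(p, q) // [1 2] [2 3]
}
```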
**Verification**: `go build -gcflags="-m"` reports that `&[2]int{...}` escapes to the heap. Per call: ~16 B (2 × 8-byte ints) on the heap.

**Optimization** — return a value. `[2]int` is 16 B, well within the register ABI budget, so it is returned in registers with no allocation.

**Benchmark** (1M iters):

- Return `*[2]int`: ~30 ns/op, 16 B/op, 1 alloc/op
- Return `[2]int`: ~3 ns/op, 0 B/op, 0 allocs/op

**When the caller follows a sink pattern**, the caller can also pre-allocate and have the function fill a pointer in. This avoids both the heap allocation and the return-value copy when `consume` doesn't need a fresh value each iteration.

**Key insight**: Returning a pointer to a freshly constructed value forces a heap allocation. For small fixed-size types, return by value. For caller-controlled lifetime, take a pointer parameter to fill in.

Exercise 10 🔴 — PGO-Sensitive Function¶
Problem: A function is called via an interface in 99% of invocations from a single concrete type, but the compiler treats every call as fully indirect.
```go
type Doer interface {
	Do(int) int
}

type RealDoer struct{ scale int }

func (r *RealDoer) Do(x int) int { return x * r.scale }

func runMany(d Doer, xs []int) int {
	total := 0
	for _, x := range xs {
		total += d.Do(x) // indirect interface call
	}
	return total
}

// In production, called with *RealDoer 99% of the time:
// _ = runMany(real, data)
```
Question: How do you tell the compiler about the dominant concrete type so calls get inlined?
Solution
**Optimization 1 — PGO (Profile-Guided Optimization, Go 1.21+)**:

1. Capture a CPU profile from production (e.g. via `net/http/pprof` or `runtime/pprof`).
2. Save the profile as `default.pgo` next to `main.go`.
3. Build with PGO (`go build -pgo=auto`, the default when `default.pgo` is present).

The compiler sees that `d.Do` is dominantly `(*RealDoer).Do` and **devirtualizes** the call — inlining `RealDoer.Do` directly, with a type-check fallback for other implementations.

**Optimization 2 — Manual specialization** (no PGO):

```go
func runManyReal(d *RealDoer, xs []int) int {
	total := 0
	for _, x := range xs {
		total += d.Do(x) // direct call; inlinable
	}
	return total
}

func runMany(d Doer, xs []int) int {
	if r, ok := d.(*RealDoer); ok {
		return runManyReal(r, xs) // fast path
	}
	// generic fallback
	total := 0
	for _, x := range xs {
		total += d.Do(x)
	}
	return total
}
```
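The PGO workflow from Optimization 1, as commands (a sketch — the pprof endpoint and 30-second window are illustrative assumptions):

```shell
# 1. Capture a CPU profile from a production instance
#    (assumes net/http/pprof is serving on :6060 — illustrative)
curl -o cpu.pprof "http://localhost:6060/debug/pprof/profile?seconds=30"

# 2. Place it where the go tool looks by default
mv cpu.pprof default.pgo

# 3. Rebuild; -pgo=auto (the default since Go 1.21) picks up default.pgo
go build -pgo=auto ./...

# Inspect the compiler's devirtualization/inlining decisions
go build -pgo=auto -gcflags="-m=2" ./... 2>&1 | grep -i devirt
```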
Bonus Exercise 🔴 — Verify Inlining of a Hot Function¶
Problem: You wrote a small helper and want to confirm it inlines:
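A representative `clamp` helper (a sketch — any small, branch-only function behaves the same):

```go
package main

import "fmt"

// clamp limits v to the range [lo, hi] — small enough to inline.
func clamp(v, lo, hi int) int {
	if v < lo {
		return lo
	}
	if v > hi {
		return hi
	}
	return v
}

func main() {
	fmt.Println(clamp(15, 0, 10)) // 10
	fmt.Println(clamp(-3, 0, 10)) // 0
	fmt.Println(clamp(5, 0, 10))  // 5
}
```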
Task: Show the commands and output that prove `clamp` inlines into its callers.
Solution
```shell
# Step 1: see inlining decisions
go build -gcflags="-m -m" 2>&1 | grep clamp
# Expected output (truncated):
# ./main.go:N:6: can inline clamp with cost X as: ...
# ./main.go:M:N: inlining call to clamp

# Step 2: confirm in the generated assembly
go build -gcflags="-S" 2>asm.txt
# Look at the caller's body — clamp's instructions appear inline,
# with no CALL to "main.clamp".
```