Go Variadic Functions — Optimize
Instructions
Each exercise presents an inefficient or wasteful use of variadic functions. Identify the issue, write an optimized version, and explain the improvement. Always benchmark before and after. Difficulty: 🟢 Easy, 🟡 Medium, 🔴 Hard.
Exercise 1 🟢 — Pre-allocate Concat Output
Problem: A function concatenates multiple slices using append in a loop.
func concat(groups ...[]int) []int {
	var out []int
	for _, g := range groups {
		out = append(out, g...)
	}
	return out
}
Question: How can you reduce allocations?
Solution
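A minimal sketch of the count-first, allocate-once approach this solution describes. The name `concatPrealloc` is an assumption chosen for illustration; it is not from the original.

```go
package main

import "fmt"

// concatPrealloc (hypothetical name) sums the group lengths first,
// then allocates the result exactly once.
func concatPrealloc(groups ...[]int) []int {
	total := 0
	for _, g := range groups {
		total += len(g)
	}
	out := make([]int, 0, total) // single allocation, exact capacity
	for _, g := range groups {
		out = append(out, g...) // never reallocates: capacity already fits
	}
	return out
}

func main() {
	out := concatPrealloc([]int{1, 2}, nil, []int{3, 4, 5})
	fmt.Println(out, len(out), cap(out)) // [1 2 3 4 5] 5 5
}
```

The extra pass over `groups` only reads slice headers, so it is cheap compared with the repeated copies it avoids.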
**Issue**: `append` on a nil starting slice triggers roughly log2(N) reallocations as the result grows. For 10 groups totaling 100k items, that's about 17 allocations and copies.

**Optimization**: count the total length first, then allocate once with `make([]int, 0, total)` and append each group into it.

**Benchmark** (10 groups × 10k ints):
- Naive: ~250 µs/op, 800 KB/op, 17 allocs/op
- Pre-allocated: ~80 µs/op, 800 KB/op, 1 alloc/op

**This is exactly what `slices.Concat` does** (Go 1.22+), so prefer `slices.Concat(groups...)` where it is available.

**Key insight**: When you know the final size, pre-allocate. `append`'s amortized growth is wasteful when you can predict the total.
Exercise 2 🟢 — Avoid ...any for Typed Logging
Problem: A logging helper takes ...any.
func logf(format string, args ...any) {
	fmt.Printf(format+"\n", args...)
}
// Hot path:
// for _, ev := range events {
//     logf("processed %d items in %dms", ev.Count, ev.DurMs)
// }
Question: What allocations occur, and how do you eliminate them?
Solution
**Issue**: Each `logf` call boxes `ev.Count` and `ev.DurMs` (both `int`) into `any`. For values in the runtime's `staticuint64s` table (0-255) this is free; larger values cost one allocation each.

**Optimization**: provide typed variants:
// fieldType tags which member of Field holds the value.
type fieldType uint8
const (
	tInt64 fieldType = iota
	tStr
)
type Field struct {
	Key   string
	Int64 int64
	Str   string
	Type  fieldType // tInt64 | tStr ...
}
func IntField(k string, v int) Field { return Field{Key: k, Int64: int64(v), Type: tInt64} }
func StrField(k, v string) Field     { return Field{Key: k, Str: v, Type: tStr} }
func info(msg string, fs ...Field) {
	// ... format using typed fields directly ...
}
// Hot path:
for _, ev := range events {
	info("processed", IntField("count", ev.Count), IntField("ms", ev.DurMs))
}
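The body of `info` is elided above. As one possible way to "format using typed fields directly", the sketch below appends each field to a byte buffer with `strconv.AppendInt`, so no value is ever converted to an interface. The helper name `appendLine` and the `key=value` output layout are assumptions made for illustration.

```go
package main

import (
	"fmt"
	"strconv"
)

type fieldType uint8

const (
	tInt64 fieldType = iota
	tStr
)

type Field struct {
	Key   string
	Int64 int64
	Str   string
	Type  fieldType
}

func IntField(k string, v int) Field { return Field{Key: k, Int64: int64(v), Type: tInt64} }
func StrField(k, v string) Field     { return Field{Key: k, Str: v, Type: tStr} }

// appendLine (hypothetical) writes msg and the fields into dst as "msg k=v k=v".
// Values stay in their concrete types, so nothing is boxed.
func appendLine(dst []byte, msg string, fs ...Field) []byte {
	dst = append(dst, msg...)
	for _, f := range fs {
		dst = append(dst, ' ')
		dst = append(dst, f.Key...)
		dst = append(dst, '=')
		switch f.Type {
		case tInt64:
			dst = strconv.AppendInt(dst, f.Int64, 10)
		case tStr:
			dst = append(dst, f.Str...)
		}
	}
	return dst
}

func main() {
	buf := make([]byte, 0, 128) // reusable across calls via buf[:0]
	buf = appendLine(buf, "processed", IntField("count", 1200), IntField("ms", 35))
	fmt.Println(string(buf)) // processed count=1200 ms=35
}
```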
Exercise 3 🟡 — Spread Defensive Copy When Not Needed
Problem: A function defensively copies the spread input even though it only reads it.
func sum(xs ...int) int {
	local := append([]int(nil), xs...) // unnecessary copy
	total := 0
	for _, x := range local {
		total += x
	}
	return total
}
Question: When is the defensive copy needed and when is it wasteful?
Solution
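A short sketch contrasting the two situations this solution discusses: a read-only `sum` that needs no copy, and a consumer that keeps its input and therefore does. The `recorder` type is hypothetical and exists only to show the retaining case.

```go
package main

import "fmt"

// sum only reads xs, so it ranges over the caller's slice directly.
func sum(xs ...int) int {
	total := 0
	for _, x := range xs {
		total += x
	}
	return total
}

// recorder (hypothetical) keeps its input past the call.
type recorder struct{ buf []int }

// store copies because r.buf must not alias the caller's backing array.
func (r *recorder) store(xs ...int) {
	r.buf = append([]int(nil), xs...)
}

func main() {
	data := []int{1, 2, 3}
	fmt.Println(sum(data...)) // 6

	var r recorder
	r.store(data...)
	data[0] = 99          // caller mutates its slice after the call
	fmt.Println(r.buf[0]) // 1: the stored copy is unaffected
}
```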
**Issue**: This function only reads `xs`. The defensive copy allocates a new slice on every call, which is pure waste.

**Optimization**: range over `xs` directly and drop `local`.

**When you DO need a defensive copy**:
- The function stores the slice past the call (`s.buf = xs`).
- The function returns a slice that should be independent of the input.
- The function passes the slice to a goroutine that outlives the call.

**Benchmark** (1k ints per call, 1M calls):
- With unnecessary copy: ~3.5 µs/op, 8 KB/op, 1 alloc/op
- Without copy: ~0.4 µs/op, 0 B/op, 0 allocs/op

**Key insight**: A defensive copy has a real cost. Use it deliberately, only when storing the slice or crossing concurrency boundaries.
Exercise 4 🟡 — Spread vs Literal in a Hot Path
Problem: A hot loop calls a variadic with the same literal args each iteration.
Question: Is the implicit slice constructed N times? How would you avoid that?
Solution
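The problem code is not shown in the text, so the sketch below is a guess at its shape: `perCall` passes the same literals on every iteration, and `hoisted` builds the slice once and spreads it. Both function names and the `sum` callee are assumptions for illustration.

```go
package main

import "fmt"

func sum(xs ...int) int {
	total := 0
	for _, x := range xs {
		total += x
	}
	return total
}

// perCall passes literals, so an implicit []int is constructed for every call.
func perCall(n int) int {
	total := 0
	for i := 0; i < n; i++ {
		total += sum(1, 2, 3, 4, 5)
	}
	return total
}

// hoisted builds the slice once; each call passes the existing slice header.
func hoisted(n int) int {
	args := []int{1, 2, 3, 4, 5}
	total := 0
	for i := 0; i < n; i++ {
		total += sum(args...)
	}
	return total
}

func main() {
	fmt.Println(perCall(1000), hoisted(1000)) // 15000 15000
}
```

Hoisting is only valid while the callee does not modify or retain `args`; otherwise iterations would observe each other's changes.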
**Issue**: Yes, the compiler builds a fresh implicit slice on each call. For literal args this slice is typically stack-allocated, so the cost is small but non-zero (~3 ns per call).

**Optimization**: build the slice once, outside the loop, and pass it with the spread form. The spread form passes the existing slice header, so there is no per-iteration construction.

**Benchmark** (10M iterations, 5 ints):
- Literal each call: ~30 ms total (~3 ns/op)
- Pre-built and spread: ~10 ms total (~1 ns/op)

This is a micro-optimization that matters only when:
- The variadic call is inside a tight loop (>100M calls/sec).
- The args are constant across iterations.

**Key insight**: Hoisting the slice out of the loop converts N implicit constructions into one. For very hot loops this is measurable.
Exercise 5 🟡 — Forwarding Allocates Unnecessarily
Problem: A wrapper rebuilds args instead of forwarding:
Question: What's wrong, and how do you fix it?
Solution
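The wrapper's code is missing from the text, so the following is a sketch under assumptions: `wrapRebuild` guesses at the wasteful shape, `wrapForward` shows direct forwarding, and `wrapNonNil` shows an in-place filter for the case where the wrapper really does transform its arguments. All names, including the stand-in `inner`, are hypothetical.

```go
package main

import "fmt"

// inner stands in for the real variadic callee.
func inner(args ...any) int { return len(args) }

// wrapRebuild copies args into a new slice before forwarding (wasteful).
func wrapRebuild(args ...any) int {
	copied := make([]any, 0, len(args))
	for _, a := range args {
		copied = append(copied, a)
	}
	return inner(copied...)
}

// wrapForward spreads the slice it already holds; no new slice is built.
func wrapForward(args ...any) int { return inner(args...) }

// wrapNonNil drops nil arguments in place, reusing args' backing array.
// Caution: if called as wrapNonNil(s...), this overwrites elements of s.
func wrapNonNil(args ...any) int {
	kept := args[:0]
	for _, a := range args {
		if a != nil {
			kept = append(kept, a)
		}
	}
	return inner(kept...)
}

func main() {
	fmt.Println(wrapRebuild(1, "a", nil), wrapForward(1, "a", nil), wrapNonNil(1, "a", nil)) // 3 3 2
}
```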
**Issue**: The wrapper allocates a fresh `[]any` slice and copies the elements, only to spread it back. `inner` sees the same elements as if `wrap` had simply called `inner(args...)`.

**Optimization**: forward directly with `inner(args...)`.

**Benchmark** (3 args, 1M iterations):
- Rebuild + spread: ~80 ns/op, 48 B/op, 1 alloc/op
- Direct spread: ~15 ns/op, 0 B/op, 0 allocs/op

**The only reason to rebuild** is if the wrapper needs to mutate or filter elements. Even then, an in-place compaction (start from `args[:0]` and append the elements to keep) reuses `args`'s backing array.

**Key insight**: When forwarding unchanged, just spread. Make a defensive copy or rebuild only when transforming.
Exercise 6 🟡 — Generic Variadic Avoiding ...any
Problem: A library function uses ...any for flexibility:
Question: How do generics improve this?
Solution
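The library function itself is not shown in the text. The sketch below assumes it returns its first argument: `first` is a guess at the `...any` shape, and `First` is a generic counterpart. The exact signatures, including the `(T, bool)` return, are assumptions for illustration.

```go
package main

import "fmt"

// first is a guess at the original shape: every argument is boxed into any.
func first(xs ...any) any {
	if len(xs) == 0 {
		return nil
	}
	return xs[0]
}

// First is the generic counterpart: T is inferred from the arguments,
// so the implicit slice is a plain []T with no boxing.
func First[T any](xs ...T) (T, bool) {
	var zero T
	if len(xs) == 0 {
		return zero, false
	}
	return xs[0], true
}

func main() {
	v, ok := First(1, 2, 3) // T inferred as int
	s, _ := First("a", "b") // T inferred as string
	_, none := First[int]() // no arguments: T must be spelled out
	fmt.Println(v, ok, s, none, first(1, 2, 3)) // 1 true a false 1
}
```

Note that a generic variadic requires all arguments to share one type, so it replaces `...any` only for homogeneous argument lists.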
**Issue**: `...any` boxes each argument. Calling `first(1, 2, 3)` allocates 3 boxed ints (or uses the static pool for small ints).

**Optimization**: use a generic variadic (Go 1.18+) whose parameter is `...T` instead of `...any`.

**Benchmark** (3 ints, 10M iterations):
- `first(1, 2, 3)` via `...any`: ~85 ns/op, 32 B/op, 3 allocs/op
- `First(1, 2, 3)` (generic): ~3 ns/op, 0 B/op, 0 allocs/op

The generic version inlines and stays on the stack.

**Caveat**: a generic variadic called with no args fails type inference: `First()` must be written as `First[int]()`.

**Key insight**: Generics plus variadics give the same flexibility without boxing. Migrate `...any` APIs to generics where possible.
Exercise 7 🟡 — Spread Slice That Will Be Mutated
Problem: A consumer reuses a slice across calls:
buf := make([]int, 0, 1024)
for _, ev := range events {
	buf = buf[:0]
	buf = append(buf, ev.Items...)
	process(buf...) // BUG?
}
Question: Is process(buf...) safe? How do you make it efficient AND safe?
Solution
**Issue**: If `process` retains `buf` past its call (stores it, or hands it to a goroutine), the next iteration's `buf = buf[:0]` and `append` will corrupt the retained data.

**Optimization with safety**:

**Case A** (`process` doesn't retain the slice):
buf := make([]int, 0, 1024)
for _, ev := range events {
	buf = buf[:0]
	buf = append(buf, ev.Items...)
	process(buf...) // SAFE if process is purely transient
}
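Only the non-retaining case is shown above. To make the corruption described under **Issue** concrete, here is a small sketch of a consumer that does retain its input, once by aliasing and once by copying. The names `keepAlias`, `keepCopy`, and `run` are hypothetical.

```go
package main

import "fmt"

// retained simulates a consumer that keeps what it is given.
var retained [][]int

// keepAlias stores the caller's slice as is: unsafe with a reused buffer.
func keepAlias(xs ...int) { retained = append(retained, xs) }

// keepCopy stores an independent copy: safe with a reused buffer.
func keepCopy(xs ...int) { retained = append(retained, append([]int(nil), xs...)) }

// run feeds two batches through one reused buffer and returns what was kept.
func run(keep func(...int)) [][]int {
	retained = nil
	buf := make([]int, 0, 8)
	for _, items := range [][]int{{1, 2}, {3, 4}} {
		buf = append(buf[:0], items...)
		keep(buf...)
	}
	return retained
}

func main() {
	fmt.Println(run(keepAlias)) // [[3 4] [3 4]]: first batch was overwritten
	fmt.Println(run(keepCopy))  // [[1 2] [3 4]]
}
```

If the consumer must retain, the copy belongs in the consumer (or at the call site), and the caller's buffer reuse stays cheap.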
Exercise 8 🔴 — Pool the Variadic Slice
Problem: A fmt-style helper allocates a fresh []any per call.
func myPrintf(format string, args ...any) {
	// ... format args into a buffer ...
	_ = format
	_ = args
}
Question: How would zap-style libraries pool the args slice?
Solution
**Optimization**: use a `sync.Pool` for the args buffer:
var argsPool = sync.Pool{
	New: func() any { return make([]any, 0, 8) },
}
func myPrintf(format string, args ...any) {
	buf := argsPool.Get().([]any)
	defer func() {
		// CRITICAL: clear references so GC can reclaim
		for i := range buf {
			buf[i] = nil
		}
		argsPool.Put(buf[:0])
	}()
	buf = append(buf, args...)
	// ... format using buf ...
}
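One caveat with pooling a slice value directly: `Put` takes an `any`, so the slice header is itself boxed on every `Put`, which can cost an allocation. A common variant, sketched here as an assumption rather than as what any particular library does, pools a `*[]any` instead. The function name `render` and its use of `fmt.Sprintf` are stand-ins for the elided formatting step.

```go
package main

import (
	"fmt"
	"sync"
)

var argsPool = sync.Pool{
	New: func() any {
		s := make([]any, 0, 8)
		return &s // pool a pointer so Get/Put do not box a slice header
	},
}

// render (hypothetical) copies args into a pooled buffer and formats from it.
func render(format string, args ...any) string {
	bp := argsPool.Get().(*[]any)
	buf := append((*bp)[:0], args...)
	defer func() {
		for i := range buf {
			buf[i] = nil // drop references so the GC can reclaim the values
		}
		*bp = buf[:0] // keep any capacity gained by append
		argsPool.Put(bp)
	}()
	return fmt.Sprintf(format, buf...)
}

func main() {
	fmt.Println(render("%s=%d", "count", 3)) // count=3
}
```

Whether pooling pays off at all depends on the workload, so it is worth confirming with `-benchmem` before adopting it.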
Exercise 9 🔴 — Verify Implicit Slice Stays on Stack
Problem: You have a typed variadic helper and want to confirm zero allocations.
type Tag struct{ Key, Value string }
func emit(metric string, tags ...Tag) {
	// ... ship metric ...
	_ = metric
	_ = tags
}
// Hot:
// for i := 0; i < N; i++ {
//     emit("hits", Tag{"path", "/users"}, Tag{"status", "200"})
// }
Task: Show how to verify the variadic slice doesn't escape.
Solution
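Alongside the compiler's escape report and a `-benchmem` benchmark, the allocation count can also be checked programmatically with `testing.AllocsPerRun`. The sketch below is one way to do that; the `shipped` counter is an assumed stand-in for "ship metric" that deliberately retains nothing, and `emitAllocs` is a hypothetical helper name.

```go
package main

import (
	"fmt"
	"testing"
)

type Tag struct{ Key, Value string }

// shipped stands in for shipping the metric: a counter, so tags is never retained.
var shipped int

func emit(metric string, tags ...Tag) {
	shipped += len(metric)
	for _, t := range tags {
		shipped += len(t.Key) + len(t.Value)
	}
}

// emitAllocs reports the average number of heap allocations per emit call.
func emitAllocs() float64 {
	return testing.AllocsPerRun(1000, func() {
		emit("hits", Tag{"path", "/users"}, Tag{"status", "200"})
	})
}

func main() {
	// Compile-time view, run separately: go build -gcflags="-m" ./...
	fmt.Println("allocs per call:", emitAllocs())
}
```

The same check fits naturally in a unit test, where a non-zero result can fail the build if a later change makes `tags` escape.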
**Step 1: escape analysis**. Build with `-gcflags="-m"` and find the compiler's note for the variadic argument at the call site. You want it reported as not escaping.

**Step 2: benchmark with `-benchmem`**. If you see `0 B/op, 0 allocs/op`, the implicit slice is stack-allocated.

**If allocs appear**, `emit` is retaining the slice somehow:
- It stores `tags` in a struct field, channel, or global.
- It passes `tags` to a goroutine.
- It captures `tags` in an escaping closure.

To keep the slice on the stack, make sure `emit`'s body doesn't let `tags` escape. For example:
- Convert each tag to a `string` immediately.
- Process inline; don't store.

**Key insight**: Escape analysis is deterministic, so verify with `-gcflags="-m"` rather than guess. Once you see "does not escape," the variadic is free.
Exercise 10 🔴 — Variadic + PGO
Problem: A Sort function takes a comparator via ...func:
func sortWith(s []int, less ...func(a, b int) bool) {
	if len(less) == 0 {
		sort.Ints(s)
		return
	}
	sort.Slice(s, func(i, j int) bool { return less[0](s[i], s[j]) })
}
Question: PGO can devirtualize through a variadic of function values. Show the workflow.
Solution
**Issue**: `less[0]` is an indirect call inside a hot sort loop. The compiler cannot inline it because the function value is unknown at compile time.

**Optimization with PGO** (Go 1.21+):

1. **Capture a profile** in production (or in a representative load test):
import (
	"log"
	"os"
	"runtime/pprof"
)
// Inside main (or a load-test harness):
f, err := os.Create("default.pgo")
if err != nil {
	log.Fatal(err)
}
defer f.Close()
if err := pprof.StartCPUProfile(f); err != nil {
	log.Fatal(err)
}
defer pprof.StopCPUProfile()
// Run workload
Bonus Exercise 🔴 — Construct vs Reuse Implicit Slice
Problem: You measure that sum(1, 2, 3) calls allocate in production. Why might that happen?
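The text leaves this question open. One plausible cause, consistent with the escape conditions listed in Exercise 9, is that some change made the callee retain its argument, so the implicit slice can no longer live on the stack. The sketch below illustrates that single scenario; `sumLeaky` and the global `last` are hypothetical, and other causes are possible.

```go
package main

import (
	"fmt"
	"testing"
)

var sink int

func sum(xs ...int) int {
	total := 0
	for _, x := range xs {
		total += x
	}
	return total
}

// last is a hypothetical global, e.g. added for debugging.
var last []int

// sumLeaky retains xs, so the implicit slice built at each call site
// escapes to the heap.
func sumLeaky(xs ...int) int {
	last = xs
	return sum(xs...)
}

// measure returns allocations per call for the clean and the leaky variant.
func measure() (clean, leaky float64) {
	clean = testing.AllocsPerRun(1000, func() { sink = sum(1, 2, 3) })
	leaky = testing.AllocsPerRun(1000, func() { sink = sumLeaky(1, 2, 3) })
	return clean, leaky
}

func main() {
	clean, leaky := measure()
	fmt.Println("clean:", clean, "leaky:", leaky)
}
```

Comparing the two numbers against the `-gcflags="-m"` output for the real call site is usually enough to locate where the slice starts escaping.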