Memory Allocator — Optimization
1. How to use this file
Eighteen scenarios where code allocates more than it should: extra B/op, extra allocs/op, extra pressure on runtime.mallocgc and the GC behind it. Each entry has a Before (code + benchmark) and a collapsible After (optimized code + benchmark + why + trade-offs + when NOT).
Anchored at Go 1.23, amd64. Numbers are reproducible-shape — run go test -bench=. -benchmem -benchtime=2s on your hardware before quoting them. Allocator cost on the hot path comes from six things:
- Hitting runtime.mallocgc per call instead of writing into a reused buffer.
- Materializing values that the compiler could have kept on the stack.
- Boxing primitives into interface{} and forcing them to escape.
- Producing intermediate strings or byte slices the caller will immediately re-encode.
- Closures and _defer records that escape because of a hot-loop construction.
- Per-call construction of objects (regexp, encoders, errors) that should be package-level singletons.
Most wins remove one or more of those from a hot path. Reading order: Ex. 1, 4, 5 (the three you'll see in every code review), then 14, 15 (defer / sentinel, the ones senior reviewers flag most), then any order. Reference the runtime source: runtime/malloc.go (allocation entry points), runtime/sizeclasses.go (the 67 size classes), runtime/mcache.go (per-P cache), runtime/mbarrier.go (write barriers that feed back into allocator cost).
2. Exercise 1 — fmt.Sprintf in a hot path
Difficulty: ★★☆☆☆
A log formatter builds the per-line prefix with fmt.Sprintf("[%s] %d ", level, ts). Each call allocates the result string, boxes level and ts into any, and formats through fmt's reflection-backed verb handling.
func formatPrefix(level string, ts int64) string {
return fmt.Sprintf("[%s] %d ", level, ts)
}
func writeLog(w io.Writer, lvl string, ts int64, msg string) {
io.WriteString(w, formatPrefix(lvl, ts))
io.WriteString(w, msg)
}
Hint
`fmt.Sprintf` is a Swiss Army knife: flexible, reflect-driven, allocation-heavy. For known fixed shapes, format directly into a `[]byte` you control. Look at `strconv.AppendInt`, `strconv.AppendQuote`, and the `Append*` family in general: they take a destination slice, append to it, and return the (possibly regrown) slice.
Solution
func appendPrefix(dst []byte, level string, ts int64) []byte {
dst = append(dst, '[')
dst = append(dst, level...)
dst = append(dst, ']', ' ')
dst = strconv.AppendInt(dst, ts, 10)
dst = append(dst, ' ')
return dst
}
func writeLog(w io.Writer, buf *[]byte, lvl string, ts int64, msg string) {
*buf = (*buf)[:0]
*buf = appendPrefix(*buf, lvl, ts)
*buf = append(*buf, msg...)
w.Write(*buf)
}
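The effect of reusing one buffer can be checked without a full benchmark run. The sketch below is an illustration, not part of the original solution: it repeats `appendPrefix` from above and uses `testing.AllocsPerRun` to compare the two versions. The helper names `allocsSprintf` and `allocsAppend` and the `sink` variable are assumptions made for the example.

```go
package main

import (
	"fmt"
	"strconv"
	"testing"
)

// appendPrefix mirrors the Solution above: it appends "[level] ts " to dst.
func appendPrefix(dst []byte, level string, ts int64) []byte {
	dst = append(dst, '[')
	dst = append(dst, level...)
	dst = append(dst, ']', ' ')
	dst = strconv.AppendInt(dst, ts, 10)
	dst = append(dst, ' ')
	return dst
}

// sink keeps the Sprintf result alive so the work cannot be discarded.
var sink string

// allocsSprintf reports the average allocations per call of the Sprintf version.
func allocsSprintf(level string, ts int64) float64 {
	return testing.AllocsPerRun(1000, func() {
		sink = fmt.Sprintf("[%s] %d ", level, ts)
	})
}

// allocsAppend reports the average allocations per call when one buffer is reused.
func allocsAppend(level string, ts int64) float64 {
	buf := make([]byte, 0, 64) // allocated once, outside the measured function
	return testing.AllocsPerRun(1000, func() {
		buf = appendPrefix(buf[:0], level, ts)
	})
}

func main() {
	fmt.Println("Sprintf allocs/op:", allocsSprintf("INFO", 1700000000))
	fmt.Println("append  allocs/op:", allocsAppend("INFO", 1700000000))
}
```

The append path only stays allocation-free while the prefix fits in the buffer's capacity; a longer line regrows the buffer once and later calls reuse the larger array.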
3. Exercise 2 — []byte(string) on every request
Difficulty: ★★★☆☆
The HTTP middleware reads r.Header.Get("Authorization") (a string), then hands it to a function expecting []byte via []byte(token). Each conversion allocates a fresh backing array — the runtime cannot share storage between immutable strings and mutable slices.
func validateToken(b []byte) bool { /* HMAC compare */ }
func authMiddleware(next http.Handler) http.Handler {
return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
token := r.Header.Get("Authorization")
if !validateToken([]byte(token)) {
http.Error(w, "unauthorized", 401)
return
}
next.ServeHTTP(w, r)
})
}
Hint
Two paths: change the callee to accept `string` (indexing and ranging over a `string` give the same read-only access to its bytes), or, if you can guarantee the callee will neither mutate the bytes nor retain them past the call, use `unsafe.StringData` to share the backing array. The first is always correct; the second needs care.
Solution
The clean fix: change the signature.
func validateToken(s string) bool { /* HMAC compare reads s like bytes */ }
func authMiddleware(next http.Handler) http.Handler {
return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
token := r.Header.Get("Authorization")
if !validateToken(token) {
http.Error(w, "unauthorized", 401)
return
}
next.ServeHTTP(w, r)
})
}
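For the second path from the hint, a minimal sketch of the zero-copy view might look like the following. It assumes Go 1.20+ (for `unsafe.StringData`), and the helper name `stringBytes` and the two-argument `validateToken` are hypothetical stand-ins for the real check. The returned slice aliases the string's storage, so it is only safe when the callee reads and never retains it.

```go
package main

import (
	"crypto/hmac"
	"fmt"
	"unsafe"
)

// stringBytes returns a []byte that aliases s's storage without copying.
// Contract: callers must treat the result as read-only and must not keep it
// after the call that uses it; writing through it corrupts an immutable string.
func stringBytes(s string) []byte {
	if s == "" {
		return nil
	}
	return unsafe.Slice(unsafe.StringData(s), len(s))
}

// validateToken is a hypothetical stand-in for the real check; it only reads b.
func validateToken(b, want []byte) bool {
	return hmac.Equal(b, want) // constant-time comparison
}

func main() {
	want := []byte("Bearer abc123")
	fmt.Println(validateToken(stringBytes("Bearer abc123"), want))
	fmt.Println(validateToken(stringBytes("Bearer nope"), want))
}
```

Changing the signature remains the safer default; this variant is only worth the audit burden when the callee's `[]byte` signature cannot change and a profile shows the conversion on a hot path.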
4. Exercise 3 — string([]byte) for a slice you already own
Difficulty: ★★☆☆☆
A handler reads from bufio.Scanner.Bytes(), immediately converts to string, and passes the result to logic that only reads the bytes. The conversion allocates: string([]byte) copies at this call site, and it has to, because the scanner reuses its buffer between lines and the string must own a stable copy.
func processLine(s string) { /* read-only */ }
func parseFile(r io.Reader) {
sc := bufio.NewScanner(r)
for sc.Scan() {
processLine(string(sc.Bytes()))
}
}
Hint
Two truths: (1) `string([]byte)` must copy when the slice may be mutated later. (2) When you only need read-only access, you don't need a `string`; change the callee to accept `[]byte`. If you really need a `string`, the compiler can sometimes elide the copy when the conversion is used immediately in a read-only context (map lookup, comparison).
Solution
Pass `[]byte` through. The string conversion only needs to happen at a real boundary (storing into a map key, returning across a public API).
func processLine(b []byte) { /* read-only */ }
func parseFile(r io.Reader) {
sc := bufio.NewScanner(r)
for sc.Scan() {
processLine(sc.Bytes()) // sc reuses its buffer; processLine must not retain
}
}
5. Exercise 4 — += string concatenation in a loop
Difficulty: ★★☆☆☆
Building a CSV row with s += "," followed by s += f allocates a fresh string on every concatenation. With 50 fields, that's roughly 99 allocations, each one re-copying the growing prefix.
func csvRow(fields []string) string {
var s string
for i, f := range fields {
if i > 0 { s += "," }
s += f
}
return s
}
Hint
`strings.Builder` exists for exactly this. It holds a `[]byte` internally, grows by doubling, and converts to `string` once at the end with no copy (it transfers ownership via `unsafe.String`). For fixed-size output, `make([]byte, 0, totalLen)` plus `append` is even tighter.
Solution
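A minimal sketch of the Builder version that the notes below describe, assuming the same `csvRow` signature as the Before code. The `total` pre-pass sizes the buffer so that `Grow` performs the single allocation.

```go
package main

import (
	"fmt"
	"strings"
)

// csvRow joins fields with commas using one pre-sized strings.Builder.
func csvRow(fields []string) string {
	if len(fields) == 0 {
		return ""
	}
	total := len(fields) - 1 // one comma between each pair of fields
	for _, f := range fields {
		total += len(f)
	}
	var b strings.Builder
	b.Grow(total) // single allocation of the final size
	for i, f := range fields {
		if i > 0 {
			b.WriteByte(',')
		}
		b.WriteString(f)
	}
	return b.String() // hands the buffer over as a string without copying
}

func main() {
	fmt.Println(csvRow([]string{"id", "name", "email"}))
}
```

For this exact shape `strings.Join(fields, ",")` does the same pre-sizing internally; the hand-written loop matters once per-field escaping or formatting is added.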
~19× faster (with `Grow`), 1 alloc instead of 99.
**Why faster:** `s += f` compiles to `runtime.concatstring2` (see `runtime/string.go`), which allocates a new buffer of size `len(s)+len(f)`, copies both operands into it, and returns a new string header. Iteration N re-copies everything accumulated in the first N-1 iterations, so the total work is quadratic. `strings.Builder` holds one growing `[]byte`, calls `mallocgc` only when the buffer has to grow, and at `String()` reinterprets the buffer as a string via `unsafe.String` (no copy). `Grow` collapses the geometric growth into one allocation of the right size.
**Trade-off:** Slight code growth (a `total` pre-pass), but mechanical. `strings.Builder` is not safe to share between goroutines; each goroutine needs its own.
**When NOT:** Two- or three-fragment concatenations (`a + b + c`): the compiler emits a single `concatstring3` call, one allocation, faster than a Builder. Concatenations involving stack-allocated intermediates that don't escape.
6. Exercise 5 — append without a capacity hint
Difficulty: ★★☆☆☆
A function builds a result slice and appends incrementally. Without an initial cap, the slice grows geometrically (1, 2, 4, 8, 16, ...), each growth mallocgc's a new backing array and copies the contents over.
func mapInts(in []int, f func(int) int) []int {
var out []int
for _, x := range in {
out = append(out, f(x))
}
return out
}
Hint
You know the final length up front: `len(in)`. `make([]T, 0, n)` reserves capacity once. The repeated allocations come from geometric growth: capacity goes 1, 2, 4, 8, ... (doubling for small slices, with a gentler growth factor once the slice is large), and every step calls the allocator and copies the old contents.
Solution
func mapInts(in []int, f func(int) int) []int {
out := make([]int, 0, len(in))
for _, x := range in {
out = append(out, f(x))
}
return out
}
7. Exercise 6 — bytes.Buffer allocated per call
Difficulty: ★★★☆☆
A MarshalBinary method does var buf bytes.Buffer; ... return buf.Bytes(), nil. The buffer's internal []byte allocates fresh on every call. At 100k QPS, that's 100k buffers worth of garbage per second.
type Event struct { ID int64; Name string; Tags []string }
func (e *Event) MarshalBinary() ([]byte, error) {
var buf bytes.Buffer
binary.Write(&buf, binary.LittleEndian, e.ID)
buf.WriteString(e.Name)
for _, t := range e.Tags { buf.WriteString(t) }
return buf.Bytes(), nil
}
Hint
`sync.Pool` is built for this. Pool a buffer, reset on borrow, put back on return. The catch: the caller must not retain the returned `[]byte` past the put, because pool reuse will overwrite it. Pair with `bytes.Clone` or write to a caller-supplied buffer.
Solution
Two approaches. First, accept a caller-supplied buffer (best when the caller has one):
func (e *Event) AppendBinary(dst []byte) []byte {
dst = binary.LittleEndian.AppendUint64(dst, uint64(e.ID))
dst = append(dst, e.Name...)
for _, t := range e.Tags { dst = append(dst, t...) }
return dst
}
Second, pool the scratch buffer and hand the caller a copy:
var bufPool = sync.Pool{New: func() any { b := make([]byte, 0, 256); return &b }}
func (e *Event) MarshalBinary() ([]byte, error) {
bp := bufPool.Get().(*[]byte)
buf := (*bp)[:0]
buf = binary.LittleEndian.AppendUint64(buf, uint64(e.ID))
buf = append(buf, e.Name...)
for _, t := range e.Tags { buf = append(buf, t...) }
out := bytes.Clone(buf) // caller owns this; safe to keep
*bp = buf
bufPool.Put(bp)
return out, nil
}
8. Exercise 7 — Closure captures forcing escape
Difficulty: ★★★☆☆
A loop spawns goroutines, each one closing over the loop variables. The closure captures i and it by reference; both escape to the heap. Worse, the loop allocates a fresh closure each iteration because the captured variables differ.
func process(items []Item) []Result {
results := make([]Result, len(items))
var wg sync.WaitGroup
for i, it := range items {
wg.Add(1)
go func() { // captures i, it, results — all escape
defer wg.Done()
results[i] = handle(it)
}()
}
wg.Wait()
return results
}
Hint
Pass captured values as goroutine arguments. The closure no longer needs to capture anything heap-resident, and the compiler can stack-allocate the function value. Bonus: avoids the classic "all goroutines see the last `i`" bug in pre-Go 1.22 code.
Solution
func process(items []Item) []Result {
results := make([]Result, len(items))
var wg sync.WaitGroup
for i, it := range items {
wg.Add(1)
go func(i int, it Item) { // args, not captures
defer wg.Done()
results[i] = handle(it)
}(i, it)
}
wg.Wait()
return results
}
9. Exercise 8 — Boxing int into interface{} (any)
Difficulty: ★★★☆☆
A generic logger takes ...any. Every integer passed becomes an interface header pointing to a heap-allocated copy of the value (the runtime skips the allocation only for very small values). The escape isn't visible at the call site: log("user", uid, "score", score) looks like passing two strings and two integers, but uid and score each turn into an allocation.
func log(args ...any) {
for _, a := range args {
fmt.Fprint(os.Stderr, a, " ")
}
fmt.Fprintln(os.Stderr)
}
func handle(uid int64, score int) {
log("user", uid, "score", score) // uid and score each allocate
}
Hint
Two paths: a typed logging API (`slog.LogAttrs` with `slog.Int64` / `slog.Int` keeps integers inside a typed `slog.Value`; the convenience form `slog.Info("event", "uid", uid)` still takes `...any` and boxes at the call site), or a generic function that uses Go 1.18+ generics to keep the parameter typed. Generics often eliminate the box entirely.
Solution
// slog.Attr carries known scalar types in a typed slog.Value, so LogAttrs avoids the any box.
import (
	"context"
	"log/slog"
)
func handle(uid int64, score int) {
	slog.LogAttrs(context.Background(), slog.LevelInfo, "event",
		slog.Int64("uid", uid), slog.Int("score", score))
}
// For a custom hot-path logger, use generics for a typed write path.
type Loggable interface { ~int | ~int64 | ~string | ~float64 }
func logOne[T Loggable](key string, v T) {
// body left open: write v through a typed path (e.g. strconv.Append*) and benchmark it; a type switch via any(v) can box again
}
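One way to fill in a typed write path is to restrict the constraint to integers and append through `strconv`. The following is a sketch under that assumption; `Integer` and `appendKV` are hypothetical names, and strings or floats would need their own helpers.

```go
package main

import (
	"fmt"
	"strconv"
)

// Integer is a hypothetical constraint covering the signed integer kinds logged here.
type Integer interface {
	~int | ~int8 | ~int16 | ~int32 | ~int64
}

// appendKV appends "key=value " to dst. v stays a plain integer all the way
// down to strconv.AppendInt, so it is never converted to an interface.
func appendKV[T Integer](dst []byte, key string, v T) []byte {
	dst = append(dst, key...)
	dst = append(dst, '=')
	dst = strconv.AppendInt(dst, int64(v), 10)
	return append(dst, ' ')
}

func main() {
	buf := make([]byte, 0, 64) // reused across log lines in a real logger
	buf = appendKV(buf, "uid", int64(42))
	buf = appendKV(buf, "score", 7)
	fmt.Printf("%s\n", buf)
}
```

Whether this beats the boxed version in a given program is worth confirming with `-benchmem`, since the gain depends on the buffer being reused.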
10. Exercise 9 — Struct with mostly-nil pointer field
Difficulty: ★★★★☆
A Node struct has an optional *Metadata field. 90% of nodes have no metadata, but every node carries the pointer (8 B) and, when set, a separate heap object for the metadata. Two allocations: the node, the metadata.
type Metadata struct { CreatedAt time.Time; Author string }
type Node struct {
ID int64
Name string
Meta *Metadata // nil for 90% of nodes
}
func newNode(id int64, name string, author string) *Node {
return &Node{
ID: id, Name: name,
Meta: &Metadata{CreatedAt: time.Now(), Author: author},
}
}
Hint
For the *rare* case, separate it out completely: a side table keyed by node ID. For the *common* case (metadata exists but is small), inline a sentinel-value variant. The goal: one allocation per node, not two.
Solution
Option A (side table, best when metadata is genuinely rare):
type Node struct {
ID int64
Name string
}
type NodeStore struct {
nodes []Node
meta map[int64]Metadata // only populated for nodes that need it
}
func (s *NodeStore) Add(id int64, name string, author string) {
	s.nodes = append(s.nodes, Node{ID: id, Name: name})
	if author != "" {
		if s.meta == nil {
			s.meta = make(map[int64]Metadata) // writing to a nil map panics
		}
		s.meta[id] = Metadata{CreatedAt: time.Now(), Author: author}
	}
}
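The hint also mentions an inline variant for the case where metadata is common. A possible shape, offered as a sketch rather than the original Option B: store `Metadata` by value and treat a zero `CreatedAt` as the "no metadata" sentinel. The `HasMeta` method is a hypothetical helper.

```go
package main

import (
	"fmt"
	"time"
)

type Metadata struct {
	CreatedAt time.Time
	Author    string
}

// Node stores Metadata inline; a zero CreatedAt is the "no metadata" sentinel.
type Node struct {
	ID   int64
	Name string
	Meta Metadata
}

// HasMeta reports whether metadata was set for this node.
func (n *Node) HasMeta() bool { return !n.Meta.CreatedAt.IsZero() }

// newNode makes one allocation (the Node itself) whether or not metadata is present.
func newNode(id int64, name, author string) *Node {
	n := &Node{ID: id, Name: name}
	if author != "" {
		n.Meta = Metadata{CreatedAt: time.Now(), Author: author}
	}
	return n
}

func main() {
	fmt.Println(newNode(1, "root", "alice").HasMeta())
	fmt.Println(newNode(2, "leaf", "").HasMeta())
}
```

The cost is that every node grows by the size of `Metadata` instead of one pointer, so this only pays off when most nodes actually carry metadata.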
11. Exercise 10 — time.After in a select loop
Difficulty: ★★★☆☆
A select loop times out per iteration with case <-time.After(t). time.After allocates a new timer and channel on every call. In a hot loop that spins waiting for work, the allocations stack up; before Go 1.23 the abandoned timers also stay live until they fire, while from 1.23 on they become collectible as soon as nothing references them.
func consume(ctx context.Context, ch <-chan Work) {
for {
select {
case w := <-ch:
process(w)
case <-time.After(100 * time.Millisecond):
// periodic housekeeping
case <-ctx.Done():
return
}
}
}
Hint
`time.NewTimer` returns a timer you can `Reset` between iterations. One allocation up front, then reused. The catch: before Go 1.23, `Reset` requires the timer to be stopped and its channel drained, which is easy to get wrong. Go 1.23 simplifies the semantics (no stale value is delivered once `Reset` or `Stop` has returned), but on older Go you need to handle it.
Solution
func consume(ctx context.Context, ch <-chan Work) {
t := time.NewTimer(100 * time.Millisecond)
defer t.Stop()
for {
if !t.Stop() {
select { case <-t.C: default: } // drain if already fired (pre-1.23)
}
t.Reset(100 * time.Millisecond)
select {
case w := <-ch:
process(w)
case <-t.C:
// housekeeping
case <-ctx.Done():
return
}
}
}
12. Exercise 11 — json.Marshal per message
Difficulty: ★★★☆☆
A streaming encoder calls json.Marshal(msg) for each outbound message and writes the result. Every call walks the value via reflect, encodes into an internal buffer, and then copies the output into a freshly allocated []byte for the caller.
func send(w io.Writer, msgs []Msg) error {
for _, m := range msgs {
b, err := json.Marshal(m)
if err != nil { return err }
if _, err := w.Write(b); err != nil { return err }
w.Write([]byte{'\n'})
}
return nil
}
Hint
`json.NewEncoder(w).Encode(v)` writes directly to `w` and adds the newline, so the per-message result copy that `json.Marshal` makes goes away. Build the encoder once and reuse it across messages rather than constructing one per call. Even bigger: use a code-generated marshaller (`easyjson`, `go-json`) that writes to a pooled buffer.
Solution
func send(w io.Writer, msgs []Msg) error {
enc := json.NewEncoder(w) // one encoder for the whole stream; Encode appends the newline
for _, m := range msgs {
if err := enc.Encode(m); err != nil { return err }
}
return nil
}
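With a generated marshaller writing into a pooled buffer (MarshalJSON_into stands for whatever method the generator emits):
var bufPool = sync.Pool{New: func() any { return &bytes.Buffer{} }}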
func send(w io.Writer, msgs []Msg) error {
buf := bufPool.Get().(*bytes.Buffer)
defer func() { buf.Reset(); bufPool.Put(buf) }()
for _, m := range msgs {
buf.Reset()
if err := m.MarshalJSON_into(buf); err != nil { return err } // generated
buf.WriteByte('\n')
if _, err := w.Write(buf.Bytes()); err != nil { return err }
}
return nil
}
13. Exercise 12 — regexp.MustCompile in function body
Difficulty: ★☆☆☆☆
A validator calls regexp.MustCompile(...) inside the function. Each call re-parses the pattern and rebuilds the compiled matcher program: microseconds of work and dozens of allocations.
func isValidSlug(s string) bool {
re := regexp.MustCompile(`^[a-z0-9]+(?:-[a-z0-9]+)*$`)
return re.MatchString(s)
}
Hint
Hoist to package level. Or, if compilation truly must be lazy (the pattern depends on first-use config), use `sync.OnceValue`.
Solution
var slugRE = regexp.MustCompile(`^[a-z0-9]+(?:-[a-z0-9]+)*$`)
func isValidSlug(s string) bool { return slugRE.MatchString(s) }
// Lazy init when config-dependent:
var slugREOnce = sync.OnceValue(func() *regexp.Regexp {
return regexp.MustCompile(config.SlugPattern())
})
func isValidSlug(s string) bool { return slugREOnce().MatchString(s) }
14. Exercise 13 — Map keyed by struct with pointer fields
Difficulty: ★★★★☆
A dedup map is keyed by a struct {Tenant *Tenant; ID int64}. Each lookup hashes the struct, which involves hashing the pointer (fine), but the pointer field also makes the map's buckets visible to the GC: every key slot holds a pointer the collector has to trace, so GC scan cost grows with the size of the map.
type Key struct {
Tenant *Tenant
ID int64
}
var seen = map[Key]struct{}{}
func dedup(t *Tenant, id int64) bool {
k := Key{Tenant: t, ID: id}
if _, ok := seen[k]; ok { return false }
seen[k] = struct{}{}
return true
}
BenchmarkPointerKey-8 5000000 320 ns/op 0 B/op 0 allocs/op // per lookup
// But GC scan time grows with map size — separately measurable.
Hint
Replace the pointer with a primitive identifier the tenant carries (`TenantID uint64`). The key becomes pure scalars: the map's buckets carry no pointers (as long as the value type has none either), the GC doesn't scan them, and equality stays a fixed-width integer compare.
Solution
If you only have a `*Tenant` and the ID lives on it, read the ID off the pointer once when building the key.
~1.8× faster on the lookup; the bigger win is the GC scan time of a large map (visible only at scale).
**Why faster:** Hashing costs about the same; what differs is the *GC-side cost*. Storing a pointer-bearing key into a bucket goes through the write barrier whenever a GC cycle is active, and at mark time the runtime has to scan every bucket's key slots. When neither key nor value contains pointers, the bucket memory is allocated as noscan and the collector skips it entirely (see `runtime/map.go`). The `mapassign_fast64` / `mapaccess2_fast64` fast paths apply only to keys that are a single 8-byte word, so a two-field struct key takes the generic path in both versions.
**Trade-off:** You must have a stable scalar identifier. If `Tenant` is identity-by-pointer (no ID), you have to invent one, for example an autoincrement at construction. Reusing IDs after deletion breaks correctness.
**When NOT:** Small maps that GC visits in microseconds anyway. Maps whose pointer keys are themselves the GC roots; you'd still pay the scan elsewhere.
15. Exercise 14 — defer in a tight loop
Difficulty: ★★★★☆
A loop opens a file, defers close, processes it. A defer inside a loop cannot be open-coded or stack-allocated, because the compiler cannot bound how many times it runs, so every iteration goes through runtime.deferproc and gets a heap-allocated _defer record. On top of that, the records pile up until the function returns.
func processAll(paths []string) error {
for _, p := range paths {
f, err := os.Open(p)
if err != nil { return err }
defer f.Close() // defers accumulate; close happens at function return
if err := process(f); err != nil { return err }
}
return nil
}
BenchmarkDeferInLoop-8 100000 12000 ns/op 480 B/op 2 allocs/op // 10 files, plus accumulated defers
// Worse: file handles stay open until the outer function returns.
Hint
The defer in a loop has two bugs: it allocates per iteration *and* it holds resources open longer than needed. Restructure so each iteration owns its file's lifetime: wrap the per-file work in a function (or IIFE) where the defer scope is the iteration, not the loop.
Solution
func processAll(paths []string) error {
for _, p := range paths {
if err := processOne(p); err != nil { return err }
}
return nil
}
func processOne(p string) error {
f, err := os.Open(p)
if err != nil { return err }
defer f.Close() // single defer, scope = this call
return process(f)
}
16. Exercise 15 — errors.New per validation
Difficulty: ★★☆☆☆
Each validation failure constructs errors.New("amount must be positive"). errors.New allocates an *errorString on the heap every time. With a 5% rejection rate at 100k QPS, that's 5k allocs/s purely for errors.
func validateAmount(c int64) error {
if c <= 0 { return errors.New("amount must be positive") }
return nil
}
Hint
Hoist to a package-level sentinel. The error message is constant, the error value should be constant too. Bonus: callers can `errors.Is(err, ErrBadAmount)` for structured handling.
Solution
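A minimal sketch of the sentinel version, using the `ErrBadAmount` name from the hint and the same `validateAmount` signature as the Before code.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrBadAmount is allocated once at package initialization.
var ErrBadAmount = errors.New("amount must be positive")

// validateAmount returns the shared sentinel instead of building a new error.
func validateAmount(c int64) error {
	if c <= 0 {
		return ErrBadAmount
	}
	return nil
}

func main() {
	err := validateAmount(-5)
	fmt.Println(errors.Is(err, ErrBadAmount), validateAmount(10) == nil)
}
```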
~13× faster, zero allocations.
**Why faster:** `errors.New(s)` is `&errorString{s}`: one allocation of a 16-B `errorString` per call, and each one is a fresh object the GC must track. The sentinel is constructed once at package init; returning it is a single interface-header copy of an already-existing pointer, with no `mallocgc`.
**Trade-off:** The sentinel loses any dynamic context (the actual bad value). Capture context separately in logs (`slog.Error("bad amount", "got", c, "err", ErrBadAmount)`) so the validator stays cheap. Callers that match on the error string are fragile; use `errors.Is`.
**When NOT:** Errors that genuinely carry distinct data (a `*ValidationError` with a slice of failing fields). A sentinel won't fit; pool the struct or accept the allocation since rejection paths are rare.
17. Exercise 16 — Allocating result slice per call
Difficulty: ★★★☆☆
A function Tokenize(s string) []string allocates a fresh result slice every call. Callers calling it in a loop (tokenize each line of a log) burn through allocator capacity proportionally to line count.
func Tokenize(s string) []string {
var out []string
for _, f := range strings.Fields(s) {
out = append(out, f)
}
return out
}
func processLogs(lines []string) {
for _, l := range lines {
tokens := Tokenize(l)
analyze(tokens)
}
}
Hint
The append pattern from Ex. 1 generalizes: accept the destination as a parameter. The caller owns the buffer; reuse it across iterations by truncating with `dst[:0]` instead of reallocating.
Solution
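A sketch of the destination-buffer shape, with a hand-written scanner so that `strings.Fields` drops out of the hot path as well. `AppendTokens` and `isSpace` are hypothetical names, `analyze` is passed in as a parameter to keep the example self-contained, and the scanner splits on ASCII whitespace only (unlike `strings.Fields`, which also handles Unicode spaces).

```go
package main

import "fmt"

// isSpace reports whether c is an ASCII whitespace byte.
func isSpace(c byte) bool {
	return c == ' ' || c == '\t' || c == '\n' || c == '\r' || c == '\v' || c == '\f'
}

// AppendTokens appends the whitespace-separated fields of s to dst and returns it.
// Tokens are substrings of s, so no per-token copy is made.
func AppendTokens(dst []string, s string) []string {
	start := -1 // index where the current token began, or -1 between tokens
	for i := 0; i < len(s); i++ {
		if isSpace(s[i]) {
			if start >= 0 {
				dst = append(dst, s[start:i])
				start = -1
			}
		} else if start < 0 {
			start = i
		}
	}
	if start >= 0 {
		dst = append(dst, s[start:])
	}
	return dst
}

// processLogs reuses one tokens slice across all lines.
func processLogs(lines []string, analyze func([]string)) {
	var tokens []string // grows to the widest line, then is reused
	for _, l := range lines {
		tokens = AppendTokens(tokens[:0], l)
		analyze(tokens) // analyze must not retain tokens past this call
	}
}

func main() {
	processLogs([]string{"GET /a 200", "  POST   /b  500 "}, func(t []string) {
		fmt.Println(len(t), t)
	})
}
```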
~6.6× faster, zero allocations after warmup.
**Why faster:** The caller's `tokens` slice has its underlying array allocated once, sized for the largest line's token count. Subsequent calls reuse the same array: `tokens[:0]` keeps the backing array but resets the length, and `append` writes into the existing storage as long as `len ≤ cap`. The callee never touches `mallocgc`. `strings.Fields` still allocates its return value; if that's hot too, write your own scanning split that appends into `dst` directly.
**Trade-off:** API friction: the parameter list grows, and the caller must remember to reset. Worse, if the caller forgets `[:0]`, the slice grows monotonically, a silent leak. Tests should assert idempotence under repeated calls.
**When NOT:** When the result outlives the caller (returned through a public API, stored in a struct). The destination-buffer pattern requires the caller to own the lifetime. Cold paths where the convenience of a `[]string` return beats nanoseconds.
18. Exercise 17 — runtime.SetFinalizer for resource cleanup
Difficulty: ★★★★☆
A wrapper around a C resource uses runtime.SetFinalizer to close it on GC. Every constructor sets a finalizer, and each registration allocates a specialfinalizer record (SetFinalizer itself lives in runtime/mfinal.go) and delays reclamation: the object has to survive at least one extra GC cycle so the finalizer can run before its memory is freed.
type Conn struct{ raw unsafe.Pointer }
func NewConn() *Conn {
c := &Conn{raw: C.open()}
runtime.SetFinalizer(c, func(c *Conn) { C.close_(c.raw) })
return c
}
func use() {
for i := 0; i < 100; i++ {
c := NewConn()
c.do()
// no explicit close — finalizer "eventually" handles it
}
}
Hint
Finalizers are a safety net, not a strategy. The fix is explicit `Close()` with `defer`. Keep the finalizer (or `runtime.AddCleanup` in Go 1.24+) as a backstop that logs a "you forgot to close me" warning, but don't rely on it.
Solution
type Conn struct{ raw unsafe.Pointer }
func NewConn() *Conn { return &Conn{raw: C.open()} }
func (c *Conn) Close() {
if c.raw != nil {
C.close_(c.raw)
c.raw = nil
}
}
func use() {
	for i := 0; i < 100; i++ {
		func() { // per-iteration scope, as in Ex. 14
			c := NewConn()
			defer c.Close() // runs when this iteration's func returns
			c.do()
		}()
	}
}
19. Exercise 18 — Allocating wrapper struct around a primitive
Difficulty: ★★☆☆☆
A type-safe wrapper type UserID struct { v int64 } looks safer than raw int64, but if used as *UserID everywhere it forces heap allocation. The semantic gain isn't worth the alloc cost when the wrapper holds nothing but a primitive.
type UserID struct{ v int64 }
func NewUserID(v int64) *UserID { return &UserID{v: v} }
func handle(id *UserID) { /* ... */ }
func loop(ids []int64) {
for _, x := range ids {
handle(NewUserID(x)) // allocates per call
}
}
Hint
Use a named primitive type, not a wrapper struct. `type UserID int64` gives the same type safety (you can't pass an `int64` where `UserID` is required) with zero alloc cost, because it's still a primitive at runtime.
Solution
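A minimal sketch of the named-type version. The `Valid` method and the `int64` return values on `handle` and `loop` are assumptions added so the example has observable behaviour.

```go
package main

import "fmt"

// UserID is distinct from int64 at compile time but is a plain int64 at runtime.
type UserID int64

// Valid is a hypothetical value-receiver method; named primitives can carry methods.
func (id UserID) Valid() bool { return id > 0 }

// handle takes the ID by value: one integer, no pointer, no heap object.
func handle(id UserID) int64 { return int64(id) }

// loop converts each raw value in place instead of calling an allocating constructor.
func loop(ids []int64) int64 {
	var sum int64
	for _, x := range ids {
		sum += handle(UserID(x)) // plain conversion, no allocation
	}
	return sum
}

func main() {
	fmt.Println(loop([]int64{1, 2, 3}), UserID(5).Valid(), UserID(0).Valid())
}
```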
~23× faster, zero allocations.
**Why faster:** `type UserID int64` is a compile-time distinction; at runtime, `UserID` *is* `int64`: same word, same register, no struct header, no heap. Passing `UserID` is one integer register. The wrapper struct, even with one field, forces the value through a pointer when accessed via `*UserID`; the constructor allocates, the dereference adds indirection. The compiler can sometimes stack-allocate the struct, but a `*UserID` returned from a constructor and stored or passed through interfaces will escape.
**Trade-off:** Value-receiver methods on `UserID` cannot mutate it; mutation needs a pointer receiver on `*UserID`, which reintroduces the problem. For pure ID types this is exactly the right shape.
**When NOT:** When the type really has multiple fields or needs to grow new ones (versioned identifier with epoch). When the type needs interface satisfaction with a pointer receiver. When the wrapper enforces invariants that need a constructor (validation, normalization); then the wrapper struct earns its keep, but pass it by value, not by pointer, when it is small.
20. When NOT to optimize
Allocator pressure dominates a CPU profile only when (a) the function runs on a hot path, and (b) the rest of the function's work is small. A request handler that hits the network for 50 ms doesn't care if it allocated 12 times along the way. Profile before chasing any of the above.
- Boot-time work (config parse, dependency injection): readability wins.
- Cold paths (admin endpoints, debug dumps): fmt.Sprintf and errors.New are fine.
- Tests and fixtures: clarity over speed.
- One-off CLI scripts: your future self reading the code is the bottleneck.
Profile first. Allocator overhead has these signatures in a CPU profile (look in pprof's allocation or top view):
- runtime.mallocgc on a hot stack → Ex. 1, 5, 6, 16 (per-call buffer allocation).
- runtime.convT64 / runtime.convT* → Ex. 8 (boxing into any).
- runtime.growslice → Ex. 5 (append without cap).
- runtime.stringtoslicebyte / slicebytetostring → Ex. 2, 3 (string/byte conversion).
- runtime.concatstrings → Ex. 4 (+= concat).
- runtime.newobject per loop iteration → Ex. 7, 18 (closure / wrapper struct).
- runtime.makemap → a map built per call (not covered by an exercise above).
- time.NewTimer in allocation profile → Ex. 10 (time.After).
- runtime.deferproc → Ex. 14 (defer in loop).
- runtime.SetFinalizer / *specialfinalizer → Ex. 17.
Common premature optimizations:
- sync.Pool of objects so small that mallocgc's tiny allocator already handles them in 10 ns.
- unsafe.StringData to avoid []byte(s) when the call site is cold.
- Hand-rolled strconv paths when fmt.Sprintf is called once per request.
- Eliminating defer in a loop that runs 5 iterations.
- Code-generated marshallers for a structure that ships < 1k msgs/s.
Correctness gaps disguised as optimizations:
- unsafe.StringData passed to a callee that mutates: corrupts an "immutable" string, breaks interning, breaks map keys.
- Buffer returned from a sync.Pool retained by the caller: next pool consumer overwrites their data.
- Side-table metadata (Ex. 9) keyed by pointer when the pointer changes across migrations: orphaned entries leak.
- Sentinel error (Ex. 15) that carries mutable state: concurrent mutation hazard.
- Reset-and-reuse time.Timer without correct drain: stale fire wakes the next iteration unexpectedly.
- Destination-buffer parameter (Ex. 16) without [:0] reset: slice grows forever, silent memory leak.
- Removing a finalizer (Ex. 17) without auditing every code path for explicit close: handle leak under error returns.
- Generic instantiation explosion (Ex. 8): binary size doubles, link time multiplies; measure before committing.
21. Summary
Always-ship wins (apply by default in any new code):
- Pre-size slices with make([]T, 0, n) when the size is known (Ex. 5).
- Package-level regexp.MustCompile (Ex. 12).
- Package-level sentinel errors for hot rejection paths (Ex. 15).
- Named primitive types instead of single-field wrapper structs (Ex. 18).
- strings.Builder (or make([]byte, 0, n) + append) for any concat in a loop (Ex. 4).
- Explicit defer Close() over runtime.SetFinalizer (Ex. 17).
- Pass goroutine arguments instead of capturing in closures (Ex. 7).
Wins behind a profile (when measurements justify them):
- Append*-style APIs writing into caller buffers (Ex. 1, 16).
- sync.Pool for buffers ≥ 256 B reused at high rate (Ex. 6).
- []byte signatures replacing string for read-only data, or vice versa (Ex. 2, 3).
- slog or generics to avoid any boxing (Ex. 8).
- Inline-or-side-table for mostly-nil pointer fields (Ex. 9).
- time.NewTimer + Reset in tight loops (Ex. 10).
- json.Encoder reuse or generated marshallers (Ex. 11).
- Scalar map keys when GC scan time matters (Ex. 13).
- Extract loop body to a function to avoid defer pile-up (Ex. 14).
Specialty (only when the design calls for it):
- unsafe.StringData / unsafe.Slice for zero-copy string-byte interop on audited paths.
- Custom slab allocator backing a []T pool for million-object parsers.
- runtime.MemStats watchdog in tests that fails the build on heap regression.
- runtime.AddCleanup (Go 1.24+) over SetFinalizer for the rare cases where weak cleanup is needed.
- go:nosplit on tiny helpers that the allocator's hot path calls, when measured.
Allocator cost on the hot path comes from mallocgc being called when it shouldn't be — for buffers that could be reused, for conversions that could be elided, for sentinel-shaped objects that could live at package scope, for wrappers that gain nothing over a named primitive. Strip those by writing into caller-owned destinations, hoisting expensive constructions to init time, returning singletons instead of fresh allocations, and trusting escape analysis only after you've read its decisions (go build -gcflags='-m'). The 67 size classes in runtime/sizeclasses.go are fast for what they do; the wins come from not asking them to do anything at all.