Memory Management in Depth: Professional

1. The production framing

In a real service, memory management is not "tune GOGC". It's a budget you negotiate with the platform team and a steady stream of small decisions in code review. The professional job, roughly:

Set a memory budget in your deployment manifest (cgroup, k8s limit, ECS memory) and make the runtime aware of it via GOMEMLIMIT.
Bound allocation rate in hot paths — pools, preallocation, careful interface use.
Continuously observe heap, GC CPU, allocation rate, and goroutine count.
Detect and roll back memory regressions on each release; they are among the most common silent SLO killers.
Have a runbook for OOM and "RSS climbing" incidents.

The rest of this file is what that looks like.

2. Setting `GOMEMLIMIT` correctly in containers

Before Go 1.19 the runtime had no memory limit, and the standard tricks were GOGC tuning and a heap ballast. Now:

# kubernetes deployment, container spec
resources:
  limits:
    memory: 2Gi
env:
  - name: GOMEMLIMIT
    value: "1800MiB"   # ~90% of the cgroup limit
  - name: GOGC
    value: "100"       # leave default unless you have data

Why 90%, not 100%? Because GOMEMLIMIT is a soft limit that covers only memory managed by the Go runtime. The process also uses memory the runtime does not track: cgo allocations, files mmap'd via syscall, third-party C libraries. Reserve ~10% headroom.

For projects that can't change deployment, the github.com/KimMachineGun/automemlimit library reads the cgroup limit at startup and sets GOMEMLIMIT for you. Acceptable in mature codebases; preferred over hand-tuning.
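As a rough illustration of what such a startup hook does, the sketch below turns a cgroup v2 memory.max value into a soft limit and applies it with debug.SetMemoryLimit. The file path, the 0.9 ratio, and the helper name limitFromCgroup are assumptions made for this example; it is not the library's implementation, and cgroup v1 hosts expose a different file.

```go
package main

import (
	"fmt"
	"os"
	"runtime/debug"
	"strconv"
	"strings"
)

// limitFromCgroup converts the contents of a cgroup v2 memory.max file into
// a soft limit at the given ratio. It returns false for "max" (no limit set)
// or for input it cannot parse.
func limitFromCgroup(raw string, ratio float64) (int64, bool) {
	s := strings.TrimSpace(raw)
	if s == "" || s == "max" {
		return 0, false
	}
	n, err := strconv.ParseInt(s, 10, 64)
	if err != nil || n <= 0 {
		return 0, false
	}
	return int64(float64(n) * ratio), true
}

func main() {
	// Assumed path: the cgroup v2 unified hierarchy as seen inside the container.
	raw, err := os.ReadFile("/sys/fs/cgroup/memory.max")
	if err != nil {
		fmt.Println("no cgroup v2 limit found; leaving the memory limit unchanged")
		return
	}
	if limit, ok := limitFromCgroup(string(raw), 0.9); ok {
		debug.SetMemoryLimit(limit) // same effect as GOMEMLIMIT=<limit> in bytes
		fmt.Printf("memory limit set to %d bytes\n", limit)
	}
}
```

With a 2 GiB container limit this lands a little above the 1800MiB used in the manifest; either value leaves roughly 10% headroom.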

3. The four numbers your dashboards need

Build one panel per service with:

| Metric | Source | Alert when |
| --- | --- | --- |
| RSS (process) | `/proc/self/status` (`VmRSS`) or container metric | Trending up over 24h |
| Go live heap | `runtime/metrics` `/gc/heap/live:bytes` | Steady state climbs > 2× weekly baseline |
| GC CPU fraction | rate of `/cpu/classes/gc/total:cpu-seconds` divided by rate of `/cpu/classes/total:cpu-seconds` | > 20% sustained |
| Goroutine count | `runtime.NumGoroutine()` | Monotonic growth, no plateau |

The two pairs distinguish four failures:

RSS up, heap flat → idle pages retained (mostly cosmetic; if it matters, check that GODEBUG=madvdontneed=1 is in effect, which is the Linux default since Go 1.16, or call debug.FreeOSMemory).
RSS up, heap up → real leak or growing working set.
GC CPU up, heap flat → allocation rate up; profile with go tool pprof -sample_index=alloc_objects.
Goroutines up → leaked goroutines; see 07-goroutine-lifecycle-leaks.
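The Go-side numbers can be read in-process with runtime/metrics. The sketch below is one way to do it, under the assumption that the runtime is recent enough to know these metric names (the CPU classes arrived in Go 1.20 and the live-heap metric in Go 1.21); the snapshot type and helper names are made up for the example. The GC CPU fraction is derived from two snapshots, since the underlying counters are cumulative.

```go
package main

import (
	"fmt"
	"runtime"
	"runtime/metrics"
)

// snapshot holds the Go-side signals from the table. RSS is not here: it has
// to come from the OS or the container runtime.
type snapshot struct {
	LiveHeapBytes   uint64
	GCCPUSeconds    float64
	TotalCPUSeconds float64
	Goroutines      int
}

func takeSnapshot() snapshot {
	samples := []metrics.Sample{
		{Name: "/gc/heap/live:bytes"},
		{Name: "/cpu/classes/gc/total:cpu-seconds"},
		{Name: "/cpu/classes/total:cpu-seconds"},
	}
	metrics.Read(samples)

	var s snapshot
	// A runtime that does not know a name reports KindBad; leave zero then.
	if samples[0].Value.Kind() == metrics.KindUint64 {
		s.LiveHeapBytes = samples[0].Value.Uint64()
	}
	if samples[1].Value.Kind() == metrics.KindFloat64 {
		s.GCCPUSeconds = samples[1].Value.Float64()
	}
	if samples[2].Value.Kind() == metrics.KindFloat64 {
		s.TotalCPUSeconds = samples[2].Value.Float64()
	}
	s.Goroutines = runtime.NumGoroutine()
	return s
}

// gcCPUFraction is the share of CPU time spent in GC between two snapshots.
func gcCPUFraction(prev, cur snapshot) float64 {
	dt := cur.TotalCPUSeconds - prev.TotalCPUSeconds
	if dt <= 0 {
		return 0
	}
	return (cur.GCCPUSeconds - prev.GCCPUSeconds) / dt
}

func main() {
	fmt.Printf("%+v\n", takeSnapshot())
}
```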

4. Exporting from `runtime/metrics` to Prometheus

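import (
    "github.com/prometheus/client_golang/prometheus"
    "github.com/prometheus/client_golang/prometheus/collectors"
)

func init() {
    // The default registry already holds a MemStats-based Go collector;
    // registering a second one panics, so remove it first.
    prometheus.Unregister(collectors.NewGoCollector())
    prometheus.MustRegister(
        collectors.NewGoCollector(
            collectors.WithGoCollectorRuntimeMetrics(collectors.MetricsAll),
        ),
    )
}

WithGoCollectorRuntimeMetrics(collectors.MetricsAll) exports the modern runtime/metrics view, including pause histograms (go_gc_pauses_seconds) and per-class heap memory. Older client_golang releases spelled this option WithGoCollections(collectors.GoRuntimeMetricsCollection), which newer releases deprecate. The default collector (no options) is the legacy MemStats view: fine, but less expressive.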

5. Multi-stage Dockerfile, memory-aware

FROM golang:1.24 AS build
WORKDIR /src
COPY go.* ./
RUN go mod download
COPY . .
RUN CGO_ENABLED=0 GOFLAGS="-trimpath" \
    go build -ldflags="-s -w" -o /out/server ./cmd/server

FROM gcr.io/distroless/static-debian12
COPY --from=build /out/server /server
ENV GOMEMLIMIT=900MiB GOGC=100
USER 65532:65532
ENTRYPOINT ["/server"]

Notes:

CGO_ENABLED=0 disables cgo, so the binary is statically linked: smaller image, no glibc dependency.
-trimpath and -ldflags="-s -w" produce a smaller, reproducible binary.
The base is distroless static: no shell, no libc, smaller attack surface.
GOMEMLIMIT is set in the image, but the deployment can override it.

6. `sync.Pool` in real handlers

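import (
    "bytes"
    "io"
    "net/http"
    "sync"
)

const maxBody = 1 << 20 // hypothetical 1 MiB request cap; size it to your API

var reqBufPool = sync.Pool{
    New: func() any { return new(bytes.Buffer) },
}

func handle(w http.ResponseWriter, r *http.Request) {
    buf := reqBufPool.Get().(*bytes.Buffer)
    defer func() {
        if buf.Cap() < 64<<10 { // drop oversized buffers instead of pooling them
            buf.Reset()
            reqBufPool.Put(buf)
        }
    }()

    // Bound the read so one client cannot grow the buffer without limit.
    if _, err := io.Copy(buf, http.MaxBytesReader(w, r.Body, maxBody)); err != nil {
        http.Error(w, err.Error(), http.StatusBadRequest)
        return
    }
    // ... process buf ...
}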

Three rules:

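Always Reset before putting back. Otherwise the next caller sees stale data, and pooled structs with pointer fields keep whatever they reference alive.
Drop oversized values. Otherwise one 64 MiB request leaves a 64 MiB buffer in the pool, and it stays allocated for as long as it keeps being reused.
Type-assert in exactly one place. If callers write .(*bytes.Buffer) everywhere, the type is already part of the contract, so encode it in typed get/put helpers.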

7. Allocation budgets in benchmarks

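// Needs net/http, net/http/httptest, strings, and testing.
func BenchmarkHandler(b *testing.B) {
    body := strings.Repeat("x", 4<<10) // hypothetical 4 KiB payload
    b.ReportAllocs()
    for i := 0; i < b.N; i++ {
        req := httptest.NewRequest(http.MethodPost, "/", strings.NewReader(body))
        handle(httptest.NewRecorder(), req) // request and recorder setup count toward allocs/op
    }
}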

Then enforce via CI:

go test -bench=. -benchmem -run=^$ -count=10 ./... | tee new.txt
benchstat baseline.txt new.txt

A PR that takes BenchmarkHandler from 12 allocs/op to 24 allocs/op is a regression, even if latency is unchanged. The 2× allocation hits you in GC CPU under load, not in the microbenchmark. Treat alloc count as a tracked metric.
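benchstat reports the change but does not fail a build by itself. One way to turn a budget into a hard gate is testing.AllocsPerRun. The sketch below measures a hypothetical hot-path function, encodeKey, against an assumed budget of zero allocations; both the function and the budget are illustrative stand-ins, not part of the handler above.

```go
package main

import (
	"fmt"
	"strconv"
	"testing"
)

// encodeKey is a hypothetical hot-path function: it appends "user:<id>" to
// dst and returns the extended slice, so a caller with spare capacity pays
// no allocation.
func encodeKey(dst []byte, id int64) []byte {
	dst = append(dst, "user:"...)
	return strconv.AppendInt(dst, id, 10)
}

// allocsPerCall measures the average heap allocations per encodeKey call
// when the destination buffer is reused.
func allocsPerCall() float64 {
	buf := make([]byte, 0, 64)
	return testing.AllocsPerRun(1000, func() {
		buf = encodeKey(buf[:0], 123456)
	})
}

func main() {
	const budget = 0 // assumed allocation budget for this function
	got := allocsPerCall()
	if got > budget {
		fmt.Printf("FAIL: %.0f allocs/op exceeds budget %d\n", got, budget)
		return
	}
	fmt.Printf("ok: %.0f allocs/op\n", got)
}
```

In a real suite the same check sits in a TestXxx function and calls t.Fatalf when the measured value exceeds the budget, so a regression fails the build instead of showing up only in a benchstat table.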

8. Memory regression detection in production

Two practical patterns:

Canary diff. Deploy to 1% of traffic for an hour. Compare live_heap, gc_cpu, and goroutines against the previous version. Block the rollout if any of the three rises more than (say) 15%.

Steady-state probe. A small synthetic-load goroutine runs the same N requests in a loop, every hour, and snapshots runtime/metrics. Plot week-over-week. Slow leaks show up here before customers do.

Both are cheap to build and worth their cost during the first year of a service's life.
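The canary gate reduces to a relative comparison per signal. A minimal sketch, assuming the three values have already been scraped for both versions; the signals type, the regressions helper, and the sample numbers are invented for illustration.

```go
package main

import "fmt"

// signals are the three values compared during a canary. Units only need to
// match between the two versions.
type signals struct {
	LiveHeap   float64
	GCCPU      float64
	Goroutines float64
}

// regressions lists every signal where the canary exceeds the baseline by
// more than threshold (0.15 means 15%). A zero baseline is skipped rather
// than treated as infinite growth.
func regressions(baseline, canary signals, threshold float64) []string {
	var out []string
	check := func(name string, base, cur float64) {
		if base > 0 && (cur-base)/base > threshold {
			out = append(out, name)
		}
	}
	check("live_heap", baseline.LiveHeap, canary.LiveHeap)
	check("gc_cpu", baseline.GCCPU, canary.GCCPU)
	check("goroutines", baseline.Goroutines, canary.Goroutines)
	return out
}

func main() {
	base := signals{LiveHeap: 400, GCCPU: 0.05, Goroutines: 2000}
	canary := signals{LiveHeap: 480, GCCPU: 0.052, Goroutines: 2010}
	// Live heap grew 20%, the other two stayed within 15%.
	fmt.Println(regressions(base, canary, 0.15)) // prints [live_heap]
}
```

A non-empty result blocks the rollout; an empty one lets it proceed.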

9. The "RSS climbing" runbook

When the on-call dashboard shows RSS up but live heap flat:

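Fetch /debug/pprof/heap (binary profile for go tool pprof) and /debug/pprof/heap?debug=1 (text form; it ends with a runtime.MemStats dump that includes HeapAlloc, HeapInuse, HeapIdle, and HeapReleased).
Compare HeapInuse vs HeapAlloc. A wide gap = lots of partially used spans = fragmentation. A narrow gap means the extra RSS is idle spans the runtime has not returned yet (HeapIdle minus HeapReleased).
Check HeapReleased against time-since-restart. Gradual returns are normal: the background scavenger releases pages slowly, and with GODEBUG=madvdontneed=0 (MADV_FREE) RSS only drops once the kernel is under memory pressure.
If you must, call debug.FreeOSMemory() from a maintenance endpoint (/admin/free-os), but log it loudly. Then open a ticket to remove that endpoint.
If HeapAlloc is also growing, this is a leak. Capture two profiles 30 minutes apart, run go tool pprof -base old.pb.gz new.pb.gz, and look at the diff.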

10. Standard library footguns at scale

| Pattern | Hidden cost |
| --- | --- |
| `encoding/json` with `interface{}` decoding | Many small allocations per field; consider `json.RawMessage` or codegen |
| Client `http.Response.Body` not closed | Connection isn't returned to the pool; it and its goroutines stay alive, and GC does not reclaim them |
| `fmt.Sprintf` in hot paths | Boxes the arguments into interfaces and allocates the result string |
| `regexp.MustCompile` inside handlers | Compiles and allocates the regex on every call; compile once at package init |
| `time.After` in long-lived selects | New timer and channel per iteration; before Go 1.23 each stays live until it fires |
| `context.WithTimeout` without cancel | Timer and child context stay registered with the parent until the deadline |
| `range` over a large map for filtering | O(n) scan on every call; prefer key lookup or a secondary index |

These are not micro-optimizations; they are the difference between a service that holds steady at 500 MiB and one that drifts to 5 GiB over a week.

11. Production-grade pprof endpoint

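import (
    "log"
    "net/http"
    "net/http/pprof" // also registers on http.DefaultServeMux as a side effect
)

func startProfiler() {
    mux := http.NewServeMux()
    // Index also serves the named profiles: heap, allocs, goroutine, ...
    mux.HandleFunc("/debug/pprof/", pprof.Index)
    mux.HandleFunc("/debug/pprof/cmdline", pprof.Cmdline)
    mux.HandleFunc("/debug/pprof/profile", pprof.Profile)
    mux.HandleFunc("/debug/pprof/symbol", pprof.Symbol)
    mux.HandleFunc("/debug/pprof/trace", pprof.Trace)

    go func() {
        // localhost only
        if err := http.ListenAndServe("127.0.0.1:6060", mux); err != nil {
            log.Printf("pprof listener stopped: %v", err)
        }
    }()
}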

Two musts:

Bind to localhost (or put it behind admin auth). pprof endpoints expose source-level info; never put them on a public port.
Don't serve http.DefaultServeMux on your main listener. Importing net/http/pprof registers its handlers there, alongside whatever else you registered. Use a dedicated mux on a dedicated port.

12. Capacity planning

For a server at steady state:

peak_rss ≈ live_heap × (1 + GOGC/100) + stacks × goroutines + fixed_overheads + cgo

Rough numbers for a typical Go service:

Fixed overheads: 30–80 MiB (runtime, libraries, mappings).
Per-goroutine stack: 4–8 KiB after a few growths.
GOGC=100 adds a 100% headroom over live.

If live_heap=400 MiB and goroutines=2000 at 8 KiB each, expect peak RSS around 800 MiB + 16 MiB + 50 MiB ≈ 870 MiB. Add 10–20% safety, deploy with a 1 GiB limit, set GOMEMLIMIT=900MiB.
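The same arithmetic as a small helper, useful when the inputs come from dashboards rather than from memory. The function name and parameter layout are assumptions for this sketch; the formula is the steady-state estimate above, with 8 KiB per goroutine and 50 MiB of fixed overhead as in the worked example.

```go
package main

import "fmt"

// estimatePeakRSSMiB applies the steady-state formula. Sizes are in MiB
// except stackKiB, which is the per-goroutine stack size in KiB.
func estimatePeakRSSMiB(liveHeapMiB float64, gogc, goroutines int, stackKiB, fixedMiB, cgoMiB float64) float64 {
	heap := liveHeapMiB * (1 + float64(gogc)/100)
	stacks := float64(goroutines) * stackKiB / 1024
	return heap + stacks + fixedMiB + cgoMiB
}

func main() {
	peak := estimatePeakRSSMiB(400, 100, 2000, 8, 50, 0)
	fmt.Printf("estimated peak RSS: %.0f MiB\n", peak)
	fmt.Printf("with 15%% safety: %.0f MiB\n", peak*1.15)
}
```

For the example inputs this gives roughly 866 MiB before the safety margin, which is where the 1 GiB container limit comes from.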

13. When to give up and pre-allocate

For latency-sensitive code paths, the allocator's fast path is fast, but it's never free. If a hot loop must hit zero allocations:

Use sync.Pool for variable-sized buffers.
Use slab arrays for fixed-size objects with bounded counts.
Pass output slices in ((*MyResult).Decode(buf, dst)), don't return new ones.
Avoid interfaces in the hot path; use generics or type-specific code.
Audit with -benchmem and pprof -alloc_objects.

Zero-alloc Go is achievable for kernels of routing, parsing, and encoding code. It is not a goal for whole services.
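For the slab item in the list above, a minimal sketch of a fixed-capacity slab with a free-index stack. The node type and the slab API are hypothetical; the structure is not safe for concurrent use without a lock, and it relies on callers never releasing the same node twice.

```go
package main

import "fmt"

// node is a hypothetical fixed-size object.
type node struct {
	key, value uint64
	slot       int32 // position in the slab; set once by newSlab
}

// slab hands out nodes from one preallocated array. After construction,
// acquire and release perform no heap allocation.
type slab struct {
	items []node
	free  []int32 // stack of available indexes into items
}

func newSlab(n int) *slab {
	s := &slab{items: make([]node, n), free: make([]int32, n)}
	for i := range s.items {
		s.items[i].slot = int32(i)
		s.free[i] = int32(n - 1 - i) // index 0 is handed out first
	}
	return s
}

// acquire returns a node with zeroed payload, or nil when the slab is
// exhausted.
func (s *slab) acquire() *node {
	if len(s.free) == 0 {
		return nil
	}
	i := s.free[len(s.free)-1]
	s.free = s.free[:len(s.free)-1]
	n := &s.items[i]
	n.key, n.value = 0, 0
	return n
}

// release makes n available again. The caller must not use n afterwards and
// must not release it twice.
func (s *slab) release(n *node) {
	s.free = append(s.free, n.slot) // never grows past the original capacity
}

func main() {
	s := newSlab(2)
	a, b := s.acquire(), s.acquire()
	fmt.Println(a != nil, b != nil, s.acquire() == nil)
	s.release(a)
	fmt.Println(s.acquire() == a)
}
```

The bounded count is the point: when acquire returns nil the caller has to shed load or fall back to the heap, which makes the memory ceiling explicit.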

14. Summary

Production memory management is budgeting plus observability. Set GOMEMLIMIT from the platform's memory limit, watch RSS / live heap / GC CPU / goroutines as four independent signals, gate releases on allocation regressions, and keep a runbook for the two failure modes (real leak vs. retained idle pages). Reach for sync.Pool and pre-allocation surgically in hot paths; let the runtime do its job everywhere else.