Memory Management in Depth: Professional
1. The production framing
In a real service, memory management is not "tune GOGC". It's a budget you negotiate with the platform team and a steady stream of small decisions in code review. The professional job, roughly:
- Set a memory budget in your deployment manifest (cgroup, k8s limit, ECS memory) and make the runtime aware of it via GOMEMLIMIT.
- Bound allocation rate in hot paths: pools, preallocation, careful interface use.
- Continuously observe heap, GC CPU, allocation rate, and goroutine count.
- Detect and roll back memory regressions on each release — they are the single most common silent SLO killer.
- Have a runbook for OOM and "RSS climbing" incidents.
The rest of this file is what that looks like.
2. Setting GOMEMLIMIT correctly in containers
Before Go 1.19 introduced GOMEMLIMIT, the standard trick was a heap ballast plus GOGC heuristics (automaxprocs only ever handled the CPU side). Now:
# kubernetes deployment, container spec
resources:
  limits:
    memory: 2Gi
env:
  - name: GOMEMLIMIT
    value: "1800MiB" # ~90% of the cgroup limit
  - name: GOGC
    value: "100" # leave default unless you have data
Why 90%, not 100%? Because GOMEMLIMIT accounts for the Go runtime, but the process also uses memory you can't see — cgo allocations, mmap'd files via syscall, third-party C libraries. Reserve ~10% headroom.
For projects that can't change deployment, the github.com/KimMachineGun/automemlimit library reads the cgroup limit at startup and sets GOMEMLIMIT for you. Acceptable in mature codebases; preferred over hand-tuning.
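For services that cannot take a new dependency, the same idea can be sketched with the standard library alone. The example below is a minimal sketch, not the library's implementation: it assumes a cgroup v2 host where the limit is exposed at /sys/fs/cgroup/memory.max, and the helper name limitFromCgroup and the 0.9 ratio are illustrative choices.

```go
package main

import (
	"fmt"
	"os"
	"runtime/debug"
	"strconv"
	"strings"
)

// limitFromCgroup parses the contents of a cgroup v2 memory.max file and
// returns ratio * limit in bytes. It reports false when the file says "max"
// (no limit) or cannot be parsed. Hypothetical helper, for illustration only.
func limitFromCgroup(contents string, ratio float64) (int64, bool) {
	s := strings.TrimSpace(contents)
	if s == "" || s == "max" {
		return 0, false
	}
	n, err := strconv.ParseInt(s, 10, 64)
	if err != nil || n <= 0 {
		return 0, false
	}
	return int64(float64(n) * ratio), true
}

func main() {
	// Respect an explicit GOMEMLIMIT from the deployment; only fill in a default.
	if os.Getenv("GOMEMLIMIT") != "" {
		return
	}
	raw, err := os.ReadFile("/sys/fs/cgroup/memory.max") // assumed cgroup v2 path
	if err != nil {
		return // not on Linux, or cgroup v1: leave the runtime default
	}
	if limit, ok := limitFromCgroup(string(raw), 0.9); ok {
		debug.SetMemoryLimit(limit)
		fmt.Printf("memory limit set to %d bytes\n", limit)
	}
}
```

Keeping the parsing in a pure function makes the ratio easy to unit-test without a container; the environment check preserves the rule that the deployment manifest wins.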
3. The four numbers your dashboards need
Build one panel per service with:
| Metric | Source | Alert when |
|---|---|---|
| RSS (process) | /proc/self/status or container metric | Trending up over 24h |
| Go live heap | runtime/metrics /gc/heap/live:bytes | Steady-state climbs > 2× weekly baseline |
| GC CPU fraction | /cpu/classes/gc/total:cpu-seconds (rate) | > 20% sustained |
| Goroutine count | runtime.NumGoroutine() | Monotonic growth, no plateau |
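As a rough illustration of where the three in-process numbers come from, the sketch below samples them once through runtime/metrics. The Snapshot type and readSnapshot name are made up for this example; the metric names are the ones in the table, plus /sched/goroutines:goroutines as an assumed equivalent of runtime.NumGoroutine(). RSS is left out because it comes from the OS or the container runtime.

```go
package main

import (
	"fmt"
	"runtime/metrics"
)

// Snapshot holds the three in-process signals from the table above.
type Snapshot struct {
	LiveHeapBytes uint64
	GCCPUSeconds  float64 // cumulative; a dashboard plots its rate
	Goroutines    uint64
}

// readSnapshot samples runtime/metrics once. Metrics that the running Go
// version does not support come back as KindBad and are left at zero.
func readSnapshot() Snapshot {
	samples := []metrics.Sample{
		{Name: "/gc/heap/live:bytes"},
		{Name: "/cpu/classes/gc/total:cpu-seconds"},
		{Name: "/sched/goroutines:goroutines"},
	}
	metrics.Read(samples)

	var s Snapshot
	if v := samples[0].Value; v.Kind() == metrics.KindUint64 {
		s.LiveHeapBytes = v.Uint64()
	}
	if v := samples[1].Value; v.Kind() == metrics.KindFloat64 {
		s.GCCPUSeconds = v.Float64()
	}
	if v := samples[2].Value; v.Kind() == metrics.KindUint64 {
		s.Goroutines = v.Uint64()
	}
	return s
}

func main() {
	s := readSnapshot()
	fmt.Printf("live heap=%d B, gc cpu=%.3fs, goroutines=%d\n",
		s.LiveHeapBytes, s.GCCPUSeconds, s.Goroutines)
}
```

Because the GC CPU value is a running total, the "> 20% sustained" alert needs the difference between two snapshots divided by the CPU time available in that window.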
The two pairs distinguish four failures:
- RSS up, heap flat → idle pages retained (cosmetic, but tune MADV_DONTNEED or debug.FreeOSMemory).
- RSS up, heap up → real leak or growing working set.
- GC CPU up, heap flat → allocation rate up; profile -alloc_objects.
- Goroutines up → leaked goroutines; see 07-goroutine-lifecycle-leaks.
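The decision list above is small enough to encode directly, for example in an alert annotation or a triage bot. The following is only a sketch: the Trend type and diagnose function are hypothetical, and deciding what counts as "up" for each signal is left to the caller.

```go
package main

import "fmt"

// Trend is a hypothetical summary of how each signal moved over the
// comparison window.
type Trend struct {
	RSSUp, HeapUp, GCCPUUp, GoroutinesUp bool
}

// diagnose maps the trends onto the failure modes listed above.
// More than one diagnosis can apply at once.
func diagnose(t Trend) []string {
	var out []string
	switch {
	case t.RSSUp && t.HeapUp:
		out = append(out, "real leak or growing working set")
	case t.RSSUp:
		out = append(out, "idle pages retained")
	}
	if t.GCCPUUp && !t.HeapUp {
		out = append(out, "allocation rate up")
	}
	if t.GoroutinesUp {
		out = append(out, "leaked goroutines")
	}
	return out
}

func main() {
	fmt.Println(diagnose(Trend{RSSUp: true}))
	fmt.Println(diagnose(Trend{GCCPUUp: true, GoroutinesUp: true}))
}
```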
4. Exporting from runtime/metrics to Prometheus
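import (
	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/collectors"
)

func init() {
	// The default registry already carries a Go collector; remove it first
	// so the registration below does not panic on duplicate metrics.
	prometheus.Unregister(collectors.NewGoCollector())
	prometheus.MustRegister(
		collectors.NewGoCollector(
			collectors.WithGoCollections(collectors.GoRuntimeMetricsCollection),
		),
	)
}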
GoRuntimeMetricsCollection exports the modern runtime/metrics view, including pause histograms (go_gc_pauses_seconds) and per-class heap memory. The default collector (without WithGoCollections) is the legacy MemStats view — fine, but less expressive.
5. Multi-stage Dockerfile, memory-aware
FROM golang:1.24 AS build
WORKDIR /src
COPY go.* ./
RUN go mod download
COPY . .
RUN CGO_ENABLED=0 GOFLAGS="-trimpath" \
    go build -ldflags="-s -w" -o /out/server ./cmd/server

FROM gcr.io/distroless/static-debian12
COPY --from=build /out/server /server
ENV GOMEMLIMIT=900MiB GOGC=100
USER 65532:65532
ENTRYPOINT ["/server"]
Notes:
- CGO_ENABLED=0 strips the cgo runtime; smaller image, no glibc dep.
- -trimpath and -ldflags="-s -w" produce a smaller, reproducible binary.
- The base is distroless static: no shell, no libc, smaller attack surface.
- GOMEMLIMIT is set in the image, but the deployment can override it.
6. sync.Pool in real handlers
import (
	"bytes"
	"io"
	"net/http"
	"sync"
)

var reqBufPool = sync.Pool{
	New: func() any { return new(bytes.Buffer) },
}

func handle(w http.ResponseWriter, r *http.Request) {
	buf := reqBufPool.Get().(*bytes.Buffer)
	defer func() {
		if buf.Cap() < 64<<10 { // drop oversized buffers
			buf.Reset()
			reqBufPool.Put(buf)
		}
	}()
	if _, err := io.Copy(buf, r.Body); err != nil {
		http.Error(w, err.Error(), http.StatusBadRequest)
		return
	}
	// ... process buf ...
}
Three rules:
- Always Reset before putting back. Otherwise stale data and references leak through the pooled object to its next user.
- Drop oversized values. Otherwise one 64 MiB request can pin 64 MiB for as long as that buffer keeps cycling through the pool.
- Type-assert exactly once. If callers do .(*bytes.Buffer) everywhere, the type is part of the contract; encode it in a typed wrapper.
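One way to apply all three rules in a single place is a small typed wrapper around sync.Pool. This is a sketch under assumptions: BufferPool, NewBufferPool, and the maxCap parameter are hypothetical names for illustration, not an existing API.

```go
package main

import (
	"bytes"
	"fmt"
	"sync"
)

// BufferPool is a hypothetical wrapper: callers never see sync.Pool
// or a type assertion.
type BufferPool struct {
	pool   sync.Pool
	maxCap int // buffers that grew beyond this are dropped, not pooled
}

func NewBufferPool(maxCap int) *BufferPool {
	return &BufferPool{
		pool:   sync.Pool{New: func() any { return new(bytes.Buffer) }},
		maxCap: maxCap,
	}
}

// Get returns an empty buffer; the single type assertion lives here.
func (p *BufferPool) Get() *bytes.Buffer {
	return p.pool.Get().(*bytes.Buffer)
}

// Put resets the buffer and returns it to the pool, unless it is oversized.
// It reports whether the buffer was kept.
func (p *BufferPool) Put(b *bytes.Buffer) bool {
	if b.Cap() > p.maxCap {
		return false // let the GC reclaim the large backing array
	}
	b.Reset()
	p.pool.Put(b)
	return true
}

func main() {
	p := NewBufferPool(64 << 10)
	b := p.Get()
	b.WriteString("hello")
	fmt.Println("kept:", p.Put(b))
}
```

Returning a bool from Put is a design choice for this sketch: it makes the drop path observable, so a counter of dropped buffers can show whether the size cap is set sensibly.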
7. Allocation budgets in benchmarks
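// imports: net/http, net/http/httptest, strings, testing
func BenchmarkHandler(b *testing.B) {
	body := strings.Repeat("x", 1024) // fixed 1 KiB payload
	b.ReportAllocs()
	for i := 0; i < b.N; i++ {
		// handle returns nothing; the request and recorder are rebuilt each
		// iteration, so their allocations are part of the reported count.
		req := httptest.NewRequest(http.MethodPost, "/", strings.NewReader(body))
		handle(httptest.NewRecorder(), req)
	}
}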
Then enforce via CI:
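One possible shape for that check, as a sketch rather than a prescribed setup: measure allocations with testing.AllocsPerRun and fail when the count exceeds a budget kept in the repository. The withinAllocBudget helper and the budget values are assumptions made for this example.

```go
package main

import (
	"fmt"
	"testing"
)

var sink []byte // package-level sink forces the allocation onto the heap

// withinAllocBudget reports the average allocations per call of f over
// 100 runs, and whether that average is at or below budget.
// testing.AllocsPerRun also works outside "go test".
func withinAllocBudget(budget float64, f func()) (float64, bool) {
	got := testing.AllocsPerRun(100, f)
	return got, got <= budget
}

func main() {
	allocating := func() { sink = make([]byte, 1024) }
	got, ok := withinAllocBudget(0, allocating)
	fmt.Printf("allocs/op=%v within budget=%v\n", got, ok)
}
```

In a real repository the same call would sit in a _test.go file and end in t.Fatalf, so an ordinary go test run in CI enforces the budget.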
A PR that takes BenchmarkHandler from 12 allocs/op to 24 allocs/op is a regression, even if latency is unchanged. The 2× allocation hits you in GC CPU under load, not in the microbenchmark. Treat alloc count as a tracked metric.
8. Memory regression detection in production
Two practical patterns:
Canary diff. Deploy to 1% of traffic for an hour. Compare live_heap, gc_cpu, and goroutines against the previous version. Block the rollout if any of the three rises more than (say) 15%.
Steady-state probe. A small synthetic-load goroutine runs the same N requests in a loop, every hour, and snapshots runtime/metrics. Plot week-over-week. Slow leaks show up here before customers do.
Both are cheap to build and worth their cost during the first year of a service's life.
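The canary comparison reduces to a per-signal relative-change check. The sketch below shows that arithmetic only; the Metrics type, the regressions function, and the sample values are hypothetical, and collecting the numbers from both versions is out of scope.

```go
package main

import "fmt"

// Metrics is a hypothetical snapshot of the three canary signals.
type Metrics struct {
	LiveHeapBytes float64
	GCCPUFraction float64
	Goroutines    float64
}

// regressions returns the names of signals where the canary exceeds the
// baseline by more than maxRise (0.15 means 15%). A zero baseline is
// skipped to avoid dividing by zero.
func regressions(baseline, canary Metrics, maxRise float64) []string {
	var out []string
	check := func(name string, base, cur float64) {
		if base > 0 && (cur-base)/base > maxRise {
			out = append(out, name)
		}
	}
	check("live_heap", baseline.LiveHeapBytes, canary.LiveHeapBytes)
	check("gc_cpu", baseline.GCCPUFraction, canary.GCCPUFraction)
	check("goroutines", baseline.Goroutines, canary.Goroutines)
	return out
}

func main() {
	base := Metrics{LiveHeapBytes: 400 << 20, GCCPUFraction: 0.05, Goroutines: 2000}
	canary := Metrics{LiveHeapBytes: 480 << 20, GCCPUFraction: 0.05, Goroutines: 2100}
	if bad := regressions(base, canary, 0.15); len(bad) > 0 {
		fmt.Println("block rollout:", bad)
	}
}
```

Here the live heap rose 20% and trips the gate, while the 5% goroutine increase does not.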
9. The "RSS climbing" runbook
When the on-call dashboard shows RSS up but live heap flat:
- Hit /debug/pprof/heap and /debug/pprof/heap?debug=2 (in-use bytes + allocation graph).
- Compare HeapInuse vs HeapAlloc. A wide gap = lots of partially used spans = fragmentation. A narrow gap = the runtime simply hasn't returned pages yet.
- Check HeapReleased against time-since-restart. Slow returns are normal with MADV_FREE.
- If you must, call debug.FreeOSMemory() from a maintenance endpoint (/admin/free-os), but log it loudly. Then start a ticket to remove that endpoint.
- If HeapAlloc is also growing, this is a leak. Capture two profiles 30 minutes apart, pprof -base old.pb.gz new.pb.gz, look at the diff.
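The HeapInuse, HeapAlloc, and HeapReleased comparisons in the steps above can be read in-process from runtime.MemStats. A minimal sketch; the HeapView type and its method names are invented for this example.

```go
package main

import (
	"fmt"
	"runtime"
)

// HeapView is a hypothetical summary of the MemStats fields the runbook uses.
type HeapView struct {
	Alloc, Inuse, Idle, Released uint64
}

// SpanSlack is memory held in in-use spans but not occupied by allocated
// objects; a large value relative to Alloc suggests fragmentation.
func (h HeapView) SpanSlack() uint64 {
	if h.Inuse < h.Alloc {
		return 0
	}
	return h.Inuse - h.Alloc
}

// Retained is idle heap memory that has not been returned to the OS.
func (h HeapView) Retained() uint64 {
	if h.Idle < h.Released {
		return 0
	}
	return h.Idle - h.Released
}

func readHeapView() HeapView {
	var m runtime.MemStats
	runtime.ReadMemStats(&m) // briefly stops the world; keep it off hot paths
	return HeapView{Alloc: m.HeapAlloc, Inuse: m.HeapInuse, Idle: m.HeapIdle, Released: m.HeapReleased}
}

func main() {
	h := readHeapView()
	fmt.Printf("alloc=%d inuse=%d slack=%d retained=%d\n",
		h.Alloc, h.Inuse, h.SpanSlack(), h.Retained())
}
```

Logging this line from the maintenance endpoint before and after debug.FreeOSMemory() shows whether the call actually released anything.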
10. Standard library footguns at scale
| Pattern | Hidden cost |
|---|---|
| encoding/json with interface{} decoding | Many small allocations per field; consider json.RawMessage or codegen |
| http.Response.Body not closed (client side) | Connection isn't returned to the pool; it and its read/write goroutines are retained |
| fmt.Sprintf in hot paths | Boxes arguments into interfaces and allocates the result string |
| regexp.MustCompile inside handlers | Compiles and allocates the regex on every call; compile once at package init |
| time.After in long-lived selects | Allocates a timer and channel per call; before Go 1.23 they are not collected until the timer fires |
| context.WithTimeout without cancel | Keeps the timer and the parent's reference to the child context alive until expiry |
| range over a large map for filtering | Visits every entry on each call; prefer key lookup |
These are not micro-optimizations; they are the difference between a service that holds steady at 500 MiB and one that drifts to 5 GiB over a week.
11. Production-grade pprof endpoint
import (
	"log"
	"net/http"
	"net/http/pprof" // named import: the handlers below are referenced explicitly
)

func startProfiler() {
	mux := http.NewServeMux()
	mux.HandleFunc("/debug/pprof/", pprof.Index)
	mux.HandleFunc("/debug/pprof/profile", pprof.Profile)
	mux.Handle("/debug/pprof/heap", pprof.Handler("heap"))
	mux.Handle("/debug/pprof/allocs", pprof.Handler("allocs"))
	mux.Handle("/debug/pprof/goroutine", pprof.Handler("goroutine"))
	go func() {
		// localhost only; ListenAndServe returns only on failure
		log.Println(http.ListenAndServe("127.0.0.1:6060", mux))
	}()
}
Two musts:
- Bind to localhost (or put it behind admin auth). pprof endpoints expose source-level info; never put them on a public port.
- Don't serve http.DefaultServeMux on your main listener. Importing net/http/pprof registers its handlers on http.DefaultServeMux as a side effect, where they are shared with whatever else you registered there. Use a dedicated mux on a dedicated port.
12. Capacity planning
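For a server at steady state, peak RSS is approximately:

peak RSS ≈ live_heap × (1 + GOGC/100) + goroutines × stack_size + fixed_overhead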
Rough numbers for a typical Go service:
- Fixed overheads: 30–80 MiB (runtime, libraries, mappings).
- Per-goroutine stack: 4–8 KiB after a few growths.
- GOGC=100 adds 100% headroom over the live heap.
If live_heap=400 MiB and goroutines=2000 (at 8 KiB each), expect peak RSS around 800 MiB + 16 MiB + 50 MiB ≈ 870 MiB. Add 10–20% safety, deploy with a 1 GiB limit, set GOMEMLIMIT=900MiB.
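The arithmetic in that example can be kept as a small function so the estimate is easy to rerun when the inputs change. This is a sketch of the rule of thumb only; estimatePeakRSS and its parameters are hypothetical, and the inputs are the figures from the paragraph above.

```go
package main

import "fmt"

const (
	KiB = 1 << 10
	MiB = 1 << 20
)

// estimatePeakRSS applies the rule of thumb: live heap scaled by the GOGC
// headroom, plus goroutine stacks, plus fixed overhead. All sizes in bytes.
func estimatePeakRSS(liveHeap int64, gogc int, goroutines, stackBytes, fixed int64) int64 {
	heapPeak := liveHeap + liveHeap*int64(gogc)/100
	return heapPeak + goroutines*stackBytes + fixed
}

func main() {
	peak := estimatePeakRSS(400*MiB, 100, 2000, 8*KiB, 50*MiB)
	fmt.Printf("estimated peak RSS: %.1f MiB\n", float64(peak)/MiB)
	fmt.Printf("with 15%% safety:    %.1f MiB\n", float64(peak)*1.15/MiB)
}
```

With these inputs the estimate comes to roughly 866 MiB, which lands just under 1 GiB once the safety margin is added.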
13. When to give up and pre-allocate
For latency-sensitive code paths, the allocator's fast path is fast, but it's never free. If a hot loop must hit zero allocations:
- Use sync.Pool for variable-sized buffers.
- Use slab arrays for fixed-size objects with bounded counts.
- Pass output slices in ((*MyResult).Decode(buf, dst)), don't return new ones.
- Avoid interfaces in the hot path; use generics or type-specific code.
- Audit with -benchmem and pprof -alloc_objects.
Zero-alloc Go is achievable for the inner kernels of routing, parsing, and encoding code. It is not a goal for whole services.
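The "pass output slices in" item is the pattern the standard library uses in strconv.AppendInt: the caller owns the buffer and the function only appends to it. A minimal sketch, with appendPair as a made-up example function and testing.AllocsPerRun used to audit the result.

```go
package main

import (
	"fmt"
	"strconv"
	"testing"
)

// appendPair writes "key=value;" into dst and returns the extended slice.
// The caller owns dst and reuses it across calls.
func appendPair(dst []byte, key string, value int64) []byte {
	dst = append(dst, key...)
	dst = append(dst, '=')
	dst = strconv.AppendInt(dst, value, 10)
	return append(dst, ';')
}

func main() {
	buf := make([]byte, 0, 64) // allocated once, outside the hot loop
	allocs := testing.AllocsPerRun(1000, func() {
		buf = appendPair(buf[:0], "latency_ms", 42)
	})
	fmt.Printf("%s allocs/op=%v\n", buf, allocs)
}
```

The buffer only stays allocation-free while its capacity covers the largest output; sizing it for the worst case is part of the contract.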
14. Summary
Production memory management is budgeting plus observability. Set GOMEMLIMIT from the platform's memory limit, watch RSS / live heap / GC CPU / goroutines as four independent signals, gate releases on allocation regressions, and keep a runbook for the two failure modes (real leak vs. retained idle pages). Reach for sync.Pool and pre-allocation surgically in hot paths; let the runtime do its job everywhere else.
Further reading
- GOMEMLIMIT & container memory: https://go.dev/doc/gc-guide#Memory_limit
- automemlimit: https://github.com/KimMachineGun/automemlimit
- Distroless images: https://github.com/GoogleContainerTools/distroless
- Prometheus Go collector: https://pkg.go.dev/github.com/prometheus/client_golang/prometheus/collectors