Memory Profiling in Go: Middle

1. A workflow you can repeat

A memory investigation almost always follows the same loop:

1. Reproduce the symptom under load. A profile of an idle server tells you nothing.
2. Capture two profiles: one near the start of the symptom, one near the peak.
3. Diff them (pprof -base) so you see what grew, not what's merely big.
4. Drill into the top growing site with list <fn> to find the offending lines.
5. Hypothesize a fix; rebuild; benchmark with -benchmem; re-profile.
6. Confirm the new profile no longer shows the site, or shows it dramatically smaller.

The steps newcomers most often skip are the first (reproduce under load) and the third (diff). Without load there's nothing to see; without a diff you'll chase the biggest allocator even if it's been steady forever.

2. Four metrics, one shell

In an open pprof shell, you can switch metrics at any time:

(pprof) sample_index=inuse_space       # live bytes
(pprof) sample_index=inuse_objects     # live object count
(pprof) sample_index=alloc_space       # cumulative bytes
(pprof) sample_index=alloc_objects     # cumulative object count
(pprof) top

Or from the command line:

go tool pprof -inuse_space  heap.pb.gz
go tool pprof -alloc_objects heap.pb.gz

Switching between them in a single session is how you tell apart "lots of small allocations" (high alloc_objects, low per-object size) from "few large allocations that stick around" (low alloc_objects, high inuse_space). Both can crush a server, but they need different fixes.
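To make the two shapes concrete, here is a minimal sketch that measures two hypothetical workloads with runtime.MemStats. Its Mallocs and TotalAlloc counters are exact, unsampled counterparts of what alloc_objects and alloc_space estimate. The function names (measure, manySmall, fewLarge) and the sizes are assumptions chosen for illustration.

```go
package main

import (
	"fmt"
	"runtime"
)

// sink keeps results reachable so the allocations land on the heap.
var sink [][]byte

// measure runs f and reports how many heap objects and bytes were allocated
// while it ran, using the cumulative Mallocs and TotalAlloc counters.
func measure(f func()) (objects, bytes uint64) {
	var before, after runtime.MemStats
	runtime.ReadMemStats(&before)
	f()
	runtime.ReadMemStats(&after)
	return after.Mallocs - before.Mallocs, after.TotalAlloc - before.TotalAlloc
}

// manySmall allocates 100,000 slices of 64 bytes each.
func manySmall() {
	sink = make([][]byte, 0, 100_000)
	for i := 0; i < 100_000; i++ {
		sink = append(sink, make([]byte, 64))
	}
}

// fewLarge allocates four slices of 16 MiB each.
func fewLarge() {
	sink = make([][]byte, 0, 4)
	for i := 0; i < 4; i++ {
		sink = append(sink, make([]byte, 16<<20))
	}
}

func main() {
	o, b := measure(manySmall)
	fmt.Printf("manySmall: %d objects, %d bytes\n", o, b)
	o, b = measure(fewLarge)
	fmt.Printf("fewLarge:  %d objects, %d bytes\n", o, b)
}
```

The first workload should report a high object count with a modest byte total, the second the reverse, which is the same contrast you look for when flipping sample_index in the pprof shell.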

3. `top`, `list`, `peek`, `web`

A handful of commands cover most of the work.

Command	What you learn
`top`	Ranked list of allocators (sort by `flat` by default)
`top -cum`	Same list sorted by cumulative (including callees)
`list <regex>`	Source view, allocations annotated per line
`peek <regex>`	All callers and callees of a function with their share
`web`	Opens an SVG callgraph in your browser
`tree`	Text version of the callgraph

(pprof) top
Showing nodes accounting for 220MB, 88.0% of 250MB total
Dropped 47 nodes (cum <= 1.25MB)
      flat  flat%   sum%        cum   cum%
   120MB 48.0% 48.0%    180MB 72.0%  myapp.(*Handler).process
    60MB 24.0% 72.0%     60MB 24.0%  encoding/json.(*decodeState).literalStore
    40MB 16.0% 88.0%     40MB 16.0%  bytes.makeSlice

flat is what this function allocated directly. cum includes everything its callees allocated. A function with high cum but low flat is a "manager" — the real work is in its children. Use peek to find them.

4. Interpreting `flat` vs `cum`

A common source of confusion. Consider:

func handle(r *Request) Response {
    raw := parseBody(r)     // allocates 1 MiB
    return decode(raw)      // allocates 500 KiB
}

The profile would show:

Function	flat	cum
`handle`	0	1.5 MiB
`parseBody`	1 MiB	1 MiB
`decode`	500 KiB	500 KiB

handle itself allocates nothing — it just calls things that do. If you only sort by flat, you miss the importance of handle. If you only sort by cum, every function near main looks important. You need both views; that's why top ships with -cum and peek ships at all.
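The example above can be turned into a runnable sketch to profile yourself. The Request and Response types and the function bodies are hypothetical stand-ins; the //go:noinline directives are an assumption added here to keep the three frames distinct so the flat and cum attribution is easy to read.

```go
package main

import "fmt"

// Request and Response are hypothetical placeholder types.
type Request struct{ Body []byte }
type Response struct{ Payload []byte }

// parseBody allocates a 1 MiB buffer; that is parseBody's flat cost.
//
//go:noinline
func parseBody(r *Request) []byte {
	return make([]byte, 1<<20)
}

// decode allocates a 500 KiB payload; that is decode's flat cost.
//
//go:noinline
func decode(raw []byte) Response {
	return Response{Payload: make([]byte, 500<<10)}
}

// handle allocates nothing directly, so its flat cost is zero,
// while its cum cost covers both callees (about 1.5 MiB per call).
func handle(r *Request) Response {
	raw := parseBody(r)
	return decode(raw)
}

func main() {
	resp := handle(&Request{})
	fmt.Println(len(resp.Payload)) // prints 512000
}
```

Calling handle in a loop under a heap profile with alloc_space selected should reproduce the shape of the table: all bytes flat in the two callees, and the sum in handle's cum column.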

5. The `pprof -base` diff

A single profile shows a snapshot. A diff shows the change:

go tool pprof -base baseline.pb.gz current.pb.gz
(pprof) top

Positive entries are sites that account for more memory in the current profile than in the baseline; negative entries account for less. For a leak hunt, save a baseline early in the run and compare every 5–10 minutes:

curl -o t0.pb.gz http://localhost:6060/debug/pprof/heap
# ... wait for the leak to grow ...
curl -o t1.pb.gz http://localhost:6060/debug/pprof/heap
go tool pprof -base t0.pb.gz -http=:8080 t1.pb.gz

The flame graph of a diff is one of the most effective leak-hunting views in the standard Go tooling, because only the sites that changed stand out.

6. The HTTP endpoints, with parameters

URL	Behavior
`/debug/pprof/heap`	Heap profile (default: `inuse_space`)
`/debug/pprof/heap?gc=1`	Forces a GC first; "cleaner" inuse counts
`/debug/pprof/heap?debug=1`	Text dump instead of protobuf
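`/debug/pprof/heap?debug=2`	Text dump as well; the extra verbosity of `debug=2` mainly matters for the goroutine profile
`/debug/pprof/allocs`	Same sampled data as `heap`, but opens with `alloc_space` (cumulative since process start) as the default metric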

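?gc=1 is worth knowing. Without it, the profile reflects the heap as of the most recently completed garbage collection, so it can be stale by up to two GC cycles and may miss recent allocations and frees. With it, the handler runs a collection before writing the profile, so the result reflects what survived a collection just now, which is far more meaningful for leak analysis. The trade-off: a forced GC costs CPU time that grows with the size of the live heap, so don't request it in a tight loop on a busy server.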

7. `-memprofile` and `-memprofilerate` in tests

go test -bench=. -benchmem -memprofile=mem.out -memprofilerate=1 ./pkg

Flag	Effect
`-memprofile=mem.out`	Write a heap profile when the test ends
`-memprofilerate=1`	Record every allocation (default: about one sample per 512 KiB allocated)
`-benchmem`	Print `B/op` and `allocs/op` per benchmark
`-benchtime=10s`	Run longer for more stable numbers
`-count=10`	Repeat to compute variance with `benchstat`

For microbenchmarks, set -memprofilerate=1. With the sampled default, a short benchmark may never sample its small allocations at all, and microbenchmarks are exactly where you care about them. Don't ship that setting to production: recording every allocation adds substantial overhead.

8. The shape of common allocation hotspots

Once you've seen these once, you'll recognize them in every profile.

Hotspot	What the stack looks like	Fix
`runtime.growslice`	A loop appending without `make([]T, 0, cap)`	Pre-size the slice
`runtime.mapassign`	A function inserting into a map without sizing	`make(map[K]V, n)`
`runtime.convT*`	Boxing a value into an `interface{}` per call	Pass a pointer; box once
`fmt.Sprintf` / `fmt.Sprintln`	String formatting in a tight loop	`strings.Builder`, `strconv.*`
`bytes.makeSlice`	`bytes.Buffer.Write` exceeding capacity	`buf.Grow(n)` upfront
`encoding/json.(*decodeState)`	Decoding into `map[string]interface{}`	Decode into a typed struct
`runtime.stringtoslicebyte`	`[]byte(s)` in a hot loop	`io.WriteString`, `unsafe.StringData` (advanced)
`runtime.mallocgc` near `make([]byte, ...)`	Allocating scratch buffers per request	`sync.Pool`

Memorize this table. When you open a profile, the first question to ask is "which of these does the top look like?" — most often the answer is one of them.
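As one worked instance of the table, the sketch below compares the first row's problem and fix using testing.AllocsPerRun, which works in an ordinary program as well as in tests. The grow and presized functions are hypothetical examples, and the element count is arbitrary.

```go
package main

import (
	"fmt"
	"testing"
)

// out keeps each result reachable so the slices are heap-allocated.
var out []int

// grow appends n values to a nil slice, so the backing array is
// reallocated repeatedly (runtime.growslice in a profile).
func grow(n int) {
	var s []int
	for i := 0; i < n; i++ {
		s = append(s, i)
	}
	out = s
}

// presized reserves capacity for n values up front: a single allocation.
func presized(n int) {
	s := make([]int, 0, n)
	for i := 0; i < n; i++ {
		s = append(s, i)
	}
	out = s
}

func main() {
	const n = 1000
	fmt.Println("grow allocs/run:    ", testing.AllocsPerRun(100, func() { grow(n) }))
	fmt.Println("presized allocs/run:", testing.AllocsPerRun(100, func() { presized(n) }))
}
```

The same before-and-after measurement applies to the other rows: write the suspect pattern and the fix as two small functions and compare their allocation counts before touching production code.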

9. `pprof` flame graph reading

go tool pprof -http=:8080 heap.pb.gz

In the flame graph:

Visual cue	Meaning
Wide block	Many sampled bytes attributed there
Tall stack	Deep call chain
Leaf frames (the bottom of each stack; pprof draws the graph top-down)	Where the bytes actually came from
Root-side frames (the top of the view)	Entry points (often `main`, goroutine entry functions, handlers)

Tips:

Click a frame to zoom into its subtree.
Switch metric with the SAMPLE menu to see alloc_objects vs inuse_space.
Hover for the full function name; long names get truncated.
Type a regex into the search box to highlight matching frames, or use the REFINE menu (Focus, Ignore, Hide, Show) to filter the view (useful for "show me only my package").

10. Capturing on a schedule

For a slow-leak hunt, take periodic snapshots:

mkdir -p /tmp/heaps
while true; do
    curl -s -o /tmp/heaps/heap-$(date +%s).pb.gz http://localhost:6060/debug/pprof/heap
    sleep 600
done

Then diff the oldest against the newest:

go tool pprof -base /tmp/heaps/heap-FIRST.pb.gz /tmp/heaps/heap-LAST.pb.gz

If the diff shows the same site growing across several consecutive intervals, that's the leak. This loop is the bare-bones version of what continuous profilers such as Pyroscope, Parca, and Datadog do.
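FIRST and LAST in the command above are placeholders for the earliest and latest timestamps. A small sketch of picking them automatically, assuming the heap-<unix seconds>.pb.gz naming that the capture loop produces; the helper names are hypothetical, and in practice the name list would come from reading the directory (for example with os.ReadDir).

```go
package main

import (
	"fmt"
	"sort"
	"strconv"
	"strings"
)

// snapshotTime extracts the Unix timestamp from a name of the form
// heap-<seconds>.pb.gz and reports whether the name matched.
func snapshotTime(name string) (int64, bool) {
	if !strings.HasPrefix(name, "heap-") || !strings.HasSuffix(name, ".pb.gz") {
		return 0, false
	}
	ts := strings.TrimSuffix(strings.TrimPrefix(name, "heap-"), ".pb.gz")
	sec, err := strconv.ParseInt(ts, 10, 64)
	return sec, err == nil
}

// oldestAndNewest returns the earliest and latest snapshot names,
// ignoring files that do not match the naming pattern.
func oldestAndNewest(names []string) (first, last string, ok bool) {
	type snap struct {
		name string
		sec  int64
	}
	var snaps []snap
	for _, n := range names {
		if sec, valid := snapshotTime(n); valid {
			snaps = append(snaps, snap{n, sec})
		}
	}
	if len(snaps) < 2 {
		return "", "", false // a diff needs two distinct snapshots
	}
	sort.Slice(snaps, func(i, j int) bool { return snaps[i].sec < snaps[j].sec })
	return snaps[0].name, snaps[len(snaps)-1].name, true
}

func main() {
	names := []string{
		"heap-1700001200.pb.gz", "notes.txt",
		"heap-1700000000.pb.gz", "heap-1700000600.pb.gz",
	}
	if first, last, ok := oldestAndNewest(names); ok {
		fmt.Printf("go tool pprof -base /tmp/heaps/%s /tmp/heaps/%s\n", first, last)
	}
}
```

Parsing the timestamp as a number, rather than sorting the names as strings, keeps the ordering correct even if the digit count ever differs between files.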

11. Looking at `inuse_objects` deliberately

inuse_space shows bytes. inuse_objects shows count. For a profile dominated by a few large allocations, inuse_space is the right view. But when small structures dominate — millions of *Node in a tree, hundreds of thousands of map entries — inuse_objects reveals the work the GC actually does.

GC mark work scales with the number of live objects and the pointers between them, not just with the byte total. A heap holding one 1 GiB byte slice marks far faster than a heap of a billion tiny objects, because a pointer-free slice needs no scanning. If runtime.MemStats.GCCPUFraction is high but the profile's inuse_space looks fine, switch to inuse_objects.

12. When pprof and `MemStats` disagree

(pprof) top
Total: 250MB
runtime.MemStats.HeapAlloc = 80MB

This is normal. Three reasons:

Sampling. pprof extrapolates from samples; the result has variance.
alloc_* vs inuse_*. If you opened an allocs profile, you got the cumulative metric.
GC timing. MemStats is read at one instant; the profile was captured at a different one.

Rule: trust runtime.MemStats for absolute size; trust pprof for the distribution of allocations across call sites. They answer different questions.
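For the absolute-size half of that rule, a minimal sketch of reading the exact counters is shown below. The heapStats type and readHeap function are hypothetical names; the fields they copy (HeapAlloc, HeapInuse, HeapObjects) are real runtime.MemStats fields.

```go
package main

import (
	"fmt"
	"runtime"
)

// heapStats holds a few exact counters copied from runtime.MemStats.
type heapStats struct {
	AllocBytes uint64 // HeapAlloc: bytes of allocated heap objects
	InuseBytes uint64 // HeapInuse: bytes in in-use spans
	Objects    uint64 // HeapObjects: number of allocated heap objects
}

// readHeap takes one reading. ReadMemStats briefly stops the world,
// so call it on a timer or on demand, not in a hot path.
func readHeap() heapStats {
	var m runtime.MemStats
	runtime.ReadMemStats(&m)
	return heapStats{
		AllocBytes: m.HeapAlloc,
		InuseBytes: m.HeapInuse,
		Objects:    m.HeapObjects,
	}
}

func main() {
	s := readHeap()
	fmt.Printf("HeapAlloc=%d KiB HeapInuse=%d KiB HeapObjects=%d\n",
		s.AllocBytes>>10, s.InuseBytes>>10, s.Objects)
}
```

Logging a reading like this next to each captured profile makes it easier to tell later whether a surprising pprof total was a sampling artifact or a real change in heap size.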

13. Summary

The middle-level memory profiling workflow is: capture under load → diff two snapshots → drill in with top / list / peek → match the top stack against the table of common hotspots → fix → re-profile to confirm. Switch between inuse_* (leaks, steady state) and alloc_* (GC pressure, allocation rate) deliberately. Use pprof -http=:8080 for the flame graph and pprof -base for diffs — those two flags cover most real investigations.
