pprof Deep Dive: Tasks
Practical exercises. Do them in order. Each task is small enough for one focused session and produces an artifact (a profile file, a screenshot, a commit) you can compare with the answer.
Task 1: Your first profile
Write a tiny Go program with one CPU-heavy function (e.g., a slow Fibonacci, or sha256 of a long byte slice in a loop). Capture a 10-second CPU profile two ways:
- Via runtime/pprof to a file.
- Via net/http/pprof over HTTP.
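Open both with go tool pprof -http=: and compare them. The same hot function should dominate both; exact sample counts will differ because the two profiles cover different 10-second windows.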
Goal. Internalize the two collection paths.
Task 2: Reading flat vs. cum
Take Task 1's profile. Identify in the top output:
- One function with high flat and a cum that is barely larger (a hot leaf). cum includes flat, so it can never be lower than flat.
- One function with low flat and high cum (a wrapper or dispatcher).
- One function where flat ≈ cum (does its own work, no callees).
Write one sentence per function explaining what role it plays in your program.
Goal. Build correct intuition about the two columns.
Task 3: List with line granularity
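Pick the hottest function from Task 1. In the interactive pprof shell, run list on it (the function name below is a placeholder for yours):
(pprof) list main.slowFib
Identify the single hottest line. Optimize it (rewrite the expression, pre-size a slice, avoid an allocation). Re-profile. Run go tool pprof -base=old.pb.gz new.pb.gz to confirm the line is now cheaper.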
Goal. Connect a profile observation to a code change to a measurable diff.
Task 4: Heap profile in four views
Write a program that allocates a mix of short-lived and long-lived objects (e.g., a service that parses requests into structs, keeping a recent-history cache of 100 of them). Run for 60 seconds, then collect a heap profile.
Open with -http=: and switch through all four sample indices:
- inuse_space
- inuse_objects
- alloc_space
- alloc_objects
For each, write down the top function and one sentence explaining why it dominates that view.
Goal. See firsthand how the four views answer different questions.
Task 5: Diff a real optimization
Take an open-source Go project (anything: go-redis, chi, your own). Identify a function and write a benchmark that exercises it heavily. Capture a CPU profile.
Apply one optimization (pre-size a slice, use sync.Pool, replace interface{} with a typed parameter). Re-capture.
Save a screenshot of the diff flame graph. Write a one-paragraph summary of the change.
Goal. Practice the optimize-measure-confirm loop with realistic code.
Task 6: Goroutine leak detection
Write a program that intentionally leaks goroutines (e.g., a worker pool whose done channel is never closed). Run it. Watch runtime.NumGoroutine() grow.
Capture both forms of the goroutine profile:
curl -o gr.pb.gz "http://localhost:6060/debug/pprof/goroutine"
curl -o gr2.txt "http://localhost:6060/debug/pprof/goroutine?debug=2"
Open the binary form in pprof; identify the leaking stack. Then open the text form and find the same stack. Note the line where the goroutine is parked.
Goal. Practice both diagnostic forms; understand when text is faster.
Task 7: Enable block and mutex profiling
Write a program with two goroutines contending on a sync.Mutex around a non-trivial critical section. Try to profile without enabling block/mutex first — confirm the profiles are empty.
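Add calls to runtime.SetBlockProfileRate and runtime.SetMutexProfileFraction at program start; a value of 1 records every event.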
Re-profile. Confirm both /block and /mutex now contain useful data. Run peek on the wait functions to find the contended mutex.
Goal. Internalize that block and mutex profiling are opt-in.
Task 8: Custom profile
Implement a custom profile for "currently held resources" in a small program — for example, open file handles or in-flight HTTP requests:
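import (
	"net/http"
	"runtime/pprof"
)

var inflight = pprof.NewProfile("inflight")
func handle(w http.ResponseWriter, r *http.Request) {
	// skip=0 starts the recorded stack at this call, so handle itself appears in it.
	inflight.Add(r, 0)
	defer inflight.Remove(r)
	realWork(w, r)
}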
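Expose it at /debug/pprof/inflight. If net/http/pprof's index handler is already mounted at /debug/pprof/, it serves any registered profile by name; otherwise mount pprof.Handler("inflight") from net/http/pprof yourself. Hit the endpoint while requests are running. Open with pprof and see the call stacks of in-flight requests.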
Goal. See how easy it is to extend the framework.
Task 9: Profile labels
Take a small HTTP server (Task 8's, or anything). Add middleware that wraps each handler with:
// pprof.Do applies the label for the duration of the callback and restores the previous labels afterwards.
pprof.Do(r.Context(), pprof.Labels("route", routePattern), func(ctx context.Context) {
	handler(w, r.WithContext(ctx))
})
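Drive 60 s of mixed traffic against two routes. Capture a CPU profile. In the pprof shell, focus on one route at a time with tagfocus (the route values below are placeholders for yours):
(pprof) tagfocus=route=/a
(pprof) top
(pprof) tagfocus=route=/b
(pprof) top
Compare the two top outputs. Confirm the slices look different.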
Goal. Use labels to slice a profile.
Task 10: Label propagation across goroutines
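Modify Task 9's handler so it hands most of the work to a long-lived worker goroutine that was started before the request arrived (for example, a worker pool fed by a channel). A goroutine started with go func(){ ... }() inside the pprof.Do callback inherits the caller's labels, so it will not reproduce the problem; a pre-existing worker will. Re-profile and re-apply the same tagfocus. Notice that most cost is now unlabeled.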
Fix by either:
- Wrapping the goroutine body in pprof.Do(ctx, labels, ...).
- Calling pprof.SetGoroutineLabels(ctx) inside the goroutine.
Re-profile and confirm the tagfocus view is correct again.
Goal. Live the gotcha; remember it.
Task 11: Combine profiles from a "fleet"
Run the same program in three terminal sessions (or three processes on different ports). Hit them with traffic. Capture a 30 s CPU profile from each:
go tool pprof -proto -seconds=30 -output=p1.pb.gz http://localhost:6061/debug/pprof/profile
go tool pprof -proto -seconds=30 -output=p2.pb.gz http://localhost:6062/debug/pprof/profile
go tool pprof -proto -seconds=30 -output=p3.pb.gz http://localhost:6063/debug/pprof/profile
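Open the union by passing all three files to a single pprof invocation, which merges them:
go tool pprof -http=: p1.pb.gz p2.pb.gz p3.pb.gz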
Compare with a single 30 s profile. The union has ~3× the samples — the flame graph should look smoother.
Goal. Practice fleet aggregation.
Task 12: Read the raw protobuf
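Dump one of your earlier profiles as text, for example with go tool pprof -raw <profile>, or by decoding it with protoc --decode against profile.proto from the pprof repository. Skim the output. Find: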
- The sample_type lines.
- One sample entry with its values and stack.
- The function table.
Write a one-paragraph description of how the pieces fit together. (Read specification.md §9 if stuck.)
Goal. Demystify the format. Once you've seen it, every pprof feature makes more sense.
Task 13: Production-safe endpoint
In a small service, set up the pprof endpoint correctly:
- Bind to 127.0.0.1:6060 only.
- Use a dedicated mux (don't pollute http.DefaultServeMux).
- Set bounded block and mutex profile rates.
- Confirm with curl http://localhost:6060/debug/pprof/ that the index page loads from localhost, and that the same request against the machine's non-loopback address (for example its LAN IP) on port 6060 is refused.
Write a one-paragraph rationale for each choice as if explaining it in code review.
Goal. Cement the production posture.
Bonus task: continuous profiling at home
Spin up Pyroscope locally (Docker image), point your toy service at it, generate load for 30 minutes. Open the Pyroscope UI; compare the two timestamps half an hour apart. Use the built-in diff view.
To point the service at Pyroscope, add this at startup:
import "github.com/grafana/pyroscope-go"
pyroscope.Start(pyroscope.Config{
ApplicationName: "demo",
ServerAddress: "http://localhost:4040",
})
This is the smallest possible continuous-profiling setup. Touch it once and you'll understand why teams adopt it.
Summary
Thirteen tasks plus a bonus. Together they cover the same surface as the rest of the directory but force you to make the tool work rather than read about it. Save the profiles each task produces in a folder; you'll refer back to them later. The hardest task is #5: finding a real change to a real codebase that the diff actually shows. That one is the closest to day-job pprof work.
Further reading
- pprof interactive help command
- Profile format reference: https://github.com/google/pprof/blob/main/proto/profile.proto
- runtime/pprof API: https://pkg.go.dev/runtime/pprof
- Pyroscope quickstart: https://grafana.com/docs/pyroscope/latest/