Profiling Concurrent Go Code — Practical Tasks¶
Table of Contents¶
- How to Use This Page
- Task 1 — Hello, Mutex Profile
- Task 2 — Reading a Block Profile
- Task 3 — Goroutine Snapshot Triage
- Task 4 — Your First runtime/trace
- Task 5 — Diff Two Mutex Profiles
- Task 6 — Lines vs Functions
- Task 7 — Goroutine Labels
- Task 8 — Trace Tasks and Regions
- Task 9 — Worker Pool Profile Slicing
- Task 10 — Triage Script
- Task 11 — Continuous Profiling Lab
- Task 12 — Concurrency Health Dashboard
- Task 13 — Profile Capture in CI
- Self-Assessment
How to Use This Page¶
Each task is a small, self-contained lab. Most run on a laptop with Go 1.22+ and Graphviz installed (the pprof web UI needs it to render graphs). Tasks marked "infra" also require a running Docker daemon. Solutions are intentionally not provided; the point is to work through each task yourself.
Task 1 — Hello, Mutex Profile¶
Goal. Capture your first mutex profile.
Steps.
- Create `main.go`:

```go
package main

import (
	"fmt"
	"log"
	"net/http"
	_ "net/http/pprof" // registers /debug/pprof/* on the default mux
	"runtime"
	"sync"
	"time"
)

type Counter struct {
	mu sync.Mutex
	n  int
}

func (c *Counter) Inc() {
	c.mu.Lock()
	defer c.mu.Unlock()
	time.Sleep(50 * time.Microsecond) // hold the lock long enough to force contention
	c.n++
}

func main() {
	runtime.SetMutexProfileFraction(1) // record every contention event
	go func() { log.Println(http.ListenAndServe("127.0.0.1:6060", nil)) }()
	var c Counter
	var wg sync.WaitGroup
	for i := 0; i < 200; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for j := 0; j < 500; j++ {
				c.Inc()
			}
		}()
	}
	wg.Wait()
	fmt.Println(c.n)
	select {} // keep the process alive so the profile can be fetched
}
```
- Run it and capture the profile:

```
go run main.go &
curl -o mutex.prof http://127.0.0.1:6060/debug/pprof/mutex
go tool pprof mutex.prof
(pprof) top
```
Expected outcome. `main.(*Counter).Inc` is at the top. The flat value is a duration: the total time goroutines spent waiting for the lock, summed across all waiters and accumulated since mutex profiling was enabled (the endpoint reports cumulative totals, not a sampling window).
Stretch. Set `granularity=lines` at the `(pprof)` prompt. Confirm the `Unlock` line shows up (mutex contention is attributed to the call site that releases the lock).
Task 2 — Reading a Block Profile¶
Goal. Use the block profile to find a slow channel.
Steps.
- Write a producer/consumer where the consumer is the bottleneck (a sketch follows this list).
- Enable `runtime.SetBlockProfileRate(1)`.
- Capture `/debug/pprof/block`.
- Run `go tool pprof -lines block.prof`, then `top -cum`.
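A minimal sketch of such a program, assuming an unbuffered channel and a deliberately slow consumer (the shapes and sleep durations are illustrative):

```go
package main

import (
	"net/http"
	_ "net/http/pprof" // serves /debug/pprof/block
	"runtime"
	"time"
)

func main() {
	runtime.SetBlockProfileRate(1) // record every blocking event
	go http.ListenAndServe("127.0.0.1:6060", nil)

	ch := make(chan int) // unbuffered: every send blocks until the consumer is ready
	go func() {
		for i := 0; ; i++ {
			ch <- i // this send line should dominate the block profile
		}
	}()
	for range ch {
		time.Sleep(time.Millisecond) // slow consumer: the bottleneck
	}
}
```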
Expected outcome. `runtime.chansend1` at the producer's send line dominates.
Questions to answer.
- What does the cum column tell you?
- If you bump the channel capacity to 1000, how does the profile change?
Task 3 — Goroutine Snapshot Triage¶
Goal. Use the goroutine profile to identify a leak.
Steps.
- Write a function `leaky()` that starts a goroutine which blocks on a channel that is never closed and never sent to (see the sketch below).
- Call `leaky()` 1000 times in a loop.
- After the loop, capture `/debug/pprof/goroutine?debug=1`.
- Identify the stack with a count of ~1000.
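One possible shape for the lab, as a sketch (the function name comes from the steps above; everything else is illustrative):

```go
package main

import (
	"net/http"
	_ "net/http/pprof" // serves /debug/pprof/goroutine
)

// leaky starts a goroutine that blocks forever on a channel that is
// never closed and never sent to.
func leaky() {
	ch := make(chan struct{})
	go func() {
		<-ch // parks in "chan receive" and never wakes up
	}()
}

func main() {
	for i := 0; i < 1000; i++ {
		leaky()
	}
	http.ListenAndServe("127.0.0.1:6060", nil) // now capture the goroutine profile
}
```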
Stretch. Capture `?debug=2` and find one of the leaked goroutines' wait reason and duration.
Task 4 — Your First runtime/trace¶
Goal. Capture and explore a 3-second trace.
Steps.
- Reuse the program from Task 1.
- Capture the trace: `curl -o trace.out 'http://127.0.0.1:6060/debug/pprof/trace?seconds=3'`
- Run `go tool trace trace.out`.
- Open the browser. Click "View trace."
- Zoom in on a 50 ms window. Identify your worker goroutines.
Questions to answer.
- Which goroutines are running concurrently?
- Where do you see `gopark` markers?
- Click "Goroutine analysis." Which function dominates `Sync block` time?
Task 5 — Diff Two Mutex Profiles¶
Goal. Use -base to compare before/after.
Steps.
- From Task 1, capture `before.prof`.
- Modify the program: replace the single `c.mu` lock with a sharded approach (e.g., 16 counters, hash to one; a sketch follows this list).
- Re-run, capture `after.prof`.
- Run `go tool pprof -base before.prof after.prof`, then `top`. Negative numbers should dominate.
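One possible sharding, sketched as a drop-in for the Task 1 program (it reuses that program's `Counter` type; each goroutine passes its loop index as the key):

```go
// ShardedCounter spreads increments across 16 independent locks,
// so goroutines rarely contend on the same mutex.
type ShardedCounter struct {
	shards [16]Counter // Counter is the type from Task 1
}

func (c *ShardedCounter) Inc(key int) {
	c.shards[key%len(c.shards)].Inc()
}

func (c *ShardedCounter) Total() int {
	total := 0
	for i := range c.shards {
		c.shards[i].mu.Lock()
		total += c.shards[i].n
		c.shards[i].mu.Unlock()
	}
	return total
}
```

With 200 goroutines spread over 16 locks instead of one, the total wait time in `after.prof` should drop sharply, which is exactly what the `-base` diff makes visible.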
Stretch. Plot the change as a flame graph: `go tool pprof -http=:9090 -base before.prof after.prof`.
Task 6 — Lines vs Functions¶
Goal. Discover what -lines reveals on a mutex profile.
Steps.
- Write a function `op` that uses two different mutexes (a sketch follows this list).
- Spawn many goroutines calling `op` with shared `a` and `b`.
- Capture the mutex profile.
- Run `go tool pprof mutex.prof` and `top`. Then `granularity=lines` and `top` again.
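A sketch of the whole lab; the point is that `op` contains two distinct contention sites which only line granularity can tell apart (goroutine counts and sleeps are illustrative):

```go
package main

import (
	"net/http"
	_ "net/http/pprof"
	"runtime"
	"sync"
	"time"
)

// op touches two independent locks in sequence.
func op(a, b *sync.Mutex) {
	a.Lock()
	time.Sleep(20 * time.Microsecond)
	a.Unlock() // contention site 1

	b.Lock()
	time.Sleep(20 * time.Microsecond)
	b.Unlock() // contention site 2
}

func main() {
	runtime.SetMutexProfileFraction(1)
	go http.ListenAndServe("127.0.0.1:6060", nil)

	var a, b sync.Mutex
	var wg sync.WaitGroup
	for i := 0; i < 100; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for j := 0; j < 1000; j++ {
				op(&a, &b)
			}
		}()
	}
	wg.Wait()
	select {} // keep the pprof endpoint up for capture
}
```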
Expected outcome. Without `-lines`, both unlock sites merge into a single `op` row. With `-lines` (or `granularity=lines` at the prompt), you see two separate sites.
Task 7 — Goroutine Labels¶
Goal. Slice a CPU profile by label.
Steps.
- Write an HTTP server with two endpoints: `/fast` (1 ms work) and `/slow` (10 ms work). A sketch follows this list.
- Wrap each handler in `pprof.Do(ctx, pprof.Labels("endpoint", r.URL.Path), func(ctx context.Context) {...})`.
- Drive load against both endpoints (e.g., with `hey`).
- Capture `/debug/pprof/profile?seconds=10`.
- Run `go tool pprof profile`, then `(pprof) tags`. Then `(pprof) tagfocus=endpoint=/slow` and `top`.
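A sketch of the server (the `busy` helper and its durations are illustrative stand-ins for real work):

```go
package main

import (
	"context"
	"net/http"
	_ "net/http/pprof"
	"runtime/pprof"
	"time"
)

// busy burns CPU for roughly d, so the work is visible in a CPU profile.
func busy(d time.Duration) {
	deadline := time.Now().Add(d)
	for time.Now().Before(deadline) {
	}
}

// labeled wraps a handler so all its samples carry an endpoint label.
func labeled(d time.Duration) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		pprof.Do(r.Context(), pprof.Labels("endpoint", r.URL.Path),
			func(ctx context.Context) { busy(d) })
	}
}

func main() {
	http.HandleFunc("/fast", labeled(1*time.Millisecond))
	http.HandleFunc("/slow", labeled(10*time.Millisecond))
	http.ListenAndServe("127.0.0.1:6060", nil)
}
```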
Expected outcome. Only `/slow` work appears.
Task 8 — Trace Tasks and Regions¶
Goal. Add user-defined tasks/regions and see them in go tool trace.
Steps.
- Pick an existing program (or write a small one with multi-step work; a sketch follows this list).
- Wrap each handler invocation in `trace.NewTask`.
- Add `trace.StartRegion` around each logical step.
- Capture a 3-second trace under load.
- In `go tool trace`, open "User-defined tasks." Inspect one task's regions.
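A sketch with file-based capture instead of the HTTP endpoint (the task and region names and the step durations are illustrative; the `trace` calls are the point):

```go
package main

import (
	"context"
	"log"
	"os"
	"runtime/trace"
	"time"
)

func step(d time.Duration) { time.Sleep(d) } // stand-in for real work

func handle(ctx context.Context, id int) {
	ctx, task := trace.NewTask(ctx, "handleJob") // shows up under "User-defined tasks"
	defer task.End()

	func() {
		defer trace.StartRegion(ctx, "decode").End()
		step(2 * time.Millisecond)
	}()
	func() {
		defer trace.StartRegion(ctx, "process").End()
		step(5 * time.Millisecond)
	}()
	trace.Logf(ctx, "job", "finished job %d", id) // see the Stretch below
}

func main() {
	f, err := os.Create("trace.out")
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()
	trace.Start(f)
	defer trace.Stop()

	for i := 0; i < 100; i++ {
		handle(context.Background(), i)
	}
}
```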
Stretch. Add `trace.Logf` to log a value at a specific point. Find it in the task's event timeline.
Task 9 — Worker Pool Profile Slicing¶
Goal. Label a worker pool so profiles slice by pool.
Steps.
- Build a worker pool: 4 workers consuming from a job channel (a sketch follows this list).
- At the top of each worker's run loop, call `pprof.SetGoroutineLabels(pprof.WithLabels(ctx, pprof.Labels("pool", "main", "worker", strconv.Itoa(i))))`.
- Run the pool under load.
- Capture mutex and block profiles.
- Slice by `tagfocus=pool=main` and check the output.
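A sketch of the pool wiring (the shared mutex exists only so the profiles have something to show):

```go
package main

import (
	"context"
	"net/http"
	_ "net/http/pprof"
	"runtime"
	"runtime/pprof"
	"strconv"
	"sync"
	"time"
)

func main() {
	runtime.SetMutexProfileFraction(1)
	runtime.SetBlockProfileRate(1)
	go http.ListenAndServe("127.0.0.1:6060", nil)

	jobs := make(chan int)
	var mu sync.Mutex
	var wg sync.WaitGroup

	for i := 0; i < 4; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			// Attach pool/worker labels to everything this goroutine does.
			ctx := pprof.WithLabels(context.Background(),
				pprof.Labels("pool", "main", "worker", strconv.Itoa(i)))
			pprof.SetGoroutineLabels(ctx)

			for range jobs {
				mu.Lock()
				time.Sleep(100 * time.Microsecond) // contended critical section
				mu.Unlock()
			}
		}(i)
	}

	for j := 0; j < 100000; j++ {
		jobs <- j
	}
	close(jobs)
	wg.Wait()
	select {} // keep the pprof endpoint up for capture
}
```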
Stretch. Add a second pool. Verify each pool's contention is separable.
Task 10 — Triage Script¶
Goal. Write a reusable snapshot script.
Steps.
- Write `go-snap` (bash or any language; a Go sketch follows this list) that, given a `host:port`:
    - captures a CPU profile (15 s), heap, goroutine, mutex, and block profiles, and a trace (5 s);
    - writes all of them to a timestamped directory;
    - runs `wait` so all the `curl`s run in parallel.
- Run it against any service of yours.
- Verify all six files are present and non-empty.
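A Go take on the script, for those skipping bash (file names, durations, and the directory scheme follow the steps above; tweak freely):

```go
// go-snap: capture a standard profile bundle from a pprof-enabled service.
package main

import (
	"fmt"
	"io"
	"net/http"
	"os"
	"path/filepath"
	"sync"
	"time"
)

func main() {
	if len(os.Args) != 2 {
		fmt.Fprintln(os.Stderr, "usage: go-snap host:port")
		os.Exit(1)
	}
	base := "http://" + os.Args[1] + "/debug/pprof/"
	dir := "snap-" + time.Now().Format("20060102-150405")
	if err := os.MkdirAll(dir, 0o755); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}

	targets := map[string]string{
		"cpu.prof":       "profile?seconds=15",
		"heap.prof":      "heap",
		"goroutine.prof": "goroutine",
		"mutex.prof":     "mutex",
		"block.prof":     "block",
		"trace.out":      "trace?seconds=5",
	}

	var wg sync.WaitGroup // the Go equivalent of backgrounded curls plus `wait`
	for file, path := range targets {
		wg.Add(1)
		go func(file, path string) {
			defer wg.Done()
			resp, err := http.Get(base + path)
			if err != nil {
				fmt.Fprintf(os.Stderr, "%s: %v\n", file, err)
				return
			}
			defer resp.Body.Close()
			out, err := os.Create(filepath.Join(dir, file))
			if err != nil {
				fmt.Fprintf(os.Stderr, "%s: %v\n", file, err)
				return
			}
			defer out.Close()
			io.Copy(out, resp.Body)
		}(file, path)
	}
	wg.Wait()
	fmt.Println("wrote", dir)
}
```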
Stretch. Add automated post-processing: open each profile in pprof, dump top 10 to a summary.txt.
Task 11 — Continuous Profiling Lab (infra)¶
Goal. Run Pyroscope or Parca locally and feed it a Go service.
Steps.
- Run `docker run -d -p 4040:4040 grafana/pyroscope` (or use Parca's docker image).
- Build a small Go service that uses the Pyroscope agent (`github.com/grafana/pyroscope-go`) or exposes pprof for Parca to scrape. A wiring sketch follows this list.
- Drive load.
- Open the Pyroscope/Parca UI. Navigate to your service's CPU flame graph.
- Enable the mutex profile in the service. Verify it appears in the UI.
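A sketch of the agent wiring, assuming the `pyroscope-go` API as described in that module's README (config field names may differ across versions; treat this as a starting point, not a reference):

```go
package main

import (
	"log"
	"runtime"

	"github.com/grafana/pyroscope-go"
)

func main() {
	// Without these, the mutex/block profile types carry no data.
	runtime.SetMutexProfileFraction(5)
	runtime.SetBlockProfileRate(5)

	if _, err := pyroscope.Start(pyroscope.Config{
		ApplicationName: "profiling-lab",         // hypothetical name
		ServerAddress:   "http://localhost:4040", // the docker container above
	}); err != nil {
		log.Fatal(err)
	}

	// ... run your workload here ...
	select {}
}
```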
Stretch. Add pprof.Do labels and confirm the UI's label filter shows them.
Task 12 — Concurrency Health Dashboard (infra)¶
Goal. Wire runtime/metrics to Prometheus and Grafana.
Steps.
- Use the `github.com/prometheus/client_golang` package.
- Register a collector that polls these `runtime/metrics` (a bridging sketch follows this list):
    - `/sched/goroutines:goroutines`
    - `/sched/latencies:seconds`
    - `/gc/pauses:seconds`
    - `/memory/classes/heap/objects:bytes`
- Expose `/metrics`. Run a Prometheus + Grafana docker stack locally.
- Build a four-panel dashboard.
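As a starting point, a sketch that bridges the simple goroutine-count gauge into client_golang (the histogram-shaped metrics such as `/sched/latencies:seconds` need a custom collector, which is the real exercise here; the Prometheus metric name below is hypothetical):

```go
package main

import (
	"log"
	"net/http"
	"runtime/metrics"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promhttp"
)

func main() {
	goroutines := prometheus.NewGaugeFunc(
		prometheus.GaugeOpts{Name: "go_sched_goroutines"},
		func() float64 {
			s := []metrics.Sample{{Name: "/sched/goroutines:goroutines"}}
			metrics.Read(s)
			return float64(s[0].Value.Uint64()) // this metric is KindUint64
		},
	)
	prometheus.MustRegister(goroutines)

	http.Handle("/metrics", promhttp.Handler())
	log.Fatal(http.ListenAndServe(":2112", nil))
}
```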
Stretch. Add an alert for goroutine count growing 3× over the last hour.
Task 13 — Profile Capture in CI¶
Goal. Add automatic profile capture to a benchmark.
Steps.
- Write a benchmark that exercises a concurrent data structure (a sketch follows this list).
- Run it with `-mutexprofile mu.prof -blockprofile bl.prof -cpuprofile cpu.prof`.
- In CI, persist the profiles as artefacts.
- Add a small Go program that compares the new mutex profile to a baseline (`-base`) and fails CI if contention rose by more than X%.
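A benchmark shape that reliably produces contention (the data structure is a stand-in; substitute the one you actually care about):

```go
package lab_test

import (
	"sync"
	"testing"
)

// lockedMap is deliberately coarse-grained so the mutex profile has a signal.
type lockedMap struct {
	mu sync.Mutex
	m  map[int]int
}

func BenchmarkLockedMap(b *testing.B) {
	lm := &lockedMap{m: make(map[int]int)}
	b.RunParallel(func(pb *testing.PB) {
		i := 0
		for pb.Next() {
			lm.mu.Lock()
			lm.m[i%1024]++
			lm.mu.Unlock()
			i++
		}
	})
}
```

Run it with, e.g., `go test -bench=LockedMap -mutexprofile mu.prof -blockprofile bl.prof -cpuprofile cpu.prof`.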
Stretch. Maintain the baseline in a separate branch updated by humans only.
Self-Assessment¶
- I have captured all three concurrency profiles on at least one real program.
- I can open and navigate `go tool trace`.
- I have used `-base` to verify a fix.
- I have used `-lines` and seen the difference.
- I have instrumented a handler with labels and tasks.
- I have run a continuous profiler locally.
- I have a working snapshot script.
- I have a CI hook that captures mutex/block profiles for benchmarks.