
Profiling Concurrent Go Code — Practical Tasks

Table of Contents

  1. How to Use This Page
  2. Task 1 — Hello, Mutex Profile
  3. Task 2 — Reading a Block Profile
  4. Task 3 — Goroutine Snapshot Triage
  5. Task 4 — Your First runtime/trace
  6. Task 5 — Diff Two Mutex Profiles
  7. Task 6 — Lines vs Functions
  8. Task 7 — Goroutine Labels
  9. Task 8 — Trace Tasks and Regions
  10. Task 9 — Worker Pool Profile Slicing
  11. Task 10 — Triage Script
  12. Task 11 — Continuous Profiling Lab
  13. Task 12 — Concurrency Health Dashboard
  14. Task 13 — Profile Capture in CI
  15. Self-Assessment

How to Use This Page

Each task is a small self-contained lab. Most run on a laptop with Go 1.22+ and graphviz installed (for the pprof web UI). Tasks marked "infra" require a running Docker daemon. Solutions are intentionally not provided; you are meant to work them out yourself.

go version            # 1.22 or newer
which dot             # for pprof web UI
docker version        # for tasks 11 and 12

Task 1 — Hello, Mutex Profile

Goal. Capture your first mutex profile.

Steps.

  1. Create main.go:
package main

import (
    "fmt"
    "log"
    "net/http"
    _ "net/http/pprof"
    "runtime"
    "sync"
    "time"
)

type Counter struct {
    mu sync.Mutex
    n  int
}

func (c *Counter) Inc() {
    c.mu.Lock()
    defer c.mu.Unlock()
    time.Sleep(50 * time.Microsecond)
    c.n++
}

func main() {
    runtime.SetMutexProfileFraction(1)

    go func() { log.Println(http.ListenAndServe("127.0.0.1:6060", nil)) }()

    var c Counter
    var wg sync.WaitGroup
    for i := 0; i < 200; i++ {
        wg.Add(1)
        go func() {
            defer wg.Done()
            for j := 0; j < 500; j++ {
                c.Inc()
            }
        }()
    }
    wg.Wait()
    fmt.Println(c.n)
    select {}
}
  2. go run main.go &
  3. curl -o mutex.prof http://127.0.0.1:6060/debug/pprof/mutex
  4. go tool pprof mutex.prof
  5. (pprof) top

Expected outcome. main.(*Counter).Inc is at the top. The flat value is a duration: the total time goroutines spent waiting on the lock, summed across goroutines and accumulated since SetMutexProfileFraction enabled sampling, attributed to the Unlock call site.

Stretch. Add granularity=lines. Confirm the Unlock line shows up.


Task 2 — Reading a Block Profile

Goal. Use the block profile to find a slow channel.

Steps.

  1. Write a producer/consumer where the consumer is the bottleneck:
ch := make(chan int) // unbuffered
// producer: 100k sends
// consumer: 10 ms per receive
  2. Enable: runtime.SetBlockProfileRate(1).
  3. Capture /debug/pprof/block.
  4. go tool pprof -lines block.prof, then top -cum.
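One way to fill in the scaffold above; a sketch, with counts scaled down so it finishes quickly (produceConsume and its parameters are illustrative names; scale n and delay back up, and keep the process alive, when you capture the block profile):

```go
package main

import (
	"fmt"
	"net/http"
	_ "net/http/pprof"
	"runtime"
	"time"
)

// produceConsume pushes n values through an unbuffered channel to a
// consumer that takes `delay` per receive, so every send blocks.
func produceConsume(n int, delay time.Duration) int {
	ch := make(chan int) // unbuffered: each send waits for a receive
	done := make(chan int)
	go func() { // consumer: the deliberate bottleneck
		total := 0
		for v := range ch {
			time.Sleep(delay)
			total += v
		}
		done <- total
	}()
	for i := 0; i < n; i++ { // producer: this send is the blocked line
		ch <- 1
	}
	close(ch)
	return <-done
}

func main() {
	runtime.SetBlockProfileRate(1) // record every blocking event
	go http.ListenAndServe("127.0.0.1:6060", nil)
	fmt.Println(produceConsume(1000, time.Millisecond))
}
```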

Expected outcome. runtime.chansend1 at the producer's send line dominates.

Questions to answer.

  • What does the cum column tell you?
  • If you bump the channel capacity to 1000, how does the profile change?

Task 3 — Goroutine Snapshot Triage

Goal. Use the goroutine profile to identify a leak.

Steps.

  1. Write a function leaky() that starts a goroutine which blocks on a channel that is never closed and never sent to.
  2. Call leaky() 1000 times in a loop.
  3. After the loop, capture /debug/pprof/goroutine?debug=1.
  4. Identify the stack with count ~1000.

Stretch. Capture ?debug=2 and find one of the leaked goroutines' wait reason and duration.


Task 4 — Your First runtime/trace

Goal. Capture and explore a 3-second trace.

Steps.

  1. Reuse the program from Task 1.
  2. curl -o trace.out 'http://127.0.0.1:6060/debug/pprof/trace?seconds=3'
  3. go tool trace trace.out
  4. Open the browser. Click "View trace."
  5. Zoom in on a 50 ms window. Identify your worker goroutines.

Questions to answer.

  • Which goroutines are running concurrently?
  • Where do you see gopark markers?
  • Click "Goroutine analysis." Which function dominates Sync block time?

Task 5 — Diff Two Mutex Profiles

Goal. Use -base to compare before/after.

Steps.

  1. From Task 1, capture before.prof.
  2. Modify the program: replace c.mu.Lock() with a sharded approach (e.g., 16 counters, hash to one).
  3. Re-run, capture after.prof.
  4. go tool pprof -base before.prof after.prof
  5. top. Negative numbers should dominate.
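One sharding scheme for step 2; a sketch, where ShardedCounter and the round-robin shard choice via an atomic counter are illustrative (hashing a key to a shard works just as well):

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

const shards = 16

// ShardedCounter spreads increments over several mutexes so that
// goroutines rarely contend on the same lock.
type ShardedCounter struct {
	shards [shards]struct {
		mu sync.Mutex
		n  int64
		_  [48]byte // pad each shard to a cache line (optional)
	}
	next atomic.Uint64 // cheap way to pick a shard per call
}

func (c *ShardedCounter) Inc() {
	s := &c.shards[c.next.Add(1)%shards]
	s.mu.Lock()
	s.n++
	s.mu.Unlock()
}

// Total locks each shard in turn and sums the partial counts.
func (c *ShardedCounter) Total() int64 {
	var t int64
	for i := range c.shards {
		c.shards[i].mu.Lock()
		t += c.shards[i].n
		c.shards[i].mu.Unlock()
	}
	return t
}

func main() {
	var c ShardedCounter
	var wg sync.WaitGroup
	for i := 0; i < 200; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for j := 0; j < 500; j++ {
				c.Inc()
			}
		}()
	}
	wg.Wait()
	fmt.Println(c.Total()) // 100000
}
```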

Stretch. Plot the change as a flame graph: go tool pprof -http=:9090 -base before.prof after.prof.


Task 6 — Lines vs Functions

Goal. Discover what -lines reveals on a mutex profile.

Steps.

  1. Write a function that uses two different mutexes:
func op(a, b *sync.Mutex) {
    a.Lock()
    ...
    a.Unlock()
    b.Lock()
    ...
    b.Unlock()
}
  2. Spawn many goroutines calling op with shared a and b.
  3. Capture the mutex profile.
  4. Run go tool pprof mutex.prof and top. Then granularity=lines and top again.

Expected outcome. Without -lines, both unlocks merge. With -lines, you see two separate sites.


Task 7 — Goroutine Labels

Goal. Slice a CPU profile by label.

Steps.

  1. Write an HTTP server with two endpoints: /fast (1 ms work) and /slow (10 ms work).
  2. Wrap each handler in pprof.Do(ctx, pprof.Labels("endpoint", r.URL.Path), func(ctx context.Context) {...}).
  3. Drive load against both endpoints (e.g., hey).
  4. Capture /debug/pprof/profile?seconds=10.
  5. go tool pprof profile. (pprof) tags. Then (pprof) tagfocus=endpoint=/slow and top.

Expected outcome. Only /slow work appears.


Task 8 — Trace Tasks and Regions

Goal. Add user-defined tasks/regions and see them in go tool trace.

Steps.

  1. Pick an existing program (or write a small one with multi-step work).
  2. Wrap each handler invocation in trace.NewTask.
  3. Add trace.StartRegion around each logical step.
  4. Capture a 3-second trace under load.
  5. In go tool trace, open "User-defined tasks." Inspect one task's regions.

Stretch. Add trace.Logf to log a value at a specific point. Find it in the task's event timeline.


Task 9 — Worker Pool Profile Slicing

Goal. Label a worker pool so profiles slice by pool.

Steps.

  1. Build a worker pool: 4 workers consuming from a job channel.
  2. At the top of each worker's run loop, call pprof.SetGoroutineLabels(pprof.WithLabels(ctx, pprof.Labels("pool", "main", "worker", strconv.Itoa(i)))).
  3. Run the pool under load.
  4. Capture CPU and goroutine profiles (goroutine labels currently show up only in the CPU and goroutine profiles, not in mutex or block profiles).
  5. Slice by tagfocus=pool=main and check the output.

Stretch. Add a second pool. Verify each pool's samples are separable.


Task 10 — Triage Script

Goal. Write a reusable snapshot script.

Steps.

  1. Write go-snap (bash or any language) that, given a host:port:
      • captures a CPU profile (15 s), heap, goroutine, mutex, and block profiles, plus a trace (5 s);
      • writes them all to a timestamped directory;
      • launches the curls in parallel and waits for them all to finish.
  2. Run it against any service of yours.
  3. Verify all six files are present and non-empty.
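A bash sketch of the capture half, written as a function so it can be sourced; the snap- directory naming and file names are arbitrary choices, and the target is assumed to serve net/http/pprof at /debug/pprof:

```shell
#!/usr/bin/env bash

# go_snap HOST:PORT — capture a full profiling snapshot in parallel.
go_snap() {
  if [ -z "${1:-}" ]; then
    echo "usage: go_snap HOST:PORT" >&2
    return 2
  fi
  local base="http://$1/debug/pprof"
  local dir="snap-$(date +%Y%m%d-%H%M%S)"
  mkdir -p "$dir"

  # all captures run concurrently; the slowest (CPU, 15 s) bounds the total time
  curl -fsS -o "$dir/cpu.prof"       "$base/profile?seconds=15" &
  curl -fsS -o "$dir/heap.prof"      "$base/heap" &
  curl -fsS -o "$dir/goroutine.prof" "$base/goroutine" &
  curl -fsS -o "$dir/mutex.prof"     "$base/mutex" &
  curl -fsS -o "$dir/block.prof"     "$base/block" &
  curl -fsS -o "$dir/trace.out"      "$base/trace?seconds=5" &
  wait

  ls -l "$dir"
}
```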

Stretch. Add automated post-processing: open each profile in pprof, dump top 10 to a summary.txt.


Task 11 — Continuous Profiling Lab (infra)

Goal. Run Pyroscope or Parca locally and feed it a Go service.

Steps.

  1. docker run -d -p 4040:4040 grafana/pyroscope (or use Parca's docker image).
  2. Build a small Go service that uses the Pyroscope agent (github.com/grafana/pyroscope-go) or exposes pprof for Parca to scrape.
  3. Drive load.
  4. Open Pyroscope/Parca UI. Navigate to your service's CPU flame graph.
  5. Enable mutex profile in the service. Verify it appears in the UI.

Stretch. Add pprof.Do labels and confirm the UI's label filter shows them.


Task 12 — Concurrency Health Dashboard (infra)

Goal. Wire runtime/metrics to Prometheus and Grafana.

Steps.

  1. Use the github.com/prometheus/client_golang package.
  2. Register a collector that polls these runtime/metrics:
      • /sched/goroutines:goroutines
      • /sched/latencies:seconds
      • /gc/pauses:seconds
      • /memory/classes/heap/objects:bytes
  3. Expose /metrics. Run a Prometheus + Grafana docker stack locally.
  4. Build a four-panel dashboard.

Stretch. Add an alert for goroutine count growing 3× over the last hour.


Task 13 — Profile Capture in CI

Goal. Add automatic profile capture to a benchmark.

Steps.

  1. Write a benchmark that exercises a concurrent data structure.
  2. Run it with go test -bench=. -mutexprofile=mu.prof -blockprofile=bl.prof -cpuprofile=cpu.prof.
  3. In CI, persist the profiles as artefacts.
  4. Add a small Go program that compares the new mutex profile to a baseline (-base) and fails CI if contention rose by more than X%.

Stretch. Maintain the baseline in a separate branch updated by humans only.


Self-Assessment

  • I have captured all three concurrency profiles on at least one real program.
  • I can open and navigate go tool trace.
  • I have used -base to verify a fix.
  • I have used -lines and seen the difference.
  • I have instrumented a handler with labels and tasks.
  • I have run a continuous profiler locally.
  • I have a working snapshot script.
  • I have a CI hook that captures mutex/block profiles for benchmarks.