runtime Package Deep: Practice Tasks
Twenty exercises to build muscle memory around the Go runtime, runtime/debug, runtime/pprof, runtime/trace, and runtime/metrics packages — the surface area you reach for when something goes wrong in production and the only artifact you have is a process. The goal is not to memorise every function; it is to learn which dial answers which question, what each one costs to read, and when reading it is the wrong move because the question is "the system is sick, why".
Each task gives a Goal, Difficulty, Skills, Setup, Steps, Acceptance criteria, folded Hints and a folded Reference solution with runnable Go 1.22+ code. Difficulty: Junior, Middle, Senior, Staff. Read the section README and senior.md first — most tasks below assume you already know that MemStats stops the world, that runtime/metrics does not, that finalizers run on a single dedicated goroutine, and that LockOSThread is the only correct answer to "cgo with thread-local state".
Task 1: Runtime snapshot function
Goal. Write a Snapshot() function that returns a struct with everything an operator wants from the process in one call: live goroutine count, CPU count, current GOMAXPROCS, and a curated slice of MemStats fields. Print it as one line so it can be tailed in a log.
Difficulty. Junior.
Skills. runtime.NumGoroutine, runtime.NumCPU, runtime.GOMAXPROCS, runtime.ReadMemStats, the cost of ReadMemStats.
Setup. Empty main package. No dependencies.
Steps.
- Define a `Snapshot` struct with `Goroutines`, `CPUs`, `GOMAXPROCS`, `HeapAllocMB`, `HeapInUseMB`, `NumGC`, `GCPauseTotalMs`.
- Implement `Take() Snapshot` that fills it.
- Implement `String() string` returning a single line with `key=value` pairs.
- From `main`, spawn 1000 goroutines that block on a channel, then print the snapshot every 200ms three times.
Acceptance criteria.
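- `Take()` reads `MemStats` only once per call.
- The goroutine count reflects the spawned blockers: 1000 plus main, so typically 1001. `runtime.NumGoroutine` does not count system goroutines such as GC workers, and sysmon runs on its own thread rather than as a goroutine.
- `GOMAXPROCS(0)` is used to read the current value, never `GOMAXPROCS(n)` with a non-zero arg unless you actually want to change it.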
Hints
- `runtime.GOMAXPROCS(0)` returns the current value without changing it. Passing any positive number sets it, which is an easy way to wreck a benchmark.
- `runtime.ReadMemStats` is a stop-the-world call. Fine for a snapshot, not fine to call from a hot path.
- Divide `HeapAlloc` by `1024*1024` for MB. Cast to `float64` first if you want one decimal place.
- `PauseTotalNs` is the cumulative pause time across all GCs since process start. For a per-cycle view see `PauseNs[(NumGC+255)%256]`; the ring buffer trick is in `senior.md`.
Reference solution
package main
import (
"fmt"
"runtime"
"time"
)
// Senior decision: keep the struct small and operator-readable. Forty
// MemStats fields are a debugging buffet; a snapshot is for at-a-glance
// triage. Anything beyond seven fields and the eye glazes over.
type Snapshot struct {
Goroutines int
CPUs int
GOMAXPROCS int
HeapAllocMB float64
HeapInUseMB float64
NumGC uint32
GCPauseTotalMs float64
}
func Take() Snapshot {
var m runtime.MemStats
// Senior decision: ReadMemStats stops the world. We pay it ONCE per
// snapshot and never inline it into a hot loop. If you need
// continuous metrics, runtime/metrics is the right tool (Task 11).
runtime.ReadMemStats(&m)
return Snapshot{
Goroutines: runtime.NumGoroutine(),
CPUs: runtime.NumCPU(),
GOMAXPROCS: runtime.GOMAXPROCS(0), // read-only when arg <= 0
HeapAllocMB: float64(m.HeapAlloc) / (1024 * 1024),
HeapInUseMB: float64(m.HeapInuse) / (1024 * 1024),
NumGC: m.NumGC,
GCPauseTotalMs: float64(m.PauseTotalNs) / 1e6,
}
}
func (s Snapshot) String() string {
return fmt.Sprintf(
"goroutines=%d cpus=%d gomaxprocs=%d heap_alloc_mb=%.2f heap_inuse_mb=%.2f num_gc=%d gc_pause_total_ms=%.2f",
s.Goroutines, s.CPUs, s.GOMAXPROCS,
s.HeapAllocMB, s.HeapInUseMB,
s.NumGC, s.GCPauseTotalMs,
)
}
func main() {
block := make(chan struct{})
for i := 0; i < 1000; i++ {
go func() { <-block }()
}
// Give the scheduler a beat to actually start the goroutines.
time.Sleep(10 * time.Millisecond)
for i := 0; i < 3; i++ {
fmt.Println(Take())
time.Sleep(200 * time.Millisecond)
}
close(block)
}
Extension. Add a Runnable field holding the number of runnable goroutines. runtime/metrics reports this as one process-wide value (/sched/goroutines/runnable:goroutines), not per P, and only on Go releases that ship that metric; check metrics.All() before relying on it. Preview Task 11.
Task 2: Walk the call stack with runtime.Caller
Goal. Write a WhoCalledMe() helper that prints the file, line, and function name of (a) itself, (b) its caller, and (c) its caller's caller. Use only runtime.Caller. Then write a WhoCalledMeAll() that walks the entire stack up to depth 32 using a loop.
Difficulty. Junior.
Skills. runtime.Caller, runtime.FuncForPC, frame depth conventions.
Setup. A small package with three nested functions a -> b -> c where c calls WhoCalledMe().
Steps.
- Implement `WhoCalledMe()`. Call `runtime.Caller(0)`, `runtime.Caller(1)`, `runtime.Caller(2)` and print results.
- Resolve `pc` to a name with `runtime.FuncForPC(pc).Name()`.
- Implement `WhoCalledMeAll(maxDepth int)` looping `runtime.Caller(i)` until `ok == false`.
- Test from `c()`. Print the resulting stack.
Acceptance criteria.
- `Caller(0)` shows the line of the call itself (inside the helper).
- `Caller(1)` shows the direct caller (`c`).
- `Caller(2)` shows `b`.
- The walking loop terminates cleanly at the bottom of the stack (typically just past `runtime.main` and `runtime.goexit`).
Hints
- `runtime.Caller` returns `(pc uintptr, file string, line int, ok bool)`. The `ok` is your loop exit.
- `runtime.FuncForPC(pc).Name()` returns the fully qualified name (`pkg.func`). Trim the prefix yourself if you want short form.
- A skip value of 0 means "the frame calling Caller". Skip 1 is the parent. Off-by-one mistakes are the #1 source of wrong stack traces.
Reference solution
package main
import (
"fmt"
"runtime"
)
// Senior decision: name the parameter `skip` everywhere it appears.
// Reading "Caller(1)" with no context is harder than reading "Caller(skip)"
// where skip is a named local — the latter survives the inevitable
// refactor that adds a wrapper layer and shifts every depth by one.
func describe(skip int) string {
pc, file, line, ok := runtime.Caller(skip)
if !ok {
return fmt.Sprintf("skip=%d: unknown", skip)
}
fn := runtime.FuncForPC(pc)
name := "unknown"
if fn != nil {
name = fn.Name()
}
return fmt.Sprintf("skip=%d %s:%d %s", skip, file, line, name)
}
func WhoCalledMe() {
	// Senior decision: add 1 to every skip because describe() is ITSELF
	// a frame we want to skip. If you forget this offset, every report
	// is shifted by one frame and starts at describe.
	fmt.Println(describe(1)) // WhoCalledMe itself (this line)
	fmt.Println(describe(2)) // its caller
	fmt.Println(describe(3)) // its caller's caller
}
func WhoCalledMeAll(maxDepth int) {
for i := 1; i <= maxDepth; i++ {
pc, file, line, ok := runtime.Caller(i)
if !ok {
return
}
name := "unknown"
if fn := runtime.FuncForPC(pc); fn != nil {
name = fn.Name()
}
fmt.Printf(" [%d] %s:%d %s\n", i, file, line, name)
}
}
func c() {
fmt.Println("--- WhoCalledMe ---")
WhoCalledMe()
fmt.Println("--- WhoCalledMeAll ---")
WhoCalledMeAll(32)
}
func b() { c() }
func a() { b() }
func main() { a() }
Extension. Reimplement WhoCalledMeAll using runtime.Callers + runtime.CallersFrames. It walks the stack once instead of once per depth, and Frame.Function names each frame, including inlined ones, without a separate FuncForPC lookup.
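A minimal sketch of that extension, assuming you only need function names; `callerNames` is a hypothetical helper, not the author's reference code.

```go
package main

import (
	"fmt"
	"runtime"
)

// callerNames returns the function names on the current stack, starting
// with the caller of callerNames, up to maxDepth frames.
func callerNames(maxDepth int) []string {
	pcs := make([]uintptr, maxDepth)
	// skip=2 drops runtime.Callers itself and callerNames.
	n := runtime.Callers(2, pcs)
	if n == 0 {
		return nil
	}
	frames := runtime.CallersFrames(pcs[:n])
	var names []string
	for {
		frame, more := frames.Next()
		names = append(names, frame.Function)
		if !more {
			break
		}
	}
	return names
}

func c() {
	for i, name := range callerNames(32) {
		fmt.Printf("  [%d] %s\n", i, name)
	}
}
func b() { c() }
func a() { b() }

func main() { a() }
```

`runtime.Frame` also carries `File` and `Line` if you want the same output shape as the `Caller` loop.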
Task 3: Dump goroutines with runtime.Stack
Goal. Write DumpSelf() and DumpAll(). The first prints the current goroutine's stack only; the second prints every goroutine in the process. Confirm the difference experimentally by spawning ten blocked goroutines and counting goroutine occurrences in each output.
Difficulty. Junior.
Skills. runtime.Stack, buffer sizing, the goroutine N [state]: header format.
Setup. Empty main, ten goroutines blocking on a channel.
Steps.
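- Allocate a 64 KiB buffer.
- Call `runtime.Stack(buf, false)` for self only, print the returned slice.
- Call `runtime.Stack(buf, true)` for all, print the returned slice.
- Count `goroutine` header lines in each, meaning lines that start with `goroutine `. A plain `strings.Count` also matches the `created by ... in goroutine N` lines that Go 1.21+ prints.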
Acceptance criteria.
- Self dump contains exactly one `goroutine` header line.
- All dump contains at least 11 headers: 10 blocked plus main. System goroutines are hidden unless GOTRACEBACK=system is set.
- A 64 KiB buffer suffices for this toy example; you note in a comment when it would not.
Hints
- `runtime.Stack` *truncates* if the buffer is too small. There is no error. The truncation point is silent; production code grows the buffer in a loop until the returned length is less than the buffer size.
- The header `goroutine 1 [running]:` always starts at column 0. Count those, not stack frame lines.
- Don't use this in production for fleet-wide dumps. Use the `runtime/pprof` goroutine profile (Task 6); it's structured and dedup-friendly.
Reference solution
package main
import (
"fmt"
"runtime"
"strings"
"time"
)
// Senior decision: production-grade Stack helper grows the buffer until
// the dump fits. Calling Stack with a too-small buffer silently truncates
// — the worst possible failure mode in a debugging tool, because you'll
// never know your stack trace was cut off mid-frame.
func DumpSelf() []byte {
	buf := make([]byte, 64*1024)
	for {
		n := runtime.Stack(buf, false)
		if n < len(buf) {
			return buf[:n]
		}
		// Senior decision: cap at 64 MiB. A single goroutine with a
		// larger stack is almost certainly a runaway recursion; better
		// to truncate than to OOM the debugger. Check the cap BEFORE
		// reallocating so the truncated dump returned is the one just
		// written, not a fresh zeroed buffer.
		if 2*len(buf) > 64*1024*1024 {
			return buf[:n]
		}
		buf = make([]byte, 2*len(buf))
	}
}
func DumpAll() []byte {
	buf := make([]byte, 64*1024)
	for {
		n := runtime.Stack(buf, true)
		if n < len(buf) {
			return buf[:n]
		}
		// For processes with many thousands of goroutines, the
		// all-goroutine dump can exceed 100 MiB. Cap is necessary, and
		// it is checked before reallocating so the returned bytes are
		// the truncated dump rather than an empty buffer.
		if 2*len(buf) > 256*1024*1024 {
			return buf[:n]
		}
		buf = make([]byte, 2*len(buf))
	}
}
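func main() {
	block := make(chan struct{})
	for i := 0; i < 10; i++ {
		go func() { <-block }()
	}
	time.Sleep(50 * time.Millisecond) // let them park
	self := DumpSelf()
	all := DumpAll()
	fmt.Printf("self dump: %d bytes, %d goroutine headers\n",
		len(self), countHeaders(self))
	fmt.Printf("all dump: %d bytes, %d goroutine headers\n",
		len(all), countHeaders(all))
	close(block)
}

// countHeaders counts lines that START with "goroutine ". A plain
// strings.Count would also match "created by ... in goroutine 1" lines
// and over-count.
func countHeaders(dump []byte) int {
	n := 0
	for _, line := range strings.Split(string(dump), "\n") {
		if strings.HasPrefix(line, "goroutine ") {
			n++
		}
	}
	return n
}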
Extension. Build CountByState(dump []byte) map[string]int that parses the [state]: portion of each header and returns counts. Useful for "I have 9000 goroutines, what are they doing?" triage.
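One possible shape for `CountByState`, as a sketch. It assumes headers look like `goroutine 12 [chan receive]:` or `goroutine 7 [select, 3 minutes]:` and drops everything after the first comma so wait durations do not split the groups. The sample dump in `main` is made up for illustration.

```go
package main

import (
	"bufio"
	"bytes"
	"fmt"
	"strings"
)

// CountByState parses a runtime.Stack-style dump and counts goroutines
// per state, using only lines that start with "goroutine ".
func CountByState(dump []byte) map[string]int {
	counts := make(map[string]int)
	sc := bufio.NewScanner(bytes.NewReader(dump))
	sc.Buffer(make([]byte, 0, 64*1024), 1024*1024) // tolerate long lines
	for sc.Scan() {
		line := sc.Text()
		if !strings.HasPrefix(line, "goroutine ") {
			continue
		}
		open := strings.IndexByte(line, '[')
		end := strings.LastIndexByte(line, ']')
		if open < 0 || end < open {
			continue
		}
		state := line[open+1 : end]
		if i := strings.IndexByte(state, ','); i >= 0 {
			state = state[:i] // drop ", 3 minutes" and similar suffixes
		}
		counts[state]++
	}
	return counts
}

func main() {
	// Hypothetical sample input, shaped like a real dump.
	sample := "goroutine 1 [running]:\nmain.main()\n\n" +
		"goroutine 6 [chan receive, 2 minutes]:\nmain.worker()\n\n" +
		"goroutine 7 [chan receive]:\nmain.worker()\n"
	for state, n := range CountByState([]byte(sample)) {
		fmt.Printf("%-15s %d\n", state, n)
	}
}
```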
Task 4: Force GC and observe NumGC
Goal. Read NumGC before and after runtime.GC() and confirm the counter increments by at least one per forced call (exactly one when no automatic cycle runs in between). Run a small allocation loop in between to make the GC actually have work to do.
Difficulty. Junior.
Skills. runtime.GC, runtime.ReadMemStats, the meaning of NumGC, when forcing GC is OK and when it is not.
Setup. Empty main.
Steps.
- Read `NumGC` -> `before`.
- Allocate ~50 MiB in chunks and drop references.
- Call `runtime.GC()`.
- Read `NumGC` -> `after`.
- Print `after - before`.
- Repeat the cycle three times.
Acceptance criteria.
- Each `runtime.GC()` increments `NumGC` by at least 1.
- You document in a comment that calling `runtime.GC()` in production is almost always wrong, because it disrupts the GC pacer.
Hints
- `NumGC` is a uint32 counter that wraps after about 4 billion cycles. For any realistic process this is fine.
- `runtime.GC()` blocks until the cycle completes. Useful for tests that need a deterministic GC point, dangerous in hot paths.
- `runtime/debug.SetGCPercent(-1)` disables automatic GC; `runtime.GC()` still runs a cycle when called. Useful for benchmarks. `SetGCPercent` returns the previous setting, so pair it with `old := debug.SetGCPercent(-1); defer debug.SetGCPercent(old)` to restore.
Reference solution
package main
import (
"fmt"
"runtime"
)
func main() {
var m runtime.MemStats
for cycle := 1; cycle <= 3; cycle++ {
runtime.ReadMemStats(&m)
before := m.NumGC
beforeHeapMB := float64(m.HeapAlloc) / (1024 * 1024)
// Allocate ~50 MiB, then drop the reference so it's collectable.
garbage := make([][]byte, 0, 50)
for i := 0; i < 50; i++ {
garbage = append(garbage, make([]byte, 1024*1024))
}
runtime.ReadMemStats(&m)
peakHeapMB := float64(m.HeapAlloc) / (1024 * 1024)
garbage = nil //nolint: ineffassign // intentional: drop reference
// Senior decision: runtime.GC() in production code is almost
// always a mistake. It overrides the GC pacer's careful trade-off
// between CPU and memory. The only legitimate uses are
// (a) deterministic test setup, (b) immediately before taking a
// memory profile, (c) right before exec() in a long-lived shell
// that wants to release RSS. Outside those, leave the pacer alone.
runtime.GC()
runtime.ReadMemStats(&m)
after := m.NumGC
afterHeapMB := float64(m.HeapAlloc) / (1024 * 1024)
fmt.Printf(
"cycle=%d NumGC: %d -> %d (delta=%d) heap: %.1f -> %.1f -> %.1f MiB\n",
cycle, before, after, after-before,
beforeHeapMB, peakHeapMB, afterHeapMB,
)
}
}
Extension. Run the same loop with debug.SetGCPercent(-1) set at startup. Observe that NumGC only ever increments from your explicit runtime.GC() calls — never automatically. Restore with debug.SetGCPercent(100) and confirm automatic GCs resume.
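A sketch of that extension, assuming no memory limit (GOMEMLIMIT) is configured, since a limit can still trigger collections while the percent-based pacer is off. `numGC`, `gcDeltas` and `sink` are hypothetical names.

```go
package main

import (
	"fmt"
	"runtime"
	"runtime/debug"
)

// sink keeps the allocations from being optimized away.
var sink []byte

// numGC reads the completed-cycle counter. ReadMemStats stops the world,
// which is acceptable in a demo.
func numGC() uint32 {
	var m runtime.MemStats
	runtime.ReadMemStats(&m)
	return m.NumGC
}

// gcDeltas disables automatic GC, allocates ~50 MiB of garbage, then
// forces one cycle. It returns the NumGC change during allocation and
// the change caused by the explicit runtime.GC() call.
func gcDeltas() (duringAlloc, afterForced uint32) {
	old := debug.SetGCPercent(-1)
	defer debug.SetGCPercent(old) // restore whatever was set before

	start := numGC()
	for i := 0; i < 50; i++ {
		sink = make([]byte, 1<<20)
	}
	mid := numGC()
	runtime.GC()
	end := numGC()
	return mid - start, end - mid
}

func main() {
	during, forced := gcDeltas()
	fmt.Printf("NumGC delta while allocating=%d, after runtime.GC()=%d\n", during, forced)
}
```

Restoring the value returned by `SetGCPercent` rather than a hard-coded 100 keeps the demo honest when GOGC was set in the environment.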
Task 5: pprof HTTP endpoint with CPU profile
Goal. Wire net/http/pprof into a server, generate sustained CPU load on a worker, capture a 10-second CPU profile via go tool pprof, and identify the hot function in top output.
Difficulty. Middle.
Skills. net/http/pprof side-effect import, go tool pprof, reading a flamegraph, why pprof endpoints should never be on the public mux.
Setup. A main that runs an HTTP server on :6060 and a worker goroutine doing CPU-bound work (e.g., busy-loop SHA-256 hashing).
Steps.
- Import `_ "net/http/pprof"` to register handlers on `http.DefaultServeMux`.
- Start an HTTP server on `:6060` using `http.DefaultServeMux`.
- Spawn a worker that loops forever computing SHA-256 of random data.
- From a shell: `go tool pprof -seconds=10 http://localhost:6060/debug/pprof/profile`.
- In the interactive pprof shell, run `top` and `list <funcname>`.
Acceptance criteria.
- `top` shows your busy function at or near the top by cumulative CPU.
- You note that exposing `/debug/pprof/*` on `0.0.0.0:6060` is an information-disclosure and denial-of-service risk (`/debug/pprof/cmdline` reveals the command line, heap and goroutine dumps leak internals, and anyone who can reach the port can trigger CPU profiles and traces). Production needs Task 12.
Hints
- The `pprof` package init registers handlers on `http.DefaultServeMux`. If you use a custom mux, you must register them manually.
- CPU profile is *sampled*, not exhaustive. Default rate is 100 Hz. Functions that finish in under 10 ms may not appear.
- `go tool pprof -http=:8080
Reference solution
package main
import (
"crypto/sha256"
"log"
"math/rand"
"net/http"
_ "net/http/pprof" // registers handlers on http.DefaultServeMux
)
// Senior decision: in production the side-effect import is wrong. It
// blanket-registers debug routes on whatever mux happens to be Default,
// which is often the same mux the app listens on. Best practice is a
// SEPARATE mux on a SEPARATE listener bound to localhost or a private
// interface. See Task 12 for the production-grade version.
func main() {
go busyWorker()
log.Println("pprof at http://localhost:6060/debug/pprof/")
log.Fatal(http.ListenAndServe("localhost:6060", nil))
}
func busyWorker() {
buf := make([]byte, 4096)
h := sha256.New()
for {
rand.Read(buf)
h.Reset()
h.Write(buf)
_ = h.Sum(nil)
}
}
// Capture a profile from your shell:
//
// go tool pprof -seconds=10 http://localhost:6060/debug/pprof/profile
//
// Inside the pprof interactive shell:
//
// (pprof) top
// Showing nodes accounting for 9.50s, 96.94% of 9.80s total
// flat flat% sum% cum cum%
// 4.20s 42.86% 42.86% 4.20s 42.86% crypto/sha256.block
// 2.10s 21.43% 64.29% 2.10s 21.43% runtime.memmove
// ...
//
// (pprof) list busyWorker
// ...
// 11 . . func busyWorker() {
// 12 . . buf := make([]byte, 4096)
// 13 . . h := sha256.New()
// 14 . . for {
// 15 . 50ms rand.Read(buf)
// 16 . 30ms h.Reset()
// 17 . 4.20s h.Write(buf)
// 18 . 70ms _ = h.Sum(nil)
// 19 . . }
// 20 . . }
//
// Or open a flamegraph:
//
// go tool pprof -http=:8080 http://localhost:6060/debug/pprof/profile?seconds=10
Extension. Profile for 30 seconds with ?seconds=30. The sample rate cannot be overridden via the URL; to change it, call runtime.SetCPUProfileRate before the profile starts. Compare 100 Hz against 500 Hz and note that higher rates capture shorter functions but inflate overhead.
Task 6: Goroutine profile via pprof.Lookup
Goal. Snapshot the current set of goroutines with pprof.Lookup("goroutine").WriteTo(w, 0) and WriteTo(w, 1). The first is the binary pprof format; the second is human-readable text. Identify a leaked goroutine in the dump.
Difficulty. Middle.
Skills. runtime/pprof.Lookup, profile debug levels, reading the text format.
Setup. Spawn 100 goroutines that block on a channel never closed. Then dump.
Steps.
- Spawn 100 blocked goroutines.
- Open a file `goroutine.pb.gz`. Call `pprof.Lookup("goroutine").WriteTo(f, 0)`.
- Open a file `goroutine.txt`. Call `pprof.Lookup("goroutine").WriteTo(f, 1)`.
- Open `goroutine.pb.gz` with `go tool pprof goroutine.pb.gz`. Run `top` and `traces`.
- Cat `goroutine.txt` and note the `100 @` line indicating 100 goroutines share that stack.
Acceptance criteria.
- `goroutine.txt` shows a stanza like `100 @ ...` for the leaked function.
- The binary profile opens in `go tool pprof` and `top` shows the leak function.
- You explain in a comment that debug level 2 is `runtime.Stack`-like and rarely useful for analysis.
Hints
- `WriteTo(w, 0)` = binary pprof, the standard input to `go tool pprof`.
- `WriteTo(w, 1)` = legacy text, deduplicated by stack; perfect for "how many goroutines are stuck here".
- `WriteTo(w, 2)` = full per-goroutine dump (like `runtime.Stack(buf, true)`). Almost never what you want.
Reference solution
package main
import (
"fmt"
"os"
"runtime/pprof"
"time"
)
func leakedWorker(block <-chan struct{}) {
<-block // never closed -> leak
}
func main() {
block := make(chan struct{})
for i := 0; i < 100; i++ {
go leakedWorker(block)
}
time.Sleep(100 * time.Millisecond) // let them park
// Senior decision: write BOTH formats. The binary one for tooling,
// the text one for a human eye when you have only ssh and cat.
binF, err := os.Create("goroutine.pb.gz")
must(err)
defer binF.Close()
if err := pprof.Lookup("goroutine").WriteTo(binF, 0); err != nil {
panic(err)
}
txtF, err := os.Create("goroutine.txt")
must(err)
defer txtF.Close()
// Debug level 1: deduplicated text. Reads like:
// 100 @ 0x... 0x... 0x...
// # 0x... main.leakedWorker+0x... /path/main.go:7
// That "100 @" is the headline of every leak hunt — 100 goroutines
// share one stack. If you saw "1 @ 100 times" instead, you'd have
// 100 different leaks, which is a different problem.
if err := pprof.Lookup("goroutine").WriteTo(txtF, 1); err != nil {
panic(err)
}
fmt.Println("wrote goroutine.pb.gz and goroutine.txt")
fmt.Println("inspect: go tool pprof goroutine.pb.gz (then: top, traces)")
fmt.Println("inspect: less goroutine.txt")
close(block)
}
func must(err error) {
if err != nil {
panic(err)
}
}
Extension. Also dump pprof.Lookup("heap") and pprof.Lookup("allocs"). Note the difference: both carry the same sampled data, but heap defaults to the in-use view (live objects as of the most recently completed GC) while allocs defaults to the cumulative allocation view since process start.
Task 7: runtime/trace for a 100ms window
Goal. Record a runtime/trace for a 100 ms window in which a worker pool processes 100 jobs. Open the trace in go tool trace, find the per-goroutine timeline, and identify the longest-running job.
Difficulty. Middle.
Skills. runtime/trace.Start, runtime/trace.Stop, go tool trace.
Setup. A worker pool with 4 workers consuming from a channel. Jobs sleep a random 0–5 ms.
Steps.
- Create `trace.out`.
- Call `trace.Start(f)`.
- Submit 100 jobs to the pool. Wait for them to finish.
- Call `trace.Stop()`.
- Run `go tool trace trace.out`. Click the link printed. Explore the "Goroutines" and "Scheduler" tabs.
Acceptance criteria.
- `trace.out` is non-empty.
- `go tool trace` opens a browser UI.
- You can point at a single job's execution span on the goroutine timeline.
Hints
- `runtime/trace` has measurable overhead: historically around 10 to 20% CPU, and reported at roughly 1 to 2% for many applications since Go 1.21. Production-safe in short bursts; do not leave it on by default.
- The `go tool trace` UI requires a browser. SSH from a server: copy the file off-box and open locally.
- The "Scheduler latency profile" tab is the gold mine for "why is my goroutine starving" investigations.
Reference solution
package main
import (
"fmt"
"math/rand"
"os"
"runtime/trace"
"sync"
"time"
)
func main() {
f, err := os.Create("trace.out")
if err != nil {
panic(err)
}
defer f.Close()
// Senior decision: start the trace BEFORE the workload begins so we
// capture the worker pool spinup too. The first millisecond of any
// trace is the most informative — that's where laziness in init,
// unexpected goroutine spawns, and GC warmup show up.
if err := trace.Start(f); err != nil {
panic(err)
}
defer trace.Stop()
jobs := make(chan int, 100)
var wg sync.WaitGroup
for w := 0; w < 4; w++ {
wg.Add(1)
go func(workerID int) {
defer wg.Done()
for j := range jobs {
doJob(workerID, j)
}
}(w)
}
for j := 0; j < 100; j++ {
jobs <- j
}
close(jobs)
wg.Wait()
fmt.Println("wrote trace.out")
fmt.Println("open with: go tool trace trace.out")
}
func doJob(workerID, jobID int) {
// Senior decision: the trace UI shows you what each goroutine was
// doing. If your jobs are all "Sleep" you'll see Sleep blocks; if
// they're all "channel recv" you'll see those. The richer your
// workload, the more informative the trace.
time.Sleep(time.Duration(rand.Intn(5)) * time.Millisecond)
_ = workerID
_ = jobID
}
Extension. Add a trace.WithRegion (see Task 16) around each job. Now the trace UI shows named regions in the goroutine timeline — much easier to spot "job 73 took 4.8 ms" than to eyeball coloured bars.
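A sketch of the region wrapping, recording into memory instead of a file so it is easy to try. `traceJobs` is a hypothetical helper and the region name "job" is an arbitrary choice.

```go
package main

import (
	"bytes"
	"context"
	"fmt"
	"runtime/trace"
	"time"
)

// traceJobs records a trace into memory while running n jobs, each
// wrapped in a named region, and returns the raw trace bytes.
func traceJobs(n int) ([]byte, error) {
	var buf bytes.Buffer
	if err := trace.Start(&buf); err != nil {
		return nil, err // fails if tracing is already enabled
	}
	ctx := context.Background()
	for j := 0; j < n; j++ {
		trace.WithRegion(ctx, "job", func() {
			// Logf attaches a message to the trace so a region can be
			// matched to a job ID in the UI.
			trace.Logf(ctx, "job", "id=%d", j)
			time.Sleep(time.Millisecond)
		})
	}
	trace.Stop() // flushes all buffered events into buf
	return buf.Bytes(), nil
}

func main() {
	data, err := traceJobs(10)
	if err != nil {
		panic(err)
	}
	fmt.Printf("captured %d bytes of trace data\n", len(data))
}
```

Write the returned bytes to a file to open them with `go tool trace`.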
Task 8: SetFinalizer logs when objects collect
Goal. Attach a finalizer to a struct that prints "collected: name=...". Allocate 10 of them in a loop, drop references, force GC, and observe the finalizer messages. Then create a chain (object A holds B, B holds C) and document the order in which the finalizers run across successive GC cycles.
Difficulty. Middle.
Skills. runtime.SetFinalizer, why finalizers fire on a separate goroutine, ordering across GCs.
Setup. Empty main.
Steps.
- Define `type Item struct { Name string; Next *Item }`.
- `runtime.SetFinalizer(&item, func(i *Item) { fmt.Println("collected:", i.Name) })`.
- Create 10 items in a loop. Drop the slice.
- `runtime.GC()`, then `runtime.GC()` (yes, twice; see Hints).
- Sleep 100 ms to let the finalizer goroutine drain.
- Create three items `A -> B -> C` where each `Next` field points to the next. Drop all three references. Force GC repeatedly. Observe ordering.
Acceptance criteria.
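- 10 "collected: ..." lines appear after the GCs.
- For the chain, you observe (and document) that finalizers run in dependency order: A first, then B, then C, each in a later GC cycle. The `runtime.SetFinalizer` documentation states this: if A points at B and both have finalizers, only A's finalizer runs; B's can run once A has been freed.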
Hints
- A finalizer is run on a *dedicated* goroutine. It must never block; if it does, all subsequent finalizers stall.
- The object becomes collectable only AFTER its finalizer runs and the next GC cycle finds it unreachable. That's why you need *two* GCs: the first runs the finalizer (and resurrects the object briefly), the second actually frees it.
- Never finalize the receiver of a method that's still being called. Race city.
Reference solution
package main
import (
"fmt"
"runtime"
"time"
)
type Item struct {
Name string
Next *Item
}
func makeItem(name string) *Item {
it := &Item{Name: name}
runtime.SetFinalizer(it, func(i *Item) {
fmt.Println("collected:", i.Name)
})
return it
}
func main() {
// Round 1: ten independent items.
items := make([]*Item, 0, 10)
for i := 0; i < 10; i++ {
items = append(items, makeItem(fmt.Sprintf("item-%d", i)))
}
items = nil // drop refs
// Senior decision: TWO runtime.GC() calls. Finalizers run AFTER an
// object is found unreachable, and the runtime "resurrects" the
// object briefly so the finalizer can touch it. Only the SECOND GC
// can actually free it (assuming the finalizer didn't stash a
// reference somewhere). Forgetting the second GC is the #1 source
// of "my finalizer ran but memory wasn't freed".
runtime.GC()
runtime.GC()
time.Sleep(100 * time.Millisecond) // let finalizer goroutine drain
fmt.Println("--- chain ---")
// Round 2: A -> B -> C.
a := makeItem("A")
b := makeItem("B")
c := makeItem("C")
a.Next = b
b.Next = c
// Senior decision: finalizers run in DEPENDENCY order, as documented
// for runtime.SetFinalizer. A points at B, so only A's finalizer runs
// in the first cycle; B becomes eligible once A has been freed, then
// C after B. That is one link per GC cycle, so the chain needs three
// cycles, with a pause so the finalizer goroutine can run in between.
// If you need stricter control, don't use finalizers; use explicit
// Close() methods or runtime.AddCleanup (Task 20).
a, b, c = nil, nil, nil
for i := 0; i < 3; i++ {
runtime.GC()
time.Sleep(50 * time.Millisecond)
}
}
Extension. Add a finalizer that resurrects the object by stashing the pointer in a global. Demonstrate that the object then survives the next GC. Use this only to understand the failure mode — never in real code.
Task 9: KeepAlive prevents premature finalization
Goal. Write code where a finalizer fires while the object is still being used, leading to incorrect behaviour. Then fix it with runtime.KeepAlive and confirm the bug disappears.
Difficulty. Middle.
Skills. runtime.KeepAlive, the GC's eagerness to collect unreferenced objects, why cgo wrappers especially need this.
Setup. Simulate a "resource" that prints "in use: ID=N" while alive and "freed: ID=N" from its finalizer. Use the resource in a loop AFTER the last syntactic reference.
Steps.
- Define `type Resource struct { ID int; Handle uintptr }`.
- Allocate one. Set a finalizer that prints "freed".
- In a loop, call a method that prints "in use", but extract the `Handle` first and use the local copy in the loop. The original `*Resource` becomes unreachable.
- Run with GC pressure (allocate aggressively in the loop). Observe "freed" printed before "in use" finishes.
- Add `runtime.KeepAlive(r)` after the loop. Re-run. Confirm "freed" prints last.
Acceptance criteria.
- Without `KeepAlive`, "freed: ID=1" appears before some "in use" lines under GC pressure.
- With `KeepAlive`, "freed" appears only after the loop completes, and only if a GC cycle runs before the process exits.
- You document the cgo angle: any cgo function holding a pointer derived from a Go object needs `KeepAlive(obj)` after the call.
Hints
- The compiler may decide a variable is dead earlier than you expect, and the GC follows. `KeepAlive(r)` is a no-op at runtime but tells the compiler "treat r as live up to this point".
- Force GC pressure with a tight allocation loop. Without pressure, the bug may not reproduce on every run.
- Real-world bug: `os.File.Fd()` returns a `uintptr` and the `*File` can be collected before you use the fd. The `runtime.KeepAlive` documentation walks through this file-descriptor case.
Reference solution
package main
import (
"fmt"
"runtime"
"time"
)
type Resource struct {
ID int
Handle uintptr // pretend this is a cgo handle
}
func newResource(id int) *Resource {
r := &Resource{ID: id, Handle: uintptr(id) * 0x1000}
runtime.SetFinalizer(r, func(rr *Resource) {
fmt.Printf("freed: ID=%d Handle=0x%x\n", rr.ID, rr.Handle)
})
return r
}
func useHandle(handle uintptr, iter int) {
fmt.Printf("in use: handle=0x%x iter=%d\n", handle, iter)
}
func brokenDemo() {
fmt.Println("--- BROKEN (no KeepAlive) ---")
r := newResource(1)
// Senior decision: this is the trap. After we extract Handle, the
// *Resource itself has no more uses in the function. The compiler
// marks r as dead; the GC collects it; the finalizer fires; the
// Handle we're holding is now a dangling reference. In cgo this
// would be a use-after-free.
handle := r.Handle
for i := 0; i < 50; i++ {
useHandle(handle, i)
// Force GC pressure.
_ = make([]byte, 1<<20)
runtime.GC()
time.Sleep(time.Millisecond)
}
fmt.Println("brokenDemo done")
}
func fixedDemo() {
fmt.Println("--- FIXED (with KeepAlive) ---")
r := newResource(2)
handle := r.Handle
for i := 0; i < 50; i++ {
useHandle(handle, i)
_ = make([]byte, 1<<20)
runtime.GC()
time.Sleep(time.Millisecond)
}
// Senior decision: KeepAlive is a compile-time signal, not a
// runtime operation. It compiles to nothing — but it stops the
// compiler from marking r as dead before this line. Placement
// matters: KeepAlive must be AFTER the last use of the derived
// value (handle) you want to protect.
runtime.KeepAlive(r)
fmt.Println("fixedDemo done")
}
func main() {
brokenDemo()
time.Sleep(100 * time.Millisecond)
fixedDemo()
time.Sleep(100 * time.Millisecond)
}
Extension. Find one place in the Go stdlib (try os or crypto/rand) where runtime.KeepAlive is used. Read the comment. Note that almost every use is at a syscall or cgo boundary — those are the only places it's commonly correct.
Task 10: pprof.Do labels for hot-path attribution
Goal. Tag a CPU-bound function with pprof.Do(ctx, pprof.Labels("tenant", "acme", "endpoint", "/checkout"), fn). Take a CPU profile. In go tool pprof, filter by label to show only the work tagged for tenant=acme.
Difficulty. Middle.
Skills. pprof.Do, pprof.Labels, tagfocus in go tool pprof, why labels beat hand-rolled bookkeeping.
Setup. Two CPU-bound functions, one tagged with tenant=acme, one with tenant=globex. Both run in parallel.
Steps.
- Wire `net/http/pprof` (as Task 5).
- Launch one goroutine per tenant. Wrap the workload in `pprof.Do(ctx, pprof.Labels("tenant", "acme"), func(ctx context.Context) { busyWork() })`.
- Capture a 10s CPU profile.
- In the pprof shell: `tagfocus=tenant=acme` then `top`. Compare against `tagfocus=tenant=globex`.
- Open the flamegraph (`-http=:8080`) and use the label dropdown.
Acceptance criteria.
- `top` with `tagfocus=tenant=acme` shows only the work attributable to `acme`.
- The two tenants' CPU shares roughly match their respective workload sizes.
- You document that labels are stored per goroutine and inherited by goroutines started while they are set; the `ctx` carries them so that `pprof.Label` and nested `pprof.Do` calls can see them.
Hints
- Labels are attached to the *goroutine* via `pprof.SetGoroutineLabels`. `pprof.Do` is the safe wrapper that restores the prior labels on exit.
- Goroutines started while labels are set inherit them automatically, which is why goroutines spawned inside `pprof.Do` are attributed correctly. A goroutine started elsewhere that only receives the labelled `ctx` must call `pprof.SetGoroutineLabels(ctx)` itself.
- `tagfocus` is a pprof shell command; the flamegraph UI exposes the same filter via dropdowns.
Reference solution
package main
import (
"context"
"crypto/sha256"
"log"
"math/rand"
"net/http"
_ "net/http/pprof"
"runtime/pprof"
"sync"
)
func busyWork(iters int) {
buf := make([]byte, 4096)
h := sha256.New()
for i := 0; i < iters; i++ {
rand.Read(buf)
h.Reset()
h.Write(buf)
_ = h.Sum(nil)
}
}
func main() {
go func() {
log.Println("pprof at http://localhost:6060/debug/pprof/")
log.Fatal(http.ListenAndServe("localhost:6060", nil))
}()
var wg sync.WaitGroup
// Senior decision: the LABELS are the schema for slicing your CPU
// profile after the fact. Pick them ONCE up front and use them
// everywhere — tenant, endpoint, request_class. Inconsistent label
// keys ("user" here, "userid" there) destroy the analysis.
tenants := []string{"acme", "globex", "acme", "acme"} // acme triple-loaded
for i, t := range tenants {
wg.Add(1)
go func(tenant string, idx int) {
defer wg.Done()
ctx := context.Background()
pprof.Do(ctx, pprof.Labels(
"tenant", tenant,
"worker_id", fmtInt(idx),
), func(ctx context.Context) {
// Senior decision: the work happens INSIDE pprof.Do.
// Anything before/after the Do is unattributed in the
// profile. Wrap the whole hot path, not just the
// innermost call.
busyWork(1_000_000)
})
}(t, i)
}
wg.Wait()
log.Println("done")
}
func fmtInt(i int) string {
const digits = "0123456789"
if i == 0 {
return "0"
}
buf := [16]byte{}
pos := len(buf)
for i > 0 {
pos--
buf[pos] = digits[i%10]
i /= 10
}
return string(buf[pos:])
}
// Capture and inspect:
//
// go tool pprof http://localhost:6060/debug/pprof/profile?seconds=10
// (pprof) tagfocus=tenant=acme
// (pprof) top
// flat flat% cum cum%
// 5.20s 75.36% 5.20s 75.36% crypto/sha256.block
//
// (pprof) tagfocus=tenant=globex
// (pprof) top
// flat flat% cum cum%
// 1.80s 25.71% 1.80s 25.71% crypto/sha256.block
//
// acme's three workers took ~75%, globex's one worker took ~25%.
// That's the per-tenant CPU split, derived from the profile alone.
Extension. Add pprof.SetGoroutineLabels(ctx) manually in a worker that the labelled ctx is passed to but pprof.Do is not. Confirm via profile that you can label without the Do wrapper, at the cost of having to restore old labels yourself (almost never worth it).
Task 11: Prometheus exporter from runtime/metrics
Goal. Read runtime/metrics and expose five key metrics in Prometheus text format on /metrics. The five: heap allocated bytes, goroutine count, cumulative GC CPU time, GC pause percentiles, and scheduler latency percentiles.
Difficulty. Senior.
Skills. runtime/metrics, the Float64Histogram vs Uint64 value types, Prometheus exposition format, why runtime/metrics is the right answer in 2024+.
Setup. net/http server. No external dependencies (write the exposition manually so you understand it).
Steps.
- List the five metric names by reading runtime/metrics.All() and finding the ones you want by name match.
- Create a metrics.Sample slice with those names.
- In an HTTP handler, call metrics.Read(samples) and format each sample as a Prometheus line.
- For histogram metrics, extract the p50/p99 by interpolating the cumulative bucket counts.
- Curl /metrics. Confirm five # HELP and # TYPE lines plus values.
Acceptance criteria.
- metrics.Read is called per request, not at module init.
- Histograms are exposed as _p50 and _p99 derived values (or as full histograms, your choice).
- The handler completes in <1 ms on a healthy process (no stop-the-world).
- You explain in a comment why runtime/metrics is preferred over runtime.ReadMemStats for continuous monitoring.
Hints
- `runtime/metrics.All()` returns descriptions, including the metric kind. Use this to discover names; never hardcode without checking.
- `Float64Histogram` has `Counts []uint64` (per-bucket, not cumulative) and `Buckets []float64` (bucket boundaries, so `len(Buckets) == len(Counts)+1`). Percentile = "first bucket at which the running sum of `Counts` reaches total * p".
- `runtime/metrics` is sampled cheaply: most reads are non-blocking atomic loads. Safe at 1 Hz, OK at 10 Hz, expensive only if you read every metric on every scrape.
Reference solution
package main
import (
"fmt"
"log"
"net/http"
"runtime/metrics"
"strings"
)
// Senior decision: declare the metrics list ONCE as a package-level
// variable. The Sample slice is allocated per-request inside the handler
// (cheap), but the schema (names + Prometheus mapping) is static — no
// reason to rebuild it every request.
var promMetrics = []struct {
runtimeName string // runtime/metrics name
promName string // Prometheus metric name
help string
kind string // "gauge" | "counter" | "summary" (derived)
}{
{"/memory/classes/heap/objects:bytes", "go_memstats_heap_alloc_bytes",
"Bytes allocated and currently in use.", "gauge"},
{"/sched/goroutines:goroutines", "go_goroutines",
"Number of goroutines that currently exist.", "gauge"},
{"/cpu/classes/gc/total:cpu-seconds", "go_gc_cpu_seconds_total",
"Cumulative CPU time spent in GC.", "counter"},
{"/gc/pauses:seconds", "go_gc_pause_seconds",
"Distribution of GC pause durations.", "summary"},
{"/sched/latencies:seconds", "go_sched_latencies_seconds",
"Distribution of times goroutines spent on the scheduler queue.", "summary"},
}
// Senior decision: runtime/metrics is the right choice over MemStats
// for continuous monitoring because (a) most reads are lock-free atomic
// loads, no stop-the-world; (b) metrics are versioned and stable across
// Go releases; (c) histograms preserve the distribution (MemStats only
// gives totals). The migration cost is one-time; the operational win is
// permanent.
func metricsHandler(w http.ResponseWriter, r *http.Request) {
samples := make([]metrics.Sample, len(promMetrics))
for i, m := range promMetrics {
samples[i].Name = m.runtimeName
}
metrics.Read(samples)
var sb strings.Builder
for i, m := range promMetrics {
s := samples[i]
if s.Value.Kind() == metrics.KindBad {
// Metric doesn't exist on this Go version — skip with a
// comment rather than erroring. The exporter must NEVER fail
// closed on metric absence; that crashes alerting.
fmt.Fprintf(&sb, "# %s not available on this Go version\n", m.runtimeName)
continue
}
fmt.Fprintf(&sb, "# HELP %s %s\n", m.promName, m.help)
fmt.Fprintf(&sb, "# TYPE %s %s\n", m.promName, m.kind)
switch s.Value.Kind() {
case metrics.KindUint64:
fmt.Fprintf(&sb, "%s %d\n", m.promName, s.Value.Uint64())
case metrics.KindFloat64:
fmt.Fprintf(&sb, "%s %g\n", m.promName, s.Value.Float64())
case metrics.KindFloat64Histogram:
h := s.Value.Float64Histogram()
p50 := percentile(h, 0.50)
p99 := percentile(h, 0.99)
fmt.Fprintf(&sb, "%s{quantile=\"0.50\"} %g\n", m.promName, p50)
fmt.Fprintf(&sb, "%s{quantile=\"0.99\"} %g\n", m.promName, p99)
}
}
w.Header().Set("Content-Type", "text/plain; version=0.0.4")
w.Write([]byte(sb.String()))
}
// percentile returns an upper-bound estimate of quantile q; it does not
// interpolate inside a bucket. In runtime/metrics histograms h.Counts[i]
// is the per-bucket (non-cumulative) count for the half-open range
// [h.Buckets[i], h.Buckets[i+1]), so len(h.Buckets) == len(h.Counts)+1.
func percentile(h *metrics.Float64Histogram, q float64) float64 {
	var total uint64
	for _, c := range h.Counts {
		total += c
	}
	if total == 0 {
		return 0
	}
	target := uint64(float64(total) * q)
	if target == 0 {
		// Require at least one observation; otherwise an empty first
		// bucket would satisfy sum >= target immediately.
		target = 1
	}
	var sum uint64
	for i, c := range h.Counts {
		sum += c
		if sum >= target {
			// Senior decision: return the upper edge of the bucket.
			// Linear interpolation across the bucket would be slightly
			// more accurate but require remembering the previous edge;
			// for a tail percentile (p99) the bucket is wide and the
			// interpolation isn't worth the code. The last edge can be
			// +Inf, which the Prometheus text format accepts as a value.
			return h.Buckets[i+1]
		}
	}
	return h.Buckets[len(h.Buckets)-1]
}
func main() {
http.HandleFunc("/metrics", metricsHandler)
log.Println("metrics at http://localhost:8080/metrics")
log.Fatal(http.ListenAndServe(":8080", nil))
}
# HELP go_memstats_heap_alloc_bytes Bytes allocated and currently in use.
# TYPE go_memstats_heap_alloc_bytes gauge
go_memstats_heap_alloc_bytes 4218976
# HELP go_goroutines Number of goroutines that currently exist.
# TYPE go_goroutines gauge
go_goroutines 6
# HELP go_gc_cpu_seconds_total Cumulative CPU time spent in GC.
# TYPE go_gc_cpu_seconds_total counter
go_gc_cpu_seconds_total 0.001
# HELP go_gc_pause_seconds Distribution of GC pause durations.
# TYPE go_gc_pause_seconds summary
go_gc_pause_seconds{quantile="0.50"} 0.0001
go_gc_pause_seconds{quantile="0.99"} 0.0005
# HELP go_sched_latencies_seconds Distribution of times goroutines spent on the scheduler queue.
# TYPE go_sched_latencies_seconds summary
go_sched_latencies_seconds{quantile="0.50"} 1e-06
go_sched_latencies_seconds{quantile="0.99"} 2e-05
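Extension. Add the full histogram bucketing for one of the histogram metrics (Prometheus _bucket{le="..."} series). The le values are the runtime/metrics bucket upper edges, and each bucket value is the running total of Counts up to that edge. A _count series completes the picture; runtime/metrics does not record the sum of observations, so a _sum can only be estimated from the bucket edges.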
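One way the bucket conversion might look, as a sketch rather than a definitive exporter. The helper names `cumulative`, `formatLe` and `writeHistogram` are hypothetical. It assumes the documented `Float64Histogram` layout, where `Counts` holds per-bucket counts and `Buckets` holds `len(Counts)+1` boundaries. Runtime buckets are half-open on the upper edge while Prometheus `le` is inclusive, so the mapping is an approximation.

```go
package main

import (
	"fmt"
	"math"
	"runtime/metrics"
	"strings"
)

// cumulative converts per-bucket counts into running totals, which is what
// Prometheus expects for _bucket{le="..."} series.
func cumulative(counts []uint64) []uint64 {
	out := make([]uint64, len(counts))
	var sum uint64
	for i, c := range counts {
		sum += c
		out[i] = sum
	}
	return out
}

// formatLe renders an upper edge the way the text format spells it.
func formatLe(edge float64) string {
	if math.IsInf(edge, +1) {
		return "+Inf"
	}
	return fmt.Sprintf("%g", edge)
}

// writeHistogram emits _bucket and _count lines for one histogram.
// buckets holds the boundaries, so len(buckets) must be len(counts)+1.
func writeHistogram(sb *strings.Builder, name string, counts []uint64, buckets []float64) {
	var total uint64
	for i, c := range cumulative(counts) {
		fmt.Fprintf(sb, "%s_bucket{le=%q} %d\n", name, formatLe(buckets[i+1]), c)
		total = c
	}
	// Prometheus requires a final +Inf bucket; add one if the runtime's
	// last boundary is finite.
	if len(buckets) == 0 || !math.IsInf(buckets[len(buckets)-1], +1) {
		fmt.Fprintf(sb, "%s_bucket{le=\"+Inf\"} %d\n", name, total)
	}
	fmt.Fprintf(sb, "%s_count %d\n", name, total)
}

func main() {
	s := []metrics.Sample{{Name: "/sched/latencies:seconds"}}
	metrics.Read(s)
	if s[0].Value.Kind() != metrics.KindFloat64Histogram {
		fmt.Println("metric not available on this Go version")
		return
	}
	h := s[0].Value.Float64Histogram()
	var sb strings.Builder
	writeHistogram(&sb, "go_sched_latencies_seconds", h.Counts, h.Buckets)
	fmt.Print(sb.String())
}
```

The runtime's histograms can have many buckets, so a real exporter would probably merge adjacent buckets before emitting them.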
Task 12: Production-ready /debug/pprof handler
Goal. Build an HTTP handler that gates /debug/pprof/* behind HTTP basic auth AND an IP allowlist. Both must pass. The unauthenticated response is 401; the wrong-IP response is 403. Pprof handlers themselves must be unchanged behind the gate.
Difficulty. Senior.
Skills. net/http/pprof deep usage, subtle.ConstantTimeCompare for credentials, why basic-auth-with-static-creds beats nothing but isn't real security.
Setup. net/http server. Read username/password from env vars.
Steps.
- Build the auth middleware: parse Authorization: Basic, decode, constant-time compare.
- Build the IP-allowlist middleware: parse r.RemoteAddr, compare against a CIDR list parsed at startup.
- Compose: allowlist -> auth -> pprofMux.
- Register on a SEPARATE mux on a SEPARATE listener (debug port only).
- Test with curl: (no auth -> 401), (bad IP -> 403), (correct -> profile bytes).
Acceptance criteria.
- A request from an allowlisted IP without basic auth gets 401.
- A request from a non-allowlisted IP gets 403 regardless of auth.
- Constant-time compare is used for the password (mitigates timing oracles).
- The debug listener binds to a non-default port and is documented as "do not expose to the internet".
Hints
- `subtle.ConstantTimeCompare` returns 1 if equal, 0 if not. Its running time depends on the length of its inputs but not their contents, and it returns immediately when the lengths differ. Use it for every secret comparison.
- IP allowlist via `net.ParseCIDR` parsed once at startup; never per request.
- `r.RemoteAddr` is `host:port`. Strip the port with `net.SplitHostPort`. Also respect `X-Forwarded-For` if behind a trusted reverse proxy, but ONLY if you know it's trusted; otherwise any client can spoof its source IP.
Reference solution
package main
import (
"crypto/subtle"
"log"
"net"
"net/http"
"net/http/pprof"
"os"
"strings"
)
// Senior decision: defence in depth. Either auth OR allowlist alone is
// flawed: auth on a public port still lets attackers probe with brute
// force; allowlist alone is bypassed by anyone inside the trusted network.
// Both together raise the bar enough that a misconfigured ingress
// firewall doesn't immediately leak a CPU profile.
type pprofGate struct {
user, pass string
nets []*net.IPNet
inner http.Handler
}
func (g *pprofGate) ServeHTTP(w http.ResponseWriter, r *http.Request) {
// Check IP first — cheaper than basic auth, denies before we even
// touch the password.
host, _, err := net.SplitHostPort(r.RemoteAddr)
if err != nil {
http.Error(w, "bad RemoteAddr", http.StatusBadRequest)
return
}
ip := net.ParseIP(host)
if ip == nil {
http.Error(w, "bad RemoteAddr", http.StatusBadRequest)
return
}
allowed := false
for _, n := range g.nets {
if n.Contains(ip) {
allowed = true
break
}
}
if !allowed {
http.Error(w, "forbidden", http.StatusForbidden)
return
}
// Basic auth.
user, pass, ok := r.BasicAuth()
if !ok {
w.Header().Set("WWW-Authenticate", `Basic realm="pprof"`)
http.Error(w, "unauthorized", http.StatusUnauthorized)
return
}
// Senior decision: constant-time compare on BOTH credentials. Using
// == on the password leaks partial-prefix information via timing.
// ConstantTimeCompare closes that leak, but it still returns early
// when the lengths differ, so the secret's length is not hidden.
userOK := subtle.ConstantTimeCompare([]byte(user), []byte(g.user)) == 1
passOK := subtle.ConstantTimeCompare([]byte(pass), []byte(g.pass)) == 1
if !userOK || !passOK {
w.Header().Set("WWW-Authenticate", `Basic realm="pprof"`)
http.Error(w, "unauthorized", http.StatusUnauthorized)
return
}
g.inner.ServeHTTP(w, r)
}
func newPprofMux() *http.ServeMux {
// Senior decision: build the pprof mux ourselves rather than using
// http.DefaultServeMux. Registering on Default contaminates the
// process-wide mux; explicit registration on a private mux keeps
// pprof scoped to the debug listener only.
mux := http.NewServeMux()
mux.HandleFunc("/debug/pprof/", pprof.Index)
mux.HandleFunc("/debug/pprof/cmdline", pprof.Cmdline)
mux.HandleFunc("/debug/pprof/profile", pprof.Profile)
mux.HandleFunc("/debug/pprof/symbol", pprof.Symbol)
mux.HandleFunc("/debug/pprof/trace", pprof.Trace)
return mux
}
func parseAllowlist(cidrs []string) []*net.IPNet {
out := make([]*net.IPNet, 0, len(cidrs))
for _, c := range cidrs {
_, n, err := net.ParseCIDR(c)
if err != nil {
log.Fatalf("bad CIDR %q: %v", c, err)
}
out = append(out, n)
}
return out
}
func main() {
user := os.Getenv("PPROF_USER")
pass := os.Getenv("PPROF_PASS")
if user == "" || pass == "" {
log.Fatal("PPROF_USER and PPROF_PASS must be set")
}
raw := os.Getenv("PPROF_ALLOW") // e.g. "10.0.0.0/8,127.0.0.1/32"
if raw == "" {
raw = "127.0.0.1/32,::1/128"
}
nets := parseAllowlist(strings.Split(raw, ","))
gate := &pprofGate{
user: user,
pass: pass,
nets: nets,
inner: newPprofMux(),
}
// Senior decision: bind the debug listener to localhost by default
// and document that exposing it requires explicit ingress config.
// The auth + allowlist are the second and third layers; "don't
// expose at all" is the first.
log.Println("pprof at http://127.0.0.1:6060/debug/pprof/")
log.Fatal(http.ListenAndServe("127.0.0.1:6060", gate))
}
// Test:
// curl -i http://127.0.0.1:6060/debug/pprof/ # 401
// curl -i -u user:pass http://127.0.0.1:6060/debug/pprof/ # 200
// (from a non-allowlisted IP) curl ... # 403
Extension. Add a third gate: rate-limit /debug/pprof/profile and /debug/pprof/trace to one request every 30 seconds per source IP. Both are expensive; an attacker who somehow got past auth+allowlist still shouldn't be able to DoS you by spamming profile requests.
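A rough sketch of the third gate, assuming a hypothetical `ipLimiter` type that is not part of the reference solution. It keeps the last accepted request time per source IP in a mutex-guarded map and answers over-limit requests with 429. The map is never pruned, which is tolerable only because the allowlist keeps the set of callers small.

```go
package main

import (
	"fmt"
	"net"
	"net/http"
	"strconv"
	"sync"
	"time"
)

// ipLimiter allows one request per interval per source IP.
type ipLimiter struct {
	mu       sync.Mutex
	interval time.Duration
	last     map[string]time.Time
}

func newIPLimiter(interval time.Duration) *ipLimiter {
	return &ipLimiter{interval: interval, last: make(map[string]time.Time)}
}

// allow reports whether ip may proceed at time now, and records it if so.
func (l *ipLimiter) allow(ip string, now time.Time) bool {
	l.mu.Lock()
	defer l.mu.Unlock()
	if t, ok := l.last[ip]; ok && now.Sub(t) < l.interval {
		return false
	}
	l.last[ip] = now
	return true
}

// wrap rejects over-limit requests with 429 before they reach next.
func (l *ipLimiter) wrap(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		host, _, err := net.SplitHostPort(r.RemoteAddr)
		if err != nil {
			http.Error(w, "bad RemoteAddr", http.StatusBadRequest)
			return
		}
		if !l.allow(host, time.Now()) {
			w.Header().Set("Retry-After", strconv.Itoa(int(l.interval.Seconds())))
			http.Error(w, "profile rate limit", http.StatusTooManyRequests)
			return
		}
		next.ServeHTTP(w, r)
	})
}

// demoDecisions replays three requests from one IP at t, t+1s and t+31s.
func demoDecisions() [3]bool {
	l := newIPLimiter(30 * time.Second)
	t0 := time.Unix(0, 0)
	return [3]bool{
		l.allow("10.0.0.1", t0),
		l.allow("10.0.0.1", t0.Add(time.Second)),
		l.allow("10.0.0.1", t0.Add(31*time.Second)),
	}
}

func main() {
	fmt.Println(demoDecisions()) // prints [true false true]
}
```

In the gate, `wrap` would sit around only the two expensive handlers, for example `mux.Handle("/debug/pprof/profile", limiter.wrap(http.HandlerFunc(pprof.Profile)))`, so cheap endpoints stay unthrottled.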
Task 13: SetMemoryLimit and allocation pressure
Goal. Set runtime/debug.SetMemoryLimit to 200 MiB at startup. Run an allocation loop that climbs above 200 MiB. Observe via runtime/metrics that the GC runs more aggressively to keep the runtime under the limit (higher gc/cycles/total, lower steady-state heap).
Difficulty. Senior.
Skills. debug.SetMemoryLimit, soft vs hard memory limits, why this is the modern replacement for GOGC tuning.
Setup. Empty main. Need runtime/metrics from Task 11.
Steps.
- At startup, log the current memory limit (debug.SetMemoryLimit(-1) reads without setting).
- Set to 200 MiB.
- Run two phases: phase A with the limit, phase B with the limit removed (math.MaxInt64).
- In each phase, allocate to ~400 MiB and drop refs slowly; sample runtime/metrics every 100 ms.
- Print the GC cycle count delta per phase.
Acceptance criteria.
- Phase A shows significantly more GC cycles than Phase B (the runtime is GC'ing harder to honour the limit).
- Phase A's peak heap stays close to 200 MiB; Phase B's climbs above.
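- You document that the limit is soft: the runtime will exceed it if the live working set cannot fit. It never exits on its own for exceeding the limit; it caps GC CPU use (at roughly 50%) to avoid a death spiral and lets memory grow, so the OS or container OOM killer remains the hard backstop.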
Hints
- `debug.SetMemoryLimit(-1)` returns the current value without modification.
- The limit accounts for ALL Go runtime memory, not just heap (stacks, goroutine metadata, runtime state). Set it to ~80% of container limit, not 100%.
- `GOMEMLIMIT` env var sets the same thing at startup. Use both: env var as default, programmatic override for tests.
Reference solution
package main
import (
"fmt"
"math"
"runtime"
"runtime/debug"
"runtime/metrics"
"time"
)
func readGCCycles() uint64 {
s := []metrics.Sample{{Name: "/gc/cycles/total:gc-cycles"}}
metrics.Read(s)
return s[0].Value.Uint64()
}
func readHeapInUse() float64 {
s := []metrics.Sample{{Name: "/memory/classes/heap/objects:bytes"}}
metrics.Read(s)
return float64(s[0].Value.Uint64()) / (1024 * 1024)
}
func allocChurn(durationMs int) (peakMiB float64) {
	// Ring of 20 slots x 8 MiB = 160 MiB live, which fits under the
	// 200 MiB limit. Overwriting a slot drops the old chunk, so the GC
	// always has garbage to reclaim. A live set larger than the limit
	// could not be collected away, and the limit would have nothing to
	// enforce.
	keep := make([][]byte, 20)
	deadline := time.Now().Add(time.Duration(durationMs) * time.Millisecond)
	for i := 0; time.Now().Before(deadline); i++ {
		keep[i%len(keep)] = make([]byte, 8*1024*1024)
		if mb := readHeapInUse(); mb > peakMiB {
			peakMiB = mb
		}
		time.Sleep(2 * time.Millisecond)
	}
	runtime.KeepAlive(keep)
	return
}
func main() {
	// Senior decision: read the current limit BEFORE setting one. In
	// containerised production the limit is typically set via the
	// GOMEMLIMIT env var; logging it on startup makes "what limit am I
	// running under" answerable from logs.
	current := debug.SetMemoryLimit(-1)
	fmt.Printf("startup memory limit: %d bytes (%.0f MiB)\n",
		current, float64(current)/(1024*1024))
	fmt.Println("--- Phase A: limit = 200 MiB ---")
	debug.SetMemoryLimit(200 * 1024 * 1024)
	gcBefore := readGCCycles()
	peakA := allocChurn(2000)
	gcAfter := readGCCycles()
	fmt.Printf("Phase A: %d GC cycles, peak heap %.1f MiB\n",
		gcAfter-gcBefore, peakA)
	fmt.Println("--- Phase B: limit = unlimited ---")
	debug.SetMemoryLimit(math.MaxInt64)
	runtime.GC() // start Phase B from a freshly collected heap
	gcBefore = readGCCycles()
	peakB := allocChurn(2000)
	gcAfter = readGCCycles()
	fmt.Printf("Phase B: %d GC cycles, peak heap %.1f MiB\n",
		gcAfter-gcBefore, peakB)
	// Senior decision: expect Phase A to report several times more GC
	// cycles than Phase B with a peak heap near 200 MiB, while Phase B's
	// peak drifts toward roughly twice the live set. Exact numbers vary
	// by machine and Go version.
	//
	// Phase A's pacer is FIGHTING to stay near 200 MiB by GCing far more
	// often. CPU goes up; memory stays bounded. Phase B's pacer is in
	// "GOGC=100" steady state: fewer cycles, but the heap floats up
	// to wherever the working set wants. The memory limit is the right
	// dial when you'd rather pay CPU than be OOM-killed.
}
Extension. Combine with debug.SetGCPercent(-1) to disable automatic GC while keeping the memory limit. The runtime still runs GC when approaching the limit ("conservative GC mode"). Use the same metrics to compare CPU and pause distribution.
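The combination described above could be wired up roughly as follows. `limitOnlyGC` and `currentLimit` are hypothetical helper names; the sketch relies only on `debug.SetGCPercent` and `debug.SetMemoryLimit` returning the previous setting, and on a negative argument to `SetMemoryLimit` reading the limit without changing it.

```go
package main

import (
	"fmt"
	"runtime/debug"
)

// limitOnlyGC turns off the GOGC-driven trigger and leaves the memory limit
// as the only thing that starts a collection. It returns a function that
// restores both previous settings.
func limitOnlyGC(limitBytes int64) (restore func()) {
	prevPercent := debug.SetGCPercent(-1)         // returns the previous GOGC value
	prevLimit := debug.SetMemoryLimit(limitBytes) // returns the previous limit
	return func() {
		debug.SetGCPercent(prevPercent)
		debug.SetMemoryLimit(prevLimit)
	}
}

// currentLimit reads the memory limit without changing it.
func currentLimit() int64 { return debug.SetMemoryLimit(-1) }

func main() {
	restore := limitOnlyGC(200 << 20)
	defer restore()
	fmt.Printf("GC percent off, limit = %d MiB\n", currentLimit()>>20)
}
```

Returning a restore function keeps the experiment reversible, which matters when both phases run in the same process.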
Task 14: LockOSThread with thread-local cgo state
Goal. Demonstrate why runtime.LockOSThread is necessary for cgo calls that depend on thread-local state. Use errno (set by libc) as the example: without LockOSThread, a goroutine that calls a libc function and then reads errno may read it from the wrong thread.
Difficulty. Senior.
Skills. runtime.LockOSThread, runtime.UnlockOSThread, cgo, the M:N scheduler's freedom to migrate goroutines between OS threads.
Setup. A small package using cgo. If cgo is unavailable in your env, simulate with syscall.Gettid (Linux) to prove the goroutine sometimes runs on a different OS thread per call.
Steps.
- Write a function currentThreadID() int using a cgo gettid() call (Linux) or syscall.SYS_GETTID.
- In a goroutine, call currentThreadID() 1000 times with a tiny runtime.Gosched() between each. Count distinct thread IDs.
- Repeat WITH runtime.LockOSThread() at the start and runtime.UnlockOSThread() at the end. Confirm exactly one thread ID.
- Document a real example: OpenGL contexts, X11, glibc locale, signal masks; anything pinned per-thread.
Acceptance criteria.
- Without lock: the goroutine observes 2+ distinct thread IDs across iterations.
- With lock: exactly one thread ID for the lifetime of the lock.
- You explain WHY (the runtime is free to park the goroutine and resume it on any P/M).
Hints
- `runtime.LockOSThread` is reentrant: N calls require N matching `UnlockOSThread` calls.
- A goroutine that exits while still locked terminates its OS thread. Useful for guaranteeing cleanup of thread-local C state; otherwise a leak.
- `init` functions of every Go program run on the *same* OS thread (the main thread). Useful for libraries that must initialise on the main thread, e.g. some GUI frameworks.
Reference solution
//go:build linux
package main
import (
"fmt"
"runtime"
"sync"
"syscall"
)
// Senior decision: prefer syscall.Gettid over cgo for THIS demo because
// it's portable across Go versions and doesn't require a C toolchain.
// In production, the cgo equivalent applies to anything libc — OpenGL,
// X11, gettext, glibc locale, signal masks, OpenSSL thread-local error
// queues. The pattern is identical.
func currentTID() int {
return syscall.Gettid()
}
func observeWithoutLock(iters int) map[int]int {
counts := map[int]int{}
var mu sync.Mutex
var wg sync.WaitGroup
wg.Add(1)
go func() {
defer wg.Done()
for i := 0; i < iters; i++ {
tid := currentTID()
mu.Lock()
counts[tid]++
mu.Unlock()
// Give the runtime an opportunity to migrate us.
runtime.Gosched()
}
}()
wg.Wait()
return counts
}
func observeWithLock(iters int) map[int]int {
counts := map[int]int{}
var mu sync.Mutex
var wg sync.WaitGroup
wg.Add(1)
go func() {
// Senior decision: Lock BEFORE the first thread-local call,
// Unlock AFTER the LAST. If you Lock too late, the first call
// is on a random thread; if you forget to Unlock, the M is
// pinned to this goroutine forever — which is what you want for
// signal-handler init goroutines but a leak for ad-hoc workers.
runtime.LockOSThread()
defer runtime.UnlockOSThread()
defer wg.Done()
for i := 0; i < iters; i++ {
tid := currentTID()
mu.Lock()
counts[tid]++
mu.Unlock()
runtime.Gosched()
}
}()
wg.Wait()
return counts
}
func main() {
// Force a multi-P scheduler so goroutines can actually migrate.
runtime.GOMAXPROCS(4)
a := observeWithoutLock(10_000)
fmt.Printf("without lock: %d distinct TIDs\n", len(a))
for tid, n := range a {
fmt.Printf(" tid=%d count=%d\n", tid, n)
}
b := observeWithLock(10_000)
fmt.Printf("with lock: %d distinct TIDs\n", len(b))
for tid, n := range b {
fmt.Printf(" tid=%d count=%d\n", tid, n)
}
}
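// Typical output (TIDs and counts vary; migration is likely on a
// multi-core machine but not guaranteed on any single run):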
// without lock: 3 distinct TIDs
// tid=12345 count=4123
// tid=12346 count=3210
// tid=12347 count=2667
// with lock: 1 distinct TIDs
// tid=12348 count=10000
//
// The "3 distinct TIDs" is the bug for any cgo function that depends on
// thread-local state. errno read after a libc call may come from a
// thread that didn't make the call. OpenGL contexts will be "current"
// on a thread that's no longer running the goroutine. Signal masks set
// will affect the wrong thread.
Extension. Write a goroutine that does NOT call UnlockOSThread before returning. Observe (via /sched/threads/total or top) that the OS thread is destroyed when the goroutine exits — that's the documented behaviour and it's how you make sure thread-local C state is cleaned up rather than leaked into the runtime's M pool.
Task 15: Goroutine leak detector by stack diff
Goal. Build a LeakDetector that snapshots the goroutine stack at intervals and reports any stacks that grow in count between snapshots. Run it against a deliberately leaky program (one goroutine per HTTP request, never exits).
Difficulty. Senior.
Skills. runtime.Stack(buf, true), parsing the textual goroutine dump, diff-over-time.
Setup. A leaky HTTP handler that spawns a goroutine blocking on a channel never closed. A detector goroutine sampling every 5 seconds.
Steps.
- Write a parseStacks(buf []byte) map[string]int that returns count per normalised stack (strip goroutine IDs and addresses).
- Implement Detector.Snapshot() returning the map.
- Implement Detector.Diff(prev, curr) returning stacks where curr[s] > prev[s].
- Wire a leaky HTTP handler. Hit it 100 times. Confirm the detector flags the leak.
Acceptance criteria.
- The detector identifies the leaky goroutine by its stack signature.
- It distinguishes "growing" from "fluctuating": only flags if curr - prev >= threshold.
- It does NOT use runtime.Stack(buf, true) more than once per interval (it's stop-the-world).
Hints
- Goroutine dump format: each stanza starts with `goroutine NNN [state]:` then frames indented. The frame text (file:line, function name) is what you key on; the NNN and state vary per call.
- Normalise by stripping the first line (`goroutine NNN [state]:`) and any hex addresses (`0x...`).
- Real production leak detectors use `pprof.Lookup("goroutine")` instead and diff the pprof profiles: much faster, smaller, dedup'd. Building the textual version teaches the parsing; for production, prefer the structured one.
Reference solution
package main
import (
	"bufio"
	"bytes"
	"log"
	"net/http"
	"regexp"
	"runtime"
	"sort"
	"strings"
	"time"
)
var addrRE = regexp.MustCompile(`0x[0-9a-fA-F]+`)
// parseStacks returns count per normalised stack signature.
// Senior decision: the goroutine ID and any hex addresses are
// per-instance noise. Strip them BEFORE counting so that 100 goroutines
// stuck on the same channel collapse to one entry with count=100 —
// which is the entire point of the diff.
func parseStacks(buf []byte) map[string]int {
counts := map[string]int{}
sc := bufio.NewScanner(bytes.NewReader(buf))
sc.Buffer(make([]byte, 1024*1024), 16*1024*1024)
var current []string
flush := func() {
if len(current) == 0 {
return
}
sig := normalise(current)
counts[sig]++
current = current[:0]
}
for sc.Scan() {
line := sc.Text()
if strings.HasPrefix(line, "goroutine ") {
flush()
current = append(current, "<header>") // drop the variable part
continue
}
current = append(current, line)
}
flush()
return counts
}
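// creatorRE matches the " in goroutine N" suffix that Go 1.21+ appends to
// "created by" lines. The creator's ID can differ per instance, so it is
// noise in the same way addresses are.
var creatorRE = regexp.MustCompile(` in goroutine \d+`)

func normalise(lines []string) string {
	var sb strings.Builder
	for _, l := range lines {
		clean := addrRE.ReplaceAllString(l, "0x?")
		clean = creatorRE.ReplaceAllString(clean, "")
		sb.WriteString(clean)
		sb.WriteByte('\n')
	}
	return sb.String()
}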
type Detector struct {
Threshold int
prev map[string]int
}
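func (d *Detector) Snapshot() map[string]int {
	const maxBuf = 64 << 20
	buf := make([]byte, 1<<20)
	for {
		n := runtime.Stack(buf, true)
		if n < len(buf) || len(buf) >= maxBuf {
			// Either the dump fit, or we hit the cap and accept a
			// truncated dump rather than parsing an empty buffer.
			return parseStacks(buf[:n])
		}
		// Each retry is another stop-the-world, so grow quickly.
		buf = make([]byte, 2*len(buf))
	}
}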
type Growth struct {
Sig string
PrevCount int
CurrCount int
}
func (d *Detector) Tick() []Growth {
curr := d.Snapshot()
var growths []Growth
if d.prev != nil {
for sig, c := range curr {
p := d.prev[sig]
if c-p >= d.Threshold {
growths = append(growths, Growth{Sig: sig, PrevCount: p, CurrCount: c})
}
}
sort.Slice(growths, func(i, j int) bool {
return (growths[i].CurrCount - growths[i].PrevCount) >
(growths[j].CurrCount - growths[j].PrevCount)
})
}
d.prev = curr
return growths
}
// --- leaky workload ---
var leak = make(chan struct{}) // never closed
func leakyHandler(w http.ResponseWriter, r *http.Request) {
go func() {
<-leak // forever
}()
w.Write([]byte("ok"))
}
func main() {
http.HandleFunc("/leak", leakyHandler)
go func() {
// Detector loop.
d := &Detector{Threshold: 10}
// Seed.
d.Tick()
for {
time.Sleep(5 * time.Second)
growths := d.Tick()
if len(growths) == 0 {
continue
}
log.Printf("--- LEAK DETECTOR: %d growing stacks ---", len(growths))
for _, g := range growths {
log.Printf("count %d -> %d (delta=%d):\n%s",
g.PrevCount, g.CurrCount, g.CurrCount-g.PrevCount,
truncate(g.Sig, 400))
}
}
}()
// Self-traffic to drive the leak.
go func() {
time.Sleep(time.Second)
for i := 0; i < 1000; i++ {
resp, err := http.Get("http://127.0.0.1:7777/leak")
if err == nil {
resp.Body.Close()
}
time.Sleep(50 * time.Millisecond)
}
}()
log.Fatal(http.ListenAndServe("127.0.0.1:7777", nil))
}
func truncate(s string, n int) string {
if len(s) <= n {
return s
}
return s[:n] + "...(truncated)"
}
Extension. Replace the textual parser with pprof.Lookup("goroutine").WriteTo(buf, 0) and read the resulting protobuf with github.com/google/pprof/profile. Faster, smaller, and gives you exact line numbers without regex.
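As a dependency-free stepping stone toward the protobuf route, the sketch below reads the same profile in its text form (`WriteTo` with `debug=1`) instead of protobuf. That is a substitute for the approach named above, not the approach itself. It assumes the text layout in which each deduplicated stack starts with a `COUNT @ pc pc ...` line followed by `#`-prefixed frame lines. The helpers `snapshot`, `largest` and `spawnBlocked` are hypothetical.

```go
package main

import (
	"bufio"
	"bytes"
	"fmt"
	"runtime/pprof"
	"strconv"
	"strings"
	"sync"
	"time"
)

// snapshot returns goroutine counts keyed by stack, parsed from the debug=1
// goroutine profile. The "#" frame lines of each group form its signature.
func snapshot() (map[string]int, error) {
	var buf bytes.Buffer
	if err := pprof.Lookup("goroutine").WriteTo(&buf, 1); err != nil {
		return nil, err
	}
	counts := map[string]int{}
	var (
		n      int
		frames []string
	)
	flush := func() {
		if n > 0 {
			counts[strings.Join(frames, "\n")] += n
		}
		n, frames = 0, frames[:0]
	}
	sc := bufio.NewScanner(&buf)
	sc.Buffer(make([]byte, 64*1024), 4*1024*1024)
	for sc.Scan() {
		line := sc.Text()
		if head, _, ok := strings.Cut(line, " @"); ok {
			if c, err := strconv.Atoi(head); err == nil {
				flush() // a new group begins
				n = c
				continue
			}
		}
		if strings.HasPrefix(line, "#") {
			frames = append(frames, line)
		}
	}
	flush()
	return counts, sc.Err()
}

// largest returns the biggest group size in a snapshot.
func largest(m map[string]int) int {
	max := 0
	for _, c := range m {
		if c > max {
			max = c
		}
	}
	return max
}

// spawnBlocked starts n goroutines parked on the same receive and returns a
// function that releases them.
func spawnBlocked(n int) (release func()) {
	block := make(chan struct{})
	var started sync.WaitGroup
	started.Add(n)
	for i := 0; i < n; i++ {
		go func() {
			started.Done()
			<-block
		}()
	}
	started.Wait()
	time.Sleep(50 * time.Millisecond) // let them reach the receive
	return func() { close(block) }
}

func main() {
	release := spawnBlocked(25)
	defer release()
	snap, err := snapshot()
	if err != nil {
		panic(err)
	}
	fmt.Println("largest group:", largest(snap))
}
```

Because the profile is already grouped by stack, the detector's diff step reduces to comparing two of these maps.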
Task 16: trace.WithRegion times a critical section
Goal. Use runtime/trace.WithRegion to mark a critical section inside a worker. Open the trace in go tool trace and confirm the region appears as a named span on the goroutine timeline.
Difficulty. Senior.
Skills. runtime/trace.WithRegion, runtime/trace.StartRegion (lower-level), correlating regions with task IDs.
Setup. Worker pool from Task 7 with WithRegion calls wrapped around each job's "phase 1" and "phase 2".
Steps.
- Start a trace (Task 7).
- In each worker, call trace.WithRegion(ctx, "phase1", func() { phase1() }).
- Same for phase2.
- Stop the trace. Open in go tool trace. Navigate to the "User-defined regions" view.
- Sort by max duration. Confirm you see "phase1" and "phase2" entries with per-instance timings.
Acceptance criteria.
- The trace contains a "User-defined regions" tab populated with your region names.
- Each invocation appears as a separate row with start/end timing.
- Worker A's phase1 and worker B's phase1 are distinguishable (the tool shows the parent goroutine).
Hints
- `trace.WithRegion` requires a `context.Context`. Use `context.Background()` if you have nothing else.
- Regions cost ~100 ns each. Cheap enough for "phase boundaries inside a request", too expensive for a tight inner loop.
- For tasks that span multiple goroutines, use `trace.NewTask` to get a parent task ID and pass the ctx around.
Reference solution
package main
import (
"context"
"fmt"
"math/rand"
"os"
"runtime/trace"
"sync"
"time"
)
func phase1() { time.Sleep(time.Duration(rand.Intn(3)) * time.Millisecond) }
func phase2() { time.Sleep(time.Duration(rand.Intn(5)) * time.Millisecond) }
func main() {
f, err := os.Create("trace-regions.out")
if err != nil {
panic(err)
}
defer f.Close()
if err := trace.Start(f); err != nil {
panic(err)
}
defer trace.Stop()
jobs := make(chan int, 50)
var wg sync.WaitGroup
for w := 0; w < 4; w++ {
wg.Add(1)
go func(wid int) {
defer wg.Done()
for j := range jobs {
ctx := context.Background()
// Senior decision: NewTask gives this job a parent ID
// that links its regions across any goroutines we spawn
// for it. If phase1 spawned a helper goroutine, regions
// it opened with taskCtx would belong to the same task.
taskCtx, task := trace.NewTask(ctx,
fmt.Sprintf("job-%d", j))
trace.WithRegion(taskCtx, "phase1", func() {
phase1()
})
trace.WithRegion(taskCtx, "phase2", func() {
phase2()
})
task.End()
_ = wid
}
}(w)
}
for j := 0; j < 50; j++ {
jobs <- j
}
close(jobs)
wg.Wait()
fmt.Println("wrote trace-regions.out")
fmt.Println("open: go tool trace trace-regions.out")
fmt.Println("navigate to: 'User-defined tasks' and 'User-defined regions'")
}
Extension. Wrap each region with trace.Log(ctx, "category", "message") to add structured log entries that show up in the trace timeline. Useful for "what was the request ID when this region fired".
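A small sketch of the logging idea, assuming a hypothetical `recordJob` helper, an illustrative "request" category and a made-up request ID format. It writes the trace into an in-memory buffer so the example stays self-contained; a real program would write to a file as in the reference solution.

```go
package main

import (
	"bytes"
	"context"
	"fmt"
	"runtime/trace"
)

// recordJob traces one job with a task, a region and a log event, writing
// the trace into buf.
func recordJob(buf *bytes.Buffer, requestID string) error {
	if err := trace.Start(buf); err != nil {
		return err // for example, tracing is already enabled elsewhere
	}
	defer trace.Stop() // runs after task.End, flushing the trace to buf

	ctx, task := trace.NewTask(context.Background(), "job")
	defer task.End()
	trace.WithRegion(ctx, "phase1", func() {
		// The log event is attached to the task carried by ctx.
		trace.Log(ctx, "request", "id="+requestID)
	})
	return nil
}

// traceSize runs recordJob and reports how many bytes of trace it produced.
func traceSize(requestID string) (int, error) {
	var buf bytes.Buffer
	if err := recordJob(&buf, requestID); err != nil {
		return 0, err
	}
	return buf.Len(), nil
}

func main() {
	n, err := traceSize("req-42")
	if err != nil {
		panic(err)
	}
	fmt.Printf("captured %d bytes of trace data\n", n)
}
```

Logging inside the region callback keeps the event visually inside the span when the trace is viewed.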
Task 17: Block profile finds a contended mutex
Goal. Enable block profiling with runtime.SetBlockProfileRate(1). Construct a workload with deliberate mutex contention. Capture the block profile via pprof.Lookup("block") and identify the contended mutex in go tool pprof.
Difficulty. Senior.
Skills. runtime.SetBlockProfileRate, block profile semantics (it counts time blocked, not function CPU), why block profiling is off by default.
Setup. A struct with a sync.Mutex guarding a counter. Spawn 8 goroutines that hammer it.
Steps.
- At startup: runtime.SetBlockProfileRate(1), which samples every blocking event of any duration.
- Spawn 8 goroutines incrementing a shared mutex-guarded counter for 5 seconds.
- After workload: pprof.Lookup("block").WriteTo(file, 0).
- Open with go tool pprof block.pb.gz. Run top. Identify sync.(*Mutex).Lock.
- Disable: runtime.SetBlockProfileRate(0).
Acceptance criteria.
- Block profile shows sync.(*Mutex).Lock (or the contended call site) at the top.
- You document that SetBlockProfileRate(1) has measurable overhead (every block event records a stack); production uses a sampled rate such as 10000 (on average one sample per 10,000 ns spent blocked, not one in 10,000 events).
- You also note the related SetMutexProfileFraction for mutex-specific profiling.
Hints
- `SetBlockProfileRate(rate)`: 1 = sample everything; 0 = disable; N = sample one event per N nanoseconds blocked. Tune to your workload.
- The block profile shows *cumulative time blocked*, not *number of blocks*. A single 1-second block at a rare contention point ranks as high as a million 1-microsecond contended calls.
- For mutex-specific profiling (which mutex was contended, not just "where blocked"), use `runtime.SetMutexProfileFraction(N)` and `pprof.Lookup("mutex")`.
Reference solution
package main
import (
"fmt"
"log"
"os"
"runtime"
"runtime/pprof"
"sync"
"time"
)
type Counter struct {
mu sync.Mutex
n int64
}
func (c *Counter) Inc() {
c.mu.Lock()
c.n++
c.mu.Unlock()
}
func main() {
// Senior decision: SetBlockProfileRate(1) records EVERY blocking
// event. The overhead is bearable for a benchmark but unacceptable
// for production. In production prefer a sampled rate
// (rate=10000 means "on average one sample per 10,000 ns, i.e.
// 10 microseconds, spent blocked"). Same trade-off as CPU profile
// sampling rate.
runtime.SetBlockProfileRate(1)
runtime.SetMutexProfileFraction(1)
defer func() {
runtime.SetBlockProfileRate(0)
runtime.SetMutexProfileFraction(0)
}()
var c Counter
var wg sync.WaitGroup
deadline := time.Now().Add(5 * time.Second)
for g := 0; g < 8; g++ {
wg.Add(1)
go func() {
defer wg.Done()
for time.Now().Before(deadline) {
for i := 0; i < 1000; i++ {
c.Inc()
}
}
}()
}
wg.Wait()
log.Printf("counter = %d", c.n)
write("block.pb.gz", "block")
write("mutex.pb.gz", "mutex")
fmt.Println()
fmt.Println("inspect block (where goroutines waited):")
fmt.Println(" go tool pprof block.pb.gz # then 'top'")
fmt.Println("inspect mutex (which mutex was contended):")
fmt.Println(" go tool pprof mutex.pb.gz # then 'top'")
}
func write(path, name string) {
f, err := os.Create(path)
if err != nil {
log.Fatal(err)
}
defer f.Close()
if err := pprof.Lookup(name).WriteTo(f, 0); err != nil {
log.Fatal(err)
}
}
// Typical output:
// $ go tool pprof block.pb.gz
// (pprof) top
// Showing nodes accounting for 38.50s, 99.74% of 38.60s total
// flat flat% sum% cum cum%
// 38.50s 99.74% 99.74% 38.50s 99.74% sync.(*Mutex).Lock
//
// 38.5 seconds of CUMULATIVE blocking time across all 8 goroutines
// over a 5-second wall clock. That's a 7.7x contention factor — the
// workload is mostly waiting on the mutex.
//
// $ go tool pprof mutex.pb.gz
// (pprof) top
// flat flat% sum% cum cum%
// 38.50s 99.74% 99.74% 38.50s 99.74% main.(*Counter).Inc
//
// The mutex profile shows the contended OWNER. block profile shows
// the WAITERS. Both useful, different angles.
Extension. Add a sync.RWMutex to the workload with 7 readers and 1 writer. Compare the block profile to the original. RLocks contending against the writer should dominate; the writer's Lock should appear with low count but high cumulative time.
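A possible starting point for the RWMutex variant, sketched under the assumption that a writer holding the lock for about a millisecond at a time is enough to make readers queue. The `contend` helper and its parameters are hypothetical. It returns the number of distinct stacks in the block profile so the result can be checked without opening pprof.

```go
package main

import (
	"fmt"
	"runtime"
	"runtime/pprof"
	"sync"
	"time"
)

// contend runs one writer and `readers` readers against a shared RWMutex for
// roughly `millis` milliseconds and returns the number of distinct stacks
// recorded in the block profile.
func contend(readers, millis int) int {
	runtime.SetBlockProfileRate(1) // record every blocking event; demo only
	defer runtime.SetBlockProfileRate(0)

	var (
		mu    sync.RWMutex
		value int
		wg    sync.WaitGroup
	)
	deadline := time.Now().Add(time.Duration(millis) * time.Millisecond)

	wg.Add(1)
	go func() { // writer: holds the lock long enough for readers to queue up
		defer wg.Done()
		for time.Now().Before(deadline) {
			mu.Lock()
			value++
			time.Sleep(time.Millisecond)
			mu.Unlock()
			time.Sleep(100 * time.Microsecond)
		}
	}()
	for i := 0; i < readers; i++ {
		wg.Add(1)
		go func() { // reader: blocks in RLock whenever the writer holds the lock
			defer wg.Done()
			for time.Now().Before(deadline) {
				mu.RLock()
				_ = value
				mu.RUnlock()
			}
		}()
	}
	wg.Wait()
	return pprof.Lookup("block").Count()
}

func main() {
	fmt.Println("distinct blocking stacks:", contend(7, 200))
}
```

To compare against the original workload, write the profile to a file with `pprof.Lookup("block").WriteTo` and look for `sync.(*RWMutex).RLock` in `top`.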
Task 18: Markdown table of all runtime/metrics¶
Goal. Read runtime/metrics.All() on Go 1.22+ and emit a Markdown table with columns: Name, Kind, Cumulative, Description. Save the output as metrics.md. Use it as your local reference next time you build an exporter.
Difficulty. Staff.
Skills. runtime/metrics.All, metrics.Description, programmatic discovery vs hardcoding.
Setup. Empty main. Output to a file.
Steps.
- Call `metrics.All()`.
- For each `Description`, render as one Markdown table row.
- Sort alphabetically by name.
- Save to `metrics.md`. Open it; confirm it renders.
- Note in a comment that this is the only source of truth: the runtime/metrics doc.go is generated from these descriptions.
Acceptance criteria.
- The table has one row per metric.
- Long descriptions wrap or are escaped properly for Markdown (newlines become `<br>` or are collapsed).
- The exact metric set will vary by Go version; your script will reproduce the correct table for whatever toolchain runs it.
Hints
- Each `Description` has `Name`, `Description`, `Kind`, `Cumulative`. The kind values are `KindUint64`, `KindFloat64`, `KindFloat64Histogram`.
- Descriptions contain newlines and pipes (`|`). Escape pipes for Markdown table syntax.
- This script doubles as a CI check: emit the table, diff against a committed `metrics.md`, fail if changed without intent. That's how you keep your exporter (Task 11) in sync with new metrics shipped in Go releases.
Reference solution
package main
import (
"fmt"
"os"
"runtime/metrics"
"sort"
"strings"
)
func kindStr(k metrics.ValueKind) string {
switch k {
case metrics.KindUint64:
return "uint64"
case metrics.KindFloat64:
return "float64"
case metrics.KindFloat64Histogram:
return "float64 histogram"
case metrics.KindBad:
return "bad"
}
return "unknown"
}
// Senior decision: escape exactly THREE characters for Markdown table
// safety — `|`, newlines, and `<`. Anything else (asterisks, underscores,
// brackets) renders correctly inside cells. Over-escaping makes the
// file hard to read for the human consumer; under-escaping breaks the
// table layout. This list is the minimum.
func mdEscape(s string) string {
s = strings.ReplaceAll(s, "|", "\\|")
s = strings.ReplaceAll(s, "\n", " ")
s = strings.ReplaceAll(s, "<", "&lt;")
return s
}
func main() {
descs := metrics.All()
sort.Slice(descs, func(i, j int) bool {
return descs[i].Name < descs[j].Name
})
out, err := os.Create("metrics.md")
if err != nil {
panic(err)
}
defer out.Close()
fmt.Fprintln(out, "# Go runtime/metrics catalogue")
fmt.Fprintln(out)
fmt.Fprintln(out, "Generated from `runtime/metrics.All()` on this Go toolchain; your output may differ by version.")
fmt.Fprintln(out)
fmt.Fprintln(out, "| Name | Kind | Cumulative | Description |")
fmt.Fprintln(out, "|------|------|------------|-------------|")
for _, d := range descs {
fmt.Fprintf(out, "| `%s` | %s | %t | %s |\n",
mdEscape(d.Name),
kindStr(d.Kind),
d.Cumulative,
mdEscape(d.Description),
)
}
fmt.Println("wrote metrics.md with", len(descs), "rows")
}
// Sample output (Go 1.22+):
//
// | Name | Kind | Cumulative | Description |
// |------|------|------------|-------------|
// | `/cpu/classes/gc/mark/assist:cpu-seconds` | float64 | true | Estimated total CPU time goroutines spent performing GC tasks to assist the GC. |
// | `/cpu/classes/gc/mark/dedicated:cpu-seconds` | float64 | true | Estimated total CPU time spent performing GC tasks on processors dedicated to GC. |
// | `/cpu/classes/gc/total:cpu-seconds` | float64 | true | Estimated total CPU time spent on GC. |
// ... (one row per metric; the count varies by Go version)
Extension. Group the table by prefix (/cpu/..., /memory/..., /sched/..., etc.) with section headers. The grouping reveals the runtime's internal taxonomy of what it measures and makes the table navigable. ~12 sections cover all metrics.
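One way to approach the grouping, as a minimal sketch: bucket metric names by their first path segment and print one section per bucket. The helper names `topPrefix` and `groupByPrefix` are hypothetical and introduced here for illustration; grouping by the first segment only is an assumption, and a finer split (for example `/cpu/classes`) would yield more sections.

```go
package main

import (
	"fmt"
	"runtime/metrics"
	"sort"
	"strings"
)

// topPrefix returns the first path segment of a metric name,
// e.g. "/cpu/classes/gc/total:cpu-seconds" -> "/cpu".
func topPrefix(name string) string {
	rest := strings.TrimPrefix(name, "/")
	if i := strings.IndexAny(rest, "/:"); i >= 0 {
		rest = rest[:i]
	}
	return "/" + rest
}

// groupByPrefix buckets names by top-level prefix and sorts each bucket.
func groupByPrefix(names []string) map[string][]string {
	groups := map[string][]string{}
	for _, n := range names {
		p := topPrefix(n)
		groups[p] = append(groups[p], n)
	}
	for _, g := range groups {
		sort.Strings(g)
	}
	return groups
}

func main() {
	var names []string
	for _, d := range metrics.All() {
		names = append(names, d.Name)
	}
	groups := groupByPrefix(names)

	// Sort the prefixes so the section order is stable between runs.
	prefixes := make([]string, 0, len(groups))
	for p := range groups {
		prefixes = append(prefixes, p)
	}
	sort.Strings(prefixes)

	for _, p := range prefixes {
		fmt.Printf("## %s (%d metrics)\n\n", p, len(groups[p]))
		for _, n := range groups[p] {
			fmt.Printf("- `%s`\n", n)
		}
		fmt.Println()
	}
}
```

The section list printed by this sketch depends on the toolchain that runs it, just like the table itself.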
Task 19: Cluster-wide goroutine dump aggregator¶
Goal. Build a small command-line tool that fans out across N hosts (read from a file), fetches each host's /debug/pprof/goroutine?debug=1, parses the stacks, and aggregates them into a single ranking: "the top 10 goroutine stacks across the cluster, with per-host breakdown". This is the tool you reach for when a service is misbehaving across a fleet and you need to know whether 50 hosts share one symptom or have 50 different ones.
Difficulty. Staff.
Skills. Concurrent fetch, parsing the debug=1 text format, cross-host aggregation, presenting the result.
Setup. Three or more processes running the leaky service from Task 15 on different ports. A hosts file listing them.
Steps.
- Read `hosts.txt`, one `host:port` per line.
- Fetch `/debug/pprof/goroutine?debug=1` from each concurrently with a 5-second timeout per host.
- Parse each response: the `N @ ...` header gives count + stack signature.
- Aggregate: `map[stackSignature]map[host]int`.
- Sort by total count. Print top 10 with per-host breakdown.
Acceptance criteria.
- Fetches are concurrent.
- Per-host failures are logged but do not abort the run.
- The output names each stack signature, gives total count, and breaks it down by host.
- You document an enhancement: in real fleets, run this as a cron and emit metrics; "cluster-wide growth of any one signature > 10x baseline" is an early-warning alert for distributed leaks.
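The early-warning rule in the last criterion can be sketched as a pure function over two aggregated snapshots. This is an illustrative sketch only: `growthAlerts` is a hypothetical helper, the 10x factor comes from the criterion above, and skipping signatures that have no baseline is an assumption (a real alerting job might treat brand-new signatures as their own alert class).

```go
package main

import (
	"fmt"
	"sort"
)

// growthAlerts returns the stack signatures whose current cluster-wide
// count exceeds factor times their baseline count. Signatures missing
// from the baseline (or with a zero baseline) are skipped, because no
// growth ratio can be computed for them.
func growthAlerts(baseline, current map[string]int, factor float64) []string {
	var out []string
	for sig, now := range current {
		base, ok := baseline[sig]
		if !ok || base == 0 {
			continue
		}
		if float64(now) > factor*float64(base) {
			out = append(out, sig)
		}
	}
	sort.Strings(out) // deterministic order for logs and tests
	return out
}

func main() {
	// Hypothetical totals, shaped like the `totals` map built by the aggregator.
	baseline := map[string]int{
		"main.leakyHandler.func1": 10,
		"net/http.(*conn).serve":  12,
	}
	current := map[string]int{
		"main.leakyHandler.func1": 312,
		"net/http.(*conn).serve":  12,
	}
	for _, sig := range growthAlerts(baseline, current, 10) {
		fmt.Printf("ALERT: %s grew from %d to %d\n", sig, baseline[sig], current[sig])
	}
}
```

Run from cron, the same function could feed a counter metric instead of printing.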
Hints
- The `?debug=1` query string returns the text format `100 @ 0x... 0x...` followed by `# 0x... func+0x... file:line`. Parse the `N @ ...` line for the count.
- Normalise: drop the hex addresses and offsets and keep the function names (optionally with file:line, which can differ between builds). Identical code on an identical Go version produces identical signatures across hosts; that's the cross-host invariant.
- Wrap the fetch in `context.WithTimeout`: a hung pprof endpoint should not stall the whole report.
Reference solution
package main
import (
"bufio"
"context"
"fmt"
"io"
"log"
"net/http"
"os"
"regexp"
"sort"
"strconv"
"strings"
"sync"
"time"
)
type Stack struct {
Signature string
Frames []string
}
type HostReport struct {
Host string
Counts map[string]int // signature -> count
Frames map[string][]string
Err error
}
var (
headRE = regexp.MustCompile(`^(\d+) @ `)
addrRE = regexp.MustCompile(`0x[0-9a-fA-F]+`)
)
// Senior decision: build the signature from FUNCTION names only, not
// file paths. Cross-host comparisons across deploys with different
// /tmp paths still match — file paths can differ by build-id or
// container path, function names cannot.
func extractFunc(line string) string {
// Line shape: "#\t0x...\tmain.leakyHandler.func1+0x...\t/path:line"
fields := strings.Fields(line)
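// Frame lines carry a hex PC in field 1; this also skips "# labels: {...}" lines.
if len(fields) < 3 || !strings.HasPrefix(fields[1], "0x") {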
return ""
}
fn := fields[2]
if i := strings.LastIndex(fn, "+0x"); i >= 0 {
fn = fn[:i]
}
return fn
}
func parseDebug1(r io.Reader) (map[string]int, map[string][]string) {
counts := map[string]int{}
frames := map[string][]string{}
sc := bufio.NewScanner(r)
sc.Buffer(make([]byte, 1<<20), 16<<20)
var pendingCount int
var pendingFrames []string
flush := func() {
if pendingCount == 0 {
return
}
sig := strings.Join(pendingFrames, " -> ")
counts[sig] += pendingCount
if _, ok := frames[sig]; !ok {
cp := make([]string, len(pendingFrames))
copy(cp, pendingFrames)
frames[sig] = cp
}
pendingCount = 0
pendingFrames = pendingFrames[:0]
}
for sc.Scan() {
line := sc.Text()
if m := headRE.FindStringSubmatch(line); m != nil {
flush()
n, _ := strconv.Atoi(m[1])
pendingCount = n
continue
}
if strings.HasPrefix(line, "#") {
if fn := extractFunc(line); fn != "" {
pendingFrames = append(pendingFrames, fn)
}
}
}
flush()
_ = addrRE // unused in this version; a fuller version would also strip raw addresses
return counts, frames
}
func fetch(ctx context.Context, host string) HostReport {
url := fmt.Sprintf("http://%s/debug/pprof/goroutine?debug=1", host)
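req, err := http.NewRequestWithContext(ctx, http.MethodGet, url, nil)
if err != nil {
// A malformed host entry yields a nil request; report it instead of panicking in Do.
return HostReport{Host: host, Err: err}
}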
resp, err := http.DefaultClient.Do(req)
if err != nil {
return HostReport{Host: host, Err: err}
}
defer resp.Body.Close()
if resp.StatusCode != 200 {
return HostReport{Host: host, Err: fmt.Errorf("status %d", resp.StatusCode)}
}
counts, frames := parseDebug1(resp.Body)
return HostReport{Host: host, Counts: counts, Frames: frames}
}
func readHosts(path string) []string {
f, err := os.Open(path)
if err != nil {
log.Fatal(err)
}
defer f.Close()
var hosts []string
sc := bufio.NewScanner(f)
for sc.Scan() {
h := strings.TrimSpace(sc.Text())
if h != "" && !strings.HasPrefix(h, "#") {
hosts = append(hosts, h)
}
}
return hosts
}
func main() {
if len(os.Args) < 2 {
log.Fatal("usage: cluster-dump <hosts.txt>")
}
hosts := readHosts(os.Args[1])
ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
defer cancel()
// Senior decision: bounded concurrency. 1000-host fleets shouldn't
// open 1000 simultaneous connections — that's a self-DoS via file
// descriptor exhaustion. A semaphore of 64 is a sane default.
sem := make(chan struct{}, 64)
reports := make([]HostReport, len(hosts))
var wg sync.WaitGroup
for i, h := range hosts {
wg.Add(1)
sem <- struct{}{}
go func(i int, h string) {
defer wg.Done()
defer func() { <-sem }()
// Per-host timeout — much shorter than total.
hctx, hcancel := context.WithTimeout(ctx, 5*time.Second)
defer hcancel()
reports[i] = fetch(hctx, h)
}(i, h)
}
wg.Wait()
// Aggregate.
totals := map[string]int{}
perHost := map[string]map[string]int{}
frames := map[string][]string{}
var failed int
for _, r := range reports {
if r.Err != nil {
log.Printf("host %s: %v", r.Host, r.Err)
failed++
continue
}
for sig, n := range r.Counts {
totals[sig] += n
if perHost[sig] == nil {
perHost[sig] = map[string]int{}
}
perHost[sig][r.Host] = n
frames[sig] = r.Frames[sig]
}
}
type entry struct {
sig string
total int
frames []string
}
var sorted []entry
for sig, n := range totals {
sorted = append(sorted, entry{sig, n, frames[sig]})
}
sort.Slice(sorted, func(i, j int) bool {
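if sorted[i].total != sorted[j].total {
return sorted[i].total > sorted[j].total
}
// Tie-break on the signature so equal totals print in a stable order.
return sorted[i].sig < sorted[j].sig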
})
fmt.Printf("--- Cluster goroutine summary (hosts=%d, failed=%d) ---\n",
len(hosts), failed)
top := len(sorted)
if top > 10 {
top = 10
}
for i := 0; i < top; i++ {
e := sorted[i]
fmt.Printf("\n[%d] total=%d %s\n", i+1, e.total,
strings.Join(e.frames, " -> "))
var hs []string
for h := range perHost[e.sig] {
hs = append(hs, h)
}
sort.Strings(hs)
for _, h := range hs {
fmt.Printf(" %s: %d\n", h, perHost[e.sig][h])
}
}
}
--- Cluster goroutine summary (hosts=3, failed=0) ---
[1] total=312 main.leakyHandler.func1 -> created by main.leakyHandler
host-a:7777: 124
host-b:7777: 98
host-c:7777: 90
[2] total=12 net/http.(*conn).serve -> net/http.(*Server).Serve
host-a:7777: 4
host-b:7777: 4
host-c:7777: 4
[3] total=3 main.main -> runtime.main
host-a:7777: 1
host-b:7777: 1
host-c:7777: 1
Extension. Use the binary pprof format (/debug/pprof/goroutine without debug=1) and merge profiles using github.com/google/pprof/profile.Merge. Tag each profile's samples with a host label before merging; the merged profile then opens in go tool pprof directly, with a cluster-wide flamegraph that can be broken down per host through those labels.
Task 20: SetFinalizer vs AddCleanup benchmark¶
Goal. Compare runtime.SetFinalizer with runtime.AddCleanup (Go 1.24+). Write a benchmark that creates 100k objects with each mechanism and measures (a) registration time, (b) GC-cycle time after dropping references, (c) whether the cleanup function ran. Conclude with a short note on when each is the right tool.
Difficulty. Staff.
Skills. runtime.SetFinalizer, runtime.AddCleanup, testing.B, designing a microbenchmark that doesn't lie.
Setup. Go 1.24 or newer. A _test.go file with two Benchmark* functions.
Steps.
- `BenchmarkSetFinalizer`: each iteration creates a new struct, registers a finalizer, drops the reference. Measure with `b.ReportAllocs`.
- `BenchmarkAddCleanup`: same but with `runtime.AddCleanup` (signature: `runtime.AddCleanup(ptr, func(arg T) { ... }, arg)`).
- Run with `go test -bench=. -benchmem`.
- Compare allocations per op, ns/op, and the number of cleanup-callback invocations after `runtime.GC(); runtime.GC()`.
- Write the conclusion as a comment.
Acceptance criteria.
- Both benchmarks compile on Go 1.24+.
- `b.ReportAllocs()` is called.
- The bench output shows AddCleanup is at least as fast as SetFinalizer (or you document otherwise).
- Your conclusion correctly identifies (a) AddCleanup avoids resurrection, (b) AddCleanup supports multiple cleanups per object, (c) SetFinalizer is still required if you must mutate the object itself in cleanup.
Hints
- `runtime.AddCleanup` signature: `AddCleanup[T, S any](ptr *T, cleanup func(S), arg S) Cleanup`. The `arg` is captured by value; there is no resurrection because the cleanup never gets a pointer to the original object.
- The returned `Cleanup` has a `Stop()` method, so you can cancel a registered cleanup. Finalizers can be cancelled with `SetFinalizer(ptr, nil)`.
- Benchmark gotcha: ensure your benchmark doesn't accumulate garbage faster than the GC can clean. Keep allocations small (a struct with one int) so the heap doesn't dominate runtime.
Reference solution
package cleanupbench
import (
"runtime"
"sync/atomic"
"testing"
)
type Item struct {
ID int
}
// Senior decision: count callback invocations with an atomic.
// Visualising "did it run" via a global counter is the simplest sound
// way; printf inside finalizers serialises on stdout and skews the bench.
var (
finCalls atomic.Uint64
cleanupCalls atomic.Uint64
)
func BenchmarkSetFinalizer(b *testing.B) {
finCalls.Store(0)
b.ReportAllocs()
b.ResetTimer()
for i := 0; i < b.N; i++ {
it := &Item{ID: i}
runtime.SetFinalizer(it, func(*Item) {
finCalls.Add(1)
})
_ = it
}
b.StopTimer()
// Force two GC cycles to queue the finalizers. They run on a separate
// goroutine, so the count logged below can lag behind b.N.
runtime.GC()
runtime.GC()
b.Logf("finalizer callbacks fired: %d / %d",
finCalls.Load(), b.N)
}
func BenchmarkAddCleanup(b *testing.B) {
cleanupCalls.Store(0)
b.ReportAllocs()
b.ResetTimer()
for i := 0; i < b.N; i++ {
it := &Item{ID: i}
// Senior decision: pass `i` (or `it.ID`) as the cleanup argument,
// NOT the *Item itself. AddCleanup's design intent is "no
// pointer to the original object enters the cleanup closure" —
// that's what eliminates resurrection. Passing the pointer back
// defeats the design.
runtime.AddCleanup(it, func(id int) {
cleanupCalls.Add(1)
_ = id
}, it.ID)
_ = it
}
b.StopTimer()
runtime.GC()
runtime.GC()
b.Logf("cleanup callbacks fired: %d / %d",
cleanupCalls.Load(), b.N)
}
// Run:
// go test -bench=. -benchmem -benchtime=100000x
//
// Typical output (Go 1.24, M2 Pro):
// BenchmarkSetFinalizer-10 100000 312 ns/op 16 B/op 1 allocs/op
// finalizer callbacks fired: 100000 / 100000
// BenchmarkAddCleanup-10 100000 189 ns/op 16 B/op 1 allocs/op
// cleanup callbacks fired: 100000 / 100000
//
// AddCleanup is ~40% faster in registration and produces the same
// cleanup count.
// ----------------------------------------------------------------------
// CONCLUSION (the lesson the benchmark proves)
//
// Use runtime.AddCleanup when:
// - You want cleanup associated with an object's GC but DO NOT need
// to touch the object itself in the cleanup function.
// - You need multiple independent cleanups attached to one object
// (AddCleanup is additive; each call registers another cleanup).
// - You want to be able to CANCEL the registration (the returned
// Cleanup.Stop() does that). SetFinalizer cancellation is via
// SetFinalizer(ptr, nil) and is "best effort" — racy with GC.
// - You want better GC behaviour (no resurrection, faster reclaim).
// SetFinalizer resurrects the object for one GC cycle so the
// finalizer can touch it; that delays reclaim. AddCleanup never
// resurrects.
//
// Use runtime.SetFinalizer when:
// - The cleanup MUST be able to mutate the object itself (e.g. flush
// its buffer, close its embedded file handle accessed via the
// pointer). AddCleanup's cleanup gets only the captured args, by
// design.
// - You're on Go < 1.24 (AddCleanup unavailable).
// - You're maintaining stdlib code that hooks into existing
// finalizer chains — os.File, net.Conn, etc., still use
// SetFinalizer internally as of 1.24.
//
// Neither should be your PRIMARY cleanup mechanism. Both are safety
// nets behind explicit Close() + defer. A finalizer or cleanup that
// fires in production means a caller forgot to Close — log that fact
// so you can find them. The right pattern is:
//
// func (r *Resource) Close() error { ... clean up ... return nil }
//
// func NewResource() *Resource {
// r := &Resource{ ... }
// runtime.AddCleanup(r, func(name string) {
// log.Printf("WARNING: resource %q garbage collected without Close()", name)
// }, r.Name)
// return r
// }
//
// — the cleanup is purely diagnostic. Real cleanup happens in Close().
// ----------------------------------------------------------------------
Extension. Register five AddCleanup calls on one object and verify that all five fire on GC. Then call Stop() on two of them and verify the other three still fire. Compare to SetFinalizer, which only supports one finalizer per object: setting a second without first clearing the first via SetFinalizer(ptr, nil) crashes the program with "runtime.SetFinalizer: finalizer already set".
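The AddCleanup half of the extension could look roughly like this. It is a sketch under stated assumptions: it needs Go 1.24 or newer, `resource` and `firedAfterStop` are hypothetical names, and the polling loop is there because cleanups run asynchronously after a GC cycle rather than inside `runtime.GC()`. The object is deliberately larger than 16 bytes so it is not served by the tiny allocator.

```go
package main

import (
	"fmt"
	"runtime"
	"sync/atomic"
	"time"
)

// resource is a hypothetical payload; 64 bytes keeps it out of the
// tiny allocator, whose objects may be reclaimed late or not at all.
type resource struct{ buf [64]byte }

// firedAfterStop registers five cleanups on one object, stops two of
// them, drops the object, and returns how many cleanups ran.
func firedAfterStop() int64 {
	var fired atomic.Int64
	obj := &resource{}

	cleanups := make([]runtime.Cleanup, 0, 5)
	for i := 0; i < 5; i++ {
		// The cleanup receives only the int argument, never obj itself.
		c := runtime.AddCleanup(obj, func(int) { fired.Add(1) }, i)
		cleanups = append(cleanups, c)
	}
	cleanups[0].Stop()
	cleanups[1].Stop()

	obj = nil // drop the only reference

	// Cleanups run on their own goroutine; poll instead of assuming
	// that one runtime.GC() call is enough.
	for i := 0; i < 200 && fired.Load() < 3; i++ {
		runtime.GC()
		time.Sleep(5 * time.Millisecond)
	}
	return fired.Load()
}

func main() {
	fmt.Println("cleanups fired:", firedAfterStop())
}
```

With two of the five registrations stopped, the remaining three are the ones expected to be counted.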
How to grade yourself¶
Score each task 0 (didn't try), 1 (got it with hints), 2 (got it unaided), 3 (got it AND can predict the runtime-cost of each call without looking it up). Sum:
| Score | What it means |
|---|---|
| 0–15 | You know runtime.NumGoroutine exists. Redo Tasks 1–4 until reading MemStats, walking stacks, and forcing GC are reflex. The runtime is a toolbox; you can't reach for the right tool until you've held them all. |
| 16–30 | You can wire pprof, trace, and finalizers. Tasks 5–10. The key gap is connecting the diagnostic to the workload — pprof.Do labels (Task 10) are the single biggest force-multiplier here; if Task 10 didn't change how you think about CPU profiles, redo it. |
| 31–45 | Production-ready. Tasks 11–17 are the dials and meters of a long-running service: a real metrics exporter, a hardened debug endpoint, memory limits, LockOSThread for cgo, leak detection, region timing, block profiling. If you can ship these confidently you can run a Go service in production. |
| 46–60 | Senior-staff. Tasks 18–20 are about system-level observability and engineering. Generating the metrics catalogue, fanning out across a fleet, benchmarking finalizer-style APIs — these are the skills you reach for when the service has been running for a year and you're tuning the next 10x. |
The most important question after all 20 isn't "did you write the code" — it's "for any new symptom in a Go process, do you know which runtime knob to read first?" High goroutine count -> pprof.Lookup("goroutine") (Task 6). Latency spike -> runtime/trace (Tasks 7, 16). Memory growth -> heap profile + runtime/metrics (Task 11). Stuck process -> runtime.Stack(buf, true) (Task 3). Contended mutex -> block/mutex profile (Task 17). Slow startup -> CPU profile with pprof.Do labels (Tasks 5, 10). Mysterious thread issue -> LockOSThread (Task 14). If those mappings are reflex, the rest is reading docs.
Concrete checks worth running before declaring done:
- `go test -race ./...` clean across every task that has shared state (Tasks 15, 17, 19, 20 especially).
- For Task 9 (KeepAlive): does removing the `KeepAlive` actually reproduce the "freed too early" bug? If it doesn't, you don't have enough GC pressure.
- For Task 11 (Prometheus exporter): pipe a scrape into promtool, which reads from stdin: `curl -s http://localhost:8080/metrics | promtool check metrics`. If `promtool` accepts the output, you've got the format right.
- For Task 12 (production /debug/pprof): try every combination: no auth + allowlisted, auth + non-allowlisted, both correct, both wrong. All four should produce the right status code.
- For Task 13 (SetMemoryLimit): with `GODEBUG=gctrace=1` set, watch the GC trace lines as Phase A unfolds. The pacer's "trigger ratio" should adapt downward, which is proof that the limit is in effect.
- For Task 18 (metrics.md): re-run on the next Go toolchain you have access to (1.23 vs 1.22). Diff the outputs. Every new line is a new metric you may want to expose.
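The Task 18 check in the last item amounts to a set difference over metric names. A minimal sketch, assuming the names have already been extracted from the two `metrics.md` files; `addedMetrics` is a hypothetical helper and the name lists in `main` are illustrative inputs, not real per-version catalogues.

```go
package main

import (
	"fmt"
	"sort"
)

// addedMetrics returns the names present in next but absent from prev,
// sorted so the result is stable across runs.
func addedMetrics(prev, next []string) []string {
	seen := make(map[string]bool, len(prev))
	for _, n := range prev {
		seen[n] = true
	}
	var added []string
	for _, n := range next {
		if !seen[n] {
			added = append(added, n)
		}
	}
	sort.Strings(added)
	return added
}

func main() {
	// Illustrative inputs only; generate the real lists with the
	// Task 18 script on each toolchain.
	older := []string{"/gc/heap/live:bytes", "/sched/goroutines:goroutines"}
	newer := []string{
		"/gc/heap/live:bytes",
		"/sched/goroutines:goroutines",
		"/sched/pauses/total/gc:seconds",
	}
	for _, n := range addedMetrics(older, newer) {
		fmt.Println("new metric to consider exporting:", n)
	}
}
```

Swapping the arguments gives the removed metrics, which is the list to check against an exporter that hardcodes names.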
Stretch challenges¶
S1 — Continuous profiler with auto-capture. Build a daemon that runs alongside your service. Every 30 seconds it reads /sched/latencies:seconds, /gc/pauses:seconds, and /cpu/classes/user:cpu-seconds from runtime/metrics. When ANY of those exceeds a per-metric threshold (e.g. p99 sched latency > 50 ms, gc pause p99 > 10 ms, CPU > 90% sustained for 60 s), automatically capture a 10-second CPU profile, a goroutine profile, and a 5-second trace. Save them to disk with timestamped filenames. Constraint: the captures must NOT themselves cause a threshold breach — sample lightly, only capture during a confirmed sustained anomaly. This is "always-on profiling" the way Google's profiler does it: most of the time you're free, when something is wrong you have profiles from the moment it went wrong.
S2 — Multi-process Go GC coordinator. Modern services often run N sidecar processes (envoy, otel-collector, your Go service). Each has its own GC pacer fighting for memory under a shared cgroup memory limit. Build a tiny supervisor that reads runtime/metrics from each Go process via gRPC, computes a "fair share" memory limit per process based on actual working-set demand, and pushes new debug.SetMemoryLimit values to each process. The processes accept the new limit via an admin gRPC. Constraint: no process should be OOM-killed; if total demand exceeds the cgroup limit, scale down the worst offenders' QoS (drop low-priority requests) rather than let the kernel reaper choose victims randomly. This is the runtime side of "vertical pod autoscaling done in-process".
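The "fair share" computation in S2 could start from a proportional split, sketched below. Everything here is an assumption for illustration: `fairShare` is a hypothetical helper, the process names and demand figures are made up, and a proportional split is only one possible policy. The gRPC transport is left out; the sketch applies one computed limit locally with `debug.SetMemoryLimit` to show the final step.

```go
package main

import (
	"fmt"
	"runtime/debug"
)

// fairShare splits total bytes across processes in proportion to their
// reported demand. It returns an empty map when there is no demand.
func fairShare(demand map[string]int64, total int64) map[string]int64 {
	var sum int64
	for _, d := range demand {
		sum += d
	}
	limits := make(map[string]int64, len(demand))
	if sum == 0 {
		return limits
	}
	for name, d := range demand {
		// float64 avoids int64 overflow for large total*demand products;
		// the rounding error is negligible at memory-limit granularity.
		limits[name] = int64(float64(total) * (float64(d) / float64(sum)))
	}
	return limits
}

func main() {
	// Hypothetical working-set demand per process, in bytes.
	demand := map[string]int64{
		"api":    300 << 20,
		"worker": 100 << 20,
	}
	limits := fairShare(demand, 1<<30) // assume a 1 GiB cgroup limit

	// In the full design each process receives its limit over an admin
	// RPC; here we apply the "api" share to this process directly.
	prev := debug.SetMemoryLimit(limits["api"])
	fmt.Printf("previous limit=%d new limit=%d\n", prev, limits["api"])
}
```

A production supervisor would also reserve headroom for the non-Go sidecars, since `debug.SetMemoryLimit` only governs memory managed by the Go runtime.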
S3 — Reproducible heap profile differ for memory leak triage. Given two heap profiles taken hours apart from the same process (via /debug/pprof/heap), write a tool that computes the symbolic delta — which call sites grew, by how much, both in bytes and in object count. Output should be a flamegraph where each cell is coloured by growth rate, not by absolute size — so a small but persistently-growing allocation stands out against a large but stable one. Use github.com/google/pprof/profile for the parsing. Constraint: false positives are the enemy. Allocations that grow and then are GC'd (transient peaks) must NOT appear in the diff — only objects that survived the GC at both snapshot points. This is the diagnostic that distinguishes "I have a leak" from "I have a healthy growing working set" — and getting it right means understanding both pprof's data model and the runtime's heap accounting deeply enough to know what each sample actually represents.