Profile-Guided Optimization (PGO): Hands-on Tasks
Work through these in order. Each has explicit acceptance criteria. Use Go 1.21+ (1.22+ recommended for the best PGO behavior).
Task 1: Capture a CPU profile from a Go test
Write a small package with at least one CPU-bound benchmark. Capture a profile of the benchmark run.
Acceptance criteria
- [ ] You run go test -run=^$ -bench=. -cpuprofile=cpu.pgo -benchtime=10s ./pkg/yours successfully.
- [ ] You inspect the profile with go tool pprof -top cpu.pgo and identify at least three functions at the top of the profile.
- [ ] You describe in one sentence why a -benchtime=10s capture gives more useful data than -benchtime=1s.
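A minimal sketch of what such a package could contain is shown here, assuming a trial-division prime counter as the CPU-bound work. The names isPrime, countPrimes, sink, and BenchmarkCountPrimes are hypothetical. For go test to find it, the benchmark function would live in a _test.go file of your package; the main function here only lets the sketch run standalone through testing.Benchmark.

```go
package main

import (
	"fmt"
	"testing"
)

// isPrime reports whether n is prime, using plain trial division.
func isPrime(n int) bool {
	if n < 2 {
		return false
	}
	for d := 2; d*d <= n; d++ {
		if n%d == 0 {
			return false
		}
	}
	return true
}

// countPrimes returns how many primes are strictly below limit.
// It is the CPU-bound hot function (hypothetical name).
func countPrimes(limit int) int {
	count := 0
	for n := 2; n < limit; n++ {
		if isPrime(n) {
			count++
		}
	}
	return count
}

// sink stores the result so the compiler cannot discard the call.
var sink int

// BenchmarkCountPrimes belongs in a _test.go file when run via go test.
func BenchmarkCountPrimes(b *testing.B) {
	for i := 0; i < b.N; i++ {
		sink = countPrimes(20000)
	}
}

func main() {
	fmt.Println(countPrimes(100)) // prints 25
	// Run the benchmark without go test, as a quick standalone check.
	fmt.Println(testing.Benchmark(BenchmarkCountPrimes))
}
```

Once the benchmark sits in ./pkg/yours, the capture command from the first criterion applies unchanged.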
Task 2: Build with PGO using -pgo=auto
Set up a main package and place the captured profile alongside it.
Acceptance criteria
- [ ] You create cmd/myapp/main.go containing the hot function from Task 1.
- [ ] You move the profile to cmd/myapp/default.pgo.
- [ ] go build -pgo=auto ./cmd/myapp succeeds.
- [ ] go version -m ./myapp | grep pgo shows a -pgo= build setting that holds the absolute path of default.pgo (not off).
Task 3: Bench before/after PGO
Use the same benchmark from Task 1 and run it with and without PGO.
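Acceptance criteria
- [ ] You run the bench with -pgo=off and with PGO enabled, each with -count=10 -benchtime=2s. Note that -pgo=auto only looks for default.pgo in a main package's directory, so pass the profile path explicitly (-pgo=path/to/default.pgo) when the benchmark lives in a library package.
- [ ] You use benchstat baseline.txt pgo.txt to compute the delta.
- [ ] The delta is statistically significant (p < 0.05), or you explain why your particular workload doesn't benefit (cgo, reflection, GC-bound, microbenchmark).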
Task 4: Inspect inline decisions
Use -gcflags='-m=2' to compare inlining decisions between PGO and non-PGO builds.
Acceptance criteria
- [ ] You produce off.txt and on.txt from go build -gcflags='-m=2' with -pgo=off and -pgo=auto respectively.
- [ ] You diff them and identify at least one function that the PGO build inlines but the non-PGO build does not.
- [ ] You explain why the compiler made that choice (which call site's share of the profile samples made it hot enough to receive the larger inlining budget).
Task 5: Observe devirtualization in objdump
Write code with an interface call where one concrete type clearly dominates (e.g., 95%+ of the calls in the benchmark go to a mock *RedisCache).
Acceptance criteria
- [ ] The interface call site exists in your hot path; the bench drives it heavily toward one concrete type.
- [ ] After a PGO build, go tool objdump -s 'YourHandler' shows a type-tag check followed by a direct call (rather than a single indirect call).
- [ ] You document the assembly snippet and label the type-check branch.
- [ ] The non-PGO build of the same code shows only the indirect call.
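One possible shape for the code under test is sketched here. Cache, RedisCache, NullCache, Handle, and pick are hypothetical names, and the 19-in-20 split is just one way to make a single concrete type dominate; whether the compiler actually devirtualizes the call depends on your profile and Go version.

```go
package main

import "fmt"

// Cache is the interface whose call site PGO may devirtualize.
type Cache interface {
	Get(key string) (string, bool)
}

// RedisCache is a mock standing in for the dominant implementation.
type RedisCache struct{ data map[string]string }

// Get looks the key up in the in-memory map backing the mock.
func (c *RedisCache) Get(key string) (string, bool) {
	v, ok := c.data[key]
	return v, ok
}

// NullCache is the rare implementation; it always misses.
type NullCache struct{}

// Get reports a miss for every key.
func (NullCache) Get(string) (string, bool) { return "", false }

// Handle contains the interface call site to look for in objdump.
// It is kept out of line so it remains a separate symbol.
//
//go:noinline
func Handle(c Cache, key string) int {
	if v, ok := c.Get(key); ok {
		return len(v)
	}
	return -1
}

// pick returns the hot cache for 19 of every 20 calls (95%).
func pick(i int, hot, cold Cache) Cache {
	if i%20 == 0 {
		return cold
	}
	return hot
}

func main() {
	hot := &RedisCache{data: map[string]string{"user:1": "alice"}}
	cold := NullCache{}
	hits := 0
	for i := 0; i < 1000; i++ {
		if Handle(pick(i, hot, cold), "user:1") >= 0 {
			hits++
		}
	}
	fmt.Println(hits) // prints 950
}
```

In a benchmark, the same pick loop would drive Handle so that the captured profile attributes most of the call site's samples to (*RedisCache).Get.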
Task 6: Build a profile-refresh script
Write a shell script that captures a profile from a running local server (using net/http/pprof) and validates it before overwriting default.pgo.
Acceptance criteria
- [ ] The script captures via curl from localhost:6060/debug/pprof/profile?seconds=60.
- [ ] It validates the capture: the file is larger than 10 KiB and go tool pprof -top -nodecount=1 parses it.
- [ ] On failure, it leaves the existing default.pgo untouched.
- [ ] On success, it moves the temp file into place with mv.
Task 7: A/B test PGO impact
Run two versions of the same binary side-by-side (e.g., on two ports) and drive load against both.
Acceptance criteria
- [ ] You build myapp.no-pgo and myapp.pgo.
- [ ] You drive identical synthetic load against each (e.g., hey -n 100000 -c 50).
- [ ] You measure throughput (req/s) and median/P99 latency for each.
- [ ] You write up the comparison in 5–10 lines: gain, statistical confidence, any surprises.
Task 8: Merge multiple profiles
Capture three profiles (different times, different load shapes) and merge them.
Acceptance criteria
- [ ] You capture p1.pgo, p2.pgo, p3.pgo with at least 30-second windows each.
- [ ] You run go tool pprof -proto p1.pgo p2.pgo p3.pgo > merged.pgo.
- [ ] You verify that the merged profile's total sample count is approximately the sum of the three inputs' sample counts.
- [ ] You use the merged profile for your next PGO build.
Task 9: Identify a stale profile
Capture a profile, then refactor the code (rename two hot functions). Rebuild with the old profile.
Acceptance criteria
- [ ] You rename the functions, so the profile is now stale.
- [ ] You check whether the PGO build emits a warning about stale samples (the toolchain may stay silent by default; if it does, record that).
- [ ] You quantify the fraction of stale samples (read the warning if there is one, or use go tool pprof -top to inspect names and cross-reference with the source).
- [ ] You refresh the profile and confirm that the renamed functions now appear in it and that any warning disappears.
Task 10: Set up CI with PGO
Add a CI job to a small Go project that builds with PGO and verifies via go version -m.
Acceptance criteria
- [ ] The CI YAML includes a step that runs go build -pgo=auto.
- [ ] A subsequent step asserts go version -m ./bin/app | grep -E 'build\s+-pgo=' | grep -v '=off'.
- [ ] A failure case (delete default.pgo) causes the assertion to fail.
Task 11: Continuous A/B benchmark in CI
Add a CI step that runs your benchmarks in both -pgo=off and -pgo=auto modes and outputs a benchstat diff.
Acceptance criteria
- [ ] Two go test -bench runs produce bench-off.txt and bench-on.txt.
- [ ] A benchstat diff is generated and uploaded as a CI artifact.
- [ ] You include an example artifact (text file) showing a measurable delta.
Task 12: PGO with cgo (observe the floor)
Write a Go function that delegates 100 % of its work to a C function via cgo, and verify PGO does nothing useful.
Acceptance criteria
- [ ] You write a small cgo-using benchmark whose hot path is a cgo call.
- [ ] PGO build vs no-PGO shows a delta of < 1 % (within noise).
- [ ] You write a one-paragraph explanation referencing the fact that cgo time is C code, not Go code, and PGO only touches the Go side.
Stretch Task 13: Build a full continuous-profiling-to-PGO pipeline
Pick a continuous profiling tool (Pyroscope, Parca, or similar). Set it up against a local Go service. Build a tool that pulls the merged profile for the last hour and commits it as default.pgo.
Acceptance criteria
- [ ] You can run make refresh-pgo and it pulls a fresh profile.
- [ ] The pulled profile passes validation (go tool pprof -top works).
- [ ] You commit it to a sandbox repo and confirm subsequent go build uses it.
- [ ] You document the end-to-end flow in 10–20 lines for a teammate to reproduce.
Submission
Each task should produce:
- A short writeup (5–15 lines) of what you observed.
- The code, configuration, or script you wrote.
- The benchmark, profile, or build output that backs your conclusions.
These artifacts are what turn "I read about PGO" into "I can operate it in production."