Profile-Guided Optimization (PGO): Professional
1. The production framing
In a real organization, PGO is not "I ran curl > default.pgo once and forgot it." It is a small, dedicated piece of release engineering with these responsibilities:
- Capture a representative profile from production on a schedule.
- Validate the profile (sample count, stale ratio, hot-path coverage) before committing.
- Commit the profile through the same review process as code.
- Build with -pgo=auto in CI by default; have -pgo=off as a baseline lane for A/B.
- Measure the impact via canary + benchmark suite; gate releases on regression.
- Roll back the profile change if a release goes wrong (it is one file).
What follows is what that looks like in practice.
2. Where default.pgo lives in the repo
| Repo shape | Profile location |
|---|---|
| Single-binary repo | ./default.pgo (next to main.go) |
| Monorepo with cmd/<name> | ./cmd/<name>/default.pgo, one per binary |
| Library only | No default.pgo — PGO applies to main packages |
| Multiple build tags | One profile per dominant tag combination if profiles diverge |
Treat default.pgo as source code: it is reviewed, committed, blamed, reverted. Git LFS is unnecessary at typical profile sizes (< 5 MiB).
3. Refresh cadence policy
Concrete options used in industry:
| Cadence | When it fits |
|---|---|
| Every release | Weekly or bi-weekly release train; profile captured in a release-prep step |
| Weekly via CI cron | Long-lived release branches |
| Monthly | Mature, slow-changing service |
| On-demand only | Small/internal services where PGO is "nice to have" |
A reasonable default: refresh weekly, automated via a scheduled CI job that captures, validates, and opens a PR with the updated profile.
4. The refresh script
#!/usr/bin/env bash
# scripts/refresh-pgo.sh: capture a CPU profile from one production pod.
set -euo pipefail

POD=$(kubectl -n prod get pod -l app=myservice -o name | head -n 1)
DURATION=${DURATION:-90}
OUTPUT=./cmd/myservice/default.pgo
TMP=$(mktemp)

# Forward the pod's pprof port; stop the forward and drop the temp file on exit.
kubectl -n prod port-forward "${POD}" 6060:6060 &
PF_PID=$!
trap 'kill "${PF_PID}" 2>/dev/null || true; rm -f "${TMP}"' EXIT
sleep 2  # give the port-forward time to start listening

curl --fail --silent --max-time $((DURATION + 30)) \
  -o "${TMP}" \
  "http://127.0.0.1:6060/debug/pprof/profile?seconds=${DURATION}"

# Sanity: larger than 10 KiB and parseable by pprof.
# wc -c is used because `stat -f %z` (BSD) prints filesystem info on GNU/Linux.
[ "$(wc -c < "${TMP}")" -gt 10240 ]
go tool pprof -top -nodecount=1 "${TMP}" > /dev/null

mv "${TMP}" "${OUTPUT}"
echo "Updated ${OUTPUT}"
CI runs this, then opens a PR titled pgo: refresh production profile (YYYY-MM-DD). Two-line diff, one binary file changed, easy to review.
5. Aggregating profiles across instances
A single pod's profile is noisier than the fleet aggregate. Merging:
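mkdir -p samples
for pod in $(kubectl -n prod get pod -l app=myservice -o name); do
  kubectl -n prod port-forward "${pod}" 6060:6060 &
  pf_pid=$!
  sleep 1  # give the port-forward time to start listening
  # Pod names look like "pod/<name>"; replace "/" so they are valid file names.
  curl --fail --silent -o "samples/${pod//\//_}.pgo" \
    "http://127.0.0.1:6060/debug/pprof/profile?seconds=60" || true
  kill "${pf_pid}" 2>/dev/null || true
  wait "${pf_pid}" 2>/dev/null || true  # free port 6060 before the next pod
done
# Given several input files, pprof merges them into a single profile.
go tool pprof -proto samples/*.pgo > cmd/myservice/default.pgo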
Merging across 5–10 instances gives a stable profile and dampens any per-pod weirdness (one pod stuck on a slow client, etc.).
6. Continuous profiling integration
Continuous profiling platforms (Pyroscope, Parca, Grafana Cloud Profiles, Polar Signals, Google Cloud Profiler) keep a rolling history of CPU profiles. They expose APIs to download the merged profile for a time window — exactly what PGO wants.
Pyroscope
curl -G "https://pyroscope.example.com/render" \
--data-urlencode "query=process_cpu:cpu:nanoseconds:cpu:nanoseconds{service_name=\"myapp\"}" \
--data-urlencode "from=now-1h" \
--data-urlencode "until=now" \
--data-urlencode "format=pprof" \
-o cmd/myapp/default.pgo
Parca
parca-cli profile download \
--query 'process_cpu:cpu:nanoseconds:cpu:nanoseconds{job="myapp"}' \
--from "$(date -u -d '1 hour ago' +%FT%TZ)" \
--to "$(date -u +%FT%TZ)" \
--output cmd/myapp/default.pgo
Google Cloud Profiler
gcloud profiler profiles download \
--service=myapp --profile-type=CPU \
--duration=1h --output-file=cmd/myapp/default.pgo
The advantage: no per-pod port-forwarding, no manual capture, and you can window the data to peak hours.
7. CI / CD pipeline
# .github/workflows/build.yml
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-go@v5
        with: { go-version: '1.24' }
      - name: Validate profile (if present)
        run: |
          if [ -f ./cmd/myapp/default.pgo ]; then
            go tool pprof -top -nodecount=1 ./cmd/myapp/default.pgo > /dev/null
          fi
      - name: Build with PGO
        run: |
          go build -trimpath \
            -ldflags="-s -w -X main.version=${{ github.sha }}" \
            -o ./bin/myapp ./cmd/myapp
      - name: Verify PGO embedded
        run: |
          go version -m ./bin/myapp | grep -E 'build\s+-pgo=' || \
            (echo "PGO not applied"; exit 1)
  refresh-pgo:
    runs-on: ubuntu-latest
    if: github.event_name == 'schedule'
    steps:
      - uses: actions/checkout@v4
      - run: bash scripts/refresh-pgo.sh
      - uses: peter-evans/create-pull-request@v5
        with:
          branch: pgo/auto-refresh
          title: "pgo: refresh production profile"
          commit-message: "pgo: refresh production profile"
The build job is normal; the scheduled job opens a PR with the updated profile. A human reviews and merges.
8. A/B comparison in CI
To prove PGO is still paying for itself, run a benchmark suite in both modes on every release.
- name: Bench (baseline, no PGO)
  run: go test -run=^$ -bench=. -count=8 -pgo=off ./bench/... | tee bench-baseline.txt
- name: Bench (with PGO)
  # -pgo=auto only picks up default.pgo next to a main package, so pass the path explicitly.
  run: go test -run=^$ -bench=. -count=8 -pgo=./cmd/myapp/default.pgo ./bench/... | tee bench-pgo.txt
- name: Compare
  run: |
    go install golang.org/x/perf/cmd/benchstat@latest
    benchstat bench-baseline.txt bench-pgo.txt | tee bench-diff.txt
Archive bench-diff.txt as a CI artifact. Plot the delta over time. If PGO's measured gain drops toward zero, the profile is stale or the code has shifted away from interface/inline-friendly patterns.
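To plot the delta over time, a single number per benchmark is easier to chart than a benchstat table. The sketch below is not a replacement for benchstat (it does no significance testing). It only averages the ns/op column of raw go test -bench output and reports the relative change, assuming the standard benchmark line layout. The function names are hypothetical.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// meanNsPerOp parses `go test -bench` output and returns the mean ns/op per
// benchmark name, averaged over repeated runs (-count=N).
func meanNsPerOp(output string) map[string]float64 {
	sum := map[string]float64{}
	runs := map[string]int{}
	sc := bufio.NewScanner(strings.NewReader(output))
	for sc.Scan() {
		f := strings.Fields(sc.Text())
		// Expected shape: BenchmarkName-8 <iterations> <value> ns/op ...
		if len(f) < 4 || !strings.HasPrefix(f[0], "Benchmark") || f[3] != "ns/op" {
			continue
		}
		v, err := strconv.ParseFloat(f[2], 64)
		if err != nil {
			continue
		}
		sum[f[0]] += v
		runs[f[0]]++
	}
	means := make(map[string]float64, len(sum))
	for name, s := range sum {
		means[name] = s / float64(runs[name])
	}
	return means
}

// deltaPercent returns (pgo-baseline)/baseline*100 for every benchmark that
// appears in both inputs. Negative values mean the PGO build is faster.
func deltaPercent(baseline, pgo map[string]float64) map[string]float64 {
	out := map[string]float64{}
	for name, b := range baseline {
		p, ok := pgo[name]
		if !ok || b == 0 {
			continue
		}
		out[name] = (p - b) / b * 100
	}
	return out
}

func main() {
	// Made-up sample lines, standing in for bench-baseline.txt and bench-pgo.txt.
	baseline := "BenchmarkServe-8 1000 2000 ns/op\nBenchmarkServe-8 1000 2100 ns/op\n"
	pgo := "BenchmarkServe-8 1000 1900 ns/op\nBenchmarkServe-8 1000 1950 ns/op\n"
	for name, d := range deltaPercent(meanNsPerOp(baseline), meanNsPerOp(pgo)) {
		fmt.Printf("%s: %+.1f%%\n", name, d)
	}
}
```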
9. Regression detection
PGO can regress: a new release's profile may push the inliner into a pessimal choice.
Three signals to watch in the canary:
| Signal | Source | Action threshold |
|---|---|---|
| Request CPU (median) | Prometheus | > 5 % worse than non-PGO |
| Binary size | ls -l | > 5 % bigger than expected |
| Compile time (CI) | CI job duration | > 25 % slower |
When a canary regresses, rollback is one of two PRs:
# rollback the profile only
git revert <profile-commit-sha>
# build without PGO (escape hatch)
go build -pgo=off ./cmd/myapp
The second is the emergency: "ship now, investigate later."
10. Multi-environment profiles
A common question: should staging and production share a profile?
| Setup | Recommendation |
|---|---|
| Staging mirrors prod traffic | Yes, share default.pgo |
| Staging is synthetic | No — staging profile would mislead prod |
| Multiple prod regions, similar workload | Merge into one default.pgo |
| Multiple prod tiers (free / paid) | Either merge or maintain two binaries; usually merge |
The general rule: capture from the workload you want to optimize. If staging doesn't look like prod, staging's profile doesn't help prod's binary.
11. Security and review concerns
Two things to know.
- The profile contains function names. It is essentially a list of function names and file paths from your repo. If your repo is public, that's already public. If it's private, treat the profile as sensitive (commit to the private repo only).
- PRs that change default.pgo are binary diffs and look unreviewable. Add a comment to the PR: where the profile was captured, what time window, what fleet size. The reviewer reads the metadata and accepts on trust.
Title: pgo: refresh production profile
Body:
- Source: prod-us-east, 12 pods
- Window: 2026-05-20 14:00–15:00 UTC (peak hour)
- Duration per pod: 60 s
- Aggregated via: go tool pprof -proto
- Top function (pre): handlers.serveRequest 8.4 %
- Top function (post): handlers.serveRequest 9.1 %
Six lines of context turn a blob commit into something a human can evaluate.
12. Bootstrap problem
The first time you enable PGO, you have no profile yet. The bootstrap:
- Build and deploy a non-PGO release. (Same code, -pgo=off.)
- Wait for it to take production traffic at peak.
- Capture a profile.
- Commit as default.pgo.
- Future builds pick it up automatically via -pgo=auto.
There is no chicken-and-egg problem: PGO is strictly additive. You ship without it, then turn it on.
13. Documenting PGO in your runbook
A one-page section in your service runbook:
PGO
---
- Profile location: cmd/myapp/default.pgo
- Refresh cadence: weekly (via scheduled CI job)
- Capture source: production peak hour, aggregated across all pods
- Expected gain: 4–6 % CPU vs -pgo=off
- Rollback: git revert the profile commit
- Emergency disable: build with -pgo=off
- Where to look if something breaks: bench-diff.txt in CI artifacts
A new on-call engineer reads this, understands the contract in 60 seconds.
14. The "do we even need it?" calculation
PGO has a cost: extra CI minutes, extra review surface, one extra committed file. Run the math:
- Service runs on 200 pods, each costing $50/month → $10 000/month compute.
- PGO saves 5 % CPU → 5 % fewer pods → $500/month saved.
- Engineer time to set up and maintain: ~4 hours/year, roughly $400/year (assuming about $100/hour).
- ROI: positive after the first month.
For small services (<10 pods), the math may go the other way. PGO is most worth it for services with significant CPU spend and interface- or call-graph-heavy hot paths.
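The arithmetic above is simple enough to parameterize for your own fleet. The sketch below reproduces it in Go, assuming (as the example does) that CPU savings translate linearly into fewer pods. The type pgoCase and its fields are hypothetical.

```go
package main

import "fmt"

// pgoCase captures the inputs of the back-of-the-envelope calculation.
type pgoCase struct {
	Pods             int
	CostPerPodMonth  float64 // dollars
	CPUSavings       float64 // fraction, e.g. 0.05 for 5 %
	YearlyEffortCost float64 // dollars of engineer time per year
}

// monthlySavings assumes CPU savings map linearly onto pod count.
func (c pgoCase) monthlySavings() float64 {
	return float64(c.Pods) * c.CostPerPodMonth * c.CPUSavings
}

// paybackMonths is how long the savings take to cover one year of effort.
// It is +Inf when there are no savings.
func (c pgoCase) paybackMonths() float64 {
	return c.YearlyEffortCost / c.monthlySavings()
}

func main() {
	large := pgoCase{Pods: 200, CostPerPodMonth: 50, CPUSavings: 0.05, YearlyEffortCost: 400}
	small := pgoCase{Pods: 8, CostPerPodMonth: 50, CPUSavings: 0.05, YearlyEffortCost: 400}
	for _, c := range []pgoCase{large, small} {
		fmt.Printf("%d pods: saves $%.0f/month, payback in %.1f months\n",
			c.Pods, c.monthlySavings(), c.paybackMonths())
	}
}
```

With the 8-pod case the payback period stretches past a year, which is the "math may go the other way" situation.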
15. Summary
Production PGO is release engineering, not a magic incantation. The profile is committed source, refreshed on a cadence, captured from real traffic (ideally via a continuous profiler), validated in CI, A/B-benchmarked on every release, and rolled back with a single revert if it misbehaves. Document the contract in the runbook. The whole apparatus is small but disciplined — and rewards services with substantial CPU footprints.
Further reading
- Official PGO guide: https://go.dev/doc/pgo
- Pyroscope: https://pyroscope.io
- Parca: https://www.parca.dev
- Grafana Cloud Profiles: https://grafana.com/products/cloud/profiles-for-continuous-profiling/
- Google Cloud Profiler: https://cloud.google.com/profiler
- benchstat: https://pkg.go.dev/golang.org/x/perf/cmd/benchstat