Profile-Guided Optimization (PGO) — Professional¶

1. The production framing¶

In a real organization, PGO is not "I ran curl > default.pgo once and forgot it." It is a small, dedicated piece of release engineering with these responsibilities:

Capture a representative profile from production on a schedule.
Validate the profile (sample count, stale ratio, hot-path coverage) before committing.
Commit the profile through the same review process as code.
Build with -pgo=auto in CI by default; have -pgo=off as a baseline lane for A/B.
Measure the impact via canary + benchmark suite; gate releases on regression.
Roll back the profile change if a release goes wrong (it is one file).

What follows is what that looks like in practice.

2. Where `default.pgo` lives in the repo¶

Repo shape	Profile location
Single-binary repo	`./default.pgo` (next to `main.go`)
Monorepo with `cmd/<name>`	`./cmd/<name>/default.pgo`, one per binary
Library only	No `default.pgo` — PGO applies to `main` packages
Multiple build tags	One profile per dominant tag combination if profiles diverge

Treat default.pgo as source code: it is reviewed, committed, blamed, reverted. Git LFS is unnecessary at typical profile sizes (< 5 MiB).

3. Refresh cadence policy¶

Concrete options used in industry:

Cadence	When it fits
Every release	Weekly or bi-weekly release train; profile captured in a release-prep step
Weekly via CI cron	Long-lived release branches
Monthly	Mature, slow-changing service
On-demand only	Small/internal services where PGO is "nice to have"

A reasonable default: refresh weekly, automated via a scheduled CI job that captures, validates, and opens a PR with the updated profile.

4. The refresh script¶

#!/usr/bin/env bash
# scripts/refresh-pgo.sh
set -euo pipefail

POD=$(kubectl -n prod get pod -l app=myservice -o name | head -n 1)
DURATION=${DURATION:-90}
OUTPUT=./cmd/myservice/default.pgo
TMP=$(mktemp)

kubectl -n prod port-forward "${POD}" 6060:6060 &
PF_PID=$!
# Stop the port-forward and remove the temp file on every exit path.
trap 'kill "${PF_PID}" 2>/dev/null || true; rm -f "${TMP}"' EXIT
sleep 2   # crude wait for the tunnel to come up

curl --fail --silent --max-time $((DURATION + 30)) \
  -o "${TMP}" \
  "http://127.0.0.1:6060/debug/pprof/profile?seconds=${DURATION}"

# sanity: more than 10 KiB and parseable
# (wc -c is portable; `stat -f` means "file system status" on GNU/Linux and
# would print unrelated output before the fallback ran)
[ "$(wc -c < "${TMP}" | tr -d '[:space:]')" -gt 10240 ]
go tool pprof -top -nodecount=1 "${TMP}" > /dev/null

mv "${TMP}" "${OUTPUT}"
echo "Updated ${OUTPUT}"

CI runs this, then opens a PR titled pgo: refresh production profile (YYYY-MM-DD). The PR changes a single binary file, so the diff itself is opaque; the review rests on the capture metadata in the PR description.
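The script's two sanity checks can also run as a small Go helper, for example in a pre-commit hook. The sketch below is an assumption-laden illustration, not part of the original tooling: `checkProfile` and `minProfileBytes` are hypothetical names. It relies on Go CPU profiles being gzip-compressed protobuf, so it verifies the size floor and that the gzip stream decodes fully. It does not parse the protobuf payload; `go tool pprof` remains the authoritative parser.

```go
package main

import (
	"bytes"
	"compress/gzip"
	"fmt"
	"io"
	"os"
)

// minProfileBytes mirrors the 10 KiB floor used by the shell script.
const minProfileBytes = 10 * 1024

// checkProfile applies two cheap checks before a profile is committed:
// a size floor and a full gzip decode.
func checkProfile(data []byte) error {
	if len(data) <= minProfileBytes {
		return fmt.Errorf("profile is %d bytes, want more than %d", len(data), minProfileBytes)
	}
	zr, err := gzip.NewReader(bytes.NewReader(data))
	if err != nil {
		return fmt.Errorf("not a gzip stream: %w", err)
	}
	defer zr.Close()
	if _, err := io.Copy(io.Discard, zr); err != nil {
		return fmt.Errorf("gzip stream is truncated or corrupt: %w", err)
	}
	return nil
}

func main() {
	if len(os.Args) < 2 {
		fmt.Println("usage: checkprofile <file>")
		return
	}
	data, err := os.ReadFile(os.Args[1])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	if err := checkProfile(data); err != nil {
		fmt.Fprintln(os.Stderr, "invalid profile:", err)
		os.Exit(1)
	}
	fmt.Println("profile looks sane")
}
```

A truncated download usually fails the gzip decode even when it clears the size floor, which is the case the size check alone would miss.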

5. Aggregating profiles across instances¶

A single pod's profile is noisier than the fleet aggregate. Merging:

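mkdir -p samples
for pod in $(kubectl -n prod get pod -l app=myservice -o name); do
  kubectl -n prod port-forward "${pod}" 6060:6060 &
  pf_pid=$!
  sleep 1
  out="samples/${pod//\//_}.pgo"
  # Drop failed captures so a partial file cannot break the merge.
  curl --fail --silent -o "${out}" \
    "http://127.0.0.1:6060/debug/pprof/profile?seconds=60" || rm -f "${out}"
  # Stop this pod's tunnel and wait for it to exit so port 6060 is free again.
  kill "${pf_pid}" 2>/dev/null || true
  wait "${pf_pid}" 2>/dev/null || true
done

# Given several input files, pprof merges them into one profile.
go tool pprof -proto samples/*.pgo > cmd/myservice/default.pgo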

Merging across 5–10 instances gives a stable profile and dampens any per-pod weirdness (one pod stuck on a slow client, etc.).

6. Continuous profiling integration¶

Continuous profiling platforms (Pyroscope, Parca, Grafana Cloud Profiles, Polar Signals, Google Cloud Profiler) keep a rolling history of CPU profiles. They expose APIs to download the merged profile for a time window — exactly what PGO wants.

Pyroscope¶

curl -G "https://pyroscope.example.com/render" \
  --data-urlencode "query=process_cpu:cpu:nanoseconds:cpu:nanoseconds{service_name=\"myapp\"}" \
  --data-urlencode "from=now-1h" \
  --data-urlencode "until=now" \
  --data-urlencode "format=pprof" \
  -o cmd/myapp/default.pgo

Parca¶

parca-cli profile download \
  --query 'process_cpu:cpu:nanoseconds:cpu:nanoseconds{job="myapp"}' \
  --from "$(date -u -d '1 hour ago' +%FT%TZ)" \
  --to "$(date -u +%FT%TZ)" \
  --output cmd/myapp/default.pgo

Google Cloud Profiler¶

gcloud profiler profiles download \
  --service=myapp --profile-type=CPU \
  --duration=1h --output-file=cmd/myapp/default.pgo

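The advantage: no per-pod port-forwarding, no manual capture, and you can window the data to peak hours. Treat the three commands above as illustrative: export endpoints, CLI subcommands, and flag names differ between platforms and versions, so confirm the exact invocation against each platform's current documentation before wiring it into CI.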

7. CI / CD pipeline¶

# .github/workflows/build.yml
on:
  push:
  schedule:
    - cron: '0 6 * * 1'   # weekly refresh trigger; day and time are placeholders
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-go@v5
        with: { go-version: '1.24' }

      - name: Validate profile (if present)
        run: |
          if [ -f ./cmd/myapp/default.pgo ]; then
            go tool pprof -top -nodecount=1 ./cmd/myapp/default.pgo > /dev/null
          fi

      - name: Build with PGO
        run: |
          go build -trimpath \
            -ldflags="-s -w -X main.version=${{ github.sha }}" \
            -o ./bin/myapp ./cmd/myapp

      - name: Verify PGO embedded
        run: |
          go version -m ./bin/myapp | grep -E 'build\s+-pgo=' || \
            (echo "PGO not applied"; exit 1)

  refresh-pgo:
    runs-on: ubuntu-latest
    if: github.event_name == 'schedule'
    steps:
      - uses: actions/checkout@v4
      - run: bash scripts/refresh-pgo.sh
      - uses: peter-evans/create-pull-request@v5
        with:
          branch: pgo/auto-refresh
          title: "pgo: refresh production profile"
          commit-message: "pgo: refresh production profile"

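The build job needs no PGO-specific flag: since Go 1.21, -pgo=auto is the default, so go build picks up default.pgo from the main package directory. The scheduled job opens a PR with the updated profile; a human reviews and merges it.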
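The same fact that `go version -m` prints is available to the binary itself at runtime, which is handy for a version endpoint or a startup log line. A minimal sketch, assuming a hypothetical helper named `pgoSetting`; the `-pgo` build-setting key is the one the toolchain records when a profile is applied.

```go
package main

import (
	"fmt"
	"runtime/debug"
)

// pgoSetting returns the value of the "-pgo" build setting, if the
// toolchain recorded one for this binary.
func pgoSetting(settings []debug.BuildSetting) (string, bool) {
	for _, s := range settings {
		if s.Key == "-pgo" {
			return s.Value, true
		}
	}
	return "", false
}

func main() {
	info, ok := debug.ReadBuildInfo()
	if !ok {
		fmt.Println("no build info embedded")
		return
	}
	if profile, ok := pgoSetting(info.Settings); ok {
		fmt.Println("built with PGO profile:", profile)
	} else {
		fmt.Println("built without PGO")
	}
}
```

Logging this at startup lets a canary dashboard distinguish PGO and non-PGO builds without inspecting the artifact.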

8. A/B comparison in CI¶

To prove PGO is still paying for itself, run a benchmark suite in both modes on every release.

- name: Bench (baseline, no PGO)
  run: go test -run=^$ -bench=. -count=8 -pgo=off ./bench/... | tee bench-baseline.txt

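- name: Bench (with PGO)
  # -pgo=auto only looks for default.pgo next to a main package, so it would
  # not find cmd/myapp/default.pgo from ./bench/...; name the profile explicitly.
  run: go test -run=^$ -bench=. -count=8 -pgo=./cmd/myapp/default.pgo ./bench/... | tee bench-pgo.txt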

- name: Compare
  run: |
    go install golang.org/x/perf/cmd/benchstat@latest
    benchstat bench-baseline.txt bench-pgo.txt | tee bench-diff.txt

Archive bench-diff.txt as a CI artifact. Plot the delta over time. If PGO's measured gain drops toward zero, the profile is stale or the code has shifted away from interface/inline-friendly patterns.
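To plot the delta over time it helps to reduce each run to one number per benchmark. The sketch below is a rough illustration under stated assumptions: `meanNsPerOp` and `gainPercent` are hypothetical helpers, the sample figures in `main` are made up, and it only averages ns/op across repetitions. It does none of the statistical testing that benchstat does, so use it for trend lines, not for gating.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// meanNsPerOp parses `go test -bench` output and returns the mean ns/op
// per benchmark name, averaged across -count repetitions.
func meanNsPerOp(output string) map[string]float64 {
	sum := map[string]float64{}
	n := map[string]int{}
	sc := bufio.NewScanner(strings.NewReader(output))
	for sc.Scan() {
		f := strings.Fields(sc.Text())
		// Expected shape: BenchmarkName-8  <iterations>  <value> ns/op ...
		if len(f) < 4 || !strings.HasPrefix(f[0], "Benchmark") || f[3] != "ns/op" {
			continue
		}
		v, err := strconv.ParseFloat(f[2], 64)
		if err != nil {
			continue
		}
		sum[f[0]] += v
		n[f[0]]++
	}
	mean := make(map[string]float64, len(sum))
	for name, s := range sum {
		mean[name] = s / float64(n[name])
	}
	return mean
}

// gainPercent reports how much faster the PGO build is, as a percentage
// of the baseline; negative values mean PGO is slower.
func gainPercent(baseline, pgo float64) float64 {
	return (baseline - pgo) / baseline * 100
}

func main() {
	// Made-up sample output standing in for bench-baseline.txt and bench-pgo.txt.
	base := meanNsPerOp("BenchmarkServe-8  1000  2000 ns/op\nBenchmarkServe-8  1000  2100 ns/op\n")
	pgo := meanNsPerOp("BenchmarkServe-8  1000  1900 ns/op\nBenchmarkServe-8  1000  1950 ns/op\n")
	for name, b := range base {
		fmt.Printf("%s: %.1f%% gain\n", name, gainPercent(b, pgo[name]))
	}
}
```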

9. Regression detection¶

PGO can regress: a new release's profile may push the inliner into a pessimal choice.

Three signals to watch in the canary:

Signal	Source	Action threshold
Request CPU (median)	Prometheus	> 5 % worse than non-PGO
Binary size	`ls -l`	> 5 % bigger than expected
Compile time (CI)	CI job duration	> 25 % slower

When a canary regresses, there are two ways back:

# rollback the profile only
git revert <profile-commit-sha>

# build without PGO (escape hatch)
go build -pgo=off ./cmd/myapp

The second is the emergency: "ship now, investigate later."
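The thresholds in the table translate directly into a gate. A minimal sketch, assuming the three measurements have already been collected from Prometheus, the build artifact, and CI; the `CanaryMetrics` type and its field names are hypothetical, and the numbers in `main` are made up.

```go
package main

import "fmt"

// CanaryMetrics holds the PGO build's measurements next to the
// reference values they are compared against.
type CanaryMetrics struct {
	CPUPGO, CPUBaseline         float64 // median request CPU, any consistent unit
	SizePGO, SizeExpected       float64 // binary size in bytes
	CompilePGO, CompileBaseline float64 // CI build duration in seconds
}

// exceeds reports whether value is more than pct percent above reference.
func exceeds(value, reference, pct float64) bool {
	return value > reference*(1+pct/100)
}

// Regressions lists the signals that crossed the action thresholds:
// 5 % for CPU, 5 % for binary size, 25 % for compile time.
func (m CanaryMetrics) Regressions() []string {
	var out []string
	if exceeds(m.CPUPGO, m.CPUBaseline, 5) {
		out = append(out, "request CPU")
	}
	if exceeds(m.SizePGO, m.SizeExpected, 5) {
		out = append(out, "binary size")
	}
	if exceeds(m.CompilePGO, m.CompileBaseline, 25) {
		out = append(out, "compile time")
	}
	return out
}

func main() {
	m := CanaryMetrics{
		CPUPGO: 10.8, CPUBaseline: 10.0, // 8 % worse: over the threshold
		SizePGO: 31e6, SizeExpected: 30e6,
		CompilePGO: 95, CompileBaseline: 90,
	}
	fmt.Println("regressed signals:", m.Regressions())
}
```

A non-empty result is the cue to revert the profile commit or fall back to -pgo=off.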

10. Multi-environment profiles¶

A common question: should staging and production share a profile?

Setup	Recommendation
Staging mirrors prod traffic	Yes, share `default.pgo`
Staging is synthetic	No — staging profile would mislead prod
Multiple prod regions, similar workload	Merge into one `default.pgo`
Multiple prod tiers (free / paid)	Either merge or maintain two binaries; usually merge

The general rule: capture from the workload you want to optimize. If staging doesn't look like prod, staging's profile doesn't help prod's binary.

11. Security and review concerns¶

Two things to know.

The profile contains function names. It is essentially a list of function names and source file paths from your repo. If your repo is public, that's already public. If it's private, treat the profile as sensitive (commit to the private repo only).
PRs that change default.pgo are binary diffs and look unreviewable. Add a comment to the PR stating where the profile was captured, over what time window, and from how many instances. The reviewer cannot inspect the blob, so they approve on the strength of that metadata.

Title: pgo: refresh production profile
Body:
- Source: prod-us-east, 12 pods
- Window: 2026-05-20 14:00–15:00 UTC (peak hour)
- Duration per pod: 60 s
- Aggregated via: go tool pprof -proto
- Top function (pre): handlers.serveRequest 8.4 %
- Top function (post): handlers.serveRequest 9.1 %

Six lines of context turn a blob commit into something a human can evaluate.
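If the refresh job already knows the capture parameters, it can generate this body instead of relying on someone to type it. A small sketch under stated assumptions: `ProfileMeta` and `prBody` are hypothetical names, and the two "Top function" lines are left out because producing them would require parsing the profile.

```go
package main

import (
	"fmt"
	"strings"
)

// ProfileMeta records how a profile was captured.
type ProfileMeta struct {
	Source        string // cluster or region the profile came from
	Pods          int    // number of instances sampled
	Window        string // capture window, already formatted
	SecondsPerPod int    // capture duration per instance
	AggregatedVia string // tool used to merge the samples
}

// prBody renders the capture metadata as the PR description.
func prBody(m ProfileMeta) string {
	var b strings.Builder
	fmt.Fprintf(&b, "- Source: %s, %d pods\n", m.Source, m.Pods)
	fmt.Fprintf(&b, "- Window: %s\n", m.Window)
	fmt.Fprintf(&b, "- Duration per pod: %d s\n", m.SecondsPerPod)
	fmt.Fprintf(&b, "- Aggregated via: %s\n", m.AggregatedVia)
	return b.String()
}

func main() {
	fmt.Print(prBody(ProfileMeta{
		Source:        "prod-us-east",
		Pods:          12,
		Window:        "2026-05-20 14:00-15:00 UTC (peak hour)",
		SecondsPerPod: 60,
		AggregatedVia: "go tool pprof -proto",
	}))
}
```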

12. Bootstrap problem¶

The first time you enable PGO, you have no profile yet. The bootstrap:

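1. Build and deploy a non-PGO release. With no default.pgo in the tree, the default -pgo=auto already builds without PGO, so -pgo=off is optional.
2. Wait for it to take production traffic at peak.
3. Capture a profile.
4. Commit it as default.pgo.
5. Future builds pick it up automatically via -pgo=auto.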

There is no chicken-and-egg problem: PGO is strictly additive. You ship without it, then turn it on.
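Capturing that first profile presupposes the non-PGO release already serves the pprof endpoint that the capture scripts curl on port 6060. A minimal sketch of one way to expose it, assuming a dedicated mux bound to loopback so the handlers stay off the public listener; `newPprofMux` is a hypothetical helper, and the bind address is an assumption to adjust for your deployment.

```go
package main

import (
	"log"
	"net/http"
	"net/http/pprof"
)

// newPprofMux registers the pprof handlers on their own mux so they are
// not served from the application's public listener.
func newPprofMux() *http.ServeMux {
	mux := http.NewServeMux()
	mux.HandleFunc("/debug/pprof/", pprof.Index)          // index and named profiles
	mux.HandleFunc("/debug/pprof/profile", pprof.Profile) // CPU profile used for PGO
	return mux
}

func main() {
	// Serve pprof in the background on the port the capture scripts expect.
	go func() {
		log.Println(http.ListenAndServe("127.0.0.1:6060", newPprofMux()))
	}()
	// ... start the real service here; its listener keeps the process alive.
}
```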

13. Documenting PGO in your runbook¶

A one-page section in your service runbook:

PGO
---
- Profile location: cmd/myapp/default.pgo
- Refresh cadence: weekly (via scheduled CI job)
- Capture source: production peak hour, aggregated across all pods
- Expected gain: 4–6 % CPU vs -pgo=off
- Rollback: git revert the profile commit
- Emergency disable: build with -pgo=off
- Where to look if something breaks: bench-diff.txt in CI artifacts

A new on-call engineer reads this, understands the contract in 60 seconds.

14. The "do we even need it?" calculation¶

PGO has a cost: extra CI minutes, extra review surface, one extra committed file. Run the math:

Service runs on 200 pods, each costing $50/month → $10 000/month compute.
PGO saves 5 % CPU → 5 % fewer pods (assuming the fleet is sized by CPU) → $500/month saved.
Engineer time to set up and maintain: ~4 hours/year, or roughly $400/year at an implied rate of $100/hour.
ROI: positive after the first month.

For small services (<10 pods), the math may go the other way. PGO is most worth it for services with significant CPU spend and interface- or call-graph-heavy hot paths.
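The arithmetic above is easy to rerun with your own numbers. A minimal sketch: `monthlySaving` and `paybackMonths` are hypothetical helpers, the 8-pod case is an invented example of a small service, and the model assumes pod count scales linearly with CPU.

```go
package main

import "fmt"

// monthlySaving estimates the monthly compute saving, assuming the fleet
// is sized by CPU so a CPU reduction removes pods proportionally.
func monthlySaving(pods int, costPerPod, cpuSavedFraction float64) float64 {
	return float64(pods) * costPerPod * cpuSavedFraction
}

// paybackMonths is how long the saving takes to cover the engineering cost.
func paybackMonths(engineeringCost, saving float64) float64 {
	return engineeringCost / saving
}

func main() {
	large := monthlySaving(200, 50, 0.05)
	fmt.Printf("200 pods: $%.0f/month saved, payback in %.1f months\n",
		large, paybackMonths(400, large))

	small := monthlySaving(8, 50, 0.05) // hypothetical small service
	fmt.Printf("8 pods: $%.0f/month saved, payback in %.1f months\n",
		small, paybackMonths(400, small))
}
```

With the same $400 cost, the small service needs well over a year to break even, which is the case where skipping PGO is defensible.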

15. Summary¶

Production PGO is release engineering, not a magic incantation. The profile is committed source, refreshed on a cadence, captured from real traffic (ideally via a continuous profiler), validated in CI, A/B-benchmarked on every release, and rolled back with a single revert if it misbehaves. Document the contract in the runbook. The whole apparatus is small but disciplined — and rewards services with substantial CPU footprints.