pprof and Profiling Tools — Find the Bug¶
Table of Contents¶
- How to Use This File
- Bug 1: pprof Imported but Never Reachable
- Bug 2: Library That Imports pprof
- Bug 3: Public Pprof Endpoint
- Bug 4: Unbounded Label Cardinality
- Bug 5: Labels Not Propagated
- Bug 6: Two CPU Profiles at Once
- Bug 7: Mutex Profile Disabled by Default
- Bug 8: Heap Profile Without GC
- Bug 9: Stripped Binary with No Symbol Server
- Bug 10: Mass Goroutine Profile Storm
- Bug 11: Custom Profile Without Remove
- Bug 12: Wrong Debug Level
- Bug 13: Trace as a Profile
- Bug 14: Pprof Endpoint Behind a Slow Middleware
- Bug 15: Comparing Profiles of Different Lengths
- How to Approach Pprof Bugs
How to Use This File¶
Each bug presents a short code or workflow with a defect. Try to spot it before reading the analysis. The bugs are real — every one of them has been seen in production.
Bug 1: pprof Imported but Never Reachable¶
package main

import (
    "fmt"
    _ "net/http/pprof"
)

func main() {
    fmt.Println("running")
    select {} // pretend long-running
}
curl http://localhost:6060/debug/pprof/ returns connection refused.
Analysis¶
The import alone does nothing useful. net/http/pprof.init() registers routes on http.DefaultServeMux, but there is no http.ListenAndServe running. The blank-identifier import is necessary but not sufficient.
Fix¶
package main

import (
    "log"
    "net/http"
    _ "net/http/pprof"
)

func main() {
    go func() {
        log.Println(http.ListenAndServe("127.0.0.1:6060", nil))
    }()
    select {}
}
Bug 2: Library That Imports pprof¶
A library you wrote:
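// Package mylib (minimal sketch of the pattern): the pprof import is
// the bug. It silently registers /debug/pprof/ routes on
// http.DefaultServeMux as an import side effect.
package mylib

import (
    _ "net/http/pprof"
)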
After importing mylib, the host binary's HTTP API suddenly exposes /debug/pprof/.
Analysis¶
Libraries should never import net/http/pprof. The import has a hidden side effect: it mutates http.DefaultServeMux. Now any consumer who runs an HTTP server with nil mux unknowingly exposes profile endpoints.
Fix¶
Move the import to the main package of each consumer that wants pprof. Do not export the side effect via libraries.
Bug 3: Public Pprof Endpoint¶
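The service brings up its debug listener on all interfaces, a minimal sketch:

go func() {
    // ":6060" means 0.0.0.0:6060, every interface, including public ones.
    log.Println(http.ListenAndServe(":6060", nil))
}()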
The host is on the public internet. Logs show a steady stream of /debug/pprof/profile?seconds=600 requests.
Analysis¶
":6060" binds to all interfaces. An attacker discovered the endpoint and is forcing 10-minute CPU profiles repeatedly. Each one adds ~5% CPU overhead and exfiltrates code structure.
Fix¶
Bind to localhost and clamp durations:
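go func() {
    log.Println(http.ListenAndServe("127.0.0.1:6060",
        clampSeconds(http.DefaultServeMux, 60)))
}()

// clampSeconds caps the ?seconds= query parameter before the pprof
// handlers see it. A sketch: the helper name and the 60-second cap
// are assumptions, not a stdlib facility.
func clampSeconds(next http.Handler, max int) http.Handler {
    return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
        q := r.URL.Query()
        if s, err := strconv.Atoi(q.Get("seconds")); err == nil && s > max {
            q.Set("seconds", strconv.Itoa(max))
            r.URL.RawQuery = q.Encode()
        }
        next.ServeHTTP(w, r)
    })
}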
Reach the endpoint via SSH tunnel.
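For example (prod-host stands in for the real hostname):

ssh -N -L 6060:127.0.0.1:6060 prod-host
go tool pprof http://localhost:6060/debug/pprof/profile?seconds=30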
Bug 4: Unbounded Label Cardinality¶
func handler(w http.ResponseWriter, r *http.Request) {
    rid := r.Header.Get("X-Request-Id")
    pprof.Do(r.Context(), pprof.Labels("request_id", rid), func(ctx context.Context) {
        process(ctx, r)
    })
}
The pprof endpoint slows from 50 ms to 30 s. Memory climbs. The Pyroscope dashboard becomes unusable.
Analysis¶
Every request has a unique request_id. Every distinct label set is its own internal entry in the profile's tag table. Over millions of requests the profile grows to gigabytes.
Fix¶
Use coarse labels only:
pprof.Do(r.Context(), pprof.Labels(
    "endpoint", r.URL.Path,
    "method", r.Method,
    "tenant", tenantOf(r),
), ...)
For per-request granularity, use runtime/trace tasks, not labels.
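A sketch of the task approach (the task name is an assumption):

// import "runtime/trace"
// Each request becomes a task visible in `go tool trace`,
// without adding an entry to the pprof tag table.
ctx, task := trace.NewTask(r.Context(), "request")
defer task.End()
process(ctx, r)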
Bug 5: Labels Not Propagated¶
pprof.Do(ctx, pprof.Labels("tenant", "acme"), func(ctx context.Context) {
    go backgroundWork() // forgot to apply labels
})
Filtering with -tagfocus=tenant=acme shows no samples from the background work.
Analysis¶
go f() starts a fresh goroutine with empty labels. The labels on the parent do not propagate automatically.
Fix¶
pprof.Do(ctx, pprof.Labels("tenant", "acme"), func(ctx context.Context) {
    go func() {
        pprof.SetGoroutineLabels(ctx)
        backgroundWork()
    }()
})
Or wrap your goroutine launcher to do this once and for all.
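For example, a hypothetical goWithLabels helper:

// goWithLabels starts a goroutine that inherits the pprof labels
// carried by ctx (hypothetical helper, not a stdlib function).
func goWithLabels(ctx context.Context, f func()) {
    go func() {
        pprof.SetGoroutineLabels(ctx)
        f()
    }()
}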
Bug 6: Two CPU Profiles at Once¶
http.HandleFunc("/profile", func(w http.ResponseWriter, r *http.Request) {
    f, _ := os.Create("/tmp/cpu.prof")
    defer f.Close()
    pprof.StartCPUProfile(f)
    defer pprof.StopCPUProfile()
    time.Sleep(30 * time.Second)
})
Two operators hit /profile simultaneously. The second returns "cpu profiling already in use."
Analysis¶
The runtime allows only one CPU profile at a time, so the second StartCPUProfile call fails with this error. Worse, the handler ignores the failure but still runs its deferred StopCPUProfile, which stops the first operator's profile early.
Fix¶
Serialise with a mutex or semaphore. Better: do not put CPU profiling on a user-facing endpoint at all. Use /debug/pprof/profile (which has the same constraint but at least returns a clean 500).
var cpuSem = make(chan struct{}, 1)

http.HandleFunc("/profile", func(w http.ResponseWriter, r *http.Request) {
    select {
    case cpuSem <- struct{}{}:
        defer func() { <-cpuSem }()
    default:
        http.Error(w, "busy", http.StatusTooManyRequests)
        return
    }
    if err := pprof.StartCPUProfile(w); err != nil {
        http.Error(w, err.Error(), http.StatusInternalServerError)
        return
    }
    defer pprof.StopCPUProfile()
    time.Sleep(30 * time.Second)
})
Bug 7: Mutex Profile Disabled by Default¶
A developer adds mutex profiling to debug contention:
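# the workflow, roughly (host and filenames illustrative)
curl -o mutex.prof http://host:6060/debug/pprof/mutex
go tool pprof mutex.prof
(pprof) top
# empty: no contention recorded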
Analysis¶
The mutex profile is off by default. Without runtime.SetMutexProfileFraction(n) it records nothing.
Fix¶
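func init() {
    // Record roughly one in every five mutex contention events.
    // The fraction is a sampling-rate tuning choice.
    runtime.SetMutexProfileFraction(5)
}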
The same applies to the block profile: runtime.SetBlockProfileRate(int(time.Millisecond)) samples about one blocking event per millisecond spent blocked (the rate argument is in nanoseconds).
Bug 8: Heap Profile Without GC¶
curl -o h.prof http://host:6060/debug/pprof/heap
go tool pprof h.prof
(pprof) top
# 500 MB allocated to a function that has not run in 10 minutes
Analysis¶
Without ?gc=1, the profile includes dead-but-not-yet-collected objects. What looks like a leak may just be retained garbage.
Fix¶
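curl -o h.prof "http://host:6060/debug/pprof/heap?gc=1"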
Pay the cost of a GC; get clean data.
Bug 9: Stripped Binary with No Symbol Server¶
Production binary built with -ldflags="-s -w". A profile from prod shows raw hexadecimal addresses instead of function names.
Analysis¶
-w drops DWARF. Symbols are normally still available via gopclntab, but if the binary was stripped further or the profile was captured by an eBPF tool that does not consult gopclntab, you get raw addresses.
Fix¶
Either ship unstripped binaries (the size cost is usually acceptable), or set up remote symbolisation:
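# Simplest form: archive an unstripped twin of each release build and
# point pprof at it when analysing (artifact name assumed).
go build -o app-debug .        # same commit, without -s -w
go tool pprof app-debug cpu.prof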
Bug 10: Mass Goroutine Profile Storm¶
A monitoring script:
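# a sketch of the polling pattern; endpoint and interval assumed
while true; do
  curl -s http://localhost:6060/debug/pprof/goroutine > /dev/null
  sleep 1
done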
The service's p99 latency starts climbing.
Analysis¶
Each goroutine profile call stops the world to walk the runtime's list of goroutines (allgs). On a service with 100k goroutines, each pause is a few milliseconds; doing that every second burns roughly 1% of wall time in stop-the-world pauses.
Fix¶
Sample once a minute or less. For raw count, use runtime.NumGoroutine() exposed via a /metrics endpoint — cheap.
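A sketch (path and output format assumed):

http.HandleFunc("/metrics/goroutines", func(w http.ResponseWriter, r *http.Request) {
    // runtime.NumGoroutine reads a scheduler counter; no stop-the-world.
    fmt.Fprintf(w, "%d\n", runtime.NumGoroutine())
})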
Bug 11: Custom Profile Without Remove¶
var conns = pprof.NewProfile("open_conns")

func handle(c net.Conn) {
    conns.Add(c, 0)
    // ... never calls conns.Remove(c)
}
/debug/pprof/open_conns grows forever even though connections close cleanly.
Analysis¶
Add and Remove must be paired. The custom profile is a separate accounting structure from the actual object lifecycle.
Fix¶
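func handle(c net.Conn) {
    conns.Add(c, 0)
    defer conns.Remove(c) // pair every Add with a Remove
    serve(c)              // hypothetical: the actual connection handling
}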
Bug 12: Wrong Debug Level¶
The on-call engineer asked for a quick goroutine count.
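Something like:

curl http://host:6060/debug/pprof/goroutine?debug=2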
Analysis¶
debug=2 prints every goroutine with its full stack. For 100k goroutines that is 5+ million lines. The engineer wanted debug=1 (grouped, much smaller).
Fix¶
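curl -s http://host:6060/debug/pprof/goroutine?debug=1 | head -1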
The first line is goroutine profile: total <N>.
Bug 13: Trace as a Profile¶
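The operator captured a trace and fed it to pprof, something like (t.out is an illustrative filename):

curl -o t.out "http://host:6060/debug/pprof/trace?seconds=5"
go tool pprof t.out   # fails to parse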
Analysis¶
A trace is not a pprof profile. It uses a different binary format and is opened with go tool trace, not go tool pprof.
Fix¶
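go tool trace t.out   # opens the trace viewer in a browser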
Bug 14: Pprof Endpoint Behind a Slow Middleware¶
mux.Use(loggingMiddleware)
mux.Use(authMiddleware)
mux.Use(tracingMiddleware)
mux.Use(rateLimitMiddleware)
mux.Handle("/debug/pprof/", http.HandlerFunc(pprof.Index))
Profiles take 30+ seconds to return. The CPU profile is even worse: the tracing middleware instruments every chunk of the streamed response.
Analysis¶
Generic middleware was applied to admin routes. The tracing middleware in particular adds overhead per byte streamed, which destroys a CPU profile.
Fix¶
Put admin routes on a separate mux with no middleware (or only auth):
adminMux := http.NewServeMux()
adminMux.HandleFunc("/debug/pprof/", pprof.Index)
adminMux.HandleFunc("/debug/pprof/profile", pprof.Profile) // Index alone does not route the CPU profile
// no logging, no tracing; serve on loopback
go http.ListenAndServe("127.0.0.1:6060", adminMux)
Bug 15: Comparing Profiles of Different Lengths¶
curl -o cpu1.prof http://host:6060/debug/pprof/profile?seconds=30
curl -o cpu2.prof http://host:6060/debug/pprof/profile?seconds=10
go tool pprof -base cpu1.prof cpu2.prof
The "after" profile looks 3x smaller. The team panics about a CPU regression.
Analysis¶
The two profiles cover different durations. Absolute sample counts are not comparable. The "regression" is just less collection time.
Fix¶
Always use the same seconds= value:
curl -o cpu1.prof http://host:6060/debug/pprof/profile?seconds=30
curl -o cpu2.prof http://host:6060/debug/pprof/profile?seconds=30
go tool pprof -base cpu1.prof cpu2.prof
Or compare normalised rates: total CPU samples / duration.
How to Approach Pprof Bugs¶
A short triage workflow:
- Is pprof reachable? curl /debug/pprof/. If 404 or connection refused, the import or the listener is missing.
- Is it on the right interface? netstat -lntp | grep 6060. Should be 127.0.0.1, not 0.0.0.0.
- Is it the right profile? Goroutine for leaks, heap for memory, profile for CPU, trace for latency.
- Is the profile complete? Stripped binary with bad symbolisation? Mutex/block profile not enabled? Wrong debug level?
- Is the comparison fair? Same seconds=, same ?gc=1 or not, same labels enabled.
- Is the workload representative? A profile during deploy or warmup tells you about deploy or warmup, not steady state.
Most pprof bugs are configuration bugs, not code bugs. The code is fine; the operator or developer asked the wrong question or used the wrong flag.