GODEBUG & runtime/debug: Find the Bug
Each snippet contains a real-world bug related to GODEBUG or the runtime/debug package. GODEBUG is a comma-separated list of name=value settings the runtime reads once at startup; since Go 1.21 it also gates backward-incompatible behavior changes whose defaults come from the module's go line. runtime/debug exposes programmatic GC/memory controls, stack/heap dumps, and build-info access. Find the bug, explain it, fix it.
Bug 1: Expecting GODEBUG to change a running process
func enableTracing() {
	os.Setenv("GODEBUG", "gctrace=1") // turn on GC tracing now
}

func main() {
	runWorkload()     // already running; no GC trace appears
	enableTracing()   // too late
	runMoreWorkload() // still no GC trace
}
Bug: The runtime parses GODEBUG at program startup, before main runs. Settings it resolves at that point, gctrace among them, are not re-read, so calling os.Setenv("GODEBUG", ...) afterward does not turn tracing on.
Fix: set it in the environment before launching the process. For in-process control of the GC, use runtime/debug instead (though gctrace itself has no programmatic equivalent — you must set it at launch):
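A minimal sketch of the in-process alternative, assuming the goal is to adjust and observe the collector after startup rather than to print gctrace lines. The helper name gcCount and the value 50 are illustrative only, not part of the original example.

```go
package main

import (
	"fmt"
	"runtime"
	"runtime/debug"
)

// gcCount forces one collection and reports how many GC cycles
// have completed in this process so far.
func gcCount() int64 {
	runtime.GC()
	var stats debug.GCStats
	debug.ReadGCStats(&stats)
	return stats.NumGC
}

func main() {
	// SetGCPercent takes effect immediately and returns the previous value.
	// 50 is an arbitrary illustrative setting, not a recommendation.
	prev := debug.SetGCPercent(50)
	defer debug.SetGCPercent(prev)

	fmt.Println("previous GC percent:", prev)
	fmt.Println("completed GC cycles:", gcCount())
}
```

To get the gctrace lines themselves, the variable still has to be in the environment when the process starts, for example by launching it as GODEBUG=gctrace=1 ./app.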
Bug 2: //go:debug in a library package
Bug: //go:debug directives configure an executable's default behavior. They are permitted only in package main. Placing one in a library is a compile error — and even if it compiled, it would not affect consumers.
Fix: move the directive to a file in package main, or set the module-wide default with a godebug directive in go.mod:
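A sketch of what the corrected placement could look like, assuming the setting in question is panicnil (the original snippet does not say which setting was used). The directive sits in a package main file, ahead of the package clause; recoverNilPanic is a hypothetical helper that only makes the effect observable.

```go
//go:debug panicnil=1
package main

import "fmt"

// recoverNilPanic panics with nil and returns whatever recover reports.
func recoverNilPanic() (r any) {
	defer func() { r = recover() }()
	panic(nil)
}

func main() {
	// With panicnil=1 in effect recover yields nil; under the Go 1.21+
	// default it yields a *runtime.PanicNilError instead.
	fmt.Printf("recovered: %v\n", recoverNilPanic())
}
```

Because the directive belongs to the executable, library packages imported by this program pick up the setting without declaring anything themselves.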
Bug 3: //go:debug separated from package by a blank line
Bug: A //go:debug directive must be in the comment block immediately preceding the package clause, with no blank line between. The blank line here detaches the comment from the package, so the directive is treated as an ordinary comment and silently ignored.
Fix: remove the blank line:
Bug 4: Memory limit set below the working set
func main() {
	debug.SetMemoryLimit(64 << 20) // 64 MiB "to be safe"
	cache := loadHugeCache()       // live set is ~300 MiB
	serve(cache)                   // GC runs nonstop, latency collapses
}
Bug: The soft memory limit is below the program's live working set. The runtime cannot reach the target, so it GCs back-to-back — the GC death spiral: high GC CPU, terrible latency, but no OOM (so RSS dashboards look fine).
Fix: size the limit above the working set, derived from the container limit:
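One way this could look, as a sketch under stated assumptions: the process runs under cgroup v2, the container limit is readable at /sys/fs/cgroup/memory.max, and 90% of it is a reasonable target. The path, the fraction, and the helper name limitFromCgroup are all assumptions, not part of the original text.

```go
package main

import (
	"fmt"
	"os"
	"runtime/debug"
	"strconv"
	"strings"
)

// limitFromCgroup parses the contents of a cgroup v2 memory.max file and
// returns the given fraction of it in bytes. It reports false when no numeric
// limit is set ("max") or the contents cannot be parsed.
func limitFromCgroup(raw string, fraction float64) (int64, bool) {
	s := strings.TrimSpace(raw)
	if s == "" || s == "max" {
		return 0, false
	}
	n, err := strconv.ParseInt(s, 10, 64)
	if err != nil || n <= 0 {
		return 0, false
	}
	return int64(float64(n) * fraction), true
}

func main() {
	// Assumed path: cgroup v2 unified hierarchy.
	raw, err := os.ReadFile("/sys/fs/cgroup/memory.max")
	if err != nil {
		fmt.Println("no cgroup limit readable; leaving the memory limit unset")
		return
	}
	if limit, ok := limitFromCgroup(string(raw), 0.9); ok {
		debug.SetMemoryLimit(limit)
		fmt.Println("soft memory limit set to", limit, "bytes")
	}
}
```

The derived value still has to sit above the live working set (about 300 MiB in this example); if the container is smaller than that, the container is what needs resizing.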
Alert on GC CPU fraction, not just RSS, to catch this class of problem.
Bug 5: In-code SetMemoryLimit silently overrides the operator's GOMEMLIMIT
Bug: GOMEMLIMIT (env) and SetMemoryLimit (call) set the same dial; the later setter wins. The hard-coded call runs after the env var was applied at startup, so the operator's GOMEMLIMIT is overridden and ignored.
Fix: pick one source of truth. If ops should control it, don't hard-code a call — read from the environment, or only call SetMemoryLimit when the env var is absent:
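A minimal sketch of the "only when the env var is absent" variant. The constant defaultLimit and the helper operatorSetLimit are hypothetical names introduced for illustration; the lookup function is injected only so the decision can be exercised without touching the real environment.

```go
package main

import (
	"fmt"
	"os"
	"runtime/debug"
)

// defaultLimit is a hypothetical fallback, used only when the operator
// has not supplied GOMEMLIMIT.
const defaultLimit = 512 << 20

// operatorSetLimit reports whether GOMEMLIMIT is present and non-empty,
// using the supplied lookup function (os.LookupEnv in production).
func operatorSetLimit(lookup func(string) (string, bool)) bool {
	v, ok := lookup("GOMEMLIMIT")
	return ok && v != ""
}

func main() {
	if operatorSetLimit(os.LookupEnv) {
		// The runtime already applied the operator's value at startup.
		fmt.Println("GOMEMLIMIT set by operator; not overriding")
		return
	}
	debug.SetMemoryLimit(defaultLimit)
	fmt.Println("applied fallback memory limit:", defaultLimit)
}
```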
Bug 6: Disabling GC in a long-lived service
func main() {
	debug.SetGCPercent(-1)  // "maximize throughput"
	runForeverHTTPServer() // memory grows without bound → OOM
}
Bug: SetGCPercent(-1) disables the garbage collector. In a short-lived tool that is fine; in a long-running server the heap grows without bound until the process is OOM-killed.
Fix: keep GC enabled. To favor throughput while still capping memory, disable the ratio but set a limit — GC then runs only as memory approaches the ceiling:
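A sketch of that combination, assuming a 2 GiB ceiling purely for illustration. favorThroughput and currentMemoryLimit are hypothetical helper names; the second relies on the documented behavior that a negative argument to SetMemoryLimit leaves the limit unchanged and returns the current value.

```go
package main

import (
	"fmt"
	"runtime/debug"
)

// favorThroughput turns off the proportional GC trigger and relies on the
// soft memory limit alone, so collections start only as memory nears limitBytes.
func favorThroughput(limitBytes int64) {
	debug.SetMemoryLimit(limitBytes)
	debug.SetGCPercent(-1)
}

// currentMemoryLimit reads the limit without changing it: a negative input
// to SetMemoryLimit leaves the limit untouched and returns the current value.
func currentMemoryLimit() int64 {
	return debug.SetMemoryLimit(-1)
}

func main() {
	favorThroughput(2 << 30) // 2 GiB is an illustrative ceiling
	fmt.Println("memory limit:", currentMemoryLimit())
}
```

The limit is set before the ratio is disabled so that there is no moment at which the process runs with neither control in place.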
Bug 7: Ignoring ok from ReadBuildInfo
func version() string {
	info, _ := debug.ReadBuildInfo() // ignored ok
	return info.Main.Version         // panics when info is nil
}
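Bug: ReadBuildInfo returns (nil, false) when the binary carries no module build info, for example one built without module support. Dereferencing info then panics with a nil pointer. (Building with -buildvcs=false is a different case: it only drops the vcs.* settings and does not make ok false.)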
Fix: branch on ok:
func version() string {
	info, ok := debug.ReadBuildInfo()
	if !ok {
		return "unknown"
	}
	return info.Main.Version
}
Bug 8: gctrace output redirected to the wrong stream
Bug: gctrace (and the other runtime traces) write to stderr, not stdout. Redirecting only stdout (>) sends the GC lines to the terminal instead of the file.
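Fix: redirect stderr (2>) instead of stdout, or capture both streams in one file by adding 2>&1 after the stdout redirection.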
Bug 9: FreeOSMemory on a hot path
func handleRequest(w http.ResponseWriter, r *http.Request) {
	resp := buildResponse(r)
	w.Write(resp)
	debug.FreeOSMemory() // "keep memory low", on every request
}
Bug: FreeOSMemory forces a full GC and an eager OS return on every request. Under load this dominates CPU and tanks throughput — it is one of the most expensive things you can call per-request.
Fix: remove it from the request path. For steady-state memory control use GOMEMLIMIT; if RSS accounting matters on a Go version older than 1.16, set GODEBUG=madvdontneed=1 (from Go 1.16 on it is already the Linux default). Reserve FreeOSMemory for a one-off after a large batch:
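A sketch of the one-off pattern, assuming a batch job that builds a large temporary working set. runBatch is a hypothetical wrapper, and the 64 MiB buffer is only a stand-in for real work.

```go
package main

import (
	"fmt"
	"runtime/debug"
)

// runBatch executes a large one-off job and then returns freed memory to the
// OS once, instead of paying for a forced GC on every request.
func runBatch(job func()) {
	job()
	debug.FreeOSMemory()
}

func main() {
	runBatch(func() {
		buf := make([]byte, 64<<20) // stand-in for a large temporary working set
		buf[0] = 1
		fmt.Println("batch finished, temporary bytes:", len(buf))
	})
}
```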
Bug 10: Misreading the gctrace heap triple
// alerting rule, pseudo-code, parsing "50->52->30 MB"
liveHeapMB := parseFirst(gctraceTriple) // takes 50, the WRONG number
if liveHeapMB > threshold { alert() }
Bug: The triple is heap-before -> heap-at-mark-termination -> live-after. The live heap is the third number (30), not the first (50). The alert reads the pre-GC size and fires on transient peaks, not on actual live memory.
Fix: parse the third value — and better, stop parsing gctrace text entirely (it is not a stable API) and read runtime/metrics (/gc/heap/live:bytes or the heap-goal metrics) instead.
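If the text has to be parsed at all, a sketch of picking the third value could look like this. liveHeapMB is a hypothetical helper, and it assumes the triple arrives already isolated in the form shown in the snippet above ("50->52->30 MB").

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// liveHeapMB extracts the third number from a gctrace heap triple such as
// "50->52->30 MB", which is the live heap after the collection.
func liveHeapMB(triple string) (int, error) {
	fields := strings.Fields(triple)
	if len(fields) == 0 {
		return 0, fmt.Errorf("empty heap triple")
	}
	parts := strings.Split(fields[0], "->")
	if len(parts) != 3 {
		return 0, fmt.Errorf("want 3 heap values, got %d in %q", len(parts), triple)
	}
	return strconv.Atoi(parts[2])
}

func main() {
	mb, err := liveHeapMB("50->52->30 MB")
	fmt.Println(mb, err) // prints "30 <nil>"
}
```

Rejecting anything that is not exactly three values makes a format change show up as an error rather than as a silently wrong number.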
Bug 11: Assuming the toolchain version sets compatibility behavior
func mayPanicNil() {
	defer func() {
		r := recover()
		useRecovered(r) // expects nil sometimes
	}()
	panic(nil)
}
A teammate "upgrades to Go 1.23" by installing the new toolchain and expects panic(nil) to now raise *runtime.PanicNilError. It still delivers nil.
Bug: Compatibility defaults come from the go line, not the toolchain version. With go 1.20 in go.mod, the binary keeps Go 1.20's panicnil=1 behavior even when built by Go 1.23. The toolchain upgrade is deliberately behavior-neutral.
Fix: to opt into the new behavior, raise the go line in go.mod to go 1.21 or later, and review the change as a behavior change.
Or, to opt in without raising the whole baseline, set //go:debug panicnil=0 in package main. To keep the old behavior on a new go line, set panicnil=1.
Bug 12: Pinned compatibility setting left to rot
$ go build ./...
# years later, after a toolchain upgrade:
panic: godebug: unknown setting "x509sha1"
Bug: Compatibility GODEBUG settings are removed after a deprecation window. A setting pinned years ago and never revisited eventually stops existing; the build (or runtime) then fails because the setting is gone — and the old behavior with it.
Fix: treat every pin as debt with an expiry. Track it to a removal date, fix the root cause (here: stop relying on SHA-1 certificates), and remove the pin. Use the /godebug/non-default-behavior/x509sha1:events counter to confirm whether the old path is still exercised before removing.
Bug 13: SetMemoryLimit blind to cgo memory
func main() {
	// container hard limit is 1 GiB
	debug.SetMemoryLimit(950 << 20) // leave 74 MiB headroom
	runImageProcessor()             // uses a cgo library allocating ~400 MiB off-heap
}
// process is OOM-killed despite the limit
Bug: The soft memory limit accounts only for runtime-managed memory — heap, stacks, runtime metadata. It is blind to cgo and mmap allocations. The cgo library's ~400 MiB is invisible to the limit, so total cgroup memory blows past 1 GiB and the OOM killer fires even though the Go limit is "satisfied."
Fix: subtract non-Go memory when sizing the limit:
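The arithmetic can be sketched as follows, reusing the figures from the example (1 GiB container, roughly 400 MiB off-heap, 74 MiB headroom). goMemoryLimit is a hypothetical helper, and the off-heap figure is an estimate that would have to come from measuring the cgo library.

```go
package main

import (
	"fmt"
	"runtime/debug"
)

// goMemoryLimit returns the budget left for runtime-managed memory after
// reserving space for off-heap (cgo, mmap) allocations and safety headroom.
// It returns 0 when nothing is left, in which case the inputs need revisiting.
func goMemoryLimit(containerLimit, nonGoBytes, headroom int64) int64 {
	limit := containerLimit - nonGoBytes - headroom
	if limit < 0 {
		return 0
	}
	return limit
}

func main() {
	const (
		containerLimit = 1 << 30   // 1 GiB hard limit, from the example
		cgoEstimate    = 400 << 20 // ~400 MiB off-heap, from the example
		headroom       = 74 << 20  // headroom figure reused from the example
	)
	limit := goMemoryLimit(containerLimit, cgoEstimate, headroom)
	if limit > 0 {
		debug.SetMemoryLimit(limit)
	}
	fmt.Println("Go soft limit (MiB):", limit>>20) // prints 550
}
```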
Track cgo allocations separately; the Go limit cannot do it for you.
Bug 14: debug.Stack() used to debug a deadlock across goroutines
func dumpOnSignal() {
	<-sigChan
	log.Printf("goroutines:\n%s", debug.Stack()) // only ONE goroutine
}
Bug: debug.Stack() returns only the current goroutine's stack. To diagnose a deadlock you need all goroutines' stacks; this dumps just the signal handler's, which is useless for the investigation.
Fix: use runtime.Stack with all=true, or send the process SIGQUIT (the runtime dumps every goroutine):
buf := make([]byte, 1<<20)
n := runtime.Stack(buf, true) // all goroutines
log.Printf("goroutines:\n%s", buf[:n])
Bug 15: Parsing gctrace as a stable metrics source
// scrapes gctrace lines into Prometheus
re := regexp.MustCompile(`gc \d+ @[\d.]+s (\d+)%`)
// breaks after a Go upgrade changes the line format
Bug: The gctrace text format is not a stable API. It changes between Go releases. A dashboard built on parsing it silently breaks (or produces wrong numbers) after a toolchain upgrade.
Fix: read the structured, stable runtime/metrics API instead:
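A minimal sketch of reading GC figures through runtime/metrics. readUint64Metric is a hypothetical helper; the two metric names are real runtime/metrics keys, though which ones are available depends on the Go version in use, so the helper reports absence instead of assuming presence.

```go
package main

import (
	"fmt"
	"runtime/metrics"
)

// readUint64Metric reads a single uint64-valued runtime metric by name.
// It reports false if the running Go version does not provide the metric
// or the metric has a different kind.
func readUint64Metric(name string) (uint64, bool) {
	sample := []metrics.Sample{{Name: name}}
	metrics.Read(sample)
	if sample[0].Value.Kind() != metrics.KindUint64 {
		return 0, false
	}
	return sample[0].Value.Uint64(), true
}

func main() {
	for _, name := range []string{
		"/gc/cycles/total:gc-cycles",
		"/gc/heap/live:bytes", // may be absent on older Go versions
	} {
		if v, ok := readUint64Metric(name); ok {
			fmt.Printf("%s = %d\n", name, v)
		} else {
			fmt.Printf("%s is not available\n", name)
		}
	}
}
```

An exporter would call a helper like this on a timer and publish the values, which keeps the dashboard independent of the trace line format.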
Use gctrace for ad-hoc human investigation only; never as a programmatic data source.
Bug 16: Shipping a dirty-tree build to production
func main() {
	info, _ := debug.ReadBuildInfo()
	log.Printf("revision=%s", revisionOf(info)) // logs a commit hash, looks fine
	// ... but vcs.modified=true and nobody checks it
}
Bug: The logged vcs.revision looks authoritative, but vcs.modified=true means the binary was built from a tree with uncommitted changes. The revision does not fully describe the binary — it is unreproducible — and this build reached production unnoticed.
Fix: check vcs.modified and refuse/flag dirty builds:
for _, s := range info.Settings {
	if s.Key == "vcs.modified" && s.Value == "true" {
		log.Fatal("refusing to run a dirty-tree build in production")
	}
}
Better, enforce it in the deploy pipeline, not just at runtime.
Bug 17: Exposing the full dependency list publicly
func versionHandler(w http.ResponseWriter, r *http.Request) {
	info, _ := debug.ReadBuildInfo()
	fmt.Fprint(w, info.String()) // dumps every module and version, unauthenticated
}
Bug: BuildInfo.String() (and info.Deps) lists every dependency and its exact version. Served unauthenticated on the public internet, it is a ready-made inventory for matching your dependencies against known CVEs.
Fix: expose only a short revision publicly; gate the full build info behind authentication:
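func versionHandler(w http.ResponseWriter, r *http.Request) {
	info, ok := debug.ReadBuildInfo()
	if !ok {
		http.Error(w, "build info unavailable", http.StatusServiceUnavailable)
		return
	}
	if !authorized(r) {
		fmt.Fprintln(w, shortRevision(info)) // commit only
		return
	}
	fmt.Fprint(w, info.String()) // full module list, authenticated callers only
}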
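The handler calls shortRevision without defining it. One possible shape, as a sketch: read the vcs.revision build setting and cut it to a short prefix. The 12-character cut-off, the "unknown" fallback, and the truncate helper are assumptions made for illustration.

```go
package main

import (
	"fmt"
	"runtime/debug"
)

// truncate returns at most the first n bytes of s.
func truncate(s string, n int) string {
	if len(s) > n {
		return s[:n]
	}
	return s
}

// shortRevision returns a short prefix of the vcs.revision build setting,
// or "unknown" when build info or the setting is missing.
func shortRevision(info *debug.BuildInfo) string {
	if info == nil {
		return "unknown"
	}
	for _, s := range info.Settings {
		if s.Key == "vcs.revision" && s.Value != "" {
			return truncate(s.Value, 12)
		}
	}
	return "unknown"
}

func main() {
	info, _ := debug.ReadBuildInfo() // a nil info is handled inside shortRevision
	fmt.Println("revision:", shortRevision(info))
}
```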
Bug 18: Treating non-default-behavior zero as proof of safety
// pre-upgrade check
if nonDefaultBehavior("/godebug/non-default-behavior/panicnil:events") == 0 {
	raiseGoLineTo("1.21") // "nothing depends on the old behavior"
}
Bug: A zero counter means the old code path was not exercised in this process's lifetime, not that no code depends on it. A panic(nil) reachable only under rare inputs may never have run during the short check window — so the counter is zero while the dependency is real.
Fix: treat zero as "no evidence of reliance," not "proof of safety." Run the check across representative load and a long window, combine it with tests and code review, and raise the go line as a staged, reversible change with pins available as a bridge.
Bug 19: madvdontneed confusion: RSS "leak" that isn't
$ ./app # RSS sits at 800 MiB long after a spike freed most of it
# team concludes there's a memory leak and adds FreeOSMemory everywhere
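Bug: On Linux, Go 1.12 through 1.15 defaulted to MADV_FREE, which lets freed pages count against RSS until the kernel reclaims them under pressure; newer versions behave the same way when run with GODEBUG=madvdontneed=0. The memory is logically free; RSS just lags. Concluding "leak" and sprinkling FreeOSMemory adds expensive full GCs to fight a non-problem.
Fix: if RSS must reflect freed memory promptly (e.g., for cgroup accounting), run with GODEBUG=madvdontneed=1 so the runtime uses MADV_DONTNEED. That has been the default since Go 1.16, so on a current toolchain first check that nothing sets madvdontneed=0.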
Confirm it's not a real leak first via heap profiling (pprof), not RSS.
Bug 20: Misspelled GODEBUG setting, silently ignored
$ GODEBUG=gctarce=1 ./app # typo: gctarce instead of gctrace
# no GC output; engineer assumes the program has no GC activity
Bug: Unknown GODEBUG names are silently ignored: no error, no warning. A misspelled gctrace therefore produces no output at all, which leads to the false conclusion that GC isn't running.
Fix: verify the exact name against the docs / GODEBUG history table, and double-check the spelling whenever expected output is missing. The correct invocation is GODEBUG=gctrace=1 ./app.
Bug 21: Crash traceback lost because only stderr was used
func main() {
	// logs go through a buffered, async logger
	log.SetOutput(asyncBufferedWriter)
	runServer() // a goroutine panics fatally; the traceback never reaches durable storage
}
Bug: A fatal crash's traceback goes to stderr via the runtime's crash path, which may be lost or truncated when the process dies — especially if stderr is wired into a buffered/async logger that never flushes. The most important diagnostic is exactly the one lost.
Fix: route crash output to a durable sink with SetCrashOutput (Go 1.23) and widen the traceback:
f, _ := os.OpenFile("/var/log/app/crash.log",
os.O_CREATE|os.O_WRONLY|os.O_APPEND, 0o644)
debug.SetCrashOutput(f, debug.CrashOptions{})
debug.SetTraceback("all")
A sidecar tails crash.log; the traceback now survives the crash.
Bug 22: Bumping the go line and toolchain in one unreviewed commit
Committed as part of "chore: update Go", merged with a rubber stamp. A week later, TLS handshakes to a legacy partner start failing because a go line of 1.22 or later drops the RSA key-exchange cipher suites by default.
Bug: Raising the go line flips every compatibility default at once. Conflating it with a routine toolchain update hides a real behavior change (here, tlsrsakex defaulting off) behind an innocuous-looking diff, with no instrumentation or canary.
Fix: treat go-line changes as a distinct risk class. Upgrade the toolchain first (behavior-neutral), then raise the go line as its own reviewed change, instrumented with the non-default-behavior counters and rolled out via canary. If needed, pin individual settings as a temporary bridge, for example with //go:debug tlsrsakex=1 in package main.
Summary
GODEBUG and runtime/debug are small surfaces with sharp edges. Most bugs come from one of four habits:
- Forgetting GODEBUG is startup-only and external. It cannot be changed on a running process, unknown names are silently ignored, traces go to stderr, and //go:debug only works in package main with no detaching blank line.
- Mishandling the soft memory limit. It is a target, not a cap; set below the working set it causes a GC death spiral (invisible to RSS alerts); it ignores cgo/mmap memory; and an in-code call silently overrides the operator's GOMEMLIMIT.
- Misusing the controls. SetGCPercent(-1) in a long-lived service leaks memory; FreeOSMemory on a hot path is ruinous; debug.Stack() is one goroutine, not all.
- Misunderstanding the compatibility system. The go line (not the toolchain) selects defaults; pinned settings expire; non-default-behavior zero is not proof of safety; and go-line bumps are behavior changes deserving review.
Read structured runtime/metrics, not gctrace text; check ok from ReadBuildInfo; never ship a dirty build or expose the full dependency list; and route fatal crashes to durable storage. With those habits the rest is straightforward.