Context — Find the Bug
1. How to use this file
Fourteen buggy snippets of context usage in Go: cancel leaks, value-key collisions, struct-stored contexts, lost deadlines, busy spins, double cancels, nil-ctx panics, WithoutCancel misuse. Read each in 30-60 seconds, decide where the defect is, then check the Answer that follows it.
Context bugs almost never blow up on the happy path. They silently leak a goroutine because nobody called cancel. They quietly drop a deadline because a leaf re-rooted on Background(). They collide on a string value key the moment a second package picks the same word. Four questions to ask every snippet:
- Does every `WithCancel`/`WithTimeout`/`WithDeadline` have a `defer cancel()` reachable on every path?
- Is the parent passed through unmodified, with no `context.Background()` or `context.TODO()` substituted in the middle of a call chain?
- Is the context ever stored on a long-lived struct or shared past the caller's return?
- Are value keys unexported named types rather than bare strings, and is a `nil` ctx ruled out at every call site?
If a snippet fails any of the four, there's a bug. Every diagnosis below cites `context/context.go` from the Go 1.22 standard library by line number, so you can confirm the source-level reason rather than trusting the prose. Line numbers shift between releases, so check them against 1.22.
Bug 1 — WithCancel returned but cancel never called (context leak)
```go
import "context"

func fetchUser(parent context.Context, id string) (*User, error) {
	ctx, _ := context.WithCancel(parent) // BUG: cancel discarded
	return store.Get(ctx, id)
}
```
Answer
**Bug:** `WithCancel` returns `(ctx, cancel)`, and discarding `cancel` leaks. `withCancel` (context.go:273) builds a `cancelCtx` and calls `c.propagateCancel(parent, c)` (context.go:475). When `parent` is itself a `cancelCtx` (or derives from one), `propagateCancel` registers the new child in `parent.children` (the `map[canceler]struct{}` at context.go:436). Only `c.cancel(removeFromParent=true, ...)` at context.go:549 removes that registration. If nobody calls the returned `cancel`, the child stays in the parent's map until the parent itself is cancelled. When the parent is a custom `Context` implementation rather than a stdlib one, `propagateCancel` instead starts a goroutine (counted by the `goroutines` test counter at context.go:371) that blocks on the parent's and the child's `Done` channels. Without `cancel`, that goroutine lives as long as the parent. Against a long-lived parent that is never cancelled, either form of the leak is effectively permanent. A `Background()` parent never registers children, because its `Done()` is `nil`, so the leak needs a cancelable parent. With `WithTimeout`/`WithDeadline`, the pending timer also keeps the child alive until it fires.

**Why subtle:** Nothing crashes, and the function returns the right `User`. The leak shows up only under load, as slow memory (or goroutine) growth and eventually an OOM. The signal is in `runtime/pprof` heap and goroutine profiles, not in unit tests.

**Spot:** `go vet` ships the `lostcancel` analyzer for exactly this. Any line of the form `ctx, _ := context.WithCancel(...)`, `ctx, _ := context.WithTimeout(...)` or `ctx, _ := context.WithDeadline(...)` is wrong. The underscore is the bug.

**Fix:** Keep `cancel` and `defer cancel()` right after the call. `cancel()` is idempotent (see Bug 10), so the unconditional `defer` is always safe.

**Why common:** Callers see `cancel` as "for the error path" and discard it on the happy path. The point of `cancel` is to free the parent registration even when nothing went wrong: the cleanup is the point, not the abort.

Bug 2 — context.WithValue using a bare string key
```go
const userIDKey = "user_id"

func WithUser(ctx context.Context, id string) context.Context {
	return context.WithValue(ctx, userIDKey, id) // BUG: string key
}

func User(ctx context.Context) string {
	s, _ := ctx.Value(userIDKey).(string)
	return s
}
```
Answer
**Bug:** `context.WithValue` stores the pair in a `valueCtx{key, val any}` (context.go:742). Lookup walks the chain comparing keys with `==` (context.go:768). A bare `string` key equals *any other* `string` with the same characters. The moment a second package (middleware, an auth library, a logging context) also picks `"user_id"`, the keys collide. Whichever wrapped last wins, and the other side reads the wrong value. The doc comment on `WithValue` (context.go:715-727) warns explicitly: "The provided key must be comparable and should not be of type `string` or any other built-in type to avoid collisions between packages using context."

**Why subtle:** Within one package the bug is invisible: your one string matches your one string. The collision only fires when two packages converge in one request pipeline. The wrong reader gets *a* value of the *right* type, so the type assertion succeeds and the value is still wrong.

**Spot:** Any `context.WithValue(ctx, "...", ...)` literal. Any `const fooKey = "..."` paired with `ctx.Value(fooKey)`. `staticcheck`'s `SA1029` flags it.

**Fix:** Use an unexported named type, because type identity makes the key unforgeable across packages. An empty struct type (`type userIDKey struct{}`) has zero size, and because it is package-private, no other package can construct a value of it. The `WithValue` doc recommends the same: context keys "often have concrete type struct{}". The stdlib's own private key follows the same idea in a different shape: it uses `&cancelCtxKey`, the address of an unexported `int` variable (context.go:374).

**Why common:** Strings are the obvious key type for a key/value bag. The collision footgun is not visible from the call site, and discovering it requires reading the package doc. The lint rule is the cheapest signal.

Bug 3 — Storing context.Context in a struct field
```go
type Service struct {
	ctx context.Context // BUG: long-lived field
	db  *DB
}

func NewService(ctx context.Context, db *DB) *Service {
	return &Service{ctx: ctx, db: db}
}

func (s *Service) Find(id string) (*User, error) {
	return s.db.Get(s.ctx, id) // uses whatever ctx was at construction
}
```
Answer
**Bug:** A context is *per-call*, not per-object. The package documentation (context.go:36-44) is explicit: "Do not store Contexts inside a struct type; instead, pass a Context explicitly to each function that needs it. The Context should be the first parameter, typically named ctx." Storing one freezes a single request's deadline, values, and cancellation channel into a long-lived object. Later callers, possibly serving other requests, inherit the *constructor's* ctx. Once the constructor's request finishes and its `cancel` runs, every subsequent `Find` sees `ctx.Err() == context.Canceled`. If `db.Get` honours its context, that call fails immediately.

**Why subtle:** Tests construct `NewService(context.Background(), db)` and pass forever, since `Background()` never cancels and `s.ctx` stays valid. Production wires `NewService(reqCtx, db)`. Once that first request returns, every later `Find` fails with `context canceled`.

**Spot:** Any struct field of type `context.Context`. Any constructor that takes a context and stashes it. The `containedctx` linter (available in golangci-lint) checks exactly this.

**Fix:** Take ctx as the first parameter of every method that needs one, and keep only long-lived dependencies such as `db` on the struct. The one well-known exception is a type whose entire identity is one request. `*http.Request` carries its context in an unexported field and exposes it only through `req.Context()`.

**Why common:** "Threading ctx through every method is ugly." The verbosity is deliberate: ctx is *information about the call*, not about the object. Putting it on the struct converts call-scoped state into object-scoped state, and the request lifecycle stops matching the object lifecycle.

Bug 4 — context.Background() substituted at a leaf (deadline lost)
```go
func handler(ctx context.Context, w http.ResponseWriter, r *http.Request) {
	// ctx came from a WithTimeout(parent, 5*time.Second) higher up the stack.
	rows, err := db.QueryContext(context.Background(), "SELECT ...") // BUG
	...
}
```
Answer
**Bug:** The outer caller wrapped a deadline, and the leaf threw it away by re-rooting on `context.Background()` (context.go:215). `db.QueryContext` now blocks until the database responds, and the 5-second SLA the caller assumed is silently gone. `context.TODO()` at the leaf has the same defect (see Bug 9). `Background()` is documented (context.go:209-214) as the root for `main`, initialization, tests, and the top-level context of incoming requests. It has no deadline, no `Done` channel, and no values: the `emptyCtx` methods at context.go:181-197 all return zero values. Substituting it mid-call drops every cancellation signal and every value the parent set.

**Why subtle:** The query works and the result is correct. Tests pass because they rarely exercise a slow database. Production breaks when the database *is* slow. The timeout the handler thought it had isn't there, and the request hangs until something further out times out (load balancer, client).

**Spot:** Any `context.Background()` or `context.TODO()` inside a function whose signature already includes `ctx context.Context`. The `contextcheck` linter flags it. Rule of thumb: if the function takes a ctx, it does not create one.

**Fix:** Pass the ctx you received, here `db.QueryContext(ctx, "SELECT ...")`. If the leaf genuinely should outlive the request (background flush, fire-and-forget), use `context.WithoutCancel(ctx)` (Go 1.21+, context.go:585), which keeps values but detaches cancellation. That is a deliberate, auditable severance, unlike happening to type `Background()`. See Bug 14 for why the detached context still needs its own bound.

**Why common:** There are two reasons. (1) Copy-paste from `main`, where `Background()` is the right answer. (2) "I want this to run regardless", but running regardless of caller cancellation is rarely the actual intent. The intent is usually "with the same deadline as the caller", which needs no special code: just pass `ctx`.

Bug 5 — Goroutine selects on ctx.Done() but never returns
```go
func worker(ctx context.Context, jobs <-chan Job) {
	for {
		select {
		case <-ctx.Done():
			// BUG: no return; falls through, loops forever
		case j := <-jobs:
			process(j)
		}
	}
}
```
Answer
**Bug:** `<-ctx.Done()` is a receive on a channel that is permanently ready once `cancel()` fires. The `select` arm matches and its body is empty, so the outer `for` restarts the `select`. `Done` is *still* ready (it was closed, not sent on), so the arm matches again: an instant busy loop pegging one CPU core. `cancelCtx.Done()` (context.go:448-461) lazily allocates the channel once. `cancel` then either closes it (context.go:564-568) or, if `Done()` was never called, stores the package's pre-closed `closedchan` in its place. Either way, every receive after cancellation returns the zero value immediately. That is the close-as-broadcast pattern working as intended.

**Why subtle:** Before cancellation the loop is well-behaved. After cancellation the goroutine never exits. When `jobs` has work, `select` picks pseudo-randomly among ready cases, so the worker may even keep processing jobs. When `jobs` is empty, it spins on `Done`. The process looks alive and nothing panics. Symptom in production: one core at 100% and a goroutine that outlives its context.

**Spot:** Any `case <-ctx.Done():` arm whose body doesn't `return` or leave the enclosing loop with a labeled `break`. A bare `break` inside `select` only exits the `select`. The `Done` channel is a one-shot broadcast: you receive from it once to *learn* of cancellation, then leave.

**Fix:** Return from the `Done` arm, typically with `ctx.Err()`. If cleanup is needed (draining a buffered channel, logging `ctx.Err()`, closing downstream channels), do it on the way out, in the arm before returning or in a `defer`. Either way, you must *leave the loop*.

**Why common:** An empty arm is harmless on a channel that delivers one value per send, because the next iteration blocks again. `Done` doesn't behave that way: a close is permanent, so an empty arm after cancellation is exactly the wrong shape.

Bug 6 — Forgot to check ctx.Err() after <-ctx.Done()
```go
func runJob(ctx context.Context) error {
	done := launchAsync()
	select {
	case <-done:
		return nil
	case <-ctx.Done():
		return errors.New("job aborted") // BUG: hides Canceled vs DeadlineExceeded
	}
}
```
Answer
**Bug:** `ctx.Done()` closes for two distinct reasons: the caller cancelled (`context.Canceled`, context.go:167) or a deadline expired (`context.DeadlineExceeded`, context.go:171). The caller of `runJob` wants to *know which*. Canceled means "the user gave up, don't retry"; DeadlineExceeded means "the server was too slow, a retry with backoff is reasonable". A bare `errors.New(...)` collapses both into one opaque string. `ctx.Err()` returns exactly one of those two sentinel errors: see context.go:463 for `cancelCtx`, and context.go:679, where `timerCtx` sets the deadline variant in its cancel path. The error is set under the lock at context.go:560-573, before the channel is closed. The information is *there*; the bug is throwing it away.

**Why subtle:** In the unit test you wrote, "job aborted" is the expected outcome. The retry layer two functions up can't tell Canceled from DeadlineExceeded, so it retries everything or nothing, and neither is correct.

**Spot:** Any `case <-ctx.Done(): return` whose error is neither `ctx.Err()` nor an error wrapping it.

**Fix:** Return `ctx.Err()`, or wrap it with `%w` (for example `fmt.Errorf("job aborted: %w", ctx.Err())`) so callers can test `errors.Is(err, context.DeadlineExceeded)`.

Bug 7 — context.WithTimeout(parent, 0) (immediate cancel)
```go
func quickCheck(parent context.Context, attempt int) error {
	d := time.Duration(attempt) * 100 * time.Millisecond
	ctx, cancel := context.WithTimeout(parent, d) // BUG: d == 0 on attempt 0
	defer cancel()
	return ping(ctx)
}
```
Answer
**Bug:** `WithTimeout(parent, 0)` resolves to `WithDeadline(parent, time.Now())` (context.go:703-708). `WithDeadline` (context.go:625) computes `dur := time.Until(d)`. If `dur <= 0`, it calls `c.cancel(true, DeadlineExceeded, ...)` *immediately* (context.go:651), before returning. The returned ctx reports `Err() == context.DeadlineExceeded` from the first instant. As a result, every call to `quickCheck` with `attempt == 0` gets a pre-expired context, and `ping(ctx)` short-circuits on its first `ctx.Done()` check with `DeadlineExceeded`. The function looks like it ran a check; it actually ran nothing.

**Why subtle:** `0` reads as "use the default" or "no timeout", and neither is right. Zero is a valid `Duration` meaning zero nanoseconds. Any positive duration, from `time.Millisecond` to `time.Hour`, yields a context that stays live until its deadline. Zero and negative durations yield one that is already expired.

**Spot:** Any `WithTimeout` whose duration comes from arithmetic that can yield 0 or a negative value. Examples: `time.Until(t)` with `t` in the past, `attempt * unit` with `attempt == 0`, or a configuration default of `0` meant as "infinite".

**Fix:** Validate at the boundary. If `d <= 0` means "no deadline", call `WithCancel(parent)` instead. If `d <= 0` is a configuration error, return an error.

**Why common:** Code that treats `0` as "unset" (env vars, flag defaults) collides with `WithTimeout`'s "use exactly this duration" contract. The two conventions need explicit translation.

Bug 8 — context.WithDeadline(parent, time.Time{}) or a past time
```go
func runUntil(parent context.Context, until time.Time) error {
	ctx, cancel := context.WithDeadline(parent, until) // BUG when until is past/zero
	defer cancel()
	return loop(ctx)
}
```
Answer
**Bug:** Same root cause as Bug 7, different surface. `WithDeadline` computes `dur := time.Until(d)` (context.go:649). If `d` is `time.Time{}` (the zero time, January 1 of year 1) or any past instant, `dur` is negative, and for the zero time hugely so. The returned context is already at `DeadlineExceeded`: context.go:651-657 calls `c.cancel(true, DeadlineExceeded, cause)` *before* `WithDeadline` returns. No timer is started, because the `if dur <= 0` branch returns early. The returned `cancel` belongs to an already-cancelled context.

A second, milder trap is passing a deadline *later* than the parent's. context.go:638-642 checks `if cur, ok := parent.Deadline(); ok && cur.Before(d) { return WithCancel(parent) }`, so the function silently downgrades to `WithCancel` because the parent's earlier deadline already bounds it. This is not a bug, but it is surprising if you expected the longer deadline to win.

**Why subtle:** This has the same shape as Bug 7: the function appears to do something with a deadline but actually short-circuits. `time.Time{}` is a common defaulted-but-never-set value from JSON/YAML configs, so an `Until time.Time` field that arrived as zero turns every call into an instant `DeadlineExceeded`.

**Spot:** Any `WithDeadline` whose `d` can be unset, zero, or past. Any deadline computed from another timestamp (`startTime.Add(timeout)`) where `startTime` could be far in the past.

**Fix:** Validate `d` before calling `WithDeadline`: reject `d.IsZero()` and any `d` that is not `After(time.Now())`. If the input comes from configuration, validate at parse time, not at use time. The earlier you reject, the closer the error is to the human who can fix it.

**Why common:** A zero `time.Time` looks like "no deadline" to anyone unfamiliar with the type. It is in fact "deadline in year 1". That is in the past, which means immediate cancellation.

Bug 9 — context.TODO() left in production code
```go
func ChargeCard(amount int64) error {
	return paymentsAPI.Charge(context.TODO(), amount) // BUG: shipped to prod
}
```
Answer
**Bug:** `context.TODO()` (context.go:223-229) returns a `todoCtx{}` value that behaves identically to `Background()`: no deadline, no values, no cancellation. The doc comment is explicit: "TODO returns a non-nil, empty Context. Code should use context.TODO when it's unclear which Context to use or it is not yet available (because the surrounding function has not yet been extended to accept a Context parameter)." Its `String()` method returns `"context.TODO"` (context.go:207), so the marker also survives when a context is printed. TODO itself isn't broken; it is a marker for "we haven't decided yet". When that marker ships to production, the caller's deadline, cancellation, and request-scoped values all silently disappear at this leaf.

**Why subtle:** It behaves identically to `Background()`. The function works; cancellation just doesn't reach it. The only signal is the literal token `TODO` in the source.

**Spot:** `grep -rn 'context.TODO()' .` Every hit is a code-review question: "did you finish threading ctx through, or is this still a stub?"

**Fix:** Thread `ctx` through: add a `ctx context.Context` first parameter to `ChargeCard` and pass it to `paymentsAPI.Charge`. If you genuinely have nothing to thread from (a global init, a fire-and-forget background loop), use `Background()`, deliberately. `Background()` says "this is a root"; `TODO()` says "I haven't decided". Production code should never say the second.

**Why common:** `TODO()` is the easiest way to compile a call from a function that doesn't yet take ctx. It is meant as temporary scaffolding. The bug is when the scaffold becomes permanent because nobody opened the PR that adds the `ctx context.Context` parameter.

Bug 10 — Calling cancel() twice (works, but reads as bug)
```go
func runQuery(parent context.Context) error {
	ctx, cancel := context.WithCancel(parent)
	defer cancel()
	err := doWork(ctx)
	if err != nil {
		cancel() // BUG: redundant, misleading
		return err
	}
	return nil
}
```
Answer
**Bug:** This is not a runtime bug: `cancelCtx.cancel` (context.go:549) is explicitly idempotent. The first thing it does after taking the lock is `if c.err != nil { c.mu.Unlock(); return }` (context.go:557-560), so the second call is a no-op. The bug is *readability*. The explicit `cancel()` before `return err` reads as "this matters". A reader then wonders what the `defer cancel()` is for, and starts doubting every other `defer cancel()` in the codebase.

Worse, the pattern invites a real bug. Suppose a later refactor removes the `defer` on the theory that the explicit `cancel()` handles it. The success path, which has no explicit call, then leaks, and you are back at Bug 1. The redundancy looked load-bearing for the wrong reason.

**Why subtle:** The program is correct, and idempotence means no harm at runtime. The harm is to the next reader.

**Spot:** Any `cancel()` call outside a `defer`. There are legitimate ones, such as a goroutine that wants to stop its child early on success, but they should be rare and commented.

**Fix:** Trust the `defer` and delete the explicit call. If the explicit cancel matters, say so in a comment next to it. For example, it may be needed to release the parent's child slot *before* a long post-work cleanup.

**Why common:** A "belt and braces" instinct. The runtime is forgiving here; the reader is not.

Bug 11 — Passing nil as context
```go
func ChargeCard(ctx context.Context, amount int64) error {
	return paymentsAPI.Charge(ctx, amount)
}

// caller
err := ChargeCard(nil, 100) // BUG: nil Context
```
Answer
**Bug:** `Context` is an interface whose zero value is `nil`. Its methods (context.go:71-153), `Deadline()`, `Done()`, `Err()` and `Value()`, are called through the interface. A method call on a nil interface value panics with `runtime error: invalid memory address or nil pointer dereference`. The first call inside `paymentsAPI.Charge` that does `ctx.Done()` or `ctx.Value(...)` blows up. The stack trace points at that call site, far from the caller that planted the nil. The package documentation is explicit: "Do not pass a nil Context, even if a function permits it. Pass context.TODO if you are unsure about which Context to use." The standard library treats nil as a programming error. `context.WithCancel(nil)` and its siblings panic with "cannot create context from nil parent", and `http.NewRequestWithContext` returns a `net/http: nil Context` error.

**Why subtle:** Some functions tolerate a nil ctx for compatibility (older libraries, legacy methods); most don't. Whether a given API panics on nil depends on its implementation. Passing the test therefore depends on whether the path the test exercises happens to touch ctx.

**Spot:** Any literal `nil` in a position typed `context.Context`; `staticcheck`'s `SA1012` flags it. Also any caller of a `ctx context.Context` function that hasn't received a ctx itself. The right answer there is to accept one as a parameter, never to fabricate `nil`.

**Fix:** Use `context.Background()` or `context.TODO()`; if the call site has its own ctx, use that. The point is: never `nil`. Typing `context.Background()` takes two seconds and saves a runtime panic.

**Why common:** Test code that doesn't care about ctx passes `nil` to satisfy the compiler. Glue code where ctx isn't wired yet passes `nil` because it compiles. Both surface in production the first time the callee touches ctx.

Bug 12 — Race between cancel() and <-ctx.Done() on a custom Context
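```go
// Custom Context wrapping a real context and adding its own cancelled flag.
// closedchan stands for a package-level, already-closed chan struct{}; the
// stdlib's closedchan is unexported, so user code would declare its own.
type tracingCtx struct {
	context.Context
	cancelled atomic.Bool // from sync/atomic
}

func (c *tracingCtx) Done() <-chan struct{} {
	if c.cancelled.Load() {
		return closedchan
	}
	return c.Context.Done()
}

// A cancel func sets c.cancelled concurrently with another goroutine
// reading <-ctx.Done().
```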
Answer
**Bug:** Custom `Context` implementations must honour the `Done` contract in the interface documentation: "Successive calls to Done return the same value." The close of that one channel is what wakes waiters ("The close of the Done channel may happen asynchronously, after the cancel function returns"). The stdlib `cancelCtx.cancel` (context.go:549-583) handles this carefully. Under one lock it sets `c.err = err`. It then either closes the channel that `Done()` already handed out (context.go:567) or, if none was allocated yet, stores the pre-closed `closedchan` (context.go:565). Every observer sees one consistent transition from open to closed, whether it called `Done()` before or after cancellation.

The `tracingCtx` above uses two separate signals, an `atomic.Bool` and the embedded ctx's `Done` channel, and nothing ties them together. `Done()` returns different channels before and after the flag flips, which breaks the "same value" rule. A goroutine that called `Done()` before the flag was set is blocked on the embedded channel. Setting `cancelled` closes nothing, so that waiter is never woken. A reader can also see `cancelled.Load() == false`, fall through to `c.Context.Done()`, and wait on a channel that the custom cancellation never closes. The stdlib is safe because `cancelCtx` owns the channel *and* the error and gates both behind one mutex (`c.mu` at context.go:432). Custom implementations rarely replicate that ordering and end up with a check-then-act (TOCTOU) race.

**Why subtle:** Every access goes through `atomic.Bool`, so there is no data race and `-race` stays silent; this is a logic race. Readers that call `Done()` after the flag flips work fine. The bug therefore surfaces only as flaky tests or a rare missed cancellation under contention, hitting whichever goroutine started waiting first.

**Spot:** Any `type X struct { context.Context; ... }` where `X` overrides `Done()` or `Err()`. Any custom Context that stores cancellation state in more than one variable.

**Fix:** Don't write a custom Context. If you must, embed `context.Context` and *forward*, letting the underlying ctx own the synchronisation. If you need to *add* cancellation, chain via `context.WithCancel(parent)` so the stdlib owns the close. Don't roll your own.
**Why common:** "I just want to add a field" leads to embedding `context.Context` and overriding one method. The override breaks the concurrency guarantee every Context must provide; the package documentation states that "Contexts are safe for simultaneous use by multiple goroutines". The escape hatch is to compose rather than override.

Bug 13 — Returning a child context that must outlive the function that created it
```go
func openSession(parent context.Context) context.Context {
	ctx, cancel := context.WithTimeout(parent, 30*time.Second)
	defer cancel() // BUG: cancels before caller can use ctx
	return ctx
}

// caller
sessionCtx := openSession(reqCtx)
go process(sessionCtx) // ctx is already cancelled
```
Answer
**Bug:** `defer cancel()` runs when `openSession` returns, so the returned `ctx` is *already cancelled* by the time the caller receives it. `ctx.Err()` is `Canceled` (set at context.go:560 inside `c.cancel`), and `ctx.Done()` is closed (context.go:564-568). The downstream `process(sessionCtx)` gets a dead context.

The deeper bug is *ownership*. `WithCancel`/`WithTimeout`/`WithDeadline` return `(ctx, cancel)` together because cancellation responsibility travels with the context. Whoever holds `cancel` must call it, and whoever uses `ctx` must finish before `cancel` is called. Hiding `cancel` behind a `defer` inside the constructor severs that ownership: the caller has no way to keep ctx alive *and* no way to clean up. The symmetric mistake is returning ctx *without* its cancel and expecting the caller to "deal with it". The caller can't call a cancel they never received, so the ctx leaks as in Bug 1 (for `WithTimeout`, until the timer fires).

**Why subtle:** It compiles, and the caller gets a real, non-nil Context. Code that never checks `Done()` or `Err()`, such as fast in-memory fakes in tests, keeps working. Anything that honours ctx, such as a goroutine selecting on `Done()` or an I/O call, fails immediately with `context.Canceled`.

**Spot:** Any function that returns a `context.Context` it created via `WithCancel`/`WithTimeout`/`WithDeadline`. Either the function also returns `cancel`, or it shouldn't be creating ctx in the first place.

**Fix:** Return both `ctx` and `cancel` and let the caller own the lifecycle. This is exactly the shape the stdlib uses: `WithCancel`, `WithTimeout`, `WithDeadline` and `WithCancelCause` all return the pair. If you can't return the pair (e.g., the function returns a different, higher-level type), store `cancel` on that type and call it from `Close`.

**Why common:** "I want a one-liner that returns ctx." The shape `func(...) context.Context` is shorter than `func(...) (context.Context, context.CancelFunc)`, and the lifecycle break is invisible from the call site.

Bug 14 — Misuse of context.WithoutCancel (Go 1.21+): losing required cancellation
```go
func writeAuditLog(ctx context.Context, event Event) {
	bgCtx := context.WithoutCancel(ctx)
	go func() {
		// BUG: bgCtx never times out, never cancels; runs forever on a slow audit sink
		if err := auditSink.Write(bgCtx, event); err != nil {
			log.Printf("audit: %v", err)
		}
	}()
}
```
Answer
**Bug:** `context.WithoutCancel` (context.go:580-590) returns a `withoutCancelCtx` (context.go:592). Its `Done()` returns `nil` (context.go:600-602), its `Err()` returns `nil` (context.go:604-606), and its `Deadline()` reports no deadline. It preserves *values*, since its `Value` walks the parent chain (context.go:608-611), but strips *cancellation* entirely. By design, the result never fires `Done`, never reports `Err`, and never times out. That is correct for "this audit write must outlive the request", the use case it was built for. The bug here is using it *without adding a new deadline*. The audit sink can hang forever and the goroutine leaks; while the sink stays slow, every request leaks one more goroutine. When you want to detach from the request *and* still have a bound, combine the two: detach with `WithoutCancel`, then re-bound with `WithTimeout`.

**Why subtle:** Tests pass because audit writes complete quickly in tests. In production the audit sink degrades. Requests still succeed, because they don't wait for the audit, but the process accumulates goroutines blocked on the sink. Memory creeps, `runtime.NumGoroutine()` and the goroutine profile climb into the millions, and eventually the process dies.

**Spot:** Any `context.WithoutCancel(ctx)` whose result is used directly for I/O without further wrapping. The function is a tool for *re-rooting*; the re-rooted context still needs its own bounds.

**Fix:** Wrap the detached context with `bgCtx, cancel := context.WithTimeout(context.WithoutCancel(ctx), 5*time.Second)`, and `defer cancel()` inside the goroutine. The goroutine is then bounded by a 5-second deadline of its own, independent of the request's. Values flow through; cancellation is fresh.

**Why common:** `WithoutCancel` was added in Go 1.21 to "preserve values, drop cancellation". Many adopters read it as "make this run in the background" and miss that "background" should still mean "with *some* deadline". The function is a building block, not a complete solution.

Summary
These bugs cluster into four families.
Lifecycle (1, 10, 13, 14): discarding cancel, calling it twice, returning ctx without its cancel, using WithoutCancel without re-bounding. When the parent is cancelable, the (ctx, cancel) pair from withCancel (context.go:273) registers the child in the parent's cancelCtx.children map (context.go:436), and only cancel(true, ...) (context.go:549) removes it. Lose ownership and you leak the registration; strip cancellation entirely and you never get it back.
Value semantics (2, 3, 11): string keys collide, struct fields freeze one request's ctx into a long-lived object, and a nil ctx panics on the first method call. The package documentation (context.go:36-44) and the WithValue doc (context.go:715-727) state the rules. The linter ecosystem (go vet, staticcheck, containedctx, contextcheck) enforces them at CI time. For keys, the WithValue doc recommends an unexported type such as struct{}. The stdlib's own private key is &cancelCtxKey (context.go:374), the address of an unexported int variable. Either way, no other package can forge the key.
Cancellation semantics (5, 6, 7, 8, 12): a busy loop on Done, a swallowed ctx.Err(), immediate cancellation via a zero or past deadline, and races in custom Context implementations. Done() is a one-shot broadcast (closed at context.go:564-568); after the close, every receive returns the zero value forever. Err() (context.go:463) is how you learn why. WithDeadline (context.go:625) and WithTimeout (context.go:703) return an already-expired context when the deadline is not in the future. The constructor won't reject such inputs, so the caller must validate them.
Propagation (4, 9): substituting Background() or TODO() mid-stack throws the parent's deadline, cancellation, and values away. emptyCtx (context.go:181-197) returns zero from every method. Static rule: if the function already takes a ctx context.Context, the function does not construct a new root — it threads what it received.
Review checklist for any context-using PR:
- Does every `WithCancel`/`WithTimeout`/`WithDeadline`/`WithCancelCause` have a reachable `defer cancel()` on every return path? `go vet`'s `lostcancel` is the cheapest catcher.
- Are all `context.WithValue` keys unexported named types (e.g., `type userIDKey struct{}`), never `string` or any other built-in? `staticcheck`'s `SA1029` flags violations.
- Does any struct hold a `context.Context` field? If yes, refactor to pass ctx as the first parameter of each method instead. `containedctx` lints this.
- Does any function that receives a `ctx context.Context` create `context.Background()` or `context.TODO()` mid-body? `contextcheck` flags it. Replace it with the received ctx, or with `context.WithoutCancel(ctx)` if the detachment is deliberate.
- Does every `<-ctx.Done()` arm of a `select` `return`, or leave the surrounding loop with a labeled `break`? An empty arm is a busy spin.
- When `<-ctx.Done()` fires, does the error path return `ctx.Err()` (or wrap it with `%w`) so callers can check `errors.Is(err, context.DeadlineExceeded)`?
- Are all `WithTimeout` durations validated `> 0` and all `WithDeadline` deadlines validated `.After(time.Now())` before the call? Boundary inputs (env vars, JSON `0`, zero `time.Time`) must be screened.
- Are `nil` contexts forbidden at every call site? Use `context.Background()` (deliberate root) or `context.TODO()` (temporary, with a TODO comment); never `nil`.
- If a function returns a context it created, does it also return the matching `cancel`? Match the stdlib shape `(Context, CancelFunc)`.
- If `context.WithoutCancel` is used, is the detached context re-bounded with a fresh `WithTimeout` before any I/O?
- Are there any custom `Context` implementations? If so, do they compose via embedding without overriding `Done`/`Err`? If they do override, do they gate transitions atomically under a single mutex the way `cancelCtx` does (context.go:549-583)?