Goroutine Lifecycle — Find the Bug¶
Each section presents bugged code, asks you to find the lifecycle problem, then explains the bug and shows the fix. Read carefully — most bugs in this file are subtle and represent real production failures.
Table of Contents¶

- Bug 1: The Mystery of the Phantom Receivers
- Bug 2: `wg.Add` After `go`
- Bug 3: The Loop Variable Captures (Pre-1.22)
- Bug 4: Unbuffered Result Channel
- Bug 5: Recover in the Wrong Goroutine
- Bug 6: The Context Without `Done` Check
- Bug 7: `time.Tick` Lives Forever
- Bug 8: `LockOSThread` Without `UnlockOSThread`
- Bug 9: The Self-Joining WaitGroup
- Bug 10: The Reconnect Spawn Loop
- Bug 11: The Finalizer That Blocks
- Bug 12: Goroutine Started in Init
- Bug 13: The Forgotten `cancel`
- Bug 14: The `select` That Always Picks Default
- Bug 15: Spawning From a Spawn
Bug 1: The Mystery of the Phantom Receivers¶
    func fetchAll(urls []string) []Result {
        results := make(chan Result)
        for _, u := range urls {
            u := u
            go func() {
                results <- fetch(u)
            }()
        }
        collected := []Result{}
        for r := range results {
            collected = append(collected, r)
        }
        return collected
    }
Symptom. Function never returns.
Find the bug. Why does for r := range results never end?
Answer
`for r := range results` exits only when the channel is *closed*. Nothing in the code closes it. Each sending goroutine sends once and exits, but the receive loop blocks forever after receiving `len(urls)` values. **Fix:**

    func fetchAll(urls []string) []Result {
        results := make(chan Result, len(urls))
        var wg sync.WaitGroup
        for _, u := range urls {
            u := u
            wg.Add(1)
            go func() {
                defer wg.Done()
                results <- fetch(u)
            }()
        }
        go func() {
            wg.Wait()
            close(results)
        }()
        collected := []Result{}
        for r := range results {
            collected = append(collected, r)
        }
        return collected
    }
Bug 2: `wg.Add` After `go`¶
    func process(items []Item) {
        var wg sync.WaitGroup
        for _, it := range items {
            it := it
            go func() {
                wg.Add(1)
                defer wg.Done()
                handle(it)
            }()
        }
        wg.Wait()
        fmt.Println("done")
    }
Symptom. Sometimes "done" prints before all items are processed.
Find the bug.
Answer
`wg.Wait()` reads the counter when it runs. If `Wait()` runs before *any* of the goroutines have called `wg.Add(1)`, it sees counter = 0 and returns immediately. **Fix:** Always `wg.Add(1)` in the parent before `go`:

    func process(items []Item) {
        var wg sync.WaitGroup
        for _, it := range items {
            it := it
            wg.Add(1) // counted before the goroutine exists
            go func() {
                defer wg.Done()
                handle(it)
            }()
        }
        wg.Wait()
        fmt.Println("done")
    }

Bug 3: The Loop Variable Captures (Pre-1.22)¶
    func main() {
        var wg sync.WaitGroup
        for i := 0; i < 5; i++ {
            wg.Add(1)
            go func() {
                defer wg.Done()
                fmt.Println(i)
            }()
        }
        wg.Wait()
    }
Symptom. On Go 1.21, prints 5 5 5 5 5 instead of some permutation of 0..4.
Find the bug.
Answer
Before Go 1.22, all iterations of the loop share the *same* `i` variable. By the time the goroutines run, `i == 5`. **Fix (works on every Go version):** pass the value as an argument:

    for i := 0; i < 5; i++ {
        wg.Add(1)
        go func(i int) { // the parameter shadows the loop variable
            defer wg.Done()
            fmt.Println(i)
        }(i) // evaluated now, at spawn time
    }

Or in Go 1.22+, this works by default because each iteration has a fresh `i`. This is a *lifecycle* bug because the goroutines' captured state changes between birth and run.

Bug 4: Unbuffered Result Channel¶
    func first(urls []string) Result {
        results := make(chan Result) // unbuffered
        for _, u := range urls {
            u := u
            go func() {
                results <- fetch(u) // each sender blocks waiting for a receive
            }()
        }
        return <-results // read the first
    }
Symptom. Function returns correctly but runtime.NumGoroutine rises by len(urls) - 1 for every call.
Find the bug.
Answer
`first` reads exactly one value. The remaining `len(urls) - 1` goroutines are stuck forever on `results <- fetch(u)` — the channel is unbuffered and no one else reads. Leak. **Fix:** Buffer the channel:

    results := make(chan Result, len(urls))

Now every sender completes regardless of whether anyone reads. The goroutines end naturally. Or use a `context.Context` to cancel the rest:

    func first(ctx context.Context, urls []string) Result {
        ctx, cancel := context.WithCancel(ctx)
        defer cancel()
        results := make(chan Result, len(urls))
        for _, u := range urls {
            u := u
            go func() {
                results <- fetch(ctx, u) // a ctx-aware fetch aborts early on cancel
            }()
        }
        return <-results
    }

The deferred `cancel()` signals all losers to give up.

Bug 5: Recover in the Wrong Goroutine¶
    func safe(fn func()) {
        defer func() {
            if r := recover(); r != nil {
                log.Printf("recovered: %v", r)
            }
        }()
        go fn() // spawn fn in a new goroutine
    }
Symptom. When fn panics, the program crashes anyway.
Find the bug.
Answer
`recover` only catches panics in *its own* goroutine. The deferred `recover` here is in `safe`'s goroutine; the panic happens in `fn`'s goroutine, which is a different one. The deferred recover never sees it. **Fix:** Move recover into the goroutine:

    func safe(fn func()) {
        go func() {
            defer func() {
                if r := recover(); r != nil {
                    log.Printf("recovered: %v", r)
                }
            }()
            fn()
        }()
    }

Bug 6: The Context Without `Done` Check¶

    func worker(ctx context.Context, jobs <-chan Job) {
        for j := range jobs { // never looks at ctx
            process(j)
        }
    }
Symptom. When ctx is canceled, the worker keeps processing until jobs is closed.
Find the bug.
Answer
The worker does not check `ctx.Done()`. Cancellation has no effect on this goroutine unless the producer also stops sending and closes `jobs`. **Fix:**

    func worker(ctx context.Context, jobs <-chan Job) {
        for {
            select {
            case <-ctx.Done():
                return
            case j, ok := <-jobs:
                if !ok {
                    return
                }
                process(j)
            }
        }
    }

Now the worker exits on either context cancellation or channel close.

Bug 7: `time.Tick` Lives Forever¶
    func heartbeat() {
        for t := range time.Tick(time.Second) {
            sendHeartbeat(t)
        }
    }

    func main() {
        go heartbeat()
        runApp()
    }
Symptom. When runApp returns, the program does not exit. Or, in a more complex setting, heartbeat goroutines leak between test runs.
Find the bug.
Answer
`time.Tick` returns a channel backed by a ticker you can never stop. Combined with the fact that the goroutine has no exit condition (no context check), it leaks. Worse: each call to `time.Tick` allocates a new runtime timer. If `heartbeat` is invoked repeatedly, you accumulate timers. **Fix:** Use `time.NewTicker` plus a context:

    func heartbeat(ctx context.Context) {
        ticker := time.NewTicker(time.Second)
        defer ticker.Stop()
        for {
            select {
            case <-ctx.Done():
                return
            case t := <-ticker.C:
                sendHeartbeat(t)
            }
        }
    }

`Stop()` releases the timer; context cancellation ends the goroutine.

Bug 8: `LockOSThread` Without `UnlockOSThread`¶
    func renderFrame() {
        runtime.LockOSThread()
        initGL()
        drawFrame()
    }

    func main() {
        for {
            renderFrame()
            time.Sleep(16 * time.Millisecond)
        }
    }
Symptom. After running for a while, the process has many OS threads in ps -L. Performance degrades.
Find the bug.
Answer
`renderFrame` calls `LockOSThread` but never `UnlockOSThread`. When the function returns, the goroutine is still pinned to the thread. Each subsequent call to `renderFrame` (back on the same goroutine — `main`'s goroutine) re-locks; the lock count goes up and is never unwound. If a goroutine exits while still locked, its thread is destroyed rather than returned to the pool. Worse if `renderFrame` is called in fresh goroutines:

    func main() {
        for {
            go renderFrame() // each call pins a new thread; goroutine exits; thread dies
            time.Sleep(16 * time.Millisecond)
        }
    }

**Fix:** Pair `LockOSThread` with a deferred `UnlockOSThread`, and keep thread-affine work on one long-lived goroutine:

    func renderLoop(ctx context.Context) {
        runtime.LockOSThread()
        defer runtime.UnlockOSThread()
        initGL()
        ticker := time.NewTicker(16 * time.Millisecond)
        defer ticker.Stop()
        for {
            select {
            case <-ctx.Done():
                return
            case <-ticker.C:
                drawFrame()
            }
        }
    }

    func main() {
        ctx, cancel := signal.NotifyContext(context.Background(), os.Interrupt)
        defer cancel()
        go renderLoop(ctx)
        <-ctx.Done()
    }
Bug 9: The Self-Joining WaitGroup¶
    func parallel(tasks []func()) {
        var wg sync.WaitGroup
        for _, t := range tasks {
            t := t
            wg.Add(1)
            go func() {
                defer wg.Done()
                wg.Wait() // wait for "everyone else" first
                t()
            }()
        }
        wg.Wait()
    }
Symptom. Deadlock.
Find the bug.
Answer
Each goroutine calls `wg.Wait()`. The waitgroup counter is `len(tasks)`. Every goroutine is waiting for the counter to reach 0, but no one calls `Done` until *after* `Wait` returns. Classic self-deadlock. **Fix:** Drop the inner `wg.Wait()`; it serves no purpose. If you wanted to synchronize a start point, use a separate "ready" `sync.WaitGroup`, or a `chan struct{}` as a start barrier:

    func parallel(tasks []func()) {
        var wg sync.WaitGroup
        start := make(chan struct{})
        for _, t := range tasks {
            t := t
            wg.Add(1)
            go func() {
                defer wg.Done()
                <-start // block until all tasks are launched
                t()
            }()
        }
        close(start) // release every task at once
        wg.Wait()
    }

Bug 10: The Reconnect Spawn Loop¶
    func consumeForever(ctx context.Context, broker string) {
        for {
            conn, err := dial(broker)
            if err != nil {
                time.Sleep(time.Second)
                continue
            }
            go consume(ctx, conn)
            time.Sleep(30 * time.Second) // health-check interval
        }
    }
Symptom. runtime.NumGoroutine grows by ~2 every minute.
Find the bug.
Answer
Each iteration spawns a *new* `consume` goroutine, but never stops the previous one. Even if the connection dies, the old `consume` goroutine may still be running (especially if it's blocked on something). After an hour, dozens of leaked consumers each hold a connection. **Fix:** Tie each consumer's lifecycle to its own context and cancel before reconnect:

    func consumeForever(ctx context.Context, broker string) {
        for ctx.Err() == nil {
            connCtx, cancel := context.WithCancel(ctx)
            conn, err := dial(broker)
            if err != nil {
                cancel()
                select {
                case <-time.After(time.Second):
                    continue
                case <-ctx.Done():
                    return
                }
            }
            done := make(chan struct{})
            go func() {
                defer close(done)
                consume(connCtx, conn)
            }()
            select {
            case <-done:
                // consumer exited; loop back to reconnect
            case <-ctx.Done():
                cancel()
                <-done
                return
            }
            cancel()
        }
    }
Bug 11: The Finalizer That Blocks¶
    type Resource struct {
        conn net.Conn
    }

    func NewResource(addr string) *Resource {
        conn, _ := net.Dial("tcp", addr)
        r := &Resource{conn: conn}
        runtime.SetFinalizer(r, func(r *Resource) {
            r.conn.Close() // may block if peer is slow
        })
        return r
    }
Symptom. Under load, finalizers seem to "stop running"; memory grows.
Find the bug.
Answer
Finalizers run one-at-a-time on a dedicated runtime goroutine. If `r.conn.Close()` blocks (e.g., the peer has a half-closed TCP connection and the kernel waits for an ACK), the finalizer goroutine is stuck. Every other queued finalizer waits behind it. **Fix:** Finalizers should be fast and non-blocking. For potentially-slow cleanup, hand the work to a separate goroutine:

    runtime.SetFinalizer(r, func(r *Resource) {
        go r.conn.Close() // the finalizer itself returns immediately
    })

Better still: do not rely on finalizers. Have an explicit `Close()` method. Use finalizers only as a backstop:

    func (r *Resource) Close() error {
        runtime.SetFinalizer(r, nil) // cleanup is now explicit; drop the backstop
        return r.conn.Close()
    }

Bug 12: Goroutine Started in Init¶
    var bg = startBackground()

    func startBackground() *Worker {
        w := &Worker{}
        go w.run() // started during package init
        return w
    }
Symptom. Tests in the package report leaks. Code that imports this package gets a goroutine "for free."
Find the bug.
Answer
`init()`-time goroutines have no lifecycle owner. They start when the package is imported and run forever, unless the program exits. Tests find them as "leaked." Reusable libraries that do this are widely considered impolite. **Fix:** Make the start explicit:

    type Background struct {
        ctx    context.Context
        cancel context.CancelFunc
        wg     sync.WaitGroup
    }

    func StartBackground(ctx context.Context) *Background {
        ctx, cancel := context.WithCancel(ctx)
        b := &Background{ctx: ctx, cancel: cancel}
        b.wg.Add(1)
        go func() {
            defer b.wg.Done()
            b.run(ctx)
        }()
        return b
    }

    func (b *Background) Stop() {
        b.cancel()
        b.wg.Wait()
    }
Bug 13: The Forgotten cancel¶
    func fetch(parent context.Context, url string) ([]byte, error) {
        ctx, _ := context.WithTimeout(parent, 5*time.Second)
        return doRequest(ctx, url)
    }
Symptom. go vet warns: the cancel function returned by context.WithTimeout should be called, not discarded. Memory rises slowly under load.
Find the bug.
Answer
`context.WithTimeout` returns a `cancel` function. If you do not call it, the context's internal timer and the parent's list of children retain a reference until the timeout fires. Under load, you accumulate slow-to-release contexts. **Fix:**

    func fetch(parent context.Context, url string) ([]byte, error) {
        ctx, cancel := context.WithTimeout(parent, 5*time.Second)
        defer cancel()
        return doRequest(ctx, url)
    }

`defer cancel()` releases on every return path, including panic.

Bug 14: The `select` That Always Picks Default¶
    func worker(ctx context.Context, jobs <-chan Job) {
        for {
            select {
            case <-ctx.Done():
                return
            case j := <-jobs:
                process(j)
            default:
                time.Sleep(time.Microsecond)
            }
        }
    }
Symptom. When ctx is canceled, the worker doesn't exit promptly. CPU usage is high (the loop runs constantly).
Find the bug.
Answer
`default` is selected immediately if neither of the other cases is ready. The `time.Sleep(time.Microsecond)` reduces but does not eliminate the busy loop, and the sleep adds latency to every `ctx.Done()` check. **Fix:** Remove `default`:

    func worker(ctx context.Context, jobs <-chan Job) {
        for {
            select {
            case <-ctx.Done():
                return
            case j := <-jobs:
                process(j)
            }
        }
    }

Now the `select` blocks until one of the two cases is ready. No CPU waste, immediate cancellation. A `default` clause in a `for`-`select` loop is almost always a bug. Use it only if you specifically want non-blocking behavior.

Bug 15: Spawning From a Spawn¶
    type Server struct {
        requests chan Request
    }

    func (s *Server) Run(ctx context.Context) {
        for {
            select {
            case <-ctx.Done():
                return
            case req := <-s.requests:
                go func() {
                    resp := s.process(req)
                    req.reply <- resp
                }()
            }
        }
    }
Symptom. Under load, goroutines accumulate. Some clients never receive a response.
Find the bug.
Answer
Each request spawns an orphan goroutine. The orphan:

1. Has no lifecycle owner; if `req.reply` is never read (client disconnected), the goroutine blocks forever on the send.
2. Is not bounded: under high load, you spawn unbounded goroutines.
3. Ignores `ctx`: when the server shuts down, in-flight goroutines keep running.

**Fix:** Use a worker pool whose workers honor `ctx` on both the receive and the reply:

    func (s *Server) Run(ctx context.Context) {
        const workers = 16
        var wg sync.WaitGroup
        for i := 0; i < workers; i++ {
            wg.Add(1)
            go func() {
                defer wg.Done()
                for {
                    select {
                    case <-ctx.Done():
                        return
                    case req := <-s.requests:
                        resp := s.process(ctx, req)
                        select {
                        case req.reply <- resp:
                        case <-ctx.Done():
                            return
                        }
                    }
                }
            }()
        }
        wg.Wait()
    }
Summary¶
Common lifecycle bug patterns:

- Missing close path — `for range` never exits because no one closes.
- Add after go — race between `wg.Add` and `wg.Wait`.
- Captured loop variable — pre-1.22, all goroutines see the final value.
- Unbuffered result channel — losers in a race leak.
- Recover in wrong goroutine — `recover` only catches panics in its own goroutine.
- No `ctx.Done()` check — cancellation has no effect.
- `time.Tick` / leaking timers — use `NewTicker` + `Stop`.
- `LockOSThread` without `Unlock` — thread destroyed on goroutine death.
- Self-deadlock with WaitGroup — group waits on itself.
- Reconnect spawn — each retry leaks the previous goroutine.
- Blocking finalizer — backs up the finalizer queue.
- Init-time goroutines — no lifecycle owner.
- Discarded `cancel()` — slow context release.
- `default` in `for`-`select` — busy loop.
- Orphan goroutines — spawned without a lifecycle parent.

Every bug here has been seen in production. The cure is the same: answer "when does this goroutine end?" before you write `go ...`. Read 03-preventing-leaks for the patterns that make these bugs structurally impossible.