Goroutine Common Pitfalls — Junior Level¶
Table of Contents¶
- Introduction
- Prerequisites
- Glossary
- Core Concepts
- Real-World Analogies
- Mental Models
- Pros & Cons
- Use Cases
- Code Examples
- Coding Patterns
- Clean Code
- Product Use / Feature
- Error Handling
- Security Considerations
- Performance Tips
- Best Practices
- Edge Cases & Pitfalls
- Common Mistakes
- Common Misconceptions
- Tricky Points
- Test
- Tricky Questions
- Cheat Sheet
- Self-Assessment Checklist
- Summary
- What You Can Build
- Further Reading
- Related Topics
- Diagrams & Visual Aids
Introduction¶
Focus: "Here are the wrong-looking goroutines that every Go programmer ships at least once. Learn the shape, learn the fix, and learn the question you should have asked before writing them."
If you have read the previous five subsections, you can spawn goroutines, give them work, and wait for them with a WaitGroup or a channel. You also know that they are cheap, that they live in user space, that they share memory, and that an unrecovered panic kills the process.
What this subsection adds is a single skill: pattern recognition for failure. Goroutine bugs are not random typos — they come from a small number of repeating shapes. Once you know the shapes, you can spot them on a code review at a glance:
- A for loop with go func() that closes over a variable instead of taking it as a parameter
- A wg.Add(1) written inside the goroutine body, after go
- A time.Sleep(time.Second) used to "wait for the goroutines to finish"
- A go someTask() with no obvious answer to "how does this goroutine exit?"
- A chan int that has a sender but no guaranteed reader
- A panic in a worker goroutine with no recover at the boundary
- A map[string]int that two goroutines write to without a Mutex
- A time.After(...) inside a select loop that ticks every iteration
- A context.WithCancel whose cancel is never called
- defer calls piling up inside a tight loop in a long-running goroutine
- A mu.Lock() followed by a blocking network call
This file gives each of these its own example, root cause, and fix. By the end you should be able to read someone else's pull request and feel a little chill — the correct chill — at every one of these patterns.
A note on scope. This file catalogues pitfalls. Some, especially goroutine leaks, are deep enough that they deserve their own subsection. In those cases you will see a pointer to the deeper coverage. The goal here is fast recognition, not exhaustive analysis.
Prerequisites¶
- Required: You have read 01-overview, 02-vs-os-threads, 03-stack-growth, 04-runtime-management, and 05-best-practices in this same goroutines section.
- Required: You can spawn a goroutine, use sync.WaitGroup, and you understand that channels carry values between goroutines.
- Required: Familiarity with closures and function literals — many pitfalls live in the closure body.
- Required: Working go toolchain, preferably 1.22 or newer. The loop-variable pitfall behaves differently before 1.22.
- Helpful: You have run go test -race at least once and have seen the race detector report a race.
- Helpful: Awareness that context.Context exists. It is covered in depth later, but it appears in a handful of pitfalls here.
If you can write a short program that spawns ten goroutines, waits for them, and prints a counter — and you understand why a bare counter++ from many goroutines is wrong — you are ready.
Glossary¶
| Term | Definition |
|---|---|
| Pitfall | A wrong-looking but compilable code pattern that produces incorrect behaviour at runtime. Not a syntax error. |
| Captured loop variable | A variable declared in a for header that a closure references; before Go 1.22 a single variable was shared across iterations, causing all goroutines to read the same value. |
| Goroutine leak | A goroutine that does not exit when it should, holding memory and resources until the program ends. |
| Data race | Two goroutines accessing the same memory location without synchronisation, where at least one is a write. Detected with go test -race. |
| Deadlock | All goroutines are blocked waiting on each other; nothing can make progress. The runtime detects the "all goroutines are asleep" case and aborts with a fatal error. |
| Live lock | All goroutines are running but no useful work happens (busy loops, ping-pong). Worse than deadlock — looks healthy on a CPU graph. |
| Concurrent map access | Two or more goroutines accessing a built-in map simultaneously, where at least one access is a write. The runtime detects many such cases and aborts the program with fatal error: concurrent map writes. |
| Double close | Calling close(ch) on the same channel twice. Panics with close of closed channel. |
| Send on closed channel | Sending a value on a channel after close(ch). Panics with send on closed channel. |
| recover | Built-in function that stops a panic in progress, but only when called from a deferred function in the same goroutine. |
| sync.Once | A primitive that runs a function exactly one time, no matter how many goroutines call it. The canonical singleton initialiser. |
| context.Context | The standard idiom for carrying cancellation, deadlines, and request-scoped values across goroutine boundaries. |
| atomic | The sync/atomic package, providing lock-free reads and writes of integer and pointer values. Must not be mixed with non-atomic access to the same variable. |
Core Concepts¶
Pitfalls are not bugs in the runtime — they are bugs in the contract¶
The Go runtime is conservative. It does not prevent races, leaks, or panics; it only detects a few of them at runtime (the race detector, concurrent map access, deadlock with no runnable goroutines). Everything else — exit conditions, ownership, lifetime — is your contract to enforce. A "common pitfall" is a place where the contract is fragile and easy to violate without noticing.
Once you internalise this, every pitfall becomes a question:
- "How does this goroutine exit?"
- "Who owns this variable? Who reads, who writes, under what lock?"
- "Who closes this channel? Who guarantees the close happens?"
- "What happens if this function panics? What happens if it returns an error?"
- "What happens if the caller cancels?"
If you cannot answer all five, you have a pitfall waiting to happen.
Most pitfalls split into three families¶
It helps to file pitfalls into mental categories:
- Lifetime bugs — the goroutine starts and never stops, or stops too soon. Examples: leaks via unread channels, missing cancel(), ticker not stopped.
- Ordering bugs — two goroutines do things in an unintended order. Examples: wg.Add inside the goroutine, time.Sleep as synchronisation, send-before-close races.
- Sharing bugs — two goroutines touch shared state unsafely. Examples: captured loop variable, concurrent map access, atomic + non-atomic mixing.
Most of the file maps to one of these. When you see a new bug, ask: "Is this lifetime, ordering, or sharing?" The fix usually lives in that family.
Pitfalls compound¶
A single bug is usually obvious. Real-world pitfalls compound: a captured loop variable shows up as wrong output, but only sometimes, because the data race made the actual symptom intermittent. A leaked goroutine causes the heap to grow slowly, but you only notice when an unrelated allocation pushes the GC over a threshold. The Go runtime almost never gives you a stack trace pointing at "the bug" — instead you get a downstream symptom and have to walk back.
This is why the bug catalogue matters. Knowing the shapes lets you walk back faster.
go is a one-way door¶
There is no "undo." Once go f() executes, the goroutine is scheduled. You cannot cancel it from outside without explicit cooperation (a channel, a context.Context, an atomic flag the goroutine polls). Any pitfall that says "stop the goroutine" requires that the goroutine wanted to be stopped — that it polls a stop signal.
Real-World Analogies¶
Pitfalls are like loaded but uncocked traps¶
Each pitfall is a tripwire. The code compiles and may even run correctly on your laptop. The trap fires only when conditions align: high load, a particular OS, a specific Go version, a slow disk, a flaky network. The job of this file is to point at each tripwire and say "do not step here, even when nothing seems to happen."
A leaked goroutine is like a tab you forgot to close¶
A browser tab open in the background uses no visible CPU. It holds memory, a TCP connection, a JS context. Multiply by a hundred tabs and your laptop fan spins. A leaked goroutine is the same — invisible while small, lethal at scale.
A captured loop variable is like a passing baton with shared writing on it¶
Imagine relay runners passing a baton. Each runner writes their number on the baton with a single shared marker. By the time you read the baton, only the last writer's number is visible. That is closure capture of a single shared variable.
A wg.Add(1) inside the goroutine is like sending an invitation after the meeting started¶
The meeting is wg.Wait() — it begins when the counter is zero. If you send the invitation (Add) after the meeting started, the meeting is already over. The goroutine arrives to find an empty room.
A time.Sleep for synchronisation is like setting a kitchen timer for "hopefully long enough"¶
Sometimes the cake bakes in 30 minutes. Sometimes 45. If your timer is set to 35 minutes, sometimes you eat raw cake. The fix is a thermometer (WaitGroup), not a longer timer.
Mental Models¶
Model 1: The "exit slip" rule¶
Every goroutine must have an answer to one question on its exit slip: "When does this goroutine return?" If you write go f() and cannot answer the question in one sentence, you have a leak waiting to happen. Acceptable answers include:
- "When
freturns, which it does after processing one item." - "When
ctx.Done()fires." - "When the input channel is closed and drained."
- "When
doneis closed."
Unacceptable answers:
- "Eventually."
- "When the program exits."
- "I think it returns somewhere."
Model 2: Pitfalls live in pairs¶
Most pitfalls have a symmetric pair that is also wrong:
- wg.Add outside, wg.Done outside → underflow / never zero
- wg.Add inside the goroutine → race with Wait
- wg.Add outside, wg.Done inside → correct
- wg.Add outside, wg.Done missing → Wait blocks forever
The mental shortcut: Add belongs to the parent (it knows how many to spawn). Done belongs to the child (it knows when it has finished its own work).
Model 3: "The race detector is a sieve, not a fence"¶
go test -race catches observed races. It does not catch races that did not happen during the test run. A 99% safe codebase can still have a 1%-probability race that takes down production. The race detector raises confidence; it does not prove safety. Pair it with code review for the patterns in this file.
Model 4: "If you cannot explain ownership, you cannot explain correctness"¶
For every shared variable, you must be able to say: "Goroutine X owns this for read. Goroutine Y owns this for write. The handoff happens at channel send / mutex unlock / context cancel." Variables without owners are bugs.
Pros & Cons¶
This is a meta-section: the "pros and cons of knowing pitfalls."
Pros of learning pitfalls explicitly¶
- You read code faster. You spot bad patterns visually, without running.
- You write fewer bugs. Pattern recognition flips into pattern avoidance.
- You debug faster. When a real bug shows up, the catalogue narrows the search.
- You review better. PR reviews become "I see the captured variable on line 42" instead of "looks fine to me."
- You design defensively. Knowing how WaitGroup.Add can race makes you write Add outside the goroutine on the first try.
Cons / risks¶
- You can become paranoid. Not every shared variable needs a mutex. Some are immutable, some are owned by one goroutine.
- You can over-engineer. Adding a context and an errgroup and a buffered result channel to a 10-line script is wasteful.
- You can mistake symptoms for causes. A program that prints 5 5 5 could be the loop-variable bug or could be a logic bug that always assigns 5. The catalogue is a tool; thinking is still required.
Use Cases¶
This subsection's "use cases" are the situations where you should reach for it:
| Situation | How this file helps |
|---|---|
| Reviewing a PR with go func() | Skim the bug catalogue for any matching shape. |
| Debugging "program hangs on shutdown" | Check the goroutine leak section, the missing cancel() section, and wg.Done audit. |
| Debugging "program crashes with no message" | Look at panic-in-goroutine, concurrent map access, double close. |
| Tracking a slow memory leak | Goroutine leaks, ticker not stopped, time.After in select loop. |
| "Tests pass but production is flaky" | Captured loop variable, time.Sleep for synchronisation, atomic mixing. |
| Onboarding a new Go engineer | This file as a checklist. |
Code Examples¶
The examples below are organised by pitfall. Each one has the broken form, the symptom, the cause, and the fix.
Example 1: Captured loop variable (the most common Go concurrency bug)¶
package main
import (
"fmt"
"sync"
)
func main() {
var wg sync.WaitGroup
for i := 0; i < 5; i++ {
wg.Add(1)
go func() {
defer wg.Done()
fmt.Println(i)
}()
}
wg.Wait()
}
Symptom (Go ≤ 1.21). Output is some permutation of 5 5 5 5 5, or sometimes 3 5 5 5 5, never 0 1 2 3 4.
Symptom (Go 1.22+). Output is some permutation of 0 1 2 3 4. The change in semantics fixed the bug for the loop-variable case specifically.
Cause. Pre-1.22, the variable i is a single variable shared across all iterations. The closure captures it by reference. By the time goroutines run, the loop has finished and i == 5. Go 1.22 made each iteration create a new i, which closures bind to.
Fix that works on all versions.
for i := 0; i < 5; i++ {
wg.Add(1)
go func(i int) { // parameter shadow
defer wg.Done()
fmt.Println(i)
}(i) // pass the current value
}
The parameter i is its own variable in each goroutine's frame. Even on Go 1.5 from 2015 this works.
Why prefer the parameter even on 1.22+? Three reasons. First, the code is portable to old codebases. Second, the explicit pass shows intent: I am snapshotting this value. Third, the same pattern applies to variables that are not in the loop header — like for _, x := range items { x := x; go ... } — where you still need the shadow trick.
Example 2: wg.Add(1) inside the goroutine¶
var wg sync.WaitGroup
for i := 0; i < 10; i++ {
go func() {
wg.Add(1) // BUG: races with Wait
defer wg.Done()
doWork()
}()
}
wg.Wait()
Symptom. Sometimes Wait returns immediately, before any goroutine has called Done. The "all workers done" message prints with workers still running.
Cause. wg.Wait() runs concurrently with the goroutines. If the scheduler runs Wait before any goroutine has executed its Add, the counter is still zero — Wait returns immediately. The race is fundamental: the parent does not know whether the child has reached Add yet.
Fix.
for i := 0; i < 10; i++ {
wg.Add(1) // in the parent, before go
go func() {
defer wg.Done()
doWork()
}()
}
wg.Wait()
Add in the parent runs in serial with the for loop; by the time Wait is called, the counter is 10. Then Done in each goroutine decrements it.
Rule of thumb. Add in the parent, Done in the child. Never the reverse.
Example 3: Forgetting wg.Done(), or forgetting defer¶
var wg sync.WaitGroup
wg.Add(1)
go func() {
if condition {
return // BUG: forgot to call Done
}
work()
wg.Done()
}()
wg.Wait()
Symptom. Wait blocks forever. If this is the main goroutine, the runtime reports a deadlock.
Cause. The early return path skipped wg.Done(). The counter stays positive forever.
Fix.
go func() {
defer wg.Done() // runs on every exit path, including panic
if condition {
return
}
work()
}()
defer wg.Done() immediately after the parent's Add is the only style that survives early returns and panics.
Example 4: time.Sleep as synchronisation¶
Symptom. Tests pass on the developer's machine, fail on a loaded CI runner. Or pass for a year and then fail in production on a slow VM.
Cause. time.Sleep is a hope, not a guarantee. The goroutine may take 600 ms, 2 s, or never finish. The program prints "hopefully done" with the work still running.
Fix. Use WaitGroup, a result channel, or errgroup.Group.
done := make(chan struct{})
go func() {
heavyWork()
close(done)
}()
<-done
fmt.Println("definitely done")
When sleep is okay. In a test for a timing-dependent assertion (and even then, prefer eventually polling). In a for { ...; time.Sleep(interval); } loop where the sleep is the whole purpose, not a synchronisation primitive.
Example 5: Unrecovered panic kills the program¶
Symptom. A single bad input crashes the entire HTTP server. Other goroutines mid-request die too.
Cause. A panic inside any goroutine, with no recover in that goroutine's defer chain, propagates to runtime termination. Recovering elsewhere does not help.
Fix. Wrap the goroutine body with a recover at the boundary.
go func() {
defer func() {
if r := recover(); r != nil {
log.Printf("worker panic: %v\n%s", r, debug.Stack())
}
}()
parseUserInput(input)
}()
Better fix. Make parseUserInput not panic on user input. Panics belong to programmer errors; bad input should return error. Use recover as defence in depth, not as an error-handling strategy.
Example 6: Goroutine leak — sender blocked forever¶
func compute() int {
ch := make(chan int) // unbuffered
go func() {
ch <- doWork() // blocks until someone receives
}()
if cachedValue != nil {
return *cachedValue // never read ch — leak
}
return <-ch
}
Symptom. runtime.NumGoroutine() slowly increases over time. Heap grows. Eventually OOM.
Cause. When cachedValue != nil, the function returns without receiving from ch. The goroutine's send blocks forever; the goroutine is leaked.
Fix 1: buffer the channel.
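A sketch of this fix, keeping the shape of the broken compute above (doWork and cachedValue as before):
func compute() int {
    ch := make(chan int, 1) // buffer of 1: the send below never blocks
    go func() {
        ch <- doWork()
    }()
    if cachedValue != nil {
        return *cachedValue // the goroutine still finishes and exits
    }
    return <-ch
}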
The send completes whether or not anyone reads. If no one reads, the goroutine exits and the buffered value is collected with the channel.
Fix 2: always read.
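Or, keeping the channel unbuffered, receive on every path — again a sketch of the same compute:
func compute() int {
    ch := make(chan int)
    go func() {
        ch <- doWork()
    }()
    result := <-ch // always drain before deciding what to return
    if cachedValue != nil {
        return *cachedValue
    }
    return result
}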
Now the function always drains ch before returning.
For deep coverage of leak shapes, see the planned 07-goroutine-lifecycle-leaks chapter (coming soon).
Example 7: Goroutine leak — receiver on never-closed channel¶
ch := make(chan Item)
go func() {
for item := range ch {
process(item)
}
}()
// ... feed values for a while ...
// program continues, never closes ch — goroutine lives forever
Symptom. Same as Example 6: leak accumulating.
Cause. for v := range ch exits only when ch is closed. If it is never closed, the receiver waits forever.
Fix. Close the channel when the sender is done.
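With a single sender, a sketch of the fix looks like this (produceItems is a stand-in for wherever the values come from):
ch := make(chan Item)
go func() {
    for item := range ch {
        process(item)
    }
}()
for _, item := range produceItems() {
    ch <- item
}
close(ch) // the sender closes; the receiver's range loop ends and the goroutine exits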
If there are multiple senders, use the "single closer" pattern: one goroutine that knows when all senders are done (often a WaitGroup waiter) closes the channel.
Example 8: Goroutine leak — ticker not stopped¶
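A minimal sketch of the broken shape described below (doTick is the same stand-in used in the fix; no exit case, no Stop):
go func() {
    t := time.NewTicker(time.Second)
    for {
        select {
        case <-t.C:
            doTick()
        }
    }
}()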
Symptom. Goroutine and ticker both leak. If created in a loop or a per-request handler, leak accumulates.
Cause. for { select { ... } } with only one case has no exit path. The ticker also has no Stop() call, so its internal goroutine and channel leak too.
Fix.
go func() {
t := time.NewTicker(time.Second)
defer t.Stop()
for {
select {
case <-ctx.Done():
return
case <-t.C:
doTick()
}
}
}()
Always defer t.Stop() after time.NewTicker(...). Always have an exit case in the select.
Example 9: Concurrent map access — fatal error, not a race¶
m := map[string]int{}
for i := 0; i < 100; i++ {
go func(i int) {
m[fmt.Sprintf("k%d", i)] = i
}(i)
}
Symptom. Program crashes with fatal error: concurrent map writes. This is a runtime-detected fatal error; the process exits immediately. It is not a panic — recover does not catch it.
Cause. Go's built-in map is not safe for concurrent use. The runtime adds explicit checks to detect concurrent writes and aborts the program.
Fix 1: lock around access.
var mu sync.Mutex
m := map[string]int{}
for i := 0; i < 100; i++ {
go func(i int) {
mu.Lock()
m[fmt.Sprintf("k%d", i)] = i
mu.Unlock()
}(i)
}
Fix 2: use sync.Map for the right shape of workload.
sync.Map is optimised for "write once, read many" or "disjoint key sets per goroutine." It is slower than map + Mutex for typical workloads.
Example 10: Capturing pointers across goroutines¶
type Request struct{ Data []byte }
for _, r := range requests {
go process(&r) // BUG: &r is the same address on every iteration (pre-1.22)
}
Symptom. All goroutines see the last request's data, or random subsets.
Cause. Same family as the captured loop variable: r is a single variable that the for ... range rebinds. &r is the same address every time. By the time the goroutines run, the slot holds whatever the loop left in it.
Fix.
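A sketch of the fix using the shadow trick on the range variable:
for _, r := range requests {
    r := r // per-iteration copy; &r now points at this iteration's value
    go process(&r)
}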
Or pass by value if possible: go func(r Request) { process(&r) }(r).
Example 11: Double-close of a channel¶
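A sketch of how the double close typically arises (produce is a stand-in; every sender "closes on exit"):
ch := make(chan int)
for i := 0; i < 3; i++ {
    go func() {
        defer close(ch) // BUG: three goroutines each close ch; the second close panics
        ch <- produce()
    }()
}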
Symptom. Panic. Cannot be recovered if it propagates out of a deferred function.
Cause. Closing the same channel twice is a programmer error. Often shows up when multiple senders each "close on exit" with defer close(ch).
Fix. Single closer pattern: exactly one place in the program closes a given channel. See 02-channels/06-closing-channels for the full treatment.
Example 12: Send on closed channel¶
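A sketch of the shape (the closer does not wait for the sender to finish):
ch := make(chan int)
go func() {
    for i := 0; i < 5; i++ {
        ch <- i // BUG: may run after the close below and panic
    }
}()
close(ch) // closed while the sender may still be sending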
Symptom. Panic. Common when the closer closes too early or the senders did not observe the close.
Cause. Sending on a closed channel is always a panic. The language does not let senders test "is this channel closed?" reliably from outside — they must coordinate.
Fix. Reorder lifetimes so all senders finish before the channel is closed. Often via wg.Wait(); close(ch) in a goroutine that owns the close.
Example 13: Closing a receive-only channel¶
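The shape the compiler rejects, as a sketch:
func drain(ch <-chan int) {
    for range ch {
    }
    close(ch) // does not compile: ch is receive-only
}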
Symptom. Compile error. This is the easy version of pitfall 12 — the compiler catches it.
Cause. Receive-only channels (<-chan T) can only be received from. The compiler enforces directionality.
Fix. Either keep the channel bidirectional inside the closing function, or restructure ownership so closing happens before the channel is narrowed to receive-only.
Example 14: time.After in a select loop¶
for {
select {
case msg := <-msgs:
process(msg)
case <-time.After(time.Second):
return // timeout
}
}
Symptom. Memory grows under load. Each iteration of the for loop creates a new Timer from time.After. If msgs is busy, the timeout case is never selected, but the timer is not garbage-collected until it fires (1 second later).
Cause. time.After returns a channel from a fresh Timer. The timer is alive in the runtime's heap until it fires. Allocating one per iteration when iterations are fast is a slow leak.
Fix.
timer := time.NewTimer(time.Second)
defer timer.Stop()
for {
select {
case msg := <-msgs:
process(msg)
if !timer.Stop() {
<-timer.C
}
timer.Reset(time.Second)
case <-timer.C:
return
}
}
Reusing a single Timer avoids the per-iteration allocation. The Reset dance is awkward but correct. Go 1.23 made Reset simpler — see the release notes.
For the full chapter on timers and tickers, see 02-channels and the time package documentation.
Example 15: defer inside a tight loop¶
for _, file := range files {
f, err := os.Open(file)
if err != nil {
return err
}
defer f.Close() // BUG: deferred until function returns
process(f)
}
Symptom. "Too many open files" errors on big inputs. All defers are queued for function return — none run during the loop.
Cause. defer runs at function exit, not loop iteration exit. With 10 000 files, 10 000 open file descriptors accumulate before any closes.
Fix. Extract the body to a function:
for _, file := range files {
if err := processFile(file); err != nil {
return err
}
}
func processFile(name string) error {
f, err := os.Open(name)
if err != nil {
return err
}
defer f.Close() // runs at the end of processFile, per iteration
return process(f)
}
In the goroutine context: if a long-running goroutine has defers inside a for loop, each iteration adds to the defer chain. Same fix.
Example 16: Forgetting to call cancel()¶
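A sketch of the broken shape (fetch and url are stand-ins; the 5-second timeout matches the symptom below):
ctx, _ := context.WithTimeout(context.Background(), 5*time.Second) // BUG: cancel discarded
result, err := fetch(ctx, url)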
Symptom. go vet warns: "the cancel function is not used." At runtime, the context's timer leaks until 5 seconds after the call.
Cause. WithTimeout (and WithCancel, WithDeadline) return a cancel function that must be called to release resources. If you discard it, the runtime keeps the timer and child contexts alive until the deadline.
Fix.
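A sketch of the fix, with the same stand-ins:
ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
defer cancel() // releases the timer and any child contexts on return
result, err := fetch(ctx, url)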
defer cancel() releases resources promptly whether or not the timeout fires.
Example 17: Mutex held over a syscall or unbounded operation¶
var mu sync.Mutex
mu.Lock()
defer mu.Unlock()
data, err := http.Get(url) // BUG: blocks for who-knows-how-long while the lock is held
if err != nil { return err }
state.Update(data)
Symptom. All other goroutines that want the lock are blocked until the network call returns. Latency spikes; throughput collapses.
Cause. The mutex protects a critical section, but the section now includes a network round trip. The lock holder pauses the world for as long as the network takes.
Fix. Move the I/O outside the critical section:
data, err := http.Get(url)
if err != nil { return err }
mu.Lock()
defer mu.Unlock()
state.Update(data)
The mutex protects only the in-memory update.
Example 18: Mixing atomic and non-atomic access¶
var counter int64
go func() {
atomic.AddInt64(&counter, 1)
}()
go func() {
fmt.Println(counter) // plain read, no atomic
}()
Symptom. Race detector flags a race. On some architectures, the read may observe a torn value.
Cause. Atomic operations are a consistent protocol. If one goroutine uses atomic.AddInt64, every other goroutine must use atomic.LoadInt64 to read. Mixing atomic writes with plain reads loses the synchronisation guarantees.
Fix.
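A sketch of the consistent protocol — both sides go through the atomic functions:
var counter int64
go func() {
    atomic.AddInt64(&counter, 1)
}()
go func() {
    fmt.Println(atomic.LoadInt64(&counter)) // read through the same protocol
}()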
Or use the typed wrappers introduced in Go 1.19: atomic.Int64, atomic.Pointer[T]. They prevent the mixing mistake at the type level.
Example 19: Unsynchronised init of a singleton¶
var instance *Service
func Get() *Service {
if instance == nil {
instance = &Service{...} // race if called from multiple goroutines
}
return instance
}
Symptom. Race detector flags a race. Multiple goroutines may each construct their own instance, the last one winning; earlier instances are garbage but may have side effects.
Fix. sync.Once.
var (
instance *Service
once sync.Once
)
func Get() *Service {
once.Do(func() {
instance = &Service{...}
})
return instance
}
sync.Once.Do runs the function exactly once, no matter how many goroutines call Do concurrently. Subsequent calls return immediately.
Example 20: Background goroutine outliving its parent¶
func handleRequest(req *Request) {
go func() {
time.Sleep(10 * time.Minute)
cleanup(req)
}()
return
}
Symptom. Each request leaves a goroutine pinning the entire request struct (and any captured context, body, headers) for 10 minutes. Under load this is gigabytes.
Cause. The goroutine outlives the request and references the request's data via closure capture.
Fix. Use context.Context with cancellation, or move the cleanup to a worker pool that batches.
func handleRequest(ctx context.Context, req *Request) {
select {
case cleanupQueue <- req:
case <-ctx.Done():
}
}
A separate pool drains cleanupQueue at controlled concurrency.
Example 21: WaitGroup reused across goroutines¶
var wg sync.WaitGroup
for i := 0; i < 10; i++ {
wg.Add(1)
go func() {
defer wg.Done()
work()
}()
}
wg.Wait()
wg.Add(5) // dangerous: reuse before Wait returned in all callers
Symptom. Intermittent misbehaviour: Wait returns while the next batch's work is still in flight, or the runtime panics with "WaitGroup is reused before previous Wait has returned".
Cause. sync.WaitGroup documents: "Note that calls with a positive delta that occur when the counter is zero must happen before a Wait. Calls with a positive delta, or calls with a negative delta that start when the counter is greater than zero, may happen at any time." In practice: if Add(positive) races with Wait(), behaviour is undefined. The classic bug is reusing a WaitGroup across batches — one goroutine is still in Wait while another calls Add.
Fix. Use a fresh WaitGroup per batch, or guarantee all Waits have returned before the next Add. A sync.Mutex around the batch boundary helps if you must reuse.
Coding Patterns¶
Pattern 1: The "exit slip" comment¶
For every goroutine you spawn, write a one-line comment about how it exits.
// exits when ctx is cancelled
go consumer(ctx, ch)
// exits when ch is closed
go publisher(ch)
// exits when input drains and wg signals
go worker(input, &wg)
The comment is for the reader, but writing it is for you — if you cannot finish the sentence, the goroutine is already a bug.
Pattern 2: Pass-by-parameter, capture-by-nothing¶
Avoid closure capture for anything other than truly shared state. Parameters make ownership and snapshotting explicit.
Pattern 3: defer cancel() next to WithCancel/WithTimeout¶
Always. No exceptions. go vet will warn if you forget.
Pattern 4: Single closer, multiple senders¶
var wg sync.WaitGroup
for i := 0; i < N; i++ {
wg.Add(1)
go func() {
defer wg.Done()
for _, v := range produce() {
ch <- v
}
}()
}
go func() {
wg.Wait()
close(ch)
}()
The producers do not close. A separate goroutine waits for them all and then closes once.
Pattern 5: Buffered result channel of size 1 for "one-shot" goroutines¶
errCh := make(chan error, 1)
go func() {
errCh <- doWork()
}()
// the send never blocks even if no one reads
Prevents the leak in Example 6.
Clean Code¶
- Name your goroutines (in comments): // fetcher, // consumer, // drainer. Helps when reading stack traces.
- Keep go func() {...}() bodies short. If the body is more than 30 lines, extract a named function.
- One WaitGroup per batch. Do not reuse across phases.
- defer the cleanup, always: defer wg.Done(), defer cancel(), defer t.Stop(), defer mu.Unlock().
- Cluster spawn and join. When possible, the go and the Wait live in the same function. Lifetime is visible.
Product Use / Feature¶
| Product context | Pitfall to watch for |
|---|---|
| HTTP handler that fires async work | Background goroutine outliving the request (Example 20) |
| Worker pool fed from a queue | Send-on-closed when the queue closer races senders (Example 12) |
| Periodic flush goroutine | Ticker not stopped (Example 8), missing cancel() (Example 16) |
| Caching layer | Unsynchronised init (Example 19), concurrent map writes (Example 9) |
| File-processing job | defer in a tight loop (Example 15) |
| Distributed lock leader election | Mutex over a syscall (Example 17), atomic mixing (Example 18) |
| Fan-out to N services | Captured loop variable (Example 1), time.After in select (Example 14) |
Error Handling¶
Pitfalls compound with errors. Two recurring traps:
Trap 1: errors lost in a goroutine¶
The goroutine ran, an error happened, and no caller will ever know. Always route errors back somewhere — a channel, an errgroup, a logger, an atomic pointer.
Trap 2: a panic that should have been an error¶
If doWork can fail on bad input, return an error. Do not panic and rely on a recover somewhere. Panics are for programmer errors (nil dereference in code that should never see nil); errors are for runtime failures.
Pattern: errgroup for "all or first failure"¶
import "golang.org/x/sync/errgroup"
g, ctx := errgroup.WithContext(parent)
for _, url := range urls {
url := url
g.Go(func() error {
return fetch(ctx, url)
})
}
if err := g.Wait(); err != nil {
return err
}
errgroup cancels the context on first error, so other goroutines stop early — but they must respect ctx.Done().
Security Considerations¶
- Panic = process exit = service outage. A malicious payload that triggers a nil dereference inside any goroutine kills the entire process. Defence: recover at the goroutine boundary; better, validate input so the panic does not happen.
- Goroutine spawn = DoS vector. If handleRequest spawns N goroutines per request, an attacker controls N. Bound concurrency with a worker pool or semaphore.
- Captured loop variable = wrong authorisation. "Authorise all users in this slice in parallel" plus closure capture can authorise the wrong user. Race + capture is a security bug, not just a correctness bug.
- Leaked goroutines hold secrets. If a goroutine captures a session token or a decrypted key and never exits, the secret stays in memory after it should have been zeroed.
Performance Tips¶
- Goroutine churn has a cost. Spawning 10 000 short-lived goroutines per second is far slower than running a pool of 10 workers that each handle 1 000 jobs.
- time.After in a hot loop costs allocations. Reuse a Timer.
- sync.Map is slower than map + Mutex for typical workloads. Benchmark before switching.
- Atomics are faster than mutexes for single integer state. atomic.Int64 beats mu.Lock(); n++; mu.Unlock().
- defer has a small per-call cost (a few nanoseconds since Go 1.14's open-coded defers). Negligible normally; in a loop that runs billions of iterations, measurable.
Best Practices¶
- Always answer "how does this goroutine exit?" before writing the body.
- Always pass loop variables as parameters; never close over them.
- Always wg.Add in the parent, defer wg.Done in the child.
- Always defer cancel() next to WithCancel / WithTimeout / WithDeadline.
- Always defer t.Stop() after time.NewTicker / time.NewTimer if the lifetime is bounded.
- Always make buffered result channels size 1 for "one-shot" patterns to prevent leaks.
- Always recover at the goroutine boundary if the body can panic on untrusted input.
- Always run go test -race in CI.
- Never use time.Sleep for synchronisation.
- Never close a channel from the receiving side or from one of many senders without coordination.
- Never share a map across goroutines without a Mutex (or use sync.Map).
- Never mix atomic and non-atomic access to the same variable.
Edge Cases & Pitfalls¶
This entire file is the edge cases section for the goroutines track. A few extras that did not get their own example:
- Nil channel — sends and receives on a nil channel block forever. Sometimes used intentionally in select to "disable" a case; if it happens by accident, you have a silent leak.
- select with no default and all-nil cases — blocks forever. Easy to reach if every case channel is conditionally niled.
- recover outside defer — returns nil. A common mistake when copying defer func() { recover() } patterns.
- runtime.Gosched as a fix for races — does nothing. Yielding gives other goroutines a chance to run, which makes the race more likely to manifest, not less.
- GOMAXPROCS=1 "makes my code single-threaded" — it does not. Preemption still happens. Races still exist. See 02-vs-os-threads.
Common Mistakes¶
| Mistake | Fix |
|---|---|
| Closure captures loop variable (pre-1.22) | Pass as parameter |
| wg.Add(1) inside the goroutine | Move to the parent before go |
| wg.Done() only on success path | defer wg.Done() at the top |
| time.Sleep to wait for completion | WaitGroup or channel |
| Unrecovered panic on bad input | defer recover() at boundary, validate input upstream |
| Goroutine sends to unread channel | Buffer of size 1 or always read |
| Goroutine receives from never-closed channel | Close from a single owner |
| Concurrent map writes | sync.Mutex or sync.Map |
| close(ch) twice | Single closer pattern |
| Send after close | Reorder lifetimes |
| time.After in tight select | Reuse Timer |
| defer in tight loop | Extract to function |
| WithTimeout without cancel() | defer cancel() always |
| Mutex over network call | Move I/O out of critical section |
| Atomic write + plain read | Use atomic.Load* or atomic.Int64 |
| Singleton init via if ptr == nil | sync.Once |
| Goroutine outlives request | context.Context cancellation |
| Reusing WaitGroup mid-batch | Fresh WaitGroup per batch |
Common Misconceptions¶
"Captured loop variable is fixed in 1.22, so I do not need parameters." — The fix only applies to
for ... rangeandfor i := ...; ...; ...loop variables. Other captured variables (e.g., values inside afor ... range items { x := compute() ... }body) still need explicit handling."
time.Sleep(10 * time.Millisecond)is long enough on modern machines." — Modern machines are also occasionally slow. CI runners pause. VMs get descheduled. Sleep-based sync is always wrong."
recovercatches everything." —recoveronly catches panics in the same goroutine, only insidedefer. Fatal errors (concurrent map writes, stack overflow) are not recoverable."
sync.Mapis the safe map." — Not a drop-in. Different API, different performance characteristics. For most casesmap + Mutexis better."
atomicis faster than mutex, so use atomics." — For one integer, yes. For multi-field state, no. A mutex around a struct is simpler and correct; atomics need careful layout and ordering reasoning."
go vetcatches the pitfalls." — It catches some (unkeyed composite literals,cancel function not used,lock copy). It does not catch most. You still need review and tests."The race detector catches all races." — It catches races that happen during the test run. Races on rarely-exercised paths slip through.
"
runtime.NumGoroutine()increasing means a leak." — Sometimes. It can also mean legitimate concurrency. Look at goroutine identity over time, not a snapshot.
Tricky Points¶
Why does the captured loop variable bug print "5 5 5 5 5" specifically?¶
By the time the goroutines actually run, the loop has finished. The variable i holds the post-loop value, which is 5 (the value that failed the i < 5 check). All goroutines, sharing that variable, read 5.
Why does go vet complain about the unused cancel?¶
The cancel is not just a function — it is the link in a chain of contexts. Discarding it leaves the context-tree resources (timers, child contexts) alive until the parent context is itself cancelled or the deadline passes. go vet warns because this is almost always a bug.
Why is a buffered channel of size 1 the "leak-proof" pattern?¶
The goroutine sends and immediately exits. If no one reads, the channel and its single value hold ~16 bytes of garbage that the GC will collect. The goroutine itself is gone. With unbuffered, the goroutine would block on the send forever — that is the leak.
Why does sync.WaitGroup document "Add must happen before Wait"?¶
Wait reads the counter, and if zero, returns. If Add runs after Wait started, the counter snapshot was zero — Wait already returned. The two operations are not synchronised by the WaitGroup itself for "Add when counter is zero."
Why is closing a channel from the receiver always wrong?¶
The sender does not check before sending. If the receiver closes mid-send, the next send panics. The contract is: the owner of the send side closes. Owners can be one of N senders coordinating via WaitGroup, or a separate goroutine.
Test¶
package pitfalls_test
import (
"context"
"runtime"
"sync"
"sync/atomic"
"testing"
"time"
)
// Demonstrates the captured-variable fix.
func TestParameterShadow(t *testing.T) {
var got [5]int
var wg sync.WaitGroup
for i := 0; i < 5; i++ {
wg.Add(1)
go func(i int) {
defer wg.Done()
got[i] = i
}(i)
}
wg.Wait()
for i, v := range got {
if v != i {
t.Fatalf("index %d: got %d", i, v)
}
}
}
// Demonstrates that wg.Add in the parent is required.
func TestAddBeforeGo(t *testing.T) {
var wg sync.WaitGroup
var counter int64
wg.Add(10)
for i := 0; i < 10; i++ {
go func() {
defer wg.Done()
atomic.AddInt64(&counter, 1)
}()
}
wg.Wait()
if counter != 10 {
t.Fatalf("expected 10, got %d", counter)
}
}
// Demonstrates buffered channel of 1 prevents leak.
func TestBufferedOneShot(t *testing.T) {
before := runtime.NumGoroutine()
for i := 0; i < 1000; i++ {
ch := make(chan int, 1)
go func() {
ch <- 42
}()
// intentionally do not read
}
runtime.GC()
runtime.Gosched()
time.Sleep(10 * time.Millisecond)
after := runtime.NumGoroutine()
if after-before > 50 {
t.Fatalf("leaked goroutines: before=%d after=%d", before, after)
}
}
// Demonstrates defer cancel().
func TestDeferCancel(t *testing.T) {
ctx, cancel := context.WithTimeout(context.Background(), 100*time.Millisecond)
defer cancel()
select {
case <-ctx.Done():
case <-time.After(time.Second):
t.Fatal("context did not finish")
}
}
Always run with -race:
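For example, from the module root (the ./... pattern covers subpackages):
go test -race ./...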
Tricky Questions¶
Q. What does this print on Go 1.21? On 1.22?
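An illustrative shape consistent with the answer (the trailing sleep only gives the goroutines time to run):
for i := 0; i < 3; i++ {
    go func() {
        fmt.Println(i)
    }()
}
time.Sleep(100 * time.Millisecond)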
A. 1.21: most often 3 3 3. 1.22: some permutation of 0 1 2. The semantics of the loop variable changed.
Q. Why does this leak?
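An illustrative shape, echoing Example 6 (doWork and condition are stand-ins):
ch := make(chan int) // unbuffered
go func() {
    ch <- doWork()
}()
if condition {
    return 0 // never receives from ch; the sender blocks forever
}
return <-ch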
A. When condition is true, the goroutine's send blocks forever. Fix: make(chan int, 1).
Q. What is wrong with this WaitGroup?
var wg sync.WaitGroup
for i := 0; i < 5; i++ {
go func() {
wg.Add(1)
defer wg.Done()
work()
}()
}
wg.Wait()
A. Wait may run before any goroutine reaches Add, returning immediately. Add belongs in the parent before go.
Q. Why does this panic intermittently?
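An illustrative shape (work is a stand-in; every goroutine both sends and closes):
ch := make(chan int)
for i := 0; i < 5; i++ {
    go func() {
        defer close(ch)
        ch <- work()
    }()
}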
A. Five goroutines each close the same channel. After the first close succeeds, the next close panics. Additionally, sends from other goroutines after the close panic.
Q. Why is this a problem under load?
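An illustrative shape (handle and flush are stand-ins):
for {
    select {
    case msg := <-msgs:
        handle(msg)
    case <-time.After(500 * time.Millisecond):
        flush()
    }
}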
A. A fresh timer is allocated per iteration. Under high message rates, the heap fills with pending timers. Reuse a single time.NewTimer and Reset.
Q. What does this print?
var counter int64
var wg sync.WaitGroup
for i := 0; i < 1000; i++ {
wg.Add(1)
go func() {
defer wg.Done()
counter++
}()
}
wg.Wait()
fmt.Println(counter)
A. Almost always less than 1000, varying every run. counter++ is read-modify-write; concurrent increments lose updates. Fix: atomic.AddInt64(&counter, 1).
Cheat Sheet¶
// Pass loop variable
for _, x := range items {
go func(x Item) { ... }(x)
}
// WaitGroup
wg.Add(1) // in parent, before go
go func() {
defer wg.Done() // in child, at top, with defer
work()
}()
wg.Wait()
// Cancellation
ctx, cancel := context.WithTimeout(parent, dur)
defer cancel()
// Timer / Ticker
t := time.NewTicker(interval)
defer t.Stop()
for {
select {
case <-ctx.Done(): return
case <-t.C: tick()
}
}
// Result channel
errCh := make(chan error, 1) // size 1 to prevent leak
go func() { errCh <- doWork() }()
// Recover at boundary
go func() {
defer func() {
if r := recover(); r != nil { log.Println(r) }
}()
risky()
}()
// Singleton
var once sync.Once
once.Do(func() { instance = &Service{} })
// Atomic
var n atomic.Int64
n.Add(1); v := n.Load()
// Single closer for fan-in
go func() { wg.Wait(); close(ch) }()
Self-Assessment Checklist¶
- I can spot a captured loop variable bug at a glance.
- I never put wg.Add(1) inside a goroutine body.
- I always pair defer wg.Done() with the parent's wg.Add(1).
- I never use time.Sleep to synchronise goroutines in production code.
- I wrap untrusted-input goroutines with defer recover().
- I know three distinct ways a goroutine can leak.
- I know why concurrent map writes are a fatal error rather than a recoverable panic.
- I always pair WithCancel/WithTimeout with defer cancel().
- I never hold a mutex across a network or disk syscall.
- I never mix atomic.AddInt64(&n, 1) with a plain _ = n read on the same variable.
- I use sync.Once for singletons, never if instance == nil.
- I have read about goroutine leak detection in 07-goroutine-lifecycle-leaks.
Summary¶
The pitfalls in this file are not exotic. They are the patterns that every Go programmer types out at some point and then has to debug. The bugs all share a structure: a contract between goroutines was implicit, and reality violated the implicit assumption. Captured loop variables assume the closure binds at goroutine-run time; in fact, before 1.22, it binds to a shared slot. wg.Add inside the goroutine assumes the goroutine runs before Wait; in fact, the scheduler may run Wait first. time.Sleep assumes a known upper bound on goroutine duration; in fact, no such bound exists.
The cure is mechanical: pass values as parameters, Add in the parent and Done in the child with defer, replace sleeps with explicit waits, recover at the boundary, buffer one-shot result channels, defer cancel(), defer t.Stop(), single-closer per channel, sync.Once for singletons. Most of these are one-line changes once you have learned to see the shape.
The next step is to drill recognition. Move on to find-bug.md in this subsection, and to the dedicated leak coverage in 07-goroutine-lifecycle-leaks.
What You Can Build¶
With this material internalised, you can:
- Code-review someone else's PR for goroutine pitfalls in under five minutes.
- Diagnose "tests pass but production is flaky" by enumerating the eight likely shapes.
- Fix a leaking long-running service by mechanically walking the goroutine catalogue.
- Author CI guard tests using goleak and -race that catch regressions.
- Write Go training material for your team using this file as a checklist.
Further Reading¶
- The Go Blog — Go 1.22 changes loop variable semantics
- The Go Blog — Share Memory By Communicating
- The Go Blog — Concurrency is not parallelism
- The Go Memory Model — https://go.dev/ref/mem
- go.uber.org/goleak — https://github.com/uber-go/goleak
- golang.org/x/sync/errgroup — https://pkg.go.dev/golang.org/x/sync/errgroup
- Effective Go — https://go.dev/doc/effective_go
- Russ Cox — Off to the Races
Related Topics¶
- 07-goroutine-lifecycle-leaks — deep treatment of leak shapes and detection
- 02-channels/06-closing-channels — single-closer, double-close, send-on-closed
- 02-channels/07-channel-pitfalls — channel-specific patterns
- 03-mutex-sync — locking, sync.Once, atomic
- 05-context — cancellation propagation
- 01-goroutines/05-best-practices — positive patterns that prevent the negative patterns here
Diagrams & Visual Aids¶
Three families of pitfalls¶
+----------------+ +----------------+ +----------------+
| LIFETIME | | ORDERING | | SHARING |
| (leaks) | | (races with | | (races on |
| | | coordination)| | state) |
+--------+-------+ +--------+-------+ +--------+-------+
| | |
- leak via unread ch - wg.Add inside go - captured loop var
- leak via no close - time.Sleep sync - concurrent map
- ticker not stopped - send-before-close - atomic mixing
- missing cancel() - panic in wrong gor. - singleton init race
- bg gor outlives par. - double close - WaitGroup reuse
- time.After in select - send on closed
- defer in tight loop
- mutex over syscall
Lifecycle of a leaked goroutine¶
Captured loop variable (pre-1.22)¶
i=0 ┐
i=1 │ same address &i, value mutated by loop
i=2 ├──> goroutines all read &i AFTER loop ends (i=N)
i=3 │ final value = N
i=4 ┘
wg.Add inside the goroutine: race window¶
parent: wg.Wait() ── counter is 0, returns immediately ──> next code
child: wg.Add(1) ── too late
wg.Done()
Single-closer pattern¶
producers ──┐
producers ──┤ wg.Wait()
producers ──┼──> closer goroutine ──> close(ch)
producers ──┘ │
▼
consumer (for v := range ch)
Mutex over syscall¶
goroutine A: mu.Lock() ── HTTP request ── 5s ── mu.Unlock()
goroutine B: mu.Lock() ─── BLOCKED for ~5s ───
goroutine C: mu.Lock() ─── BLOCKED for ~5s ───
Atomic + plain read¶
goroutine A: atomic.AddInt64(&n, 1) ── protocol: atomic
goroutine B: fmt.Println(n) ── protocol: plain <-- broken
The protocol breaks. The race detector reports it.