Worker Pools — Find the Bug¶
Each exercise contains a worker-pool implementation with one or more bugs. Read the code, predict what happens, then check the diagnosis. The fix is given last; resist looking until you've identified the bug.
Table of Contents¶
- How to Use This File
- Exercise 1 — The Hung Pool
- Exercise 2 — The Vanishing Workers
- Exercise 3 — Send on Closed Channel
- Exercise 4 — The Race Condition
- Exercise 5 — The Wait That Never Returns
- Exercise 6 — The Panic Cascade
- Exercise 7 — Cancellation That Doesn't
- Exercise 8 — The Lost Results
- Exercise 9 — Double Close
- Exercise 10 — Pool of Zero
- Exercise 11 — Deadlock on Full Results
- Exercise 12 — Worker Exits Too Early
- Exercise 13 — Counter Off by One
- Bug Patterns Summary
How to Use This File¶
- Read the code carefully. Don't run it yet.
- Predict: what does this program do? Crash? Hang? Wrong output? Race?
- Read the Symptom section to verify your prediction.
- Read the Diagnosis to understand why.
- Read the Fix only after you've thought about how you'd fix it.
Run each fixed snippet with go run -race.
Exercise 1 — The Hung Pool¶
package main
import (
"fmt"
"sync"
)
func main() {
jobs := make(chan int, 10)
results := make(chan int, 10)
var wg sync.WaitGroup
for w := 0; w < 3; w++ {
wg.Add(1)
go func() {
defer wg.Done()
for j := range jobs {
results <- j * j
}
}()
}
for i := 1; i <= 5; i++ {
jobs <- i
}
wg.Wait()
for r := range results {
fmt.Println(r)
}
}
Symptom: The program hangs forever. wg.Wait() never returns. Eventually Go reports: fatal error: all goroutines are asleep - deadlock!
Diagnosis: close(jobs) is missing. Workers stay in for j := range jobs forever, blocked on receive. wg.Wait() waits for workers that will never call Done().
Fix: Add close(jobs) after sending all jobs:
for i := 1; i <= 5; i++ {
jobs <- i
}
close(jobs) // ← added
go func() { wg.Wait(); close(results) }() // ← also need this so the consumer's range exits
Exercise 2 — The Vanishing Workers¶
package main
import (
"fmt"
"sync"
)
func main() {
jobs := make(chan int)
var wg sync.WaitGroup
for w := 0; w < 4; w++ {
go func() {
wg.Add(1)
defer wg.Done()
for j := range jobs {
fmt.Println(j)
}
}()
}
for i := 1; i <= 10; i++ {
jobs <- i
}
close(jobs)
wg.Wait()
}
Symptom: Sometimes all ten numbers print. Sometimes wg.Wait() returns before the workers have registered, main exits early, and output is truncated. Behaviour depends on goroutine scheduling.
Diagnosis: wg.Add(1) runs inside the goroutine, so nothing guarantees it happens before main reaches wg.Wait(). Wait can observe counter = 0 and return while workers are still starting up. Race.
Fix: Move wg.Add(1) before go func():
for w := 0; w < 4; w++ {
wg.Add(1) // ← moved out
go func() {
defer wg.Done()
for j := range jobs {
fmt.Println(j)
}
}()
}
Exercise 3 — Send on Closed Channel¶
package main
import (
"fmt"
"sync"
)
func main() {
jobs := make(chan int)
var wg sync.WaitGroup
for w := 0; w < 4; w++ {
wg.Add(1)
go func(id int) {
defer wg.Done()
for j := range jobs {
fmt.Printf("worker %d got %d\n", id, j)
if j == 5 {
close(jobs) // shouldn't be done from here
}
}
}(w)
}
for i := 1; i <= 10; i++ {
jobs <- i
}
close(jobs)
wg.Wait()
}
Symptom: Crashes with panic: send on closed channel, or, depending on scheduling, panic: close of closed channel. It never runs to completion.
Diagnosis: Two violations: (1) a worker (a receiver) closes the jobs channel, and (2) the main goroutine then tries to close it again. Either the producer's jobs <- i panics on a closed channel, or the second close(jobs) panics.
Fix: Workers never close. The producer is the sole closer:
go func(id int) {
defer wg.Done()
for j := range jobs {
fmt.Printf("worker %d got %d\n", id, j)
// no close here
}
}(w)
Exercise 4 — The Race Condition¶
package main
import (
"fmt"
"sync"
)
func main() {
jobs := make(chan int, 100)
var wg sync.WaitGroup
var counter int
for w := 0; w < 8; w++ {
wg.Add(1)
go func() {
defer wg.Done()
for range jobs {
counter++
}
}()
}
for i := 0; i < 10000; i++ {
jobs <- i
}
close(jobs)
wg.Wait()
fmt.Println("counter:", counter)
}
Symptom: With go run -race: data race. Without race detector: counter < 10000 (e.g., 9847).
Diagnosis: counter++ is a read-modify-write across 8 goroutines without synchronisation. Lost increments.
Fix: atomic.Int64:
import "sync/atomic"
var counter atomic.Int64
go func() {
defer wg.Done()
for range jobs {
counter.Add(1)
}
}()
fmt.Println("counter:", counter.Load())
Exercise 5 — The Wait That Never Returns¶
package main
import (
"fmt"
"sync"
)
func main() {
jobs := make(chan int)
results := make(chan int)
var wg sync.WaitGroup
for w := 0; w < 3; w++ {
wg.Add(1)
go func() {
defer wg.Done()
for j := range jobs {
results <- j * 2
}
}()
}
go func() {
for i := 1; i <= 100; i++ {
jobs <- i
}
close(jobs)
}()
go func() { wg.Wait(); close(results) }()
// forgot to drain results
fmt.Println("done")
}
Symptom: The program prints "done" and exits. Nothing visibly fails, because exiting main kills every goroutine. In a long-running program these workers would leak: runtime.NumGoroutine() would keep climbing as blocked workers accumulate.
Diagnosis: No consumer for results. Workers block on results <- j * 2. None of them ever calls Done(). The closer goroutine blocks on wg.Wait() forever. main doesn't notice because main doesn't wait on anything.
Fix: Always drain results, and make main wait until everything is done:
done := make(chan struct{})
go func() {
for r := range results {
_ = r // or process
}
close(done)
}()
<-done
fmt.Println("done")
Exercise 6 — The Panic Cascade¶
package main
import (
"fmt"
"sync"
)
func process(j int) int {
if j == 7 {
panic("seven is bad")
}
return j * j
}
func main() {
jobs := make(chan int, 10)
results := make(chan int, 10)
var wg sync.WaitGroup
for w := 0; w < 3; w++ {
wg.Add(1)
go func() {
defer wg.Done()
for j := range jobs {
results <- process(j)
}
}()
}
for i := 1; i <= 10; i++ {
jobs <- i
}
close(jobs)
go func() { wg.Wait(); close(results) }()
for r := range results {
fmt.Println(r)
}
}
Symptom: Program crashes: panic: seven is bad. Stack shows the worker goroutine.
Diagnosis: No recover in the worker. One bad job kills the whole program.
Fix: Recover at the top of the worker:
go func() {
defer wg.Done()
defer func() {
if r := recover(); r != nil {
fmt.Printf("worker recovered: %v\n", r)
}
}()
for j := range jobs {
results <- process(j)
}
}()
Note: a recovered worker exits early. The pool effectively shrinks. To keep the pool size, wrap each job in a recover or restart the worker.
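To keep the pool at full strength, recover per job instead. A sketch, where safeProcess is a hypothetical wrapper around this exercise's process:
// safeProcess converts a panic inside process into an error,
// so the worker's range loop survives bad jobs.
func safeProcess(j int) (result int, err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("job %d panicked: %v", j, r)
		}
	}()
	return process(j), nil
}

go func() {
	defer wg.Done()
	for j := range jobs {
		r, err := safeProcess(j)
		if err != nil {
			fmt.Println(err) // report and move on; pool size is preserved
			continue
		}
		results <- r
	}
}()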
Exercise 7 — Cancellation That Doesn't¶
package main
import (
"context"
"fmt"
"sync"
"time"
)
func main() {
ctx, cancel := context.WithCancel(context.Background())
defer cancel()
jobs := make(chan int, 100)
var wg sync.WaitGroup
for w := 0; w < 3; w++ {
wg.Add(1)
go func() {
defer wg.Done()
for j := range jobs {
time.Sleep(100 * time.Millisecond)
fmt.Println("processed", j)
}
}()
}
for i := 0; i < 100; i++ {
jobs <- i
}
close(jobs)
time.Sleep(150 * time.Millisecond)
cancel()
wg.Wait()
}
Symptom: All 100 jobs are processed despite cancel(). Cancellation has no effect.
Diagnosis: Workers never check ctx.Done(), so the cancel has no one listening. cancel() merely closes the ctx.Done() channel; goroutines must observe it for anything to stop.
Fix: Check ctx inside the worker:
go func() {
defer wg.Done()
for {
select {
case <-ctx.Done():
return
case j, ok := <-jobs:
if !ok {
return
}
time.Sleep(100 * time.Millisecond)
fmt.Println("processed", j)
}
}
}()
Also pass ctx into long-running parts (e.g., select { case <-time.After(...): case <-ctx.Done(): } instead of time.Sleep).
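A sketch of that substitution, with sleepCtx as a hypothetical helper:
// sleepCtx waits for d or until ctx is cancelled, whichever comes first.
// It reports whether the full duration elapsed.
func sleepCtx(ctx context.Context, d time.Duration) bool {
	t := time.NewTimer(d)
	defer t.Stop()
	select {
	case <-t.C:
		return true
	case <-ctx.Done():
		return false
	}
}
In the worker: if !sleepCtx(ctx, 100*time.Millisecond) { return } instead of the bare time.Sleep.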
Exercise 8 — The Lost Results¶
package main
import (
"fmt"
"sync"
)
func main() {
jobs := make(chan int, 5)
results := make(chan int) // unbuffered
var wg sync.WaitGroup
for w := 0; w < 3; w++ {
wg.Add(1)
go func() {
defer wg.Done()
for j := range jobs {
results <- j * j
}
}()
}
for i := 1; i <= 5; i++ {
jobs <- i
}
close(jobs)
// Read only first 3 results, then exit
for i := 0; i < 3; i++ {
fmt.Println(<-results)
}
wg.Wait() // hangs
}
Symptom: Prints 3 results, then hangs forever.
Diagnosis: Workers pull all 5 jobs but the consumer reads only 3 results. The two remaining sends block on results <- ..., so those workers never reach Done() and wg.Wait() hangs.
Fix: Always drain every result, or give results a buffer with capacity ≥ the number of jobs. The robust pattern is to range over results and have a closer goroutine close it after wg.Wait().
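Applied to this exercise, replacing the partial read and the final wg.Wait(), a sketch:
close(jobs)
go func() { wg.Wait(); close(results) }() // closer: runs once all workers are done
for r := range results { // drains every result; exits when results is closed
	fmt.Println(r)
}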
Exercise 9 — Double Close¶
package main
import (
"fmt"
"sync"
)
func produce(jobs chan<- int, items []int, wg *sync.WaitGroup) {
defer wg.Done()
defer close(jobs)
for _, x := range items {
jobs <- x
}
}
func main() {
jobs := make(chan int, 10)
var pwg sync.WaitGroup
pwg.Add(2)
go produce(jobs, []int{1, 2, 3}, &pwg)
go produce(jobs, []int{4, 5, 6}, &pwg)
var wwg sync.WaitGroup
wwg.Add(1)
go func() {
defer wwg.Done()
for j := range jobs {
fmt.Println(j)
}
}()
pwg.Wait()
wwg.Wait()
}
Symptom: Crashes: panic: close of closed channel.
Diagnosis: Two producers, each calling defer close(jobs). The second close panics.
Fix: Remove the defer close(jobs) inside produce; a single orchestrator closes, and only after both producers finish (sketched below).
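The produce body loses its close; the orchestrator gains one:
func produce(jobs chan<- int, items []int, wg *sync.WaitGroup) {
	defer wg.Done() // no close here
	for _, x := range items {
		jobs <- x
	}
}

// in main:
go produce(jobs, []int{1, 2, 3}, &pwg)
go produce(jobs, []int{4, 5, 6}, &pwg)
go func() { pwg.Wait(); close(jobs) }() // the single closer, after both producers finish
Note that sync.Once alone is the weaker tool here: it prevents the double-close panic, but closing when the first producer finishes would still panic the other producer's sends. The orchestrator close is the safe choice.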
Exercise 10 — Pool of Zero¶
package main
import (
"fmt"
"sync"
)
func runPool(n int, items []int) {
jobs := make(chan int)
var wg sync.WaitGroup
for w := 0; w < n; w++ {
wg.Add(1)
go func() {
defer wg.Done()
for j := range jobs {
fmt.Println(j)
}
}()
}
for _, x := range items {
jobs <- x
}
close(jobs)
wg.Wait()
}
func main() {
n := 0 // bug: came from a config
runPool(n, []int{1, 2, 3})
}
Symptom: The first send, jobs <- 1, blocks forever. With no worker goroutines to receive, Go reports fatal error: all goroutines are asleep - deadlock!
Diagnosis: Pool of size 0. No workers exist to receive from jobs. The first send blocks forever.
Fix: Validate at the top:
func runPool(n int, items []int) {
if n < 1 {
panic("pool size must be >= 1")
// or: return an error
}
// ...
}
In production code, return an error or fall back to a sane default rather than panic. Never accept "0 workers" as a valid runtime value.
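A sketch of the error-returning variant:
func runPool(n int, items []int) error {
	if n < 1 {
		return fmt.Errorf("pool size must be >= 1, got %d", n)
	}
	jobs := make(chan int)
	var wg sync.WaitGroup
	for w := 0; w < n; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for j := range jobs {
				fmt.Println(j)
			}
		}()
	}
	for _, x := range items {
		jobs <- x
	}
	close(jobs)
	wg.Wait()
	return nil
}
The caller then decides: if err := runPool(n, items); err != nil { ... }.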
Exercise 11 — Deadlock on Full Results¶
package main
import (
"fmt"
"sync"
)
func main() {
jobs := make(chan int, 10)
results := make(chan int) // unbuffered
var wg sync.WaitGroup
for w := 0; w < 4; w++ {
wg.Add(1)
go func() {
defer wg.Done()
for j := range jobs {
results <- j * 2
}
}()
}
for i := 0; i < 10; i++ {
jobs <- i
}
close(jobs)
go func() { wg.Wait(); close(results) }()
var sum int
for r := range results {
sum += r
}
fmt.Println("sum:", sum)
}
Symptom: None, at first glance; this one runs fine. But what happens if the consumer (main) is slow?
Diagnosis: This version works because main consumes results in a tight loop. Add any real work to the consumer and workers block on results <- .... With an unbuffered results channel and a slow consumer, the pool throttles to consumer speed. That may be desired (backpressure) or wrong (you wanted parallelism).
The bug to think about: if the jobs are fast and the consumer is slow, throughput equals consumer speed, not pool speed, and workers sit idle most of the time. The fix is one of:
- Buffer the results channel: results := make(chan int, 100).
- Speed up the consumer (batch writes, async I/O).
- Drop results that exceed consumer capacity (a non-blocking send; see the sketch below).
The choice depends on whether backpressure is the goal.
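For the third option, the drop is a non-blocking send inside the worker. A minimal sketch, assuming results has a buffer and dropped is a hypothetical atomic.Int64 metric:
select {
case results <- j * 2:
	// delivered
default:
	// buffer full: the consumer is behind, so shed load instead of blocking
	dropped.Add(1)
}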
Exercise 12 — Worker Exits Too Early¶
package main
import (
"context"
"fmt"
"sync"
)
func worker(ctx context.Context, jobs <-chan int, results chan<- int, wg *sync.WaitGroup) {
defer wg.Done()
for {
select {
case <-ctx.Done():
return
case j := <-jobs:
results <- j * 2
}
}
}
func main() {
ctx, cancel := context.WithCancel(context.Background())
defer cancel()
jobs := make(chan int, 5)
results := make(chan int, 5)
var wg sync.WaitGroup
wg.Add(2)
go worker(ctx, jobs, results, &wg)
go worker(ctx, jobs, results, &wg)
for i := 1; i <= 3; i++ {
jobs <- i
}
close(jobs)
go func() { wg.Wait(); close(results) }()
for r := range results {
fmt.Println(r)
}
}
Symptom: Prints 2, 4, 6 (in some order), then an endless stream of 0s. The program never exits.
Diagnosis: Once jobs is closed and drained, case j := <-jobs succeeds immediately with the zero value (0), and the worker treats it as a real job. The worker loops on the closed channel forever, pumping 0s into results; wg.Wait() never returns, results is never closed, and main's range never ends.
Fix: Use the comma-ok form:
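func worker(ctx context.Context, jobs <-chan int, results chan<- int, wg *sync.WaitGroup) {
	defer wg.Done()
	for {
		select {
		case <-ctx.Done():
			return
		case j, ok := <-jobs:
			if !ok {
				return // jobs closed and drained: exit instead of reading zeros
			}
			results <- j * 2
		}
	}
}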
Exercise 13 — Counter Off by One¶
package main
import (
"fmt"
"sync"
)
func main() {
jobs := make(chan int, 10)
var wg sync.WaitGroup
wg.Add(4)
for w := 0; w < 5; w++ { // bug: spawning 5, but Add(4)
go func(id int) {
defer wg.Done()
for j := range jobs {
fmt.Printf("worker %d job %d\n", id, j)
}
}(w)
}
for i := 0; i < 10; i++ {
jobs <- i
}
close(jobs)
wg.Wait()
}
Symptom: panic: sync: negative WaitGroup counter.
Diagnosis: Add(4) but spawned 5 workers. The 5th worker's Done() makes the counter -1. Panic.
Fix: Add inside the loop, matched 1:1 with each goroutine:
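for w := 0; w < 5; w++ {
	wg.Add(1) // one Add per goroutine; the count can't drift
	go func(id int) {
		defer wg.Done()
		for j := range jobs {
			fmt.Printf("worker %d job %d\n", id, j)
		}
	}(w)
}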
Or compute the count correctly up front: wg.Add(5). The one-Add(1)-per-goroutine pattern above is harder to get wrong.
Bug Patterns Summary¶
| Pattern | Symptom | Defensive habit |
|---|---|---|
| Forgot close(jobs) | Hang | defer close(jobs) in producer |
| Add inside goroutine | Race; early exit | Add before go func() |
| Worker closes channel | Panic on send | Closers and senders are separate; receivers don't close |
| Multi-producer, multi-close | Panic on second close | Single closer; use sync.Once or orchestrator close |
| No recover in worker | Crash on bad job | Deferred recover() closure at worker top |
| No ctx check | Cancel ignored | Select on ctx.Done() in worker |
| Slow/missing consumer | Workers block; pool hangs | Always range results; buffered results |
| Add count mismatch | Negative counter panic | One Add(1) per go func() |
| Pool size 0 | Hang | Validate n >= 1 at entry |
| case j := <-jobs: on closed | Spurious zero values | Use j, ok := <-jobs; if !ok { return } |
| Shared counter without sync | Lost updates; race | Atomic or mutex |
| Result channel too small + slow consumer | Worker stuck | Bound queue + reject, or larger buffer |
| Worker that ignores ctx during long compute | Cancellation latency | Pass ctx to all I/O; check periodically in tight loops |
A bug-free worker pool is mostly about not making any of the 13 mistakes above. The pattern itself is small; the discipline is what makes it production-ready.