Backpressure — Find the Bug¶
This page contains 20 buggy Go snippets (12 core exercises plus bonus rounds), each with a backpressure-related defect. For each:
- Read the code carefully.
- Spot the bug.
- Describe the failure mode.
- Compare with the reference answer below.
The bugs range from junior to senior level. Do not skim the answers — try to spot each before reading.
Bug 1: The unbounded slice queue¶
package main

import (
    "sync"
)

type Queue struct {
    mu    sync.Mutex
    items []string
}

func (q *Queue) Push(x string) {
    q.mu.Lock()
    q.items = append(q.items, x)
    q.mu.Unlock()
}

func (q *Queue) Pop() (string, bool) {
    q.mu.Lock()
    defer q.mu.Unlock()
    if len(q.items) == 0 {
        return "", false
    }
    x := q.items[0]
    q.items = q.items[1:]
    return x, true
}

func main() {
    q := &Queue{}
    go func() {
        for {
            q.Push("hello")
        }
    }()
    // ... slow consumer elsewhere
}
Answer¶
The queue is unbounded. append grows forever; the heap balloons until OOM. There is no backpressure signal.
Fix: replace the slice with make(chan string, N) and choose the send semantics deliberately: blocking, non-blocking, or context-aware.
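A minimal sketch of the three options (the 1024 cap and ctx are assumptions):
q := make(chan string, 1024) // bounded; 1024 is an illustrative cap

// Blocking: the producer slows down whenever the consumer falls behind.
q <- x

// Non-blocking: shed load explicitly when the buffer is full.
select {
case q <- x:
default:
    // full: drop or reject, and count it (see Bug 7)
}

// Context-aware: give up when the caller's deadline expires.
select {
case q <- x:
case <-ctx.Done():
    return ctx.Err()
}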
Bug 2: Hidden unboundedness via a huge buffer¶
func main() {
    events := make(chan Event, 100_000_000)
    go process(events)
    for {
        events <- newEvent() // never blocks in practice
    }
}
Answer¶
A 100M-slot channel is unbounded in practice. Under sustained load, the buffer fills with millions of events, each holding live memory. The process OOMs long before the buffer is technically full.
Fix: a small buffer (100–10,000) plus a deliberate policy (drop, reject, or block and let the backpressure propagate upstream).
Bug 3: Spawn-per-request without a limit¶
func handler(w http.ResponseWriter, r *http.Request) {
    go func() {
        doExpensiveWork(r)
    }()
    fmt.Fprintln(w, "accepted")
}
Answer¶
Each request spawns an unbounded goroutine. Under load, goroutine count climbs without limit. Memory grows; eventually OOM.
Fix: use a bounded worker pool. Submit the work; if full, return 503.
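One bounded shape, as a sketch (the cap of 64, parseJob, and handing the goroutine a parsed job instead of r are assumptions):
var sem = make(chan struct{}, 64) // at most 64 jobs in flight

func handler(w http.ResponseWriter, r *http.Request) {
    job := parseJob(r) // copy what we need; r is not safe to use after the handler returns
    select {
    case sem <- struct{}{}:
        go func() {
            defer func() { <-sem }() // hold the slot for the full duration of the work
            doExpensiveWork(job)
        }()
        fmt.Fprintln(w, "accepted")
    default:
        http.Error(w, "overloaded", http.StatusServiceUnavailable)
    }
}
Note that the slot is released inside the goroutine, not in the handler; compare Bug 12 below.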
Bug 4: Forgotten close¶
func produce() {
    ch := make(chan int, 10)
    go consume(ch)
    for i := 0; i < 100; i++ {
        ch <- i
    }
    // (no close)
}

func consume(ch <-chan int) {
    for x := range ch {
        process(x)
    }
}
Answer¶
The consumer's for x := range ch waits forever after the producer finishes. The goroutine leaks; the channel is never closed.
Fix: close(ch) after the loop. Or use a context to signal completion.
Bug 5: Double-close panic¶
type Stream struct {
    ch     chan int
    closed bool
    mu     sync.Mutex
}

func (s *Stream) Close() {
    s.mu.Lock()
    if !s.closed {
        close(s.ch)
        s.closed = true
    }
    s.mu.Unlock()
}
Answer¶
With the mutex held, the check-and-close pair is atomic, so calling Close twice is safe here. But the safety rests entirely on the mutex: replace the lock with a plain atomic load of the flag, and two goroutines can both observe closed == false and both call close; the second call panics.
The bigger bug: a goroutine that sends on s.ch after Close() returns will panic. The closed flag is checked inside the struct, but external callers do not check it.
Fix: sync.Once for close, and a separate "is closed?" check used by senders before they send.
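One safe shape, sketched with a mutex that guards both the close and every send (a stricter variant of the fix above; TrySend and the non-blocking send are assumptions):
type Stream struct {
    mu     sync.Mutex
    ch     chan int
    closed bool
}

func (s *Stream) Close() {
    s.mu.Lock()
    defer s.mu.Unlock()
    if !s.closed {
        s.closed = true
        close(s.ch)
    }
}

// TrySend reports whether x was accepted. The mutex means no send can
// race with Close; the send is non-blocking, so the lock is never held
// while waiting on a full channel.
func (s *Stream) TrySend(x int) bool {
    s.mu.Lock()
    defer s.mu.Unlock()
    if s.closed {
        return false
    }
    select {
    case s.ch <- x:
        return true
    default:
        return false // full: the caller decides whether to drop or retry
    }
}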
Bug 6: Blocking send in a handler¶
var jobCh = make(chan Job, 100)

func handler(w http.ResponseWriter, r *http.Request) {
    j := parseJob(r)
    jobCh <- j // blocks if channel full
    fmt.Fprintln(w, "queued")
}
Answer¶
When the channel is full, the handler blocks. Other requests still arrive at the server. Handler goroutines accumulate. The system slowly leaks goroutines until OOM.
Fix: non-blocking send with select default (drop) or context-bound send (return 503 on timeout).
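A sketch of the bounded-wait variant (the 100ms budget is an assumption):
func handler(w http.ResponseWriter, r *http.Request) {
    j := parseJob(r)
    select {
    case jobCh <- j:
        fmt.Fprintln(w, "queued")
    case <-time.After(100 * time.Millisecond): // bounded wait instead of an open-ended block
        http.Error(w, "overloaded", http.StatusServiceUnavailable)
    }
}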
Bug 7: Drop without metric¶
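A representative buggy shape (eventCh and ev are assumed names):
select {
case eventCh <- ev:
default:
    // full: silently drop; no counter, no log
}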
Answer¶
Dropping is fine when it is the right policy. But without a counter, operators have no signal that drops are happening. The system silently loses data.
Fix: increment an atomic counter and log at a low rate. A sketch, with the drop in a select default branch (eventCh and ev are illustrative names):
var drops uint64 // package-level drop counter

select {
case eventCh <- ev:
default:
    if n := atomic.AddUint64(&drops, 1); n%1000 == 0 {
        log.Printf("dropped %d events total", n)
    }
}
Using AddUint64's return value matters: a separate add-then-load pair can miss the %1000 boundary under concurrency, and printing drops without an atomic load is a data race.
Bug 8: Race between len and act¶
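A representative buggy shape (trySend is an assumed name):
func trySend(ch chan Event, ev Event) bool {
    if len(ch) < cap(ch) { // racy: another goroutine can fill ch right after this check
        ch <- ev // ...and then this "non-blocking" send blocks after all
        return true
    }
    return false
}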
Answer¶
Between len(ch) < cap(ch) and the send, another goroutine could fill the channel. The send then blocks. The function does not behave as "non-blocking check-then-send."
Fix: use select with default for an atomic non-blocking send.
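The atomic version, as a sketch:
func trySend(ch chan Event, ev Event) bool {
    select {
    case ch <- ev:
        return true
    default:
        return false // full at the instant of the send; no gap to race through
    }
}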
Bug 9: Missing context check in worker¶
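A representative buggy shape:
func worker(jobs <-chan Job) {
    for j := range jobs {
        doWork(j) // no context: cancellation cannot reach this loop or the work itself
    }
}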
Answer¶
The worker has no way to be cancelled. If doWork is long, shutting down requires waiting for every in-flight job. Worse, if doWork itself blocks forever, the worker hangs.
Fix: accept a context and check ctx.Err() between work steps. Pass the context to doWork so it can be cancelled too.
func worker(ctx context.Context, jobs <-chan Job) {
    for {
        select {
        case j, ok := <-jobs:
            if !ok { return }
            doWork(ctx, j)
        case <-ctx.Done():
            return
        }
    }
}
Bug 10: Sending to a closed channel via slow shutdown¶
func (p *Pool) Submit(j Job) {
    p.jobs <- j
}

func (p *Pool) Shutdown() {
    close(p.jobs)
}

// Caller:
go p.Submit(j1)
p.Shutdown()
go p.Submit(j2) // may panic
Answer¶
After Shutdown, any concurrent Submit may try to send on a closed channel and panic. The lifecycle is not protected.
Fix: add an atomic "closed" flag and check it in Submit. Use sync.Once for close.
func (p *Pool) Submit(j Job) error {
    if p.closed.Load() { return ErrClosed }
    p.jobs <- j
    return nil
}

func (p *Pool) Shutdown() {
    p.once.Do(func() {
        p.closed.Store(true)
        close(p.jobs)
    })
}
Note: there is still a race — a Submit can check closed, find false, then close happens before the send. Mitigate with a select and a closed-detection channel:
func (p *Pool) Submit(j Job) error {
    select {
    case <-p.done:
        return ErrClosed
    case p.jobs <- j:
        return nil
    }
}
Where p.done is closed in Shutdown before closing p.jobs. Even this narrows the window rather than eliminating it: if both cases are ready at the same instant, select may still pick the send on the just-closed p.jobs and panic. The airtight options are to never close p.jobs at all (let workers exit via p.done), or to guard every send and the close with a shared mutex.
Bug 11: Naive retry that amplifies overload¶
func Send(ctx context.Context, msg Msg) error {
    for {
        err := client.Do(ctx, msg)
        if err == nil { return nil }
        if !isTransient(err) { return err }
        time.Sleep(10 * time.Millisecond)
    }
}
Answer¶
Two bugs: (1) infinite retry with no maximum, and (2) a fixed delay with no jitter. A transient outage makes every client retry in lockstep every 10 ms, up to 100 attempts per second per client, hammering the downstream exactly when it is trying to recover.
Fix: cap retries, use exponential backoff with jitter, and honour ctx.Done:
for i := 0; i < maxRetries; i++ {
    err := client.Do(ctx, msg)
    if err == nil { return nil }
    if !isTransient(err) { return err }
    delay := baseDelay * time.Duration(1<<i) // exponential; real code should also cap delay
    jitter := time.Duration(rand.Int63n(int64(delay) / 2)) // desynchronises clients; needs delay >= 2ns or Int63n panics
    select {
    case <-time.After(delay + jitter):
    case <-ctx.Done():
        return ctx.Err()
    }
}
return errors.New("retries exhausted")
Bug 12: Misplaced defer releases semaphore early¶
func handler(w http.ResponseWriter, r *http.Request) {
    sem <- struct{}{}
    defer func() { <-sem }()
    go expensiveWork(r) // continues after handler returns
    fmt.Fprintln(w, "queued")
}
Answer¶
The semaphore is released when the handler returns. But expensiveWork continues to run in a goroutine. The slot is freed but the work is still in flight. New requests can grab the slot while the previous work is still consuming resources.
Fix: either do the work synchronously in the handler (and let backpressure work as designed), or move work to a worker pool that has its own admission. Do not mix "semaphore in handler" with "work in goroutine."
Bonus Bugs¶
Bug 13: Wrong direction of close¶
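A representative buggy shape:
func consume(ch chan int, n int) {
    for i := 0; i < n; i++ {
        process(<-ch)
    }
    close(ch) // wrong side: the producer may still be sending
}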
Answer¶
Only the sender should close a channel; the consumer must never do it. If the consumer closes while the producer is still active, the producer's next send panics.
Fix: producer closes; consumer just reads.
Bug 14: Unbounded inner buffer¶
type Pool struct {
    jobs chan Job
}

func (p *Pool) Submit(j Job) {
    select {
    case p.jobs <- j:
    default:
        go func() { p.jobs <- j }() // "make room" by spawning more
    }
}
Answer¶
The "fix" is worse than the disease. Spawning a goroutine that blocks on the full channel is just hiding the unboundedness in goroutine count. Memory still grows.
Fix: drop or reject; do not pretend overflow does not exist.
Reflection¶
If you missed many bugs, the most common pattern is: the bug is at the boundary. Look at where data enters or leaves a goroutine. The bug usually lives in:
- The send (or receive) operation.
- The buffer size.
- The shutdown path.
- The handler's interaction with workers.
- The retry/backoff logic.
Backpressure bugs cluster at these seams. Train your eyes to look there first.
Code Review Checklist for Backpressure Bugs¶
When reviewing PRs, flag each of these:
- Unbounded slices used as queues.
- Huge channel buffers (> 10,000) without justification.
- go func() in HTTP handlers without limits.
- for range ch without context cancellation.
- select cases with sends but no default or ctx.Done(), unless blocking is intentional.
- close(ch) from a non-sole owner.
- Counters incremented without ever being read.
- Drop / reject branches without metrics.
- Retries without exponential backoff and jitter.
- Retries without a maximum.
- Per-request resource acquisition outside the request goroutine.
A 5-minute review with this checklist prevents months of incidents.
Closing¶
Backpressure bugs are insidious because they hide until production. The buggy code "works" — until the buffer fills. Drill these patterns until you spot them in seconds.
Bug 15: Hedged request that doubles load on overloaded downstream¶
func hedgedGet(ctx context.Context, fns []func(context.Context) ([]byte, error)) ([]byte, error) {
    out := make(chan []byte, len(fns))
    for _, fn := range fns {
        fn := fn
        go func() {
            d, err := fn(ctx)
            if err == nil { out <- d }
        }()
    }
    return <-out, nil
}
Answer¶
This is not hedging: all requests start immediately, with no stagger between attempts. When the downstream is overloaded, firing N parallel duplicates makes the overload worse. There is a second defect: errors are swallowed, so if every fn fails, nothing is ever sent on out and <-out blocks forever.
Fix: send the first request; if no response arrives within the hedging delay, start the second, and so on. Cancel the losers once a winner returns.
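A sketch of staggered hedging with loser cancellation (the delay parameter and the error handling are assumptions):
func hedgedGet(ctx context.Context, delay time.Duration, fns []func(context.Context) ([]byte, error)) ([]byte, error) {
    ctx, cancel := context.WithCancel(ctx)
    defer cancel() // cancels the losers once a winner returns

    out := make(chan []byte, len(fns))
    errs := make(chan error, len(fns))
    timer := time.NewTimer(0) // fires immediately, starting the first attempt
    defer timer.Stop()

    started, failed := 0, 0
    for {
        select {
        case d := <-out:
            return d, nil
        case <-errs:
            failed++
            if failed == len(fns) {
                return nil, errors.New("all attempts failed")
            }
        case <-timer.C:
            if started < len(fns) {
                fn := fns[started]
                started++
                go func() {
                    d, err := fn(ctx)
                    if err != nil {
                        errs <- err
                        return
                    }
                    out <- d
                }()
                timer.Reset(delay) // next attempt starts only if no response within delay
            }
        case <-ctx.Done():
            return nil, ctx.Err()
        }
    }
}
The buffered out and errs channels let losing goroutines finish without leaking even after a winner returns.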
Bug 16: Sleep-based "draining"¶
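A representative buggy shape:
func (s *Server) Shutdown() {
    close(s.jobs)
    time.Sleep(30 * time.Second) // "drain": hope everything finishes in time
    os.Exit(0)
}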
Answer¶
Sleep is not synchronisation. If in-flight work takes longer than 30s, it is killed mid-flight; if it finishes sooner, the shutdown wastes the remaining time. There is no signal for "are we done?"
Fix: track in-flight work with a sync.WaitGroup; shutdown waits on the WaitGroup, bounded by a context timeout.
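A sketch, assuming each worker calls s.wg.Done as it finishes a job:
func (s *Server) Shutdown(ctx context.Context) error {
    close(s.jobs) // no new work; workers drain what remains
    done := make(chan struct{})
    go func() {
        s.wg.Wait() // returns once every in-flight job has finished
        close(done)
    }()
    select {
    case <-done:
        return nil // clean drain
    case <-ctx.Done():
        return ctx.Err() // deadline hit; the caller decides whether to force-exit
    }
}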
Bug 17: Drop-oldest that spins¶
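A representative buggy shape:
func (q *Queue) Push(x int) {
    for {
        select {
        case q.ch <- x:
            return
        default:
            select {
            case <-q.ch: // drop the oldest to make room
            default:
            }
            // retry; another pusher may steal the freed slot first
        }
    }
}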
Answer¶
Under heavy contention the <-q.ch drop does free a slot, but another pusher can take that slot before this goroutine retries, so both loop again. With enough concurrent pushers, the retry loop can spin without making progress.
Fix: serialise drop-oldest with a mutex (or a single-slot semaphore). The unserialised version below shows the residual problem:
func (q *Queue) Push(x int) {
    select {
    case q.ch <- x:
    default:
        select {
        case <-q.ch: // drop one
        default:
        }
        q.ch <- x // now there is room (probably); but blocks if many concurrent pushes
    }
}
For high concurrency, use a mutex.
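A sketch of the mutex version (assuming Push is the only sender on q.ch, and consumers only receive):
func (q *Queue) Push(x int) {
    q.mu.Lock()
    defer q.mu.Unlock()
    select {
    case q.ch <- x:
    default:
        select {
        case <-q.ch: // still full: drop the oldest
        default: // a consumer already made room
        }
        q.ch <- x // cannot block: the lock keeps other pushers out, and consumers only free space
    }
}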
Bug 18: Adaptive limiter that never escapes overload¶
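A representative buggy shape (field names are assumed):
type Limiter struct {
    limit int // max in-flight requests; read and written with no synchronisation
}

func (l *Limiter) OnFailure() {
    l.limit /= 2 // no floor: repeated failures drive this to 0, and nothing ever raises it
}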
Answer¶
Two issues. (1) Not thread-safe. (2) Halving without a floor: limit can reach 0. After that, no requests succeed; no observations happen; the limit never recovers.
Fix: mutex, minimum limit floor, growth on success.
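A sketch of the fix in the additive-increase / multiplicative-decrease shape (mu, minLimit, and maxLimit are assumptions):
func (l *Limiter) OnResult(ok bool) {
    l.mu.Lock()
    defer l.mu.Unlock()
    if ok {
        if l.limit < maxLimit {
            l.limit++ // additive increase on success
        }
        return
    }
    l.limit /= 2 // multiplicative decrease on failure
    if l.limit < minLimit {
        l.limit = minLimit // floor: keep admitting a few probes so recovery is possible
    }
}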
Bug 19: Buffer sized larger than worker pool can drain in SLO¶
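A representative buggy shape (numbers taken from the answer below):
func startWorkers() {
    jobs := make(chan Job, 10_000) // far deeper than the workers can drain within the SLO
    for i := 0; i < 8; i++ {
        go worker(jobs) // each job takes ~100ms; the end-to-end SLO is 200ms
    }
}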
Answer¶
A queue depth of 10,000 with 8 workers at 100ms per job means the last queued job waits 10,000/8 × 100ms = 125 seconds. p99 latency is unbounded under load. With a 200ms end-to-end SLO and 100ms of processing, only 100ms of queueing is allowed, so the SLO is violated as soon as the queue holds more than ~8 items.
Fix: size the buffer to (SLO − per-job latency) × workers / per-job latency = (200 − 100) × 8 / 100 = 8 slots. Beyond that, reject.
Bug 20: Context inherits parent but not deadline¶
func process(parent context.Context, j Job) error {
    ctx := context.Background() // new root context!
    return doWork(ctx, j)
}
Answer¶
The function creates a new root context instead of deriving from parent, so the parent's deadline and cancellation never propagate. Even if the caller cancels, doWork keeps running.
Fix: ctx := parent, or derive with context.WithTimeout(parent, ...).
Final Notes¶
The bugs in this page are not theoretical. Every one has shipped in real Go code at some point. Reading them is a useful exercise; spotting them in unfamiliar code is the real skill.
When reviewing code with concurrency, develop a "smell check":
- Where is data buffered? Is the buffer bounded?
- Where does a goroutine block? Can it ever wake up?
- Where does work cross a boundary? Is there admission control?
- Where is close called? Is it from a single owner?
- Where does the system shut down? Does it drain?
Five questions, applied consistently, catch the majority of backpressure bugs.