Goroutine Best Practices — Optimize¶
Each section presents working-but-poorly-disciplined Go code and rewrites it to production grade. The point is not to make it run faster but to make it correct, observable, and operable. The rewrites apply the twelve rules from junior consistently.
How to use this file¶
- Read the "Before" code.
- List which rules it violates and the operational risks they create.
- Sketch a rewrite in your head.
- Read the "After" code.
- Compare. Note which rules were applied and how.
Optimization 1: Fan-out with hand-rolled WaitGroup+chan error¶
Before¶
func fetchAll(urls []string) ([]Response, error) {
	var wg sync.WaitGroup
	results := make([]Response, len(urls))
	errCh := make(chan error, len(urls))
	for i, url := range urls {
		wg.Add(1)
		go func(i int, url string) {
			defer wg.Done()
			r, err := http.Get(url)
			if err != nil {
				errCh <- err
				return
			}
			defer r.Body.Close()
			results[i] = parseResponse(r)
		}(i, url)
	}
	wg.Wait()
	close(errCh)
	for err := range errCh {
		if err != nil {
			return nil, err
		}
	}
	return results, nil
}
Violations¶
- No context.Context (Rule 4).
- No bounded concurrency — 1000 URLs spawn 1000 goroutines (Rule 10).
- Hand-rolled coordination where errgroup would do (Rule 6).
- First error doesn't cancel peers; they all run.
- No recover on the goroutines (Rule 5).
- http.Get does not respect any timeout; can hang forever.
After¶
func fetchAll(ctx context.Context, urls []string) ([]Response, error) {
	results := make([]Response, len(urls))
	g, ctx := errgroup.WithContext(ctx)
	g.SetLimit(16)
	for i, url := range urls {
		i, url := i, url // capture loop variables (no longer needed from Go 1.22 on)
		g.Go(func() (err error) {
			defer func() {
				if r := recover(); r != nil {
					err = fmt.Errorf("panic fetching %s: %v", url, r)
				}
			}()
			req, err := http.NewRequestWithContext(ctx, http.MethodGet, url, nil)
			if err != nil {
				return err
			}
			resp, err := http.DefaultClient.Do(req)
			if err != nil {
				return err
			}
			defer resp.Body.Close()
			results[i] = parseResponse(resp)
			return nil
		})
	}
	if err := g.Wait(); err != nil {
		return nil, err
	}
	return results, nil
}
Rules applied¶
- 4: ctx threaded; HTTP request respects it.
- 5: recover at goroutine boundary, converted to error.
- 6: errgroup replaces hand-rolled coordination.
- 10: SetLimit(16) bounds concurrency.
- 1: clear exit story — each goroutine returns when the HTTP request completes or ctx cancels.
Observable benefits¶
- First error cancels peers (saves work and downstream load).
- Shutdown via ctx is immediate.
- Concurrency bounded; doesn't OOM on 100 000 URLs.
- Panics surface as errors, not process crashes.
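A caller typically bounds the whole fan-out with a single deadline so that ctx cancellation covers every request. A minimal sketch (fetchAllWithBudget and the 10-second budget are illustrative):

func fetchAllWithBudget(urls []string) ([]Response, error) {
	// One deadline covers every fetch; cancel releases the context's resources.
	ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
	defer cancel()
	return fetchAll(ctx, urls)
}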
Optimization 2: Worker pool reading from a queue¶
Before¶
func consumeQueue(q Queue) {
	for {
		msg, err := q.Receive()
		if err != nil {
			log.Println("recv:", err)
			continue
		}
		go process(msg)
	}
}
Violations¶
- No exit (Rule 1).
- No context (Rule 4).
- Unbounded goroutines per message (Rule 10).
- No recover (Rule 5).
- Errors from process are silently lost.
After¶
func consumeQueue(ctx context.Context, q Queue, workers int) error {
	msgs := make(chan Msg, workers)
	g, ctx := errgroup.WithContext(ctx)
	// Producer: reads from q, pushes to msgs.
	g.Go(func() error {
		defer close(msgs)
		for {
			select {
			case <-ctx.Done():
				return ctx.Err()
			default:
			}
			msg, err := q.Receive(ctx)
			if err != nil {
				if ctx.Err() != nil {
					return ctx.Err()
				}
				return fmt.Errorf("receive: %w", err)
			}
			select {
			case msgs <- msg:
			case <-ctx.Done():
				return ctx.Err()
			}
		}
	})
	// Workers: read from msgs.
	for i := 0; i < workers; i++ {
		i := i
		g.Go(func() error {
			for {
				select {
				case <-ctx.Done():
					return ctx.Err()
				case msg, ok := <-msgs:
					if !ok {
						return nil
					}
					if err := processWithRecover(ctx, i, msg); err != nil {
						log.Printf("worker %d: %v", i, err)
					}
				}
			}
		})
	}
	return g.Wait()
}

func processWithRecover(ctx context.Context, id int, msg Msg) (err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("worker %d panic: %v", id, r)
		}
	}()
	return process(ctx, msg)
}
Rules applied¶
- 1: producer and workers each exit on ctx.Done() or channel close.
- 4: ctx throughout.
- 5: processWithRecover boundary.
- 6: errgroup.
- 10: fixed workers count bounds concurrency.
Observable benefits¶
- Bounded memory under load.
- Graceful shutdown: cancel ctx, producer stops pulling, workers drain msgs, all exit.
- Per-worker error logging; a panic in one message doesn't crash the rest.
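The rewrite assumes Queue exposes a context-aware Receive. A minimal sketch of that assumed interface:

// Queue is assumed to expose a blocking, context-aware Receive that returns
// when a message arrives, the context is cancelled, or the source fails.
type Queue interface {
	Receive(ctx context.Context) (Msg, error)
}

If the real client only offers Receive() without a context, wrap it so that cancellation still unblocks the producer.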
Optimization 3: Background ticker¶
Before¶
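A representative shape of the anti-pattern, sketched here as a sleep-driven flush loop with no context and no recover:

go func() {
	for {
		time.Sleep(time.Second)
		if err := flush(); err != nil {
			log.Println("flush:", err)
		}
	}
}()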
Violations¶
- No exit story (Rule 1).
- No context (Rule 4).
- No recover (Rule 5).
- time.Sleep driving the loop is technically a ticker, but it doesn't compose with cancellation.
After¶
func runFlusher(ctx context.Context, interval time.Duration) error {
	defer func() {
		if r := recover(); r != nil {
			log.Printf("flusher panic: %v\n%s", r, debug.Stack())
		}
	}()
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		select {
		case <-ctx.Done():
			return ctx.Err()
		case <-ticker.C:
			if err := flush(ctx); err != nil {
				log.Printf("flush: %v", err)
			}
		}
	}
}

// in main:
g.Go(func() error { return runFlusher(ctx, time.Second) })
Rules applied¶
- 1: returns when ctx cancels.
- 4: context threaded; flush accepts it.
- 5: recover at the loop boundary.
- 8: ticker, not sleep, so it composes with select.
Observable benefits¶
- On SIGTERM, the flusher stops within one tick interval at most.
- Panics don't crash the service.
Optimization 4: HTTP handler spawning background work¶
Before¶
func handler(w http.ResponseWriter, r *http.Request) {
	body, _ := io.ReadAll(r.Body)
	go saveToS3(body)
	w.WriteHeader(202)
}
Violations¶
- No exit story (Rule 1) — saveToS3 may run after the server shuts down.
- No context (Rule 4) — request context isn't threaded; saveToS3 can't know if the request was abandoned.
- No recover (Rule 5).
- Unbounded spawning per request (Rule 10).
- The detached goroutine outlives the handler; nothing tracks completion.
After¶
type Server struct {
	saver *BackgroundSaver
}

type BackgroundSaver struct {
	sem chan struct{}
	wg  sync.WaitGroup
}

func NewBackgroundSaver(concurrency int) *BackgroundSaver {
	return &BackgroundSaver{
		sem: make(chan struct{}, concurrency),
	}
}

func (b *BackgroundSaver) Save(ctx context.Context, body []byte) error {
	select {
	case b.sem <- struct{}{}:
	case <-ctx.Done():
		return ctx.Err()
	}
	b.wg.Add(1)
	go func() {
		defer b.wg.Done()
		defer func() { <-b.sem }()
		defer func() {
			if r := recover(); r != nil {
				log.Printf("save panic: %v", r)
			}
		}()
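		// Note: ctx is whatever the caller passed in. When that is a request
		// context, it is cancelled as soon as the handler returns, which also
		// cancels this save; if the save must outlive the request, derive the
		// context with context.WithoutCancel (Go 1.21+) before calling Save.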
		if err := saveToS3(ctx, body); err != nil {
			log.Printf("save failed: %v", err)
		}
	}()
	return nil
}

func (b *BackgroundSaver) Shutdown(ctx context.Context) error {
	done := make(chan struct{})
	go func() {
		b.wg.Wait()
		close(done)
	}()
	select {
	case <-done:
		return nil
	case <-ctx.Done():
		return ctx.Err()
	}
}

func (s *Server) handler(w http.ResponseWriter, r *http.Request) {
	body, err := io.ReadAll(r.Body)
	if err != nil {
		http.Error(w, err.Error(), http.StatusBadRequest)
		return
	}
	if err := s.saver.Save(r.Context(), body); err != nil {
		http.Error(w, "busy", http.StatusServiceUnavailable)
		return
	}
	w.WriteHeader(202)
}
Rules applied¶
- 1: every spawned goroutine has an exit; Shutdown waits.
- 4: ctx from the request is threaded; saveToS3 respects it.
- 5: recover in each goroutine.
- 10: sem bounds concurrency.
- 11: BackgroundSaver should now have a doc comment about concurrency safety.
Observable benefits¶
- Service can shut down gracefully and wait for in-flight saves.
- Backpressure: if the pool is full, the request gets 503 instead of OOM.
- A failing saveToS3 doesn't crash the server.
Optimization 5: Shared mutable state¶
Before¶
type Stats struct {
	count int
	sum   int
}

func (s *Stats) Add(x int) {
	s.count++
	s.sum += x
}

func (s *Stats) Mean() float64 {
	return float64(s.sum) / float64(s.count)
}
Used from many goroutines concurrently.
Violations¶
- Race on count and sum (Rule 9 detected, Rule 7 prevention missing).
- No doc on concurrency safety (Rule 11).
After (mutex version)¶
// Stats is safe for concurrent use by multiple goroutines.
type Stats struct {
	mu    sync.Mutex
	count int
	sum   int
}

func (s *Stats) Add(x int) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.count++
	s.sum += x
}

func (s *Stats) Mean() float64 {
	s.mu.Lock()
	defer s.mu.Unlock()
	if s.count == 0 {
		return 0
	}
	return float64(s.sum) / float64(s.count)
}
After (atomic version, if Mean is rarely called)¶
// Stats is safe for concurrent use by multiple goroutines.
type Stats struct {
	count atomic.Int64
	sum   atomic.Int64
}

func (s *Stats) Add(x int) {
	s.count.Add(1)
	s.sum.Add(int64(x))
}

func (s *Stats) Mean() float64 {
	// Note: count and sum are read independently; Mean is not atomic
	// across both fields. If consistency is required, use the mutex version.
	c := s.count.Load()
	if c == 0 {
		return 0
	}
	return float64(s.sum.Load()) / float64(c)
}
Rules applied¶
- 7: appropriate primitive for the workload.
- 9: race-free, verified with -race.
- 11: documented safety.
Observable benefits¶
- Calling Stats concurrently from 1 000 goroutines no longer races.
- Tail latency under contention is bounded by lock fairness.
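A quick way to verify Rule 9 is a test that hammers Stats from many goroutines and runs under go test -race. A sketch (the goroutine and iteration counts are arbitrary):

func TestStatsConcurrent(t *testing.T) {
	var s Stats
	var wg sync.WaitGroup
	for g := 0; g < 100; g++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := 0; i < 1000; i++ {
				s.Add(i)
			}
		}()
	}
	wg.Wait()
	if got := s.Mean(); got <= 0 {
		t.Fatalf("mean = %v, want > 0", got)
	}
}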
Optimization 6: Test using time.Sleep¶
Before¶
func TestPipeline(t *testing.T) {
	in := make(chan int)
	out := pipeline(in)
	in <- 1
	in <- 2
	in <- 3
	close(in)
	time.Sleep(200 * time.Millisecond)
	if len(collect(out)) != 3 {
		t.Fail()
	}
}
Violations¶
- time.Sleep for synchronisation (Rule 8).
- collect may run before the pipeline drains.
- Flaky on slow CI.
After¶
func TestPipeline(t *testing.T) {
	defer goleak.VerifyNone(t)
	in := make(chan int)
	out := pipeline(in)
	go func() {
		defer close(in)
		in <- 1
		in <- 2
		in <- 3
	}()
	var got []int
	for v := range out {
		got = append(got, v)
	}
	if len(got) != 3 {
		t.Fatalf("got %d, want 3", len(got))
	}
}
Rules applied¶
- 8: no time.Sleep. The for v := range out loop waits until out is closed, which pipeline must do when its input drains.
- 9: implicit; the test now has a determined synchronisation point.
- 12: goleak.VerifyNone catches any leftover goroutines.
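For reference, a minimal pipeline that satisfies this contract, closing out once its input drains (a sketch; a real stage would do more per item):

func pipeline(in <-chan int) <-chan int {
	out := make(chan int)
	go func() {
		// Closing out is what lets the test's range loop terminate.
		defer close(out)
		for v := range in {
			out <- v
		}
	}()
	return out
}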
Observable benefits¶
- Test is deterministic.
- Test fails immediately if pipeline doesn't close out correctly.
- Test fails if pipeline leaks a goroutine.
Optimization 7: Service main with multiple components¶
Before¶
func main() {
	go runHTTPServer()
	go runKafkaConsumer()
	go runMetricsFlusher()
	select {} // block forever
}
Violations¶
- No SIGTERM handling.
- No exit story for the main goroutine (or the children).
- No coordination if one component fails.
After¶
func main() {
	if err := run(); err != nil {
		log.Fatal(err)
	}
}

func run() error {
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()
	sigCh := make(chan os.Signal, 1)
	signal.Notify(sigCh, os.Interrupt, syscall.SIGTERM)
	go func() {
		<-sigCh
		log.Println("shutdown signal received")
		cancel()
	}()
	g, ctx := errgroup.WithContext(ctx)
	g.Go(func() error { return runHTTPServer(ctx) })
	g.Go(func() error { return runKafkaConsumer(ctx) })
	g.Go(func() error { return runMetricsFlusher(ctx) })
	err := g.Wait()
	if err != nil && !errors.Is(err, context.Canceled) {
		return err // a clean cancellation from shutdown is not a failure
	}
	return nil
}
Rules applied¶
- 1: each component returns when ctx cancels.
- 4: single root ctx.
- 6: errgroup coordinates everything.
Observable benefits¶
- SIGTERM triggers orderly shutdown.
- One component crashing brings the others down for restart.
- Exit code reflects success or failure.
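For the HTTP component, honouring ctx usually means calling Shutdown when the context is cancelled. One possible shape (newMux, the address, and the shutdown budget are illustrative):

func runHTTPServer(ctx context.Context) error {
	srv := &http.Server{Addr: ":8080", Handler: newMux()}
	errCh := make(chan error, 1)
	go func() { errCh <- srv.ListenAndServe() }()
	select {
	case err := <-errCh:
		return err // the listener failed on its own
	case <-ctx.Done():
		// Give in-flight requests a bounded window to finish.
		shutdownCtx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
		defer cancel()
		if err := srv.Shutdown(shutdownCtx); err != nil {
			return err
		}
		return ctx.Err()
	}
}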
Optimization 8: Logging panics with structured information¶
Before¶
go func() {
	defer func() {
		if r := recover(); r != nil {
			log.Println("recover:", r)
		}
	}()
	work()
}()
Violations¶
- Not structured; hard to search.
- No stack trace.
- No metric (Rule 12-adjacent: leak/panic visibility).
After¶
go func() {
	defer func() {
		if r := recover(); r != nil {
			slog.Error("goroutine panic",
				slog.String("name", "worker"),
				slog.Any("panic", r),
				slog.String("stack", string(debug.Stack())),
			)
			metrics.GoroutinePanics.WithLabelValues("worker").Inc()
		}
	}()
	work()
}()
Rules applied¶
- 5: recover with full observability.
Observable benefits¶
- Stack trace tells you where the panic happened.
- Metric on the dashboard tells you panics are happening now.
- Structured fields are searchable in log aggregators.
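The metrics.GoroutinePanics counter is assumed to be something like a Prometheus counter vector keyed by goroutine name (illustrative; adapt to whatever metrics stack you use):

var GoroutinePanics = promauto.NewCounterVec(
	prometheus.CounterOpts{
		Name: "goroutine_panics_total",
		Help: "Recovered goroutine panics, labelled by goroutine name.",
	},
	[]string{"name"},
)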
Summary¶
Every "Before" snippet in this file represents code that works in development and breaks in production. The "After" snippets apply the twelve rules consistently. The pattern of the rewrite is always the same:
- Add ctx context.Context as the first parameter.
- Add defer recover at every goroutine boundary.
- Bound spawning (errgroup.SetLimit or semaphore).
- Replace time.Sleep with event-based synchronisation.
- Document concurrency safety on every exported type.
- Make every goroutine's exit story one comment line.
- Run -race and goleak to verify.
Internalise the pattern. When you see a "Before"-shape in your codebase, you should be able to type out the "After" without thinking.