Channels vs Mutexes — Middle¶

Table of contents¶

What this file assumes
Channel patterns: pipelines
Channel patterns: worker pools
Channel patterns: semaphores
Channel patterns: reply channels
Channel patterns: fan-out and fan-in
Channel patterns: timeouts and cancellation with select
Mutex patterns: read-write mutex
Mutex patterns: sync.Map
Mutex patterns: hybrid mutex plus condvar
Atomics — the third primitive
Performance comparison, measured
Choosing — a decision tree
Combining channels and mutexes
Common middle-level mistakes
Tricky points
Cheat sheet
Self-assessment checklist
Summary
Further reading

What this file assumes¶

You can: - Spawn goroutines, sync.WaitGroup-wait for them, run -race. - Fix a data race with sync.Mutex or a channel. - Explain ownership transfer vs shared access in plain terms.

You will learn here: - Five concrete channel patterns and three mutex patterns. - The role of sync/atomic in the choice. - How to benchmark a primitive and read the result. - How to compose channels with mutexes when the problem calls for both.

Channel patterns: pipelines¶

A pipeline is a chain of goroutines connected by channels. Each stage reads from its input, transforms, and writes to its output.

func ints(ctx context.Context) <-chan int {
    out := make(chan int)
    go func() {
        defer close(out)
        for i := 0; ; i++ {
            select {
            case <-ctx.Done():
                return
            case out <- i:
            }
        }
    }()
    return out
}

func squares(in <-chan int) <-chan int {
    out := make(chan int)
    go func() {
        defer close(out)
        for v := range in {
            out <- v * v
        }
    }()
    return out
}

func sums(in <-chan int) <-chan int {
    out := make(chan int)
    go func() {
        defer close(out)
        total := 0
        for v := range in {
            total += v
            out <- total
        }
    }()
    return out
}

func main() {
    ctx, cancel := context.WithTimeout(context.Background(), 10*time.Millisecond)
    defer cancel()
    for v := range sums(squares(ints(ctx))) {
        fmt.Println(v)
        if v > 1_000_000 {
            break
        }
    }
}

Pipeline conventions: - Each stage returns its output channel. - Each stage closes its output channel when its input is exhausted. - A ctx (or a separate done channel) lets upstream stages stop cleanly. - Buffer sizes are kept small (often 0 or 1) until profiling justifies more.

A pipeline cannot be expressed with mutexes without inventing a queue type and a condvar — that's exactly what channels are.

Channel patterns: worker pools¶

Workers range over an input channel. A single producer (or many) sends jobs. A sync.WaitGroup coordinates "all workers done".

type Job struct {
    URL string
}
type Result struct {
    URL    string
    Status int
}

func runPool(jobs <-chan Job, results chan<- Result, workers int) {
    var wg sync.WaitGroup
    for i := 0; i < workers; i++ {
        wg.Add(1)
        go func() {
            defer wg.Done()
            for j := range jobs {
                results <- fetch(j)
            }
        }()
    }
    wg.Wait()
    close(results)
}

Calling pattern:

jobs := make(chan Job, 64)
results := make(chan Result, 64)

go runPool(jobs, results, 8)

go func() {
    for _, u := range urls {
        jobs <- Job{URL: u}
    }
    close(jobs)
}()

for r := range results {
    fmt.Println(r)
}

Three rules to make worker pools robust: - One goroutine owns close(jobs). Closing it from a worker is a race. - A small runPool wrapper handles close(results) after wg.Wait(). Don't put this logic in main. - Bound jobs's capacity to manage memory. Unbounded queues hide backpressure.

A mutex-based work queue is possible — protect a []Job slice with a sync.Mutex and use a sync.Cond to wait when empty — but it's three times the code and harder to get right. Channels are the right tool here.

Channel patterns: semaphores¶

A buffered channel of capacity N is a counting semaphore. Holders "acquire" by sending; "release" by receiving.

type Sem chan struct{}

func New(n int) Sem            { return make(Sem, n) }
func (s Sem) Acquire()         { s <- struct{}{} }
func (s Sem) Release()         { <-s }

func main() {
    sem := New(3) // max 3 concurrent

    var wg sync.WaitGroup
    for _, u := range urls {
        wg.Add(1)
        go func(u string) {
            defer wg.Done()
            sem.Acquire()
            defer sem.Release()
            fetch(u)
        }(u)
    }
    wg.Wait()
}

This bounds the number of concurrent fetch calls to 3 without a mutex anywhere. The Go memory model guarantees the (k+N)-th send happens-after the k-th receive — that's exactly a counting semaphore.

The channel form also integrates with select for timeouts:

func (s Sem) AcquireCtx(ctx context.Context) error {
    select {
    case s <- struct{}{}:
        return nil
    case <-ctx.Done():
        return ctx.Err()
    }
}

A mutex-based equivalent needs a sync.Cond, a counter, and a Goroutine-poked timer. It's more code and harder to cancel.

Channel patterns: reply channels¶

A request goroutine sends a value and a reply channel; the server processes the request and writes the answer on the reply channel. This is how you build "ask a goroutine a question".

type req struct {
    key   string
    reply chan int
}

type Counter struct {
    reqs   chan req
    addOps chan int
}

func NewCounter() *Counter {
    c := &Counter{
        reqs:   make(chan req),
        addOps: make(chan int),
    }
    go c.run()
    return c
}

func (c *Counter) run() {
    counts := map[string]int{}
    for {
        select {
        case r := <-c.reqs:
            r.reply <- counts[r.key]
        case k := <-c.addOps:
            counts[k]++
        }
    }
}

func (c *Counter) Inc(key string)   { c.addOps <- key }
func (c *Counter) Get(key string) int {
    r := req{key: key, reply: make(chan int)}
    c.reqs <- r
    return <-r.reply
}

Pattern points: - The reply channel is allocated per call, often with capacity 1 so the server can write without waiting. - The server's select interleaves reads and writes; no mutex is needed because all state lives inside the one goroutine. - The cost is the per-call channel allocation. For hot paths, this can dominate — a mutex-protected map may be cheaper. Measure.

A mutex-based equivalent:

type Counter struct {
    mu     sync.Mutex
    counts map[string]int
}
func (c *Counter) Inc(k string) { c.mu.Lock(); c.counts[k]++; c.mu.Unlock() }
func (c *Counter) Get(k string) int {
    c.mu.Lock(); defer c.mu.Unlock()
    return c.counts[k]
}

Half the lines, faster, and equally correct. The reply-channel form earns its keep only when the server is doing more than a map lookup — when there are real invariants to maintain across multiple fields. Otherwise, ship the mutex.

Channel patterns: fan-out and fan-in¶

Fan-out. N goroutines reading from one channel. Trivial: just start them.

Fan-in. One channel collecting from N goroutines. Two ways:

// 1. Reusing one channel — every producer writes to it.
results := make(chan int)
var wg sync.WaitGroup
for i := 0; i < N; i++ {
    wg.Add(1)
    go func() {
        defer wg.Done()
        results <- compute()
    }()
}
go func() { wg.Wait(); close(results) }()
for r := range results {
    fmt.Println(r)
}

// 2. Merging N input channels into one (when each producer owns its channel).
func merge(cs ...<-chan int) <-chan int {
    out := make(chan int)
    var wg sync.WaitGroup
    wg.Add(len(cs))
    for _, c := range cs {
        go func(c <-chan int) {
            defer wg.Done()
            for v := range c {
                out <- v
            }
        }(c)
    }
    go func() { wg.Wait(); close(out) }()
    return out
}

Use form 1 for worker pools. Use form 2 when each producer has a distinct lifecycle (e.g. one channel per database, one per user session).

Channel patterns: timeouts and cancellation with select¶

select is the channel switch. It picks one ready case; if none is ready, it blocks until one is.

select {
case msg := <-ch:
    handle(msg)
case <-time.After(5 * time.Second):
    return errors.New("timeout")
case <-ctx.Done():
    return ctx.Err()
}

Three rules: 1. If multiple cases are ready, the runtime picks one pseudo-randomly. You cannot rely on case order. 2. A default: case runs when no other case is ready. It turns select non-blocking. Use it for "poll without waiting". 3. A nil channel disables its case. Setting ch = nil is the idiomatic way to remove a case from a long-running select loop.

time.After returns a channel. It's fine for one-shot use, but in a loop it leaks: every iteration allocates a new timer that stays alive for the duration. Inside hot loops, use time.NewTimer once and Reset. We'll cover the leak in find-bug.md.

Mutex patterns: read-write mutex¶

sync.RWMutex allows many readers OR one writer:

type Snapshot struct {
    mu sync.RWMutex
    m  map[string]int
}

func (s *Snapshot) Get(k string) int {
    s.mu.RLock(); defer s.mu.RUnlock()
    return s.m[k]
}
func (s *Snapshot) Set(k string, v int) {
    s.mu.Lock(); defer s.mu.Unlock()
    s.m[k] = v
}

When does RWMutex pay off? When all three are true: - Reads dominate (think 10x or 100x more reads than writes). - The read-side critical section does meaningful work — not just one integer load. - You measured the workload and confirmed RWMutex is faster than plain Mutex and faster than atomic.Pointer[T].

When it doesn't pay off: - Short critical sections (single integer or pointer load) — the RWMutex bookkeeping cost (an atomic CAS on a reader counter shared across CPUs) dominates the savings. - High write rate — writers wait for all readers to drain, which can starve. - When atomic.Pointer[T] would work — readers do a single MOV and pay nothing for the rare writer.

We benchmark these in optimize.md.

Mutex patterns: sync.Map¶

sync.Map is a built-in concurrent map. Its documented use cases (from src/sync/map.go):

"(1) when the entry for a given key is only ever written once but read many times, as in caches that only grow, or (2) when multiple goroutines read, write, and overwrite entries for disjoint sets of keys."

API:

var m sync.Map
m.Store("k", 1)
v, ok := m.Load("k")
m.Delete("k")
m.LoadOrStore("k", 2)
m.Range(func(k, v any) bool { return true })

Two production warnings: - Untyped. Keys and values are any. You'll write type assertions everywhere. Wrap in a typed struct if you must use it. - Not always faster. sync.Map has overhead per operation. For workloads outside its two documented cases — and many real workloads are mixed read/write on shared keys — a plain map under sync.RWMutex (or sharded) outperforms it.

Default reach for a concurrent map: map + sync.RWMutex. Reach for sync.Map only when profiling shows you're in one of its two sweet spots.

Mutex patterns: hybrid mutex plus condvar¶

Some problems need "wait until a condition is true" — for example, a bounded queue with a "wait while full" guard. The primitive is sync.Cond:

type BoundedQueue struct {
    mu       sync.Mutex
    notEmpty *sync.Cond
    notFull  *sync.Cond
    items    []int
    capacity int
}

func New(cap int) *BoundedQueue {
    q := &BoundedQueue{capacity: cap}
    q.notEmpty = sync.NewCond(&q.mu)
    q.notFull = sync.NewCond(&q.mu)
    return q
}

func (q *BoundedQueue) Push(v int) {
    q.mu.Lock(); defer q.mu.Unlock()
    for len(q.items) == q.capacity {
        q.notFull.Wait()
    }
    q.items = append(q.items, v)
    q.notEmpty.Signal()
}

func (q *BoundedQueue) Pop() int {
    q.mu.Lock(); defer q.mu.Unlock()
    for len(q.items) == 0 {
        q.notEmpty.Wait()
    }
    v := q.items[0]
    q.items = q.items[1:]
    q.notFull.Signal()
    return v
}

But — this is exactly what a buffered channel does:

q := make(chan int, capacity)
q <- v   // Push, blocks if full
v := <-q // Pop, blocks if empty

sync.Cond exists for the cases where channels don't fit — typically when the "wake" condition is not a value being added or removed (e.g. "wake when the configuration has been reloaded"). In everyday code, sync.Cond is rare. If you're reaching for it, ask whether a channel would be cleaner.

Atomics — the third primitive¶

sync/atomic provides single-word atomic operations with no scheduler involvement. Since Go 1.19, the typed wrappers (atomic.Int64, atomic.Uint64, atomic.Pointer[T], atomic.Bool) are the idiomatic API:

var n atomic.Int64
n.Add(1)
v := n.Load()
n.Store(42)
swapped := n.CompareAndSwap(42, 100)

When to use atomics: - Counters (atomic.Int64). - Flags (atomic.Bool). - Pointers to immutable snapshots (atomic.Pointer[T]). - Single-word values where no compound invariant needs protecting.

Atomics are faster than mutexes for these patterns by 5–50x (no scheduler hop, no lock acquisition — one CPU instruction). They are not a general substitute for mutexes: you cannot atomically update two fields together.

A common pattern: read-mostly config with hot reload.

type Config struct {
    Timeout time.Duration
    Backend string
}

type Server struct {
    cfg atomic.Pointer[Config]
}

func (s *Server) Handle() {
    c := s.cfg.Load()        // pointer load — sub-nanosecond
    use(c.Timeout, c.Backend)
}

func (s *Server) Reload(c *Config) {
    s.cfg.Store(c)            // atomic pointer store
}

Readers never block. The writer never blocks readers. The "trick" is that *Config is treated as immutable — Reload constructs a new one rather than mutating the old.

Performance comparison, measured¶

On go1.22 darwin/arm64 M1 Pro, single-process, 32 goroutines:

Operation	ns/op
`atomic.Int64.Add(1)`	6
`mu.Lock(); n++; mu.Unlock()`	30 (uncontended), 100 (contended)
`rwmu.RLock(); _ = m[k]; rwmu.RUnlock()` (short section)	40
`ch <- v` paired with `v := <-ch` on size-1 chan	80
`ch <- v` paired with `v := <-ch` on unbuffered chan	120
Reply-channel round trip	350
`selectgo` with 2 cases ready	60

These numbers do not generalise to "always use atomics". They establish relative cost per operation. For a counter that runs 1 million times per second, choosing an atomic over a channel saves 50% CPU. For a counter that runs 10 times per second, the choice is invisible — pick whichever expresses intent better.

The most important rule: measure your actual workload before optimising. Microbenchmarks lie because they measure the primitive in isolation; real code has cache effects, work between operations, and contention patterns that change everything.

Choosing — a decision tree¶

Walk it top to bottom; first match wins.

Does the operation modify a single integer, bool, or pointer? → sync/atomic.
Are you transferring ownership of a value from one goroutine to another? → channel.
Are you broadcasting a one-shot signal to many goroutines? → close(chan struct{}) or context.Context.
Are you serialising access to in-place state with a multi-field invariant? → sync.Mutex (or RWMutex if reads dominate and read sections are non-trivial).
Is the state read 100x more than written and is the writer rare? → atomic.Pointer[T] with immutable snapshots.
Do you need to wait for an arbitrary condition? → can you express it as a channel? if not, sync.Cond (rare).

If you reach the end of the tree without a match, the design is unusual — slow down and write the invariant down explicitly.

Combining channels and mutexes¶

Real systems use both. The split: - Channels carry events between subsystems. - Mutexes protect shared state inside one subsystem.

Example: a webhook delivery system.

type Delivery struct {
    URL string
    Body []byte
}

type Service struct {
    queue   chan Delivery        // channel: events in
    stats   stats                // mutex-protected: shared metrics
}

type stats struct {
    mu        sync.Mutex
    sent      int64
    failed    int64
}

func (s *Service) run() {
    for d := range s.queue {           // channel ownership transfer
        if err := s.post(d); err != nil {
            s.stats.mu.Lock()           // mutex on shared metrics
            s.stats.failed++
            s.stats.mu.Unlock()
        } else {
            s.stats.mu.Lock()
            s.stats.sent++
            s.stats.mu.Unlock()
        }
    }
}

The channel carries the work; the mutex protects the metrics that any goroutine (including a /metrics HTTP handler) might read.

A different split would be wrong. Replacing the channel with a mutex-protected []Delivery adds 50 lines of Cond plumbing for no benefit. Replacing the mutex with a channel to "ask the stats goroutine" turns every metric read into a scheduler hop.

Common middle-level mistakes¶

time.After in a loop. Allocates a timer every iteration. Use time.NewTimer outside the loop and Reset.
Forgetting to drain a buffered channel before closing. Closing is fine even with pending values — receivers will drain. The mistake is forgetting that senders must not run after close.
Returning <-chan T from a method that doesn't own the channel. If the caller can close it, you have a race over who closes. The convention is: the goroutine that creates the channel via make is the one that closes it.
Using sync.Map reflexively. It's a tool, not the default. Plain map + sync.Mutex is the right starting point.
RWMutex with single-load critical sections. Slower than plain Mutex because of bookkeeping. Benchmark.
Mutex held during I/O. A handler that holds a lock while calling out to a database serialises all other handlers behind it. Release the lock first, copy the data out, then do I/O.
Goroutine leaks from unbounded channels. A consumer that exits early can leave a producer blocked on send. Either bound the channel and add a context for cancellation, or close the input from the consumer side via a side-channel.

Tricky points¶

select with one case is the same as a plain receive but pays for selectgo overhead (~30 ns extra). Don't write select { case v := <-ch: ... } — just write v := <-ch.
A buffered channel's k-th receive happens-before the (k+C)-th send. That's the memory model guarantee for "buffered channel as semaphore". Worth reading once and trusting forever.
sync.RWMutex upgrade is not supported. Holding RLock and trying to Lock will deadlock. Release the read lock first, then Lock (and re-check the invariant — another goroutine may have changed state between your unlock and lock).
sync.Mutex is fair-ish. Starvation mode (Go 1.9+) kicks in if a waiter waits >1ms; then the mutex hands off to waiters in FIFO order. You almost never need to think about it, but it's why your contended benchmarks plateau instead of starving.

Cheat sheet¶

Problem	Primitive
Counter	`atomic.Int64`
Boolean flag	`atomic.Bool`
Read-mostly snapshot	`atomic.Pointer[T]`
Map, mixed access	`map + sync.Mutex`
Map, dominantly read	`map + sync.RWMutex` (if read section is non-trivial)
Map, write-once read-many	`sync.Map`
Producer → consumer	`chan T`
Worker pool	`chan Job` + `sync.WaitGroup`
Counting semaphore	`chan struct{}` of capacity N
Fan-out signal	`close(chan struct{})`
Cancellation	`context.Context` (channel underneath)
Per-call ask-a-goroutine	reply `chan T`

Self-assessment checklist¶

I can build a pipeline of 3 stages with ctx-based cancellation.
I can build a worker pool with runPool(jobs, results, workers).
I can implement a counting semaphore in three lines with chan struct{}.
I know when RWMutex is faster than Mutex and when it's slower.
I know sync.Map's two documented use cases and don't reach for it elsewhere.
I can choose between sync.Mutex, sync/atomic, and chan T for any single-operation scenario.
I use defer mu.Unlock() and I don't hold a mutex during I/O.
I benchmark before optimising.

Summary¶

Channels carry values; mutexes protect in-place state; atomics handle single words. Each has a sweet spot and a pathological case. The defaults at this level: - Counter: atomic. - Flag: atomic. - Shared map: map + sync.Mutex (or sharded if hot). - Producer/consumer: channel. - Fan-out signal: closed channel or context.Context.

Reach for RWMutex only when reads dominate and the read section is non-trivial. Reach for sync.Map only when your access pattern matches its two documented cases. Reach for sync.Cond only when channels don't fit.

In senior.md we'll move from picking a primitive to refactoring real code from one primitive to another, with concrete before/after measurements and the production decision frameworks that justify the change.