Mutex — Middle Level¶
Topic: Mutex Focus: RWMutex, TryLock, fairness, recursive locks
Table of Contents¶
- Introduction
- Prerequisites
- Glossary
- Core Concepts
- Real-World Analogies
- Mental Models
- Code Examples
- Pros & Cons
- Use Cases
- Coding Patterns
- Clean Code
- Best Practices
- Edge Cases & Pitfalls
- Common Mistakes
- Tricky Points
- Test Yourself
- Tricky Questions
- Cheat Sheet
- Summary
- What You Can Build
- Further Reading
- Related Topics
- Diagrams & Visual Aids
Introduction¶
At the junior level you learned the basic shape of a mutex: one lock, one critical section, lots of bugs avoided. That model gets you surprisingly far, but the moment you put real load on it you discover something uncomfortable — a single mutex is the easiest way to turn an eight-core machine into a one-core machine. Threads queue up, throughput plateaus, and a profiler shows most of your CPU time is spent waiting.
The middle-level view of mutexes is about all the knobs people invented to get around that ceiling. You will meet reader-writer locks, which let many readers share access at the same time. You will meet TryLock, which lets you decline to wait. You will meet fair versus unfair mutexes and learn why "fair" is not automatically better. You will meet reentrant mutexes, adaptive mutexes, and the dreaded lock convoy, where a few slow threads can drag down an entire process.
Most importantly, you will start thinking about lock granularity — the art of splitting one big lock into many small locks without introducing new bugs. Almost every performance story in high-concurrency systems is, at some level, a lock-granularity story.
The goal of this guide is not to memorize every API. It is to give you a mental model strong enough that when you read someone else's code and see sync.RWMutex, std::shared_mutex, or ReentrantReadWriteLock, you immediately know what trade-offs are on the table.
Prerequisites¶
Before reading this guide you should already be comfortable with:
- The junior-level mutex material: critical sections,
Lock()/Unlock(),defer Unlock()in Go, RAII guards in C++, race conditions, and basic deadlocks. - The difference between a goroutine and an OS thread (or the equivalent abstraction in your language).
- Atomic operations and
compare-and-swapat least at the conceptual level — you do not need to implement them, but you should know what they do. - Memory visibility basics: that without synchronization, a write on one core might not be seen on another, and that mutexes establish a happens-before relationship.
- Reading concurrent code: being able to look at a function and identify what is shared state and what is local.
If any of those feel shaky, revisit junior.md and the topic on memory models before continuing.
Glossary¶
- Critical section — a region of code that must be executed by only one thread at a time to preserve invariants.
- RWMutex / shared_mutex — a lock with two modes: shared (many readers) and exclusive (one writer).
- TryLock — non-blocking attempt to acquire a lock; returns immediately with success or failure.
- Timed lock — blocking attempt with a deadline, e.g.
tryLock(timeout). - Fair lock — grants the lock to waiters in FIFO order.
- Unfair lock — allows new arrivals to "barge in" ahead of waiters; can improve throughput at the cost of latency variance.
- Reentrant lock — the same thread can acquire the same lock multiple times without deadlocking.
- Lock ordering — a global rule that defines the order in which locks must be acquired to avoid deadlocks.
- Lock granularity — how much state is protected by one lock; coarse means a lot, fine means little.
- Spinlock — a lock that busy-waits in a tight loop instead of sleeping.
- Adaptive mutex — spins briefly, then sleeps if the lock is still held; typical of modern
pthread_mutex_t. - Futex — Linux primitive (Fast Userspace Mutex) that allows fast uncontended locking in user space and only enters the kernel on contention.
- Lock convoy — pathological situation where many threads end up serialized behind one slow lock holder, even after that holder is done.
- Starvation — a thread is repeatedly denied progress because others keep winning the lock.
- Write starvation — a writer cannot make progress because readers keep arriving in an RWMutex.
Core Concepts¶
Reader-writer locks (RWMutex / shared_mutex / pthread_rwlock_t)¶
A plain mutex treats every access the same: read or write, you wait. But many real workloads are read-heavy — caches, config tables, routing maps. For those, blocking all readers behind each other is wasteful.
A reader-writer lock has two modes:
- Shared (read) mode — many threads can hold the lock simultaneously, as long as nobody holds it in write mode.
- Exclusive (write) mode — only one thread can hold the lock, and no readers may be active.
| Goroutine A: RLock |---read---|
| Goroutine B: RLock |---read---|
| Goroutine C: RLock |---read---|
| Goroutine D: Lock |---WRITE---|
All three readers run concurrently; the writer waits until they all release.
The languages:
- Go:
sync.RWMutexwithRLock(),RUnlock(),Lock(),Unlock(). - C++17:
std::shared_mutexwithlock_shared()/unlock_shared()for readers andlock()/unlock()for writers. - Java:
java.util.concurrent.locks.ReentrantReadWriteLockwith.readLock()and.writeLock()views. - C / POSIX:
pthread_rwlock_twithpthread_rwlock_rdlockandpthread_rwlock_wrlock.
When does an RWMutex actually win? When reads are common, writes are rare, and the critical section is long enough that the extra bookkeeping pays for itself. For very short critical sections, the overhead of an RWMutex can make it slower than a plain mutex.
Try-lock and timed lock¶
A plain Lock() says "I will wait as long as necessary." Sometimes that is the wrong promise. You might want to:
- Skip work if it is already in progress.
- Try a different shard if this one is busy.
- Give up after a deadline.
For this, languages provide:
TryLock()— returnstrueif it got the lock,falseif not.TryLockTimeout(d)/try_lock_for(d)— blocks up to a duration, then gives up.
TryLock is also a useful debugging tool. If you suspect a deadlock, you can replace Lock() with a TryLock plus a timeout, log the failure, and the code will tell you exactly which lock is being held too long.
Warning: TryLock is easy to abuse. A spin loop of TryLock with no backoff is just a hand-rolled, very inefficient spinlock. Always pair it with a sensible fallback strategy.
Fair vs unfair mutex¶
A fair mutex grants the lock to waiters in the order they arrived (FIFO). An unfair mutex lets a thread that happens to be running take the lock even if other threads are queued, in the name of throughput.
Examples:
- Java's
ReentrantLockcan be constructed as fair (new ReentrantLock(true)) or unfair (default). - Go's
sync.Mutexis mostly unfair for performance but has a starvation mode (introduced in 1.9) that kicks in if a waiter has been waiting more than 1ms: while in starvation mode, the lock is handed directly to the next waiter to prevent indefinite starvation. pthread_mutex_tis typically unfair (implementation-defined).
Fair locks have higher latency under contention because every handoff involves waking a parked thread, which costs a context switch. Unfair locks have higher throughput because the currently-running thread can re-acquire a just-released lock without context-switching, but a slow or unlucky thread can be starved.
Choose fairness when you genuinely need bounded waiting (e.g. a request queue where every request must eventually be served). Choose unfairness for most performance-critical code.
Recursive / reentrant mutex¶
A reentrant mutex lets the same thread acquire the lock multiple times without deadlocking. The lock internally tracks a count and the owner; only when the count returns to zero is the lock released.
ReentrantLock lock = new ReentrantLock();
lock.lock();
lock.lock(); // OK, count = 2
lock.unlock(); // count = 1
lock.unlock(); // count = 0, released
In Go, sync.Mutex is not reentrant. Calling Lock() twice on the same goroutine deadlocks immediately. This is a deliberate design choice: the Go authors view reentrant locks as a code smell that hides design problems.
Where reentrancy helps: you have a public method that locks, and it calls another public method that also wants to lock. With a non-reentrant mutex you must refactor so the inner method has a non-locking variant. With a reentrant mutex, it Just Works — but at the price of making lock holding much less explicit.
Lock ordering rule and deadlocks¶
When code acquires more than one lock, deadlock becomes possible. The canonical example:
If Thread 1 has A and Thread 2 has B, both wait forever. The standard solution is the lock ordering rule: assign a global total order to all locks (e.g. by address, by name, by ID), and require that every code path acquires them in that order.
// Pick the lower-id account first.
first, second := from, to
if first.ID > second.ID {
first, second = second, first
}
first.mu.Lock()
second.mu.Lock()
// ...
second.mu.Unlock()
first.mu.Unlock()
This is mechanical, but it works. The hard part in real systems is keeping the rule documented and enforced as the code evolves.
Lock granularity¶
Coarse-grained locking protects a lot of state with one lock. It is easy to reason about and rarely deadlocks, but it is a bottleneck.
Fine-grained locking gives each piece of state its own lock. It can scale wonderfully but introduces more lock ordering work and more opportunities for bugs.
A typical evolution path:
- One global mutex around the whole structure.
- One mutex per shard / bucket / chunk.
- Per-node locks plus careful lock-coupling for traversals.
- Lock-free with atomics where possible.
Each step gives you more concurrency and more pain. Stop at the level where your performance goals are met.
Spinlock vs blocking mutex¶
A spinlock busy-waits: it loops, checking an atomic flag until the lock becomes free. A blocking mutex parks the thread in the kernel until the lock is released.
Spinlocks win when:
- The critical section is very short (a few dozen instructions).
- Context switch cost dominates wait time.
- You are running on the same number of cores as threads (kernels, drivers).
Blocking mutexes win when:
- Critical sections might be long.
- You have more threads than cores (typical for user-space programs).
- Energy efficiency matters; spinning burns CPU.
Most modern mutexes are hybrid: they spin briefly (assuming a short critical section), then fall back to sleeping if the lock is still held. This is the adaptive mutex pattern.
Mutex vs semaphore vs binary semaphore¶
- A mutex is owned: only the thread that locked it may unlock it. It is used for mutual exclusion.
- A counting semaphore is a counter: any thread may acquire or release, and the counter limits how many simultaneous holders there can be.
- A binary semaphore is a counting semaphore with max=1. It looks like a mutex but lacks ownership semantics — one thread can
waitand a different thread cansignal. This is useful for signaling but is a footgun if you want mutual exclusion.
Rule of thumb: if a single thread acquires and releases for protection, use a mutex. If you are signaling between threads, use a semaphore or condition variable.
Adaptive mutex (futex on Linux)¶
On Linux, pthread_mutex_t and Go's sync.Mutex are both built on the futex (Fast Userspace muTEX) syscall. The trick: in the uncontended case, lock/unlock is a single atomic compare-and-swap entirely in user space — no syscall, no context switch. Only when contention is detected does the code enter the kernel to park or wake a waiter.
The "adaptive" part adds a brief spin before parking. The reasoning is that the lock holder is often about to release, and spinning for a few microseconds is cheaper than the full park/wake round trip.
This is why a mutex on Linux feels nearly free when uncontended and gets gradually more expensive as contention rises.
Lock convoy phenomenon¶
A lock convoy is a pathological state where a frequently-acquired lock serializes many threads behind one slow holder. After the holder releases, all the waiters wake up one by one, each takes the lock, does a tiny bit of work, then releases. Because they are all woken in roughly the same time window, they end up forming a "convoy" that travels through the lock together, and throughput collapses to the rate of one acquisition at a time.
Symptoms:
- CPU utilization is low but threads are blocked.
- Latency spikes correlate with a particular lock.
- Removing the lock or sharding it dramatically improves throughput.
Fixes include sharding the lock, using lock-free structures, or using a queue with a single owning thread.
Real-World Analogies¶
- RWMutex — a museum painting. Many people may look at the same time (readers), but when a curator wants to take it down for cleaning (writer), the room is closed and everyone else has to leave first.
- TryLock — knocking on a meeting-room door. If it is locked, you do not stand in the hallway waiting; you go book a different room.
- Fair vs unfair — a queue at a coffee shop. A fair barista serves strictly FIFO. An unfair barista hands the next coffee to whoever happens to be standing closest, which is faster overall but means quiet customers may wait a long time.
- Reentrant lock — a hotel room key card that lets the owner come back in repeatedly without needing a new card each time.
- Lock ordering — agreeing that in every recipe, you measure flour before sugar. Two cooks never collide if they share the rule.
- Lock granularity — a single big bathroom vs many small ones. One big door is easier to manage but creates queues; many small ones allow more parallelism but require more cleaning.
- Spinlock — pacing right outside the door, ready to walk in the instant it opens. Fine for a 5-second wait, miserable for an hour.
- Adaptive mutex — pacing for a minute, then if the door is still shut going to sit in the waiting area and being called when ready.
- Lock convoy — at a busy elevator, everyone keeps just missing the doors and ending up on the next trip, so the whole crowd travels together instead of spreading out.
Mental Models¶
Locks as serialization¶
Every lock is a chokepoint that converts parallel work into serial work for the time the lock is held. If you draw your program as a graph of goroutines and lock holds, the time spent inside locks is your serial fraction. Amdahl's law then tells you the maximum speedup you can ever get from more cores. This is why granularity matters so much — splitting one lock into N usually divides your serial fraction by close to N.
Readers and writers as priorities¶
An RWMutex is a tiny priority scheduler: when only readers want it, they all pass; when a writer wants it, somebody has to decide whether to favor the writer (which can starve readers) or the readers (which can starve writers). Different implementations make different choices. Go's sync.RWMutex prevents new readers from acquiring once a writer is waiting, which avoids writer starvation at the cost of slightly increased reader latency.
Lock-as-token¶
Think of holding a lock as holding a physical token. Only one token per mutex. You may not duplicate it, you may not give it to a friend, you must return it before leaving the building. This model rules out a lot of bad patterns: locking on goroutine A and unlocking on goroutine B feels obviously wrong when you imagine someone else returning your wallet.
The two-axes diagram of locks¶
Plot locks on two axes: fairness (unfair to fair) and granularity (coarse to fine). Most performance-critical systems live in the "unfair + fine" quadrant. Most teaching examples live in "fair + coarse" because it is simpler to reason about. Knowing where you are on this map helps you predict the failure modes.
Code Examples¶
Example 1 — Bank account with deposit/withdraw¶
package main
import (
"errors"
"fmt"
"sync"
)
type Account struct {
mu sync.Mutex
id int
balance int
}
func NewAccount(id, balance int) *Account {
return &Account{id: id, balance: balance}
}
func (a *Account) Deposit(amount int) {
a.mu.Lock()
defer a.mu.Unlock()
a.balance += amount
}
func (a *Account) Withdraw(amount int) error {
a.mu.Lock()
defer a.mu.Unlock()
if a.balance < amount {
return errors.New("insufficient funds")
}
a.balance -= amount
return nil
}
func (a *Account) Balance() int {
a.mu.Lock()
defer a.mu.Unlock()
return a.balance
}
// Transfer demonstrates lock ordering: always lock the lower id first.
func Transfer(from, to *Account, amount int) error {
first, second := from, to
if first.id > second.id {
first, second = second, first
}
first.mu.Lock()
defer first.mu.Unlock()
second.mu.Lock()
defer second.mu.Unlock()
if from.balance < amount {
return errors.New("insufficient funds")
}
from.balance -= amount
to.balance += amount
return nil
}
func main() {
a := NewAccount(1, 1000)
b := NewAccount(2, 500)
var wg sync.WaitGroup
for i := 0; i < 100; i++ {
wg.Add(2)
go func() {
defer wg.Done()
_ = Transfer(a, b, 1)
}()
go func() {
defer wg.Done()
_ = Transfer(b, a, 1)
}()
}
wg.Wait()
fmt.Println("a:", a.Balance(), "b:", b.Balance())
}
Why this is interesting: without the first.id < second.id rule, two concurrent transfers in opposite directions could deadlock. With the rule, every transfer locks accounts in the same global order, so no cycle can form.
Example 2 — Dining philosophers with lock ordering¶
package main
import (
"fmt"
"math/rand"
"sync"
"time"
)
type Fork struct {
id int
mu sync.Mutex
}
func philosopher(id int, left, right *Fork, wg *sync.WaitGroup) {
defer wg.Done()
first, second := left, right
if first.id > second.id {
first, second = second, first
}
for meal := 0; meal < 3; meal++ {
time.Sleep(time.Duration(rand.Intn(50)) * time.Millisecond)
first.mu.Lock()
second.mu.Lock()
fmt.Printf("philosopher %d eats meal %d\n", id, meal)
time.Sleep(time.Duration(rand.Intn(20)) * time.Millisecond)
second.mu.Unlock()
first.mu.Unlock()
}
}
func main() {
const n = 5
forks := make([]*Fork, n)
for i := range forks {
forks[i] = &Fork{id: i}
}
var wg sync.WaitGroup
for i := 0; i < n; i++ {
wg.Add(1)
go philosopher(i, forks[i], forks[(i+1)%n], &wg)
}
wg.Wait()
}
Without the swap, philosopher i would always grab fork i first and philosopher 0 in particular would create a cycle with philosopher 4. With the swap, the global "lower id first" rule eliminates the cycle.
Example 3 — RWMutex read-heavy cache¶
package main
import (
"fmt"
"sync"
"time"
)
type Cache struct {
mu sync.RWMutex
store map[string]string
}
func NewCache() *Cache {
return &Cache{store: make(map[string]string)}
}
func (c *Cache) Get(key string) (string, bool) {
c.mu.RLock()
defer c.mu.RUnlock()
v, ok := c.store[key]
return v, ok
}
func (c *Cache) Set(key, value string) {
c.mu.Lock()
defer c.mu.Unlock()
c.store[key] = value
}
func main() {
c := NewCache()
c.Set("hello", "world")
var wg sync.WaitGroup
for i := 0; i < 10; i++ {
wg.Add(1)
go func(i int) {
defer wg.Done()
if v, ok := c.Get("hello"); ok {
fmt.Printf("reader %d sees %s\n", i, v)
}
}(i)
}
// One writer racing with readers.
wg.Add(1)
go func() {
defer wg.Done()
time.Sleep(time.Microsecond)
c.Set("hello", "world v2")
}()
wg.Wait()
}
All readers may run concurrently. The writer momentarily blocks new readers and waits for current ones to finish. If reads dominate, this beats a plain sync.Mutex for throughput.
Example 4 — TryLock with backoff¶
package main
import (
"fmt"
"math/rand"
"sync"
"time"
)
type Job struct {
mu sync.Mutex
id int
}
func process(j *Job, workerID int) {
backoff := 1 * time.Millisecond
for attempts := 0; attempts < 5; attempts++ {
if j.mu.TryLock() {
defer j.mu.Unlock()
fmt.Printf("worker %d got job %d\n", workerID, j.id)
time.Sleep(10 * time.Millisecond) // simulate work
return
}
fmt.Printf("worker %d backing off %v\n", workerID, backoff)
time.Sleep(backoff + time.Duration(rand.Intn(int(backoff))))
backoff *= 2
}
fmt.Printf("worker %d gives up on job %d\n", workerID, j.id)
}
func main() {
j := &Job{id: 42}
var wg sync.WaitGroup
for i := 0; i < 4; i++ {
wg.Add(1)
go func(i int) {
defer wg.Done()
process(j, i)
}(i)
}
wg.Wait()
}
This is the rough shape of an optimistic worker: try to claim the job, and if you can't, back off exponentially with jitter so workers do not synchronize their retries.
Pros & Cons¶
Pros of RWMutex over plain mutex¶
- Parallel readers can improve throughput dramatically for read-heavy workloads.
- Conceptually models the read/write distinction directly.
Cons of RWMutex¶
- Higher overhead per operation than a plain mutex.
- Risk of writer starvation if implementation favors readers.
- More complex to reason about.
- Reentrancy and upgrade (read -> write) semantics differ between languages; getting them wrong causes deadlocks.
Pros of TryLock¶
- Lets callers express "do this if I can, otherwise skip."
- Useful for soft locks, debouncing, and avoiding pile-up.
- Helpful diagnostic tool.
Cons of TryLock¶
- Tempting to abuse as a hand-rolled spinlock.
- Can hide bugs by silently doing nothing when a lock is contended.
Pros of fair locks¶
- Bounded waiting; no starvation.
- Predictable latency.
Cons of fair locks¶
- Lower throughput due to enforced context switches.
- Slower in low-contention cases.
Pros of reentrant locks¶
- Simpler API for recursive call chains.
- Eases refactoring of code that calls between locked methods.
Cons of reentrant locks¶
- Encourages large, sprawling locked regions.
- Hides lock structure; harder to audit.
- Slightly higher overhead per acquisition.
Use Cases¶
- In-memory caches — RWMutex is the textbook fit.
- Configuration objects — rarely written, read by many goroutines.
- Connection pools —
TryLockfor "give me a connection if one is free" semantics, fall back to opening a new one if not. - Resource throttling — fair locks when you must serve clients in arrival order.
- Background workers —
TryLockto ensure only one instance of a job runs at a time even if scheduled twice. - Graph traversals — fine-grained per-node locks acquired in a fixed order to avoid deadlocks.
- Bank-style transfers — pair of locks acquired in canonical order.
- Plugin systems with re-entry — reentrant locks when a plugin may call back into the host.
Coding Patterns¶
Pattern: Always lock-order¶
When two or more locks may be taken together, pick a canonical order and document it at the top of the file. The most common scheme is "by ID ascending."
Pattern: Short critical sections¶
Do as little as possible while holding the lock. Compute outside; copy in, do the minimal mutation, copy out, release.
Pattern: RAII / defer Unlock¶
Acquire and immediately set up the release. In Go: defer mu.Unlock(). In C++: a std::lock_guard or std::scoped_lock. In Java: a try/finally around the body. This guarantees release even on panics or exceptions.
Pattern: Copy-then-process¶
Inside the critical section, copy the data you need to a local variable; release the lock; do the slow processing on the copy. This drastically shortens the critical section.
Pattern: Snapshot publication¶
For very read-heavy structures, store the data behind an atomic.Pointer and replace the whole snapshot on write. Readers never block. Writers pay a copy cost but are rare. This is sometimes called "copy-on-write."
Pattern: Sharded locks¶
Instead of one lock for a map, split the keyspace into N shards, each with its own lock. Contention drops by roughly a factor of N.
Pattern: TryLock + fallback queue¶
If TryLock fails, push the work into a queue handled by the current holder, rather than waiting. Avoids convoys.
Clean Code¶
- Name the mutex after what it protects:
usersMu, notmu. - Place the mutex field directly above the fields it protects, and add a comment:
// usersMu protects users below. - One method, one critical section. If a method needs to do unrelated work, split it.
- Never hold a lock across a function call you do not control (callback, channel send, I/O). The user-supplied code might re-enter or block forever.
- Do not return values that share state with the locked region without copying.
- Lock fields, not whole structures, when the locking story is per-field.
Best Practices¶
- Profile before adding locks. Many "concurrency bugs" are really single-threaded bugs, and many "scalability issues" are really algorithmic ones.
- Use RWMutex only when reads dominate. A useful heuristic: at least 5:1 read:write ratio and critical sections longer than ~100ns.
- Avoid upgrading a read lock to a write lock. Most APIs do not support it; the safe pattern is "release read, take write, re-check."
- Always use defer / RAII to release. Manual
Unlock()at the end of a function is a bug waiting to happen. - Keep critical sections free of I/O. Disk and network calls inside a lock are how you get production incidents.
- Document the lock order. A single comment at the top of the file can prevent hours of debugging later.
- Prefer fine-grained over coarse-grained when you can measure the benefit — but do not jump to fine-grained without data.
- Use a single lock per logical resource. Two locks for the same data is double the bug surface.
- Make zero-value mutexes work. In Go,
sync.Mutex{}is usable immediately. Do not require an initializer. - Run with the race detector in CI.
go test -racecatches data races that no amount of code review will.
Edge Cases & Pitfalls¶
- Forgetting to RUnlock. A read lock leak is just as deadly as a write lock leak; writers will eventually block forever.
- Upgrading a read lock. In Go,
RLockthenLockfrom the same goroutine deadlocks becauseLockwaits for all readers, including you. - Copying a mutex. A
sync.Mutexis not safe to copy after first use. Passing a struct by value copies its mutex and breaks its protection. - Locking on a method receiver that is itself a value. Method
func (a Account) Deposit()operates on a copy; its mutex protects nothing. - Locking inside callbacks. If you call user code while holding a lock, the user code might call back into your code and try to take the same lock; with non-reentrant locks that is an instant deadlock.
- TryLock loop without backoff. That is a spinlock. On a busy core, it burns CPU without making progress.
- Long critical sections. Anything more than tens of microseconds is suspicious; anything that does I/O is almost certainly wrong.
- Mixing lock-free reads with locked writes without proper memory fences — you may see torn or stale values.
- Returning pointers from inside a critical section to data still protected by that lock — caller dereferences without holding the lock.
Common Mistakes¶
- Treating an RWMutex as "the better mutex." For short critical sections it can be slower than
sync.Mutexbecause of extra bookkeeping. - Forgetting that Go's
sync.Mutexis not reentrant and recursing into a locked method. - Using
TryLockand ignoring the false return value, silently dropping work. - Holding a lock while sending on an unbuffered channel — the goroutine reading the channel might need the same lock.
- Acquiring locks in different orders in different code paths.
- Using
defer mu.Unlock()at the top of a long function whose critical section should be smaller —defermakes the lock last until the function returns. - Forgetting that mutexes contribute to memory ordering. People sometimes remove a mutex thinking "it's just one int", losing memory visibility guarantees.
- Adding
runtime.Gosched()afterLock()in the hope of being "fair." It just hurts performance. - Using a binary semaphore as a "mutex" between threads, then accidentally unlocking from a different thread.
Tricky Points¶
- Go's RWMutex prevents new readers when a writer is waiting. This means a thundering herd of readers cannot starve a writer indefinitely, but it also means that during high write activity, RWMutex behaves more like a Mutex.
std::shared_mutexin C++ does not guarantee writer preference. The standard leaves it implementation-defined. Different platforms may behave differently.- Lock upgrades and downgrades. Downgrading (write -> read) is usually safe and supported. Upgrading (read -> write) is usually not supported and tends to deadlock — for upgrade scenarios, release and re-acquire and double-check invariants.
- Recursive locks have hidden costs. Each acquisition does extra bookkeeping (owner thread, count), and they make it harder to reason about lock scope.
- Spinning vs sleeping on a single-core machine. Spinning is pointless because the holder cannot make progress while you spin. Adaptive mutexes detect this on some platforms.
- Go's mutex starvation mode changes behavior under load. Code that benchmarks fine in low contention can behave quite differently when a waiter has been stuck for 1ms.
- Fairness has subtle consequences for memory ordering because waking a thread is itself a synchronization point.
Test Yourself¶
- Two threads run
Lock(A); Lock(B)andLock(B); Lock(A). Why is this a deadlock, and how does lock ordering fix it? - When does an RWMutex underperform a plain Mutex?
- What is the difference between
TryLockandLockwith a zero timeout? - In Go, what happens if you call
Lock()twice from the same goroutine on async.Mutex? - Describe the lock convoy phenomenon in your own words. How can you detect one in production?
- Why is
sync.Mutexnot safe to copy after first use? - Suppose you want at most one instance of a background job to run at a time, but you do not want to queue. Which lock primitive fits?
- What is the difference between fair and unfair locks, and when would you choose each?
- Why is a binary semaphore not a substitute for a mutex?
- What does "adaptive" mean for a mutex, and why is it beneficial on Linux?
Tricky Questions¶
-
Q: You have an in-memory cache and want both
GetandGetOrLoad, whereGetOrLoadmay take a long time to load if the key is missing. Should the load happen while holding the write lock? A: No. Hold the read lock, check, release, then either return or transition to "loading." A common pattern is to keep a per-keysync.Onceso concurrent loads of the same key share a single load. -
Q: A profiler shows your
sync.Mutexis the top hot spot. Should you switch tosync.RWMutex? A: Only if the workload is read-heavy and your critical sections are non-trivial. Sometimes the right answer is sharding the lock, not switching to RWMutex. -
Q: You use
TryLockinside a loop with no sleep. Performance is poor. Why? A: You wrote a spinlock with cache-line contention; the loop hammers the lock variable across cores, evicting cache lines. Add a backoff or restructure. -
Q: Your code locks accounts in transfer like
from.Lock(); to.Lock();. It works in tests. Why does it deadlock in production? A: Two concurrent transfersA -> BandB -> Aproduce opposite acquisition orders. Tests with one direction never hit this. Add lock ordering by account ID. -
Q: You have a Go RWMutex and a writer is waiting. New readers keep arriving. Are they blocked or do they proceed? A: New readers block. Go's implementation prevents new readers from acquiring while a writer is waiting, to avoid writer starvation.
-
Q: A reentrant lock has an internal "count + owner." Two threads share a reentrant lock. Why does this still serialize them? A: Reentrancy only helps the same owner. If a different thread wants the lock, it still waits. Reentrancy is not about parallelism; it is about recursion.
-
Q: You see
pthread_mutex_lockreturn EOWNERDEAD. What does it mean? A: The previous owner died while holding the lock. You acquired it, but the state it protected may be inconsistent. You can mark it consistent or unrecoverable. -
Q: Your service exposes per-request latency p99 spikes correlated with a particular
sync.Mutex. What likely diagnoses fit? A: Lock convoy, GC pause, long I/O while locked, or starvation mode kicking in. Look at lock-hold duration histograms.
Cheat Sheet¶
PRIMITIVE WHEN TO USE
sync.Mutex Default. Short, exclusive critical sections.
sync.RWMutex Read-heavy data, longer critical sections.
TryLock Opportunistic claim; skip work if busy.
TryLockTimeout Bounded wait; soft deadline for a section.
Reentrant lock Recursive call chains, public methods that
call other locked methods. Avoid if possible.
Fair lock Strict FIFO; bounded waiting required.
Unfair lock Throughput-first; default everywhere else.
LOCK ORDER Always acquire by canonical order (e.g. ID).
DEFER UNLOCK Pair Lock with defer Unlock immediately.
COPY-THEN-PROCESS Snapshot under lock; process outside.
SHARDED LOCKS Reduce contention by splitting keyspace.
NO I/O UNDER LOCK Network and disk inside a lock = incident.
NO CALLBACK UNDER LOCK User-supplied code may re-enter.
Summary¶
At the middle level, mutexes stop being a single concept and become a family of trade-offs. Reader-writer locks buy parallelism for read-heavy workloads but charge overhead. TryLock and timed locks let callers control how patient they are. Fairness lets you choose between predictable latency and raw throughput. Reentrant locks reduce friction for recursive call chains but hide structure. Spinlocks and adaptive mutexes exploit the difference between short and long waits. Lock granularity decides whether your program can scale or not.
The thing that ties all this together is discipline: short critical sections, clear lock ordering, careful naming, defer/RAII for release, and constant attention to what you do inside the lock. The juniors learn the syntax. The middles learn the trade-offs. The seniors learn when not to use a mutex at all — but we will get to that.
What You Can Build¶
- A read-heavy in-memory cache with
RWMutexand per-keysync.Onceto deduplicate loads. - A bank-style transfer system with deadlock-free pair locking.
- A worker pool that uses
TryLockto claim jobs and exponential backoff to avoid convoys. - A sharded hash map that splits a global map into 32 mutex-protected buckets.
- A connection pool that uses timed locks to bound checkout latency.
- A configuration hot-reload system using
atomic.Pointerfor copy-on-write reads. - A graph traversal library that uses fixed-order per-node locks.
Further Reading¶
- Russ Cox, "Go memory model" — the foundational reference for what Go's mutexes guarantee.
- Dmitry Vyukov, "Mutex design in Go 1.9" — explains the starvation mode.
- Maurice Herlihy and Nir Shavit, The Art of Multiprocessor Programming — chapters on spin locks and queue locks.
- Linux man pages:
pthread_mutex_init,futex,pthread_rwlock_init. - Java documentation:
ReentrantLock,ReentrantReadWriteLock,StampedLock. - C++ reference:
std::shared_mutex,std::lock_guard,std::scoped_lock. - Sutter / Alexandrescu, C++ Coding Standards, items on locking.
- Brendan Gregg's blog posts on lock profiling with perf and off-cpu analysis.
Related Topics¶
- Atomics and memory ordering
- Condition variables
- Semaphores
- Channels vs locks
- Lock-free data structures
- Race detection and tooling
- Memory model fundamentals
Diagrams & Visual Aids¶
Reader-writer lock state machine¶
+--------------------+
| |
v |
+------------------+ RUnlock |
| Shared (R..R) |------------+
+------------------+
| ^
writer | | all readers done
arrives v |
+------------------+
| Writer waiting |
+------------------+
|
v
+------------------+
| Exclusive |
+------------------+
|
v Unlock
+-------+
| Free |
+-------+
Lock-ordering eliminates deadlock¶
Without ordering:
T1: Lock(A) ----+
|---> Lock(B) blocks (T2 holds B)
T2: Lock(B) ----+
|---> Lock(A) blocks (T1 holds A)
=> DEADLOCK
With "lower id first":
T1: Lock(A) -> Lock(B)
T2: Lock(A) -> Lock(B) (waits for T1 to release A)
=> No cycle, eventually progresses.
Adaptive mutex timeline¶
time --->
| try CAS | spin spin spin spin | park (kernel) | woken | own lock |
^ ^ ^ ^
| | hopes holder releases | full syscall | resume
| uncontended path | cost only on real contention
Lock convoy¶
Holder: |---work---|release|
Waiter 1: wakes->lock->|tiny work|release|
Waiter 2: wakes->lock->|tiny work|release|
Waiter 3: wakes->lock->|...
Effective throughput = 1 / (wake + lock + work)
Coarse vs fine granularity throughput¶
Throughput
^
| * fine-grained (8 shards)
| *
| *
| * *
| * * coarse (single mutex saturates)
| *
+------------------------> cores