Race Detection: Middle Level
Table of Contents
- Introduction
- The Go Memory Model
- Happens-Before Edges
- Common Race Patterns
- Atomics
- Detector Internals
- CI Integration
- Reading Complex Race Reports
- False Negatives
- Idiomatic Race-Free Patterns
- Anti-Patterns
- Testing Strategy
- Tricky Cases
- Cheat Sheet
- Summary
Introduction
Junior level introduced data races, the -race flag, and the mutex+atomic fixes for the simplest patterns. Middle level deepens the foundation:
- The Go memory model (the formal rules).
- Happens-before edges (every synchronisation point).
- Atomic operations (when they suffice, when they do not).
- Detector internals (ThreadSanitizer).
- CI integration patterns.
By the end you should be able to explain why a particular fix works (not just that it does) and to integrate -race into a CI pipeline.
The Go Memory Model
The Go memory model is a contract between the programmer on one side and the compiler and runtime on the other. It says: a read of a variable in goroutine B is guaranteed to observe a write made in goroutine A only if that write happens-before the read and no other write to the variable comes between them.
Without an edge, the read may see:
- Nothing (the original zero value).
- The most recent write.
- A value from any previous write.
- A partial write (torn read) on platforms where the type is wider than a hardware load.
In practice, modern CPUs and the Go compiler may reorder reads and writes for performance. Synchronisation primitives act as barriers: accesses may not be reordered across them in ways that would break the happens-before guarantees.
A simple example showing reordering:
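A minimal sketch of such a program, modelled on the familiar two-variable example from the Go memory model document. The function names f and g are assumptions, g plays the role of the reading side (goroutine 2 in the discussion that follows), and the program is racy on purpose:

```go
package main

import "fmt"

var a, b int

// f writes a first and b second, with no synchronisation.
func f() {
	a = 1
	b = 2
}

// g reads b first and a second, also with no synchronisation.
func g() (int, int) {
	gotB := b
	gotA := a
	return gotB, gotA
}

func main() {
	go f() // deliberately racy: nothing orders f's writes before g's reads
	gotB, gotA := g()
	fmt.Printf("{%d,%d}\n", gotB, gotA) // printed as {b,a}; no single outcome is guaranteed
}
```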
Writing each outcome as {b, a}, the output is not limited to {0,0}, {0,1} and {2,1}. It can also be {2,0}: goroutine 2 saw b = 2 (the later write) but not a = 1 (the earlier one), because the compiler or CPU reordered the accesses or the writes became visible to the other core in a different order.
The memory model says: this is a data race. The result is undefined.
Happens-Before Edges
A non-exhaustive list of edges Go provides:
| Edge | Meaning |
|---|---|
| Channel send → matching receive | The send happens-before the corresponding receive completes. |
| Channel close → receive of zero value | The close happens-before any receive that returns because the channel is closed. |
| Receive on unbuffered channel → send completion | The receive happens-before the corresponding send completes, so code after the send knows the receiver has arrived. |
| Mutex Unlock → next Lock | Unlock happens-before the next Lock returns. |
| Once.Do(f) → return of any Do | f completes before any call to Do returns, so its effects are visible to every caller. |
| WaitGroup.Done → matching Wait return | Writes made before Done are visible after Wait returns. |
| Atomic store → atomic load that observes it | All writes made before the store are visible after a load that sees the stored value. Go atomics are sequentially consistent; there are no separate release/acquire variants. |
| Goroutine creation → goroutine body starts | Code before go f() happens-before f runs. |
| End of init → main start | All package init runs before main. |
Synchronisation operations like these are the only way to make memory writes visible across goroutines. Any cross-goroutine access that is not ordered by one of them is a race.
A subtler point: the end of a goroutine does NOT happen-before anything by default. To make goroutine A's writes visible to goroutine B, you need an explicit edge, typically wg.Wait or a channel receive.
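Common Race Patterns
Captured loop variable (pre-Go 1.22)
Before Go 1.22, i is one variable shared by every iteration: the main loop writes it while the goroutines read it. Fix: write i := i at the top of the loop body (or pass i as an argument), or move to Go 1.22+, where each iteration gets its own variable once go.mod declares go 1.22 or later.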
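A minimal sketch of the argument-passing fix, which behaves the same on every Go version. The function name squares is hypothetical:

```go
package main

import (
	"fmt"
	"sync"
)

// squares launches one goroutine per index and returns i*i for each i.
// Passing i as an argument gives every goroutine its own copy.
func squares(n int) []int {
	out := make([]int, n)
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func(i int) { // this i shadows the loop variable
			defer wg.Done()
			out[i] = i * i // each goroutine writes a distinct element
		}(i)
	}
	wg.Wait() // the Done -> Wait edge makes the writes visible here
	return out
}

func main() {
	fmt.Println(squares(5)) // [0 1 4 9 16]
}
```

Writing to distinct slice elements is safe because each element is a separate memory location; only the shared loop variable was the problem.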
Shared map
Maps are not safe for concurrent use. Even without -race, the runtime can abort the program with the unrecoverable "fatal error: concurrent map writes". Guard the map with sync.Mutex or sync.RWMutex, or use sync.Map.
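One way to guard a map with sync.RWMutex, as a sketch. The Counts type and the countConcurrently helper are hypothetical names introduced for illustration:

```go
package main

import (
	"fmt"
	"sync"
)

// Counts is a hypothetical wrapper; the RWMutex guards every access to m.
type Counts struct {
	mu sync.RWMutex
	m  map[string]int
}

func NewCounts() *Counts { return &Counts{m: make(map[string]int)} }

// Inc takes the write lock because it mutates the map.
func (c *Counts) Inc(key string) {
	c.mu.Lock()
	c.m[key]++
	c.mu.Unlock()
}

// Get takes the read lock, so many readers may proceed in parallel.
func (c *Counts) Get(key string) int {
	c.mu.RLock()
	defer c.mu.RUnlock()
	return c.m[key]
}

// countConcurrently increments one key from n goroutines and returns the total.
func countConcurrently(n int) int {
	c := NewCounts()
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			c.Inc("hits")
		}()
	}
	wg.Wait()
	return c.Get("hits")
}

func main() {
	fmt.Println(countConcurrently(100)) // 100
}
```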
Double-checked locking, Go-style
    type Cache struct {
        mu   sync.Mutex
        m    map[string]string
        init bool
    }

    func (c *Cache) Get(k string) string {
        if !c.init { // race: read without lock
            c.mu.Lock()
            if !c.init {
                c.m = make(map[string]string)
                c.init = true
            }
            c.mu.Unlock()
        }
        return c.m[k] // race: read map without lock
    }
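The first if !c.init is unsynchronised. It might observe init == true while m still appears nil, and the final c.m[k] reads the map with no lock at all. Fix: initialise with sync.Once and guard the map with an RWMutex:
    type Cache struct {
        once sync.Once
        mu   sync.RWMutex
        m    map[string]string
    }
    func (c *Cache) Get(k string) string {
        // Once.Do returns only after the initialiser has completed.
        c.once.Do(func() { c.m = make(map[string]string) })
        c.mu.RLock()
        defer c.mu.RUnlock()
        return c.m[k]
    }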
Atomic with non-atomic read
    var flag int32
    go func() { atomic.StoreInt32(&flag, 1) }()
    if flag == 1 { ... } // race: non-atomic read
Mixing atomic and non-atomic accesses on the same variable is a race. Use atomic.LoadInt32 for the read.
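A sketch of the corrected pattern, where every access to the flag goes through sync/atomic. The names payload, ready, publish and consume are assumptions, and the polling loop is for illustration only (a channel is usually the better tool for waiting):

```go
package main

import (
	"fmt"
	"runtime"
	"sync/atomic"
)

var (
	payload string // plain variable, published via the flag below
	ready   int32  // accessed only through sync/atomic
)

// publish writes the payload, then sets the flag atomically.
func publish() {
	payload = "hello"
	atomic.StoreInt32(&ready, 1)
}

// consume polls the flag with an atomic load. Once it observes 1, the
// store happens-before this load, so reading payload is race-free.
func consume() string {
	for atomic.LoadInt32(&ready) == 0 {
		runtime.Gosched() // yield instead of burning a core
	}
	return payload
}

func main() {
	go publish()
	fmt.Println(consume()) // hello
}
```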
Partial update
    type Stats struct {
        Total int
        OK    int
    }

    var s Stats
    go func() { s.Total++; s.OK++ }()          // writer: two separate stores
    go func() { fmt.Println(s.Total, s.OK) }() // reader: may run between them
The reader can see Total = 5, OK = 4: the writer was halfway through. Lock the whole update.
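A sketch of locking the whole update, assuming the fields move behind methods. Record, Snapshot and alwaysConsistent are hypothetical names:

```go
package main

import (
	"fmt"
	"sync"
)

// Stats keeps total and ok consistent by guarding both with one mutex.
type Stats struct {
	mu    sync.Mutex
	total int
	ok    int
}

// Record updates both fields inside a single critical section.
func (s *Stats) Record() {
	s.mu.Lock()
	s.total++
	s.ok++
	s.mu.Unlock()
}

// Snapshot reads both fields under the same lock, so it never observes
// a half-finished update.
func (s *Stats) Snapshot() (total, ok int) {
	s.mu.Lock()
	defer s.mu.Unlock()
	return s.total, s.ok
}

// alwaysConsistent runs a writer and a reader concurrently and reports
// whether every snapshot had total == ok.
func alwaysConsistent(n int) bool {
	var s Stats
	var wg sync.WaitGroup
	consistent := true
	wg.Add(2)
	go func() {
		defer wg.Done()
		for i := 0; i < n; i++ {
			s.Record()
		}
	}()
	go func() {
		defer wg.Done()
		for i := 0; i < n; i++ {
			if total, ok := s.Snapshot(); total != ok {
				consistent = false // only this goroutine writes it
			}
		}
	}()
	wg.Wait() // makes the reader's write to consistent visible here
	return consistent
}

func main() {
	fmt.Println(alwaysConsistent(1000)) // true
}
```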
Slice append
append is multi-step (read len, decide if realloc, write). Concurrent appends race. Lock or use a channel.
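A minimal sketch of the mutex option. The gather helper is a hypothetical name:

```go
package main

import (
	"fmt"
	"sync"
)

// gather appends n values from n goroutines. The mutex serialises the
// whole read-len / maybe-grow / write sequence that append performs.
func gather(n int) []int {
	var (
		mu  sync.Mutex
		out []int
		wg  sync.WaitGroup
	)
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func(v int) {
			defer wg.Done()
			mu.Lock()
			out = append(out, v)
			mu.Unlock()
		}(i)
	}
	wg.Wait()
	return out // element order depends on scheduling; the length does not
}

func main() {
	fmt.Println(len(gather(50))) // 50
}
```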
Atomics
sync/atomic provides:
- Load*, Store*, Swap*, CompareAndSwap* for int32, int64, uint32, uint64, uintptr, unsafe.Pointer, plus Add* for the integer types.
- Typed wrappers (Go 1.19+): atomic.Int32, atomic.Int64, atomic.Bool, atomic.Uint32, etc.
- atomic.Pointer[T] (Go 1.19+) for typed pointer atomics.
Atomic operations:
- Are atomic: no torn reads/writes.
- Establish sequential consistency in the Go memory model: an atomic store happens-before any atomic load that observes the stored value.
- Are not a substitute for mutexes when the operation spans multiple variables.
Example: refreshable config
    type Config struct{ /* ... */ }

    var cfg atomic.Pointer[Config]

    func Load() *Config   { return cfg.Load() }
    func Store(c *Config) { cfg.Store(c) }
Hot-path readers do an atomic load; the writer publishes a new pointer. Readers never block on a mutex. Load returns nil until the first Store, so publish an initial config at startup. This is not a substitute for sync.RWMutex if the config is modified in place: the trick relies on the pointed-to struct being immutable.
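A sketch of how a writer might change one field without mutating a published snapshot. The MaxConns field and the Update helper are assumptions added for illustration; Update copies the current value, modifies the copy, and publishes it with compare-and-swap:

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// Config fields are illustrative; treat a published *Config as read-only.
type Config struct {
	MaxConns int
}

var cfg atomic.Pointer[Config]

func Load() *Config   { return cfg.Load() } // nil until the first Store
func Store(c *Config) { cfg.Store(c) }

// Update is a hypothetical helper: copy the current snapshot, modify the
// copy, and publish it, retrying if another writer got there first.
func Update(modify func(*Config)) {
	for {
		old := cfg.Load()
		next := Config{}
		if old != nil {
			next = *old // shallow copy; enough while Config holds no maps or slices
		}
		modify(&next)
		if cfg.CompareAndSwap(old, &next) {
			return
		}
	}
}

func main() {
	Store(&Config{MaxConns: 10})
	before := Load()
	Update(func(c *Config) { c.MaxConns = 20 })
	fmt.Println(before.MaxConns, Load().MaxConns) // 10 20
}
```

A reader that loaded the pointer before the update keeps its old, complete snapshot, which is what makes the pattern safe.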
Detector Internals
The Go race detector is built on ThreadSanitizer (TSan), originally from Google. Sketch of how it works:
- Every memory access is instrumented at compile time. The compiler inserts a function call before each load/store recording the address, the type (read or write), and the goroutine id.
- TSan maintains a vector clock per goroutine. Each synchronisation event (channel send, mutex unlock, etc.) updates the clock.
- For each address, TSan keeps a small history of recent accesses and their vector clocks.
- When a new access arrives, TSan checks the history. If there is a previous access to the same address from a different goroutine, and the two clocks are not ordered (no happens-before edge), it reports a race.
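The ordering check in the last step can be illustrated with a toy vector clock. This is a teaching sketch, not ThreadSanitizer's actual data structures; the names VC, HappensBefore, Join and IsRace are all hypothetical, and clocks are assumed to have equal length:

```go
package main

import "fmt"

// VC is a toy vector clock: VC[i] counts events seen from goroutine i.
type VC []int

// HappensBefore reports whether a is ordered before b: every component
// of a is <= the matching component of b, and at least one is smaller.
func (a VC) HappensBefore(b VC) bool {
	strictlyLess := false
	for i := range a {
		if a[i] > b[i] {
			return false
		}
		if a[i] < b[i] {
			strictlyLess = true
		}
	}
	return strictlyLess
}

// Join models a synchronisation event: the result is the component-wise
// maximum of both clocks.
func Join(a, b VC) VC {
	out := make(VC, len(a))
	for i := range a {
		out[i] = a[i]
		if b[i] > out[i] {
			out[i] = b[i]
		}
	}
	return out
}

// IsRace reports whether two accesses to one address, made by different
// goroutines, conflict: neither is ordered before the other and at
// least one of them is a write.
func IsRace(prev, cur VC, prevWrite, curWrite bool) bool {
	ordered := prev.HappensBefore(cur) || cur.HappensBefore(prev)
	return !ordered && (prevWrite || curWrite)
}

func main() {
	write := VC{1, 0} // goroutine 0 writes x
	read := VC{0, 1}  // goroutine 1 reads x with no synchronisation
	fmt.Println(IsRace(write, read, true, false)) // true

	// After goroutine 0 unlocks and goroutine 1 locks, goroutine 1's
	// clock absorbs goroutine 0's clock and ticks its own component.
	synced := Join(read, write)
	synced[1]++
	fmt.Println(IsRace(write, synced, true, false)) // false
}
```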
Implications:
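- Memory cost: vector clocks grow with goroutine count. Before Go 1.19 the detector aborted once more than 8128 goroutines were alive at the same time; the ThreadSanitizer v3 upgrade in Go 1.19 lifted that limit on most platforms.
- CPU cost: every access is now a function call. The official documentation quotes a typical 2-20x increase in execution time and 5-10x in memory use.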
- Coverage: only memory accesses TSan instruments. Cgo memory and unsafe pointer arithmetic can slip through.
CI Integration
A standard CI pipeline has at least three test stages:
- go vet ./...: static checks.
- go test ./...: fast tests.
- go test -race ./...: race-detection tests.
Some teams add:
- go test -race -count=10 ./...: repeated runs to surface flaky races.
- go test -race -timeout=10m ./...: long-running stress tests.
A typical Makefile:
    .PHONY: test test-race test-race-stress ci

    test:
    	go test ./...

    test-race:
    	go test -race -count=1 ./...

    test-race-stress:
    	go test -race -count=20 ./...

    ci: test test-race
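In a GitHub Actions job, run the race stage with the GORACE environment variable set, for example GORACE="halt_on_error=1 exitcode=66" go test -race -count=1 ./....
halt_on_error=1 makes the process stop the moment a race is reported; exitcode=66 (which is also the default) is the status the process exits with after a race report, so CI fails.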
Reading Complex Race Reports
A real-world race report often spans dozens of frames. The structure is always:
    ==================
    WARNING: DATA RACE
    {Read|Write} at 0x... by goroutine {N}:
      <stack trace>

    Previous {read|write} at 0x... by goroutine {M}:
      <stack trace>

    Goroutine {N} (running) created at:
      <stack trace>

    Goroutine {M} (finished) created at:
      <stack trace>
    ==================
Reading order:
- Address — the same on both sides; identifies which variable.
- Goroutine N's stack — where the new access happened.
- Goroutine M's stack — where the previous access happened.
- Creation stacks — where each goroutine was launched.
Walk both stacks and find the lines that touch the variable. The fix is somewhere there: add a mutex, change to atomic, redesign the data flow.
False Negatives
The detector can miss races if:
- The two accesses never happen in the same run (timing).
- The shared variable is accessed only once per run.
- The accesses are inside cgo.
- The accesses are through unsafe arithmetic that bypasses the instrumentation.
- The detector hits its goroutine cap (8128 simultaneously live goroutines before Go 1.19).
So -race says "no race detected", not "no races exist". For high-stakes code, run -race -count=N with a stress harness that exercises every code path.
Idiomatic Race-Free Patterns
Pattern: don't share, communicate
Pass the value through a channel instead of mutating a shared variable. The channel send/receive provides the happens-before edge.
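A minimal sketch of handing a result over a channel. The result type and the sumInBackground function are hypothetical:

```go
package main

import "fmt"

// result is a hypothetical payload handed from the worker to the caller.
type result struct {
	sum int
}

// sumInBackground computes in a separate goroutine and hands the result
// over a channel. The input slice is only read, and no variable is
// written by one goroutine and read by the other without the channel.
func sumInBackground(nums []int) int {
	ch := make(chan result)
	go func() {
		var r result
		for _, n := range nums {
			r.sum += n
		}
		ch <- r // the send happens-before the receive below completes
	}()
	return (<-ch).sum
}

func main() {
	fmt.Println(sumInBackground([]int{1, 2, 3, 4})) // 10
}
```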
Pattern: per-goroutine state
Each goroutine has its own scratch space. Aggregate at the end via channels or a final mutex-protected merge.
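As a sketch, a parallel sum where each worker accumulates into a local variable and reports once at the end. The parallelSum name and the chunking scheme are assumptions:

```go
package main

import "fmt"

// parallelSum splits nums into chunks, lets each goroutine accumulate
// into its own local variable, and merges the partial sums at the end.
func parallelSum(nums []int, workers int) int {
	if workers < 1 {
		workers = 1
	}
	chunk := (len(nums) + workers - 1) / workers
	partials := make(chan int, workers) // buffered so workers never block
	launched := 0
	for start := 0; start < len(nums); start += chunk {
		end := start + chunk
		if end > len(nums) {
			end = len(nums)
		}
		launched++
		go func(part []int) {
			local := 0 // private scratch space, no sharing
			for _, n := range part {
				local += n
			}
			partials <- local
		}(nums[start:end])
	}
	total := 0
	for i := 0; i < launched; i++ {
		total += <-partials
	}
	return total
}

func main() {
	fmt.Println(parallelSum([]int{1, 2, 3, 4, 5, 6, 7, 8, 9, 10}, 3)) // 55
}
```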
Pattern: copy-on-write config
A pointer guarded by atomic.Pointer. Readers always see a complete, immutable snapshot. Writers prepare a new struct and atomically swap.
Pattern: sharded counter
N independent counters, each updated by a fixed shard of the workers, summed at read time. Reduces contention versus one global atomic.
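A sketch of one possible layout. The type names, the countWith helper and the 64-byte cache-line padding are assumptions (the padding matches most current x86-64 and ARM64 CPUs), and worker ids are assumed to be non-negative:

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// shard pads each counter out to 64 bytes so neighbouring shards do not
// share a cache line (assumed line size).
type shard struct {
	n atomic.Int64
	_ [56]byte
}

type ShardedCounter struct {
	shards []shard
}

func NewShardedCounter(n int) *ShardedCounter {
	return &ShardedCounter{shards: make([]shard, n)}
}

// Add updates the shard that belongs to worker id.
func (c *ShardedCounter) Add(id int, delta int64) {
	c.shards[id%len(c.shards)].n.Add(delta)
}

// Sum adds up all shards. While writers are active the result is a
// moment-in-time approximation.
func (c *ShardedCounter) Sum() int64 {
	var total int64
	for i := range c.shards {
		total += c.shards[i].n.Load()
	}
	return total
}

// countWith runs the given number of workers, each adding 1 perWorker times.
func countWith(workers, perWorker int) int64 {
	c := NewShardedCounter(workers)
	var wg sync.WaitGroup
	for id := 0; id < workers; id++ {
		wg.Add(1)
		go func(id int) {
			defer wg.Done()
			for i := 0; i < perWorker; i++ {
				c.Add(id, 1)
			}
		}(id)
	}
	wg.Wait()
	return c.Sum()
}

func main() {
	fmt.Println(countWith(4, 1000)) // 4000
}
```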
Pattern: sync.Once for one-time initialisation
The canonical way to lazily initialise a shared resource.
Pattern: pass ctx, not shared cancel flags
A context.Context signals cancellation by closing its Done channel, so every goroutine that receives from ctx.Done() gets a happens-before edge from the cancel call.
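A small sketch of that edge in action. The stopReason function and the reason variable are hypothetical; the point is that a write made before cancel() is safely visible to a goroutine that has received from ctx.Done():

```go
package main

import (
	"context"
	"fmt"
)

// stopReason shows the edge that cancellation provides: the write to
// reason precedes cancel(), cancel() closes ctx.Done(), and the worker
// reads reason only after receiving from the closed channel.
func stopReason() string {
	ctx, cancel := context.WithCancel(context.Background())
	var reason string
	out := make(chan string)

	go func() {
		<-ctx.Done()  // returns once cancel has closed the channel
		out <- reason // safe: ordered after the write below
	}()

	reason = "shutdown"
	cancel()
	return <-out
}

func main() {
	fmt.Println(stopReason()) // shutdown
}
```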
Anti-Patterns
- Sleep-based "synchronisation": time.Sleep(100*time.Millisecond) to "let the other goroutine catch up". Not an edge.
- Volatile-style hacks: Go has no volatile. Use atomics.
- Mutex-then-non-mutex access: protecting writes but not reads (or vice versa). The detector flags this.
- Mutex per-statement: locking inside the loop body for every increment when one outer Lock would do.
- Premature RWMutex: it has higher overhead than Mutex; only use when reads dominate by a large factor.
Testing Strategy
- Run all tests with -race in CI.
- Stress-test concurrency-heavy code with -race -count=N (often N = 100).
- Use goleak to detect goroutine leaks alongside race detection.
- Run benchmarks with and without -race to catch performance regressions caused by added synchronisation.
- Have a flaky-test runbook: any test that fails intermittently is suspect; repeat under -race immediately.
Tricky Cases
- 64-bit atomic on 32-bit platform. The int64 functions in sync/atomic require 8-byte alignment, which 32-bit platforms do not guarantee for every struct field. Use atomic.Int64 (Go 1.19+), which guarantees alignment.
- Race on a slice header vs underlying array. Two separate races: header (len, cap, data pointer) and array elements. Different mutexes might be needed.
- Race in deferred closure. A deferred closure captures variables by reference; if another goroutine accesses them while the deferred call runs, that is a race.
- Race on closed-channel signal. Closing a channel in one goroutine while another sends on it can panic the sender, and the unsynchronised close/send pair is also a race on the channel's state. Whether the panic fires depends on timing, so do not rely on it; make sure all senders have finished before closing.
Cheat Sheet
| Edge | Use |
|---|---|
| Channel | Pipeline communication |
| Mutex | Multi-field updates |
| Atomic | Single int/pointer |
| Once | Lazy init |
| WaitGroup | Wait-then-read |
| Ctx cancel | Broadcast stop |
    # CI race stage
    GORACE="halt_on_error=1 exitcode=66" go test -race -count=1 ./...

    # Stress
    go test -race -count=100 ./...

    # Bench without race
    go test -bench=. -run='^$' ./...
Summary
The middle level grounds race detection in the formal Go memory model. Every fix you write is establishing a happens-before edge — the only thing the language uses to synchronise memory. Atomics work for single-cell publication; mutexes work for multi-field updates; channels work for communication. The race detector is a vector-clock-based tool with overhead and limits, and it must be a CI gate. A passing go test -race ./... is the minimum bar for production Go.