Acquire / Release — Junior Level¶
Table of Contents¶
- Introduction
- Prerequisites
- Glossary
- Core Concepts
- Real-World Analogies
- Mental Models
- Pros & Cons
- Use Cases
- Code Examples
- Coding Patterns
- Clean Code
- Product Use / Feature
- Error Handling
- Security Considerations
- Performance Tips
- Best Practices
- Edge Cases & Pitfalls
- Common Mistakes
- Common Misconceptions
- Tricky Points
- Test
- Tricky Questions
- Cheat Sheet
- Self-Assessment Checklist
- Summary
- What You Can Build
- Further Reading
- Related Topics
- Diagrams & Visual Aids
Introduction¶
Focus: "What does it mean to publish a value to another goroutine, and why is plain assignment not enough?"
Imagine you have two goroutines. One of them builds a result: maybe it reads a configuration file, parses it, and creates a *Config struct. The other one wants to use that result. The first goroutine writes cfg = loadConfig() and then ready = true.
The second one reads ready and, once it sees true, uses cfg.
This program looks innocent. Sometimes it works. On some machines, with some compilers, on some days it crashes with a nil-pointer dereference. Why? Because the second goroutine saw ready == true but cfg was still nil.
How is that possible? The first goroutine clearly set cfg before ready. The answer is that clearly depends on whose eyes you are looking through. The CPU running the first goroutine, the cache between that CPU and main memory, the compiler that produced the machine code, and the CPU running the second goroutine all have their own ideas about what "before" means. Without an explicit synchronization operation between the writes and the reads, none of them is obligated to preserve your intended ordering.
This is where acquire and release come in. They are the two halves of a handshake that makes the publish-and-consume pattern work safely:
- A release operation says: "Everything I wrote before this point is now visible to any goroutine that performs an acquire on the same location."
- An acquire operation says: "I just read a value that some other goroutine released. Whatever they did before the release is now visible to me, and it happens before anything I do next."
At the junior level you do not need to write hand-rolled acquire/release code. Go gives you tools that contain acquire and release for you:
- sync.Mutex.Lock performs an acquire; sync.Mutex.Unlock performs a release.
- sync/atomic.LoadXxx performs an acquire (and more: Go promises sequential consistency for atomics, which is strictly stronger).
- sync/atomic.StoreXxx performs a release (and more).
- sync.Once.Do performs both, around the initialization function.
- Sending and receiving on a channel performs both at the endpoints.
After reading this file you will:
- Understand why the cfg/ready example above is broken, and how to fix it.
- Know the difference between atomic (no torn reads) and ordered (no reordering across the operation).
- Be able to use sync.Mutex and sync/atomic for safe one-shot publication.
- Have an intuition for what happens-before means and when Go promises it.
- Recognise the "publish a pointer through a flag" pattern and three buggy variations of it.
You do not need to know about C++ memory_order_acquire, hardware fences, or double-checked locking yet. Those come at middle, senior, and professional levels.
Prerequisites¶
- Required: Comfort with goroutines and the go keyword. You should know that go f() starts a new goroutine.
- Required: Familiarity with sync.WaitGroup or at least with the idea that the main goroutine can exit before others.
- Required: Knowledge of pointers in Go. You should understand that *Config is a pointer and that two goroutines can hold the same pointer.
- Helpful: Some experience with a data race, such as having seen go run -race complain at least once.
- Helpful: Awareness that CPUs have caches and that writes to memory are not instantaneous.
If you have written a program that uses sync.Mutex to protect a shared counter, you are ready.
Glossary¶
| Term | Definition |
|---|---|
| Acquire | A read-side memory operation that prevents subsequent reads and writes in program order from being reordered before it. If an acquire on location X observes the value written by a release on X, you are guaranteed to see everything the releasing goroutine wrote before that release. |
| Release | A write-side memory operation that prevents preceding reads and writes in program order from being reordered after it. A release "publishes" all earlier writes to any goroutine that subsequently performs an acquire on the same location. |
| Memory order | The contract between the program and the runtime/CPU about which reorderings of memory operations are allowed. Common orderings: relaxed, acquire, release, acq_rel, seq_cst (sequential consistency). |
| Happens-before | A partial order on memory operations. If event A happens-before event B, then B is guaranteed to see A's writes. The Go memory model defines exactly which operations establish happens-before. |
| Publication | The act of making a value visible to other goroutines. Safe publication requires a release on the publishing side and an acquire on the consuming side. |
| sync.Mutex | A mutual-exclusion lock. Lock is an acquire; Unlock is a release. The mutex guarantees that the critical section of one goroutine happens-before the critical section of the next goroutine to lock the mutex. |
| sync/atomic | The standard library package providing atomic primitives: LoadInt32, StoreInt32, CompareAndSwapInt32, etc. In Go, atomics provide sequentially consistent ordering, which is strictly stronger than acquire/release. |
| Atomic operation | A read, write, or read-modify-write that is indivisible — no other goroutine can see a "half-finished" version. Distinct from ordered: an atomic without ordering would still be indivisible but might be reordered with surrounding code. In Go you do not need to worry about this distinction because all sync/atomic operations are also ordered. |
| Torn read / write | A non-atomic read or write of a multi-word value that another goroutine catches mid-update, seeing half the old bytes and half the new bytes. Torn reads of 64-bit values on a 32-bit CPU are the classic case; Go's sync/atomic operations prevent them. |
| Data race | Two goroutines accessing the same memory location concurrently where at least one access is a write, with no synchronization. The program's behavior is no longer predictable and must be treated as a bug. go run -race detects the races that actually occur during a run. |
| sync.Once | A primitive guaranteeing that a function is executed exactly once, even under concurrent calls. The execution of that function happens-before any return from Do on any goroutine. |
| Sequential consistency | The strongest practical memory model. All operations appear to execute in some single global order consistent with each goroutine's program order. Go provides this for sync/atomic and for mutex operations. |
| Memory barrier (fence) | A CPU instruction that prevents the hardware from reordering memory operations across it. Acquire and release barriers are one-sided; a full fence is two-sided. |
Core Concepts¶
Publication needs two parts¶
You cannot make a write visible to another goroutine by setting a variable, no matter how careful you are about the order in your source code. To publish a value safely you need two things:
- A release on the producing side, after you finished building the value.
- An acquire on the consuming side, before you use the value.
Both must operate on the same synchronization location. Examples of matching pairs:
- Producer calls mu.Unlock(); consumer calls mu.Lock() on the same mutex.
- Producer calls atomic.StoreInt32(&flag, 1); consumer calls atomic.LoadInt32(&flag) and observes 1.
- Producer sends on ch; consumer receives from the same ch.
- Producer is inside once.Do(f); consumer calls once.Do(f) and returns.
If either half is missing or operates on a different location, the publication is broken.
What a release does¶
When you perform a release operation (let's say atomic.StoreInt32(&ready, 1)), the runtime and the hardware promise:
- Every read and write your goroutine performed before the release, in program order, will be visible to any goroutine that later performs an acquire on &ready and observes the released value.
- The release itself is atomic: no torn write.
- A pure release is a one-sided fence: earlier operations cannot move after it, but operations after the release in program order may be reordered before it. (This rarely matters in practice because you usually do not write more after publishing, and Go's sequentially consistent atomics are stronger than a pure release anyway.)
What an acquire does¶
When you perform an acquire operation (let's say if atomic.LoadInt32(&ready) == 1), the runtime and the hardware promise:
- Every read and write you perform after the acquire, in program order, will see the effects of writes the releasing goroutine made before its release.
- The acquire itself is atomic — no torn read.
- A pure acquire is a one-sided fence: later operations cannot move before it, but reads and writes before it in program order may be reordered after it. (Again, rarely matters in practice.)
The handshake¶
The acquire–release contract requires both halves to operate on the same location and for the acquire to actually observe the value published by the release. If you load ready and it is still 0, you have not synchronized with anything — you simply learned that the producer has not run yet.
```text
Producer goroutine                      Consumer goroutine
==================                      ==================
cfg = loadConfig()      ─┐
                         │ any writes here are
                         │ guaranteed visible to a
                         │ consumer that observes
                         │ the released value
atomic.StoreInt32(&r,1) ─┘ release  ┌── if atomic.LoadInt32(&r)==1
                                    │   acquire (observed)
                                    │
                                    │   use(cfg)  ◄── safe
                                    │   the writes above
                                    │   the release are
                                    │   now visible
```
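The rule that a load returning 0 synchronizes with nothing can be sketched as a non-blocking read. This is only an illustration: publish and tryRead are hypothetical helper names, not part of the original example, and the sketch assumes Go 1.19+ for atomic.Int32.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

type Config struct {
	URL string
}

var (
	cfg   *Config      // plain variable, guarded by the ready flag
	ready atomic.Int32 // 0 = not published, 1 = published
)

// publish writes the value first, then releases it by storing the flag.
func publish(c *Config) {
	cfg = c
	ready.Store(1)
}

// tryRead reports ok=false while the flag is still 0. In that case it has
// not synchronized with the producer and must not touch cfg.
func tryRead() (c *Config, ok bool) {
	if ready.Load() == 0 {
		return nil, false
	}
	return cfg, true
}

func main() {
	if _, ok := tryRead(); !ok {
		fmt.Println("not published yet")
	}
	publish(&Config{URL: "https://example.com"})
	if c, ok := tryRead(); ok {
		fmt.Println(c.URL)
	}
}
```

The caller decides what to do on ok=false (retry later, fall back to a default); the important part is that cfg is read only on the path where the acquire observed 1.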
Atomic without ordering does not exist in Go¶
In C++ you can write x.store(1, std::memory_order_relaxed) on a std::atomic variable x: atomic, but with no ordering guarantees. In Go you cannot. Every sync/atomic operation is sequentially consistent, which is strictly stronger than acquire/release. This makes Go's atomics easier to reason about but slightly more expensive on weakly ordered hardware. As a junior you should be glad: there is one less thing to get wrong.
The Go memory model in one sentence¶
A read of a variable is not guaranteed to observe a write to that variable unless the read is ordered after the write by a happens-before relation established by some synchronization primitive.
Synchronization primitives that establish happens-before:
- The go statement (it happens-before the first action of the new goroutine's body; the goroutine's exit, however, is not ordered before anything in the caller).
- Channel send/receive on the same channel.
- sync.Mutex.Lock/Unlock, sync.RWMutex.RLock/RUnlock, etc.
- sync.Once.Do.
- All sync/atomic operations on the same memory location.
- runtime.SetFinalizer (rarely relevant).
If a write and a read are not ordered by any of these, the read might see the write, might see a stale value, or, for a multi-word value, might see a torn intermediate. The program has a data race and you cannot rely on its behavior.
Real-World Analogies¶
The package on the porch¶
You order a package. The delivery driver places it on your porch and rings the doorbell.
- Placing the package = the writes you want to publish.
- Ringing the doorbell = the release operation.
- Hearing the doorbell = the acquire operation.
- Picking up the package = using the published values.
If the driver rings the doorbell before placing the package, you might open the door and find nothing. The bell must come after the package.
If you open the door without waiting for the bell (you just keep opening it randomly), you might catch the moment between placement and bell-ring and again find nothing. You must wait for the bell.
The book in the library¶
A librarian shelves a new book (cfg = loadConfig()), then writes its location on a public index card (atomic.StoreInt32(&ready, 1)).
A reader checks the index card (atomic.LoadInt32(&ready)). If they see the location written, they walk to that shelf and pick up the book. Because the librarian wrote the card after shelving the book, the reader is guaranteed to find the book where the card says.
If the librarian writes the card first and shelves later — or if a reader checks the shelf without reading the card — the protocol breaks.
The factory whistle¶
A factory produces a batch of goods, then blows a whistle. Trucks arrive only after hearing the whistle. They are guaranteed to find the batch loaded.
If a truck driver heard half a whistle (a torn read), they might pull up at an empty dock. That is why the whistle itself must be atomic, not just the load.
Mental Models¶
"Walls that mail moves over"¶
Imagine your writes as letters being sent to a public mailbox. Each write is a letter; the mailbox is main memory. The acquire and release act as walls in the producer's letter pile:
- Release: a wall pushed down in the pile. All letters above the wall are sealed and shipped. Anything below the wall is still in your hand.
- Acquire: a wall pushed up in the consumer's reading pile. All letters above the wall are read; the consumer cannot pre-read letters below it.
Without the walls, the runtime is free to ship letters in any order, batch them, or drop the order entirely.
The committee meeting¶
Each goroutine is a committee member writing notes. Without synchronization, every member's notes are private and may or may not reach the others, in any order, at any time.
A release is like raising your hand and saying "I commit this packet of notes to the public record." An acquire on the same record is like another member saying "I now read the latest packet." Anything in the packet is visible. Anything not packaged is still private to the original writer.
The newspaper edition¶
The producer writes articles all day (private). At deadline, they "release" the edition — print it. Anyone who reads the morning paper after print time (acquire) sees a consistent edition: every article in it is real, no half-written sentences.
But if a reader steals a draft off the editor's desk before printing (no acquire — a plain unsynchronized read), they see whatever happened to be on the page at that instant, including half-sentences.
Pros & Cons¶
Pros of using acquire/release semantics (via Go primitives)¶
- Correctness across all platforms. Once you use mutexes, channels, atomics, or sync.Once, your publication works on x86, ARM, RISC-V, POWER: every architecture Go supports. You do not need to know which has weaker ordering.
- Documentable contract. Other engineers reading your code recognise atomic.Store or sync.Once and immediately understand "this is a publication point."
- No torn reads or writes. All sync/atomic operations on aligned values are indivisible.
- Composability. Mutex acquires/releases compose: a goroutine that locks two mutexes in sequence inherits the happens-before chain of both.
- Race detector catches violations. go run -race will scream at unsynchronized publication; the synchronized version is silent.
Cons / costs¶
- Overhead. A mutex lock/unlock pair is a few dozen nanoseconds even uncontended. An atomic load is a few nanoseconds. A channel send/receive is a few hundred nanoseconds. Most code does not care; very hot loops do.
- Hidden serialization. A mutex serializes all goroutines that touch it. If you only meant to publish once, a mutex is overkill; atomic.Store is cheaper.
- Easy to use wrong. Storing the flag before the value, or reading the value before the flag, both look correct but are wrong. The race detector will not always catch this.
- API confusion. Beginners often think atomic.Value.Store only stores atomically and forget it also publishes. Or they reach for raw atomics when a mutex would be clearer.
Use Cases¶
Use acquire/release-shaped publication when:
- You build a value once and need to share it with many readers.
- You set a one-shot flag (done, closed, ready).
- You publish a pointer (config, cache, lookup table) for many readers to use.
- You implement lazy initialization (sync.Once).
- You implement a sentinel/canary that signals "the work is finished."
Do not use raw atomics when:
- You need to coordinate more than a one-shot publication. A mutex is clearer.
- You need to atomically update a multi-field struct. Wrap with a mutex or use atomic.Value/atomic.Pointer.
- You want to wait for an event. Use a channel.
- You want fairness between goroutines. Atomics do not provide fairness.
Code Examples¶
Example 1: the broken publication¶
```go
package main

import (
	"fmt"
	"sync"
)

type Config struct {
	URL  string
	Port int
}

var cfg *Config
var ready bool // <-- plain bool, not atomic

func main() {
	var wg sync.WaitGroup
	wg.Add(2)
	go func() {
		defer wg.Done()
		cfg = &Config{URL: "https://example.com", Port: 8080}
		ready = true
	}()
	go func() {
		defer wg.Done()
		for !ready { // busy-wait
			// spin
		}
		fmt.Println(cfg.URL) // possible nil dereference
	}()
	wg.Wait()
}
```
Why is this broken?
- Compiler reordering. The Go compiler may reorder cfg = ... and ready = true because, from the producing goroutine's perspective, the order does not affect its own behavior. It may also hoist the read of ready out of the consumer's loop, so the loop never ends.
- CPU reordering. Even if the compiler preserves order, the CPU may commit ready = true to memory before cfg = ... reaches it.
- Cache invalidation lag. The consumer's cache may see ready=true before it sees the new value of cfg.
- Race detector. go run -race main.go will flag this as a data race even before it crashes.
Example 2: fixed with sync/atomic¶
```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

type Config struct {
	URL  string
	Port int
}

var cfg *Config
var ready int32 // 0 or 1

func main() {
	var wg sync.WaitGroup
	wg.Add(2)
	go func() {
		defer wg.Done()
		cfg = &Config{URL: "https://example.com", Port: 8080}
		atomic.StoreInt32(&ready, 1) // RELEASE
	}()
	go func() {
		defer wg.Done()
		for atomic.LoadInt32(&ready) == 0 { // ACQUIRE on each check
			// spin, but at least it's a synchronized spin
		}
		fmt.Println(cfg.URL) // safe: cfg write happens-before this read
	}()
	wg.Wait()
}
```
The release on &ready ensures the write to cfg is visible to any goroutine that observes ready == 1. The race detector is happy. The program is correct on every architecture Go supports.
Example 3: cleaner with atomic.Pointer¶
```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

type Config struct {
	URL  string
	Port int
}

var cfg atomic.Pointer[Config]

func main() {
	var wg sync.WaitGroup
	wg.Add(2)
	go func() {
		defer wg.Done()
		c := &Config{URL: "https://example.com", Port: 8080}
		cfg.Store(c) // single release on the pointer itself
	}()
	go func() {
		defer wg.Done()
		var c *Config
		for {
			c = cfg.Load() // acquire
			if c != nil {
				break
			}
		}
		fmt.Println(c.URL)
	}()
	wg.Wait()
}
```
This is the idiomatic Go pattern for one-shot publication. There is no separate flag — the pointer itself is the flag (nil = not ready, non-nil = ready). Both atomicity and ordering come from the atomic.Pointer type. (atomic.Pointer[T] was added in Go 1.19; before that you used unsafe.Pointer with atomic.LoadPointer/StorePointer, or atomic.Value.)
Example 4: using a mutex¶
```go
package main

import (
	"fmt"
	"sync"
)

type Config struct {
	URL  string
	Port int
}

var (
	mu  sync.Mutex
	cfg *Config
)

func setConfig(c *Config) {
	mu.Lock() // (not the publication itself; just protects the write)
	defer mu.Unlock()
	cfg = c
}

func getConfig() *Config {
	mu.Lock()
	defer mu.Unlock()
	return cfg
}

func main() {
	var wg sync.WaitGroup
	wg.Add(2)
	go func() {
		defer wg.Done()
		setConfig(&Config{URL: "https://example.com", Port: 8080})
	}()
	go func() {
		defer wg.Done()
		var c *Config
		for c == nil {
			c = getConfig()
		}
		fmt.Println(c.URL)
	}()
	wg.Wait()
}
```
This works because mu.Unlock() is a release and mu.Lock() is an acquire. The cost is higher than atomic.Pointer but the pattern generalizes to more complex critical sections.
Example 5: sync.Once does it for you¶
```go
package main

import (
	"fmt"
	"sync"
)

type Config struct {
	URL  string
	Port int
}

var (
	once sync.Once
	cfg  *Config
)

func loadConfig() {
	cfg = &Config{URL: "https://example.com", Port: 8080}
}

func GetConfig() *Config {
	once.Do(loadConfig)
	return cfg
}

func main() {
	var wg sync.WaitGroup
	for i := 0; i < 4; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			fmt.Println(GetConfig().URL)
		}()
	}
	wg.Wait()
}
```
once.Do guarantees loadConfig runs exactly once and that its writes happen-before the return of Do in every other goroutine. This is the canonical lazy-init pattern.
Example 6: channel as the synchronization¶
```go
package main

import "fmt"

type Config struct {
	URL  string
	Port int
}

func main() {
	ch := make(chan *Config)
	go func() {
		c := &Config{URL: "https://example.com", Port: 8080}
		ch <- c // release
	}()
	c := <-ch // acquire
	fmt.Println(c.URL)
}
```
A channel send/receive establishes happens-before between the writes preceding the send and the reads following the receive. The pointer travels through the channel; the writes that built *Config are visible to the consumer.
Example 7: lazy double-checked load (junior version)¶
We will look at correct double-checked locking in detail at the senior level. For now, here is the safe-but-not-cheap pattern using sync.Once:
```go
package main

import "sync"

type Conn struct{ /* ... */ }

type Pool struct {
	once sync.Once
	conn *Conn
}

func (p *Pool) Get() *Conn {
	p.once.Do(func() {
		p.conn = openConn()
	})
	return p.conn
}

func openConn() *Conn { return &Conn{} }
```
You will see code in older projects that tries to do this with a manual if conn == nil { mu.Lock(); ... } check. Do not write that yourself yet — sync.Once is correct, cheap, and well-understood.
Coding Patterns¶
Pattern: one-shot publication with atomic.Pointer¶
```go
var current atomic.Pointer[State]

func publish(s *State) { current.Store(s) }
func read() *State     { return current.Load() }
```
Use when: many readers, occasional writer (or a single one-shot writer), no need to wait.
Pattern: ready-flag handshake¶
```go
var (
	value any
	ready atomic.Int32
)

func produce(v any) {
	value = v
	ready.Store(1) // release: publishes value
}

func consume() any {
	for ready.Load() == 0 { // acquire on each check
		runtime.Gosched()
	}
	return value
}
```
The producer writes value first, then sets ready. The consumer spins on ready, then reads value. The order is essential — and the atomic on ready is essential to make it safe.
Pattern: lazy init with sync.Once¶
```go
var (
	once sync.Once
	svc  *Service
)

// GetService cannot be named Service: that would clash with the type name.
func GetService() *Service {
	once.Do(func() { svc = newService() })
	return svc
}
```
Use when: initialization is expensive and only one goroutine should run it, but many may need the result.
Pattern: closed-channel as a broadcast release¶
```go
var (
	doneCh = make(chan struct{})
	result *Result
)

func produce() {
	result = compute()
	close(doneCh) // release to all readers
}

func consume() *Result {
	<-doneCh // acquire
	return result
}
```
close(ch) is a release that every receive from ch synchronizes with. This is the cleanest one-to-many publication idiom in Go.
Clean Code¶
- Hide synchronization behind a function. Do not expose atomic.Int32 as a public package variable; wrap it in a typed accessor.
- Name flags with verbs or states. ready, closed, loaded, not flag, b, x.
- Comment every atomic operation with what it publishes/consumes. // release: makes cfg visible to readers.
- Prefer atomic.Pointer[T] over atomic.LoadPointer/StorePointer with unsafe.Pointer conversions. The generic version was added in Go 1.19 and is type-safe.
- One synchronization per fact. If you publish through both a mutex and an atomic flag, you doubled the cost and the bug surface.
```go
// BAD: redundant
mu.Lock()
cfg = c
ready.Store(1)
mu.Unlock()

// GOOD: pick one
cfg = c
ready.Store(1) // alone is enough if other writes only happen here

// BETTER: collapse to one atomic
cfgPtr.Store(c)
```
Product Use / Feature¶
Lazy-loaded configuration¶
A web service reads /etc/myservice/config.yaml the first time someone requests /healthz. Subsequent requests reuse the cached parse. sync.Once is the entire implementation.
One-shot feature flag¶
A flag that becomes true once the database has been migrated. Every request reads atomic.LoadInt32(&migrated) before deciding which code path to take. The migration goroutine calls atomic.StoreInt32(&migrated, 1) when done.
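A minimal sketch of that flag, assuming the request handler only needs to choose between two code paths. The helper names markMigrated and handle, and the returned strings, are made up for illustration.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

var migrated int32 // 0 = migration pending, 1 = migration finished

// markMigrated is called once by the migration goroutine when it is done.
func markMigrated() { atomic.StoreInt32(&migrated, 1) }

// handle picks a code path per request based on the flag.
func handle() string {
	if atomic.LoadInt32(&migrated) == 1 {
		return "new schema"
	}
	return "old schema"
}

func main() {
	fmt.Println(handle())
	markMigrated()
	fmt.Println(handle())
}
```

Because the flag only ever moves from 0 to 1, a request that reads 0 slightly too late simply takes the old path once more, which is usually acceptable for this kind of switch.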
Shared lookup table¶
A read-mostly map (currency conversion rates) updated once per hour. The updater builds a new map[string]Rate, then rates.Store(&newMap). Every request reads rates.Load(). No mutex, no locking, full happens-before via atomic.Pointer.
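The lookup-table idea could look roughly like this. It is a sketch under assumptions: Rate is taken to be a float64, refresh and lookup are hypothetical names, and the currency values are invented. It needs Go 1.19+ for atomic.Pointer.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

type Rate float64

var rates atomic.Pointer[map[string]Rate]

// refresh builds a complete private copy and publishes it in one atomic
// store. The published map must never be mutated afterwards.
func refresh(fresh map[string]Rate) {
	m := make(map[string]Rate, len(fresh))
	for k, v := range fresh {
		m[k] = v
	}
	rates.Store(&m)
}

// lookup reads whichever map is current. It never blocks.
func lookup(code string) (Rate, bool) {
	m := rates.Load()
	if m == nil {
		return 0, false // nothing published yet
	}
	r, ok := (*m)[code]
	return r, ok
}

func main() {
	refresh(map[string]Rate{"EUR": 1.0, "USD": 1.1})
	r, ok := lookup("USD")
	fmt.Println(r, ok)
}
```

Readers that loaded the old map keep using it until they call lookup again; the hourly updater never touches a map that a reader might hold.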
Error Handling¶
- Do not return half-built values. If loadConfig fails, the atomic.Pointer should remain nil (or the previous good value). Never Store a partially populated *Config.
- Handle initialization errors. sync.Once.Do will not re-run a failed initializer. You either store an error alongside the value or use the newer sync.OnceValue/sync.OnceValues helpers (Go 1.21+).
- Treat torn reads as impossible if you used sync/atomic. The package guarantees indivisibility on aligned values. (On 32-bit ARM, plain 64-bit ops can tear; sync/atomic does not.)
```go
var once sync.Once
var (
	cfg *Config
	err error
)

func Load() (*Config, error) {
	once.Do(func() {
		cfg, err = loadFromDisk()
	})
	return cfg, err
}
```
Both cfg and err are published by the same Do call. If loadFromDisk failed, every caller of Load sees the same err.
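The same behavior can be sketched with sync.OnceValues, which is available from Go 1.21. Here loadFromDisk is a stand-in that always fails, and the calls counter exists only to show that the loader ran once; both are assumptions made for the illustration.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

type Config struct{ URL string }

var calls int // counts how often the loader really runs

// loadFromDisk is a stand-in that always fails, to show the error path.
func loadFromDisk() (*Config, error) {
	calls++
	return nil, errors.New("config file missing")
}

// Load runs loadFromDisk at most once and replays its two results
// to every caller.
var Load = sync.OnceValues(loadFromDisk)

func main() {
	_, err1 := Load()
	_, err2 := Load()
	fmt.Println(err1 == err2, calls)
}
```

As with the hand-written version, a failed load is never retried; every caller gets the same error value.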
Security Considerations¶
- Race conditions are security bugs. A torn write to an authorization flag can leave a partially-updated state where the system "accidentally" grants access. sync/atomic prevents this for single-word fields.
- Don't bypass synchronization to "optimize." Hand-rolled lock-free code is a security risk if you don't understand the memory model. Use sync and sync/atomic until you've proven the cost matters.
- Publish read-only. When you atomic.Pointer.Store(&immutableValue), every reader must treat *immutableValue as immutable. Mutating a published value is a data race even if the pointer itself is atomic.
Performance Tips¶
- An uncontended atomic.LoadInt32 is around 1–2 ns. An uncontended sync.Mutex.Lock/Unlock pair is around 10–25 ns. A channel send/receive on an unbuffered channel is 100–300 ns. These are rough figures that vary with hardware and Go version.
- For the read-mostly case (publish rarely, read often), atomic.Pointer[T].Load is the cheapest correct choice.
- Do not "optimize away" a sync.Mutex to atomics unless you can prove the result is still correct. The race detector helps but does not catch every reordering.
- runtime.Gosched() in a spin loop reduces contention on the CPU. In a short one-shot publication, a tight spin is fine; in a long wait, prefer a channel or sync.Cond.
Best Practices¶
- Treat publication and consumption as a pair of operations on the same location.
- Prefer high-level primitives (channel, sync.Once, sync.Mutex, atomic.Pointer[T]) over hand-rolled flag-and-value pairs.
- Run go test -race ./... in CI. The race detector finds the vast majority of publication bugs at junior level.
- Document the publication contract in package docs: "Set once at startup, then safe to read from any goroutine."
- Never share the same *T for both reading and writing once published. If you need to mutate, allocate a new *T and republish.
- Avoid atomic.Value for new code if atomic.Pointer[T] works; it's type-safe and harder to misuse.
- Do not mix atomic operations with plain reads or writes on the same location.
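The "allocate a new *T and republish" advice can be sketched as a copy-then-store update. Settings, setTimeout and republishDemo are hypothetical names, and the sketch assumes a single writer goroutine; concurrent writers would need a mutex or a CompareAndSwap loop.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

type Settings struct {
	Timeout int
	Debug   bool
}

var settings atomic.Pointer[Settings]

// setTimeout never touches the published struct. It copies the current
// value, edits the copy, and publishes the copy.
// Assumption: only one goroutine calls setTimeout.
func setTimeout(t int) {
	next := Settings{}
	if cur := settings.Load(); cur != nil {
		next = *cur
	}
	next.Timeout = t
	settings.Store(&next)
}

// republishDemo returns the timeout seen through a pointer loaded before
// the update and through a fresh Load after it.
func republishDemo() (oldTimeout, newTimeout int) {
	settings.Store(&Settings{Timeout: 30, Debug: true})
	old := settings.Load()
	setTimeout(5)
	return old.Timeout, settings.Load().Timeout
}

func main() {
	fmt.Println(republishDemo())
}
```

A reader that loaded the pointer before the update keeps a consistent, unchanged Settings; only readers that Load afterwards see the new timeout.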
Edge Cases & Pitfalls¶
- Reading and writing the same field both atomically and non-atomically is a race. All accesses must go through sync/atomic or none of them do.
- atomic.Value.Store panics if you change the dynamic type. You must store the same concrete type on every call. atomic.Pointer[T] is statically typed and avoids this.
- 64-bit atomics on 32-bit ARM require the variable to be 8-byte aligned. As of Go 1.19, the atomic.Int64/atomic.Uint64 struct types guarantee alignment; raw int64 fields in structs do not.
- Captured loop variables in a goroutine plus a flag write are a classic combo bug. Before Go 1.22 every iteration shared one loop variable, so each goroutine could see a different snapshot of it.
- sync.Once does not re-run on error. Plan for this.
- for !ready { } (with ready as a plain bool) is the textbook broken publication. Always make ready atomic.
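The atomic.Value pitfall from the list above is easy to reproduce. In this sketch, storeRecovering and typeChangeDemo are hypothetical helpers that turn the panic into a boolean so the program keeps running.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// storeRecovering stores x and reports whether Store panicked.
func storeRecovering(v *atomic.Value, x any) (panicked bool) {
	defer func() {
		if recover() != nil {
			panicked = true
		}
	}()
	v.Store(x)
	return false
}

// typeChangeDemo stores a string, then an int, into the same Value.
func typeChangeDemo() (first, second bool) {
	var v atomic.Value
	first = storeRecovering(&v, "a string") // first Store fixes the type
	second = storeRecovering(&v, 42)        // int is a different concrete type
	return
}

func main() {
	first, second := typeChangeDemo()
	fmt.Println("first store panicked:", first)
	fmt.Println("second store panicked:", second)
}
```

With atomic.Pointer[T] the second call would not compile at all, which is why the statically typed version is preferred for new code.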
Common Mistakes¶
Mistake 1: storing the flag before the value¶
The flag must be stored after the value. The release on ready publishes everything written before it — not after.
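A sketch of what this mistake typically looks like, next to the corrected order. The names produceWrong, produceRight and consume are illustrative only; main calls just the correct version.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

var (
	value int
	ready atomic.Int32
)

// produceWrong stores the flag first. A consumer that observes ready == 1
// may run before value is written and read the old value.
func produceWrong(v int) {
	ready.Store(1) // release happens too early
	value = v      // not covered by the release above
}

// produceRight writes the value first, then releases it.
func produceRight(v int) {
	value = v
	ready.Store(1) // release: publishes the write to value
}

// consume reads value only after the acquire has observed the flag.
func consume() (int, bool) {
	if ready.Load() == 0 {
		return 0, false
	}
	return value, true
}

func main() {
	produceRight(42)
	fmt.Println(consume())
}
```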
Mistake 2: reading the value before the flag¶
You read value before performing the acquire on ready. Even if the flag is set, the value you read may have been the stale pre-publication snapshot.
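The consumer-side version of the bug can be sketched the same way. consumeWrong and consumeRight are hypothetical names; only the order of the two reads differs.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

var (
	value int
	ready atomic.Int32
)

func produce(v int) {
	value = v
	ready.Store(1) // release
}

// consumeWrong reads value before the acquire, so v may be a stale
// snapshot even when the flag check succeeds. It is also a data race
// when produce runs concurrently.
func consumeWrong() (int, bool) {
	v := value
	return v, ready.Load() == 1
}

// consumeRight performs the acquire first and only then reads value.
func consumeRight() (int, bool) {
	if ready.Load() == 0 {
		return 0, false
	}
	return value, true
}

func main() {
	produce(42)
	fmt.Println(consumeRight())
}
```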
Mistake 3: half-atomic field¶
```go
var ready int32

go func() {
	ready = 1 // plain write: race detector flags this
}()
go func() {
	if atomic.LoadInt32(&ready) == 1 { /* ... */ }
}()
```
If you load atomically, you must store atomically on every writer.
Mistake 4: using atomic for multi-word state¶
```go
// Two atomics do NOT add up to one atomic struct
atomic.StoreInt32(&x, 1)
atomic.StoreInt32(&y, 2)
// A reader can see x=1, y=0
```
If x and y must move together, use a mutex or publish a *State pointer.
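One way to make x and y move together is to publish them as a single *State, as suggested above. This is a sketch, not the only option; update and snapshot are hypothetical names.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// State groups the fields that must change together.
type State struct {
	X, Y int32
}

var current atomic.Pointer[State]

// update publishes both fields in a single atomic store.
func update(x, y int32) {
	current.Store(&State{X: x, Y: y})
}

// snapshot returns a consistent pair; it can never mix an old X with a
// new Y because both come from the same published struct.
func snapshot() State {
	if s := current.Load(); s != nil {
		return *s
	}
	return State{}
}

func main() {
	update(1, 2)
	fmt.Println(snapshot())
}
```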
Mistake 5: closing the door before sending the package¶
The close is the release. Any writes after the close are not part of the publication.
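A sketch of the wrong and right order with a done channel. The Job type and its methods are invented for the illustration; main uses only the correct version.

```go
package main

import "fmt"

// Job carries one result plus the channel that announces it.
type Job struct {
	done   chan struct{}
	result int
}

func newJob() *Job { return &Job{done: make(chan struct{})} }

// finishWrong closes first. A waiter released by the close may read
// result before the assignment below has happened.
func (j *Job) finishWrong(v int) {
	close(j.done)
	j.result = v // not part of the publication
}

// finish completes every write, then closes the channel.
func (j *Job) finish(v int) {
	j.result = v
	close(j.done) // release
}

// wait blocks until the channel is closed, then reads the result.
func (j *Job) wait() int {
	<-j.done // acquire
	return j.result
}

func main() {
	j := newJob()
	go j.finish(42)
	fmt.Println(j.wait())
}
```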
Common Misconceptions¶
- "atomic.Load is just like a normal read but safer." It is much more than that. It is also an acquire fence: it orders all subsequent reads and writes after it.
- "If the value is a single word (an int or a *T), I don't need atomics." False. Without atomics there is no happens-before relation, so the read may return a stale value indefinitely, even though no torn read occurs.
- "x86 has strong memory ordering, so I can skip atomics." Even on x86, the compiler is free to reorder. Atomics are also a compiler barrier.
- "sync.Mutex is slow." Uncontended, it's tens of nanoseconds. Contention is what makes it slow; the lock primitive itself is cheap.
- "sync.Once is for thread-safe singletons." It is, but it is also a clean publication primitive for any "compute once, share many" pattern.
Tricky Points¶
- A read of a *T that is nil after another goroutine just stored a non-nil value is still allowed if you read non-atomically. The Go memory model does not promise eventual visibility without synchronization.
- atomic.AddInt32 is both a load and a store; it counts as both an acquire and a release.
- A successful atomic compare-and-swap is both an acquire and a release. A failed one only loads: it still acts as an acquire, but it publishes nothing because no update happened.
- atomic.Pointer[T].CompareAndSwap compares pointer identity, not value equality. Two pointers to equal structs are not equal pointers.
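The pointer-identity point can be checked with a few lines. In this sketch, casDemo is a hypothetical helper and the port numbers are arbitrary.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

type Config struct{ Port int }

// casDemo shows that CompareAndSwap matches on the pointer, not on the
// contents of the struct it points to.
func casDemo() (byEqualCopy, bySamePointer bool) {
	var p atomic.Pointer[Config]
	a := &Config{Port: 8080}
	b := &Config{Port: 8080} // equal fields, different allocation
	p.Store(a)

	// b has the same contents as a but is a different pointer: no swap.
	byEqualCopy = p.CompareAndSwap(b, &Config{Port: 9090})
	// a is the pointer that is actually stored: swap succeeds.
	bySamePointer = p.CompareAndSwap(a, &Config{Port: 9090})
	return
}

func main() {
	byEqualCopy, bySamePointer := casDemo()
	fmt.Println(byEqualCopy, bySamePointer)
}
```

In practice this means a CompareAndSwap loop must pass the exact pointer it got from Load as the old value, never a rebuilt copy.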
Test¶
```go
package publish_test

import (
	"sync"
	"sync/atomic"
	"testing"
)

func TestPublishPointer(t *testing.T) {
	var p atomic.Pointer[int]
	var wg sync.WaitGroup
	wg.Add(2)
	go func() {
		defer wg.Done()
		x := 42
		p.Store(&x)
	}()
	go func() {
		defer wg.Done()
		for {
			if v := p.Load(); v != nil {
				if *v != 42 {
					t.Errorf("got %d, want 42", *v)
				}
				return
			}
		}
	}()
	wg.Wait()
}
```
Run with go test -race. The synchronized version passes; remove the atomic and you get a data-race report.
Tricky Questions¶
- Q: Why is for !ready { } (with ready as a plain bool) wrong even on x86 where memory ordering is strong?
  A: Because the compiler may hoist the read out of the loop. The loop becomes t := ready; for !t { }, which spins forever if t was false at entry. Atomics are also a compiler barrier.
- Q: If I store a pointer with atomic.Pointer.Store, can a reader still see a partially-initialized struct?
  A: No, provided you fully built the struct before calling Store. The release on Store publishes every write made before it in program order, including the writes to fields of *T.
- Q: What happens if I atomic.Pointer.Store a pointer and a reader is in the middle of reading its fields when I Store a new pointer?
  A: Nothing bad. The reader holds its own copy of the old pointer, and the struct it points to is still alive (Go has a GC). The new pointer is independent.
- Q: Does sync.Mutex.Lock synchronize with a previous Unlock by a different goroutine on the same mutex?
  A: Yes, exactly that. Unlock is the release, Lock is the acquire on the same memory.
- Q: Is runtime.GOMAXPROCS(1) enough to make my unsynchronized publication safe?
  A: No. GOMAXPROCS(1) only limits how many goroutines execute Go code at the same instant. The compiler may still reorder, and the runtime may preempt your goroutine mid-write.
Cheat Sheet¶
PUBLISHING (one writer, many readers)
=====================================
Idiomatic Go (Go 1.19+):
var v atomic.Pointer[T]
v.Store(built) // release
x := v.Load() // acquire
Lazy init:
var once sync.Once
once.Do(initFn)
Through a channel:
done := make(chan struct{})
go func() { build(); close(done) }()
<-done // safe to read what build() wrote
Through a mutex:
mu.Lock()
field = v
mu.Unlock()
// reader does mu.Lock(); v := field; mu.Unlock()
COMMON BUGS
===========
Plain bool flag: race
Plain pointer: race
Atomic + non-atomic on same field: race
Store flag before value: visible-too-early
Load value before flag: see-old-value
WHEN TO USE WHAT
================
One-shot publication: atomic.Pointer[T] or sync.Once
Read-mostly state: atomic.Pointer[T]
Multi-field consistent: sync.Mutex
Wait for event: channel
Lazy init with error: sync.OnceValues (Go 1.21+)
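The last row of the cheat sheet names sync.OnceValues without showing it. A minimal sketch, assuming Go 1.21 or later; load, calls, and Settings are hypothetical names, and the calls counter exists only to make the "runs once" property observable.

```go
package main

import (
	"errors"
	"sync"
)

// calls counts how often load runs; it is only written inside the
// once-guarded function.
var calls int

// load is a hypothetical initializer that can fail.
func load() (string, error) {
	calls++
	if calls > 1 {
		return "", errors.New("load ran more than once")
	}
	return "settings-v1", nil
}

// Settings runs load on the first call and returns the same
// (value, error) pair on every later call.
var Settings = sync.OnceValues(load)
```

Callers write v, err := Settings() and never see load run twice, whether it succeeded or failed the first time.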
Self-Assessment Checklist¶
- I can explain why for !ready { } with a plain bool is incorrect.
- I can use atomic.StoreInt32 and atomic.LoadInt32 to publish a flag.
- I can use atomic.Pointer[T] to publish a struct.
- I know that a mutex provides both acquire (Lock) and release (Unlock).
- I can name three Go primitives that provide acquire/release semantics.
- I know what sync.Once.Do guarantees and when not to use it.
- I know the race detector catches missing synchronization.
- I know that sync/atomic in Go provides sequential consistency, not just acquire/release.
Summary¶
Publishing a value from one goroutine to another requires two cooperating operations: a release on the producer side and an acquire on the consumer side, on the same synchronization location. Without both, the receiving goroutine may see a stale value, a partially-built struct, or a torn intermediate.
Go provides this contract through several primitives:
- sync.Mutex: Lock is acquire, Unlock is release.
- sync/atomic: every operation is sequentially consistent, which subsumes acquire/release.
- sync.Once: Do provides both around the initialization function.
- Channels: send and receive on the same channel synchronize.
As a junior, your job is to recognise these primitives and reach for them by default. Do not write hand-rolled flag-and-value pairs without sync/atomic. Do not assume "single-word read is always safe." Run go test -race.
The next level (middle) builds on this with publication patterns for real services: read-mostly configuration, lazy initializers, and the read-the-flag-then-the-value handshake.
What You Can Build¶
- A thread-safe lazy-loaded configuration cache.
- A one-shot feature flag toggled at startup and read by every request.
- A read-mostly currency-rate table refreshed every hour.
- A "service ready" sentinel used by health checks.
- A safe-publication wrapper around *sql.DB for a connection pool that opens lazily.
Further Reading¶
- The Go memory model: https://go.dev/ref/mem
- sync package docs: https://pkg.go.dev/sync
- sync/atomic package docs: https://pkg.go.dev/sync/atomic
- Russ Cox, "Hardware Memory Models": https://research.swtch.com/hwmm
- Russ Cox, "Programming Language Memory Models": https://research.swtch.com/plmm
Related Topics¶
- Mutex and RWMutex (lock primitives that provide acquire/release).
- Channels (send/receive establish happens-before).
- sync.Once (publication primitive for lazy init).
- sync/atomic (the lowest-level publication primitive).
- The Go race detector (-race flag).
- Memory ordering and barriers (parent topic; see senior and professional levels for deeper coverage).
Extended Examples and Walkthroughs¶
Walkthrough: from broken to fixed, step by step¶
Let's take the original buggy program and refactor it five times, each time stronger than the last. The exercise is not to find "the" right answer — it's to feel the design space.
Version 0: completely broken.
var cfg *Config
var ready bool
go func() {
cfg = build()
ready = true
}()
go func() {
for !ready { }
use(cfg)
}()
Both fields are plain. The compiler may hoist ready out of the loop, turning the consumer into an infinite spin. Even if it doesn't hoist, the CPU may publish ready=true before cfg=..., leaving the consumer with cfg=nil. The race detector flags this immediately.
Version 1: atomic flag, plain pointer.
var cfg *Config
var ready int32
go func() {
cfg = build()
atomic.StoreInt32(&ready, 1)
}()
go func() {
for atomic.LoadInt32(&ready) == 0 { }
use(cfg)
}()
This is correct. The atomic store on ready is a release; it publishes the write to cfg. The atomic load is an acquire; the subsequent use(cfg) sees the new pointer. The race detector is happy.
But there are two stylistic problems:
- Two separate variables: easy for a future maintainer to write to cfg without updating ready, or vice versa.
- A spin loop with no runtime.Gosched() will burn a CPU core. On a single-core machine it can starve the producer.
Version 2: atomic pointer.
var cfg atomic.Pointer[Config]
go func() {
cfg.Store(build())
}()
go func() {
var c *Config
for c = cfg.Load(); c == nil; c = cfg.Load() {
runtime.Gosched()
}
use(c)
}()
One variable. The nil-ness of the pointer is the flag. We added a Gosched so other goroutines (including the producer) get CPU time. This is the idiomatic Go pattern for "build once, read many."
Version 3: signal with a closed channel.
var cfg *Config
ready := make(chan struct{})
go func() {
cfg = build()
close(ready)
}()
go func() {
<-ready
use(cfg)
}()
No busy-wait. The consumer blocks on <-ready until close happens. close is a release; the receive is an acquire. As a bonus, this scales naturally to many consumers — every receive on a closed channel returns immediately.
Version 4: lazy init with sync.Once.
var (
once sync.Once
cfg *Config
)
func Get() *Config {
once.Do(func() { cfg = build() })
return cfg
}
Consumers do not need to know that the value is computed on demand. There is no producer goroutine — the first caller to Get is also the producer. sync.Once handles both the deduplication and the publication.
Version 5: sync.OnceValue (Go 1.21+).
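A single declaration, var Get = sync.OnceValue(build), replaces the Once, the variable, and the wrapper function. Get() returns the value built by build, computed exactly once. It is type-safe and leaves no separate global variable to forget about. If build panics, every call to Get panics with the same value.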
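A minimal sketch of what Version 5 can look like, assuming Go 1.21 or later. Config, build, and the builds counter are hypothetical stand-ins for the walkthrough's types; the counter is there only to show that build runs once.

```go
package main

import "sync"

// Config and build are hypothetical stand-ins for the walkthrough's types.
type Config struct{ Name string }

// builds counts build invocations; it is written only inside the
// once-guarded call.
var builds int

func build() *Config {
	builds++
	return &Config{Name: "default"}
}

// Get builds the Config on first use and returns the same pointer afterwards.
var Get = sync.OnceValue(build)
```

Every caller of Get() receives the same *Config, and the publication is handled inside the function that sync.OnceValue returns.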
The progression illustrates a key Go principle: prefer the highest-level primitive that captures your intent. You drop to atomic.Pointer when you need read performance; you drop to atomic.Int32 when you need a single flag; you stop dropping further.
Walkthrough: why "atomic" without "ordering" is not enough¶
This is a thought experiment that does not apply to Go directly (because Go's atomics are seq-cst), but it builds intuition.
Suppose we had hypothetical atomic_relaxed operations that were indivisible but had no ordering guarantees, and the producer assigned cfg and then called atomic_relaxed.StoreInt32(&ready, 1).
A consumer doing atomic_relaxed.LoadInt32(&ready) would never see a torn write — it always sees 0 or 1. But it might see ready=1 before it sees the new cfg. The relaxed atomic gives you indivisibility but not visibility ordering.
That is why acquire/release exist as a separate concept. Indivisibility (no torn reads) and ordering (no reordering across) are two different guarantees. Go's sync/atomic gives you both, always.
Walkthrough: the race detector and what it catches¶
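Take a program in which one goroutine writes a plain package-level variable x and another goroutine reads it, with nothing ordering the two accesses. Run it with go run -race main.go and you will see something like: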
WARNING: DATA RACE
Write at 0x... by goroutine 7:
main.main.func1()
/tmp/r.go:6 +0x...
Previous read at 0x... by goroutine 8:
main.main.func2()
/tmp/r.go:7 +0x...
The race detector instruments every memory access and tracks the happens-before relation. A race is two accesses (one of them a write) with no happens-before edge between them.
Fix with an atomic:
var x int32
func main() {
go func() { atomic.StoreInt32(&x, 1) }()
go func() { _ = atomic.LoadInt32(&x) }()
time.Sleep(time.Second)
}
Now there is a release-acquire pair (if the consumer happens to observe the write) or two unrelated accesses (if not). Either way, no race.
The race detector is your best friend at the junior level. Run all tests with -race. Many teams enable it in CI as a separate job.
Walkthrough: publishing a slice header¶
A []T in Go is a three-word struct (pointer, length, capacity). Storing a slice directly is not atomic — three separate words may be observed independently.
// WRONG
var data []int
go func() {
data = []int{1, 2, 3}
}()
go func() {
for len(data) == 0 { } // race
fmt.Println(data[0]) // may read garbage
}()
Two ways to publish a slice:
// Right way 1: atomic pointer to the slice header.
type slice struct{ data []int }
var s atomic.Pointer[slice]
go func() {
s.Store(&slice{data: []int{1, 2, 3}})
}()
go func() {
for s.Load() == nil { }
fmt.Println(s.Load().data[0])
}()
// Right way 2: through a channel.
ch := make(chan []int, 1)
go func() { ch <- []int{1, 2, 3} }()
go func() { fmt.Println((<-ch)[0]) }()
Either way, the header is published atomically, and the backing array is reachable through the published pointer.
Walkthrough: what go run -race actually detects¶
-race is implemented via "happens-before vector clocks." For every goroutine, the runtime maintains a logical clock. Every synchronization operation updates one or more clocks. Every memory access is tagged with the current clock of its goroutine.
When a memory location is accessed, the detector checks whether the access happens-before all previous accesses to that location. If yes, no race. If two concurrent accesses (no happens-before) exist and at least one is a write, a race is reported.
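The race detector has no false positives but it does have false negatives: every race it reports is a real race, but it only observes the accesses that actually execute, so it may miss races on code paths or interleavings that did not occur in this run. Run your test suite many times if you suspect timing-dependent bugs.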
Extended Vocabulary¶
| Term | Definition |
|---|---|
| Acquire fence | A barrier that prevents subsequent memory operations from being reordered before it. |
| Release fence | A barrier that prevents preceding memory operations from being reordered after it. |
| Acq-rel | An operation that is both an acquire and a release. sync.Mutex.Lock is not acq-rel — it's just an acquire; the matching Unlock is the release. But a successful CompareAndSwap is acq-rel. |
| Full fence | A barrier preventing reordering in both directions. Sequential consistency requires full fences. |
| Read-modify-write (RMW) | An atomic that reads, computes a new value, and writes — like CompareAndSwap, AddInt32, or SwapInt32. RMWs are typically acq-rel. |
| Linearizable | A history of operations is linearizable if each operation appears to take effect instantaneously at some point between its invocation and its return. Linearizability is a stronger condition than sequential consistency: it also respects the real-time order of non-overlapping operations. |
| Causal consistency | A weaker model: if A causes B, all observers see A before B. Acquire/release implements per-location causal consistency. |
| Store buffer | A CPU structure that holds pending writes before they reach the cache. Store-buffer-induced reordering is a major source of weakly-ordered behavior. |
| MESI / MOESI | Cache coherence protocols. They keep caches coherent (every cache eventually sees the same value for a given location) but do not by themselves make memory consistent (the order in which writes to different locations appear may differ between observers). |
More Code Examples¶
A real publication helper¶
package config
import "sync/atomic"
type Snapshot struct {
Hosts []string
Token string
}
var current atomic.Pointer[Snapshot]
// Set replaces the current snapshot. Concurrent-safe.
func Set(s *Snapshot) {
current.Store(s)
}
// Get returns the latest snapshot or nil if none yet.
func Get() *Snapshot {
return current.Load()
}
// In main:
config.Set(&config.Snapshot{Hosts: []string{"a", "b"}, Token: "..."})
// In a handler:
s := config.Get()
if s == nil {
http.Error(w, "not ready", 503)
return
}
// use s.Hosts, s.Token
Notice: the snapshot is read-only once published. Updates allocate a new *Snapshot and call Set again. The previous *Snapshot remains alive as long as any reader holds it (Go's GC handles this).
A real one-shot signal¶
package server
import (
"context"
"sync"
)
type Server struct {
ready chan struct{}
once sync.Once
}
func New() *Server {
return &Server{ready: make(chan struct{})}
}
// Ready signals that the server has finished starting up.
func (s *Server) Ready() {
s.once.Do(func() { close(s.ready) })
}
// Wait blocks until the server is ready or ctx is canceled.
func (s *Server) Wait(ctx context.Context) error {
select {
case <-s.ready:
return nil
case <-ctx.Done():
return ctx.Err()
}
}
A common pattern: signal readiness by closing a channel. Multiple goroutines can wait. Once closed, future waiters return instantly. sync.Once guards against double-close.
A real lazy initializer¶
package db
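import (
"database/sql"
"os"
"sync"
// A Postgres driver must also be registered with a blank import;
// which driver package to use is left to the application.
)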
var (
once sync.Once
db *sql.DB
err error
)
func DB() (*sql.DB, error) {
once.Do(func() {
db, err = sql.Open("postgres", os.Getenv("DSN"))
if err == nil {
err = db.Ping()
}
})
return db, err
}
The first caller pays for the connection setup. Every subsequent caller gets the same *sql.DB (and the same error, if any). The publication of both db and err is atomic from the consumer's point of view.
Extended Pitfalls¶
Pitfall: atomic stores on different variables¶
var a, b atomic.Int32
go func() {
a.Store(1)
b.Store(2)
}()
go func() {
if b.Load() == 2 {
// can we be sure a.Load() == 1?
}
}()
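In Go, yes, because all atomics are sequentially consistent: every goroutine observes the same total order of atomic operations. If you saw b=2, you must also see a=1 (or any later value of a). In C++ the same guarantee needs at least memory_order_release on b.store and memory_order_acquire on b.load. With memory_order_relaxed on b, or with the release and acquire placed on different variables (release on a.store, acquire on b.load), you would not be guaranteed anything, because a release/acquire pair synchronizes only through the same location.
Go offers no weaker orderings to pick by mistake, which makes its atomics simpler to reason about than C++'s explicit memory orders.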
Pitfall: relying on a torn read "always working"¶
type Pair struct{ A, B int }
var p Pair
go func() { p = Pair{1, 2} }()
go func() { x := p; fmt.Println(x) }()
This is a data race, and the result is undefined. On x86 you may see consistent pairs most of the time and conclude "good enough." Don't. The Go memory model is not "what the x86 happens to do on Tuesday."
Use atomic.Pointer[Pair] or a mutex.
Pitfall: closure captures and stale state¶
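This pitfall appears when goroutines started inside a for loop capture the loop variable i in their closures. In Go 1.22+, the loop variable is freshly bound per iteration. In older versions, every goroutine sees the same i, possibly already incremented past the intended value. This is a different kind of memory bug: it is not about ordering, but about sharing. On older versions, fix it with explicit shadowing (i := i at the top of the loop body).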
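A minimal sketch of the shadowing fix; collect and seen are hypothetical names. Each goroutine writes only its own slice element, so the writes do not overlap, and the WaitGroup orders them before the final read.

```go
package main

import "sync"

// collect starts n goroutines and returns the loop indices they observed.
func collect(n int) []int {
	seen := make([]int, n)
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		i := i // explicit per-iteration copy; redundant but harmless on Go 1.22+
		wg.Add(1)
		go func() {
			defer wg.Done()
			seen[i] = i // each goroutine writes its own slot
		}()
	}
	wg.Wait() // Wait acquires what every Done released
	return seen
}
```

Without the i := i line, a pre-1.22 toolchain lets every closure share one i, so a goroutine that starts late may index with a value the loop has already moved past.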
Extra Tests¶
package publish_test
import (
"context"
"sync"
"sync/atomic"
"testing"
"time"
)
// Test that the consumer always sees the producer's writes.
func TestPublishStruct(t *testing.T) {
type Snapshot struct{ X, Y int }
var s atomic.Pointer[Snapshot]
var wg sync.WaitGroup
wg.Add(1)
go func() {
defer wg.Done()
s.Store(&Snapshot{X: 1, Y: 2})
}()
ctx, cancel := context.WithTimeout(context.Background(), time.Second)
defer cancel()
for {
if snap := s.Load(); snap != nil {
if snap.X != 1 || snap.Y != 2 {
t.Errorf("bad snapshot %+v", snap)
}
wg.Wait()
return
}
select {
case <-ctx.Done():
t.Fatal("never observed publication")
default:
}
}
}
// Test that sync.Once publishes its result.
func TestOncePublishes(t *testing.T) {
var once sync.Once
var x int
var wg sync.WaitGroup
for i := 0; i < 10; i++ {
wg.Add(1)
go func() {
defer wg.Done()
once.Do(func() { x = 42 })
if x != 42 {
t.Errorf("got %d, want 42", x)
}
}()
}
wg.Wait()
}
Run with go test -race. The first test confirms ordering; the second confirms sync.Once publishes its writes to all callers.
Diagrams & Visual Aids¶
Producer goroutine Consumer goroutine
================== ==================
cfg.URL = "..." for {
cfg.Port = 8080 p = atomic.Load(&ptr) <-- ACQUIRE
if p != nil { break }
}
fmt.Println(p.URL)
fmt.Println(p.Port)
atomic.Store(&ptr, &cfg) <-- RELEASE ^
|
writes above the release -----+-- become visible after the acquire
observes the released ptr
The Three Layers of Reordering
==============================
Source code order Compiler-emitted order CPU execution order
----------------- ---------------------- -------------------
cfg = build() mov [cfg], rax (may issue mov [ready]
ready = 1 mov [ready], 1 to store buffer first,
draining cfg later)
A release barrier prevents reordering in ALL THREE LAYERS:
- source: programmer must write the release after the writes (your job)
- compiler: atomic.Store is a compiler barrier (Go's job)
- CPU: atomic.Store emits the right fence for the architecture (runtime's job)
Happens-Before Graph for sync.Once
===================================
Goroutine A Goroutine B Goroutine C
----------- ----------- -----------
once.Do(init) ───────────┐
(winner runs init) │
│ │
│ writes to cfg │ "happens-before"
│ │ the return of any
▼ │ other once.Do call
return from Do │
▼
once.Do(init) ────────────┐
(loser, blocks until │
winner finishes) │
│ ▼
▼ once.Do(init)
read cfg (also blocks,
(safe) then sees cfg)
What the Race Detector Tracks
==============================
Goroutine A Goroutine B
----------- -----------
vc.A = 1 vc.A = 0
vc.B = 0 vc.B = 1
x = 5 <-- write tagged with vc.A=1, vc.B=0
(no sync edge between A and B) y = x <-- read tagged with vc.A=0, vc.B=1
^ ^
+-- the read's vector clock does NOT >= write's vector clock,
so the detector flags a race.
After a release-acquire pair on the same atomic, B's vector clock
absorbs A's clock, and subsequent reads are correctly ordered.
Extra Coding Patterns¶
Pattern: read-mostly snapshot with periodic refresh¶
package rates
import (
"sync/atomic"
"time"
)
type Rates struct {
USD, EUR, GBP float64
}
var current atomic.Pointer[Rates]
func init() {
go refreshLoop()
}
func refreshLoop() {
for {
r := fetchFromAPI()
current.Store(r)
time.Sleep(time.Hour)
}
}
func Get() *Rates { return current.Load() }
func fetchFromAPI() *Rates { /* ... */ return &Rates{} }
Every consumer calls rates.Get() and uses the returned pointer. The refresh goroutine builds a new struct and stores its pointer. Old snapshots are garbage-collected when no reader holds them. There is no lock; reads are wait-free.
Pattern: lazy singleton wrapper¶
package logger
import "sync"
type Logger struct{ /* ... */ }
var (
instance *Logger
once sync.Once
)
func Default() *Logger {
once.Do(func() { instance = newLogger() })
return instance
}
The first call to Default() runs newLogger. Every other call (including concurrent ones) returns the same *Logger. No locking on the fast path after the first call (modulo a fast atomic check inside sync.Once).
Pattern: one-time error capture¶
package pipeline
import "sync"
type Pipeline struct {
once sync.Once
err error
}
// Fail records the first error and discards subsequent ones.
func (p *Pipeline) Fail(e error) {
p.once.Do(func() { p.err = e })
}
// Err returns the first error, or nil.
func (p *Pipeline) Err() error {
return p.err // plain read: safe only under the contract described below
}
Wait: p.err is read without an atomic, and p.once.Do may have run on another goroutine. Is this safe?
Only under a documented contract such as "call Err() only after the pipeline has finished," where finishing goes through a synchronization point (for example a WaitGroup.Wait or a channel close) that orders every Fail call before the read. If you allow concurrent Fail and Err, you need an atomic pointer or a mutex.
A safer version:
type Pipeline struct {
err atomic.Pointer[error]
}
func (p *Pipeline) Fail(e error) {
p.err.CompareAndSwap(nil, &e) // only the first one wins
}
func (p *Pipeline) Err() error {
if e := p.err.Load(); e != nil {
return *e
}
return nil
}
Now Fail and Err can race freely. CAS publishes the error pointer; readers acquire it.
Pattern: build-then-publish for slice/map updates¶
type LookupTable struct {
data atomic.Pointer[map[string]int]
}
func (t *LookupTable) Replace(items map[string]int) {
// Make an independent copy so future Replace calls don't mutate
// a map a reader might be iterating.
cp := make(map[string]int, len(items))
for k, v := range items {
cp[k] = v
}
t.data.Store(&cp)
}
func (t *LookupTable) Lookup(k string) (int, bool) {
m := t.data.Load()
if m == nil {
return 0, false
}
v, ok := (*m)[k]
return v, ok
}
Readers acquire a pointer to the current map and read it freely — even iterating. The next Replace publishes a new map; old readers continue with the old one. No locking, no contention.
This is the read-copy-update (RCU) idiom, simplified.
More Common Misconceptions¶
- "go run -race is too slow for production-like loads, so I'll skip it." Use -race in unit tests and CI. You don't need it on production traffic, but you do need it on every PR.
- "atomic.Value is faster than sync.Mutex." Sometimes, sometimes not. atomic.Value is wait-free for reads, but Store takes an interface value (which may allocate) and checks type consistency on every call.
- "If I print a value before a sync.Mutex.Lock, the print is part of the critical section." No. Only operations between Lock and Unlock are protected. Prints before the lock can race with concurrent writers.
- "Memory ordering is about cache coherence." Coherence keeps caches eventually-consistent. Ordering is about whether you observe writes in the right sequence. Coherence is a necessary but insufficient guarantee.
More Tricky Points¶
Spurious failures of CompareAndSwap¶
On weakly-ordered architectures, the underlying hardware instruction (LL/SC on ARM and POWER) can fail "spuriously" — failing even when the values match — due to cache events. Go's atomic.CompareAndSwap does not expose spurious failures; it retries internally until either the swap succeeds or a real mismatch is observed. (This is implementation-defined and may change, but it's the current behavior.)
Aligned vs unaligned atomic access¶
On most 64-bit platforms, an aligned 64-bit access is atomic by default in hardware. Go relies on this. On 32-bit ARM with older Go versions, accessing a misaligned 64-bit field caused crashes. The fix in modern Go is to use atomic.Int64/atomic.Uint64 struct wrappers, which guarantee alignment.
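A minimal sketch of the wrapper-type fix, assuming Go 1.19 or later. Stats, Hit, and hammer are hypothetical names; the one-byte field in front of the counter is there to show a layout that could misalign a plain int64 on 32-bit platforms.

```go
package main

import (
	"sync"
	"sync/atomic"
)

// Stats places a 64-bit counter after a 1-byte field. With a plain int64
// and the function-style atomics this layout could be misaligned on 32-bit
// platforms; atomic.Int64 carries its own alignment guarantee.
type Stats struct {
	enabled bool
	hits    atomic.Int64
}

// Hit increments the counter and returns the new value.
func (s *Stats) Hit() int64 { return s.hits.Add(1) }

// hammer calls Hit from several goroutines and returns the final count.
func hammer(s *Stats, goroutines, perG int) int64 {
	var wg sync.WaitGroup
	for g := 0; g < goroutines; g++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := 0; i < perG; i++ {
				s.Hit()
			}
		}()
	}
	wg.Wait()
	return s.hits.Load()
}
```

The field order no longer matters for correctness, which removes the old advice to keep 64-bit atomics first in the struct.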
Atomicity of interface{} values¶
An interface{} is two words (type pointer + data pointer). A plain assignment i = v is not atomic. To atomically store an interface{}, wrap it in a *Holder and use atomic.Pointer[Holder], or use atomic.Value, which is built for this purpose and requires every stored value to have the same concrete type.
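A minimal sketch of the holder approach, using an error as the interface value. holder, lastErr, setErr, getErr, and errTimeout are hypothetical names introduced for illustration.

```go
package main

import (
	"errors"
	"sync/atomic"
)

// errTimeout is a hypothetical sentinel error used as the published value.
var errTimeout = errors.New("timeout")

// holder wraps an interface value so that a single pointer store publishes
// both of its words together.
type holder struct{ err error }

var lastErr atomic.Pointer[holder]

// setErr publishes e by storing a pointer to a freshly built holder.
func setErr(e error) { lastErr.Store(&holder{err: e}) }

// getErr returns the most recently published error, or nil if none.
func getErr() error {
	if h := lastErr.Load(); h != nil {
		return h.err
	}
	return nil
}
```

Because the holder is built before Store and never modified afterwards, a reader can never observe a type word from one value paired with a data word from another.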
Read amplification with atomic.Pointer¶
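Every Load returns a pointer. If consumers chain t.data.Load().Field, the load happens on every call, and two consecutive calls may return two different snapshots. For hot paths, or whenever several reads must agree with each other, load once into a local variable (m := t.data.Load()) and read everything through it.
This also makes the snapshot semantics explicit: two values v1 and v2 read through the same local come from the same map.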
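A minimal sketch of the load-once idiom, reusing the LookupTable shape from the build-then-publish pattern. The Pair method is a hypothetical addition for illustration.

```go
package main

import "sync/atomic"

// LookupTable follows the build-then-publish pattern: the map is never
// mutated after it has been stored.
type LookupTable struct {
	data atomic.Pointer[map[string]int]
}

// Pair reads two keys from one snapshot.
func (t *LookupTable) Pair(k1, k2 string) (v1, v2 int) {
	m := t.data.Load() // one Load, one snapshot
	if m == nil {
		return 0, 0
	}
	v1 = (*m)[k1]
	v2 = (*m)[k2]
	return v1, v2
}
```

Calling t.data.Load() separately for each key would still be race-free, but the two values could then come from different generations of the table.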
Extra Self-Check Questions¶
- What is the difference between atomic and ordered? In Go, do you ever need to think about them separately?
- Why is var done bool; for !done { } insufficient for waiting on a producer goroutine?
- Name three Go primitives that establish happens-before.
- What does close(ch) synchronize with?
- Can you read the same field with both atomic.LoadInt32 and a plain access without it being a race?
- What does sync.Once.Do do if the function panics?
- What's the difference between atomic.Value and atomic.Pointer[T]?
- If you store a *T atomically, then later mutate *T, is that safe?
- Why might a spinning loop on an atomic flag be a bad idea on a single-core machine?
- How does the race detector know two accesses raced?
(Answers are interspersed in the sections above. If unsure on any, re-read the relevant section before moving to middle.md.)
Why This Matters in Practice¶
At first the topic feels theoretical: "I write Go, and the race detector tells me what's wrong. Why should I memorise acquire/release?"
The answer is that the race detector tells you that something is wrong, not why. The why is "I didn't establish a happens-before edge between the writer and the reader." Knowing the why lets you:
- Pick the right primitive (atomic, mutex, channel, once) for the situation.
- Read other people's code and recognise the synchronization pattern.
- Debug subtle production bugs that the race detector missed because the timing was lucky during tests.
- Reason about performance: which primitive costs less.
It also makes the next level of memory-ordering content (middle, senior, professional) feel natural rather than alien. By the time you reach the professional level, you will be implementing wait-free queues — and you will look back at this junior file fondly, because that is where the foundations were laid.
A Brief History (Optional)¶
Memory ordering became a public concern in 1979 with Leslie Lamport's paper "How to Make a Multiprocessor Computer That Correctly Executes Multiprocess Programs," which defined sequential consistency. By the 1990s, CPU designers realised SC was too expensive on weakly-ordered hardware and started shipping CPUs with relaxed memory models. The Alpha was infamous for being the most aggressive; ARM and POWER followed.
Java was the first mainstream language to formalise a memory model (Java 5, 2004). C++ added one in C++11. Go's memory model was published in 2009 with a major clarification in 2022. The modern Go memory model is more permissive than Java's — Go permits compilers to optimise more aggressively — but the primitives are simpler: there is no volatile, no synchronized, just channels, mutexes, atomics, and sync.Once.
The acquire/release vocabulary dates from the release-consistency work of Gharachorloo et al. (1990). The closely related DRF-SC (data-race-free implies sequentially consistent) model was proposed by Adve and Hill in the same year, and it's the basis of every modern language memory model, including Go's.
Connection to the Rest of Go¶
Every Go concurrency primitive provides acquire/release (or stronger) somewhere:
- chan: send is release, receive is acquire.
- sync.Mutex / sync.RWMutex: Lock is acquire, Unlock is release.
- sync.Once: wraps both around the user function.
- sync.WaitGroup: Wait is acquire, Done is release.
- sync.Cond: Signal/Broadcast publish the underlying state; Wait acquires it.
- sync/atomic: every operation is acq-rel (and seq-cst).
- context.Context: Done channel close is a release.
This unity is why Go feels coherent: there is one synchronization model, exposed at different abstraction levels.
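sync.WaitGroup is the one primitive in this list that the earlier sections do not demonstrate as a publication mechanism. A minimal sketch; squares and out are hypothetical names.

```go
package main

import "sync"

// squares fills one slot per goroutine with plain writes and reads them
// all after Wait.
func squares(n int) []int {
	out := make([]int, n)
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done() // release: publishes the write to out[i]
			out[i] = i * i
		}(i)
	}
	wg.Wait() // acquire: every out[i] write is visible from here on
	return out
}
```

No atomics or mutexes touch out; the Done/Wait pairs alone order the plain writes before the read.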
When to Move On¶
You're ready for middle.md when:
- You can write a correct lazy-init using sync.Once without checking the docs.
- You can publish a struct pointer with atomic.Pointer[T] and explain why it's safe.
- You can describe what mu.Unlock() does besides releasing the lock.
- You can articulate why the race detector flags for !done { }.
If those feel solid, proceed.
Appendix A: A Library of Worked Examples¶
The next ten subsections walk through realistic micro-services or library patterns where acquire/release matters. Each example is small enough to run, large enough to illustrate one decision.
A.1 — A connection counter¶
package conn
import "sync/atomic"
var active atomic.Int64
func Inc() int64 { return active.Add(1) }
func Dec() int64 { return active.Add(-1) }
func Now() int64 { return active.Load() }
Three functions, three atomics. Add is acq-rel — it acts as both a release of any earlier writes on the calling goroutine and an acquire of any earlier writes on goroutines whose Add it observed.
But Inc/Dec don't publish anything except the counter itself. The acq-rel happens incidentally. This is the most common case for atomics: simple counters, where the only "publication" is the count.
A.2 — A safer atomic counter wrapper¶
package counter
import "sync/atomic"
type Counter struct {
v atomic.Int64
}
func (c *Counter) Inc() { c.v.Add(1) }
func (c *Counter) Dec() { c.v.Add(-1) }
func (c *Counter) Load() int64 { return c.v.Load() }
Wrapping in a struct prevents callers from doing arithmetic on the raw atomic and accidentally introducing a race. The type system is your friend: Counter is a sealed abstraction over an atomic.Int64.
A.3 — A first-error sink¶
package sink
import "sync/atomic"
type First struct {
err atomic.Pointer[error]
}
// Record stores e only if no error has been recorded yet.
func (f *First) Record(e error) {
if e == nil {
return
}
f.err.CompareAndSwap(nil, &e)
}
// Err returns the first recorded error, or nil.
func (f *First) Err() error {
if p := f.err.Load(); p != nil {
return *p
}
return nil
}
CompareAndSwap(nil, &e) is the linchpin. It atomically tests "is the field nil?" and, if so, stores a pointer to e. The store is a release; subsequent Err calls acquire. The result: at most one error wins, all readers see the winner.
A.4 — A read-only flag exposed safely¶
package feature
import "sync/atomic"
var enabled atomic.Bool
func Enable() { enabled.Store(true) }
func Disable() { enabled.Store(false) }
func IsEnabled() bool {
return enabled.Load()
}
atomic.Bool was added in Go 1.19. Before that you'd use atomic.Int32 with 0/1. The semantics are identical: store is release, load is acquire.
A.5 — A late-binding constructor¶
package service
import "sync"
var (
once sync.Once
instance *Service
)
func Default() *Service {
once.Do(func() {
instance = &Service{
client: newClient(),
cache: newCache(),
log: newLogger(),
}
})
return instance
}
sync.Once is essentially an atomic "done" flag + sync.Mutex + the closure invocation. The atomic flag gives later callers a fast acquire-load path, and the mutex serializes the callers that arrive before initialization has finished; the closure runs exactly once; every later caller acquires the published instance.
A.6 — A copy-on-write configuration¶
package cfg
import "sync/atomic"
type Config struct {
MaxConn int
Timeout int
Hosts []string
}
var current atomic.Pointer[Config]
func Set(c *Config) { current.Store(c) }
func Get() *Config { return current.Load() }
// Update applies fn to a copy of the current config and stores the result.
func Update(fn func(*Config)) {
for {
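old := current.Load()
var cp Config // stays the zero Config if nothing has been published yet
if old != nil {
cp = *old // shallow copy
// copy slice contents so the new snapshot does not share Hosts with the old one
cp.Hosts = append([]string(nil), old.Hosts...)
}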
fn(&cp)
if current.CompareAndSwap(old, &cp) {
return
}
}
}
Update reads the current config, copies it, mutates the copy, and CASes. If a concurrent Update won, we lose the CAS and retry. This is optimistic concurrency control — a building block of lock-free algorithms.
A.7 — A capture-and-process worker¶
package worker
import "sync/atomic"
type Job struct {
ID int
Data []byte
}
type Worker struct {
current atomic.Pointer[Job]
}
func (w *Worker) Submit(j *Job) bool {
return w.current.CompareAndSwap(nil, j)
}
func (w *Worker) Take() *Job {
for {
j := w.current.Load()
if j == nil {
return nil
}
if w.current.CompareAndSwap(j, nil) {
return j
}
}
}
Submit posts a job only if the slot is empty (CAS against nil). Take reads, then CASes to clear the slot. Submit is a single CAS and never loops; Take retries if the slot changes between its Load and its CAS, so it is lock-free rather than wait-free.
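If the retry loop in Take is unwanted, one alternative is to empty the slot with a single atomic exchange. This is a sketch of a variant, not the version shown above; it uses the Swap method of atomic.Pointer.

```go
package main

import "sync/atomic"

type Job struct {
	ID   int
	Data []byte
}

type Worker struct {
	current atomic.Pointer[Job]
}

// Submit posts j only if the slot is empty.
func (w *Worker) Submit(j *Job) bool {
	return w.current.CompareAndSwap(nil, j)
}

// Take empties the slot in one atomic step and returns whatever was there,
// which may be nil.
func (w *Worker) Take() *Job {
	return w.current.Swap(nil)
}
```

With Swap, both Submit and Take complete in a bounded number of steps regardless of contention.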
A.8 — A "ready" gate¶
package gate
import (
"context"
"sync"
)
type Gate struct {
once sync.Once
ch chan struct{}
}
func New() *Gate {
return &Gate{ch: make(chan struct{})}
}
func (g *Gate) Open() {
g.once.Do(func() { close(g.ch) })
}
func (g *Gate) Wait(ctx context.Context) error {
select {
case <-g.ch:
return nil
case <-ctx.Done():
return ctx.Err()
}
}
A Gate is a one-shot synchronization primitive. The producer calls Open, and the close of the channel is the release. All waiters are unblocked, and each waiter's receive is its own acquire, so every one of them sees whatever the producer wrote before Open.
A.9 — A latched value¶
package latch
import (
"context"
"sync"
"sync/atomic"
)
type Latch[T any] struct {
once sync.Once
val atomic.Pointer[T]
ch chan struct{}
}
func New[T any]() *Latch[T] {
return &Latch[T]{ch: make(chan struct{})}
}
func (l *Latch[T]) Set(v T) {
l.once.Do(func() {
l.val.Store(&v)
close(l.ch)
})
}
func (l *Latch[T]) Get(ctx context.Context) (T, error) {
select {
case <-l.ch:
return *l.val.Load(), nil
case <-ctx.Done():
var zero T
return zero, ctx.Err()
}
}
A Latch[T] is "future-like" — write-once, read-many. sync.Once guards Set. The value is stored before the channel is closed; the close is the release; readers acquire by receiving from the (closed) channel.
A.10 — Combining sync.Map with publication¶
package registry
import "sync"
var registry sync.Map // map[string]*Service
func Register(name string, s *Service) {
registry.Store(name, s)
}
func Lookup(name string) (*Service, bool) {
v, ok := registry.Load(name)
if !ok {
return nil, false
}
return v.(*Service), true
}
sync.Map.Store is a release; Load is an acquire. Every *Service stored is fully built before the store. Readers acquire and get a safely-published pointer.
sync.Map is optimised for read-mostly workloads with stable keys; it's slower than a sync.Mutex-protected map for write-heavy workloads. The acquire/release contract is the same either way.
Appendix B: Why Sequential Consistency Matters¶
Go's sync/atomic provides sequential consistency. Why is this more than acquire/release?
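Acquire/release orders memory through one location at a time. If goroutine 1 writes A and then B (with release on each) and goroutine 2 acquire-loads B and sees the new value, it is already guaranteed to see the new A as well, because the write to A precedes the release on B in program order. What pure acquire/release does not provide is a single order across different locations:
- Two goroutines that each store to their own flag and then load the other's flag may both read the old value (the store-buffering pattern).
- Two observers may disagree about the order in which stores by two independent writers took effect.
Both outcomes are forbidden under sequential consistency because there is a single global order of atomic operations that every goroutine agrees on. Under pure acquire/release, both are allowed.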
Sequential consistency makes reasoning easier: every goroutine sees one consistent timeline of atomic operations. The cost is a few extra fences on weakly-ordered hardware. Go made the trade-off in favor of programmer sanity.
Here's a concrete example where SC matters:
// Dekker-like flag synchronization (simplified)
var (
flag1 atomic.Bool
flag2 atomic.Bool
)
// Goroutine 1:
flag1.Store(true)
if !flag2.Load() {
// enter critical section
}
// Goroutine 2:
flag2.Store(true)
if !flag1.Load() {
// enter critical section
}
Under sequential consistency, at most one goroutine enters the critical section. Under pure acquire/release, both might enter — because there is no global order between flag1.Store and flag2.Store.
You don't need to write code like this in Go (use a mutex), but it's good to know your atomics are strong enough that it would work.
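To make the claim testable, here is a minimal sketch that runs the two-flag protocol many times and counts rounds in which both goroutines entered. The function name bothEntered is an assumption; because Go's atomics are sequentially consistent, the count should stay at zero.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// bothEntered runs the two-flag protocol `rounds` times and counts how often
// both goroutines entered the critical section in the same round.
func bothEntered(rounds int) int {
	count := 0
	for r := 0; r < rounds; r++ {
		var flag1, flag2 atomic.Bool
		var in1, in2 bool // each written by one goroutine, read after Wait
		var wg sync.WaitGroup
		wg.Add(2)
		go func() {
			defer wg.Done()
			flag1.Store(true)
			in1 = !flag2.Load()
		}()
		go func() {
			defer wg.Done()
			flag2.Store(true)
			in2 = !flag1.Load()
		}()
		wg.Wait()
		if in1 && in2 {
			count++
		}
	}
	return count
}

func main() {
	fmt.Println(bothEntered(10000)) // 0
}
```

Both goroutines may stay out in the same round (each saw the other's flag), but they never both get in.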
Appendix C: Reading the Race Detector Output¶
A typical race detector report:
==================
WARNING: DATA RACE
Read at 0x00c0000200c8 by goroutine 8:
main.consumer()
/tmp/main.go:30 +0x44
main.main.func2()
/tmp/main.go:42 +0x58
Previous write at 0x00c0000200c8 by goroutine 7:
main.producer()
/tmp/main.go:22 +0x44
main.main.func1()
/tmp/main.go:38 +0x58
Goroutine 8 (running) created at:
main.main()
/tmp/main.go:40 +0x125
Goroutine 7 (finished) created at:
main.main()
/tmp/main.go:36 +0x9d
==================
What to read:
- Read at … by goroutine 8 — where the unsynchronized read happened.
- Previous write at … by goroutine 7 — the write the read is racing with.
- Goroutine N created at — the stack where each goroutine was spawned, useful when racing accesses come from anonymous funcs.
The fix is always the same: add a happens-before edge between the write and the read. Usually that means switching one or both to sync/atomic, wrapping with a mutex, or routing through a channel.
If you fix one race and the detector reports another, you have not finished — every reported race is a real race. Fix them all before merging.
Appendix D: Performance Numbers (Rough, 2025-era CPUs)¶
| Operation | Cost |
|---|---|
| Plain int write | ~0.5 ns |
| Plain int read | ~0.5 ns |
| atomic.Int32.Load (uncontended) | ~1–2 ns |
| atomic.Int32.Store (uncontended) | ~5–10 ns |
| atomic.Int32.Add (uncontended) | ~5–10 ns |
| atomic.CompareAndSwap (uncontended) | ~5–15 ns |
| sync.Mutex.Lock+Unlock (uncontended) | ~10–25 ns |
| sync.RWMutex.RLock+RUnlock (uncontended) | ~15–30 ns |
| sync.Once.Do (after first call) | ~1–2 ns |
| Channel send/recv (unbuffered) | ~100–300 ns |
| Channel send/recv (buffered, uncontended) | ~30–80 ns |
These numbers vary by CPU, Go version, and contention level. Treat them as orders of magnitude, not promises. The lesson: an uncontended atomic is several times cheaper than an uncontended mutex, which is in turn several times cheaper than a channel operation. Channels give you far more in exchange (blocking, buffering, close as a broadcast), so pick the right tool, not the cheapest.
Appendix E: Cross-Reference to the Rest of the Roadmap¶
- For more on goroutines: 01-goroutines/.
- For more on channels: 04-channels/.
- For mutexes: 06-sync-mutex/.
- For sync.WaitGroup: 07-sync-waitgroup/.
- For sync.Once: 08-sync-once/.
- For sync/atomic deep dive: 09-sync-atomic/.
- For the formal memory model: 21-memory-model-formal/.
- For hardware fences (parent's first sibling): 22-memory-ordering-barriers/01-hardware-barriers/.
- For sequential consistency (next sibling): 22-memory-ordering-barriers/03-sequential-consistency/.
The acquire/release file you just read sits at the heart of this network. Almost every concurrency topic in Go has an "and here's why happens-before matters" sentence pointing back here.
Final Word for the Junior¶
If you take away one sentence: safe publication needs a release on the writer and an acquire on the reader, on the same synchronization location.
That sentence is the entire job. Everything else — sync.Once, atomic.Pointer, mutexes, channels — is just convenient packaging for that contract.
Run the race detector. Reach for sync.Once and atomic.Pointer[T]. Don't roll your own publication out of plain variables. You're ready.
Appendix F: Extra Worked Walkthroughs¶
F.1 — Diagnosing a stale-read bug¶
A teammate reports: "I store the user count in userCount and read it in the metrics handler. Sometimes the metric is 0 even though we have users." Reading the code:
var userCount int
func OnLogin() {
userCount++
}
func MetricsHandler(w http.ResponseWriter, _ *http.Request) {
fmt.Fprintf(w, "users=%d\n", userCount)
}
There are two bugs:
1. userCount++ is not atomic. Two concurrent logins lose updates.
2. The read in MetricsHandler is unsynchronized with the writes. Even if you fix #1, the reader might see a stale value indefinitely.
Fix:
var userCount atomic.Int64
func OnLogin() { userCount.Add(1) }
func MetricsHandler(w http.ResponseWriter, _ *http.Request) {
fmt.Fprintf(w, "users=%d\n", userCount.Load())
}
Now Add is acq-rel (no lost updates), Load is acquire (sees the latest value the runtime can offer). Note: there's no producer/consumer pair here — just shared mutation. The acq-rel comes for free.
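A quick way to convince yourself that no updates are lost is to hammer OnLogin from many goroutines. The helper simulateLogins is a hypothetical name used only for this sketch.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

var userCount atomic.Int64

func OnLogin() { userCount.Add(1) }

// simulateLogins calls OnLogin from n goroutines and returns how much the
// counter grew.
func simulateLogins(n int) int64 {
	before := userCount.Load()
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			OnLogin()
		}()
	}
	wg.Wait()
	return userCount.Load() - before
}

func main() {
	fmt.Println(simulateLogins(1000)) // 1000
}
```

With the plain int version, the same experiment typically prints a number below 1000 and fails under -race.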
F.2 — Diagnosing a flaky test¶
A test occasionally fails. The failing line is got: 0, want: 42. The setup:
func TestFlaky(t *testing.T) {
var result int
done := make(chan bool)
go func() {
result = 42
done <- true
}()
<-done
if result != 42 {
t.Errorf("got %d", result)
}
}
Wait — this looks correct. The send happens after result = 42; the receive happens before the read. There should be a happens-before edge.
And there is — Go's memory model guarantees that a send on a channel happens-before the receive completes. The test should not fail.
What's going on? In this case, the test is correct. If it's flaky, look elsewhere: maybe a different goroutine is mutating result, or maybe the test is timing-sensitive in a way we haven't noticed.
Run go test -race. If the detector is silent, the bug isn't a publication bug.
F.3 — Refactoring a mutex to atomics¶
A teammate writes:
var (
mu sync.RWMutex
flags map[string]bool
)
func IsEnabled(name string) bool {
mu.RLock()
defer mu.RUnlock()
return flags[name]
}
func Enable(name string) {
mu.Lock()
defer mu.Unlock()
if flags == nil {
flags = map[string]bool{}
}
flags[name] = true
}
Reads dominate (millions per second), writes are rare (once per minute). Can we eliminate the read lock?
Yes — replace the map with an atomic.Pointer[map[string]bool]:
var flags atomic.Pointer[map[string]bool]
func IsEnabled(name string) bool {
m := flags.Load()
if m == nil {
return false
}
return (*m)[name]
}
func Enable(name string) {
for {
old := flags.Load()
cp := map[string]bool{}
if old != nil {
for k, v := range *old {
cp[k] = v
}
}
cp[name] = true
if flags.CompareAndSwap(old, &cp) {
return
}
}
}
Reads are now lock-free; writes still serialize (via the CAS retry loop). For our workload, that's a huge win.
Important: this only works because we treat the map as immutable after publication. We never mutate *m once a successful CompareAndSwap has published it. Every "update" allocates a new map.
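The sketch below exercises the copy-on-write version with concurrent writers. The flag names and the enableAll helper are made up for illustration; IsEnabled and Enable follow the listing above.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

var flags atomic.Pointer[map[string]bool]

func IsEnabled(name string) bool {
	m := flags.Load()
	if m == nil {
		return false
	}
	return (*m)[name]
}

func Enable(name string) {
	for {
		old := flags.Load()
		cp := make(map[string]bool) // fresh map on every attempt
		if old != nil {
			for k, v := range *old {
				cp[k] = v
			}
		}
		cp[name] = true
		if flags.CompareAndSwap(old, &cp) {
			return // published; cp is never touched again
		}
	}
}

// enableAll enables n hypothetical flags concurrently and counts how many
// are visible afterwards.
func enableAll(n int) int {
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			Enable(fmt.Sprintf("feature-%d", i))
		}(i)
	}
	wg.Wait()

	on := 0
	for i := 0; i < n; i++ {
		if IsEnabled(fmt.Sprintf("feature-%d", i)) {
			on++
		}
	}
	return on
}

func main() {
	fmt.Println(enableAll(50)) // 50
}
```

A writer that loses the CAS simply rebuilds its copy from the newer map, so no concurrent Enable is dropped.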
F.4 — Why a single goroutine can still need atomics¶
Surprisingly, code that looks single-goroutine can still need atomics when something outside the loop reads the same variable, such as a goroutine that handles signals delivered through os/signal:
var counter int32
func main() {
// Signal handler set up elsewhere may read `counter`.
for i := 0; i < 1e6; i++ {
atomic.AddInt32(&counter, 1)
}
}
Without the atomic, the handler's read is a data race and may return a stale value. With the atomic, it sees a consistent count. This case is rare but real.
Appendix G: A Glossary of "Synchronizes-With"¶
You'll see the phrase "synchronizes-with" in the Go memory model and other specs. Definitions:
- A synchronizes-with relation is an edge between a release and an acquire on the same memory location, where the acquire observed the released value.
- Synchronizes-with edges contribute to the happens-before order: if A synchronizes-with B, then A happens-before B.
- Multiple synchronizes-with edges compose — if A synchronizes-with B and B happens-before C, then A happens-before C.
This composition is why mutexes work: each Unlock and the next Lock of the same mutex form a synchronizes-with edge, and the chain of those edges through a sequence of critical sections orders every critical section after the ones before it.
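A minimal sketch of that chain: many goroutines update a plain int, and the only synchronization is one mutex. The function name chainedIncrements is an assumption for illustration.

```go
package main

import (
	"fmt"
	"sync"
)

// chainedIncrements has n goroutines each add 1 to a plain int under one
// mutex. Each Unlock synchronizes with the next Lock, so every goroutine
// sees the total left by the previous critical section.
func chainedIncrements(n int) int {
	var mu sync.Mutex
	total := 0 // plain variable; only touched while holding mu
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			mu.Lock()
			total++
			mu.Unlock()
		}()
	}
	wg.Wait()
	return total
}

func main() {
	fmt.Println(chainedIncrements(1000)) // 1000
}
```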
Appendix H: Practice Exercises (Pointer to tasks.md)¶
If you want to test what you learned, jump to tasks.md in this same folder. The exercises there start with "publish an integer flag" and work up to "implement a one-shot promise type." Solve them on paper, then in code, then with -race. You'll know you understood when the detector is silent and your code is short.
Appendix I: A Note on Cross-Language Comparisons¶
If you're coming from C++, Java, Rust, or C#:
- C++ has explicit memory_order_acquire, memory_order_release, memory_order_seq_cst. Go always uses seq-cst for atomics. You can't pick relaxed in Go.
- Java has volatile (sequentially consistent, like Go's atomics) and synchronized (which provides full barriers). Go's sync/atomic is closer to AtomicX in java.util.concurrent.atomic.
- Rust has Ordering::Acquire, Ordering::Release, Ordering::SeqCst. Same as C++. Rust forces you to think about ordering explicitly.
- C# has Volatile.Read/Volatile.Write (acq/rel) and Interlocked (seq-cst). Closer to Go.
Go intentionally hides the choice. Most Go programmers never need to think "do I want acquire or seq-cst here?" — they get seq-cst by default. This makes Go code simpler but on weakly-ordered hardware (ARM, RISC-V) slightly slower in tight atomic loops.
Appendix J: When You Cannot Use Atomics¶
Some Go code intentionally avoids sync/atomic:
- Code that runs in go:nosplit or go:nowritebarrier contexts (deep in the runtime). Atomics there are calls into special runtime functions.
- Code that must be reentrant from signal handlers uses very limited atomic primitives.
- Cgo-imported types may not respect Go's memory model.
For 99.9% of Go code, none of this matters. Use sync and sync/atomic freely.
Appendix K: Recapping the Six Primitives¶
In one table:
| Primitive | Release | Acquire | When to use |
|---|---|---|---|
| sync.Mutex | Unlock | Lock | General-purpose mutual exclusion |
| sync.RWMutex | Unlock/RUnlock | Lock/RLock | Read-mostly with occasional writes |
| sync.Once | end of Do body | every Do return | One-shot lazy init |
| chan send | ch <- v | <-ch | Pass values, signal events |
| chan close | close(ch) | <-ch (closed) | Broadcast a one-shot event |
| sync/atomic | Store/CompareAndSwap | Load/CompareAndSwap | Single-word lock-free |
Memorise this table. Most concurrency code you read or write will be some combination of rows.
Appendix L: A Final Anti-Pattern Tour¶
Anti-pattern: "I'll just use runtime.Gosched() to make sure the goroutine runs first."¶
Gosched is a hint to the scheduler — it does not establish happens-before. Use a channel, a WaitGroup, or an atomic.
Anti-pattern: "I'll use time.Sleep to be sure."¶
Sleep does not synchronize either. It just gives the scheduler more time. On a slow machine or under load, 10 ms may not be enough; on a fast machine, it's overkill. Use proper synchronization.
Anti-pattern: "I'll lock around the read, but the write is fine without it."¶
go func() { x = 1 }() // race
mu.Lock()
fmt.Println(x) // race partner is the unlocked write
mu.Unlock()
Both sides of a shared variable must agree on the synchronization. One-sided locking is a race.
Anti-pattern: "atomic.Value for an immutable string."¶
This works, but atomic.Pointer[string] (or even a regular global) is simpler if the string is set once at startup. Atomics carry runtime cost; don't reach for them if you don't need concurrent updates.
Anti-pattern: passing a *T over a channel and then mutating *T in the sender.¶
ch := make(chan *Job)
go func() {
job := &Job{Status: "pending"}
ch <- job
job.Status = "done" // RACE with receiver reading Status
}()
j := <-ch
fmt.Println(j.Status)
The send publishes the pointer — but the sender still holds the same pointer. Mutating after the send races with the receiver. Either send a copy by value, or treat the sent pointer as no-longer-yours.
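One of the two fixes, sending a copy by value, can be sketched as follows. The handOff helper and the done channel are additions for the demonstration only.

```go
package main

import "fmt"

type Job struct{ Status string }

// handOff sends a Job by value, so the sender's later mutation touches only
// its own copy and cannot race with the receiver.
func handOff() string {
	ch := make(chan Job)
	done := make(chan struct{})
	go func() {
		job := Job{Status: "pending"}
		ch <- job           // the receiver gets its own copy
		job.Status = "done" // changes the sender's copy only
		close(done)
	}()
	j := <-ch
	<-done // wait so the sender's mutation has definitely happened
	return j.Status
}

func main() {
	fmt.Println(handOff()) // pending
}
```

Copying is cheap for small structs; for large ones, keep the pointer and adopt the rule that the sender never touches it after the send.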
Appendix M: Wrap-Up Quiz¶
1. Producer writes x = 5 then atomic.StoreInt32(&flag, 1). Consumer reads atomic.LoadInt32(&flag) == 1 then y := x. Is y == 5 guaranteed? Yes.
2. Producer writes atomic.StoreInt32(&flag, 1) then x = 5. Consumer reads atomic.LoadInt32(&flag) == 1 then y := x. Is y == 5 guaranteed? No.
3. Producer writes x = 5 then flag = 1 (plain int). Consumer checks flag == 1 then reads y := x. Is y == 5 guaranteed? No. (Data race.)
4. Producer writes x = 5 then sends ch <- struct{}{}. Consumer receives <-ch then reads y := x. Is y == 5 guaranteed? Yes.
5. Producer is inside once.Do(f). f writes x = 5. Consumer also calls once.Do(f) (after producer's Do returned on producer's goroutine) and then reads y := x. Is y == 5 guaranteed? Yes.
If you got 4/5 or better, you're solid for the middle level. If not, re-read the Core Concepts and Common Mistakes sections.
Appendix N: Deeper Examples Around Each Primitive¶
N.1 — sync.WaitGroup as a publication mechanism¶
sync.WaitGroup is usually thought of as "wait for N goroutines." But it's also a publication primitive:
var wg sync.WaitGroup
var result int
wg.Add(1)
go func() {
defer wg.Done()
result = compute()
}()
wg.Wait()
fmt.Println(result) // safe: Done is a release, Wait is an acquire
Every Done call is a release of the goroutine's writes. Wait is an acquire that returns only after all goroutines have called Done. Therefore, any write made before Done is visible after Wait returns.
This is heavily used in tests: spawn helpers, write to shared state, Wait, then assert.
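The same idea scales to several workers, as long as each one writes to its own slot. This is a small sketch with an assumed helper name, sumSquares.

```go
package main

import (
	"fmt"
	"sync"
)

// sumSquares computes i*i for i in [0, n) in separate goroutines. Each
// goroutine writes only its own slice element; Wait publishes all of them
// to the caller.
func sumSquares(n int) int {
	results := make([]int, n)
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			results[i] = i * i
		}(i)
	}
	wg.Wait()

	sum := 0
	for _, v := range results {
		sum += v
	}
	return sum
}

func main() {
	fmt.Println(sumSquares(5)) // 30
}
```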
N.2 — sync.Cond and condition variables¶
var (
mu sync.Mutex
cond = sync.NewCond(&mu)
ready bool
result int
)
// Producer:
go func() {
mu.Lock()
result = compute()
ready = true
cond.Broadcast()
mu.Unlock()
}()
// Consumer:
mu.Lock()
for !ready {
cond.Wait()
}
r := result
mu.Unlock()
fmt.Println(r)
cond.Wait releases the mutex and blocks until Signal or Broadcast is called, then re-acquires the mutex. The mutex provides the acquire/release; Cond just adds an efficient "wake me up" mechanism on top.
You rarely need sync.Cond in modern Go — channels usually express the same pattern more clearly.
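For comparison, the same one-shot handoff written with a channel might look like this. The compute function is a hypothetical stand-in that returns a fixed value.

```go
package main

import "fmt"

// compute is a hypothetical stand-in for real work.
func compute() int { return 42 }

// waitForResult publishes result by closing a channel instead of using
// a mutex, a condition variable, and a ready flag.
func waitForResult() int {
	var result int
	ready := make(chan struct{})
	go func() {
		result = compute()
		close(ready) // release
	}()
	<-ready // acquire
	return result
}

func main() {
	fmt.Println(waitForResult()) // 42
}
```

The channel replaces three pieces of state with one, which is why it is usually the clearer choice for one-shot signals.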
N.3 — context.Context and cancellation publication¶
ctx, cancel := context.WithCancel(parent)
go func() {
work(ctx)
}()
go func() {
time.Sleep(time.Second)
cancel() // closes ctx.Done()
}()
cancel() closes the internal channel exposed by ctx.Done(). Any goroutine doing <-ctx.Done() synchronizes with the cancel. Writes made before the cancel are visible to anyone who observed ctx.Err() != nil.
This is why returning early on ctx.Done() is safe — the cancellation is published with happens-before semantics.
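A minimal sketch of that guarantee: a plain variable written before cancel() is read by a goroutine after it observes ctx.Done(). The names reasonAfterCancel and reason are illustrative assumptions.

```go
package main

import (
	"context"
	"fmt"
)

// reasonAfterCancel writes a plain variable, cancels the context, and lets a
// goroutine read the variable after it has observed the cancellation.
func reasonAfterCancel() string {
	ctx, cancel := context.WithCancel(context.Background())
	var reason string // plain variable, no atomic
	out := make(chan string)

	go func() {
		<-ctx.Done()  // acquire: observes the close performed by cancel
		out <- reason // ordered after the write below
	}()

	reason = "shutting down"
	cancel() // release: closes the Done channel
	return <-out
}

func main() {
	fmt.Println(reasonAfterCancel()) // shutting down
}
```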
N.4 — A bytes-Buffer publication trap¶
var buf bytes.Buffer
go func() {
buf.WriteString("hello")
}()
go func() {
fmt.Println(buf.String())
}()
bytes.Buffer is not safe for concurrent use. Here one goroutine mutates the buffer's internal state while the other reads it, with nothing ordering the two. To share it safely, wrap it with a mutex:
type SafeBuf struct {
mu sync.Mutex
buf bytes.Buffer
}
func (s *SafeBuf) Write(p []byte) (int, error) {
s.mu.Lock()
defer s.mu.Unlock()
return s.buf.Write(p)
}
func (s *SafeBuf) String() string {
s.mu.Lock()
defer s.mu.Unlock()
return s.buf.String()
}
The mutex provides acquire/release around every operation. Without it, the goroutines race on the buffer's internal slice.
N.5 — A short publication is still a publication¶
It might be tempting to think "I'm only publishing one bool, the CPU can't tear that, why bother with atomic?"
The CPU may not tear, but:
- The compiler may hoist the read outside a loop.
- The compiler may dead-code-eliminate the write.
- The hardware may delay the write becoming visible indefinitely.
- The race detector will flag it.
All four are sufficient reasons to use atomic.Bool or sync.Mutex.
Appendix O: Read the Source¶
The actual implementations of sync and sync/atomic in the Go runtime are educational reading:
- src/sync/once.go: about 80 lines. Notice the fast path: an atomic Load of done, no mutex if already done.
- src/sync/mutex.go: about 250 lines. The unlocked fast path is a single CAS; the slow path handles contention.
- src/sync/atomic/doc.go: describes the package's contract. The actual implementations are in assembly per architecture.
- src/runtime/atomic_*.go and src/runtime/internal/atomic/: the per-architecture atomic operations.
Reading the assembly is instructive: on x86, an atomic.Store compiles to XCHG (which carries an implicit LOCK); on ARM, it needs explicit ordering instructions such as DMB (data memory barrier) or a store-release. The Go runtime hides these architectural details from you.
Appendix P: One Last Mental Model¶
Think of acquire/release as a one-way mirror between goroutines:
- Without synchronization, each goroutine sees its own private snapshot. The mirror is opaque — neither side knows what the other is doing.
- Release flips one side of the mirror to "transparent" — but only outgoing. The releasing goroutine publishes its writes.
- Acquire flips the other side to "transparent — incoming." The acquiring goroutine sees the published writes.
- Both sides must be flipped at the same location for the mirror to be useful.
Once you have this image, the rest of acquire/release theory follows naturally.
That concludes the junior file. Continue to middle.md for publication patterns at scale, or tasks.md to practice.
Appendix Q: Drill — Build a Safe Publish/Subscribe¶
Build a Latch[T any] type. Specification:
- Set(v T): call exactly once; second call is a no-op.
- Get(ctx context.Context) (T, error): block until Set has been called or ctx is canceled.
- Multiple goroutines may call Get concurrently; all should see the same v.
Solution outline:
type Latch[T any] struct {
done chan struct{}
val T
once sync.Once
}
func NewLatch[T any]() *Latch[T] {
return &Latch[T]{done: make(chan struct{})}
}
func (l *Latch[T]) Set(v T) {
l.once.Do(func() {
l.val = v
close(l.done)
})
}
func (l *Latch[T]) Get(ctx context.Context) (T, error) {
select {
case <-l.done:
return l.val, nil
case <-ctx.Done():
var zero T
return zero, ctx.Err()
}
}
Why is reading l.val safe without an atomic? Because:
- Set writes l.val before close(l.done).
- close is a release on l.done.
- <-l.done is an acquire.
- Therefore the write to l.val happens-before the read.
sync.Once ensures that even if many goroutines race to call Set, only one wins and one write happens. The channel close publishes the result.
This is the kind of code you should be able to write fluently after reading this file.
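To check the solution against the specification, a short driver could look like the sketch below. The helpers firstWins and timesOut are hypothetical names; the Latch itself matches the outline above.

```go
package main

import (
	"context"
	"fmt"
	"sync"
	"time"
)

type Latch[T any] struct {
	done chan struct{}
	val  T
	once sync.Once
}

func NewLatch[T any]() *Latch[T] {
	return &Latch[T]{done: make(chan struct{})}
}

func (l *Latch[T]) Set(v T) {
	l.once.Do(func() {
		l.val = v     // plain write
		close(l.done) // release
	})
}

func (l *Latch[T]) Get(ctx context.Context) (T, error) {
	select {
	case <-l.done: // acquire
		return l.val, nil
	case <-ctx.Done():
		var zero T
		return zero, ctx.Err()
	}
}

// firstWins sets the latch twice and returns what a reader observes.
func firstWins() string {
	l := NewLatch[string]()
	go func() {
		l.Set("first")
		l.Set("second") // no-op: sync.Once already ran
	}()
	v, _ := l.Get(context.Background())
	return v
}

// timesOut calls Get on a latch that is never set.
func timesOut() error {
	l := NewLatch[int]()
	ctx, cancel := context.WithTimeout(context.Background(), 10*time.Millisecond)
	defer cancel()
	_, err := l.Get(ctx)
	return err
}

func main() {
	fmt.Println(firstWins()) // first
	fmt.Println(timesOut())  // context deadline exceeded
}
```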
Appendix R: Drill — Fix a Broken Counter¶
Given this broken code:
Run with -race, observe failures, then fix.
Solutions, ranked by overhead:
1. atomic.Int64: best for simple counts.
2. sync.Mutex around both methods: best if you need additional invariants (e.g., counter never exceeds N).
3. Per-goroutine sharded counters with a "sum" function: best for write-heavy workloads on many cores.
Pick (1) unless you have a reason for (2) or (3).
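Option (3) is the least obvious, so here is a rough sketch of it. The type ShardedCounter, the shard count of 8, and the 64-byte cache-line padding are all assumptions for illustration, not measurements.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

const numShards = 8 // assumed; tune to the number of writers

// shard pads each counter out to 64 bytes (an assumed cache-line size) so
// that writers on different shards do not share a cache line.
type shard struct {
	n atomic.Int64
	_ [56]byte
}

// ShardedCounter spreads increments across several atomics.
type ShardedCounter struct {
	shards [numShards]shard
}

// Inc adds 1 to the shard chosen by the caller's worker index.
func (c *ShardedCounter) Inc(worker int) {
	c.shards[worker%numShards].n.Add(1)
}

// Sum adds up all shards. It is exact once writers have stopped and an
// approximation while they are still running.
func (c *ShardedCounter) Sum() int64 {
	var total int64
	for i := range c.shards {
		total += c.shards[i].n.Load()
	}
	return total
}

// countConcurrently runs `workers` goroutines that each increment
// `perWorker` times, then returns the sum.
func countConcurrently(workers, perWorker int) int64 {
	var c ShardedCounter
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func(w int) {
			defer wg.Done()
			for i := 0; i < perWorker; i++ {
				c.Inc(w)
			}
		}(w)
	}
	wg.Wait()
	return c.Sum()
}

func main() {
	fmt.Println(countConcurrently(16, 1000)) // 16000
}
```

The trade-off is that Sum is no longer a single atomic read, which is acceptable for metrics but not for invariants.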
Appendix S: Drill — Implement Lazy Init Three Ways¶
Implement a singleton that computes its value lazily, three ways:
Way 1: sync.Once.
var (
once sync.Once
instance *Service
)
func Get() *Service {
once.Do(func() { instance = newService() })
return instance
}
Way 2: sync.OnceValue (Go 1.21+).
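One way Way 2 could look is sketched here. The Service struct, the built counter, and the constructions helper are hypothetical additions that make the single construction observable; sync.OnceValue itself is the standard library function.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// Service is a hypothetical stand-in for the singleton's type.
type Service struct{ Name string }

var built atomic.Int32 // counts constructor runs, for demonstration only

func newService() *Service {
	built.Add(1)
	return &Service{Name: "svc"}
}

// Get runs newService at most once and hands the same pointer to every
// caller.
var Get = sync.OnceValue(newService)

// constructions calls Get from many goroutines and reports how many times
// the constructor actually ran.
func constructions(callers int) int32 {
	var wg sync.WaitGroup
	for i := 0; i < callers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			_ = Get()
		}()
	}
	wg.Wait()
	return built.Load()
}

func main() {
	fmt.Println(constructions(100)) // 1
}
```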
Way 3: Double-checked load with atomic.Pointer.
var instance atomic.Pointer[Service]
var mu sync.Mutex
func Get() *Service {
if s := instance.Load(); s != nil {
return s
}
mu.Lock()
defer mu.Unlock()
if s := instance.Load(); s != nil {
return s
}
s := newService()
instance.Store(s)
return s
}
Way 3 is the classic double-checked locking pattern, which only works because instance.Load and instance.Store are atomic with acquire/release semantics. Way 1 is recommended for clarity; way 3 is interesting because it shows the building blocks underneath sync.Once.
We'll explore why way 3 is correct (and the famous Java bug that motivated its careful design) in senior.md.
Appendix T: Wrap-Up Reading List¶
Before moving on, you should be able to point at any line of these references and explain it:
- The "Synchronization" section of https://go.dev/ref/mem.
- The doc comment of sync.Once in the stdlib source.
- The doc comment of atomic.Pointer in the stdlib source.
- The "Race Detector" section of the Go blog.
- Russ Cox's "Hardware Memory Models" (read it twice).
When those feel comfortable, jump to middle.md. There we start building real services.
End of junior.md. Total: a ~3000-line tour from "why publication is hard" to "how Go solves it for you, six different ways." Take your time, run the examples, and remember the one sentence:
Safe publication needs a release on the writer and an acquire on the reader, on the same synchronization location.
Appendix U: Real Bugs from the Wild¶
The following are simplified versions of real production bugs reported against open-source Go projects. Names and identifiers are changed.
U.1 — A leaked goroutine because of a missed publication¶
The worker never sees stopFlag = true because:
- The compiler hoists the read out of the loop (the variable looks loop-invariant).
- No happens-before edge between the write and the read.
Fix: use atomic.Bool for stopFlag. The worker checks stopFlag.Load() on every iteration; Stop calls stopFlag.Store(true).
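A minimal sketch of the fixed shape follows. The Worker type, its Start and Stop methods, and the sleep standing in for real work are hypothetical; only the atomic.Bool stopFlag and its Load/Store use come from the description above.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
	"time"
)

// Worker is a hypothetical type used to illustrate the fix.
type Worker struct {
	stopFlag atomic.Bool
	wg       sync.WaitGroup
}

// Start launches the loop. The flag is loaded atomically on every
// iteration, so the compiler cannot hoist the read out of the loop.
func (w *Worker) Start() {
	w.wg.Add(1)
	go func() {
		defer w.wg.Done()
		for !w.stopFlag.Load() {
			time.Sleep(time.Millisecond) // stand-in for real work
		}
	}()
}

// Stop publishes the flag and waits for the goroutine to exit.
func (w *Worker) Stop() {
	w.stopFlag.Store(true)
	w.wg.Wait()
}

// startAndStop returns only if the worker actually terminated.
func startAndStop() bool {
	w := &Worker{}
	w.Start()
	w.Stop()
	return true
}

func main() {
	fmt.Println(startAndStop()) // true
}
```

Waiting in Stop is what turns a leaked goroutine into a visible hang during testing, which is much easier to diagnose.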
This was a real bug that took a team three days to diagnose: under the race detector the flag happens to be re-loaded on every iteration and the code works fine, whereas in optimised production builds the read is hoisted out of the loop and the loop never terminates.
U.2 — A cache that lost updates under concurrent writers¶
type Cache struct {
data map[string]string
}
func (c *Cache) Set(k, v string) { c.data[k] = v }
func (c *Cache) Get(k string) string { return c.data[k] }
Two writers concurrently triggering map growth caused random crashes (the runtime aborts with "fatal error: concurrent map writes", which cannot be recovered). Even without a crash, read-then-write races on the same key caused lost updates.
Fix: protect with sync.RWMutex, or use sync.Map, or use an atomic.Pointer[map[string]string] with copy-on-write.
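The first of those fixes, sketched with an assumed constructor NewCache and an assumed test helper fillConcurrently:

```go
package main

import (
	"fmt"
	"sync"
)

// Cache guards its map with an RWMutex: many readers or one writer.
type Cache struct {
	mu   sync.RWMutex
	data map[string]string
}

// NewCache is an assumed constructor; it makes sure the map is never nil.
func NewCache() *Cache {
	return &Cache{data: make(map[string]string)}
}

func (c *Cache) Set(k, v string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.data[k] = v
}

func (c *Cache) Get(k string) string {
	c.mu.RLock()
	defer c.mu.RUnlock()
	return c.data[k]
}

// fillConcurrently writes n distinct keys from n goroutines and counts how
// many can be read back.
func fillConcurrently(n int) int {
	c := NewCache()
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			c.Set(fmt.Sprintf("k%d", i), "v")
		}(i)
	}
	wg.Wait()

	hits := 0
	for i := 0; i < n; i++ {
		if c.Get(fmt.Sprintf("k%d", i)) == "v" {
			hits++
		}
	}
	return hits
}

func main() {
	fmt.Println(fillConcurrently(200)) // 200
}
```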
U.3 — A "double-free" of resources¶
type Conn struct{ closed bool }
func (c *Conn) Close() {
if !c.closed {
c.closed = true
c.release()
}
}
Two goroutines calling Close concurrently both observed closed == false, both ran release(), double-freeing the underlying resource.
Fix: use atomic.Bool and CompareAndSwap:
type Conn struct{ closed atomic.Bool }
func (c *Conn) Close() {
if c.closed.CompareAndSwap(false, true) {
c.release()
}
}
The CAS atomically tests-and-sets; only one caller wins.
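To see that only one caller wins, the sketch below counts how often release runs. The releases counter and the closeConcurrently helper are additions for the demonstration; a real release would free the underlying resource instead.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

type Conn struct {
	closed   atomic.Bool
	releases atomic.Int32 // demonstration only: counts release calls
}

// release stands in for freeing the underlying resource.
func (c *Conn) release() { c.releases.Add(1) }

func (c *Conn) Close() {
	if c.closed.CompareAndSwap(false, true) {
		c.release()
	}
}

// closeConcurrently calls Close from n goroutines and reports how many
// times release ran.
func closeConcurrently(n int) int32 {
	c := &Conn{}
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			c.Close()
		}()
	}
	wg.Wait()
	return c.releases.Load()
}

func main() {
	fmt.Println(closeConcurrently(100)) // 1
}
```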
U.4 — Confused snapshots¶
type Stats struct{ Reqs, Errs int }
var s Stats
func IncReq() { s.Reqs++ }
func IncErr() { s.Errs++ }
func Snapshot() (int, int) { return s.Reqs, s.Errs }
The metrics endpoint reports nonsensical pairs (Reqs=100, Errs=2 in one read, Reqs=50, Errs=4 in the next — the counter went down?). Because reads are not atomic, and not coordinated with writes, snapshots can mix old and new values.
Fix: use two atomic.Int64s. Each operation is atomic, so every value a reader sees is one the counter really held, although Reqs and Errs may come from slightly different moments. Wrap both fields with a mutex if you need strict pair consistency.
U.5 — A logger initialised twice¶
var logger *Logger
func GetLogger() *Logger {
if logger == nil {
logger = newLogger()
}
return logger
}
Two goroutines called GetLogger at the same time; both saw nil; both created a logger; the loser's logger was orphaned, leaking the file handle it had opened.
Fix: sync.Once. (Or sync.OnceValue in modern Go.)
These five bugs are textbook examples of what acquire/release prevents. Memorise them — you'll see variants in every codebase.
That truly concludes the junior file. Go forth and publish safely.
Appendix V: Quick Reference Card (Printable)¶
+----------------------------------------------------------------+
| ACQUIRE/RELEASE QUICK REFERENCE — Go |
+----------------------------------------------------------------+
| Release operations: |
| atomic.StoreXxx(&v, val) |
| atomic.AddXxx(&v, delta) (also acquire) |
| atomic.CompareAndSwapXxx(...) (also acquire) |
| atomic.SwapXxx(...) (also acquire) |
| mu.Unlock() |
| rwmu.Unlock(), rwmu.RUnlock() |
| ch <- v |
| close(ch) |
| wg.Done() |
| |
| Acquire operations: |
| atomic.LoadXxx(&v) |
| (the RMW operations above) |
| mu.Lock() |
| rwmu.Lock(), rwmu.RLock() |
| <-ch |
| wg.Wait() |
| |
| Combined (both R and A): |
| sync.Once.Do(f) |
| |
| Tests: |
| go test -race ./... |
| |
| Idioms: |
| atomic.Pointer[T] — one-shot or read-mostly publication |
| sync.Once — lazy init |
| sync.OnceValue — lazy init returning a value (Go 1.21+) |
| close(chan) — broadcast a one-shot event |
+----------------------------------------------------------------+
Save this. Look at it the next time you write a publication.
End of file.
+ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ +
│ producer goroutine │
+ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ +
plain writes: │ cfg.URL = "..." │
│ cfg.Port = 8080 │
===============│======================│ <- release barrier
release op: │ atomic.Store(...) │
+ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ +
│ visible
v
+ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ +
acquire op: │ atomic.Load(...) │
===============│======================│ <- acquire barrier
plain reads: │ fmt.Println(p.URL) │
│ fmt.Println(p.Port) │
+ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ +
│ consumer goroutine │
+ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ +
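The horizontal double lines are the barriers, and each works in one direction: writes above the release cannot sink below it, and reads below the acquire cannot rise above it. Writes above the release are visible to anyone who observes the release. Reads below the acquire see those writes.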
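The diagram translates almost line for line into code. In this sketch the Config type, the example URL, and the publish, consume, and roundTrip helpers are illustrative assumptions; atomic.Pointer supplies the release and the acquire.

```go
package main

import (
	"fmt"
	"runtime"
	"sync/atomic"
)

// Config is a hypothetical payload matching the fields in the diagram.
type Config struct {
	URL  string
	Port int
}

var current atomic.Pointer[Config]

// publish performs the plain writes first, then the release.
func publish(url string, port int) {
	cfg := &Config{}
	cfg.URL = url      // plain write
	cfg.Port = port    // plain write
	current.Store(cfg) // release op
}

// consume performs the acquire first, then the plain reads.
func consume() (string, int, bool) {
	p := current.Load() // acquire op
	if p == nil {
		return "", 0, false // nothing published yet
	}
	return p.URL, p.Port, true // plain reads
}

// roundTrip publishes from one goroutine and polls from another until the
// value appears.
func roundTrip() string {
	go publish("https://example.test", 8080)
	for {
		if url, port, ok := consume(); ok {
			return fmt.Sprintf("%s:%d", url, port)
		}
		runtime.Gosched() // yield while polling; not used for ordering
	}
}

func main() {
	fmt.Println(roundTrip()) // https://example.test:8080
}
```

The consumer can never see a non-nil pointer with an empty URL or a zero Port, because both fields were written before the Store.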