sync.Once — Professional Level¶
We open src/sync/once.go and walk through the actual implementation: the done flag, the atomic fast path, the mutex slow path, the double-check pattern, and the memory model proof that justifies the happens-before guarantee. References are to Go 1.22 source. The algorithm has been stable since Go 1.0; only the spelling of the atomic primitives has evolved.
1. The struct¶
The whole of sync.Once is a dozen lines:
Two fields:
- `done` — a 32-bit atomic flag. `0` means "not yet run," `1` means "run completed." There are no other states.
- `m` — a `sync.Mutex` used on the slow path to serialise concurrent first-touch callers.
The struct is 12 bytes on 64-bit (atomic.Uint32 is 4 bytes, Mutex is 8); both fields are 4-byte aligned, so no padding is needed. It used to be declared as done uint32 (a plain integer) accessed via atomic.LoadUint32 and atomic.StoreUint32; the migration to atomic.Uint32 is a type-system improvement only — the underlying machine instructions are identical.
A historical detail: done is deliberately the first field in the struct. The fast path is inlined at every call site, and a field at offset zero produces more compact instructions on amd64 and 386 (and saves an offset calculation on other architectures), as the comment in the source explains. With atomic.Uint32, the 32-bit alignment requirement is additionally guaranteed by the type itself.
2. The Do method¶
The full implementation, with comments:
func (o *Once) Do(f func()) {
	// Fast path: atomic load of `done`.
	// If it is 1, init has run; return immediately.
	if o.done.Load() == 0 {
		// Slow path.
		o.doSlow(f)
	}
}
func (o *Once) doSlow(f func()) {
	o.m.Lock()
	defer o.m.Unlock()
	// Double-check after acquiring the mutex.
	// Another goroutine may have run f while we were blocked
	// on Lock(); if so, done is now 1 and we skip f.
	if o.done.Load() == 0 {
		defer o.done.Store(1)
		f()
	}
}
That is the entire algorithm. Twelve lines, no surprises. Two things make it tick: the fast path on the hot side, and the double-check inside the slow path.
2.1 The fast path¶
After the first successful call, done is 1. Every subsequent call:
- Performs an atomic load of `done`. On amd64 this compiles to a plain `MOV`; atomic loads on x86 need no fence instruction at all.
- Compares to zero. False — branch over.
- Returns.
Total: a nanosecond or two on modern hardware. No mutex, no allocation. This is why Once.Do is cheap to call on the request path: after the first time, it costs about as much as a single atomic load and compare.
2.2 The slow path¶
func (o *Once) doSlow(f func()) {
	o.m.Lock()
	defer o.m.Unlock()
	if o.done.Load() == 0 {
		defer o.done.Store(1)
		f()
	}
}
When the fast path observes done == 0, control falls through to doSlow. Three possibilities:
- First caller, no contention. Takes the mutex, sees `done == 0`, runs `f`; the deferred `o.done.Store(1)` flips the flag, mutex released.
- Late caller, init already finished. Takes the mutex (briefly contended with whoever held it last), sees `done == 1`, skips `f`, releases the mutex.
- Concurrent first-touch. Many goroutines all see `done == 0` on the fast path and pile into `doSlow`. They queue on the mutex. The first one in runs `f` and sets `done`. The rest take the mutex in turn, but their double-check `if o.done.Load() == 0` is now false, so they skip `f` and release.
The double-check is the only place that f is actually invoked. The fast path never calls f. The slow path only calls f if the post-mutex check confirms done is still zero.
2.3 Why the defer order matters¶
There are two deferred statements. They run in LIFO order:
- `o.done.Store(1)` runs first — right after `f()` returns.
- `o.m.Unlock()` runs second — after the store.
This means: the done flag is set while we still hold the mutex. Any concurrent caller waiting on the mutex will, after acquiring it, see done == 1 and skip f. There is no window where done is 1 but the mutex is unlocked and a racer could observe an inconsistent state.
Also crucial: o.done.Store(1) is deferred so it runs even if f panics. This is what gives Once its "panic counts as done" semantics. A naive implementation that wrote done = 1 only on the success path would re-run f after a panic — exactly what Once does not do.
3. Why the double-check?¶
The pattern of "load outside the lock, lock, load again, do work, store" is the double-checked locking pattern. It is famously broken in Java without volatile and broken in C++ without std::atomic. Why does it work in Go?
Two ingredients:
- The Go memory model. It defines `atomic.Uint32.Load` and `atomic.Uint32.Store` as sequentially consistent: a store in one goroutine happens-before any load in another goroutine that observes it.
- The placement of the store inside the mutex. The store is performed before unlocking, so its visibility is also anchored to the mutex release: any goroutine that subsequently acquires the mutex sees the store.
Without the atomic types, you would have data races on done. The fast path reads done without holding the mutex; if the store were a plain non-atomic write, the read would be a race. The Go race detector would flag this. By using atomic.Uint32, the read and write are synchronised at the language level, and the race detector is satisfied.
In other languages, the same pattern requires the equivalent of std::atomic<bool> with memory_order_acquire/memory_order_release. Go's atomics are sequentially consistent — stricter than acquire/release. On amd64 the load side is a plain MOV, while the store pays for an implicitly locked XCHG, a cost incurred only once here; on arm64 both sides use the acquire/release instructions. Either way, the pattern is provably correct.
4. The memory model proof¶
Why does Once provide happens-before from f's body to every later Do return?
Step by step:
- Goroutine A enters `doSlow`, acquires the mutex.
- A's `done.Load()` returns 0.
- A executes `f`, including all its writes (call this set $W$).
- A's deferred `o.done.Store(1)` runs — `done` is now 1.
- A's deferred `o.m.Unlock()` runs — mutex released.
- Goroutine B's fast path runs later in real time. B calls `done.Load()`, sees `1`.
- B returns from `Do`.
The happens-before chain:
- $W$ happens-before `o.done.Store(1)` (within A, program order).
- `o.done.Store(1)` happens-before `o.m.Unlock()` (within A, program order, plus the mutex release barrier).
- `o.m.Unlock()` happens-before any later `o.m.Lock()` (mutex contract).
- `o.done.Store(1)` happens-before any later `o.done.Load()` that returns `1` (atomic synchronisation).
Therefore $W$ happens-before any later Do that observes done == 1. Including B's, which observed the store on the fast path without taking the mutex. The synchronisation comes from the atomic store/load pair alone.
This is the formal justification of "lazy singletons just work" with sync.Once. You can write to a global inside Do, and any reader who calls Do afterwards (even via the fast path, even without going near the mutex) sees the write.
5. Why not atomic.Bool?¶
atomic.Bool exists since Go 1.19. Why does Once use atomic.Uint32?
- Historical inertia. The original implementation used `uint32` with `atomic.LoadUint32`; `atomic.Bool` did not exist until Go 1.19, and there has been no reason to make a purely cosmetic switch since.
- No observable difference. `atomic.Bool` is itself backed by a `uint32`, so on every architecture Go supports the two compile to the same instructions. The `Uint32` form was already proven.
- Future flexibility. `Uint32` leaves room for additional states (currently unused) without an API break; `Bool` is binary by definition.
There is no significance to the choice; treat them as equivalent for Once's purpose.
6. The Go 1.21 helpers — implementation¶
sync.OnceFunc, OnceValue, OnceValues are wrappers around a sync.Once. Their full source (simplified):
func OnceFunc(f func()) func() {
	var (
		once  Once
		valid bool
		p     any
	)
	g := func() {
		defer func() {
			p = recover() // capture f's panic value, if any
			if !valid {
				// Re-panic immediately so the first caller
				// gets a complete stack trace.
				panic(p)
			}
		}()
		f()
		f = nil      // release the closure for GC
		valid = true // set only if f returned normally
	}
	return func() {
		once.Do(g)
		if !valid {
			panic(p) // replay the captured panic for later callers
		}
	}
}
Annotations:
- The captured `f` is set to `nil` after the first call, so the closure no longer holds a reference to whatever `f` captured; the GC can reclaim it.
- `valid` tracks whether `f` returned normally. If it panicked, `p` holds the recovered value.
- The returned wrapper re-panics on every subsequent call if the first call panicked. This is the "loud" panic policy: every caller learns about the failure.
OnceValue and OnceValues are essentially the same, generic over the return types. They capture f's output into the closure and return it.
The key innovation is the panic-replay behaviour. Raw sync.Once silently no-ops after a panicking call; the 1.21 wrappers shout. Choose deliberately.
7. Cost analysis¶
Empirical numbers on a 2024-era amd64 CPU at 3.5 GHz:
| Operation | Time |
|---|---|
| `once.Do(f)` after first call (fast path) | ~0.7 ns |
| `once.Do(f)` first call, uncontended (slow path) | ~30 ns + cost of `f` |
| `once.Do(f)` contended first call | ~50–100 ns + cost of `f` per loser |
| `atomic.Uint32.Load` | ~0.5 ns |
| `Mutex.Lock` + `Unlock`, uncontended | ~10 ns |
The fast path is essentially a single atomic load. Calling Once.Do on the hot path of an HTTP request handler costs nothing meaningful.
For comparison, a plain function call is ~1 ns, so Once.Do on the fast path costs about the same as a regular function call.
8. False sharing concerns¶
sync.Once is 12 bytes, so five of them fit in a 64-byte cache line. If two Once values land on the same cache line and are touched by different goroutines, writes to one can invalidate the other's cache line. This is false sharing.
In practice, package-level Once declarations are not hot — the slow path runs at most once. The fast path is read-only. False sharing on Once is almost never a problem.
The exception: many Once values bundled in a slice or struct, where many goroutines simultaneously perform first-touch on different Onces. If this matters, pad the structs to cache-line boundaries:
This is exotic. Real programs do not need it.
9. Race detector hooks¶
The Go race detector (-race flag) instruments memory accesses with synchronisation tracking. For sync.Once:
- The atomic load/store of `done` are recognised as synchronisation operations.
- The mutex `Lock`/`Unlock` pairs are recognised.
- The happens-before edges described in the memory-model section are therefore encoded directly.
If you write:
the race detector flags it: the second goroutine reads x without calling Do, so it has no happens-before relation with the assignment. Add once.Do(...) to the read path (even though it is a no-op on the fast path) and the race goes away.
This is one of the practical reasons to always read shared state through the same Once.Do that wrote it: not because the runtime forces you, but because that is what makes the synchronisation explicit to the race detector and to human readers.
10. Comparison to other languages¶
| Language | "Run once" primitive |
|---|---|
| C++11 | std::call_once(flag, f) with std::once_flag |
| Java | synchronized block + volatile flag, or LazyHolder idiom |
| Python | threading.RLock + flag, or functools.lru_cache(maxsize=1) |
| Rust | std::sync::Once::call_once(f) |
| C# | Lazy<T> |
The shape is universal. The differences are in:
- API ergonomics. Go's `once.Do(f)` is as small as it gets.
- Generic return. Rust's `LazyLock`, C#'s `Lazy<T>`, and Go 1.21's `OnceValue` all add a return type; Go 1.0–1.20's `Once.Do` did not.
- Reset. Neither offers one: C++'s `std::once_flag` is neither copyable nor assignable, and Go's `Once` has no `Reset`, by design.
C++'s std::call_once is the closest cousin to Go's Once. Both use a double-checked pattern internally, and both block late callers. They differ on failure, though: if f throws, std::call_once does not set the flag and releases the next waiting thread to retry, whereas Go treats a panicking f as "done."
11. Walking src/sync/once.go¶
The full file is about 80 lines including comments. Key landmarks:
src/sync/once.go
├── package sync
├── import (atomic)
├── type Once struct { done atomic.Uint32; m Mutex }
│
├── func (o *Once) Do(f func())
│ ├── if o.done.Load() == 0:
│ │ o.doSlow(f)
│ └── return
│
└── func (o *Once) doSlow(f func())
├── o.m.Lock()
├── defer o.m.Unlock()
├── if o.done.Load() == 0:
│ defer o.done.Store(1)
│ f()
└── return
The whole production source. No hidden state, no platform-specific paths. Read it for yourself; it is one of the cleanest files in the standard library.
12. The runtime semaphore (slow path of Mutex)¶
Once.m is a sync.Mutex. When the slow path is contended, what happens?
Mutex.Lock on amd64:
- Atomic CAS attempt on the mutex's state word: try to flip from unlocked (0) to locked (1).
- If the CAS succeeds, return — fast path.
- If it fails, enter the slow path, which involves:
  - Spinning briefly (up to 4 iterations on multicore amd64) hoping the holder releases.
  - If still locked, calling `runtime_SemacquireMutex(&m.sema)` — a runtime semaphore.
  - The semaphore parks the goroutine on a wait queue maintained by the runtime.
  - When `Unlock` is called, the runtime semaphore wakes the next waiter.
For Once, this matters when many goroutines hit the cold slow path simultaneously. They all attempt CAS, one wins, the rest park on the semaphore. The winner runs f (which may take milliseconds), then unlocks. Each waiter then acquires the lock in turn, runs the double-check (done == 1 now), skips f, releases.
The cost per loser: one semaphore park + one lock acquisition + one atomic load + one unlock. On the order of microseconds total, dominated by the semaphore wait. Acceptable for a one-time cost.
For sustained high contention on a Once slow path (which is rare — it only happens during the brief first-touch window), you would see this in pprof under sync.(*Mutex).lockSlow. The fix, almost always, is to pre-warm the Once synchronously before fanning out.
13. Atomic implementation per architecture¶
The atomic load/store of done:
| Architecture | Load | Store |
|---|---|---|
| amd64 | plain `MOV` | `XCHG` (implicitly `LOCK`-prefixed) |
| arm64 | `LDAR` (load-acquire) | `STLR` (store-release) |
| ppc64 | `LWZ` bracketed by barriers | `LWSYNC` + `STW` |
| 386 | plain `MOV` | `LOCK XCHG` |
| wasm | plain load (single-threaded, no fences) | plain store |
On amd64, the load is essentially free (one machine instruction). The store is more expensive but only happens once. This is one reason Once is cheaper than a Mutex on the hot path: the hot path avoids the LOCK prefix that mutex acquisition requires.
On arm64, the load-acquire LDAR is slightly more expensive than a plain load but still on the order of 1 ns. Same conclusion: the fast path is cheap.
14. Why Once cannot be reset¶
Once deliberately offers no Reset method. Why?
Three reasons:
- Race correctness. A "reset" would have to flip `done` back to 0 atomically. If a concurrent goroutine observes `done == 0` on the fast path and proceeds to the slow path, it must see a consistent mutex state; the semantics of "reset while concurrent callers exist" are very hard to define cleanly.
- Use-case mismatch. The vast majority of uses are commit-forever (singleton, init, idempotent close). The minority that want reset are better served by `atomic.Pointer` with explicit replacement.
- API minimalism. The Go stdlib resists adding methods; `Once` does one thing.
If you want reset, build it yourself: replace the Once value under a mutex, as described in middle and senior level. Or, more cleanly, use a different abstraction.
15. Bug hunt: the historical Once issue¶
For most of its history, Once used atomic.LoadUint32 and atomic.StoreUint32 on a plain uint32 field. Along the way (golang/go#41690, Go 1.18), done was deliberately moved to the front of the struct: the fast path is inlined at every call site, and a field at offset zero yields more compact instructions on amd64 and 386. The later migration to atomic.Uint32 then embedded the atomicity and alignment requirements in the type itself, removing the class of traps that plain-integer atomics carry on 32-bit platforms.
This is a useful historical note: the algorithm is simple, but getting the atomics right on every architecture took real iteration. The current implementation is the result of many years of refinement of a 12-line algorithm.
16. Summary¶
sync.Once, viewed at professional depth:
- State: a 4-byte atomic flag + an 8-byte mutex = 12 bytes, with no padding needed.
- Algorithm: double-checked locking, with atomic load on the fast path and mutex + double-check on the slow path.
- Correctness: justified by the Go memory model — atomic store-with-lock-held creates happens-before with atomic load-on-fast-path.
- Panic semantics: the deferred store of `done = 1` runs even on panic, so `Once` is permanently "done" after a panicking `f`.
- Cost: ~0.7 ns on the fast path, ~30 ns + `f`'s cost on the uncontended slow path, microseconds per loser under contention.
- Go 1.21 helpers: thin wrappers that add return values, GC release of the closure, and loud panic replay.
The standard library source is ~80 lines. Read it. It is one of the cleanest, smallest, most-relied-on primitives in Go, and it is fully explicable in a single sitting.
Next, specification level catalogues the formal API contract and links the standard library documentation, the Go memory model, and the proposal documents for the 1.21 additions.