Singleton Pattern: Under the Hood
1. What this level covers
What happens at the runtime and compiler level when Singleton code runs:
- `sync.Once` layout: a `done` flag (`atomic.Uint32`) plus a mutex.
- The atomic fast path on `Do()`.
- The slow path (mutex acquire, second check, function call).
- Why double-checked locking without atomics doesn't work in Go.
- Memory order: `atomic.Uint32` `Load`/`Store` semantics.
- `init()` ordering algorithm.
- Package-level var initialization order.
- `atomic.Pointer` for hot-reload.
- Escape analysis for singleton-returning factories.
- Assembly for the Once check.
Anchored at Go 1.22, amd64.
Table of Contents
- What this level covers
- `sync.Once` struct layout
- The atomic fast path
- The slow path
- Why DCL without atomics fails in Go
- Memory order semantics
- `init()` ordering algorithm
- Package-level var init within a package
- `atomic.Pointer` for hot-reload
- Escape analysis for singleton factories
- Assembly for `Once.Do`
- `sync.Once` source dive
- Benchmarks
- Tricky questions
- Further reading
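2. sync.Once struct layout
From `src/sync/once.go` (Go 1.22), the struct has two fields: `done atomic.Uint32` followed by `m Mutex`.
That is 12 bytes total on amd64: 4 bytes for `done` and 8 for the `Mutex` (`state int32` plus `sema uint32`). Every field is 4-byte aligned, so there is no padding. The `done` field is checked atomically; the `m` mutex is acquired only on the slow path.
The order matters: `done` sits at offset 0, so the hot-path check loads it straight through the `*Once` pointer with no offset to add. The source comment notes that this gives more compact instructions on amd64/386 and fewer instructions on other architectures. The mutex is laid out after it.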
3. The atomic fast path
```go
// Simplified from src/sync/once.go
func (o *Once) Do(f func()) {
	if o.done.Load() == 0 {
		// Slow path is outlined so the fast path can be inlined.
		o.doSlow(f)
	}
}
```
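Most calls hit the fast path:
1. Atomic load of `done`.
2. If it is non-zero, return.
Because `Do` is small enough to be inlined at the call site, the total cost is typically under a nanosecond.
On amd64, `atomic.Uint32.Load()` compiles to a plain `MOVL`. x86's memory model already keeps ordinary loads in order, so no fence instruction is emitted. The atomic only stops the compiler from reordering, caching, or eliminating the load.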
4. The slow path
```go
func (o *Once) doSlow(f func()) {
	o.m.Lock()
	defer o.m.Unlock()
	if o.done.Load() == 0 { // re-check inside the lock
		defer o.done.Store(1)
		f()
	}
}
```
If `done` is 0 at the first check:
1. Acquire the mutex.
2. Re-check `done` (another goroutine may have completed `f` while we were acquiring).
3. If it is still 0, defer setting `done = 1`, then call `f`.
4. Release the mutex.
The double-check prevents two goroutines from both running f.
The deferred `o.done.Store(1)` runs before the deferred `o.m.Unlock()` because defers run in LIFO order. So `done` is set after `f` returns but before the lock is released. Any goroutine whose fast-path `Load` sees `done == 1` is therefore guaranteed to see all side effects of `f`.
5. Why DCL without atomics fails in Go
A naive attempt:
```go
// BROKEN: shown only to illustrate the problem; do not use.
var initialized bool
var instance *Thing
var mu sync.Mutex

func Get() *Thing {
	if !initialized { // race
		mu.Lock()
		defer mu.Unlock()
		if !initialized {
			instance = newThing()
			initialized = true
		}
	}
	return instance
}
```
The race: the first `if !initialized` reads without synchronization. Two issues:
- Data race. Goroutines read and write `initialized` without proper synchronization, and the race detector reports it.
- Memory ordering. `initialized = true` may become visible to other goroutines before `instance = newThing()` is fully written. A reader might see `initialized == true` but `instance == nil` (or a partially constructed instance).
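`sync.Once` uses `atomic.Uint32` `Load`/`Store`, which provide the necessary ordering guarantees. Hand-rolled DCL without atomics is unsafe in Go, just as it is in Java without `volatile` (see the "Double-Checked Locking is Broken" Declaration; `volatile` only became sufficient with the JSR-133 memory model in Java 5, 2004).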
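If you really want hand-rolled double-checked locking, the flag and the instance have to be a single atomic value. Here is a minimal sketch, assuming a hypothetical `Thing` type and `newThing` constructor. The pointer itself serves as the "done" flag and is published with `atomic.Pointer`, so the unlocked first check is a synchronized load. In practice, `sync.Once` or `sync.OnceValue` is simpler.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// Thing and newThing are hypothetical stand-ins for an expensive singleton.
type Thing struct{ name string }

func newThing() *Thing { return &Thing{name: "the one"} }

var (
	instance atomic.Pointer[Thing] // nil until published
	mu       sync.Mutex
)

// Get does double-checked locking with an atomic pointer. The unlocked
// check is an atomic load: it is race-free, and if it sees a non-nil
// pointer it also sees everything newThing wrote before the Store.
func Get() *Thing {
	if t := instance.Load(); t != nil { // fast path
		return t
	}
	mu.Lock()
	defer mu.Unlock()
	if t := instance.Load(); t != nil { // re-check under the lock
		return t
	}
	t := newThing()
	instance.Store(t) // publish only a fully constructed value
	return t
}

func main() {
	var wg sync.WaitGroup
	results := make([]*Thing, 50)
	for i := range results {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			results[i] = Get()
		}(i)
	}
	wg.Wait()
	for _, t := range results {
		if t != results[0] {
			fmt.Println("different instances!")
			return
		}
	}
	fmt.Println("all goroutines got", results[0].name) // all goroutines got the one
}
```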
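6. Memory order semantics
Go's memory model says: "Programs that modify data being simultaneously accessed by multiple goroutines must serialize such access". It names channel operations and the primitives in the `sync` and `sync/atomic` packages as the ways to do it.
`atomic.Uint32.Load` has (at least) acquire semantics: later reads and writes can't be reordered before it. `atomic.Uint32.Store` has (at least) release semantics: earlier reads and writes can't be reordered after it. The Go memory model actually promises more: all `sync/atomic` operations behave as if executed in a single sequentially consistent order.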
The pair guarantees:
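```go
// Shared: var done atomic.Uint32; var instance *Thing
// Goroutine A:
instance = newThing()
done.Store(1) // release: publishes the write above
// Goroutine B:
if done.Load() == 1 { // acquire: pairs with A's Store
	use(instance) // sees fully-initialized instance
}
```
When B's `Load` observes the 1 written by A's `Store`, that Store is synchronized before the Load. This creates a happens-before relationship: A's `instance = newThing()` happens before B's `use(instance)`.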
This is what makes sync.Once correct. Without atomic ops, even the order of statements in the slow path doesn't guarantee visibility.
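Here is a runnable version of the two-goroutine pattern above, as a minimal sketch. `Thing` and `publishAndRead` are hypothetical names. The flag and the plain pointer are local, so each call is an independent publication, and `runtime.Gosched` keeps the reader's spin loop polite.

```go
package main

import (
	"fmt"
	"runtime"
	"sync/atomic"
)

// Thing is a hypothetical payload type.
type Thing struct {
	Ready bool
	Size  int
}

// publishAndRead: a writer fills a plain variable and then sets an atomic
// flag; the reader waits for the flag and then reads the plain variable.
func publishAndRead() *Thing {
	var done atomic.Uint32
	var instance *Thing // plain, non-atomic variable

	go func() { // goroutine A
		instance = &Thing{Ready: true, Size: 42}
		done.Store(1) // release: publishes the write above
	}()

	for done.Load() == 0 { // goroutine B: acquire load, spin until set
		runtime.Gosched()
	}
	// The Load that returned 1 is synchronized after the Store, so the
	// write to instance happens before this read: no data race.
	return instance
}

func main() {
	t := publishAndRead()
	fmt.Println(t.Ready, t.Size) // true 42
}
```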
7. init() ordering algorithm
From the Go spec:
Package initialization—variable initialization and the invocation of init functions—happens in a single goroutine, sequentially, one package at a time.
The rules:
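- Imports first. A package's dependencies are fully initialized before the package itself starts.
- Variables in declaration order, subject to dependencies. Within a package, variables are initialized in declaration order, across files in the order they are presented to the compiler (lexical filename order with the go tool). A variable is never initialized before the variables its initializer depends on.
- `init()` runs after var init. Each `init()` function in the package runs after all package-level vars are initialized.
- Multiple `init()` in the same package run in the order they appear in the source, with files taken in the same filename order.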
So:
```go
// File a.go
var aVar = computeA() // 1st
// File b.go
var bVar = computeB() // 2nd
// File a.go
func init() { /* runs after vars */ } // 3rd
// File b.go
func init() { /* */ } // 4th
```
The dependence on file name ordering is fragile. Renaming a.go to z.go changes init order. Don't rely on cross-file ordering; use explicit Init(ctx) if you need it.
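8. Package-level var init within a package
Consider `var a = b + 1` declared before `var b = 10`. Does this compile? Yes. The compiler analyzes dependencies and runs `b = 10` before `a = b + 1`, regardless of source order. This is called dependency-ordered initialization.
The analysis also follows function bodies. If `a`'s initializer calls `f()` and `f` reads `b`, then `a` depends on `b`, and `b` is still initialized first. Mutual dependencies are rejected at compile time as an initialization cycle.
But the analysis is purely lexical. A dependency hidden behind an interface method call is not detected; the spec's example is `var x = I(T{}).ab()`, where the method reads `a` and `b`. In that case the relative order is unspecified, and if `x` is initialized first it observes zero values.
Initializers that reach other package-level vars through that kind of indirection are fragile. Keep package-level initializers simple.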
9. atomic.Pointer for hot-reload
```go
var cfg atomic.Pointer[Config]

func Reload(c *Config) { cfg.Store(c) }
func Get() *Config { return cfg.Load() }
```
`atomic.Pointer[T]` is a type-safe wrapper around `unsafe.Pointer` with atomic semantics. Internally:
```go
// src/sync/atomic/type.go (paraphrased)
type Pointer[T any] struct {
	_ noCopy
	v unsafe.Pointer
}

func (x *Pointer[T]) Load() *T {
	return (*T)(LoadPointer(&x.v))
}

func (x *Pointer[T]) Store(val *T) {
	StorePointer(&x.v, unsafe.Pointer(val))
}
```
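`LoadPointer` is a single atomic load, one instruction on amd64. `StorePointer` is an atomic store with release semantics; because it writes a pointer, it also runs the GC write barrier when one is active. Together they give the same happens-before guarantee as `sync.Once`: a reader that loads the new pointer sees everything written before it was stored.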
The reload is non-blocking. Readers see either the old or the new pointer, never a torn read.
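Here is a runnable sketch of the hot-reload pattern, assuming a hypothetical immutable `Config` snapshot type. Readers spin on `Get` while the main goroutine swaps in a new snapshot. Each reader checks that it sees either the complete old snapshot or the complete new one.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// Config is a hypothetical immutable configuration snapshot.
type Config struct {
	Version int
	Limit   int
}

var cfg atomic.Pointer[Config]

// Reload publishes a new snapshot; callers must not mutate c afterwards.
func Reload(c *Config) { cfg.Store(c) }

// Get returns the current snapshot (nil before the first Reload).
func Get() *Config { return cfg.Load() }

func main() {
	Reload(&Config{Version: 1, Limit: 100})

	var wg sync.WaitGroup
	for i := 0; i < 4; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for j := 0; j < 1000; j++ {
				c := Get() // old or new snapshot, never a mix
				if (c.Version == 1 && c.Limit != 100) || (c.Version == 2 && c.Limit != 200) {
					panic("torn config")
				}
			}
		}()
	}
	Reload(&Config{Version: 2, Limit: 200}) // swap while readers run
	wg.Wait()
	fmt.Println("current version:", Get().Version) // current version: 2
}
```

Immutability is the design choice that makes this work: readers never lock, and a reader holding the old pointer simply finishes with the old snapshot.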
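10. Escape analysis for singleton factories
```go
var singleton *Thing

// get is NOT goroutine-safe (unsynchronized check-then-set); it is
// shown only to discuss where the Thing is allocated.
func get() *Thing {
	if singleton == nil {
		singleton = newThing()
	}
	return singleton
}
```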
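The compiler analyzes whether the `*Thing` escapes:
- `newThing()` returns a pointer, and the result is stored in `singleton`, a package-level var.
- Package-level vars live in the data/BSS segment, not on the heap. They are GC roots that stay reachable for the whole program, so nothing they point to can live on a goroutine stack.
- So the `Thing` allocated by `newThing()` escapes to the heap.
Building with `go build -gcflags=-m` reports this as an "escapes to heap" diagnostic for the allocation in `newThing` (or at the call site, if it is inlined).
The `singleton` variable itself is never allocated at run time; what allocates is the `Thing` instance. For a singleton this is fine: one allocation that lives forever.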
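The "one allocation, then none" claim can be measured. Here is a minimal sketch that uses `sync.OnceValue` (Go 1.21+) instead of the unsynchronized nil check, with a hypothetical `Thing` type. `testing.AllocsPerRun` performs one warm-up call (which does the single allocation) before it measures, so steady-state calls should report zero.

```go
package main

import (
	"fmt"
	"sync"
	"testing"
)

// Thing is a hypothetical singleton type.
type Thing struct{ buf [1024]byte }

// getThing builds the Thing on the first call and returns the same
// pointer afterwards; sync.OnceValue makes this goroutine-safe.
var getThing = sync.OnceValue(func() *Thing {
	return &Thing{} // heap-allocated: kept alive by the closure forever
})

func main() {
	allocs := testing.AllocsPerRun(1000, func() { _ = getThing() })
	fmt.Println("allocs per call after first:", allocs) // 0
	fmt.Println("same instance:", getThing() == getThing()) // true
}
```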
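11. Assembly for Once.Do
Compiled `Once.Do` on amd64, simplified and illustrative. Real output adds a stack-check prologue and frame setup. Under the register ABI, the receiver `o` arrives in `AX` and `f` in `BX`:
```asm
TEXT sync.(*Once).Do(SB)
	MOVL  (AX), CX                 // AX = o; load o.done (offset 0)
	TESTL CX, CX                   // done == 0?
	JNE   ret                      // non-zero: already done, skip
	CALL  sync.(*Once).doSlow(SB)  // o still in AX, f still in BX
ret:
	RET
```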
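The hot path is three instructions: load, test, branch. Because `Do` is normally inlined into its caller, there is usually no call overhead either. With a well-predicted branch, the measured cost is sub-nanosecond.
The slow path (`doSlow`) calls `sync.Mutex`'s `Lock`/`Unlock` (an uncontended `Lock` is a single compare-and-swap) plus two deferred calls. Its uncontended cost is on the order of tens of nanoseconds, but it only runs for callers that arrive before `done` is set.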
12. sync.Once source dive
Read `src/sync/once.go` (comments omitted below):
```go
type Once struct {
	done atomic.Uint32
	m    Mutex
}

func (o *Once) Do(f func()) {
	if o.done.Load() == 0 {
		o.doSlow(f)
	}
}

func (o *Once) doSlow(f func()) {
	o.m.Lock()
	defer o.m.Unlock()
	if o.done.Load() == 0 {
		defer o.done.Store(1)
		f()
	}
}
```
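Under twenty lines of code: that is the entire implementation. Read it; understand it.
Note: `sync.OnceFunc`, `sync.OnceValue`, and `sync.OnceValues` (Go 1.21+) are convenience wrappers around `Once`.
They are useful for ergonomics and cost essentially the same. Unlike a bare `Once`, they also re-panic on every call if the wrapped function panicked.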
13. Benchmarks
```text
BenchmarkOnceFastPath-8     1000000000    0.3 ns/op    0 B/op
BenchmarkOnceFirstCall-8      30000000     45 ns/op    0 B/op
BenchmarkMutexLazyInit-8      50000000     24 ns/op    0 B/op
BenchmarkAtomicPtrLoad-8    2000000000    0.2 ns/op    0 B/op
BenchmarkInitFunc-8         1000000000      0 ns/op    (work done in init())
```
(Per-call benchmarks. The first call to `Once.Do` takes the slow path, which is what `OnceFirstCall` reports; every later call takes the fast path, reported by `OnceFastPath`.)
Observations:
- The `Once.Do` fast path is essentially free.
- Hand-rolled mutex lazy-init is about 80× slower per call (24 ns vs 0.3 ns), because every call acquires the mutex.
- `atomic.Pointer.Load` is slightly faster than `Once.Do` (no separate `done` check and branch).
- `init()`-based singletons have zero per-call cost, but they pay the construction cost at startup, whether or not the value is ever used.
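One way to reproduce comparisons like these, as a sketch: it calls `testing.Benchmark` from an ordinary `main` (after `testing.Init()`, which registers the testing flags outside `go test`). `Config`, `viaOnce`, `viaMutex`, and `viaPointer` are hypothetical names. Each measurement runs for about a second. Numbers depend on CPU, Go version, and inlining, and `go test -bench` with one benchmark function per variant is the more usual setup.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
	"testing"
)

// Config is a hypothetical singleton payload.
type Config struct{ Limit int }

var (
	once     sync.Once
	onceCfg  *Config
	mu       sync.Mutex
	mutexCfg *Config
	ptrCfg   atomic.Pointer[Config]
	sink     *Config // keeps results live so calls aren't optimized away
)

// viaOnce: lazy init with sync.Once (fast path after the first call).
func viaOnce() *Config {
	once.Do(func() { onceCfg = &Config{Limit: 1} })
	return onceCfg
}

// viaMutex: hand-rolled lazy init that locks on every call.
func viaMutex() *Config {
	mu.Lock()
	defer mu.Unlock()
	if mutexCfg == nil {
		mutexCfg = &Config{Limit: 1}
	}
	return mutexCfg
}

// viaPointer: value published up front with atomic.Pointer.
func viaPointer() *Config { return ptrCfg.Load() }

func main() {
	testing.Init() // register testing flags; needed outside "go test"
	ptrCfg.Store(&Config{Limit: 1})

	fmt.Println("Once.Do:        ", testing.Benchmark(func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			sink = viaOnce()
		}
	}))
	fmt.Println("mutex lazy-init:", testing.Benchmark(func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			sink = viaMutex()
		}
	}))
	fmt.Println("atomic.Pointer: ", testing.Benchmark(func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			sink = viaPointer()
		}
	}))
}
```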
14. Tricky questions
Q1. Why does this code race even though there's only one writer?
Answer
Even with one writer and one reader, the lack of synchronization means there is no *happens-before* relationship between the write and the read. The reader may see a stale value indefinitely, and the compiler or CPU may reorder the write relative to other memory operations. Torn reads are not the issue for word-sized values, because the Go memory model requires a read of a word-sized location to observe some actual write. The race detector flags the missing happens-before edge, not the number of writers. Fix: use `atomic.Int32` or a channel for synchronization.
Q2. What happens if `f()` in `Once.Do(f)` panics?
Answer
A deferred call runs during panic unwinding, so `defer o.done.Store(1)` executes even when `f()` panics. The Once is marked "done" even though `f()` failed. Subsequent calls to `o.Do(f)` find `done == 1` and return without calling `f`, so the failure is silently ignored. This is why the `OnceFunc`/`OnceValue`/`OnceValues` wrappers (Go 1.21+) capture the panic and re-raise it on every call. For application code: don't let `f()` panic. Use `OnceValues` if you want an error result cached and returned to every caller.
Q3. Can you copy a `sync.Once`?
Answer
Technically yes (it's a struct). But the copy's `done` field and `m` mutex are independent of the original, so the copy no longer shares its synchronization. Because `Once` contains a `sync.Mutex`, `go vet`'s copylocks check reports such copies. Don't copy. If you need a Once-protected value in a struct that gets copied, indirect through a pointer: `type MyType struct { o *sync.Once }`.
Q4. Why is `atomic.Pointer.Load` faster than `sync.Once.Do` in benchmarks?
Answer
`Once.Do` has to do three things: 1. atomically load `done`, 2. test the value, 3. branch (conditional jump). `atomic.Pointer.Load` does one: an atomic load of the pointer. The savings are fractions of a nanosecond, but they are consistent. At a million calls per second, 0.1 ns per call is about 0.1 ms of CPU per second (0.01%), so the difference matters only on the very hottest paths.
15. Further reading
- `src/sync/once.go`: the `Once` implementation (under twenty lines of code)
- `src/sync/atomic/`: atomic primitives
- The Go Memory Model: https://go.dev/ref/mem
- Russ Cox, "Memory Models" essay series: background for the Go 1.19 revision of the memory model
- "Double-Checked Locking is Broken" Declaration: historical context
- JSR-133: the Java memory model that informed Go's
`sync.Once` is one of the most-used and least-understood primitives in Go. Its implementation is under twenty lines, yet it hides considerable subtlety in memory ordering. Read the source.