Lazy Initialization: Professional Level
Category: Object & State Patterns — defer creating an expensive value until first use, then cache it.
Prerequisites: Junior · Middle · Senior
Focus: Under the hood
Table of Contents
- Introduction
- JVM Class-Init Locking Mechanics
- volatile Read Cost and Acquire/Release
- sync.Once Internals
- Python cached_property Descriptor Mechanics
- Startup Latency vs First-Access Spike
- Memory Retention and GC
- Benchmarks
- Diagrams
- Related Topics
Introduction
Lazy initialization's runtime cost lives in three places: the synchronization barrier on the access path, the one-time construction spike, and the memory retained after construction. At the professional level you should be able to reason about each of these:

- why the holder idiom has no read-barrier cost once initialized
- what a volatile read actually compiles to
- how sync.Once reaches a lock-free fast path
- how to model the startup-versus-first-access trade-off quantitatively
JVM Class-Init Locking Mechanics
The holder idiom is fast because of where the JVM puts the lock.
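The JLS §12.4.2 initialization procedure uses a per-class initialization lock:

1. The initializing thread acquires the lock and marks the class as "being initialized by the current thread."
2. It releases the lock and runs the static initializers.
3. It reacquires the lock, marks the class fully initialized, and wakes any waiting threads.

A thread that arrives during initialization blocks until step 3. Crucially:
- The locking protocol runs only during first initialization. Once the class is marked initialized, the JIT compiles Holder.INSTANCE to a plain static field load. The hot path has no lock, no barrier and no flag check.
- Class initialization establishes a happens-before edge. The constructed instance is therefore safely published to every thread that later reads the field, at no extra cost.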
So the holder idiom is "DCL done by the JVM," but with the recheck-and-barrier cost amortized to zero after the one-time init, because the JIT can prove the class is already initialized and elide the guard. That's why it beats hand-written volatile DCL, which retains a volatile load on every access.
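Here is a minimal runnable sketch of the holder idiom. The class names (Heavy, LazyRegistry, Holder, HolderDemo) are hypothetical. A counter in Heavy's constructor shows two things:

- Nothing is built until the first call to heavy().
- Eight threads racing on that first call all see one instance, because class-initialization locking serializes them.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for an expensive object; counts how often it is constructed.
final class Heavy {
    static final AtomicInteger CONSTRUCTED = new AtomicInteger();

    Heavy() {
        CONSTRUCTED.incrementAndGet();
    }
}

final class LazyRegistry {
    private LazyRegistry() {}

    // Holder is not initialized when LazyRegistry is. Per JLS 12.4.1, the first read of
    // Holder.INSTANCE (a static field that is not a constant variable) triggers it.
    private static final class Holder {
        static final Heavy INSTANCE = new Heavy();
    }

    static Heavy heavy() {
        // Once the JIT sees Holder initialized, this is expected to become a plain static load.
        return Holder.INSTANCE;
    }
}

final class HolderDemo {
    public static void main(String[] args) throws InterruptedException {
        // Reading Heavy.CONSTRUCTED initializes Heavy, but not Holder.
        System.out.println("constructed before first access: " + Heavy.CONSTRUCTED.get()); // 0

        // Eight threads race on the first access; class-init locking serializes them.
        Heavy[] seen = new Heavy[8];
        List<Thread> threads = new ArrayList<>();
        for (int i = 0; i < seen.length; i++) {
            final int slot = i;
            Thread t = new Thread(() -> seen[slot] = LazyRegistry.heavy());
            threads.add(t);
            t.start();
        }
        for (Thread t : threads) {
            t.join(); // join() makes each thread's write to seen[] visible here
        }

        System.out.println("constructed after racing access: " + Heavy.CONSTRUCTED.get()); // 1
        for (Heavy h : seen) {
            if (h != seen[0]) {
                throw new AssertionError("different instances observed");
            }
        }
        System.out.println("all threads saw the same instance");
    }
}
```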
volatile Read Cost and Acquire/Release
A volatile field gives DCL its correctness, but it isn't free.
- x86/x86-64: a volatile read is an ordinary MOV, because the hardware already gives loads acquire semantics under its TSO memory model. The cost is mostly a compiler barrier that blocks reordering optimizations, not a CPU fence. A volatile write emits a store-load fence (mfence or a lock-prefixed instruction), which is the expensive side.
- ARM/AArch64 (weak memory model): a volatile read emits a real load-acquire (LDAR), and a volatile write emits a store-release (STLR). Both are genuine ordering instructions with measurable cost.
Implication: DCL with volatile is nearly free for reads on x86 but not on ARM. The holder idiom's plain static load is free on both. That matters as ARM (Apple Silicon, AWS Graviton) takes a growing share of developer machines and server fleets.
```java
// volatile DCL: one acquire-load per access, forever
private volatile Heavy h;

// holder: one class-init, then plain loads forever
private static final class H { static final Heavy I = new Heavy(); }
```
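For comparison, here is a complete volatile-DCL wrapper as a sketch. LazyDcl is a hypothetical class name. The sketch assumes the factory never returns null, since null marks "not yet built."

Reading the field once into a local variable is a common refinement. With it, the fast path performs exactly one volatile read.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

// Double-checked locking with a volatile field (hypothetical class name LazyDcl).
// Assumes factory.get() never returns null, because null means "not built yet".
final class LazyDcl<T> {
    private final Supplier<T> factory;
    private volatile T value; // volatile: every read is an acquire load

    LazyDcl(Supplier<T> factory) {
        this.factory = factory;
    }

    T get() {
        T v = value;                 // fast path: one volatile read
        if (v == null) {
            synchronized (this) {
                v = value;           // re-check under the lock
                if (v == null) {
                    v = factory.get();
                    value = v;       // volatile write publishes the fully built object
                }
            }
        }
        return v;                    // return the local, avoiding a second volatile read
    }
}

final class LazyDclDemo {
    public static void main(String[] args) {
        AtomicInteger builds = new AtomicInteger();
        LazyDcl<StringBuilder> lazy = new LazyDcl<>(() -> {
            builds.incrementAndGet();
            return new StringBuilder("built");
        });
        System.out.println(lazy.get() == lazy.get()); // true: same instance
        System.out.println(builds.get());             // 1
    }
}
```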
sync.Once Internals
sync.Once reaches a lock-free fast path with a single atomic load:
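```go
// Simplified from the Go standard library's sync package.
// Standalone copy: Mutex is qualified as sync.Mutex so this compiles outside package sync.
package oncesketch

import (
	"sync"
	"sync/atomic"
)

type Once struct {
	done atomic.Uint32
	m    sync.Mutex
}

func (o *Once) Do(f func()) {
	if o.done.Load() == 0 { // fast path: one atomic load
		o.doSlow(f)
	}
}

func (o *Once) doSlow(f func()) {
	o.m.Lock()
	defer o.m.Unlock()
	if o.done.Load() == 0 { // re-check under lock (DCL!)
		defer o.done.Store(1) // store AFTER f() runs, before Unlock
		f()
	}
}
```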
Three things to note:
- It is double-checked locking: done is the volatile-equivalent flag, and the atomic load and store provide the barriers.
- done is stored after f() returns and before the mutex is released. A concurrent Do that sees done == 0 therefore blocks on the mutex until initialization completes, and no caller can observe a half-constructed result.
- After init, the common path is a single atomic.Load of done. Go's atomics are sequentially consistent under the Go memory model. Even so, that load costs about the same as a plain load on x86 and a load-acquire (LDAR) on ARM64.
sync.OnceValue (Go 1.21+) wraps this pattern and also caches f's return value. If f panics, the returned function panics with the same value on every call instead of retrying.
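The same shape can be mirrored in Java as a sketch. OnceValue here is a hypothetical Java class, not a JDK API. It copies the structure of Go's Do/doSlow:

- a volatile done flag, written only after the supplier finishes
- a re-check of that flag under a lock

Like Go's OnceValue, it caches either the value or the failure. One deliberate difference: only RuntimeExceptions are cached. An Error propagates and leaves the instance unset, so a later call retries.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

// Hypothetical Java analogue of Go's sync.OnceValue: runs the supplier at most once
// (unless it throws an Error), caches the value, and rethrows a cached RuntimeException.
final class OnceValue<T> implements Supplier<T> {
    private final Object lock = new Object();
    private Supplier<T> f;             // cleared after running so its captures can be collected
    private T value;
    private RuntimeException failure;
    private volatile boolean done;     // plays the role of Once.done; written AFTER f runs

    OnceValue(Supplier<T> f) {
        this.f = f;
    }

    @Override
    public T get() {
        if (!done) {                   // fast path: one volatile read
            synchronized (lock) {
                if (!done) {           // re-check under the lock (double-checked locking)
                    try {
                        value = f.get();
                    } catch (RuntimeException e) {
                        failure = e;   // cache the failure, like a Go panic value
                    }
                    f = null;
                    done = true;       // volatile write publishes value/failure
                }
            }
        }
        if (failure != null) {
            throw failure;             // same exception object on every call
        }
        return value;
    }
}

final class OnceValueDemo {
    public static void main(String[] args) {
        AtomicInteger calls = new AtomicInteger();
        OnceValue<String> ok = new OnceValue<>(() -> "v" + calls.incrementAndGet());
        System.out.println(ok.get() + " " + ok.get() + " calls=" + calls.get()); // v1 v1 calls=1

        OnceValue<String> bad = new OnceValue<>(() -> {
            calls.incrementAndGet();
            throw new IllegalStateException("boom");
        });
        for (int i = 0; i < 2; i++) {
            try {
                bad.get();
            } catch (IllegalStateException e) {
                System.out.println("rethrown: " + e.getMessage());
            }
        }
        System.out.println("calls=" + calls.get()); // calls=2: the failing supplier ran once
    }
}
```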
Python cached_property Descriptor Mechanics
cached_property is a non-data descriptor (it defines __get__ but not __set__). That's the entire trick:
- Instance __dict__ lookups take priority over non-data descriptors.
- On first access, __get__ runs the function and writes the result into instance.__dict__[name].
- On every later access, the attribute is found in __dict__ before the descriptor is consulted, so the descriptor's __get__ never runs again. There is no flag and no branch: the cache is the instance dict entry.
```python
class cached_property:  # ~stdlib shape, simplified (no error handling)
    def __init__(self, func):
        self.func = func
        self.attrname = None

    def __set_name__(self, owner, name):
        self.attrname = name

    def __get__(self, instance, owner=None):
        if instance is None:
            return self
        val = self.func(instance)
        instance.__dict__[self.attrname] = val  # shadows the descriptor forever
        return val
```
Consequences:
- It needs a writable instance __dict__. A class that defines __slots__ without including "__dict__" has no instance dict, so the first access raises TypeError.
- Since Python 3.12 there is no internal lock. Two threads hitting the first access concurrently can both run the function, and the last write wins. If construction is idempotent this is harmless; otherwise add your own lock.
- To invalidate, delete the attribute (del instance.name). This removes the __dict__ entry, so the next access recomputes.
Startup Latency vs First-Access Spike
Lazy init reshapes a fixed cost across time. Model it:
- Let C = construction cost of the value, p = probability that a given object or run ever accesses it, and N = number of objects.
- Eager total: N · C (paid at construction, on the critical path).
- Lazy total: p · N · C (paid at first access, spread out).
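To make the model concrete, plug in illustrative numbers (hypothetical, not measured): N = 10,000 objects, C = 2 ms per construction, and p = 0.1.

```latex
\begin{aligned}
T_{\text{eager}} &= N \cdot C = 10{,}000 \times 2\,\text{ms} = 20\,\text{s} \\
T_{\text{lazy}}  &= p \cdot N \cdot C = 0.1 \times 10{,}000 \times 2\,\text{ms} = 2\,\text{s}
\end{aligned}
```

Lazy does a tenth of the work here, exactly the factor p. It does not change the size of any single first access, which is still C.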
Ignoring the small per-access guard cost, lazy wins on total work whenever p < 1. But total work isn't the only axis:
| Axis | Eager | Lazy |
|---|---|---|
| Critical-path (startup) cost | N·C up front | ~0 |
| p95 request latency | flat | spiky (the unlucky first request pays C) |
| Tail latency | predictable | a cold first-access can blow the SLO |
The professional move: lazy by default, but warm critical lazies at startup (touch them in a background goroutine/thread post-boot) so users never eat the spike — getting lazy's "don't build the unused" and eager's flat latency.
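The warm-up move can be sketched in Java with the holder idiom. ExpensiveIndex, warmUpAsync and WarmUpDemo are hypothetical names. In a real service, the warm-up call would hang off whatever post-boot hook your framework provides. Because class initialization is thread-safe, a request that races the warm-up thread simply waits for the same single construction.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical costly lazy value, initialized via the holder idiom.
final class ExpensiveIndex {
    static final AtomicInteger BUILDS = new AtomicInteger();

    private ExpensiveIndex() {
        BUILDS.incrementAndGet();
        // ... imagine seconds of parsing or loading here ...
    }

    private static final class Holder {
        static final ExpensiveIndex INSTANCE = new ExpensiveIndex();
    }

    static ExpensiveIndex get() {
        return Holder.INSTANCE;
    }

    // Touch the lazy value off the request path. If a request arrives first, it blocks on
    // class initialization instead; either way the value is built exactly once.
    static Thread warmUpAsync() {
        Thread t = new Thread(ExpensiveIndex::get, "warm-up");
        t.setDaemon(true); // do not keep the JVM alive just for warm-up
        t.start();
        return t;
    }
}

final class WarmUpDemo {
    public static void main(String[] args) throws InterruptedException {
        // ... boot finishes, the server starts accepting traffic ...
        Thread warm = ExpensiveIndex.warmUpAsync();
        warm.join();          // joined only so the demo output is deterministic
        ExpensiveIndex.get(); // a "request": no construction spike now
        System.out.println("builds=" + ExpensiveIndex.BUILDS.get()); // builds=1
    }
}
```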
Memory Retention and GC
Lazy init affects when an allocation appears, not whether it can be collected.
- A lazily-built value is reachable from its owner, so it lives as long as the owner — identical retention to eager, just delayed.
- Soft/weak references turn lazy init into a recomputable cache: hold the value softly; if the GC reclaims it under pressure, the next access rebuilds it.
```java
private SoftReference<Heavy> ref = new SoftReference<>(null);

public synchronized Heavy heavy() {
    Heavy h = ref.get();
    if (h == null) { h = build(); ref = new SoftReference<>(h); }
    return h;
}
```
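- This blurs lazy init into memoization with eviction. Watch out: SoftReference can keep objects alive longer than you'd think. HotSpot clears soft references based on free heap and time since last use (tunable via -XX:SoftRefLRUPolicyMSPerMB). It only guarantees to clear them before throwing OutOfMemoryError, so a heap full of soft caches can increase GC pressure rather than reduce it.
- Go has no soft references. Since Go 1.24, the weak package provides weak pointers (weak.Pointer), and runtime.AddCleanup can run a callback after an object becomes unreachable. Before that, the options were finalizers (runtime.SetFinalizer) or a manual size-bounded cache.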
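Here is a runnable version of the soft-reference pattern, as a sketch. SoftLazy is a hypothetical generic wrapper. Real clearing depends on heap pressure and cannot be triggered deterministically, so simulateReclaim() calls Reference.clear() to stand in for the GC clearing the referent. A builds counter makes the rebuild visible.

```java
import java.lang.ref.SoftReference;
import java.util.function.Supplier;

// Hypothetical "recomputable lazy": the GC may clear the referent under memory pressure,
// and the next get() rebuilds it.
final class SoftLazy<T> {
    private final Supplier<T> factory;
    private SoftReference<T> ref = new SoftReference<>(null);
    private int builds; // guarded by this; exposed only to make rebuilds observable

    SoftLazy(Supplier<T> factory) {
        this.factory = factory;
    }

    synchronized T get() {
        T v = ref.get();              // strong local keeps the value alive while we return it
        if (v == null) {
            v = factory.get();
            builds++;
            ref = new SoftReference<>(v);
        }
        return v;
    }

    // Stand-in for the GC clearing the soft reference, so the demo is deterministic.
    synchronized void simulateReclaim() {
        ref.clear();
    }

    synchronized int builds() {
        return builds;
    }
}

final class SoftLazyDemo {
    public static void main(String[] args) {
        SoftLazy<byte[]> lazy = new SoftLazy<>(() -> new byte[1 << 20]); // 1 MiB stand-in
        byte[] first = lazy.get();
        // 'first' is strongly reachable, so the soft reference cannot be cleared here.
        System.out.println((first == lazy.get()) + " builds=" + lazy.builds()); // true builds=1
        lazy.simulateReclaim();
        byte[] second = lazy.get();
        System.out.println((first == second) + " builds=" + lazy.builds()); // false builds=2
    }
}
```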
Benchmarks
Apple M2 Pro (ARM), single thread, post-warmup. Illustrative, not authoritative — measure your own.
Java (JMH): per-access read cost after init
```text
Benchmark             Mode  Cnt  Score  Units
EagerFinalFieldRead   avgt   10    0.4  ns/op
HolderIdiomRead       avgt   10    0.4  ns/op  (plain static load)
VolatileDCLRead       avgt   10    1.1  ns/op  (LDAR on ARM)
SynchronizedGetter    avgt   10   15.0  ns/op  (lock every call, avoid)
```
The holder idiom matches a plain field read. Naive synchronized getters are about 37× slower on the steady-state path (15.0 vs 0.4 ns/op).
Go (go test -bench): per-access read cost after init
```text
BenchmarkPlainField-8        1000M    0.3 ns/op
BenchmarkSyncOnce-8           500M    1.8 ns/op  (atomic load of done)
BenchmarkMutexEveryCall-8      60M   18.0 ns/op  (lock every call, avoid)
```
Python: per-access read cost after first access
```text
plain attribute read            ~30 ns
cached_property (after 1st)     ~35 ns   (dict lookup, descriptor bypassed)
@property + manual flag         ~80 ns   (method call + branch every access)
lru_cache(maxsize=None)        ~120 ns   (hashing + dict, even for one key)
```
cached_property is near-free after the first hit because the descriptor is shadowed; @property with a flag pays a method call forever.
Diagrams
[Diagram: Holder idiom, lock once, free forever]
[Diagram: Where the cost lives]
Related Topics
- JVM class init: JLS §12.4.2; Java Concurrency in Practice (Goetz), Chapter 16 (The Java Memory Model).
- Memory model: JSR-133 FAQ; the Go memory model spec.
- Generalization: Memoization & Caching — Professional.
- Practice: Interview · Tasks · Find-Bug · Optimize