Observer Pattern — Optimization
1. How to use this file
Twelve scenarios where Observer-style code is slower than it needs to be. Each:
- Scenario — the inefficiency.
- Before — code with realistic benchmark.
- After (collapsible) — fix + benchmark + why faster + trade-offs + when NOT.
The honest answer for many Observer "optimizations": they don't matter until you have 1000+ observers or 100k+ events/sec. Below that, simpler code wins.
Anchored at Go 1.22, amd64.
2. Exercise 1 — Mutex around slice → atomic.Pointer copy-on-write
Before:
type Subject struct {
	mu        sync.RWMutex
	observers []Observer
}

func (s *Subject) Notify(e Event) {
	s.mu.RLock()
	defer s.mu.RUnlock()
	for _, o := range s.observers {
		o.Notify(e)
	}
}
After
type Subject struct {
	observers atomic.Pointer[[]Observer]
}

func (s *Subject) Notify(e Event) {
	obs := s.observers.Load()
	if obs == nil {
		return
	}
	for _, o := range *obs {
		o.Notify(e)
	}
}

func (s *Subject) Subscribe(o Observer) {
	for {
		old := s.observers.Load()
		var next []Observer
		if old != nil {
			// Length len(*old), capacity len(*old)+1: copy fills the old
			// entries and the append below adds exactly one more.
			next = make([]Observer, len(*old), len(*old)+1)
			copy(next, *old)
		}
		next = append(next, o)
		if s.observers.CompareAndSwap(old, &next) {
			return
		}
		// Another writer published first; retry against the new list.
	}
}
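The same copy-on-write idea extends to removal. The following is a minimal sketch, not part of the original exercise: Unsubscribe and the counter type are hypothetical names, and the comparison o == target assumes observers are comparable values such as pointers. Subscribe is repeated only so the block compiles on its own.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

type Event struct{ Name string }

type Observer interface{ Notify(Event) }

type Subject struct {
	observers atomic.Pointer[[]Observer]
}

// Subscribe publishes a new slice holding the old observers plus o.
func (s *Subject) Subscribe(o Observer) {
	for {
		old := s.observers.Load()
		var next []Observer
		if old != nil {
			next = make([]Observer, len(*old), len(*old)+1)
			copy(next, *old)
		}
		next = append(next, o)
		if s.observers.CompareAndSwap(old, &next) {
			return
		}
	}
}

// Unsubscribe (hypothetical) publishes a new slice without the first
// occurrence of target. The old slice is never modified, so a Notify
// that already loaded it keeps iterating a consistent list.
func (s *Subject) Unsubscribe(target Observer) bool {
	for {
		old := s.observers.Load()
		if old == nil {
			return false
		}
		next := make([]Observer, 0, len(*old))
		removed := false
		for _, o := range *old {
			if !removed && o == target {
				removed = true
				continue
			}
			next = append(next, o)
		}
		if !removed {
			return false
		}
		if s.observers.CompareAndSwap(old, &next) {
			return true
		}
	}
}

func (s *Subject) Notify(e Event) {
	obs := s.observers.Load()
	if obs == nil {
		return
	}
	for _, o := range *obs {
		o.Notify(e)
	}
}

// counter is a hypothetical observer used only for the demonstration.
type counter struct{ n int }

func (c *counter) Notify(Event) { c.n++ }

func main() {
	s := &Subject{}
	a, b := &counter{}, &counter{}
	s.Subscribe(a)
	s.Subscribe(b)
	s.Notify(Event{Name: "first"})
	s.Unsubscribe(a)
	s.Notify(Event{Name: "second"})
	fmt.Println(a.n, b.n) // 1 2
}
```

Writers pay for a full copy on every change, which is the usual price of this pattern; it suits lists that are read far more often than they are modified.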
3. Exercise 2 — Lock held during notification
Before:
func (s *Subject) Notify(e Event) {
	s.mu.Lock()
	defer s.mu.Unlock()
	for _, o := range s.observers {
		o.Notify(e) // slow observer blocks others
	}
}
After
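A minimal sketch of the snapshot approach, assuming a plain sync.Mutex and the Observer and Event types from the Before listing. The resubscriber and nopObserver types are hypothetical; they only demonstrate that an observer can call Subscribe during notification, which would deadlock if the lock were still held.

```go
package main

import (
	"fmt"
	"sync"
)

type Event struct{ ID int }

type Observer interface{ Notify(Event) }

type Subject struct {
	mu        sync.Mutex
	observers []Observer
}

func (s *Subject) Subscribe(o Observer) {
	s.mu.Lock()
	s.observers = append(s.observers, o)
	s.mu.Unlock()
}

// Notify copies the observer list under the lock, releases the lock,
// and only then calls the observers.
func (s *Subject) Notify(e Event) {
	s.mu.Lock()
	snapshot := make([]Observer, len(s.observers))
	copy(snapshot, s.observers)
	s.mu.Unlock()

	for _, o := range snapshot {
		o.Notify(e)
	}
}

// nopObserver is a hypothetical observer that does nothing.
type nopObserver struct{}

func (nopObserver) Notify(Event) {}

// resubscriber is a hypothetical observer that subscribes another
// observer from inside its own Notify.
type resubscriber struct {
	s     *Subject
	calls int
}

func (r *resubscriber) Notify(Event) {
	r.calls++
	r.s.Subscribe(nopObserver{})
}

func main() {
	s := &Subject{}
	s.Subscribe(&resubscriber{s: s})
	s.Notify(Event{ID: 1})
	fmt.Println(len(s.observers)) // 2
}
```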
Slight overhead from the slice copy, but Subscribe no longer waits behind notification. Much better latency under contention.
**Trade-off:** Per-call allocation of the snapshot slice.
**When NOT:** When the observer list is huge (10k+). The snapshot allocation dominates.
4. Exercise 3 — Per-observer goroutine
Before:
After — Worker pool
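type Subject struct {
	observers []Observer
	queue     chan workItem // drained by a fixed set of workerLoop goroutines
}

type workItem struct {
	observer Observer
	event    Event
}

func (s *Subject) Notify(e Event) {
	for _, o := range s.observers {
		s.queue <- workItem{observer: o, event: e} // blocks when the queue is full
	}
}

// workerLoop runs in each worker goroutine and exits when queue is closed.
func (s *Subject) workerLoop() {
	for w := range s.queue {
		w.observer.Notify(w.event)
	}
}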
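The listing above leaves out how the workers are started and stopped. One possible lifecycle is sketched here; NewSubject, Close, the worker count, and the queue size are assumptions made for illustration, and the counter observer is hypothetical.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

type Event struct{ ID int }

type Observer interface{ Notify(Event) }

type workItem struct {
	observer Observer
	event    Event
}

type Subject struct {
	observers []Observer
	queue     chan workItem
	wg        sync.WaitGroup
}

// NewSubject (hypothetical) starts a fixed number of workers that
// drain the queue until it is closed.
func NewSubject(workers, queueSize int, observers ...Observer) *Subject {
	s := &Subject{
		observers: observers,
		queue:     make(chan workItem, queueSize),
	}
	s.wg.Add(workers)
	for i := 0; i < workers; i++ {
		go func() {
			defer s.wg.Done()
			for w := range s.queue {
				w.observer.Notify(w.event)
			}
		}()
	}
	return s
}

// Notify enqueues one work item per observer; it blocks when the queue is full.
func (s *Subject) Notify(e Event) {
	for _, o := range s.observers {
		s.queue <- workItem{observer: o, event: e}
	}
}

// Close stops the workers after the queued items are processed.
// Notify must not be called during or after Close.
func (s *Subject) Close() {
	close(s.queue)
	s.wg.Wait()
}

// counter is a hypothetical observer; it uses an atomic because
// several workers may call Notify on it at the same time.
type counter struct{ n atomic.Int64 }

func (c *counter) Notify(Event) { c.n.Add(1) }

func main() {
	a, b := &counter{}, &counter{}
	s := NewSubject(4, 64, a, b)
	for i := 0; i < 100; i++ {
		s.Notify(Event{ID: i})
	}
	s.Close()
	fmt.Println(a.n.Load(), b.n.Load()) // 100 100
}
```

With more than one worker, events for the same observer can be delivered concurrently and out of order, so observers need to be safe for concurrent use.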
5. Exercise 4 — Channel fan-out per event
Before:
After — Batch events
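type Subject struct {
	mu          sync.Mutex
	pending     []Event
	subscribers []chan []Event // each subscriber receives whole batches
}

func (s *Subject) Notify(e Event) {
	s.mu.Lock()
	s.pending = append(s.pending, e)
	s.mu.Unlock()
}

func (s *Subject) flushLoop() {
	t := time.NewTicker(1 * time.Millisecond)
	defer t.Stop()
	for range t.C {
		s.mu.Lock()
		batch := s.pending
		s.pending = nil
		s.mu.Unlock()
		if len(batch) == 0 {
			continue
		}
		for _, ch := range s.subscribers {
			// One send per subscriber per tick. All subscribers share the
			// same slice, so receivers must treat it as read-only.
			ch <- batch
		}
	}
}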
6. Exercise 5 — Allocation per notify
Before:
func (s *Subject) Notify(e Event) {
	snapshot := make([]Observer, len(s.observers))
	copy(snapshot, s.observers)
	for _, o := range snapshot {
		o.Notify(e)
	}
}
After — atomic.Pointer eliminates the copy
See Exercise 1. The atomic.Pointer pattern doesn't need per-call snapshots because the slice is shared but immutable (subscribers replace the pointer, never mutate the slice in place). This avoids both the lock and the per-call allocation.
**When NOT:** Same as Exercise 1.
7. Exercise 6 — Map-based topic dispatch
Before:
type Bus struct {
	topics map[string][]chan Event
}

func (b *Bus) Publish(topic string, e Event) {
	for _, ch := range b.topics[topic] { // map lookup
		ch <- e
	}
}
After — Cache the result for hot topics
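type Bus struct {
	topics atomic.Pointer[map[string][]chan Event]
	hot    atomic.Pointer[[]chan Event] // cache for the hottest topic
	hotKey string                       // must be set before Publish runs concurrently
}

func (b *Bus) Publish(topic string, e Event) {
	if topic == b.hotKey {
		// Fast path: skip the map lookup when the cache is populated.
		if hot := b.hot.Load(); hot != nil {
			for _, ch := range *hot {
				ch <- e
			}
			return
		}
	}
	m := b.topics.Load()
	if m == nil {
		return // no topics registered yet
	}
	for _, ch := range (*m)[topic] { // a missing topic yields a nil slice, so the loop is a no-op
		ch <- e
	}
}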
8. Exercise 7 — Slow observer dragging others
Before: Synchronous notification with one slow observer slows all of them.
After — Drop policy with metrics
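A minimal sketch of a drop policy, assuming each observer is represented by a buffered channel. The subscriber type and its drops field are hypothetical names; a real system would likely export the counter through its metrics library.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

type Event struct{ ID int }

// subscriber (hypothetical) pairs a buffered channel with a drop counter.
type subscriber struct {
	ch    chan Event
	drops atomic.Int64
}

type Subject struct {
	subs []*subscriber
}

func (s *Subject) Subscribe(buffer int) *subscriber {
	sub := &subscriber{ch: make(chan Event, buffer)}
	s.subs = append(s.subs, sub)
	return sub
}

// Notify never blocks: if a subscriber's buffer is full, the event is
// dropped for that subscriber only and its counter is incremented.
func (s *Subject) Notify(e Event) {
	for _, sub := range s.subs {
		select {
		case sub.ch <- e:
		default:
			sub.drops.Add(1)
		}
	}
}

func main() {
	s := &Subject{}
	fast := s.Subscribe(4)
	slow := s.Subscribe(1) // small buffer and nobody reading: stands in for a slow observer

	for i := 0; i < 3; i++ {
		s.Notify(Event{ID: i})
	}
	fmt.Println(len(fast.ch), fast.drops.Load()) // 3 0
	fmt.Println(len(slow.ch), slow.drops.Load()) // 1 2
}
```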
Fast observers always get the event; slow ones drop. The drops counter exposes the problem.
**Trade-off:** Lossy. Need to monitor drops to detect a misbehaving observer.
**When NOT:** Audit logs, financial events, or anywhere else loss is unacceptable.
9. Exercise 8 — Notification with reflect (typed events)
Before:
After — Typed Subjects per event type
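One way to get a typed Subject per event type is Go generics. This is a sketch under that assumption; the OrderPlaced and UserSignedUp event types are hypothetical, and the original may have used hand-written types per event instead.

```go
package main

import (
	"fmt"
	"sync"
)

// Subject is parameterized by the event type, so handlers receive a
// concrete E and no reflection or type switch is needed at dispatch.
type Subject[E any] struct {
	mu       sync.RWMutex
	handlers []func(E)
}

func (s *Subject[E]) Subscribe(h func(E)) {
	s.mu.Lock()
	s.handlers = append(s.handlers, h)
	s.mu.Unlock()
}

func (s *Subject[E]) Notify(e E) {
	s.mu.RLock()
	// Subscribe only appends, so the entries visible through this slice
	// header are never rewritten after the lock is released.
	hs := s.handlers
	s.mu.RUnlock()
	for _, h := range hs {
		h(e)
	}
}

// Hypothetical event types for the demonstration.
type OrderPlaced struct{ Amount int }
type UserSignedUp struct{ Name string }

func main() {
	orders := &Subject[OrderPlaced]{}
	signups := &Subject[UserSignedUp]{}

	total := 0
	orders.Subscribe(func(o OrderPlaced) { total += o.Amount })

	var names []string
	signups.Subscribe(func(u UserSignedUp) { names = append(names, u.Name) })

	orders.Notify(OrderPlaced{Amount: 40})
	orders.Notify(OrderPlaced{Amount: 2})
	signups.Notify(UserSignedUp{Name: "ada"})

	fmt.Println(total, names) // 42 [ada]
}
```

Passing an OrderPlaced to signups.Notify is a compile error, which is the type-safety benefit the exercise refers to.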
20× faster. No reflection. Compile-time type safety.
**Trade-off:** Each event type needs its own Subject. For 50 event types, 50 instances.
**When NOT:** Highly dynamic event types known only at runtime.
10. Exercise 9 — Unbounded subscriber growth
Scenario: Memory grows as subscribers accumulate but are never removed.
After — Cap + LRU eviction
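type Subject struct {
	mu     sync.Mutex
	subs   *list.List // doubly linked list of chan Event, oldest at the front
	maxLen int
}

func NewSubject(maxLen int) *Subject {
	return &Subject{subs: list.New(), maxLen: maxLen}
}

func (s *Subject) Subscribe() *list.Element {
	s.mu.Lock()
	defer s.mu.Unlock()
	if s.subs.Len() >= s.maxLen {
		// Evict the oldest subscription. Without a move-to-back on activity
		// this is insertion order (FIFO) rather than strict LRU.
		if oldest := s.subs.Front(); oldest != nil {
			s.subs.Remove(oldest)
			close(oldest.Value.(chan Event)) // tells the evicted subscriber to stop
		}
	}
	ch := make(chan Event, 10)
	return s.subs.PushBack(ch) // the element's Value holds the new channel
}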
11. Exercise 10 — RWMutex contention
Before: All observers share one RWMutex. Under high read concurrency, the atomic increment on readerCount shows in profiles.
After — Sharded subscriber map
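A minimal sketch of a sharded subscriber map, assuming 16 shards selected by an FNV-1a hash of the key. The Bus, shard, and shardFor names are hypothetical, and the non-blocking send is a choice made here so that a full subscriber cannot stall a publisher that holds the shard's read lock.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"sync"
)

type Event struct{ ID int }

const shardCount = 16

// shard guards only the keys that hash to it.
type shard struct {
	mu   sync.RWMutex
	subs map[string][]chan Event
}

type Bus struct {
	shards [shardCount]shard
}

func NewBus() *Bus {
	b := &Bus{}
	for i := range b.shards {
		b.shards[i].subs = make(map[string][]chan Event)
	}
	return b
}

// shardFor picks a shard from the key's FNV-1a hash.
func (b *Bus) shardFor(key string) *shard {
	h := fnv.New32a()
	h.Write([]byte(key))
	return &b.shards[h.Sum32()%shardCount]
}

func (b *Bus) Subscribe(key string, buffer int) <-chan Event {
	ch := make(chan Event, buffer)
	sh := b.shardFor(key)
	sh.mu.Lock()
	sh.subs[key] = append(sh.subs[key], ch)
	sh.mu.Unlock()
	return ch
}

// Publish locks only the shard that owns key and returns the number
// of subscribers that accepted the event.
func (b *Bus) Publish(key string, e Event) int {
	sh := b.shardFor(key)
	sh.mu.RLock()
	defer sh.mu.RUnlock()
	delivered := 0
	for _, ch := range sh.subs[key] {
		select {
		case ch <- e:
			delivered++
		default: // buffer full: drop instead of blocking under the lock
		}
	}
	return delivered
}

func main() {
	b := NewBus()
	orders := b.Subscribe("orders", 1)
	users := b.Subscribe("users", 1)

	n := b.Publish("orders", Event{ID: 7})
	fmt.Println(n, len(orders), len(users)) // 1 1 0
}
```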
Subscriptions are partitioned by topic/key. Each shard has its own mutex, so with 16 shards and evenly spread keys, contention on any single mutex drops by up to 16×.
**Trade-off:** Notify-all is more expensive (iterate all shards). Use only when subscriptions are naturally sharded by key.
**When NOT:** Small subscriber count or notify-all workloads.
12. Exercise 11 — Closure allocation per Subscribe
Before:
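func (s *Subject) Subscribe(handler func(Event)) {
	s.handlers = append(s.handlers, handler)
}

// Caller:
s.Subscribe(func(e Event) { /* closure escapes to heap */ })
A closure that captures variables is heap-allocated when it escapes, so each such Subscribe call costs one small allocation (~32 bytes, depending on what is captured). A function literal that captures nothing does not allocate.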
After — Pool closures or accept a struct
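A sketch of the struct-handler variant. The Handler interface and the auditLog type are hypothetical names introduced for illustration; whether a given call site avoids allocation depends on escape analysis, so it is worth confirming with a benchmark or -gcflags=-m.

```go
package main

import "fmt"

type Event struct{ ID int }

// Handler (hypothetical) replaces func(Event) as the subscription type.
type Handler interface{ Handle(Event) }

type Subject struct {
	handlers []Handler
}

func (s *Subject) Subscribe(h Handler) {
	s.handlers = append(s.handlers, h)
}

func (s *Subject) Notify(e Event) {
	for _, h := range s.handlers {
		h.Handle(e)
	}
}

// auditLog keeps its state in fields instead of captured variables.
type auditLog struct{ seen int }

func (a *auditLog) Handle(Event) { a.seen++ }

// audit is created once. Subscribing it stores the existing pointer in
// the interface value, so no closure is built per Subscribe call.
var audit = &auditLog{}

func main() {
	s := &Subject{}
	s.Subscribe(audit)
	s.Notify(Event{ID: 1})
	s.Notify(Event{ID: 2})
	fmt.Println(audit.seen) // 2
}
```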
As an alternative to pooling closures, accept a struct that implements an interface. If the handler is statically known, the allocation goes away.
**Trade-off:** Less ergonomic than passing a closure.
**When NOT:** Subscribe is rare (boot time). One allocation per Subscribe is invisible.
13. Exercise 12 — Backpressure choice
Three strategies:
- Block (ch <- e): preserves order, slowest. Producer waits for the slowest consumer.
- Drop (select { case ch <- e: default: }): fast, lossy.
- Buffer: large channel buffer + drop on full. Compromise.
Pick by:
- Loss tolerance: financial → block; metrics → drop.
- Order: ordered streams → block or batch; out-of-order OK → drop.
- Latency: real-time → drop or block; analytical → buffer.
There's no general winner. Benchmark with your real workload.
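The three strategies can be compared side by side in a small sketch. The sendBlock and sendDrop helpers are hypothetical names; the buffer strategy is simply sendDrop applied to a channel with a larger buffer.

```go
package main

import "fmt"

type Event struct{ ID int }

// sendBlock waits until the consumer (or the buffer) has room.
func sendBlock(ch chan<- Event, e Event) {
	ch <- e
}

// sendDrop never waits; it reports whether the event was accepted.
func sendDrop(ch chan<- Event, e Event) bool {
	select {
	case ch <- e:
		return true
	default:
		return false
	}
}

func main() {
	// Buffer strategy: the buffer absorbs a burst of two events,
	// and the third is dropped because nobody is reading.
	ch := make(chan Event, 2)
	fmt.Println(
		sendDrop(ch, Event{ID: 1}),
		sendDrop(ch, Event{ID: 2}),
		sendDrop(ch, Event{ID: 3}),
	) // true true false

	// Block strategy: once a consumer frees a slot, the blocking
	// send returns; with a full buffer it would wait instead.
	<-ch
	sendBlock(ch, Event{ID: 4})
	fmt.Println(len(ch)) // 2
}
```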
14. When NOT to optimize
Most Observer code doesn't need optimization. The pattern is fast by default. Check:
- Does pprof show Observer code in the top 10? If not, skip.
- Is the QPS high enough to matter? 1k Notify/sec at 1µs = 1ms/sec of CPU. Irrelevant.
- Are observers themselves the bottleneck? Often the answer is "yes — the slow observer is the problem, not the dispatch".
Common premature optimizations:
- Switching to atomic.Pointer for a 10-observer system (RWMutex is fine).
- Sharding for non-sharded workloads (notify-all dominates).
- Worker pool for 100 events/sec (synchronous is faster).
15. Summary
Wins that always ship:
- Snapshot the observer list before notification (release lock before iteration).
- Use select-default for slow-observer drop policy.
- Use buffered channels (1+) for subscription.
Wins behind a profile:
- atomic.Pointer for read-heavy lists.
- Worker pool for high-rate asynchronous notification.
- Typed Subjects to avoid reflection.
Rarely worth it:
- Sharded subscriber maps (only at 1k+ subscribers).
- Closure pooling (only at very high Subscribe rates).
- Hot-key caching (only when one key dominates).
Profile first. Optimize what shows up. Don't optimize what doesn't.