Strategy Pattern — Optimize
1. Goal of this file
This file is about when a naïve strategy is slow or wasteful, and when the fix is worth shipping. Junior taught the two shapes (interface, function). Middle taught the variants — adapters, registries, composition, testing. Optimize is about the cases where a textbook strategy shows up in a CPU or allocation profile and you have to do something about it.
The honest envelope: most strategies are set once at startup (NewProcessor(stripeGateway)), called from request handlers at hundreds to thousands of QPS, and never measured. At those frequencies, the pattern is essentially free — one indirect call, ~1-2 ns of overhead, zero allocations. Nobody notices.
It becomes visible when:
- The strategy is called per element in a tight loop (sorting comparators, codec encode/decode per byte chunk).
- The strategy is constructed per call via a closure that captures state (func sortBy(field string) func(...) bool).
- The strategy is looked up by name from a registry on every request instead of resolved once.
- The strategy chain uses reflect for dispatch instead of type assertion or type switch.
- The strategy needs a goroutine per call when a worker pool would do.
- The strategy interface is wide with stub implementations on the hot path.
Baseline you need to beat. From middle.md §13:
BenchmarkDirectCall-8 1000000000 0.85 ns/op 0 B/op 0 allocs/op
BenchmarkFunctionStrategy-8 1000000000 0.91 ns/op 0 B/op 0 allocs/op
BenchmarkInterfaceStrategy-8 700000000 1.62 ns/op 0 B/op 0 allocs/op
BenchmarkClosureCapture-8 500000000 2.10 ns/op 16 B/op 1 allocs/op
A direct call is 0.85 ns. A function strategy is ~1 ns. An interface dispatch adds another ~0.7 ns. A closure that escapes costs one allocation. That's the budget — most optimizations in this file fight for the difference between "1.6 ns and 0 allocs" and "200 ns and 3 allocs", which usually means killing closure captures, reflect, or per-call registry lookups.
Structure of the file:
- Real wins (§3–§9): pre-built closures, devirtualized dispatch, cached registry lookup, type-switch over reflect, shared mocks, segregated interfaces, flattened decorator chains.
- Wins that aren't always wins (§10–§14): boot-time strategy slice, direct field access in generics, worker pool over per-call goroutines, type-safe comparators, cached parsed config.
- Cost-benefit framing (§15).
2. Table of Contents
- Goal of this file
- Table of Contents
- Exercise 1: Closure capture allocates on every call
- Exercise 2: Interface dispatch in a hot path
- Exercise 3: Registry map lookup on every request
- Exercise 4: Strategy chain calling reflect instead of type assertion
- Exercise 5: Mock strategy allocating per test call
- Exercise 6: Wide interface with stub methods
- Exercise 7: Decorator chain with deep nesting
- Exercise 8: Strategy slice rebuilt every request
- Exercise 9: Generic strategy with closure-list — replace with direct field access
- Exercise 10: Strategy spawning a goroutine per call
- Exercise 11: Strategy comparison using reflect.DeepEqual
- Exercise 12: JSON-loaded strategy re-reading config every call
- When NOT to optimize
- The optimization checklist
- Summary
3. Exercise 1: Closure capture allocates on every call
Scenario
A helper builds a comparator strategy by capturing a field argument. The closure is constructed inside the request handler and thrown away after the sort. At thousands of sorts per second, each one allocates a fresh closure on the heap.
Before
package usersort
import (
"sort"
"time"
)
type User struct {
Name string
Age int
CreatedAt time.Time
}
// lessByField returns a comparator. Each call captures `field` and `items`.
func lessByField(field string, items []User) func(i, j int) bool {
return func(i, j int) bool {
switch field {
case "name":
return items[i].Name < items[j].Name
case "age":
return items[i].Age < items[j].Age
case "created":
return items[i].CreatedAt.Before(items[j].CreatedAt)
}
return false
}
}
func SortBy(field string, items []User) {
sort.Slice(items, lessByField(field, items))
}
Benchmark
func BenchmarkClosureSort(b *testing.B) {
b.ReportAllocs()
items := makeUsers(100)
for i := 0; i < b.N; i++ {
SortBy("age", items)
}
}
The 48 B / 2 allocs is the closure itself (it captures field and items, both escape). The closure body's switch field runs on every comparison — n log n times per sort.
After
Resolve the field-to-comparator once, *outside* the closure. The comparator captures only `items`, not the string, and skips the per-call switch.
package usersort
import (
"sort"
"time"
)
type User struct {
Name string
Age int
CreatedAt time.Time
}
// Comparator constructors are package-level functions with no `field` switch.
// Each returned closure captures only the `items` slice, never the field name.
func lessByName(items []User) func(i, j int) bool {
return func(i, j int) bool { return items[i].Name < items[j].Name }
}
func lessByAge(items []User) func(i, j int) bool {
return func(i, j int) bool { return items[i].Age < items[j].Age }
}
func lessByCreated(items []User) func(i, j int) bool {
return func(i, j int) bool { return items[i].CreatedAt.Before(items[j].CreatedAt) }
}
func SortBy(field string, items []User) {
var less func(i, j int) bool
switch field {
case "name": less = lessByName(items)
case "age": less = lessByAge(items)
case "created": less = lessByCreated(items)
default: return
}
sort.Slice(items, less)
}
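If the comparator does not need index access, even the capture of `items` can go. The following is a minimal sketch, assuming Go 1.21 or later for the standard `slices` and `cmp` packages. The comparators are plain package-level functions over two `User` values, so selecting one creates no closure. `SortUsers`, `comparatorFor`, and the `by...` helpers are illustrative names, not part of the original package.

```go
package main

import (
	"cmp"
	"fmt"
	"slices"
	"time"
)

type User struct {
	Name      string
	Age       int
	CreatedAt time.Time
}

// Package-level comparators: they capture nothing, so taking their
// value does not build a closure.
func byName(a, b User) int    { return cmp.Compare(a.Name, b.Name) }
func byAge(a, b User) int     { return cmp.Compare(a.Age, b.Age) }
func byCreated(a, b User) int { return a.CreatedAt.Compare(b.CreatedAt) }

// comparatorFor resolves the field once per sort, not once per comparison.
// It returns nil for an unknown field.
func comparatorFor(field string) func(a, b User) int {
	switch field {
	case "name":
		return byName
	case "age":
		return byAge
	case "created":
		return byCreated
	}
	return nil
}

// SortUsers sorts in place and reports whether the field was recognized.
func SortUsers(field string, items []User) bool {
	c := comparatorFor(field)
	if c == nil {
		return false
	}
	slices.SortFunc(items, c)
	return true
}

func main() {
	users := []User{{Name: "bo", Age: 41}, {Name: "al", Age: 29}, {Name: "cy", Age: 35}}
	SortUsers("age", users)
	fmt.Println(users[0].Name, users[1].Name, users[2].Name) // al cy bo
}
```

Whether this beats the closure version on a given workload is something to confirm with the benchmark above rather than assume.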
4. Exercise 2: Interface dispatch in a hot path
Scenario
A bytes pipeline encodes records via an interface. The encoder is set once at startup but called per record. At millions of records per second, the indirect interface call dominates the budget. When the concrete type is known statically, the indirection is pure overhead.
Before
package encode
type Encoder interface {
Encode(v int64, out []byte) []byte
}
type Varint struct{}
func (Varint) Encode(v int64, out []byte) []byte {
uv := uint64(v) << 1
if v < 0 {
uv = ^uv
}
for uv >= 0x80 {
out = append(out, byte(uv)|0x80)
uv >>= 7
}
return append(out, byte(uv))
}
type Pipeline struct {
enc Encoder
}
func NewPipeline(e Encoder) *Pipeline { return &Pipeline{enc: e} }
func (p *Pipeline) EncodeAll(records []int64, out []byte) []byte {
for _, r := range records {
out = p.enc.Encode(r, out)
}
return out
}
Benchmark
func BenchmarkPipelineEncode(b *testing.B) {
b.ReportAllocs()
p := NewPipeline(Varint{})
records := make([]int64, 1024)
for i := range records { records[i] = int64(i) }
out := make([]byte, 0, 4096)
b.ResetTimer()
for i := 0; i < b.N; i++ {
out = p.EncodeAll(records, out[:0])
}
}
2.9 microseconds per 1024 records — about 2.9 ns per record, almost half of which is the interface dispatch (the encode itself is only ~1.5 ns).
After
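Two paths. The cleanest is to make the pipeline generic over the concrete encoder type, so the loop no longer goes through an Encoder interface value. How direct the resulting call is depends on the Go version and on how the compiler groups generic instantiations, so confirm the win with the benchmark.
package encode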
type Encoder interface {
Encode(v int64, out []byte) []byte
}
type Varint struct{}
func (Varint) Encode(v int64, out []byte) []byte {
uv := uint64(v) << 1
if v < 0 { uv = ^uv }
for uv >= 0x80 {
out = append(out, byte(uv)|0x80)
uv >>= 7
}
return append(out, byte(uv))
}
// Generic pipeline — `E` is the concrete encoder type.
type Pipeline[E Encoder] struct {
enc E
}
func NewPipeline[E Encoder](e E) *Pipeline[E] { return &Pipeline[E]{enc: e} }
func (p *Pipeline[E]) EncodeAll(records []int64, out []byte) []byte {
for _, r := range records {
out = p.enc.Encode(r, out) // call on the concrete type E, no Encoder interface value
}
return out
}
5. Exercise 3: Registry map lookup on every request
Scenario
A codec library uses the registry pattern (middle.md §5): codecs are registered by name in init(), and the request handler resolves the codec by name on every request. The map lookup is fast (~30 ns) but unnecessary when the name is stable for a given service.
Before
package codec
import (
"fmt"
"sync"
)
type Codec interface {
Encode([]byte) []byte
Decode([]byte) ([]byte, error)
}
var (
registry = map[string]Codec{}
registryMu sync.RWMutex
)
func Register(name string, c Codec) {
registryMu.Lock()
defer registryMu.Unlock()
registry[name] = c
}
func Get(name string) (Codec, error) {
registryMu.RLock()
defer registryMu.RUnlock()
c, ok := registry[name]
if !ok {
return nil, fmt.Errorf("codec: unknown %q", name)
}
return c, nil
}
// Caller resolves on every request.
func EncodePayload(name string, data []byte) ([]byte, error) {
c, err := Get(name)
if err != nil { return nil, err }
return c.Encode(data), nil
}
Benchmark
func BenchmarkResolveAndEncode(b *testing.B) {
b.ReportAllocs()
Register("gzip", gzipCodec{})
data := []byte("payload")
for i := 0; i < b.N; i++ {
_, _ = EncodePayload("gzip", data)
}
}
The encode itself (a stub returning the same slice) is < 5 ns. The remaining 55+ ns is the RWMutex lock, map lookup, and string hashing.
After
Resolve once at service boot. Cache the `Codec` value in the consumer.
package mysvc
import "myorg/codec"
type Service struct {
cdc codec.Codec // resolved once at construction
}
func NewService(codecName string) (*Service, error) {
c, err := codec.Get(codecName)
if err != nil { return nil, err }
return &Service{cdc: c}, nil
}
func (s *Service) EncodePayload(data []byte) []byte {
return s.cdc.Encode(data) // no lookup
}
// CodecCache memoizes registry lookups by name (needs the "sync" import).
type CodecCache struct {
	cache sync.Map // map[string]codec.Codec
}
func (c *CodecCache) Encode(name string, data []byte) ([]byte, error) {
	if v, ok := c.cache.Load(name); ok {
		return v.(codec.Codec).Encode(data), nil
	}
	cdc, err := codec.Get(name)
	if err != nil { return nil, err }
	c.cache.Store(name, cdc)
	return cdc.Encode(data), nil
}
6. Exercise 4: Strategy chain calling reflect instead of type assertion
Scenario
A middleware chain inspects whether each strategy implements an optional interface. The naïve version uses reflect.TypeOf to check capability, pulling in the reflect package's machinery for a check that a type assertion performs with a cheap, cached runtime lookup.
Before
package middleware
import (
"context"
"reflect"
)
type Handler interface {
Handle(ctx context.Context, req Request) (Response, error)
}
type Cacheable interface {
CacheKey(req Request) string
}
type Chain struct {
handlers []Handler
}
// supportsCaching uses reflect — slow.
func supportsCaching(h Handler) bool {
t := reflect.TypeOf(h)
cacheableType := reflect.TypeOf((*Cacheable)(nil)).Elem()
return t.Implements(cacheableType)
}
func (c *Chain) Handle(ctx context.Context, req Request) (Response, error) {
for _, h := range c.handlers {
if supportsCaching(h) {
// do cache lookup
}
resp, err := h.Handle(ctx, req)
if err != nil { return Response{}, err }
_ = resp
}
return Response{}, nil
}
Benchmark
func BenchmarkReflectChain(b *testing.B) {
b.ReportAllocs()
c := &Chain{handlers: []Handler{cachingHandler{}, plainHandler{}, plainHandler{}}}
req := Request{}
for i := 0; i < b.N; i++ {
_, _ = c.Handle(context.Background(), req)
}
}
reflect.Type.Implements does method-set comparison every call. The 288 B / 12 allocs come from reflect.Type value boxing.
After
Type assertion. A cheap runtime check, no allocation, and the same answer for every non-nil handler.
package middleware
import "context"
type Handler interface {
Handle(ctx context.Context, req Request) (Response, error)
}
type Cacheable interface {
CacheKey(req Request) string
}
type Chain struct {
handlers []Handler
}
func (c *Chain) Handle(ctx context.Context, req Request) (Response, error) {
for _, h := range c.handlers {
if ck, ok := h.(Cacheable); ok {
_ = ck.CacheKey(req) // direct call via the asserted interface
}
resp, err := h.Handle(ctx, req)
if err != nil { return Response{}, err }
_ = resp
}
return Response{}, nil
}
7. Exercise 5: Mock strategy allocating per test call
Scenario
A test for a per-request handler creates a fresh mock strategy on every iteration. The mock is small but the test does 100k iterations — that's 100k mock allocations the test runner has to garbage-collect, and the benchmark measures GC noise instead of the code being tested.
Before
package payment_test
import (
"context"
"testing"
)
type mockGateway struct {
chargeCount int
failureMode bool
}
func (m *mockGateway) Charge(ctx context.Context, amount int, ccy string) (string, error) {
m.chargeCount++
if m.failureMode { return "", errSimulated }
return "mock_123", nil
}
func BenchmarkProcessor(b *testing.B) {
b.ReportAllocs()
for i := 0; i < b.N; i++ {
g := &mockGateway{} // fresh allocation
p := NewProcessor(g) // fresh processor
_, _ = p.Process(ctx, sampleOrder)
}
}
Benchmark
Each iteration pays for the processor allocation, the mock allocation, and the order pass-through. The benchmark measures setup, not the processor logic.
After
Reuse the mock and processor across iterations. The mock's counter accumulates over the whole loop, so check it once after the loop instead of resetting it per iteration.
func BenchmarkProcessor(b *testing.B) {
b.ReportAllocs()
g := &mockGateway{}
p := NewProcessor(g)
b.ResetTimer()
for i := 0; i < b.N; i++ {
_, _ = p.Process(ctx, sampleOrder)
}
// chargeCount == b.N now; verify post-loop if needed.
}
8. Exercise 6: Wide interface with stub methods
Scenario
A storage backend interface has eight methods. The hot path uses one (Read). Implementations that don't support the others (e.g., a read-only cache layer) return "not supported" errors. Every type-assertion check the consumer might do for capability detection (middle.md §3) becomes "method exists but returns an error" — an interface dispatch that always fails, in the inner loop.
Before
package storage
import "errors"
type Storage interface {
Read(key string) ([]byte, error)
Write(key string, value []byte) error
Delete(key string) error
List(prefix string) ([]string, error)
Lock(key string) error
Unlock(key string) error
Subscribe(prefix string) (<-chan Event, error)
Stats() Statistics
}
var ErrNotSupported = errors.New("storage: operation not supported")
// ReadOnlyCache supports only Read. The other seven methods stub out.
type ReadOnlyCache struct {
data map[string][]byte
}
func (c *ReadOnlyCache) Read(key string) ([]byte, error) {
return c.data[key], nil
}
func (c *ReadOnlyCache) Write(string, []byte) error { return ErrNotSupported }
func (c *ReadOnlyCache) Delete(string) error { return ErrNotSupported }
func (c *ReadOnlyCache) List(string) ([]string, error) { return nil, ErrNotSupported }
func (c *ReadOnlyCache) Lock(string) error { return ErrNotSupported }
func (c *ReadOnlyCache) Unlock(string) error { return ErrNotSupported }
func (c *ReadOnlyCache) Subscribe(string) (<-chan Event, error) { return nil, ErrNotSupported }
func (c *ReadOnlyCache) Stats() Statistics { return Statistics{} }
// Hot loop in the consumer:
func LookupMany(s Storage, keys []string) [][]byte {
out := make([][]byte, 0, len(keys))
for _, k := range keys {
if v, err := s.Read(k); err == nil {
out = append(out, v)
}
}
return out
}
Benchmark
func BenchmarkLookupMany(b *testing.B) {
b.ReportAllocs()
c := &ReadOnlyCache{data: map[string][]byte{"a": {1, 2}, "b": {3, 4}, "c": {5}}}
keys := []string{"a", "b", "c", "missing", "a", "b"}
var s Storage = c
b.ResetTimer()
for i := 0; i < b.N; i++ {
_ = LookupMany(s, keys)
}
}
810 ns for 6 lookups. The interface dispatch to Read costs ~2 ns; the rest is the map lookup, the allocation of the result slice (capacity len(keys)), and per-key bookkeeping.
The pain isn't this benchmark — it's the compiler's pessimism. With eight methods on Storage, the compiler can't easily devirtualize even when only Read is used. PGO would help; the simpler fix is to narrow the interface.
After
Segregate the interface. `LookupMany` accepts only a `Reader`. Implementations that don't write don't need to declare `Write`.
package storage
// Reader is what LookupMany needs. Smaller method set.
type Reader interface {
Read(key string) ([]byte, error)
}
type Writer interface {
Write(key string, value []byte) error
Delete(key string) error
}
type Subscriber interface {
Subscribe(prefix string) (<-chan Event, error)
}
// Storage is the union; implementations declare what they support.
type Storage interface {
Reader
Writer
Subscriber
}
// ReadOnlyCache now satisfies only Reader.
type ReadOnlyCache struct {
data map[string][]byte
}
func (c *ReadOnlyCache) Read(key string) ([]byte, error) {
return c.data[key], nil
}
// LookupMany accepts the narrow interface.
func LookupMany(r Reader, keys []string) [][]byte {
out := make([][]byte, 0, len(keys))
for _, k := range keys {
if v, err := r.Read(k); err == nil {
out = append(out, v)
}
}
return out
}
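With segregated interfaces, an optional capability becomes a type assertion instead of a stub call that always fails. The sketch below shows that check alongside compile-time conformance assertions. `MemStore` and `CanWrite` are hypothetical names added for contrast, and `Writer` is reduced to one method to keep the example short.

```go
package main

import "fmt"

type Reader interface {
	Read(key string) ([]byte, error)
}

type Writer interface {
	Write(key string, value []byte) error
}

// ReadOnlyCache implements Reader only; it declares no stubs.
type ReadOnlyCache struct{ data map[string][]byte }

func (c *ReadOnlyCache) Read(key string) ([]byte, error) { return c.data[key], nil }

// MemStore is a hypothetical read-write implementation for contrast.
type MemStore struct{ data map[string][]byte }

func (m *MemStore) Read(key string) ([]byte, error) { return m.data[key], nil }

func (m *MemStore) Write(key string, value []byte) error {
	m.data[key] = value
	return nil
}

// Compile-time checks: the build fails if a type stops satisfying its role.
var (
	_ Reader = (*ReadOnlyCache)(nil)
	_ Reader = (*MemStore)(nil)
	_ Writer = (*MemStore)(nil)
)

// CanWrite detects the optional capability by assertion instead of
// calling a stub and inspecting a "not supported" error.
func CanWrite(r Reader) bool {
	_, ok := r.(Writer)
	return ok
}

func main() {
	ro := &ReadOnlyCache{data: map[string][]byte{"a": {1}}}
	rw := &MemStore{data: map[string][]byte{}}
	fmt.Println(CanWrite(ro), CanWrite(rw)) // false true
}
```

The `var _ Reader = ...` lines cost nothing at run time and give the narrowed interfaces the same safety net the single wide interface used to provide.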
9. Exercise 7: Decorator chain with deep nesting
Scenario
A request handler is wrapped in five decorators: logging, retry, metrics, auth, rate-limit. Each decorator implements the same interface and delegates to its inner. At request time, calling chain.Handle(...) does six interface dispatches (one per decorator, plus the terminal). For low-frequency endpoints this is invisible; for hot path RPC interceptors it's measurable.
Before
package middleware
import (
"context"
"log"
"time"
)
type Handler interface {
Handle(ctx context.Context, req Request) (Response, error)
}
type LoggingHandler struct {
Inner Handler
Log *log.Logger
}
func (h *LoggingHandler) Handle(ctx context.Context, req Request) (Response, error) {
start := time.Now()
resp, err := h.Inner.Handle(ctx, req)
h.Log.Printf("handle: %s, %v, err=%v", req.Method, time.Since(start), err)
return resp, err
}
type RetryHandler struct {
Inner Handler
Attempts int
}
func (h *RetryHandler) Handle(ctx context.Context, req Request) (Response, error) {
var resp Response
var err error
for i := 0; i < h.Attempts; i++ {
resp, err = h.Inner.Handle(ctx, req)
if err == nil { return resp, nil }
}
return resp, err
}
type MetricsHandler struct{ Inner Handler }
type AuthHandler struct{ Inner Handler }
type RateLimitHandler struct{ Inner Handler }
// ... each just wraps and delegates ...
// Build the chain
func BuildChain(terminal Handler) Handler {
var h Handler = terminal
h = &LoggingHandler{Inner: h, Log: log.Default()}
h = &RetryHandler{Inner: h, Attempts: 3}
h = &MetricsHandler{Inner: h}
h = &AuthHandler{Inner: h}
h = &RateLimitHandler{Inner: h}
return h
}
Benchmark
func BenchmarkChain(b *testing.B) {
b.ReportAllocs()
terminal := &terminalHandler{}
chain := BuildChain(terminal)
ctx := context.Background()
req := Request{}
b.ResetTimer()
for i := 0; i < b.N; i++ {
_, _ = chain.Handle(ctx, req)
}
}
Six interface dispatches plus the per-decorator work (logging allocates a format string, retry allocates the closure for time tracking, etc.).
After
Two paths. The first, and simpler, is to flatten the chain into one struct.
type FlatHandler struct {
inner Handler
log *log.Logger
retryAttempts int
rateLimit *RateLimiter
auth *Authenticator
metrics *MetricsClient
}
func (h *FlatHandler) Handle(ctx context.Context, req Request) (Response, error) {
// Rate limit
if !h.rateLimit.Allow() {
return Response{}, ErrRateLimited
}
// Auth
if err := h.auth.Verify(ctx, req); err != nil {
return Response{}, err
}
// Metrics
start := time.Now()
defer func() { h.metrics.Observe(req.Method, time.Since(start)) }()
// Retry
var resp Response
var err error
for i := 0; i < h.retryAttempts; i++ {
resp, err = h.inner.Handle(ctx, req)
if err == nil { break }
}
// Logging
h.log.Printf("handle: %s, err=%v", req.Method, err)
return resp, err
}
10. Exercise 8: Strategy slice rebuilt every request
Scenario
A pricing engine constructs a list of discount strategies per request, even though the list is the same for every request. The construction allocates a []Discount, fills it with three or four strategy values, and discards it after computing the total.
Before
package pricing
type Discount interface {
Apply(subtotal int) int
}
type PercentOff struct{ Percent float64 }
func (p PercentOff) Apply(s int) int { return int(float64(s) * p.Percent / 100) }
type FlatOff struct{ Cents int }
func (f FlatOff) Apply(_ int) int { return f.Cents }
type MaxOff struct{ Cents int }
func (m MaxOff) Apply(s int) int {
if s < m.Cents { return s }
return m.Cents
}
// Called per request — allocates a fresh slice.
func ComputeTotal(items []Item) int {
discounts := []Discount{
PercentOff{Percent: 10},
FlatOff{Cents: 100},
MaxOff{Cents: 5000},
}
sub := subtotal(items)
for _, d := range discounts {
sub -= d.Apply(sub)
}
if sub < 0 { sub = 0 }
return sub
}
Benchmark
func BenchmarkComputeTotal(b *testing.B) {
b.ReportAllocs()
items := []Item{{Cents: 1000, Qty: 2}, {Cents: 500, Qty: 3}}
for i := 0; i < b.N; i++ {
_ = ComputeTotal(items)
}
}
144 bytes per call — the slice header, the slice's backing array of three Discount interface values (each 16 B), and the boxed concrete types (each escapes when stored in an interface slice).
After
Build the slice once at package init. Reuse across all calls.
package pricing
var defaultDiscounts = []Discount{
PercentOff{Percent: 10},
FlatOff{Cents: 100},
MaxOff{Cents: 5000},
}
func ComputeTotal(items []Item) int {
sub := subtotal(items)
for _, d := range defaultDiscounts {
sub -= d.Apply(sub)
}
if sub < 0 { sub = 0 }
return sub
}
func ComputeTotalWith(items []Item, discounts []Discount) int { /* same loop */ }
func ComputeTotal(items []Item) int {
return ComputeTotalWith(items, defaultDiscounts)
}
var tenantDiscounts sync.Map // map[tenantID][]Discount
func ComputeTotalForTenant(tenantID string, items []Item) int {
v, ok := tenantDiscounts.Load(tenantID)
if !ok {
// Cold path: build the list and store it. LoadOrStore keeps the first
// value if another goroutine stored one concurrently.
list := buildDiscountsForTenant(tenantID)
v, _ = tenantDiscounts.LoadOrStore(tenantID, list)
}
return ComputeTotalWith(items, v.([]Discount))
}
11. Exercise 9: Generic strategy with closure-list — replace with direct field access
Scenario
A generic strategy accumulates "apply" functions in a slice, then runs them in Build(). Each .With(func...) allocates a closure. For the common case where the strategy is a settings struct with known fields, replacing the closure list with direct field setters eliminates both the closure allocations and the slice growth.
Before
package strategyx
type Settings[T any] struct {
apply []func(*T)
}
func New[T any]() *Settings[T] { return &Settings[T]{} }
func (s *Settings[T]) With(f func(*T)) *Settings[T] {
s.apply = append(s.apply, f)
return s
}
func (s *Settings[T]) Apply(t *T) {
for _, f := range s.apply {
f(t)
}
}
// Caller
type Compressor struct {
Level int
Concurrency int
BlockSize int
}
func configureCompressor() *Compressor {
var c Compressor
strategyx.New[Compressor]().
With(func(c *Compressor) { c.Level = 9 }).
With(func(c *Compressor) { c.Concurrency = 4 }).
With(func(c *Compressor) { c.BlockSize = 65536 }).
Apply(&c)
return &c
}
Benchmark
func BenchmarkClosureStrategy(b *testing.B) {
b.ReportAllocs()
for i := 0; i < b.N; i++ {
_ = configureCompressor()
}
}
Three closures + the slice growth + the Settings struct + the final compressor = 7 allocations.
After
Direct setters on a typed strategy. No generics, no closures.
package compressor
type Settings struct {
Level int
Concurrency int
BlockSize int
}
func (s *Settings) WithLevel(l int) *Settings { s.Level = l; return s }
func (s *Settings) WithConcurrency(n int) *Settings { s.Concurrency = n; return s }
func (s *Settings) WithBlockSize(b int) *Settings { s.BlockSize = b; return s }
func (s *Settings) Apply(c *Compressor) {
c.Level = s.Level
c.Concurrency = s.Concurrency
c.BlockSize = s.BlockSize
}
// Caller
func configureCompressor() *Compressor {
var c Compressor
s := (&Settings{}).
WithLevel(9).
WithConcurrency(4).
WithBlockSize(65536)
s.Apply(&c)
return &c
}
12. Exercise 10: Strategy spawning a goroutine per call
Scenario
A "concurrent strategy" launches a fresh goroutine for every call, on the theory that the work might block. In practice, the calls don't block much, and goroutine creation (~2 µs in Go 1.22) plus the channel synchronization costs more than the work itself.
Before
package processor
type Strategy func(item Item) Result
type Processor struct {
strategy Strategy
}
// Per-call goroutine plus channel for the result.
func (p *Processor) Process(item Item) Result {
ch := make(chan Result, 1)
go func() {
ch <- p.strategy(item)
}()
return <-ch
}
func (p *Processor) ProcessAll(items []Item) []Result {
out := make([]Result, len(items))
for i, it := range items {
out[i] = p.Process(it)
}
return out
}
Benchmark
func BenchmarkConcurrentStrategy(b *testing.B) {
b.ReportAllocs()
strategy := func(i Item) Result { return Result{Value: i.Value * 2} }
p := &Processor{strategy: strategy}
items := make([]Item, 100)
for i := range items { items[i] = Item{Value: i} }
b.ResetTimer()
for i := 0; i < b.N; i++ {
_ = p.ProcessAll(items)
}
}
100 items × ~3 allocs/item (channel + goroutine stack initial allocation) ≈ 300 allocs. 420 µs to process 100 items that the strategy itself completes in nanoseconds each.
After
Two paths depending on whether parallelism is actually useful.
**Path A: the strategy doesn't block, so kill the goroutines.**
func (p *Processor) Process(item Item) Result {
return p.strategy(item) // direct call
}
func (p *Processor) ProcessAll(items []Item) []Result {
out := make([]Result, len(items))
for i, it := range items {
out[i] = p.strategy(it)
}
return out
}
**Path B: the strategy can block, so bound the parallelism with a fixed worker pool.**
package processor
import "sync"
type Strategy func(item Item) Result
type Pool struct {
workers int
strategy Strategy
jobs chan job
results chan result
wg sync.WaitGroup
}
type job struct {
idx int
item Item
}
type result struct {
idx int
res Result
}
func NewPool(workers int, strategy Strategy) *Pool {
p := &Pool{
workers: workers,
strategy: strategy,
jobs: make(chan job, workers*2),
results: make(chan result, workers*2),
}
p.wg.Add(workers)
for i := 0; i < workers; i++ {
go p.run()
}
return p
}
func (p *Pool) run() {
defer p.wg.Done()
for j := range p.jobs {
p.results <- result{idx: j.idx, res: p.strategy(j.item)}
}
}
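// ProcessAll assumes one caller at a time: concurrent calls would read each
// other's results from the shared results channel.
func (p *Pool) ProcessAll(items []Item) []Result {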
out := make([]Result, len(items))
go func() {
for i, it := range items {
p.jobs <- job{idx: i, item: it}
}
}()
for i := 0; i < len(items); i++ {
r := <-p.results
out[r.idx] = r.res
}
return out
}
func (p *Pool) Close() {
close(p.jobs)
p.wg.Wait()
}
13. Exercise 11: Strategy comparison using reflect.DeepEqual
Scenario
A test asserts that two strategies are "equal" — same configuration, same wrapped state. The naïve approach uses reflect.DeepEqual, which works for any type but is slow because it walks the entire object graph. For high-volume test runs (fuzzing, property tests) the comparison time dominates.
Before
package strategy
import "reflect"
type Strategy struct {
Name string
Weight float64
Tags []string
Configs map[string]string
}
func Equal(a, b *Strategy) bool {
return reflect.DeepEqual(a, b)
}
Benchmark
func BenchmarkReflectEqual(b *testing.B) {
b.ReportAllocs()
s1 := &Strategy{
Name: "primary",
Weight: 0.7,
Tags: []string{"a", "b", "c"},
Configs: map[string]string{"x": "1", "y": "2", "z": "3"},
}
s2 := &Strategy{
Name: "primary",
Weight: 0.7,
Tags: []string{"a", "b", "c"},
Configs: map[string]string{"x": "1", "y": "2", "z": "3"},
}
b.ResetTimer()
for i := 0; i < b.N; i++ {
_ = Equal(s1, s2)
}
}
reflect.DeepEqual boxes every intermediate value into reflect.Value to walk the tree.
After
Hand-rolled type-safe equality.
func Equal(a, b *Strategy) bool {
if a == b { return true }
if a == nil || b == nil { return false }
if a.Name != b.Name { return false }
if a.Weight != b.Weight { return false }
if len(a.Tags) != len(b.Tags) { return false }
for i := range a.Tags {
if a.Tags[i] != b.Tags[i] { return false }
}
if len(a.Configs) != len(b.Configs) { return false }
for k, v := range a.Configs {
// A key missing from b must not match an empty-string value in a.
if bv, ok := b.Configs[k]; !ok || bv != v { return false }
}
return true
}
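func equalSliceComparable[T comparable](a, b []T) bool {
	if len(a) != len(b) { return false }
	for i := range a {
		if a[i] != b[i] { return false }
	}
	return true
}
// equalMapComparable reports whether both maps hold the same key/value pairs.
func equalMapComparable[K, V comparable](a, b map[K]V) bool {
	if len(a) != len(b) { return false }
	for k, v := range a {
		if bv, ok := b[k]; !ok || bv != v { return false }
	}
	return true
}
func Equal(a, b *Strategy) bool {
	if a == b { return true }
	if a == nil || b == nil { return false }
	if a.Name != b.Name || a.Weight != b.Weight { return false }
	if !equalSliceComparable(a.Tags, b.Tags) { return false }
	return equalMapComparable(a.Configs, b.Configs)
}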
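On Go 1.21 or later the two helpers do not need to be written by hand: the standard `slices.Equal` and `maps.Equal` functions cover comparable element types. A minimal sketch under that version assumption:

```go
package main

import (
	"fmt"
	"maps"
	"slices"
)

type Strategy struct {
	Name    string
	Weight  float64
	Tags    []string
	Configs map[string]string
}

// Equal compares two strategies field by field using the standard
// generic helpers. Unlike reflect.DeepEqual, it treats a nil slice or
// map as equal to an empty one.
func Equal(a, b *Strategy) bool {
	if a == b {
		return true
	}
	if a == nil || b == nil {
		return false
	}
	return a.Name == b.Name &&
		a.Weight == b.Weight &&
		slices.Equal(a.Tags, b.Tags) &&
		maps.Equal(a.Configs, b.Configs)
}

func main() {
	a := &Strategy{Name: "primary", Weight: 0.7, Tags: []string{"a"}, Configs: map[string]string{"x": ""}}
	b := &Strategy{Name: "primary", Weight: 0.7, Tags: []string{"a"}, Configs: map[string]string{"y": "1"}}
	fmt.Println(Equal(a, a), Equal(a, b)) // true false
}
```

The second comparison is the case worth testing in any hand-rolled version: both maps have one entry, but the key sets differ and one value is the empty string.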
14. Exercise 12: JSON-loaded strategy re-reading config every call
Scenario
A strategy is configured from a JSON file. The naïve Apply re-opens, re-reads, and re-parses the file on every invocation. For a service that processes thousands of items per second, this is thousands of filesystem syscalls and JSON parses per second.
Before
package pricing
import (
"encoding/json"
"os"
)
type DiscountConfig struct {
Percent float64 `json:"percent"`
MaxOff int `json:"max_off"`
EligibleSKUs []string `json:"eligible_skus"`
}
type FileBackedDiscount struct {
Path string
}
func (d *FileBackedDiscount) Apply(item Item, subtotal int) int {
data, err := os.ReadFile(d.Path)
if err != nil { return 0 }
var cfg DiscountConfig
if err := json.Unmarshal(data, &cfg); err != nil { return 0 }
if !contains(cfg.EligibleSKUs, item.SKU) { return 0 }
off := int(float64(subtotal) * cfg.Percent / 100)
if off > cfg.MaxOff { off = cfg.MaxOff }
return off
}
func contains(s []string, x string) bool {
for _, v := range s { if v == x { return true } }
return false
}
Benchmark
func BenchmarkFileBackedDiscount(b *testing.B) {
b.ReportAllocs()
d := &FileBackedDiscount{Path: "/tmp/discount.json"}
item := Item{SKU: "ABC", Cents: 1000}
b.ResetTimer()
for i := 0; i < b.N; i++ {
_ = d.Apply(item, 5000)
}
}
48 microseconds per call. Most of it is os.ReadFile (syscall + buffer alloc) and json.Unmarshal (allocation per field, allocation per SKU string).
After
Parse once at construction. Cache the parsed config in the strategy. Use `atomic.Pointer` if hot-reload is needed.
package pricing
import (
"encoding/json"
"os"
"sync/atomic"
)
type DiscountConfig struct {
Percent float64
MaxOff int
EligibleSKUs map[string]struct{} // O(1) lookup
}
type FileBackedDiscount struct {
cfg atomic.Pointer[DiscountConfig]
}
func NewFileBackedDiscount(path string) (*FileBackedDiscount, error) {
d := &FileBackedDiscount{}
if err := d.reload(path); err != nil { return nil, err }
return d, nil
}
func (d *FileBackedDiscount) reload(path string) error {
data, err := os.ReadFile(path)
if err != nil { return err }
var raw struct {
Percent float64 `json:"percent"`
MaxOff int `json:"max_off"`
EligibleSKUs []string `json:"eligible_skus"`
}
if err := json.Unmarshal(data, &raw); err != nil { return err }
skus := make(map[string]struct{}, len(raw.EligibleSKUs))
for _, s := range raw.EligibleSKUs {
skus[s] = struct{}{}
}
d.cfg.Store(&DiscountConfig{
Percent: raw.Percent,
MaxOff: raw.MaxOff,
EligibleSKUs: skus,
})
return nil
}
func (d *FileBackedDiscount) Apply(item Item, subtotal int) int {
cfg := d.cfg.Load()
if cfg == nil { return 0 }
if _, ok := cfg.EligibleSKUs[item.SKU]; !ok { return 0 }
off := int(float64(subtotal) * cfg.Percent / 100)
if off > cfg.MaxOff { off = cfg.MaxOff }
return off
}
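// Watch polls the file on a fixed interval (needs the "time" import).
// A sketch: real code would use fsnotify and a stop channel or context.
func (d *FileBackedDiscount) Watch(path string, every time.Duration) {
	go func() {
		ticker := time.NewTicker(every)
		defer ticker.Stop()
		for range ticker.C {
			// Keep serving the last good config if a reload fails.
			_ = d.reload(path)
		}
	}()
}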
15. When NOT to optimize
The honest framing: most strategies should not be optimized. The pattern is cheap. The wins exist only when:
| Condition | Threshold to bother |
|---|---|
| Strategy call frequency | > 100k calls/sec sustained |
| Profile shows strategy method in top 5 % CPU | Yes |
| Allocation profile shows strategy closures/maps in top 10 | Yes |
| The "fix" doesn't change the public API or break correctness | Yes |
| You can write a regression test | Yes |
| The fix survives a Go version bump | Probably yes |
If you can't tick most of those, don't optimize. The strategies in crypto/cipher, database/sql drivers, compress/*, gRPC interceptors are all "naïve" by the standards of this file — they ship because the simple version is good enough.
Specific anti-patterns to avoid:
| Anti-pattern | Why it's bad |
|---|---|
| Switching everything to function strategies "for speed" | Loses interface segregation, optional capabilities, and named-type clarity for a sub-nanosecond win |
| Flattening every decorator chain | Loses composability and test isolation for ~1 µs that wasn't on the profile |
| Worker pool for non-blocking strategies | Slower than sequential and adds lifecycle complexity |
| Caching strategy lookup when the name changes per call | Cache misses match or exceed the original cost |
| Hand-rolling `Equal` when you compare twice per run | 70× faster on noise. Wasted code |
| Removing reflect everywhere | Reflect is fine for framework-level discovery; only replace it on the hot path |
| Premature `sync.Pool` for strategy objects | Below ~10k QPS the pool overhead matches the savings |
The default answer to "can we make this strategy faster?" is no, it's fine. The yes cases are narrow and benchmark-justified.
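The `reflect` row in the table above can be sketched as follows. This is a minimal illustration, not code from the earlier exercises: `Gateway`, `Refunder`, `stripe`, `cash`, `describe`, and `refund` are hypothetical names. Reflection stays in the boot-time discovery step, while the per-request path uses a plain type assertion.

```go
package main

import (
	"errors"
	"fmt"
	"reflect"
)

// Gateway is the core strategy; Refunder is an optional capability.
// All names in this sketch are hypothetical.
type Gateway interface{ Charge(cents int) error }
type Refunder interface{ Refund(cents int) error }

type stripe struct{} // implements Gateway and Refunder

func (stripe) Charge(cents int) error { return nil }
func (stripe) Refund(cents int) error { return nil }

type cash struct{} // implements Gateway only

func (cash) Charge(cents int) error { return nil }

var ErrNoRefund = errors.New("gateway does not support refunds")

// describe runs once per strategy at boot, so the cost of reflect is irrelevant here.
func describe(g Gateway) string {
	t := reflect.TypeOf(g)
	return fmt.Sprintf("%s (%d methods)", t.Name(), t.NumMethod())
}

// refund runs per request, so it checks the capability with a type assertion.
func refund(g Gateway, cents int) error {
	r, ok := g.(Refunder)
	if !ok {
		return ErrNoRefund
	}
	return r.Refund(cents)
}

func main() {
	for _, g := range []Gateway{stripe{}, cash{}} {
		fmt.Println(describe(g), "refund error:", refund(g, 500))
	}
}
```

The split is the point: only `refund` would ever show up in a profile, so that is the only place where replacing `reflect` could matter.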
16. The optimization checklist¶
Before shipping any optimization from this file:
- Baseline benchmark exists (the unoptimized strategy).
- Optimized benchmark shows ≥ 2× improvement OR saves ≥ 1 allocation per call.
- `pprof` confirms the optimization targets a real hot spot (top 5 % CPU or top 10 allocs).
- The new code passes the same tests as the old.
- `-gcflags=-m` shows no unexpected escapes (especially for closure changes).
- `-race` is clean (especially for cached registries, atomic-pointer configs, worker pools).
- Documentation explains the assumption the optimization makes ("strategies must be stateless to share", "reload via SIGHUP", "config must exist at boot").
- CI regression test (`benchstat`) compares against the baseline.
- Code review has signed off on the trade-off (especially for API-shape changes like generics or PGO).
- The "When NOT to do this" condition from the relevant exercise has been checked.
If any item is missing, the optimization isn't ready.
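The first two checklist items (a baseline benchmark, and an optimized benchmark that saves at least one allocation per call) might look like the sketch below. It assumes a hypothetical `Discount` function strategy and a `newPercentOff` constructor. It calls `testing.Benchmark` from `main` so it runs as a standalone program; in a real repository these would normally be `BenchmarkXxx` functions in a `_test.go` file compared with `benchstat`.

```go
package main

import (
	"fmt"
	"testing"
)

// Discount is a function strategy: subtotal in cents -> amount off in cents.
type Discount func(subtotal int) int

// newPercentOff builds a strategy that captures pct (hypothetical helper).
func newPercentOff(pct int) Discount {
	return func(subtotal int) int { return subtotal * pct / 100 }
}

// Package-level sinks keep results alive so the compiler cannot drop the work.
// Storing the closure in sinkFunc also forces it to escape to the heap.
var (
	sinkInt  int
	sinkFunc Discount
)

// benchPerCall is the baseline: a new closure is constructed on every call.
func benchPerCall(b *testing.B) {
	b.ReportAllocs()
	for i := 0; i < b.N; i++ {
		sinkFunc = newPercentOff(i % 50)
		sinkInt = sinkFunc(1000)
	}
}

// benchPrebuilt is the candidate fix: the closure is built once, outside the loop.
func benchPrebuilt(b *testing.B) {
	d := newPercentOff(10)
	b.ReportAllocs()
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		sinkInt = d(1000)
	}
}

// compare runs both benchmarks and returns their results.
func compare() (base, opt testing.BenchmarkResult) {
	return testing.Benchmark(benchPerCall), testing.Benchmark(benchPrebuilt)
}

func main() {
	base, opt := compare()
	fmt.Println("per-call:", base.String(), base.MemString())
	fmt.Println("prebuilt:", opt.String(), opt.MemString())
	fmt.Println("allocations saved per call:", base.AllocsPerOp()-opt.AllocsPerOp())
}
```

The absolute timings depend on the machine and Go version, so the useful signal is the difference between the two results, not either number alone.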
17. Summary¶
The interface-strategy in Go is already fast: ~1.6 ns and zero allocations per call. Most optimizations in this file save 10-1000 ns and 1-5 allocations. That matters at 100k QPS. It does not matter at 100 QPS.
The wins worth shipping cluster in seven areas:
- Resolve registry lookups once, not per call (Exercise 3) — 15× faster, removes hash + lock from the hot path. Pure win when the strategy name is stable.
- Replace `reflect` capability checks with type assertion (Exercise 4) — 17× faster, no allocation. Pure win when the optional interface is known at compile time.
- Pre-build strategy slices at boot (Exercise 8) — 5× faster, zero alloc. Pure win when the strategy list is stable.
- Cache parsed config in the strategy (Exercise 12) — 12,000× faster when the strategy was re-reading a file. Pure win.
- Hoist closure captures outside the hot loop (Exercise 1) — moves the switch from per-element to per-call. Real win for tight loops.
- Replace `reflect.DeepEqual` with hand-rolled equality (Exercise 11) — 70× faster on millions of compares; not worth it for handful-of-test situations.
- Reuse mock strategies in benchmarks (Exercise 5) — removes setup noise. The benchmark now measures the code being tested.
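As a reminder of what the first item in the list above looks like in code, here is a minimal sketch of resolving a registry lookup once instead of per call. The `Registry`, `Processor`, and `stripeGateway` types are illustrative assumptions, not the exact code from Exercise 3, and the `lookups` counter exists only to make the difference observable.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// Gateway is the strategy interface (hypothetical shape for this sketch).
type Gateway interface{ Charge(cents int) string }

type stripeGateway struct{}

func (stripeGateway) Charge(cents int) string { return fmt.Sprintf("stripe charged %d", cents) }

// Registry maps names to strategies behind a lock.
type Registry struct {
	mu      sync.RWMutex
	byName  map[string]Gateway
	lookups atomic.Int64 // instrumentation only
}

func NewRegistry() *Registry { return &Registry{byName: make(map[string]Gateway)} }

func (r *Registry) Register(name string, g Gateway) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.byName[name] = g
}

// Get pays for a lock and a map hash on every call.
func (r *Registry) Get(name string) (Gateway, error) {
	r.lookups.Add(1)
	r.mu.RLock()
	defer r.mu.RUnlock()
	g, ok := r.byName[name]
	if !ok {
		return nil, fmt.Errorf("unknown gateway %q", name)
	}
	return g, nil
}

// Processor resolves its strategy once, at construction time.
type Processor struct{ gw Gateway }

func NewProcessor(r *Registry, name string) (*Processor, error) {
	gw, err := r.Get(name)
	if err != nil {
		return nil, err
	}
	return &Processor{gw: gw}, nil
}

// Handle is the hot path: one interface call, no registry access.
func (p *Processor) Handle(cents int) string { return p.gw.Charge(cents) }

func main() {
	reg := NewRegistry()
	reg.Register("stripe", stripeGateway{})
	p, err := NewProcessor(reg, "stripe")
	if err != nil {
		panic(err)
	}
	for i := 1; i <= 1000; i++ {
		p.Handle(i)
	}
	fmt.Println("registry lookups:", reg.lookups.Load()) // prints "registry lookups: 1"
}
```

This only holds when the strategy name is stable for the lifetime of the `Processor`; if the name can change per request, the lookup has to stay on the request path.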
The wins that don't always pay off:
- Generics for devirtualization (Exercise 2) — only when the concrete type is statically known and isn't a pointer.
- PGO devirtualization (Exercise 2, 7) — requires a profile-build workflow your CI may not support.
- Flattening decorator chains (Exercise 7) — kills composability; only ship when the chain is truly hot and the pattern is stable.
- Worker pools for "concurrent" strategies (Exercise 10) — wins only when work is I/O-bound or items are numerous and CPU-bound.
- Wide-to-narrow interface segregation (Exercise 6) — small perf win, mainly a correctness/API improvement.
- Direct setters over closure-list strategies (Exercise 9) — loses framework-style extensibility.
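To make the generics item in the list above concrete, the sketch below shows the same loop written against the interface and against a type parameter. The names (`Discounter`, `tenPercent`, `totalIface`, `totalGeneric`) are hypothetical. It only demonstrates the API shape: whether the generic version is actually devirtualized depends on the compiler version and on the concrete type being a non-pointer value type, so it needs a benchmark before it is worth shipping.

```go
package main

import "fmt"

// Discounter is the strategy interface (hypothetical for this sketch).
type Discounter interface{ Apply(subtotal int) int }

// tenPercent is a value type with a value receiver, the case the list above
// describes as the one where generics can help.
type tenPercent struct{}

func (tenPercent) Apply(subtotal int) int { return subtotal / 10 }

// totalIface dispatches through the interface for every element.
func totalIface(d Discounter, subtotals []int) int {
	sum := 0
	for _, s := range subtotals {
		sum += d.Apply(s)
	}
	return sum
}

// totalGeneric receives the concrete type as a type parameter, which gives the
// compiler the chance (not a guarantee) to call Apply directly.
func totalGeneric[D Discounter](d D, subtotals []int) int {
	sum := 0
	for _, s := range subtotals {
		sum += d.Apply(s)
	}
	return sum
}

func main() {
	subtotals := []int{1000, 2500, 400}
	fmt.Println(totalIface(tenPercent{}, subtotals), totalGeneric(tenPercent{}, subtotals))
}
```

Both functions return the same result for the same input, so the choice between them is purely a performance and API-shape trade-off.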
Always benchmark. Always check -race. Always confirm the optimization survives a Go version bump. Most production codebases need none of these optimizations; the pattern is fine as written in junior.md and middle.md.
Further reading¶
- Go 1.21+ PGO: https://go.dev/doc/pgo
- `sync.Map`: https://pkg.go.dev/sync#Map
- `atomic.Pointer[T]`: https://pkg.go.dev/sync/atomic#Pointer
- Escape analysis: https://github.com/golang/go/wiki/CompilerOptimizations
- `benchstat`: https://pkg.go.dev/golang.org/x/perf/cmd/benchstat
- `go-cmp` (typed equality): https://pkg.go.dev/github.com/google/go-cmp/cmp
- Sibling: middle.md — variant choices
- Sibling: junior.md — the baseline shape
- Related: ../02-builder-pattern/optimize.md — same file shape for builder
- Related: ../04-decorator-pattern/ — when decorators are themselves the hot path
- Inspiration (zero-allocation strategies): https://github.com/valyala/fasthttp
- Inspiration (interface segregation): `io.Reader`, `io.Writer`, `io.Closer` in stdlib