Factory Pattern — Optimization
1. How to use this file
Twelve scenarios where factory code is slower than it needs to be. Each:
- Scenario — the inefficiency.
- Before — measured-slow code with realistic benchmark numbers.
- After (collapsible) — optimised version with benchmark comparison.
- Why faster — what changed at the runtime level.
- Trade-offs — what you lose by optimising.
- When NOT to do this — the cases where the optimisation isn't worth it.
The honest answer for most factory "optimisations": they don't matter. A factory call typically adds 5-50 ns of overhead, so unless you're constructing more than 100k objects per second, the dispatch cost is below the noise. Benchmarks here are illustrative: the qualitative direction (allocs vs no allocs) matters more than absolute ns/op. Test environment: Go 1.22, amd64, GOMAXPROCS=8.
2. Table of Contents
- How to use this file
- Table of Contents
- Exercise 1 — Factory called per-request when the result is cacheable
- Exercise 2 — Registry lookup per call instead of resolve-once-at-boot
- Exercise 3 — Lazy init via mutex instead of sync.Once
- Exercise 4 — Factory returning interface forces heap allocation
- Exercise 5 — Reflect-based factory replaced by type-specific factories
- Exercise 6 — Factory recompiling regex per call
- Exercise 7 — Map-based dispatch in factory replaced by switch for small N
- Exercise 8 — PGO devirtualization for factory call sites
- Exercise 9 — Factory using fmt.Sprintf in hot path
- Exercise 10 — Generic factory with closure list replaced by direct field setters
- Exercise 11 — Factory pool with sync.Pool for transient objects
- Exercise 12 — Pre-warmed factory cache vs cold-start
- When NOT to optimize
- Summary
Exercise 1 — Factory called per-request when the result is cacheable
Scenario: A handler calls NewParser(cfg) on every request, but cfg is identical across requests. The constructor does non-trivial work (validation, schema build, child allocation).
Before:
func NewParser(cfg Config) *Parser {
p := &Parser{cfg: cfg}
p.validate() // checks 20 fields
p.schema = buildSchema() // allocates ~512 B
return p
}
func handle(w http.ResponseWriter, r *http.Request) {
p := NewParser(defaultCfg) // same cfg every request
p.Parse(r.Body)
}
Benchmark:
At 50k QPS, that's ~45 MB/s of allocation pressure (roughly 900 B per request) for an object that never changes.
After
Cache the parser at package init since the config is static:
var defaultParser = NewParser(defaultCfg)
func handle(w http.ResponseWriter, r *http.Request) {
defaultParser.Parse(r.Body)
}
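One way to check this kind of change is to count allocations per call instead of relying on timings. The following is a minimal sketch, not the page's original benchmark: the Config field, the 512-byte schema, and the sink variable are hypothetical stand-ins, and testing.AllocsPerRun is the only real API it leans on.

```go
package main

import (
	"fmt"
	"testing"
)

// Config and Parser are hypothetical stand-ins for the types in the text.
type Config struct{ MaxDepth int }

type Parser struct {
	cfg    Config
	schema []byte
}

// NewParser mimics a constructor that allocates on every call.
func NewParser(cfg Config) *Parser {
	return &Parser{cfg: cfg, schema: make([]byte, 512)}
}

var defaultCfg = Config{MaxDepth: 8}

// defaultParser is built once at package init and then shared.
var defaultParser = NewParser(defaultCfg)

// sink keeps results reachable so the compiler cannot drop the allocations.
var sink *Parser

// allocsPerCall reports average heap allocations for each strategy.
func allocsPerCall() (perRequest, cached float64) {
	perRequest = testing.AllocsPerRun(1000, func() { sink = NewParser(defaultCfg) })
	cached = testing.AllocsPerRun(1000, func() { sink = defaultParser })
	return perRequest, cached
}

func main() {
	perRequest, cached := allocsPerCall()
	fmt.Printf("per-request: %.0f allocs/op, cached: %.0f allocs/op\n", perRequest, cached)
}
```

Sharing one parser is only safe if Parse does not mutate parser state, or guards any state it does mutate; that is the main trade-off to verify before caching.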
Exercise 2 — Registry lookup per call instead of resolve-once-at-boot
Scenario: A factory dispatches by string key from a registry. Each request hashes the key and traverses the bucket.
Before:
var registry = map[string]func() Driver{
"postgres": newPostgres,
"mysql": newMySQL,
"sqlite": newSQLite,
}
func Open(name string) Driver {
fn, ok := registry[name]
if !ok {
panic("unknown driver: " + name)
}
return fn()
}
// Hot path:
func query() {
d := Open("postgres") // string hash + map lookup every call
d.Exec(...)
}
The cost: hash the 8-byte string, probe the map, return the function pointer, call it. The string "postgres" is constant — the lookup result will never change.
After
Resolve once at boot:
var pgDriver = registry["postgres"] // resolved at init
func query() {
d := pgDriver()
d.Exec(...)
}
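A plain map index at init yields a nil function when the key is missing, and that only surfaces as a panic on the first call. A possible refinement, sketched here with hypothetical helper names (resolve, mustResolve, newPG) and a stand-in Driver type, is to fail at startup instead:

```go
package main

import "fmt"

// Driver and pgDriverImpl are hypothetical stand-ins for the text's types.
type Driver interface{ Name() string }

type pgDriverImpl struct{}

func (pgDriverImpl) Name() string { return "postgres" }

var registry = map[string]func() Driver{
	"postgres": func() Driver { return pgDriverImpl{} },
}

// resolve looks a constructor up once and reports unknown names as an error.
func resolve(name string) (func() Driver, error) {
	fn, ok := registry[name]
	if !ok {
		return nil, fmt.Errorf("unknown driver: %q", name)
	}
	return fn, nil
}

// mustResolve is meant for package-level vars: a bad name stops the program
// during initialization rather than on the first request.
func mustResolve(name string) func() Driver {
	fn, err := resolve(name)
	if err != nil {
		panic(err)
	}
	return fn
}

// newPG is resolved once; the hot path pays only for the call itself.
var newPG = mustResolve("postgres")

func main() {
	fmt.Println(newPG().Name())
}
```

This keeps the registry for code that genuinely needs dynamic names, while call sites with a constant name skip the lookup entirely.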
Exercise 3 — Lazy init via mutex instead of sync.Once
Scenario: A factory creates an expensive singleton lazily, guarded by a mutex.
Before:
type ClientFactory struct {
mu sync.Mutex
client *Client // expensive: TLS handshake, DNS, pool warm-up
}
func (f *ClientFactory) Get() *Client {
f.mu.Lock()
defer f.mu.Unlock()
if f.client == nil {
f.client = newClient() // 50 ms first call
}
return f.client
}
Every call acquires the mutex, even after client is set. Under contention from many goroutines, the mutex becomes a serialization point.
After
Use sync.Once:
type ClientFactory struct {
once sync.Once
client *Client
}
func (f *ClientFactory) Get() *Client {
f.once.Do(func() { f.client = newClient() })
return f.client
}
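To see that the sync.Once version still constructs the client exactly once under contention, a small harness can count constructor calls. This is a sketch under assumptions: Client, the builds counter, and hammer are hypothetical, and newClient is a cheap stand-in for the expensive constructor.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// Client is a hypothetical stand-in for an expensive-to-build object.
type Client struct{ id int64 }

// builds counts how many times the constructor actually runs.
var builds atomic.Int64

func newClient() *Client {
	return &Client{id: builds.Add(1)}
}

type ClientFactory struct {
	once   sync.Once
	client *Client
}

// Get builds the client on first use; later calls take sync.Once's fast path.
func (f *ClientFactory) Get() *Client {
	f.once.Do(func() { f.client = newClient() })
	return f.client
}

// hammer calls Get from n goroutines and returns how many clients were built.
func hammer(n int) int64 {
	builds.Store(0)
	var f ClientFactory
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			_ = f.Get()
		}()
	}
	wg.Wait()
	return builds.Load()
}

func main() {
	fmt.Println("constructions:", hammer(64))
}
```

One design point to keep in mind: sync.Once never re-runs its function, so a constructor that can fail needs its error stored next to the client and returned on every call.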
Exercise 4 — Factory returning interface forces heap allocation
Scenario: A factory function returns an interface. Unless the compiler can inline the factory and devirtualize the call, escape analysis has to put the value on the heap because it cannot prove that the value behind the interface doesn't escape.
Before:
type Encoder interface {
Encode(v any) ([]byte, error)
}
type jsonEncoder struct{ buf []byte }
func (j *jsonEncoder) Encode(v any) ([]byte, error) { /* ... */ }
func NewEncoder() Encoder {
return &jsonEncoder{} // escapes to heap
}
func encodeBatch(items []Item) {
for _, item := range items {
e := NewEncoder() // heap alloc per iteration
_, _ = e.Encode(item)
}
}
Run escape analysis:
go build -gcflags='-m -m' . 2>&1 | grep -E '(escapes|NewEncoder)'
# Output: ./main.go:18:9: &jsonEncoder{} escapes to heap
After
Return the concrete type and let escape analysis stack-allocate:
func NewEncoder() *jsonEncoder {
return &jsonEncoder{}
}
func encodeBatch(items []Item) {
for _, item := range items {
e := NewEncoder() // may stack-allocate now
_, _ = e.Encode(item)
}
}
// Alternative: if one encoder can be reused, hoist a single value out of the loop:
func encodeBatch(items []Item) {
enc := jsonEncoder{} // stack value
for _, item := range items {
_, _ = enc.Encode(item)
}
}
Exercise 5 — Reflect-based factory replaced by type-specific factories
Scenario: A "generic" factory uses reflection to construct objects from a type registry.
Before:
var typeRegistry = map[string]reflect.Type{
"user": reflect.TypeOf(User{}),
"order": reflect.TypeOf(Order{}),
"product": reflect.TypeOf(Product{}),
}
func New(name string) any {
t, ok := typeRegistry[name]
if !ok {
return nil
}
return reflect.New(t).Interface()
}
func handle() {
u := New("user").(*User)
u.Name = "alice"
}
Reflection is slow: type descriptor lookup, allocation via the runtime, boxing into any, then a type assertion on the way out.
After
Use type-specific factories with a typed dispatcher:
func newUser() *User { return &User{} }
func newOrder() *Order { return &Order{} }
func newProduct() *Product { return &Product{} }
// If you need string dispatch, type-switch the result:
type Entity interface{ kind() string }
var registry = map[string]func() Entity{
"user": func() Entity { return newUser() },
"order": func() Entity { return newOrder() },
"product": func() Entity { return newProduct() },
}
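The "type-switch the result" step can be sketched end to end as follows. The field names, the kind implementations, and the describe helper are assumptions added so the example runs; only the registry shape comes from the text.

```go
package main

import "fmt"

type User struct{ Name string }
type Order struct{ ID int }

// Entity's kind method is unexported, so only this package can add entities.
type Entity interface{ kind() string }

func (*User) kind() string  { return "user" }
func (*Order) kind() string { return "order" }

func newUser() *User   { return &User{} }
func newOrder() *Order { return &Order{} }

var registry = map[string]func() Entity{
	"user":  func() Entity { return newUser() },
	"order": func() Entity { return newOrder() },
}

// New returns nil for unknown names, mirroring the reflect-based version.
func New(name string) Entity {
	fn, ok := registry[name]
	if !ok {
		return nil
	}
	return fn()
}

// describe shows the type switch a caller would use after string dispatch.
func describe(e Entity) string {
	switch v := e.(type) {
	case *User:
		v.Name = "alice"
		return "user " + v.Name
	case *Order:
		v.ID = 42
		return fmt.Sprintf("order %d", v.ID)
	default:
		return "unknown"
	}
}

func main() {
	fmt.Println(describe(New("user")))
	fmt.Println(describe(New("order")))
}
```

Callers that know the type statically can skip the registry and call newUser or newOrder directly, which avoids both the map lookup and the type switch.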
Exercise 6 — Factory recompiling regex per call
Scenario: A factory creates a validator that compiles a regex on every construction.
Before:
type EmailValidator struct {
re *regexp.Regexp
}
func NewEmailValidator() *EmailValidator {
return &EmailValidator{
re: regexp.MustCompile(`^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$`),
}
}
// Hot path: handler creates a new validator per request
func handle(email string) bool {
v := NewEmailValidator()
return v.re.MatchString(email)
}
Regex compilation is the dominant cost: parse the pattern, compile it to an NFA program, allocate the program.
After
Compile once as a package var, share the validator (or just the regex):
var emailRE = regexp.MustCompile(`^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$`)
type EmailValidator struct{} // stateless
func (EmailValidator) Validate(email string) bool {
return emailRE.MatchString(email)
}
// Or skip the wrapper entirely:
func ValidateEmail(s string) bool { return emailRE.MatchString(s) }
// If patterns are only known at runtime, cache compiled regexes by pattern:
var regexCache sync.Map // map[string]*regexp.Regexp
func compileCached(pattern string) (*regexp.Regexp, error) {
if v, ok := regexCache.Load(pattern); ok {
return v.(*regexp.Regexp), nil
}
re, err := regexp.Compile(pattern)
if err != nil {
return nil, err
}
actual, _ := regexCache.LoadOrStore(pattern, re)
return actual.(*regexp.Regexp), nil
}
Exercise 7 — Map-based dispatch in factory replaced by switch for small N
Scenario: Factory uses a map[string]func() T to dispatch. The number of keys is small (3-6) and fixed.
Before:
var shapeFactories = map[string]func() Shape{
"circle": func() Shape { return &Circle{} },
"square": func() Shape { return &Square{} },
"triangle": func() Shape { return &Triangle{} },
}
func NewShape(name string) Shape {
fn, ok := shapeFactories[name]
if !ok {
return nil
}
return fn()
}
Map cost is hash + bucket walk + closure call.
After
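A minimal sketch of a switch-based NewShape, assuming the same three shapes as the map version. The Sides method is hypothetical and exists only to make the example runnable.

```go
package main

import "fmt"

// Shape and its implementations are hypothetical stand-ins for the text's types.
type Shape interface{ Sides() int }

type Circle struct{}
type Square struct{}
type Triangle struct{}

func (*Circle) Sides() int   { return 0 }
func (*Square) Sides() int   { return 4 }
func (*Triangle) Sides() int { return 3 }

// NewShape dispatches with a switch; unknown names return nil, as in the
// map-based version.
func NewShape(name string) Shape {
	switch name {
	case "circle":
		return &Circle{}
	case "square":
		return &Square{}
	case "triangle":
		return &Triangle{}
	default:
		return nil
	}
}

func main() {
	for _, name := range []string{"circle", "square", "triangle"} {
		fmt.Println(name, NewShape(name).Sides())
	}
}
```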
Switch for small, fixed N: ~6× speedup for small N.
Why faster: String hashing is ~20-30 ns. A switch on string compares against each case label directly; for ≤8 cases, that's a few branches. The compiler may also generate jump tables or perfect-hash dispatch for larger switches.
Trade-offs: Adding a new shape requires editing the function (not registering at runtime). Less dynamic. You also can't iterate the supported names without a separate list.
When NOT to do this: When the set of names is large (≥20) or determined at runtime by user code (plugin registration). The map handles those well. Also, when extensibility matters more than 50 ns/call.
Rule of thumb:
| N (number of cases) | Best dispatch |
|---|---|
| 1-8 | switch |
| 9-30 | either; profile |
| 30+ | map |
Exercise 8 — PGO devirtualization for factory call sites
Scenario: A factory returns an interface that's called frequently. The actual concrete type in production is almost always the same one.
Before:
type Codec interface {
Encode(v any) ([]byte, error)
}
func NewCodec(name string) Codec {
switch name {
case "json": return &jsonCodec{}
case "protobuf": return &protoCodec{}
case "msgpack": return &msgpackCodec{}
}
return nil
}
// In production, 95% of calls use "json".
func handle(req Request) {
c := NewCodec(req.Format) // almost always &jsonCodec{}
c.Encode(req.Payload)
}
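c.Encode is an interface dispatch: an indirect call through the itab's method table. The CPU can often predict the target, but the compiler cannot inline through the indirection, so any optimisation that depends on seeing the callee is lost.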
After (with PGO)
Collect a profile from production-representative load:
# 1. Run under realistic load and collect a CPU profile (PGO uses ordinary pprof CPU profiles; no instrumented build is needed)
go test -bench=. -cpuprofile=cpu.pprof -benchtime=10s
# Or capture from a running service (quote the URL so the shell leaves the ? alone):
curl -o cpu.pprof 'http://localhost:6060/debug/pprof/profile?seconds=30'
# 2. Move profile to default location and rebuild with PGO
mv cpu.pprof default.pgo
go build -pgo=auto . # picks up default.pgo from the main package directory
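Conceptually, profile-guided devirtualization rewrites a hot interface call into a type check plus a direct call. The sketch below writes that transformation by hand to show the shape of it. It is an illustration, not the compiler's actual output, and textCodec and encodeHot are hypothetical names.

```go
package main

import (
	"encoding/json"
	"fmt"
)

type Codec interface {
	Encode(v any) ([]byte, error)
}

type jsonCodec struct{}

func (*jsonCodec) Encode(v any) ([]byte, error) { return json.Marshal(v) }

// textCodec is a hypothetical second implementation for the cold path.
type textCodec struct{}

func (*textCodec) Encode(v any) ([]byte, error) { return []byte(fmt.Sprint(v)), nil }

// encodeHot spells out roughly what devirtualization does at a hot call
// site: test for the dominant concrete type, call it directly (which the
// compiler can then consider for inlining), and fall back to the ordinary
// interface call for everything else.
func encodeHot(c Codec, v any) ([]byte, error) {
	if jc, ok := c.(*jsonCodec); ok {
		return jc.Encode(v) // direct call on the hot type
	}
	return c.Encode(v) // indirect call on the cold path
}

func main() {
	hot, _ := encodeHot(&jsonCodec{}, map[string]int{"n": 1})
	cold, _ := encodeHot(&textCodec{}, 7)
	fmt.Println(string(hot), string(cold))
}
```

With PGO the compiler picks the hot type from the profile, so the source stays a plain interface call and the fast path follows the workload if it shifts between builds.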
Exercise 9 — Factory using fmt.Sprintf in hot path
Scenario: A factory builds object identifiers or labels using fmt.Sprintf on every construction.
Before:
type Worker struct {
id string
name string
}
func NewWorker(pool string, n int) *Worker {
return &Worker{
id: fmt.Sprintf("%s-%d", pool, n),
name: fmt.Sprintf("worker-%s-%05d", pool, n),
}
}
func spawn(pool string, count int) []*Worker {
workers := make([]*Worker, count)
for i := 0; i < count; i++ {
workers[i] = NewWorker(pool, i)
}
return workers
}
fmt.Sprintf parses the format string, dispatches via reflection per verb, allocates the result buffer.
After
Use strings.Builder + strconv for fast, allocation-controlled formatting:
func NewWorker(pool string, n int) *Worker {
var idBuf, nameBuf strings.Builder
idBuf.Grow(len(pool) + 12)
idBuf.WriteString(pool)
idBuf.WriteByte('-')
idBuf.WriteString(strconv.Itoa(n))
nameBuf.Grow(len(pool) + 14)
nameBuf.WriteString("worker-")
nameBuf.WriteString(pool)
nameBuf.WriteByte('-')
// zero-pad to 5 digits (assumes n >= 0; %05d puts the sign before the padding)
s := strconv.Itoa(n)
for k := len(s); k < 5; k++ {
nameBuf.WriteByte('0')
}
nameBuf.WriteString(s)
return &Worker{id: idBuf.String(), name: nameBuf.String()}
}
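Hand-rolled formatting is easy to get subtly wrong, so it helps to check it against the fmt.Sprintf format it replaces. The sketch below is one possible variant that builds the name in a single byte slice with strconv.AppendInt. The workerName and matchesSprintf helpers are hypothetical, and the code assumes n is non-negative.

```go
package main

import (
	"fmt"
	"strconv"
)

// workerName builds "worker-<pool>-<n zero-padded to 5>" in a single buffer.
// It assumes n >= 0; fmt's %05d places the sign before the padding.
func workerName(pool string, n int) string {
	buf := make([]byte, 0, len(pool)+16)
	buf = append(buf, "worker-"...)
	buf = append(buf, pool...)
	buf = append(buf, '-')

	// Format the digits into a fixed-size scratch array.
	var scratch [20]byte
	digits := strconv.AppendInt(scratch[:0], int64(n), 10)
	for k := len(digits); k < 5; k++ {
		buf = append(buf, '0')
	}
	buf = append(buf, digits...)
	return string(buf)
}

// matchesSprintf compares the hand-rolled output with the original format.
func matchesSprintf(pool string, n int) bool {
	return workerName(pool, n) == fmt.Sprintf("worker-%s-%05d", pool, n)
}

func main() {
	for _, n := range []int{0, 7, 42, 99999, 123456} {
		fmt.Println(workerName("io", n), matchesSprintf("io", n))
	}
}
```

A check like matchesSprintf fits naturally in a table-driven test that stays in the repository after the optimisation lands.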
Exercise 10 — Generic factory with closure list replaced by direct field setters
Scenario: A factory uses the functional-options pattern with []Option (each Option is a closure). Constructing involves walking the slice and invoking each closure.
Before:
type Server struct {
addr string
timeout time.Duration
maxConn int
tlsCfg *tls.Config
}
type Option func(*Server)
func WithAddr(a string) Option { return func(s *Server) { s.addr = a } }
func WithTimeout(d time.Duration) Option { return func(s *Server) { s.timeout = d } }
func WithMaxConn(n int) Option { return func(s *Server) { s.maxConn = n } }
func WithTLS(c *tls.Config) Option { return func(s *Server) { s.tlsCfg = c } }
func NewServer(opts ...Option) *Server {
s := &Server{timeout: 30 * time.Second, maxConn: 1000}
for _, opt := range opts {
opt(s)
}
return s
}
// Caller in a hot path (e.g. per-tenant server spin-up)
func spinUp() *Server {
return NewServer(
WithAddr(":8080"),
WithTimeout(5*time.Second),
WithMaxConn(500),
WithTLS(myTLS),
)
}
Each Option is a closure value — that's a heap allocation for the captured args. The variadic opts ...Option is a slice — another allocation. Then iterating the slice and calling each.
After
If construction is on a hot path and the option set is fixed, use a config struct:
type ServerConfig struct {
Addr string
Timeout time.Duration
MaxConn int
TLSCfg *tls.Config
}
func NewServer(cfg ServerConfig) *Server {
if cfg.Timeout == 0 {
cfg.Timeout = 30 * time.Second
}
if cfg.MaxConn == 0 {
cfg.MaxConn = 1000
}
return &Server{
addr: cfg.Addr,
timeout: cfg.Timeout,
maxConn: cfg.MaxConn,
tlsCfg: cfg.TLSCfg,
}
}
func spinUp() *Server {
return NewServer(ServerConfig{
Addr: ":8080",
Timeout: 5 * time.Second,
MaxConn: 500,
TLSCfg: myTLS,
})
}
Exercise 11 — Factory pool with sync.Pool for transient objects
Scenario: A factory creates short-lived objects (request scratch buffers, parsers, encoders) and discards them. GC pressure rises.
Before:
type RequestCtx struct {
buf []byte
headers map[string]string
fields []Field
}
func NewRequestCtx() *RequestCtx {
return &RequestCtx{
buf: make([]byte, 0, 4096),
headers: make(map[string]string, 16),
fields: make([]Field, 0, 32),
}
}
func handle(w http.ResponseWriter, r *http.Request) {
ctx := NewRequestCtx()
process(ctx, r)
// ctx discarded, eligible for GC
}
At 50k QPS:
That's ~250 MB/s of allocation. GC pauses become visible in p99 latency.
After
Pool the object:
var ctxPool = sync.Pool{
New: func() any {
return &RequestCtx{
buf: make([]byte, 0, 4096),
headers: make(map[string]string, 16),
fields: make([]Field, 0, 32),
}
},
}
func acquireCtx() *RequestCtx {
return ctxPool.Get().(*RequestCtx)
}
func releaseCtx(c *RequestCtx) {
// Reset before returning to pool
c.buf = c.buf[:0]
for k := range c.headers {
delete(c.headers, k)
}
c.fields = c.fields[:0]
ctxPool.Put(c)
}
func handle(w http.ResponseWriter, r *http.Request) {
ctx := acquireCtx()
defer releaseCtx(ctx)
process(ctx, r)
}
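A common refinement of this pattern is to refuse to pool objects whose buffers grew unusually large, so a single big request does not pin memory in the pool. The following is a sketch under assumptions: maxPooledBuf is a hypothetical threshold, RequestCtx is trimmed to two fields, and releaseCtx returns a bool only so the behaviour is observable.

```go
package main

import (
	"fmt"
	"sync"
)

// maxPooledBuf is a hypothetical cap; tune it to the typical request size.
const maxPooledBuf = 64 << 10

type RequestCtx struct {
	buf     []byte
	headers map[string]string
}

var ctxPool = sync.Pool{
	New: func() any {
		return &RequestCtx{
			buf:     make([]byte, 0, 4096),
			headers: make(map[string]string, 16),
		}
	},
}

func acquireCtx() *RequestCtx { return ctxPool.Get().(*RequestCtx) }

// releaseCtx resets the context and reports whether it went back to the pool.
// Contexts whose buffer grew past maxPooledBuf are dropped so that one large
// request does not keep a large buffer alive in the pool.
func releaseCtx(c *RequestCtx) bool {
	if cap(c.buf) > maxPooledBuf {
		return false // let the GC reclaim the oversized buffer
	}
	c.buf = c.buf[:0]
	clear(c.headers) // Go 1.21+; use a delete loop on older toolchains
	ctxPool.Put(c)
	return true
}

func main() {
	small := acquireCtx()
	small.buf = append(small.buf, "hello"...)
	small.headers["k"] = "v"
	fmt.Println("small pooled:", releaseCtx(small))

	big := acquireCtx()
	big.buf = make([]byte, 0, 1<<20)
	fmt.Println("big pooled:", releaseCtx(big))
}
```

The reset step matters as much as the pooling: any field left populated leaks data from one request into the next.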
Exercise 12 — Pre-warmed factory cache vs cold-start
Scenario: A factory caches expensive constructions in a map, but the first request after restart pays the full cost. With many keys, the warm-up tail is long.
Before:
type SchemaFactory struct {
mu sync.RWMutex
cache map[string]*Schema
}
func (f *SchemaFactory) Get(name string) *Schema {
f.mu.RLock()
if s, ok := f.cache[name]; ok {
f.mu.RUnlock()
return s
}
f.mu.RUnlock()
f.mu.Lock()
defer f.mu.Unlock()
if s, ok := f.cache[name]; ok {
return s
}
s := buildSchema(name) // 5 ms each
f.cache[name] = s
return s
}
Cold start, first 100 requests:
Steady state, after warm-up:
The cold-start tail kills SLOs after deploys, scale-out events, or container restarts.
After
Pre-warm at startup, before serving traffic:
func (f *SchemaFactory) Prewarm(ctx context.Context, names []string) error {
var (
wg sync.WaitGroup
sem = make(chan struct{}, runtime.NumCPU())
errs []error
mu sync.Mutex
)
for _, name := range names {
wg.Add(1)
sem <- struct{}{}
go func(n string) {
defer wg.Done()
defer func() { <-sem }()
if _, err := f.getOrBuild(ctx, n); err != nil {
mu.Lock()
errs = append(errs, err)
mu.Unlock()
}
}(name)
}
wg.Wait()
if len(errs) > 0 {
return errors.Join(errs...)
}
return nil
}
// In main, before listening:
func main() {
f := &SchemaFactory{cache: make(map[string]*Schema)}
names := loadKnownSchemaNames() // from config, DB, or last-seen-keys log
if err := f.Prewarm(context.Background(), names); err != nil {
log.Fatalf("prewarm: %v", err)
}
log.Fatal(http.ListenAndServe(":8080", handler(f)))
}
// In a Kubernetes liveness/readiness handler:
func readinessHandler(ready *atomic.Bool) http.HandlerFunc {
return func(w http.ResponseWriter, r *http.Request) {
if !ready.Load() {
http.Error(w, "not ready", http.StatusServiceUnavailable)
return
}
w.WriteHeader(http.StatusOK)
}
}
// Alternative: request-coalescing for cold misses
import "golang.org/x/sync/singleflight"
type SchemaFactory struct {
g singleflight.Group
cache sync.Map // name -> *Schema
}
func (f *SchemaFactory) Get(name string) (*Schema, error) {
if v, ok := f.cache.Load(name); ok {
return v.(*Schema), nil
}
v, err, _ := f.g.Do(name, func() (any, error) {
s := buildSchema(name)
f.cache.Store(name, s)
return s, nil
})
if err != nil {
return nil, err
}
return v.(*Schema), nil
}
When NOT to optimize
Most factory-related optimisations are micro-optimisations. They matter only if:
- Profiling shows the factory is a bottleneck. Run go tool pprof and verify before optimising.
- The QPS is high enough to matter. A 100 ns saving × 10 QPS = 1 microsecond/sec. Irrelevant.
- The clarity loss is acceptable. Most optimisations make code harder to read.
The right order: measure → identify hot paths → optimise selectively → measure again.
go test -bench=. -cpuprofile=cpu.pprof -memprofile=mem.pprof
go tool pprof -top -cum cpu.pprof
go test -bench=. -count=10 > before.txt # apply change, then re-run
benchstat before.txt after.txt
Premature optimisation of factories is a classic time-waster. The pattern is already efficient, and Go's compiler handles the common cases well. The exceptions that are almost always worth it without measurement:
- sync.Once for lazy init (cheaper than a mutex; no downside).
- Pre-compile regexes / templates at package init.
- Resolve registry lookups at construction time, not per-call (Exercise 2).
- var _ Iface = (*ConcreteFactory)(nil) compile-time check.
Everything else: measure first.
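The compile-time interface check from the list above can be sketched as follows. Driver, Factory, and PostgresFactory are hypothetical names chosen for illustration.

```go
package main

import "fmt"

// Driver, Factory, and PostgresFactory are hypothetical names.
type Driver interface{ Name() string }

type Factory interface{ New() Driver }

type pg struct{}

func (pg) Name() string { return "postgres" }

type PostgresFactory struct{}

func (*PostgresFactory) New() Driver { return pg{} }

// Compile-time assertion: the build fails on this line if *PostgresFactory
// ever stops satisfying Factory. The check happens entirely at compile time.
var _ Factory = (*PostgresFactory)(nil)

func main() {
	var f Factory = &PostgresFactory{}
	fmt.Println(f.New().Name())
}
```

The assertion is not a speed-up in itself. It earns its place because it catches a broken factory at build time instead of at the first call.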
Summary
Wins that always ship:
- sync.Once for lazy init (Exercise 3).
- Pre-compile regexes at package level (Exercise 6).
- Resolve registry lookups once at boot (Exercise 2).
- Compile-time interface check (var _ Iface = (*Factory)(nil)).
Wins behind a profile:
- Cache cacheable factory results (Exercise 1).
- Return concrete types where the abstraction isn't needed (Exercise 4).
- Replace reflect-based factories with type-specific ones (Exercise 5).
- Replace fmt.Sprintf with strings.Builder/strconv (Exercise 9).
- Pool transient factory outputs with sync.Pool (Exercise 11).
- Pre-warm caches before serving traffic (Exercise 12).
Wins that trade off flexibility:
- Functional options → config struct (Exercise 10).
- Map dispatch → switch for small N (Exercise 7).
Rarely worth it without measurement: PGO devirtualization (Exercise 8), only for hot services with stable workloads.
Most factory performance work is avoiding allocations and moving cost off the hot path. The three patterns most engineers hit first — per-request factory with static config (Exercise 1), registry lookup per call (Exercise 2), regex compiled per call (Exercise 6) — fix the majority of factory-related hotspots seen in real services with no measurement needed.