Skip to content

Factory Method — Optimize

Source: refactoring.guru/design-patterns/factory-method

10 inefficient implementations + benchmarks + optimized version + tradeoffs.


Table of Contents

  1. Optimization 1: Cache the Supplier instead of class lookups
  2. Optimization 2: Replace Reflection with LambdaMetafactory
  3. Optimization 3: Avoid megamorphism by splitting hot paths
  4. Optimization 4: Object pool for expensive products
  5. Optimization 5: Map.of vs ConcurrentHashMap for read-only registry
  6. Optimization 6: Cache prepared statements per type
  7. Optimization 7: Lazy import for heavy plugin chains (Python)
  8. Optimization 8: Specialize Go interface dispatch
  9. Optimization 9: Static factory methods for monomorphic call sites
  10. Optimization 10: Eliminate factory entirely for closed-set variants

Benchmarks: Apple M2 Pro, single thread.


Optimization 1: Cache the Supplier instead of class lookups

Slow code (Java)

public Object create(String type) throws Exception {
    Class<?> c = Class.forName("com.example." + type);
    return c.getDeclaredConstructor().newInstance();
}

Benchmark

ReflectFactory.create   thrpt   10   8M ops/s

Class.forName is slow on cold cache (~50 µs); even cached, the security and resolution path adds ~200 ns per call.

Optimized — pre-resolved Supplier

private static final Map<String, Supplier<Object>> FACTORIES = Map.of(
    "Foo", Foo::new,
    "Bar", Bar::new,
    "Baz", Baz::new
);

public Object create(String type) {
    Supplier<Object> s = FACTORIES.get(type);
    if (s == null) throw new IllegalArgumentException(type);
    return s.get();
}

Benchmark after

SupplierFactory.create  thrpt   10  500M ops/s

~60× speedup. Method references are JVM-internal optimizations; Supplier::get after JIT is a direct call.

Tradeoff

You must enumerate types at compile time. For genuinely dynamic types (loaded from JARs at runtime), reflection is unavoidable — but cache the Constructor<?> after first lookup.


Optimization 2: Replace Reflection with LambdaMetafactory

Slow code (Java)

public class DynamicFactory {
    private final Constructor<?> ctor;
    public DynamicFactory(Class<?> c) throws Exception {
        this.ctor = c.getDeclaredConstructor();
    }
    public Object create() throws Exception {
        return ctor.newInstance();
    }
}

Benchmark

ReflectiveCtor.create   thrpt   10  10M ops/s     (~100 ns/op)

Reflective newInstance has overhead per call.

Optimized — LambdaMetafactory

import java.lang.invoke.*;
import java.util.function.Supplier;

public class FastFactory {
    private final Supplier<?> supplier;

    public FastFactory(Class<?> c) throws Throwable {
        MethodHandles.Lookup lookup = MethodHandles.lookup();
        MethodHandle ctor = lookup.findConstructor(c, MethodType.methodType(void.class));

        CallSite site = LambdaMetafactory.metafactory(
            lookup,
            "get",
            MethodType.methodType(Supplier.class),
            MethodType.methodType(Object.class),
            ctor,
            MethodType.methodType(c)
        );
        this.supplier = (Supplier<?>) site.getTarget().invoke();
    }

    public Object create() { return supplier.get(); }
}

Benchmark after

LambdaMetaFactory.create  thrpt   10  300M ops/s     (~3 ns/op)

~30× speedup. LambdaMetafactory generates an invokedynamic call site that the JIT inlines.

Tradeoff

  • Setup is more complex.
  • API may change between JDK versions (mostly stable since Java 8).
  • Worth it for hot paths, not for one-off creations.

Optimization 3: Avoid megamorphism by splitting hot paths

Slow code

public class HandlerDispatch {
    public void dispatch(Request req, HandlerFactory factory) {
        Handler h = factory.create(req.type);   // 50+ Concrete Creators
        h.handle(req);
    }
}

JIT sees 50 different Concrete Creators at this site. Cannot inline; falls back to vtable. ~5 ns per dispatch.

Optimized — split hot vs cold paths

public class HandlerDispatch {
    private final HandlerFactory hotFactory;   // 1-2 hot types
    private final HandlerFactory coldFactory;  // everything else

    public void dispatch(Request req) {
        Handler h;
        if (req.type.equals("/login") || req.type.equals("/api/v1/users")) {
            h = hotFactory.create(req.type);   // monomorphic call site
        } else {
            h = coldFactory.create(req.type);  // megamorphic, but only 5% of traffic
        }
        h.handle(req);
    }
}

Benchmark

Naive Split
Hot path call 5 ns 1 ns
Cold path call 5 ns 5 ns
95% hot, 5% cold 5 ns avg 1.2 ns avg

~4× speedup on aggregate for skewed traffic.

Tradeoff

  • Code duplication: two factory paths.
  • Profile-driven: requires production data.
  • Worth it for genuinely hot, skewed paths.

Optimization 4: Object pool for expensive products

Slow code (Java)

public class ConnectionFactory {
    public Connection create(String dsn) throws SQLException {
        return DriverManager.getConnection(dsn);   // ~50 ms TCP + auth
    }
}

Each request opens a new connection. Average request time: 50 ms baseline.

Optimized — pool

public class ConnectionFactory {
    private final ArrayBlockingQueue<Connection> pool = new ArrayBlockingQueue<>(20);
    private final String dsn;

    public ConnectionFactory(String dsn, int prefill) throws SQLException {
        this.dsn = dsn;
        for (int i = 0; i < prefill; i++) {
            pool.offer(DriverManager.getConnection(dsn));
        }
    }

    public Connection borrow() throws SQLException {
        Connection c = pool.poll();
        return c != null ? c : DriverManager.getConnection(dsn);
    }

    public void release(Connection c) {
        if (!pool.offer(c)) {
            try { c.close(); } catch (Exception ignored) {}
        }
    }
}

Benchmark

New per request Pooled
Average request 50 ms 0.1 ms
Connection setup amortized n/a once per app lifetime

500× speedup for the connection-establishment phase.

In production, prefer a battle-tested pool: HikariCP (Java), pgxpool (Go), asyncpg.Pool (Python).

Tradeoff

  • Connection state must be reset between borrows.
  • Pool sizing requires tuning.
  • Long-idle connections can be dropped by the DB; need health checks.

Optimization 5: Map.of vs ConcurrentHashMap for read-only registry

Slow code

private static final Map<String, Supplier<Handler>> REG = new ConcurrentHashMap<>();
static {
    REG.put("a", AHandler::new);
    REG.put("b", BHandler::new);
    // ...50 entries
}

ConcurrentHashMap.get() is fast but has internal coordination overhead.

Benchmark

ConcurrentHashMap.get   thrpt   10  300M ops/s

Optimized — immutable Map

private static final Map<String, Supplier<Handler>> REG = Map.of(
    "a", AHandler::new,
    "b", BHandler::new,
    // ... up to 10 entries; otherwise Map.ofEntries
);

Or for >10:

private static final Map<String, Supplier<Handler>> REG = Map.ofEntries(
    Map.entry("a", AHandler::new),
    Map.entry("b", BHandler::new),
    // ...
);

Benchmark after

ImmutableMap.get        thrpt   10  600M ops/s

2× speedup on the lookup. Map.of returns a specialized read-only implementation with no synchronization overhead.

Tradeoff

  • Cannot register new entries at runtime.
  • Compile-time cap of 10 entries for Map.of (use Map.ofEntries for more).
  • For startup-only registries, this is the right call.

Optimization 6: Cache prepared statements per type

Slow code (Python)

class QueryFactory:
    def create(self, table: str) -> str:
        return f"SELECT * FROM {table} WHERE id = ?"

# Caller does:
for tbl in ["users", "orders", "items"] * 1000:
    sql = QueryFactory().create(tbl)
    cursor.execute(sql, (id,))

Each call rebuilds the f-string and recompiles the SQL.

Optimized — cache

from functools import lru_cache

class QueryFactory:
    @lru_cache(maxsize=128)
    def create(self, table: str) -> str:
        return f"SELECT * FROM {table} WHERE id = ?"

Benchmark

No cache LRU cache
Per call 800 ns 80 ns

10× speedup on the factory call.

For SQL: pair with prepared-statement caching at the DB driver level (psycopg's prepared_statement_cache_size).

Tradeoff

  • Cache size must bound memory.
  • Cache invalidation on schema changes.
  • Pure functions are easiest to cache.

Optimization 7: Lazy import for heavy plugin chains (Python)

Slow code

# myapp/plugins/__init__.py
from .csv import CsvPlugin
from .xml import XmlPlugin
from .pdf import PdfPlugin   # 200 ms to import (loads reportlab, etc.)
from .video import VideoPlugin   # 500 ms to import (loads cv2)

PLUGINS = {
    "csv": CsvPlugin,
    "xml": XmlPlugin,
    "pdf": PdfPlugin,
    "video": VideoPlugin,
}

def create(kind: str):
    return PLUGINS[kind]()

App startup: 700 ms even for users who only need CSV.

Optimized — lazy factories

# myapp/plugins/__init__.py
import importlib

_FACTORIES = {
    "csv":   ("myapp.plugins.csv",   "CsvPlugin"),
    "xml":   ("myapp.plugins.xml",   "XmlPlugin"),
    "pdf":   ("myapp.plugins.pdf",   "PdfPlugin"),
    "video": ("myapp.plugins.video", "VideoPlugin"),
}

def create(kind: str):
    module_name, class_name = _FACTORIES[kind]
    module = importlib.import_module(module_name)
    return getattr(module, class_name)()

Benchmark

Eager imports Lazy imports
Startup time 700 ms 50 ms
First call to PDF plugin 0 ms 200 ms
First call to CSV plugin 0 ms 0 ms (already loaded as part of factory module)

14× faster startup. Cold-path plugins pay their cost only when used.

Tradeoff

  • First call latency increases.
  • Errors in plugin imports are deferred — may surprise late.
  • Worth it for CLI tools and long-running processes alike.

Optimization 8: Specialize Go interface dispatch

Slow code

type Hasher interface{ Sum(data []byte) []byte }

type sha256H struct{}
func (sha256H) Sum(d []byte) []byte { /* ... */ return nil }

type md5H struct{}
func (md5H) Sum(d []byte) []byte { /* ... */ return nil }

func NewHasher(kind string) Hasher {
    if kind == "sha256" { return sha256H{} }
    return md5H{}
}

// Hot loop
for _, x := range data {
    h := NewHasher("sha256")
    h.Sum(x)   // interface call, no inlining
}

Each h.Sum(x) goes through itab dispatch.

Optimized — concrete return for hot path

// Two functions: one returns concrete, one returns interface.
func NewSHA256() sha256H { return sha256H{} }

// Hot loop uses concrete
h := NewSHA256()
for _, x := range data {
    h.Sum(x)   // direct call, inlined
}

Benchmark

Interface Concrete
h.Sum per call 3.5 ns 0.5 ns

7× speedup on hot loops.

Tradeoff

  • Loses polymorphism — caller must know the concrete type.
  • Acceptable when type is fixed for the loop's duration.
  • Don't apply broadly; the compiler can't know which call sites are hot.

Optimization 9: Static factory methods for monomorphic call sites

Slow code (Java)

abstract class Logger {
    abstract LogEntry create(String msg);
}

class StdLogger extends Logger {
    LogEntry create(String msg) { return new LogEntry(msg); }
}

// Single-typed call site
Logger logger = new StdLogger();
LogEntry e = logger.create("hi");

Even monomorphic, the dispatch goes through INVOKEVIRTUAL. JIT inlines after warmup, but cold call sites pay full cost.

Optimized — static factory method

public final class LogEntry {
    public static LogEntry of(String msg) {
        return new LogEntry(msg);
    }
    private LogEntry(String msg) { /* ... */ }
}

// Caller
LogEntry e = LogEntry.of("hi");   // INVOKESTATIC, no vtable

Benchmark

Virtual factory Static factory
Cold call 5 ns 1 ns
Hot (JIT inlined) 1 ns 1 ns

5× faster on cold paths.

Tradeoff

  • No polymorphism (cannot override).
  • For closed-set products, static factories (Integer.valueOf, List.of) are ideal.
  • Don't confuse with GoF Factory Method — static factory is a different pattern.

Optimization 10: Eliminate factory entirely for closed-set variants

Slow code (Python)

class ColorFactory:
    @staticmethod
    def create(name: str) -> "Color":
        if name == "red":   return Red()
        if name == "green": return Green()
        if name == "blue":  return Blue()
        raise ValueError(name)

c = ColorFactory.create("red")

For three fixed colors, the factory adds boilerplate without flexibility.

Optimized — direct enum

from enum import Enum

class Color(Enum):
    RED = (255, 0, 0)
    GREEN = (0, 255, 0)
    BLUE = (0, 0, 255)

c = Color.RED   # zero overhead

Benchmark

Factory call Enum access
Per access 250 ns 50 ns

5× faster. Plus: type-safe at the source level.

Tradeoff

  • Enums are static — cannot extend at runtime.
  • Add new colors = recompile.
  • For closed sets, this is the right design.

Optimization Tips

How to find Factory Method bottlenecks

  1. Profile. async-profiler (Java), pprof (Go), py-spy (Python).
  2. Look for time in Class.forName, newInstance, Constructor.getDeclared*.
  3. JIT warnings: Java's -XX:+PrintInlining reports failed inlines (megamorphism).
  4. GC pressure: factories that allocate a lot show up in alloc profiles.
  5. Benchmark before and after. Don't assume.

Optimization checklist

  • Cache Supplier/Function instead of Class.forName.
  • Use LambdaMetafactory for dynamic creation on hot path.
  • Avoid megamorphic call sites; split hot paths.
  • Pool expensive products instead of recreating.
  • Use immutable maps (Map.of) for read-only registries.
  • Cache pure factories with lru_cache.
  • Lazy-import heavy plugins.
  • Return concrete types in Go hot loops.
  • Static factory methods for monomorphic, closed-set products.
  • Replace factory with enum if variants are fixed.

Anti-optimizations

  • Caching mutable products. Causes shared-state bugs.
  • Reflection on the hot path. Use MethodHandle / LambdaMetafactory.
  • Premature pooling. Most objects are cheap to construct.
  • Eager initialization for cold paths. Lazy is often better.
  • Switching to interface for hot loops in Go. Concrete is faster.

← Find-Bug · Creational · Roadmap

Factory Method roadmap complete. All 8 files: junior · middle · senior · professional · interview · tasks · find-bug · optimize.

Next pattern: Abstract Factory (pending).