Strategy — Professional Level

Source: refactoring.guru/design-patterns/strategy
Prerequisite: Senior


Table of Contents

  1. Introduction
  2. JIT and Inline Caches
  3. Megamorphic Call Sites
  4. Lambda Performance vs Class
  5. Closure Allocation
  6. Strategy in Hot Paths — Branch-free Alternatives
  7. Thread-safe Hot Swaps
  8. Memory & Cache Locality
  9. Microbenchmark Anatomy
  10. Cross-Language Comparison
  11. Strategy in Async/Reactive Pipelines
  12. Diagrams
  13. Related Topics

Introduction

A Strategy at the professional level is examined for what the runtime makes of it: how the JIT inlines monomorphic call sites, how it falls back to vtable dispatch on megamorphic ones, how lambdas allocate (or don't), and where the inevitable performance cliffs are.

For high-throughput services — payment routing, request dispatch, encoders in tight loops — the Strategy machinery itself can be measurable. This document quantifies it.


JIT and Inline Caches

Monomorphic call site

final PaymentStrategy s = new StripeStrategy();
for (Order o : orders) {
    s.pay(o);   // s.getClass() is always StripeStrategy here
}

The HotSpot JIT records the receiver type seen at this call site. After warmup (typically ~10,000 invocations), it has observed exactly one type → it builds a monomorphic inline cache. Result: the virtual call compiles down to a direct call. If the method body is small, it's inlined entirely.

After C2 compiles, this loop is identical in cost to:

for (Order o : orders) { stripe_pay_inlined(o); }

Bimorphic call site

PaymentStrategy s = pickByOrder(o);   // returns Stripe or Adyen
s.pay(o);

Two types observed → bimorphic inline cache. Cost: a single type check + direct call. Still very fast (~1 ns).

Polymorphic / megamorphic

3+ types → falls back to vtable lookup. Cost: ~2-3 ns per call. The body cannot be inlined into the caller. Branch prediction can still help, but you've lost the JIT's ability to fuse the strategy into the surrounding code.


Megamorphic Call Sites

When 10+ strategies funnel through one site (e.g., dispatcher.handle(req) for many handler types), the JIT gives up on inlining. Symptoms:

  • CPU profile shows time inside virtual dispatch.
  • Adding a new strategy slightly slows the entire path.

Mitigations:

1. Split call sites

// Bad: one site, megamorphic
strategies.forEach(s -> s.handle(req));

// Better: separate call sites if you can identify groups
fastStrategies.forEach(s -> s.handle(req));
slowStrategies.forEach(s -> s.handle(req));

Each site sees fewer types → re-monomorphic.

2. Type-specialized dispatch

switch (req.kind()) {
    case CARD:    cardStrategy.pay(req); break;
    case CRYPTO:  cryptoStrategy.pay(req); break;
    case BANK:    bankStrategy.pay(req); break;
}

Each branch is monomorphic. JIT inlines per branch. Faster than virtual dispatch — but couples the dispatcher to the family.

3. Code generation

For ML pipelines or query engines, generate a specialized dispatcher per query / model. Apache Drill and Spark Catalyst do this — runtime code generation avoids the vtable cost entirely.

Numbers (rough, per JVM, JDK 21, x86)

Site type              ns/call   Inlinable
Direct call            0.3       Yes
Monomorphic Strategy   0.3       Yes
Bimorphic              0.6       Yes
Polymorphic (3-7)      1-2       Sometimes
Megamorphic (8+)       2-4       No

For business code: irrelevant. For 10-million-ops loops: matters.


Lambda Performance vs Class

Java 8+ lambdas

A Function<T, R> lambda is desugared to an invokedynamic instruction. Its first execution bootstraps a class via the LambdaMetafactory; subsequent executions reuse the linked call site, and for non-capturing lambdas, a cached instance.

Function<Order, Money> price = order -> Money.of(order.cents());

Cost after warmup: identical to a hand-written class. The JIT inlines through the lambda just as it would through a class.
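
For comparison, a hand-written class version of the same strategy, reusing the Order and Money types from the earlier examples; after warmup the JIT treats the two identically:

import java.util.function.Function;

// Hand-written equivalent of the lambda above; the JIT inlines
// through either form the same way once the call site is hot.
final class PriceFunction implements Function<Order, Money> {
    @Override
    public Money apply(Order order) {
        return Money.of(order.cents());
    }
}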

Closure capture

double rate = config.discountRate();
Function<Money, Money> discount = m -> m.times(1 - rate);   // captures rate

Each invocation of the outer method allocates a new lambda instance. If you're in a hot loop creating closures, you're allocating.

// Bad — inside a hot loop, this re-allocates the capturing lambda on every pass
list.forEach(item -> price(item, rate));

// Better — allocate the capturing lambda once and reuse it
Function<Item, Money> priceWithRate = item -> price(item, rate);
list.forEach(priceWithRate);

Non-capturing lambdas

A non-capturing lambda is singleton-ish: the LambdaMetafactory returns the same instance each time. No allocation pressure.

list.sort((a, b) -> Integer.compare(a.price(), b.price()));   // captures nothing → cached

Other languages

  • Kotlin — inline fun + lambda compiles to no allocation at all; the lambda body is inlined into the caller.
  • C# — closures over locals hoist those locals into a heap-allocated display class. Hot paths should avoid capture.
  • C++ — lambdas are stack-allocated by default; wrapping one in std::function may allocate (subject to small-buffer optimization).
  • Go — closures that escape the stack are heap-allocated; escape analysis decides.
  • Python — closures are objects; per-call dispatch is dict-lookup-based, roughly 50× slower than the C equivalent.

Closure Allocation

def make_strategies():
    return [lambda x, r=rate: x * r for rate in [0.1, 0.2, 0.3]]

Three lambda objects, each binding its own rate through the default argument r (evaluated at definition time). In Python: three small objects, negligible. In Java/Kotlin in a hot loop: measurable. Profile before optimizing.


Strategy in Hot Paths — Branch-free Alternatives

If profiling shows the Strategy dispatch itself is the bottleneck (rare but real in interpreters, packet processing, game engines):

1. Function table indexed by enum

typedef int (*op_t)(int);
static op_t ops[] = { op_add, op_sub, op_mul };
ops[op](x);

One indirect call. No vtable. No instance.
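
On the JVM, a rough analogue is a table of function objects indexed by an enum ordinal; a sketch with hypothetical Op and OpTable names (note the call through the table is still an interface call, unlike the raw C function pointer):

import java.util.function.IntUnaryOperator;

enum Op { ADD, SUB, MUL }

final class OpTable {
    // Hypothetical function table: one entry per Op, indexed by ordinal.
    // Selection is branch-free; no strategy instance per request.
    private static final IntUnaryOperator[] OPS = {
        x -> x + 1,   // ADD (illustrative bodies)
        x -> x - 1,   // SUB
        x -> x * 2    // MUL
    };

    static int apply(Op op, int x) {
        return OPS[op.ordinal()].applyAsInt(x);
    }
}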

2. SIMD / vectorized strategies

For data-parallel work, the "strategy" is a kernel that runs over arrays. Branchless dispatch across N items via SIMD instead of per-item dispatch.
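
A sketch of the batch idea in Java, with a hypothetical Kernel type: dispatch once per batch, then run a tight loop the compiler can auto-vectorize:

// Hypothetical batch-level strategy: one dispatch per array,
// not one per item.
interface Kernel {
    void apply(double[] in, double[] out);
}

final class Kernels {
    static final Kernel SCALE_BY_TAX = (in, out) -> {
        for (int i = 0; i < in.length; i++) {
            out[i] = in[i] * 1.19;   // no per-item virtual call inside the loop
        }
    };
}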

3. Code-generated dispatchers

Engines such as Spark (whole-stage codegen) and Apache Arrow's Gandiva generate per-query code with the operators inlined. The "strategies" are picked when the query is compiled, not at execution.

Trade-offs

These give up Strategy's polymorphic flexibility for raw speed. Use only when profiling demands it.


Thread-safe Hot Swaps

class StrategyHolder {
    private volatile Strategy current = new DefaultStrategy();

    public void swap(Strategy s) { current = s; }
    public Result run(Input i)   { return current.run(i); }
}

volatile gives sequential-consistent reads/writes for the reference. Readers always see a fully-published strategy.

Memory ordering caveats

// Strategy with config:
class ConfiguredStrategy implements Strategy {
    final Config cfg;
    ConfiguredStrategy(Config c) { this.cfg = c; }
}

Because cfg is final, the JMM guarantees that any thread observing the strategy reference — even via a data race — also observes a fully-initialized cfg. Without final, safe publication depends on how the reference is shared: the volatile holder above is sufficient, a plain field is not.

CAS-based atomic swap

class StrategyHolder {
    private final AtomicReference<Strategy> ref = new AtomicReference<>(new DefaultStrategy());
    public boolean trySwap(Strategy expected, Strategy newOne) {
        return ref.compareAndSet(expected, newOne);
    }
}

Useful when only one of N writers should win — e.g., during a deploy where multiple sources try to set the strategy.

Strategy lifecycle

If swapping a strategy means closing the old one (e.g., it holds a connection pool), don't just swap — drain. Pattern (sketched below):

  1. Atomically swap reference.
  2. Wait for in-flight calls to drain (counter or Phaser).
  3. Close the old instance.
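
A minimal sketch of the drain pattern using a plain in-flight counter; CloseableStrategy and DefaultCloseableStrategy are hypothetical names for a Strategy that also needs close():

import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicReference;

class DrainingHolder {
    private final AtomicReference<CloseableStrategy> ref =
            new AtomicReference<>(new DefaultCloseableStrategy());
    private final AtomicInteger inFlight = new AtomicInteger();

    Result run(Input i) {
        inFlight.incrementAndGet();            // count this call before reading the ref
        try {
            return ref.get().run(i);
        } finally {
            inFlight.decrementAndGet();
        }
    }

    void swap(CloseableStrategy next) throws Exception {
        CloseableStrategy old = ref.getAndSet(next);   // 1. atomic swap
        while (inFlight.get() > 0) {                   // 2. wait for in-flight calls
            Thread.onSpinWait();                       //    (conservative: also counts calls on next)
        }
        old.close();                                   // 3. safe to close the old instance
    }
}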

Memory & Cache Locality

A Strategy adds one indirection: context → strategy → method body. In CPU-cache terms:

  • Strategy reference — usually hot in L1.
  • Strategy object — usually hot in L1/L2 if it's a singleton.
  • Method body — depends on how often this strategy runs.

The penalty is real but tiny for any code where the strategy is reused. For one-shot strategies (allocate, call once, discard), allocation cost dominates dispatch.

For ultra-hot paths, struct-of-arrays (data-oriented design) often beats Strategy: store all data flat, pick the algorithm once, iterate. ECS game engines do this.
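
A sketch of that layout with a hypothetical particle example: the algorithm choice happens once, outside the loop, and the loop walks contiguous arrays:

// Hypothetical data-oriented layout: flat arrays instead of
// Particle objects each holding a strategy reference.
final class Particles {
    final float[] x, y, vx, vy;

    Particles(int n) {
        x = new float[n]; y = new float[n];
        vx = new float[n]; vy = new float[n];
    }

    // Algorithm chosen once by the caller; the loop itself is
    // branch-free and walks memory sequentially.
    void integrate(float dt) {
        for (int i = 0; i < x.length; i++) {
            x[i] += vx[i] * dt;
            y[i] += vy[i] * dt;
        }
    }
}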


Microbenchmark Anatomy

A correct JMH benchmark

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.State;

@State(Scope.Benchmark)
public class StrategyBench {
    Strategy mono = new ConcreteA();
    Strategy[] poly = { new ConcreteA(), new ConcreteB(), new ConcreteC() };
    int i;

    @Benchmark
    public int monomorphic() {
        return mono.run(42);          // one receiver type → monomorphic site
    }

    @Benchmark
    public int megamorphic() {
        i = (i + 1) % poly.length;    // rotate receivers (avoids int overflow going negative)
        return poly[i].run(42);       // three receiver types at one call site
    }
}

Run with warmup. Compare. The difference is the dispatch cost — usually small.

Pitfalls

  • Constant folding — if the benchmark always uses the same input, the JIT can fold the entire call. Use Blackhole.consume() or vary inputs (see the sketch below).
  • Dead code elimination — return values that aren't used get optimized away.
  • Insufficient warmup — early measurements include interpreter cost.
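
A variant that guards against the first two pitfalls, varying the input and sinking the result through JMH's Blackhole (reusing the fields from StrategyBench above):

import org.openjdk.jmh.infra.Blackhole;

@Benchmark
public void monomorphicGuarded(Blackhole bh) {
    i++;                         // varying input defeats constant folding
    bh.consume(mono.run(i));     // consuming the result defeats dead-code elimination
}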

For real measurements, use JMH (Java), benchstat (Go), Criterion (Rust), or pytest-benchmark (Python).


Cross-Language Comparison

Language     Strategy Dispatch                                 Inlining                        Hot-swap
Java/JVM     vtable / invokedynamic                            Yes (warmup-based)              volatile ref
Kotlin       vtable; inline removes it                         Yes (compile-time for inline)   @Volatile ref
C#           vtable / interface dispatch                       Yes (tiered JIT)                volatile field
Go           itable lookup (interface)                         Limited (compiler-driven)       atomic.Value
Rust         dyn Trait = vtable; generics = static dispatch    Yes (with monomorphization)     ArcSwap
C++          virtual = vtable; templates = static dispatch     Static dispatch is inlined      atomic shared_ptr
Python       dict lookup (everything is dynamic)               None (interpreter)              reassign attribute
JavaScript   hidden classes + ICs; megamorphic sites degrade   Yes (V8 TurboFan)               reassign

In statically dispatched languages (Rust generics, C++ templates), Strategy compiles to zero overhead — the algorithm is monomorphized into the caller. The trade-off: code bloat (one specialization per type).

In dynamically dispatched languages, Strategy is just normal method calls; runtime does the bookkeeping.

In interpreted languages (Python, Ruby), every call is dynamic anyway; Strategy is essentially free in relative terms but expensive in absolute terms — choose carefully in hot loops.


Strategy in Async/Reactive Pipelines

interface PaymentStrategy {
    suspend fun pay(amount: Money, ctx: PaymentContext): PaymentResult
}

Async strategies introduce more concerns:

  • Backpressure — a slow strategy stalls a stream; pick concurrent strategies carefully.
  • Cancellation — does the strategy honor cancellation? Stuck I/O calls don't.
  • Errors — async errors propagate differently. Ensure each strategy normalizes failures into the contract.
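
In plain Java terms, an async strategy typically returns a CompletableFuture; a sketch with a hypothetical AsyncPaymentStrategy and a call-site timeout covering the stuck-I/O case above:

import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;

// Hypothetical async variant of PaymentStrategy
interface AsyncPaymentStrategy {
    CompletableFuture<PaymentResult> pay(Money amount, PaymentContext ctx);
}

final class AsyncCaller {
    // Bound the strategy's latency regardless of its implementation:
    // orTimeout (JDK 9+) fails the future if the strategy hangs.
    static CompletableFuture<PaymentResult> payWithDeadline(
            AsyncPaymentStrategy strategy, Money amount, PaymentContext ctx) {
        return strategy.pay(amount, ctx).orTimeout(2, TimeUnit.SECONDS);
    }
}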

Reactive (Project Reactor)

public interface ReactiveStrategy {
    Mono<Result> run(Input input);
}

A reactive Strategy returns a Mono / Flux. The Context can:

  • flatMap to chain.
  • switchMap to swap mid-flight (advanced; usually not what you want).
  • Apply timeouts and retries via Mono operators on the strategy's output.
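
For example, a timeout and retry policy composed onto the strategy's output; a sketch using the ReactiveStrategy interface above:

import java.time.Duration;
import reactor.core.publisher.Mono;

final class GuardedContext {
    private final ReactiveStrategy strategy;

    GuardedContext(ReactiveStrategy strategy) { this.strategy = strategy; }

    // The Context wraps whatever the strategy returns with its own policies
    Mono<Result> run(Input input) {
        return strategy.run(input)
                .timeout(Duration.ofMillis(200))   // fail fast if the strategy is slow
                .retry(2);                         // re-subscribe up to twice on error
    }
}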

Pitfall: blocking inside an async Strategy

If a strategy does blocking I/O on the main reactor thread, it stalls everything. Wrap with subscribeOn(Schedulers.boundedElastic()) or refactor to true async.
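
A sketch of the wrap, assuming a hypothetical blockingStrategy:

import reactor.core.publisher.Mono;
import reactor.core.scheduler.Schedulers;

// fromCallable defers the blocking work; boundedElastic runs it on a
// scheduler meant for blocking tasks, off the event-loop threads.
Mono<Result> safe = Mono.fromCallable(() -> blockingStrategy.run(input))
        .subscribeOn(Schedulers.boundedElastic());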


Diagrams

Inline cache evolution

graph LR
  A[Cold call site] -->|first invoke| B[Type A recorded]
  B -->|same type| C[Monomorphic IC: direct call]
  B -->|second type| D[Bimorphic IC]
  D -->|third type| E[Polymorphic IC]
  E -->|too many types| F[Megamorphic - vtable]

Hot-swap with drain

sequenceDiagram
  participant Loader
  participant Holder
  participant Old
  participant New
  Loader->>Holder: swap(New)
  Holder->>Holder: ref = New (atomic)
  Note right of Holder: in-flight calls still on Old
  Holder->>Old: wait for drain
  Old-->>Holder: drained
  Holder->>Old: close()

Related Topics

  • Template Method — performance trade-offs vs Strategy
  • Visitor — double dispatch costs
  • Polymorphism vs branching
  • JIT & inline caching
  • Async patterns
