Skip to content

Composing Methods — Optimize

12 cases where a refactoring is functionally correct but introduces a subtle performance issue. Identify the cost and propose a fix that preserves clarity.


Optimize 1 — Replace Temp with Query in a hot loop (Java)

Functionally correct:

class PriceCalculator {
    private List<Item> items;

    double total() {
        double t = 0;
        for (Item i : items) {
            if (subtotal() > 100) t += i.price() * 0.9;
            else t += i.price();
        }
        return t;
    }

    private double subtotal() {
        double s = 0;
        for (Item i : items) s += i.price();
        return s;
    }
}
Cost & Fix `subtotal()` is called once per loop iteration — O(N) calls, each O(N) — total **O(N²)** instead of the original O(N). **Fix:** Compute once outside the loop.
double total() {
    double sub = subtotal();
    boolean discount = sub > 100;
    double t = 0;
    for (Item i : items) t += discount ? i.price() * 0.9 : i.price();
    return t;
}
Replace Temp with Query is great for clarity; it can be a perf nightmare in inner loops.

Optimize 2 — Extract Method that breaks JIT inlining (Java)

double sum(double[] xs) {
    double s = 0;
    for (double x : xs) s += transform(x);
    return s;
}

private double transform(double x) {
    return Math.sqrt(x) + Math.log(x + 1) + Math.exp(-x) + Math.sin(x);
}

This was originally:

double sum(double[] xs) {
    double s = 0;
    for (double x : xs) s += Math.sqrt(x) + Math.log(x + 1) + Math.exp(-x) + Math.sin(x);
    return s;
}
Cost & Fix Extracting `transform` is good for clarity. In hot benchmarks, `transform`'s ~50 bytes is well under HotSpot's inline threshold, so it's inlined. **Usually no perf cost.** But: if `transform` grows over time to >325 bytes (FreqInlineSize), inlining stops, and the loop suddenly becomes 5–10× slower. **Fix (preventive):** keep `transform` small. If it must grow, accept that the hot path won't inline it; consider: - Specializing the helper for the inner loop (`sumWithTransform`). - Using `@HotSpotIntrinsicCandidate` patterns. - Verifying with `-XX:+PrintInlining`. **Verify, don't guess:**
java -XX:+UnlockDiagnosticVMOptions -XX:+PrintInlining ...
Look for `transform inline (hot)` in the output. If you see `transform too big`, the refactor crossed a threshold.

Optimize 3 — Method Object that escapes (Java)

class Account {
    int gamma(int iv, int q, int ytd) {
        Gamma g = new Gamma(iv, q, ytd);
        cache.put(iv, g);   // ❌ stores reference
        return g.compute();
    }
}
Cost & Fix By storing `g` in the cache, the JIT can no longer prove `g` doesn't escape the method. **Escape analysis fails**, scalar replacement doesn't apply, every `gamma` call allocates a heap object. **Fix:** Don't cache the Method Object. Cache the *result*.
class Account {
    int gamma(int iv, int q, int ytd) {
        Integer cached = cache.get(iv);
        if (cached != null) return cached;
        int result = new Gamma(iv, q, ytd).compute();
        cache.put(iv, result);
        return result;
    }
}
Now `Gamma` doesn't escape — it's stack-allocatable.

Optimize 4 — Extract Method introduces interface dispatch (Go)

type Reducer interface {
    Reduce(float64, float64) float64
}

type Sum struct{}
func (Sum) Reduce(a, b float64) float64 { return a + b }

func Total(xs []float64, r Reducer) float64 {
    var t float64
    for _, x := range xs {
        t = r.Reduce(t, x)   // virtual call
    }
    return t
}

Original:

func Total(xs []float64) float64 {
    var t float64
    for _, x := range xs { t += x }
    return t
}
Cost & Fix The "extract for flexibility" added an interface dispatch per iteration. Without PGO, Go can't inline through the interface — every call is an itab lookup. **Fix options:** 1. **Don't generalize prematurely.** If you have one reducer, don't introduce an interface. 2. **Use generics (Go 1.18+):**
func Total[T any](xs []T, r func(acc, x T) T, zero T) T {
    acc := zero
    for _, x := range xs { acc = r(acc, x) }
    return acc
}
The compiler instantiates a specialized version; `r` is a concrete function pointer. 3. **PGO (Go 1.21+):** capture a profile, rebuild — Go can devirtualize hot interface calls. Lesson: Extract Method that *introduces a polymorphism boundary* in a hot loop pays a per-call cost.

Optimize 5 — Inline Method that bloats a hot caller (Java)

double process(double[] xs, double[] ys) {
    double s = 0;
    for (int i = 0; i < xs.length; i++) {
        // inlined fragment from the formerly-called combiner:
        double a = xs[i] * 2 + 1;
        double b = ys[i] * 3 - 1;
        double c = a * b + Math.sqrt(a + b) - Math.log(b + 1);
        // ... 30 more lines ...
        s += c;
    }
    return s;
}
Cost & Fix By inlining a 50-line helper into the hot caller, the caller may exceed `FreqInlineSize` (325 bytes) — its callers stop inlining *it*, propagating the perf hit upward. **Fix:** Re-extract. The original Extract Method version is often faster precisely because it's small enough to inline. Verify: `-XX:+PrintInlining` shows whether your hot method is being inlined into its callers. If yes, leave it small.

Optimize 6 — Extract Variable allocates per call (Python)

def is_promo_eligible(user, cart):
    countries = {"US", "CA", "GB", "AU", "FR", "DE"}   # ❌ allocated each call
    is_country = user.country in countries
    return is_country and cart.total > 30
Cost & Fix Every call re-allocates the set literal. For ~10M calls/day, this is millions of pointless allocations. **Fix:** Hoist to module scope.
_PROMO_COUNTRIES = frozenset({"US", "CA", "GB", "AU", "FR", "DE"})

def is_promo_eligible(user, cart):
    return user.country in _PROMO_COUNTRIES and cart.total > 30
`frozenset` makes intent explicit and is hash-cached. CPython 3.12+ does optimize some constant collections, but explicit module-level is the safe bet.

Optimize 7 — Substitute Algorithm with allocation in the hot path (Java)

Original:

boolean containsAny(String s, char[] needles) {
    for (int i = 0; i < s.length(); i++) {
        for (char n : needles) if (s.charAt(i) == n) return true;
    }
    return false;
}

"Refactored":

boolean containsAny(String s, char[] needles) {
    Set<Character> set = new HashSet<>();
    for (char n : needles) set.add(n);
    for (int i = 0; i < s.length(); i++) {
        if (set.contains(s.charAt(i))) return true;
    }
    return false;
}
Cost & Fix Allocates a `HashSet`, plus N `Character` boxes per `needles` element, plus N `Character` boxes per `s.charAt(i)` boxing for `contains`. For tiny `needles` (2–4 chars) the original linear scan is **dramatically faster**. **Fix options:** 1. Keep linear scan for small `needles`. 2. Pre-build the set once and reuse:
private static final Set<Character> NEEDLES_CACHE = ...;
boolean containsAny(String s) {
    for (int i = 0; i < s.length(); i++) {
        if (NEEDLES_CACHE.contains(s.charAt(i))) return true;
    }
    return false;
}
3. Use a primitive bitset for ASCII chars:
boolean containsAny(String s, char[] needles) {
    long mask0 = 0, mask1 = 0;
    for (char c : needles) {
        if (c < 64) mask0 |= 1L << c;
        else if (c < 128) mask1 |= 1L << (c - 64);
    }
    for (int i = 0; i < s.length(); i++) {
        char c = s.charAt(i);
        if (c < 64 ? ((mask0 >> c) & 1) != 0 : c < 128 && ((mask1 >> (c - 64)) & 1) != 0)
            return true;
    }
    return false;
}
Faster than both — but only worth it for hot paths.

Optimize 8 — Extract Method captures collection by-reference, then mutates (Java)

List<Order> filterRecent(List<Order> orders) {
    return removeOld(orders);
}

private List<Order> removeOld(List<Order> orders) {
    orders.removeIf(o -> o.date().isBefore(cutoff));   // ❌ mutates caller's list
    return orders;
}
Cost & Fix The "extracted" method mutates the input — a hidden side effect. Functionally it works for many test cases but breaks when callers don't expect the input to change. (Also throws `UnsupportedOperationException` if caller passes `List.of(...)`.) **Fix:** Return a new list.
private List<Order> removeOld(List<Order> orders) {
    return orders.stream()
                 .filter(o -> !o.date().isBefore(cutoff))
                 .toList();
}
Trade-off: allocates a new list. For very large inputs, the mutating version is faster — but expose that with a clear name (`removeOldInPlace`).

Optimize 9 — Replace Temp with Query that defeats GC (Java)

class Report {
    String generate(Data data) {
        return rows(data) + "\n" + summary(data);
    }
    private String rows(Data data) {
        StringBuilder sb = new StringBuilder();
        for (Row r : data.rows()) sb.append(r).append('\n');
        return sb.toString();
    }
    private String summary(Data data) {
        return "Total: " + data.total();
    }
}
Cost & Fix Each call allocates a fresh `StringBuilder` + a `String`. Then the orchestrator concatenates with `+`, allocating *another* StringBuilder + String. For a 1MB rows output, you've doubled the peak heap. **Fix:** Stream-write to a single buffer.
String generate(Data data) {
    StringBuilder sb = new StringBuilder(estimateSize(data));
    appendRows(data, sb);
    sb.append('\n').append("Total: ").append(data.total());
    return sb.toString();
}
private void appendRows(Data data, StringBuilder sb) {
    for (Row r : data.rows()) sb.append(r).append('\n');
}
Now there's one buffer; helpers append to it.

Optimize 10 — Inline Temp losing memoization (Python)

Original:

def cost(item, ctx):
    rate = ctx.tax_rate(item.region)   # cached internally
    return item.price * (1 + rate)

"Refactored":

def cost(item, ctx):
    return item.price * (1 + ctx.tax_rate(item.region))
Cost & Fix If `ctx.tax_rate` is cached, the inline is fine. If it's *not*, the temp was the cache. Inlining is a no-op here, but in:
def total(items, ctx):
    return sum(item.price * (1 + ctx.tax_rate(item.region)) for item in items)
`tax_rate` is called once per item. The temp would have helped only if hoisted outside the loop:
def total(items, ctx):
    rate_for = ctx.tax_rate    # bind once
    return sum(item.price * (1 + rate_for(item.region)) for item in items)
Or, if all items share a region:
def total(items, ctx, region):
    rate = ctx.tax_rate(region)
    return sum(item.price * (1 + rate) for item in items)
Lesson: Inline Temp is safe for *cheap* expressions. For expensive ones, the temp may be load-bearing — and its scope matters.

Optimize 11 — Method Object reused across requests (Java)

class OrderProcessor {
    private final Gamma g = new Gamma();   // ❌

    int process(int iv, int q, int ytd) {
        g.set(iv, q, ytd);
        return g.compute();
    }
}
Cost & Fix The processor field-caches a Gamma to avoid allocation. **But** Gamma is single-use — its fields are partial state during one computation. Concurrent calls now race on shared state, producing wrong results. **Fix:** Allocate per call (escape analysis will erase it anyway in JIT-compiled code).
int process(int iv, int q, int ytd) {
    return new Gamma(iv, q, ytd).compute();
}
Or, if you really insist on reuse, use `ThreadLocal` — but escape analysis is almost always cheaper.

Optimize 12 — Extract Method that captures this in a lambda (Java)

Stream<Order> filterValid() {
    return orders.stream().filter(this::isValid);   // captures `this`
}

private boolean isValid(Order o) {
    return o.amount() > 0 && customer.isActive(o.customerId());   // uses field
}
Cost & Fix `this::isValid` captures `this` (a method reference). For a small extracted helper, the allocation is per-stream, not per-element — fine for most cases. For high-throughput hot paths, you might prefer:
Stream<Order> filterValid() {
    Customer c = this.customer;
    return orders.stream().filter(o -> o.amount() > 0 && c.isActive(o.customerId()));
}
This way, the lambda captures only the field used, and the JIT can sometimes eliminate the capture entirely. In practice, **measure first**. The JIT inlines most of these; the lambda allocation is one-time. Bigger lesson: `this::method` references vs. inline lambdas have subtly different escape characteristics. For 99% of code, this doesn't matter. For the 1% that does, profile.

Patterns of perf regression

Refactor Risk Mitigation
Replace Temp with Query Recompute Cache the orchestrator-level result
Extract Method Inlining cliff Keep methods small
Method Object that escapes Allocation Don't store, don't return
Extract via interface Virtual dispatch Generics or PGO
Inline that bloats caller Inlining cliff Re-extract
Extract that allocates collection Per-call alloc Hoist constants

Next