Composing Methods — Optimize¶
12 cases where a refactoring is functionally correct but introduces a subtle performance issue. Identify the cost and propose a fix that preserves clarity.
Optimize 1 — Replace Temp with Query in a hot loop (Java)¶
Functionally correct:
```java
class PriceCalculator {
    private List<Item> items;

    double total() {
        double t = 0;
        for (Item i : items) {
            if (subtotal() > 100) t += i.price() * 0.9;
            else t += i.price();
        }
        return t;
    }

    private double subtotal() {
        double s = 0;
        for (Item i : items) s += i.price();
        return s;
    }
}
```
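For comparison, a version that computes the subtotal once before the loop (a sketch; the `Item` record is a hypothetical stand-in so the snippet compiles on its own):

```java
import java.util.List;

class PriceCalculatorFixed {
    // Hypothetical stand-in for the real Item class, just to make the sketch runnable.
    record Item(double price) {}

    private final List<Item> items;

    PriceCalculatorFixed(List<Item> items) { this.items = items; }

    double total() {
        // subtotal never changes during the loop, so hoisting it preserves semantics.
        double sub = subtotal();   // computed once: the method is O(N) again
        double t = 0;
        for (Item i : items) {
            t += (sub > 100) ? i.price() * 0.9 : i.price();
        }
        return t;
    }

    private double subtotal() {
        double s = 0;
        for (Item i : items) s += i.price();
        return s;
    }
}
```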
Cost & Fix
`subtotal()` is called once per loop iteration: O(N) calls, each doing O(N) work, so `total()` is **O(N²)** instead of the original O(N). **Fix:** compute the subtotal once, before the loop. Replace Temp with Query is great for clarity; in an inner loop it can be a performance nightmare.

Optimize 2 — Extract Method that breaks JIT inlining (Java)¶
```java
double sum(double[] xs) {
    double s = 0;
    for (double x : xs) s += transform(x);
    return s;
}

private double transform(double x) {
    return Math.sqrt(x) + Math.log(x + 1) + Math.exp(-x) + Math.sin(x);
}
```
This was originally:
```java
double sum(double[] xs) {
    double s = 0;
    for (double x : xs) s += Math.sqrt(x) + Math.log(x + 1) + Math.exp(-x) + Math.sin(x);
    return s;
}
```
Cost & Fix
Extracting `transform` is good for clarity, and usually free: at ~50 bytes of bytecode, `transform` is well under HotSpot's hot-method inline threshold (`FreqInlineSize`, 325 bytes by default), so it gets inlined. **Usually no perf cost.** But if `transform` grows past that threshold over time, inlining stops and the loop can suddenly become 5–10× slower. **Fix (preventive):** keep `transform` small. If it must grow, accept that the hot path won't inline it, and consider:

- Specializing the helper for the inner loop (e.g. a fused `sumWithTransform`).
- Studying how the JDK structures `@HotSpotIntrinsicCandidate` methods (a JDK-internal annotation; rarely applicable to application code).
- Verifying with `-XX:+UnlockDiagnosticVMOptions -XX:+PrintInlining`.

**Verify, don't guess:** look for `transform ... inline (hot)` in the output. A `too large` or `hot method too big` message means the refactor crossed a threshold.

Optimize 3 — Method Object that escapes (Java)¶
```java
class Account {
    private final Map<Integer, Gamma> cache = new HashMap<>();

    int gamma(int iv, int q, int ytd) {
        Gamma g = new Gamma(iv, q, ytd);
        cache.put(iv, g); // ❌ stores a reference: g escapes
        return g.compute();
    }
}
```
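A sketch of the alternative that caches the *result* rather than the `Gamma` instance (the `Gamma` here is a minimal hypothetical stand-in so the snippet runs on its own):

```java
import java.util.HashMap;
import java.util.Map;

class AccountFixed {
    // Minimal hypothetical stand-in for the real Gamma method object.
    static final class Gamma {
        private final int iv, q, ytd;
        Gamma(int iv, int q, int ytd) { this.iv = iv; this.q = q; this.ytd = ytd; }
        int compute() { return iv * q + ytd; }  // placeholder computation
    }

    private final Map<Integer, Integer> cache = new HashMap<>();

    int gamma(int iv, int q, int ytd) {
        // Cache the result. Gamma never leaves this method, so escape
        // analysis can prove it non-escaping and scalar-replace it.
        return cache.computeIfAbsent(iv, k -> new Gamma(iv, q, ytd).compute());
    }
}
```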
Cost & Fix
Because `g` is stored in the cache, the JIT can no longer prove that `g` doesn't escape the method. **Escape analysis fails**, scalar replacement doesn't apply, and every `gamma` call allocates a heap object. **Fix:** don't cache the Method Object; cache the *result*. Then `Gamma` doesn't escape and is stack-allocatable.

Optimize 4 — Extract Method introduces interface dispatch (Go)¶
```go
type Reducer interface {
    Reduce(float64, float64) float64
}

type Sum struct{}

func (Sum) Reduce(a, b float64) float64 { return a + b }

func Total(xs []float64, r Reducer) float64 {
    var t float64
    for _, x := range xs {
        t = r.Reduce(t, x) // virtual call
    }
    return t
}
```
Original: a plain loop that accumulated `t += x` directly, with no interface involved.
Cost & Fix
The "extract for flexibility" added an interface dispatch per iteration. Without PGO, Go can't inline through the interface, so every call goes through an itab lookup. **Fix options:**

1. **Don't generalize prematurely.** If you have one reducer, don't introduce an interface.
2. **Use generics (Go 1.18+).** The compiler instantiates a specialized version, so the call is direct and inlinable.
3. **PGO (Go 1.21+).** Capture a profile and rebuild; Go can devirtualize hot interface calls.

Lesson: an Extract Method that *introduces a polymorphism boundary* in a hot loop pays a per-call cost.

Optimize 5 — Inline Method that bloats a hot caller (Java)¶
```java
double process(double[] xs, double[] ys) {
    double s = 0;
    for (int i = 0; i < xs.length; i++) {
        // inlined fragment from the formerly-called combiner:
        double a = xs[i] * 2 + 1;
        double b = ys[i] * 3 - 1;
        double c = a * b + Math.sqrt(a + b) - Math.log(b + 1);
        // ... 30 more lines ...
        s += c;
    }
    return s;
}
```
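For comparison, a re-extracted version with the fragment back in a small helper (`combine` is a hypothetical name; only the three lines shown above are carried over):

```java
class ProcessFixed {
    static double process(double[] xs, double[] ys) {
        double s = 0;
        for (int i = 0; i < xs.length; i++) {
            s += combine(xs[i], ys[i]);   // small callee: cheap for the JIT to inline
        }
        return s;
    }

    // Small helper, well under typical inline thresholds.
    private static double combine(double x, double y) {
        double a = x * 2 + 1;
        double b = y * 3 - 1;
        return a * b + Math.sqrt(a + b) - Math.log(b + 1);
    }
}
```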
Cost & Fix
Inlining a 50-line helper into the hot caller can push the caller past `FreqInlineSize` (325 bytes of bytecode), at which point the caller's *own* callers stop inlining it, propagating the performance hit upward. **Fix:** re-extract. The Extract Method version is often faster precisely because it is small enough to inline. Verify with `-XX:+PrintInlining`: if your hot method is being inlined into its callers, keep it small.

Optimize 6 — Extract Variable allocates per call (Python)¶
```python
def is_promo_eligible(user, cart):
    countries = {"US", "CA", "GB", "AU", "FR", "DE"}  # ❌ allocated each call
    is_country = user.country in countries
    return is_country and cart.total > 30
```
Cost & Fix
Every call re-allocates the set literal; at ~10M calls/day, that is millions of pointless allocations. **Fix:** hoist the set to module scope. A `frozenset` makes the intent explicit and caches its hash. Recent CPython releases do constant-fold some literal collections in membership tests, but an explicit module-level constant is the safe bet.

Optimize 7 — Substitute Algorithm with allocation in the hot path (Java)¶
Original:
```java
boolean containsAny(String s, char[] needles) {
    for (int i = 0; i < s.length(); i++) {
        for (char n : needles) if (s.charAt(i) == n) return true;
    }
    return false;
}
```
"Refactored":
```java
boolean containsAny(String s, char[] needles) {
    Set<Character> set = new HashSet<>();
    for (char n : needles) set.add(n);
    for (int i = 0; i < s.length(); i++) {
        if (set.contains(s.charAt(i))) return true;
    }
    return false;
}
```
Cost & Fix
Allocates a `HashSet`, auto-boxes each needle into a `Character`, and auto-boxes every `s.charAt(i)` for the `contains` call. For tiny `needles` (2–4 chars) the original linear scan is **dramatically faster**. **Fix options:**

1. Keep the linear scan for small `needles`.
2. If the needles are fixed, pre-build the set once and reuse it:

```java
private static final Set<Character> NEEDLES_CACHE = ...;

boolean containsAny(String s) {
    for (int i = 0; i < s.length(); i++) {
        if (NEEDLES_CACHE.contains(s.charAt(i))) return true;
    }
    return false;
}
```
3. For ASCII-only needles, replace the set with two 64-bit masks (zero allocation):

```java
boolean containsAny(String s, char[] needles) {
    long mask0 = 0, mask1 = 0;
    for (char c : needles) {
        if (c < 64) mask0 |= 1L << c;
        else if (c < 128) mask1 |= 1L << (c - 64);
    }
    for (int i = 0; i < s.length(); i++) {
        char c = s.charAt(i);
        if (c < 64 ? ((mask0 >> c) & 1) != 0 : c < 128 && ((mask1 >> (c - 64)) & 1) != 0)
            return true;
    }
    return false;
}
```
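A quick sanity check of the bitmask variant (the method body is reproduced here so the sketch runs on its own):

```java
class AsciiNeedles {
    // Same bitmask technique as above, reproduced so the sketch is self-contained.
    static boolean containsAny(String s, char[] needles) {
        long mask0 = 0, mask1 = 0;
        for (char c : needles) {
            if (c < 64) mask0 |= 1L << c;
            else if (c < 128) mask1 |= 1L << (c - 64);
        }
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c < 64 ? ((mask0 >> c) & 1) != 0 : c < 128 && ((mask1 >> (c - 64)) & 1) != 0)
                return true;
        }
        return false;
    }
}
```

Note the caveat visible in the code: needles at or above 128 are silently dropped, so this is only a correct substitute when the needles are ASCII.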
Optimize 8 — Extract Method captures collection by-reference, then mutates (Java)¶
```java
List<Order> filterRecent(List<Order> orders) {
    return removeOld(orders);
}

private List<Order> removeOld(List<Order> orders) {
    orders.removeIf(o -> o.date().isBefore(cutoff)); // ❌ mutates the caller's list
    return orders;
}
```
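For comparison, a side-effect-free version that builds a new list (the `Order` record and `cutoff` field are hypothetical stand-ins so the sketch is self-contained):

```java
import java.time.LocalDate;
import java.util.List;

class OrderFilter {
    // Hypothetical stand-ins so the sketch compiles on its own.
    record Order(LocalDate date) {}
    private final LocalDate cutoff;

    OrderFilter(LocalDate cutoff) { this.cutoff = cutoff; }

    // Pure: returns a new list and leaves the caller's list untouched.
    List<Order> filterRecent(List<Order> orders) {
        return orders.stream()
                     .filter(o -> !o.date().isBefore(cutoff))
                     .toList();
    }
}
```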
Cost & Fix
The "extracted" method mutates its input, a hidden side effect. It works for many test cases but breaks callers that don't expect the input to change. (It also throws `UnsupportedOperationException` if a caller passes an immutable `List.of(...)`.) **Fix:** return a new list. Trade-off: that allocates a new list; for very large inputs the mutating version is faster, but then expose the mutation with a clear name (`removeOldInPlace`).

Optimize 9 — Replace Temp with Query that defeats GC (Java)¶
```java
class Report {
    String generate(Data data) {
        return rows(data) + "\n" + summary(data);
    }

    private String rows(Data data) {
        StringBuilder sb = new StringBuilder();
        for (Row r : data.rows()) sb.append(r).append('\n');
        return sb.toString();
    }

    private String summary(Data data) {
        return "Total: " + data.total();
    }
}
```
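For comparison, a single-buffer version where the helpers append instead of returning strings (`Data` and `Row` are minimal hypothetical stand-ins):

```java
import java.util.List;

class ReportFixed {
    // Hypothetical stand-ins for the real model classes.
    record Row(String text) { @Override public String toString() { return text; } }
    record Data(List<Row> rows, double total) {}

    String generate(Data data) {
        StringBuilder sb = new StringBuilder();   // the only buffer
        rows(sb, data);
        sb.append('\n');
        summary(sb, data);
        return sb.toString();
    }

    private void rows(StringBuilder sb, Data data) {
        for (Row r : data.rows()) sb.append(r).append('\n');
    }

    private void summary(StringBuilder sb, Data data) {
        sb.append("Total: ").append(data.total());
    }
}
```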
Cost & Fix
Each helper call allocates a fresh `StringBuilder` plus a `String`; the orchestrator then concatenates with `+`, allocating *another* `StringBuilder` and `String`. For 1MB of rows output, peak heap roughly doubles. **Fix:** stream-write into a single buffer that the helpers append to.

Optimize 10 — Inline Temp losing memoization (Python)¶
Original:
```python
def cost(item, ctx):
    rate = ctx.tax_rate(item.region)  # cached internally
    return item.price * (1 + rate)
```
"Refactored", with the temp inlined: `return item.price * (1 + ctx.tax_rate(item.region))`.
Cost & Fix
If `ctx.tax_rate` is cached, the inline is fine. If it is *not*, the temp was the cache; but the temp only lived for one call anyway, so inlining it changes nothing here. The danger is the loop case: when `cost` runs once per item, `tax_rate` runs once per item either way, and the temp helps only if it is hoisted *outside* the loop (or computed once when all items share a region). Lesson: Inline Temp is safe for *cheap* expressions. For expensive ones, the temp may be load-bearing, and its scope matters.

Optimize 11 — Method Object reused across requests (Java)¶
```java
class OrderProcessor {
    private final Gamma g = new Gamma(); // ❌ shared across calls

    int process(int iv, int q, int ytd) {
        g.set(iv, q, ytd);
        return g.compute();
    }
}
```
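For comparison, a sketch of both thread-safe alternatives, per-call allocation and `ThreadLocal` reuse, with a minimal hypothetical `Gamma`:

```java
class OrderProcessorFixed {
    // Minimal hypothetical Gamma with the same set/compute shape as the original.
    static final class Gamma {
        private int iv, q, ytd;
        void set(int iv, int q, int ytd) { this.iv = iv; this.q = q; this.ytd = ytd; }
        int compute() { return iv * q + ytd; }  // placeholder computation
    }

    // Option 1: allocate per call; escape analysis can usually erase this.
    int process(int iv, int q, int ytd) {
        Gamma g = new Gamma();
        g.set(iv, q, ytd);
        return g.compute();
    }

    // Option 2: if reuse is really required, confine each instance to one thread.
    private static final ThreadLocal<Gamma> TL = ThreadLocal.withInitial(Gamma::new);

    int processReusing(int iv, int q, int ytd) {
        Gamma g = TL.get();
        g.set(iv, q, ytd);
        return g.compute();
    }
}
```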
Cost & Fix
The processor field-caches a `Gamma` to avoid allocation. **But** `Gamma` is single-use: its fields are partial state during one computation. Concurrent calls now race on shared state, producing wrong results. **Fix:** allocate per call (in JIT-compiled code, escape analysis will usually erase the allocation anyway). If you really insist on reuse, use a `ThreadLocal`.

Optimize 12 — Extract Method that captures this in a lambda (Java)¶
```java
Stream<Order> filterValid() {
    return orders.stream().filter(this::isValid); // captures `this`
}

private boolean isValid(Order o) {
    return o.amount() > 0 && customer.isActive(o.customerId()); // uses a field
}
```
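For comparison, a variant whose lambda closes over a local instead of `this` (`Order` and `Customer` are hypothetical stand-ins so the sketch is self-contained):

```java
import java.util.List;
import java.util.stream.Stream;

class OrderQueries {
    // Hypothetical stand-ins so the sketch compiles on its own.
    record Order(double amount, String customerId) {}
    interface Customer { boolean isActive(String id); }

    private final List<Order> orders;
    private final Customer customer;

    OrderQueries(List<Order> orders, Customer customer) {
        this.orders = orders;
        this.customer = customer;
    }

    Stream<Order> filterValid() {
        Customer c = customer;  // the lambda captures only this local, not `this`
        return orders.stream()
                     .filter(o -> o.amount() > 0 && c.isActive(o.customerId()));
    }
}
```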
Cost & Fix
`this::isValid` is a method reference that captures `this`. For a small extracted helper, the allocation is per-stream, not per-element: fine for most cases. On a high-throughput hot path you can instead have the lambda close over only the field it needs (copy `customer` into a local and capture that), which the JIT can sometimes eliminate entirely. In practice, **measure first**: the JIT inlines most of these, and the lambda allocation is one-time. Bigger lesson: `this::method` references and inline lambdas have subtly different escape characteristics. For 99% of code this doesn't matter; for the 1% that does, profile.

Patterns of perf regression¶
| Refactor | Risk | Mitigation |
|---|---|---|
| Replace Temp with Query | Recompute | Cache the orchestrator-level result |
| Extract Method | Inlining cliff | Keep methods small |
| Method Object that escapes | Allocation | Don't store, don't return |
| Extract via interface | Virtual dispatch | Generics or PGO |
| Inline that bloats caller | Inlining cliff | Re-extract |
| Extract that allocates collection | Per-call alloc | Hoist constants |
Next¶
- Recap: interview.md
- Spot the bug: find-bug.md
- Theory: professional.md