Method Chaining — Optimization¶
Twelve before/after exercises focused on chain performance, allocation reduction, and JIT-friendliness.
Optimization 1 — Use final on builder¶
Before:
public class UserBuilder { /* chained setters */ }
After:
public final class UserBuilder { /* chained setters */ }
Why: the JIT can fully devirtualize and inline setters when the type is final. Builders are rarely subclassed.
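As a minimal sketch (the class and field names here are hypothetical), a final builder whose chained setters the JIT can inline without a subclass check:

```java
import java.util.Objects;

// Hypothetical builder; `final` guarantees no subclass can override the
// setters, so every call site stays monomorphic and inlinable.
final class UserBuilder {
    private String name = "";
    private int age;

    UserBuilder name(String name) { this.name = Objects.requireNonNull(name); return this; }
    UserBuilder age(int age) { this.age = age; return this; }
    String build() { return name + ":" + age; }
}

public class FinalBuilderDemo {
    public static void main(String[] args) {
        // The whole chain is a candidate for inlining and scalar replacement.
        String u = new UserBuilder().name("alice").age(30).build();
        System.out.println(u); // alice:30
    }
}
```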
Optimization 2 — Cache non-capturing lambdas¶
Before:
list.stream().filter(s -> !s.isEmpty()).toList();
This non-capturing lambda is not re-allocated per call: the JVM caches the instance after the first use. But:
list.stream().filter(s -> s.startsWith(this.prefix)).toList();
This does capture `this` and may allocate per call.
After:
private static final Predicate<String> NOT_EMPTY = s -> !s.isEmpty();
list.stream().filter(NOT_EMPTY).toList();
For capturing lambdas in a hot loop:
String p = this.prefix;
Predicate<String> pred = s -> s.startsWith(p); // capture once
for (var x : work) x.stream().filter(pred).count();
Optimization 3 — Use toList() over Collectors.toList()¶
Before:
names.stream().map(String::trim).collect(Collectors.toList());
After:
names.stream().map(String::trim).toList();
Why: toList() (Java 16+) collects directly into an unmodifiable list, skipping the Collector machinery; in microbenchmarks it is often ~20% faster on small streams. Note the behavioral difference: the result is unmodifiable, while Collectors.toList() returns a mutable list in practice.
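A runnable sketch of the difference, including the behavioral change that toList() returns an unmodifiable list:

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class ToListDemo {
    public static void main(String[] args) {
        List<String> a = Stream.of("x", "", "y")
                .filter(s -> !s.isEmpty())
                .collect(Collectors.toList()); // mutable list in practice
        List<String> b = Stream.of("x", "", "y")
                .filter(s -> !s.isEmpty())
                .toList();                     // Java 16+: unmodifiable
        System.out.println(a.equals(b)); // true: same elements, same order
        try {
            b.add("z");
        } catch (UnsupportedOperationException e) {
            System.out.println("unmodifiable"); // toList() result rejects mutation
        }
    }
}
```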
Optimization 4 — Pre-size collections in builders¶
Before:
private final List<Item> items = new ArrayList<>();
After:
private final List<Item> items = new ArrayList<>(expectedSize);
If the builder uses an internal ArrayList, sizing it upfront avoids 20+ resize allocations.
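A minimal sketch of the idea, with a hypothetical BatchBuilder whose backing ArrayList is sized up front:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical batch builder: sizing the backing ArrayList up front
// avoids the repeated grow-and-copy cycle as elements are appended.
final class BatchBuilder {
    private final List<Integer> items;

    BatchBuilder(int expectedSize) { this.items = new ArrayList<>(expectedSize); }

    BatchBuilder add(int item) { items.add(item); return this; }
    List<Integer> build() { return List.copyOf(items); }
}

public class PreSizeDemo {
    public static void main(String[] args) {
        BatchBuilder b = new BatchBuilder(100_000); // one backing-array allocation
        for (int i = 0; i < 100_000; i++) b.add(i);
        System.out.println(b.build().size()); // 100000
    }
}
```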
Optimization 5 — Avoid intermediate collection in stream chain¶
Before:
List<User> active = users.stream().filter(User::active).collect(Collectors.toList());
List<String> names = active.stream().map(User::name).toList();
After:
List<String> names = users.stream().filter(User::active).map(User::name).toList();
Why: the intermediate .collect().stream() materializes the full list, then iterates again. Fusing eliminates one full pass.
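A runnable sketch (the User record and its fields are made up) showing the two-pass and fused pipelines produce the same result:

```java
import java.util.List;

public class FusionDemo {
    record User(String name, boolean active) {}

    public static void main(String[] args) {
        List<User> users = List.of(
                new User("ann", true), new User("bob", false), new User("cid", true));

        // Before: materializes an intermediate list, then streams it again.
        List<String> twoPass = users.stream().filter(User::active).toList()
                .stream().map(User::name).toList();

        // After: one fused pipeline, a single pass, no intermediate list.
        List<String> onePass = users.stream().filter(User::active).map(User::name).toList();

        System.out.println(onePass);                // [ann, cid]
        System.out.println(twoPass.equals(onePass)); // true
    }
}
```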
Optimization 6 — Replace Stream with for in hot loops¶
Before:
public int sumPrices(List<Item> items) {
return items.stream().mapToInt(Item::price).sum();
}
After (when called millions of times):
public int sumPrices(List<Item> items) {
int s = 0;
for (int i = 0, n = items.size(); i < n; i++) s += items.get(i).price();
return s;
}
Why: stream pipelines add per-element overhead from lambda dispatch and Sink wrapping. A hand-written loop can vectorize and keep values in registers, often yielding a 2-10× speedup on tight numeric kernels.
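The two forms are behaviorally identical, so the swap is safe; only a benchmark (e.g. JMH) shows the speedup. A runnable sketch with a hypothetical Item record:

```java
import java.util.List;

public class SumDemo {
    record Item(int price) {}

    // Stream version: builds a pipeline and dispatches a lambda per element.
    static int sumStream(List<Item> items) {
        return items.stream().mapToInt(Item::price).sum();
    }

    // Indexed loop: no pipeline objects, no per-element lambda dispatch.
    static int sumLoop(List<Item> items) {
        int s = 0;
        for (int i = 0, n = items.size(); i < n; i++) s += items.get(i).price();
        return s;
    }

    public static void main(String[] args) {
        List<Item> items = List.of(new Item(100), new Item(200), new Item(300));
        System.out.println(sumStream(items) == sumLoop(items)); // true
        System.out.println(sumLoop(items));                     // 600
    }
}
```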
Optimization 7 — Use parallelStream only when justified¶
Before:
items.parallelStream().map(this::cheapTransform).toList();
After:
items.stream().map(this::cheapTransform).toList(); // cheap per-element work: sequential wins
Why: parallelStream only pays off when (a) the workload per element is substantial, (b) the source is splittable (ArrayList yes, LinkedList no), (c) the order of results doesn't matter or is preserved. Otherwise overhead exceeds parallelism gain.
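One point worth demonstrating: with toList(), a parallel stream still preserves encounter order, so switching between stream() and parallelStream() changes only execution, not results. A minimal sketch:

```java
import java.util.List;
import java.util.stream.IntStream;

public class ParallelDemo {
    public static void main(String[] args) {
        List<Integer> src = IntStream.rangeClosed(1, 1000).boxed().toList();

        List<Integer> seq = src.stream().map(x -> x * x).toList();
        // toList() preserves encounter order even on a parallel stream,
        // so the result is element-for-element identical to the sequential run.
        List<Integer> par = src.parallelStream().map(x -> x * x).toList();

        System.out.println(seq.equals(par)); // true
    }
}
```

Whether the parallel run is actually faster depends on per-element cost and source splittability; measure with JMH before committing.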
Optimization 8 — Reuse builders for same configuration¶
Before:
for (Request r : requests) {
HttpRequest req = HttpRequest.builder(r.url())
.timeout(Duration.ofSeconds(5))
.header("Auth", token)
.build();
send(req);
}
After (when many fields are fixed):
HttpRequest.Builder template = HttpRequest.builder()
.timeout(Duration.ofSeconds(5))
.header("Auth", token);
for (Request r : requests) {
HttpRequest req = template.copy().url(r.url()).build();
send(req);
}
Why: less per-iteration allocation. Note: the builder's copy() must not share mutable internal state.
Optimization 9 — Use mutating builder over immutable copy chain¶
Before:
User u = new User("alice", 30);
u = u.withName("Alice").withAge(31).withEmail("a@b"); // 3 records allocated
After (when many changes happen together):
User u2 = User.builder(u).name("Alice").age(31).email("a@b").build(); // 1 builder + 1 record
Why: chained withX allocates an intermediate record per call. For 1-2 changes, fine. For many, builder is cheaper.
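A runnable sketch (all names hypothetical) contrasting the wither chain with a builder seeded from the existing record:

```java
public class WitherDemo {
    record User(String name, int age, String email) {
        User withName(String n) { return new User(n, age, email); }
        User withAge(int a) { return new User(name, a, email); }
        User withEmail(String e) { return new User(name, age, e); }

        static Builder builder(User base) { return new Builder(base); }
    }

    // Mutable builder: one builder object, one final record.
    static final class Builder {
        private String name; private int age; private String email;
        Builder(User base) { name = base.name(); age = base.age(); email = base.email(); }
        Builder name(String n) { name = n; return this; }
        Builder age(int a) { age = a; return this; }
        Builder email(String e) { email = e; return this; }
        User build() { return new User(name, age, email); }
    }

    public static void main(String[] args) {
        User u = new User("alice", 30, null);
        // Wither chain: three records allocated, two immediately discarded.
        User v1 = u.withName("Alice").withAge(31).withEmail("a@b");
        // Builder: two allocations total, regardless of how many fields change.
        User v2 = User.builder(u).name("Alice").age(31).email("a@b").build();
        System.out.println(v1.equals(v2)); // true
    }
}
```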
Optimization 10 — JFR allocation profile¶
java -XX:StartFlightRecording=duration=60s,filename=app.jfr -jar app.jar
jfr print --events jdk.ObjectAllocationInNewTLAB app.jfr | sort | uniq -c | sort -nr | head -20
Look for:
- Lambda hidden classes allocated frequently
- Builder allocations in tight loops
- Stream pipeline node allocations (StatelessOp, etc.)
Fix the top hotspots first.
Optimization 11 — Stream forEachOrdered vs forEach for parallel¶
For parallel streams, forEachOrdered enforces source order, which serializes the terminal step. If order doesn't matter, use forEach for true parallelism.
parallelStream().forEach(this::process); // unordered, parallel
parallelStream().forEachOrdered(this::process); // ordered, partly serialized
Optimization 12 — Avoid Optional.of when null is normal¶
Before:
return Optional.ofNullable(cache.get(key)).map(String::trim).orElse(DEFAULT);
After (in hot path):
String v = cache.get(key);
return v != null ? v.trim() : DEFAULT;
Why: Optional has a small allocation cost (the wrapper) and a small dispatch cost (map/orElse calls). For hot paths called millions of times, plain null checks are faster. For cold or boundary code, Optional is fine.
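A minimal sketch (the cache contents and names are made up) showing the two forms are equivalent:

```java
import java.util.Map;
import java.util.Optional;

public class OptionalDemo {
    static final Map<String, String> CACHE = Map.of("k", "v");

    // Boundary/cold code: Optional reads well and documents absence.
    static String viaOptional(String key) {
        return Optional.ofNullable(CACHE.get(key)).map(String::toUpperCase).orElse("MISSING");
    }

    // Hot path: plain null check, no wrapper allocation, no map/orElse dispatch.
    static String viaNullCheck(String key) {
        String v = CACHE.get(key);
        return v != null ? v.toUpperCase() : "MISSING";
    }

    public static void main(String[] args) {
        System.out.println(viaNullCheck("k")); // V
        System.out.println(viaNullCheck("x")); // MISSING
        System.out.println(viaOptional("k").equals(viaNullCheck("k"))); // true
    }
}
```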
Tools cheat sheet¶
| Tool | Purpose |
|---|---|
| -XX:+PrintInlining | Inlining decisions |
| async-profiler -e alloc | Allocation flame graph |
| jmh | Microbenchmark stream vs loop |
| -XX:+UnlockDiagnosticVMOptions -XX:+PrintEliminateAllocations | EA decisions |
| JFR + JMC | Allocation, GC, JIT events |
When chain optimization is worth it¶
- Profile shows allocation hotspot in stream pipeline
- Inner loops processing millions of elements
- High-throughput services where every ns matters
- Builder chains in request paths
When it isn't¶
- Cold paths (initialization, config)
- Code clarity matters more than tiny speedup
- The JIT already collapses the chain (verify with -XX:+PrintInlining)
- The chain is bounded (called once per request, etc.)
Memorize this: chains are JIT-friendly when call sites are monomorphic and lambdas are stable. The main allocation costs are intermediate Stream pipeline nodes, capturing lambdas, and builder objects; escape analysis usually eliminates them, but not always. Profile, then optimize the top hotspot.