Iterator — Optimize

Source: refactoring.guru/design-patterns/iterator

Each section presents an Iterator that works but is wasteful. Profile, optimize, measure.


Table of Contents

  1. Optimization 1: Stream rows from DB instead of loading all
  2. Optimization 2: Lazy generators over eager lists
  3. Optimization 3: Indexed loop over Iterator for primitives
  4. Optimization 4: Batch processing for I/O iterators
  5. Optimization 5: Cursor pagination over offset
  6. Optimization 6: Custom Spliterator for parallel stream
  7. Optimization 7: Replace LinkedList iteration with array-backed list
  8. Optimization 8: Backpressure with request(n)
  9. Optimization 9: Drop unnecessary stream allocation
  10. Optimization 10: Vectorized batch iteration
  11. Optimization Tips

Optimization 1: Stream rows from DB instead of loading all

Before

List<Order> all = jdbcTemplate.query("SELECT * FROM orders", rowMapper);   // 10M rows → OOM
all.forEach(this::process);

OOM on big tables.

After

try (Stream<Order> stream = jdbcTemplate.queryForStream("SELECT * FROM orders", rowMapper)) {
    stream.forEach(this::process);
}

Or with raw JDBC:

ps.setFetchSize(1000);
try (ResultSet rs = ps.executeQuery()) {
    while (rs.next()) process(mapRow(rs));
}
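
One driver-specific caveat: the PostgreSQL JDBC driver honors the fetch size only when autocommit is off (and the result set is forward-only), so a streaming read needs roughly this shape (conn is an open JDBC connection; mapRow as above):

conn.setAutoCommit(false);                          // PostgreSQL streams only outside autocommit
try (PreparedStatement ps = conn.prepareStatement("SELECT * FROM orders")) {
    ps.setFetchSize(1000);                          // rows per round trip, not all at once
    try (ResultSet rs = ps.executeQuery()) {
        while (rs.next()) process(mapRow(rs));
    }
}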

Measurement. Memory: O(N) → O(1) in rows held. Latency to first item: the full query's load time → roughly one fetch round trip.

Lesson: Streaming Iterators bound memory regardless of dataset size. Use them whenever a result set can be large.


Optimization 2: Lazy generators over eager lists

Before

def big_data():
    return [process(x) for x in range(10_000_000)]   # eager; 10M items in memory

for x in big_data():
    if condition(x): break

Allocates 10M items even though we may break early.

After

def big_data():
    for x in range(10_000_000):
        yield process(x)

for x in big_data():
    if condition(x): break   # stops processing immediately

Measurement. Memory: O(N) → O(1). Total work: all 10M items → only those consumed before the break.

Lesson: When the consumer might short-circuit (break, take, takeWhile), generators win.


Optimization 3: Indexed loop over Iterator for primitives

Before

List<Integer> nums = ...;
long sum = 0;
for (int n : nums) sum += n;   // boxed Integer iteration

Boxing per element.

After

int[] nums = ...;
long sum = 0;
for (int i = 0; i < nums.length; i++) sum += nums[i];

Or IntStream:

long sum = IntStream.of(nums).sum();
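
If the data arrives as a List&lt;Integer&gt; you cannot change, a middle-ground sketch is to unbox once into a primitive stream instead of iterating boxed values:

long sum = nums.stream().mapToLong(Integer::longValue).sum();   // one unbox per element, no boxed arithmetic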

Measurement. ~5-10× speedup for primitive math. Allocation-free.

Lesson: For primitive math, use primitive arrays / streams (IntStream, LongStream). Boxed Iterator<Integer> allocates.


Optimization 4: Batch processing for I/O iterators

Before

for (Order o : ordersIterator) {
    db.save(o);   // one INSERT per call
}

Round-trip per row.

After

List<Order> batch = new ArrayList<>(100);
for (Order o : ordersIterator) {
    batch.add(o);
    if (batch.size() == 100) {
        db.saveBatch(batch);
        batch.clear();
    }
}
if (!batch.isEmpty()) db.saveBatch(batch);

Or with utility:

import com.google.common.collect.Iterators;
Iterators.partition(ordersIterator, 100)
    .forEachRemaining(db::saveBatch);
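
db.saveBatch is assumed above; with raw JDBC it might look like this sketch using the standard addBatch/executeBatch API (conn, the table, and column accessors are illustrative):

try (PreparedStatement ps = conn.prepareStatement(
        "INSERT INTO orders (id, amount) VALUES (?, ?)")) {
    for (Order o : batch) {
        ps.setLong(1, o.getId());
        ps.setBigDecimal(2, o.getAmount());
        ps.addBatch();                // queued client-side; no round trip yet
    }
    ps.executeBatch();                // one round trip for the whole batch
}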

Measurement. Throughput rises ~10-100× (depending on per-call latency).

Lesson: I/O-heavy iteration benefits from batching. Group items; do I/O once per batch.


Optimization 5: Cursor pagination over offset

Before

SELECT * FROM events ORDER BY id LIMIT 100 OFFSET 1000000;

Postgres scans 1M rows to skip them. Each successive page is slower.

After

SELECT * FROM events WHERE id > $last_id ORDER BY id LIMIT 100;

Server uses the index; O(log N + page_size) regardless of position.
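
The application side carries the last seen id between pages. A sketch with Spring's JdbcTemplate (Event, rowMapper, and process are placeholders):

long lastId = 0;
while (true) {
    List<Event> page = jdbcTemplate.query(
        "SELECT * FROM events WHERE id > ? ORDER BY id LIMIT 100",
        rowMapper, lastId);
    if (page.isEmpty()) break;
    page.forEach(this::process);
    lastId = page.get(page.size() - 1).getId();   // cursor for the next page
}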

Measurement. Page latency stays constant. At deep offsets, ~100× speedup.

Lesson: Cursor pagination is the default for any growing dataset. Offset is acceptable only for tiny tables.


Optimization 6: Custom Spliterator for parallel stream

Before

List<Long> data = LongStream.range(0, 10_000_000).boxed().collect(toList());
long sum = data.stream().mapToLong(Long::longValue).sum();   // serial

One core; ~150ms.

After

long sum = LongStream.range(0, 10_000_000).parallel().sum();   // built-in
// or with custom spliterator:
long sum = StreamSupport.longStream(new RangeSpliterator(0, 10_000_000), true).sum();
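
RangeSpliterator is not a JDK class; a minimal sketch, splitting a half-open range [from, to) in half so the fork/join pool can spread work across cores:

import java.util.Spliterator;
import java.util.function.LongConsumer;

final class RangeSpliterator implements Spliterator.OfLong {
    private long from;
    private final long to;

    RangeSpliterator(long from, long to) { this.from = from; this.to = to; }

    @Override public boolean tryAdvance(LongConsumer action) {
        if (from >= to) return false;
        action.accept(from++);
        return true;
    }

    @Override public Spliterator.OfLong trySplit() {
        long mid = from + (to - from) / 2;
        if (mid <= from) return null;     // too small to split further
        RangeSpliterator prefix = new RangeSpliterator(from, mid);
        from = mid;                       // this spliterator keeps [mid, to)
        return prefix;
    }

    @Override public long estimateSize() { return to - from; }

    @Override public int characteristics() {
        return SIZED | SUBSIZED | ORDERED | IMMUTABLE | NONNULL;
    }
}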

Measurement. Speedup ≈ min(cores, work/overhead); ~30 ms on an 8-core machine, down from ~150 ms serial.

Lesson: parallel() over splittable sources scales with cores. Custom Spliterator for non-standard data structures.


Optimization 7: Replace LinkedList iteration with array-backed list

Before

LinkedList<Integer> list = new LinkedList<>();
for (Integer x : list) sum += x;   // pointer chasing; cache miss per node

Cache-hostile.

After

ArrayList<Integer> list = new ArrayList<>();
for (Integer x : list) sum += x;   // sequential memory; cache-friendly

Measurement. Typically a several-fold speedup on iteration-heavy workloads, growing as nodes scatter across the heap (varies by element size and cache state).

Lesson: For iteration, array-backed structures dominate. Despite the name, LinkedList is rarely the right choice in Java.


Optimization 8: Backpressure with request(n)

Before

publisher.subscribe(new BaseSubscriber<>() {
    @Override protected void hookOnSubscribe(Subscription s) { s.request(Long.MAX_VALUE); }  // unbounded demand
    @Override protected void hookOnNext(Item i) { slowProcess(i); }
});

Publisher emits as fast as possible; slow processor's queue grows; OOM.

After

publisher.subscribe(new BaseSubscriber<>() {
    @Override protected void hookOnSubscribe(Subscription s) { s.request(10); }     // small initial prefetch
    @Override protected void hookOnNext(Item i) { slowProcess(i); request(1); }     // refill demand one at a time
});
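
If the publisher is a Reactor Flux, the same bounded demand is available without a hand-written subscriber via the built-in limitRate operator:

publisher.limitRate(10)                  // caps upstream demand to chunks of 10
         .subscribe(this::slowProcess);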

Measurement. Memory bounded; throughput now matches the slow processor's rate.

Lesson: Reactive Streams' explicit request(n) is backpressure. Use it; don't request Long.MAX_VALUE for a slow subscriber.


Optimization 9: Drop unnecessary stream allocation

Before

List<String> strings = ...;
boolean any = strings.stream().anyMatch(s -> s.contains("foo"));

For small lists, the stream allocation costs more than the loop.

After

boolean any = false;
for (String s : strings) {
    if (s.contains("foo")) { any = true; break; }
}

Measurement. Only microseconds saved per call, but observable in tight inner loops over small lists.

Lesson: Streams aren't free. For small or hot paths, plain loops can be faster. Don't over-stream.


Optimization 10: Vectorized batch iteration

Before (row-at-a-time)

for (Row r : rows) {
    if (r.amount > 100) result.add(r);
}

Per-row overhead; cache miss on each row's columns.

After (batch / columnar)

Iterator<Batch> batches = source.batches(1024);
while (batches.hasNext()) {
    Batch b = batches.next();
    boolean[] mask = new boolean[b.size()];
    for (int i = 0; i < b.size(); i++) mask[i] = b.amount[i] > 100;   // SIMD-friendly
    // Use mask to copy selected rows.
}
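
Batch above is assumed, not a library type; a minimal columnar sketch keeps one contiguous primitive array per column:

final class Batch {
    final long[] amount;                  // column values, contiguous in memory
    private final int size;
    Batch(long[] amount, int size) { this.amount = amount; this.size = size; }
    int size() { return size; }
}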

Measurement. ~10× throughput for analytical workloads. Apache Arrow / DuckDB / ClickHouse use this internally.

Lesson: For analytical data processing, batch / columnar iteration beats row-by-row. Modern columnar engines are built around it.


Optimization Tips

  • Stream large datasets. Bounded memory regardless of size.
  • Lazy iteration when consumer might short-circuit. Avoids wasted computation.
  • Indexed loops for primitives. Avoid boxing.
  • Batch I/O. Group calls; massive speedup.
  • Cursor pagination over offset. Stable, fast, predictable.
  • Parallel streams for splittable, large sources. Scale with cores.
  • Array-backed lists for iteration. Cache-friendly.
  • request(n) for reactive backpressure. Match the publisher's rate to the consumer.
  • Streams aren't free. For tight, small loops, plain for may be faster.
  • Batch / columnar for analytics. Per-row dispatch dominates.
  • Profile. Iteration cost is rarely the issue; usually it's the body.
