
Organizing Data — Optimize

12 cases where the refactor is correct but the perf cost is real.


Optimize 1 — Encapsulate Collection allocates per call (Java)

public List<Order> orders() { return List.copyOf(orders); }

At 10K req/s, each calling orders() once, that is 10K list copies/s.

Cost & Fix Allocates a new immutable list per call. For ~50-element lists, that is ~400 bytes/call → ~4 MB/s of garbage. **Fix:** Return an unmodifiable view (no copy):
public List<Order> orders() { return Collections.unmodifiableList(orders); }
Or expose a Stream:
public Stream<Order> orders() { return orders.stream(); }
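The caller avoids the O(N) copy and simply iterates. Both forms read the live list, so neither is a snapshot: the view reflects later changes, and if the underlying list is modified mid-iteration the iterator throws `ConcurrentModificationException`.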
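The difference between the copy and the view can be seen in a minimal sketch. `OrderBook` and its method names are hypothetical, and `String` stands in for the `Order` type:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical holder; String stands in for the Order type.
final class OrderBook {
    private final List<String> orders = new ArrayList<>();

    void add(String order) { orders.add(order); }

    // O(N) defensive copy: a stable snapshot, one list allocation per call.
    List<String> ordersCopy() { return List.copyOf(orders); }

    // O(1) read-only wrapper over the live list.
    List<String> ordersView() { return Collections.unmodifiableList(orders); }

    public static void main(String[] args) {
        OrderBook book = new OrderBook();
        book.add("A-1");
        List<String> copy = book.ordersCopy();
        List<String> view = book.ordersView();
        book.add("A-2");
        System.out.println(copy.size());   // snapshot: still 1
        System.out.println(view.size());   // live view: now 2
        try {
            view.add("A-3");               // the wrapper rejects writes
        } catch (UnsupportedOperationException e) {
            System.out.println("read-only");
        }
    }
}
```

The view is cheaper, but it trades snapshot semantics for liveness; callers that need a stable list still have to copy.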

Optimize 2 — Replace Data Value with Object on a hot path (Java)

class Transaction {
    private Money amount;   // was double
}
class Money {
    private final double value;
    private final Currency currency;
}

In a batch processing job, 10M transactions per minute.

Cost & Fix Each `Money` is a heap object: header + 2 fields (~24 bytes). 10M `Money` objects per minute → ~240 MB/min. If escape analysis fails (e.g., the `Money` is stored in a field or returned), GC pressure rises. **Fix options:**
1. For batch / numerical hot paths: keep primitives (`double amount; Currency currency`).
2. Use a flyweight `Currency` (interned, one singleton per code); this is already common.
3. Wait for Project Valhalla's value classes.

For domain code (one `Money` per request), the cost is invisible.
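Options 1 and 2 can be combined in a short sketch, assuming every amount in a batch shares one currency. `BatchTotals` is a hypothetical name. `java.util.Currency.getInstance` already behaves as a flyweight, returning the same instance for a given code:

```java
import java.util.Currency;

// Hypothetical batch helper: amounts stay primitive, the currency is shared.
final class BatchTotals {
    // Sums a primitive array; no per-item Money object is created.
    static double sum(double[] amounts) {
        double total = 0.0;
        for (double a : amounts) {
            total += a;
        }
        return total;
    }

    public static void main(String[] args) {
        Currency usd = Currency.getInstance("USD");
        // Same instance for the same code, so sharing it costs nothing per item.
        System.out.println(usd == Currency.getInstance("USD"));   // true
        double[] amounts = {10.0, 20.5, 4.5};
        System.out.println(sum(amounts) + " " + usd.getCurrencyCode());   // 35.0 USD
    }
}
```

At the domain boundary the total can still be wrapped in a single `Money`, so the value object survives where it is cheap.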

Optimize 3 — Replace Type Code with Class breaks switch optimization (Java)

enum Status { ACTIVE, INACTIVE, PENDING }
switch (status) {
    case ACTIVE -> ...;
    case INACTIVE -> ...;
    case PENDING -> ...;
}

vs. the old int status with switch(status).

Cost & Fix Both compile to a jump-table switch (`tableswitch`); the enum version typically adds one ordinal-indexed array lookup, which the JIT handles cheaply, so dispatch cost is effectively the same. **No regression.** ✓ The myth "enums are slower than int switch" is from C-era thinking. Enum constants are allocated once at class initialization and shared through static references, so there is no per-item allocation even over millions of items. **No fix needed.** Don't avoid enums for performance.

Optimize 4 — Replace Magic Number adds runtime lookup (Python)

TAX_RATE = 0.07
def total(amount): return amount * (1 + TAX_RATE)

vs. old def total(amount): return amount * 1.07.

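Cost & Fix CPython looks up `TAX_RATE` in the module's global namespace on every call and evaluates `1 + TAX_RATE` at run time, whereas the literal `1.07` is a single constant: a small overhead. PyPy / Cython largely optimize this away. At ~1M calls/sec in tight loops, this can show up in profiles. **Fix:** Bind to a local in the hot loop: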
_TAX_RATE = 0.07
def total_many(amounts):
    rate = _TAX_RATE     # local lookup is faster than module lookup
    return [a * (1 + rate) for a in amounts]
Or bind the constant as a default argument, which is evaluated once at function definition time:
def total(amount, _r=0.07): return amount * (1 + _r)
For most code: irrelevant. For numerical hot loops: convert to NumPy vectorized op.

Optimize 5 — Encapsulate Field's accessor not inlined by JIT (Java)

class Account {
    private double balance;
    public double balance() { return balance; }   // ✓ free in steady state
}

Question: when isn't this free?

Cost & Fix Cost cases:
- `balance()` is overridden in many subclasses (megamorphic call site) → the JIT cannot inline it and falls back to virtual dispatch.
- `balance()` is called from cold code that is still interpreted, or from a method too large for the JIT to inline into.
- Proxy- or reflection-based access (e.g., Spring proxies) wraps the call.

Mitigations:
- Mark `balance()` `final` if it is not meant to be overridden.
- For hot fields, expose a package-private field for internal use and keep the public accessor for outside callers.

Generally: 99% of the time, no fix is needed. Encapsulate Field's runtime cost is zero.

Optimize 6 — Replace Array with Object adds per-instance memory (Java)

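String[] row = {"Alice", "Eng", "30"};   // ~32 bytes (16-byte array header + 3 refs, padded), plus a String for the age

vs.

record Employee(String name, String title, int age) {}   // header + 2 refs + int = ~24 bytes

For 10M instances (64-bit HotSpot with compressed oops, not counting the strings themselves):
- Array form: ~320 MB.
- Record form: ~240 MB.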

Records win.

Cost & Fix Records are typically *better* than arrays for memory: type safety plus an equal or smaller footprint. The exception is very wide rows (hundreds of fields) processed in bulk, where one object per row stops paying off. For analytical workloads a column store (`Arrow`, or NumPy / pandas outside the JVM) is then the better choice. **Fix:** Use records by default. For columnar data, use proper columnar storage.
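For the columnar case, a hand-rolled struct-of-arrays layout gives a rough idea of what a column store does. This is only a sketch: `EmployeeColumns` is a hypothetical class with a fixed capacity, not a substitute for a real columnar library:

```java
// Hypothetical struct-of-arrays layout: one array per field instead of one object per row.
final class EmployeeColumns {
    private final String[] names;
    private final String[] titles;
    private final int[] ages;   // primitives stored inline, no per-row object header
    private int size;

    EmployeeColumns(int capacity) {
        names = new String[capacity];
        titles = new String[capacity];
        ages = new int[capacity];
    }

    // Appends one row; throws ArrayIndexOutOfBoundsException past capacity (no growth in this sketch).
    void add(String name, String title, int age) {
        names[size] = name;
        titles[size] = title;
        ages[size] = age;
        size++;
    }

    int size() { return size; }

    // Scans a single contiguous int[]; the other columns are never touched.
    double averageAge() {
        if (size == 0) return 0.0;
        long sum = 0;
        for (int i = 0; i < size; i++) {
            sum += ages[i];
        }
        return (double) sum / size;
    }
}
```

The per-row cost drops to the field widths alone, at the price of losing a per-row object to pass around.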

Optimize 7 — Bidirectional with serialization loop (Java + Jackson)

class Customer { @JsonManagedReference List<Order> orders; }
class Order { @JsonBackReference Customer customer; }
Cost & Fix Without the annotations, JSON serialization recurses infinitely → stack overflow. With them, `Customer` serializes its orders and each order's `customer` back-reference is suppressed. **Correct, but the JSON omits the customer reference on orders.** For an API consumer that needs `order.customer.id`, the shape is wrong. **Fix:** Use DTOs for serialization. Don't expose entities directly.
class CustomerDto {
    String id;
    String name;
    List<OrderDto> orders;
}
class OrderDto {
    String id;
    String customerId;   // just the id, not the full customer
    Money total;
}
Lesson: Bidirectional + entity serialization is a perf footgun. Always project to DTOs at the boundary.
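The projection step might look like the sketch below. The entity classes are simplified stand-ins, `toDto` is a hypothetical mapper name, and the `total` field is left out to keep the example self-contained:

```java
import java.util.ArrayList;
import java.util.List;

final class DtoMapping {
    // Entities with a bidirectional link (simplified stand-ins).
    static final class Customer {
        final String id;
        final String name;
        final List<Order> orders = new ArrayList<>();
        Customer(String id, String name) { this.id = id; this.name = name; }
    }

    static final class Order {
        final String id;
        final Customer customer;
        Order(String id, Customer customer) {
            this.id = id;
            this.customer = customer;
            customer.orders.add(this);   // keep both directions in sync
        }
    }

    // Flat, acyclic shapes for the wire.
    record OrderDto(String id, String customerId) {}
    record CustomerDto(String id, String name, List<OrderDto> orders) {}

    // Hypothetical mapper: walks the graph once and emits ids instead of back-references.
    static CustomerDto toDto(Customer c) {
        List<OrderDto> orders = new ArrayList<>();
        for (Order o : c.orders) {
            orders.add(new OrderDto(o.id, o.customer.id));   // id only, not the entity
        }
        return new CustomerDto(c.id, c.name, List.copyOf(orders));
    }
}
```

Because the DTO graph has no cycle, it serializes without `@JsonManagedReference` / `@JsonBackReference`, and each order still carries its customer id.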

Optimize 8 — Encapsulate Collection with concurrent mutation (Java)

class Customer {
    private final List<Order> orders = new ArrayList<>();
    public List<Order> orders() { return Collections.unmodifiableList(orders); }
    public synchronized void add(Order o) { orders.add(o); }
}

// Caller:
for (Order o : customer.orders()) { ... }   // iteration
// Concurrently: anotherThread.add(...)
Cost & Fix Iterating over the unmodifiable view while another thread mutates the underlying list → `ConcurrentModificationException` (detected on a best-effort basis; `synchronized` on `add` alone does not protect readers). **Fix options:**
1. **`CopyOnWriteArrayList`:** each write copies the array (O(N)); reads need no lock and iterate over a stable snapshot.
2. **Snapshot copy on read:**
public List<Order> orders() { synchronized(this) { return new ArrayList<>(orders); } }
3. **Immutable collections (Guava `ImmutableList`):** copy-on-add, share-on-read.

Choose by read/write ratio. For read-mostly: `CopyOnWriteArrayList`. For write-heavy: snapshot copies.
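Option 1 can be sketched as follows. `CowCustomer` is a hypothetical name and `String` stands in for `Order`. To stay deterministic, the example mutates the list from inside the loop rather than from a second thread; with an `ArrayList` the same loop would throw `ConcurrentModificationException`:

```java
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical variant of Customer backed by a copy-on-write list.
final class CowCustomer {
    private final List<String> orders = new CopyOnWriteArrayList<>();

    // Read-only view; its iterator is the underlying snapshot iterator.
    List<String> orders() { return Collections.unmodifiableList(orders); }

    // No synchronized needed: CopyOnWriteArrayList handles its own locking on writes.
    void add(String order) { orders.add(order); }

    public static void main(String[] args) {
        CowCustomer c = new CowCustomer();
        c.add("A-1");
        c.add("A-2");
        int seen = 0;
        for (String o : c.orders()) {
            c.add(o + "-copy");   // mutation during iteration is safe here
            seen++;
        }
        System.out.println(seen);               // 2: the iterator saw the snapshot
        System.out.println(c.orders().size());  // 4: the writes still landed
    }
}
```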

Optimize 9 — Replace Type Code with State allocates per transition (Java)

class Order {
    private OrderStatus status;
    public void submit() { status = new SubmittedStatus(); }
    public void ship() { status = new ShippedStatus(); }
}
Cost & Fix Each transition allocates a new state object. For long-lived orders, this is fine. For a high-throughput system with millions of state transitions per minute, it adds GC pressure. **Fix:** Use singletons for stateless states.
class Order {
    private OrderStatus status;
    public void submit() { status = SubmittedStatus.INSTANCE; }
}
class SubmittedStatus implements OrderStatus {
    static final SubmittedStatus INSTANCE = new SubmittedStatus();
    private SubmittedStatus() {}
    ...
}
Or use the enum-implements-interface pattern:
enum OrderStatus implements Status {
    DRAFT { public void cancel() { ... } },
    SHIPPED { public void cancel() { throw ... } };
    public abstract void cancel();
}
Each enum value is a singleton. Zero per-transition allocation.
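A compilable variant of the enum approach is sketched below. It differs from the fragment above in ways that are assumptions for illustration: `cancel()` returns the next state, a `CANCELLED` constant is added, and the exception type is arbitrary:

```java
final class OrderStateDemo {
    // Each constant is a singleton with its own behavior; transitions return shared instances.
    enum OrderStatus {
        DRAFT {
            @Override OrderStatus cancel() { return CANCELLED; }
        },
        SHIPPED {
            @Override OrderStatus cancel() {
                throw new IllegalStateException("already shipped");   // hypothetical rule
            }
        },
        CANCELLED {
            @Override OrderStatus cancel() { return this; }   // idempotent
        };

        abstract OrderStatus cancel();
    }

    static final class Order {
        private OrderStatus status = OrderStatus.DRAFT;

        void ship() { status = OrderStatus.SHIPPED; }     // no allocation
        void cancel() { status = status.cancel(); }       // no allocation
        OrderStatus status() { return status; }
    }

    public static void main(String[] args) {
        Order order = new Order();
        order.cancel();
        System.out.println(order.status());   // CANCELLED
    }
}
```

Since states are compared by identity, `order.status() == OrderStatus.CANCELLED` is a plain reference comparison.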

Optimize 10 — Encapsulate Field on a Python class without slots (Python)

class Account:
    def __init__(self, balance):
        self._balance = balance

    @property
    def balance(self):
        return self._balance

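Each instance carries a __dict__ (~280 bytes of overhead by this estimate; the exact figure varies by CPython version, and releases with key-sharing dictionaries use considerably less).

Cost & Fix At ~280 bytes each, 10M instances spend 2.8 GB on dict overhead alone. **Fix:**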
class Account:
    __slots__ = ("_balance",)
    def __init__(self, balance):
        self._balance = balance

    @property
    def balance(self):
        return self._balance
Or `@dataclass(slots=True, frozen=True)` (Python 3.10+). Either form reduces per-instance overhead to ~50 bytes. For domain code with one instance per request: irrelevant. For batches / pipelines: critical.

Optimize 11 — Replace Reference to Value triggers expensive equals (Java)

public record Customer(String id, String name, String email, Address address) {}

Records auto-generate equals based on all fields. Address has its own auto-generated equals based on its fields.

HashMap<Customer, X> map = ...;
map.get(someCustomer);   // calls hashCode, then equals on a hash match; both walk all fields recursively

For complex nested values, equality is O(total field count).

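Cost & Fix For lookups by Customer id, the entire address is hashed and compared. Wasteful. Caching the hashCode in a final field is not available here, because a record cannot declare extra instance fields; that would require a regular class. **Fix:** Use the id as the map key: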
HashMap<String, Customer> byId = ...;
byId.get(customer.id());
Or keep the record's component-wise `equals` and expose id-based equality as a separately named helper, documenting that `equals` itself is *not* id-based:
public record Customer(String id, String name, ...) {
    public static EqualBy idEquals = ...;
}
This is one place where Java records' default behavior may not be what you want; lean on the type system to enforce id-based comparison externally.
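Both approaches fit in a short sketch. `CustomerIndex`, `byId`, and `sameId` are hypothetical names, and `Address` is reduced to two illustrative fields:

```java
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

final class CustomerIndex {
    record Address(String street, String city) {}
    record Customer(String id, String name, String email, Address address) {}

    // Index by id: lookups hash and compare one String, never the nested Address.
    // Collectors.toMap throws IllegalStateException if two customers share an id.
    static Map<String, Customer> byId(List<Customer> customers) {
        return customers.stream()
                .collect(Collectors.toMap(Customer::id, Function.identity()));
    }

    // Explicitly named id-based comparison; the record's equals stays component-wise.
    static boolean sameId(Customer a, Customer b) {
        return a.id().equals(b.id());
    }

    public static void main(String[] args) {
        Address home = new Address("1 Main St", "Springfield");
        Customer alice = new Customer("c1", "Alice", "alice@example.com", home);
        Customer renamed = new Customer("c1", "Alice B.", "alice@example.com", home);
        Map<String, Customer> index = byId(List.of(alice));
        System.out.println(index.get("c1") == alice);    // true
        System.out.println(sameId(alice, renamed));      // true: same id
        System.out.println(alice.equals(renamed));       // false: the name differs
    }
}
```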

Optimize 12 — Magic Number Constant in JS / TypeScript (TypeScript)

const TAX_RATE = 0.07;
function total(x: number): number { return x * (1 + TAX_RATE); }

V8 can typically inline TAX_RATE when it is a module-level const that is not exported (a bundler may also inline it at build time).

Cost & Fix For a module-level `const`, V8's optimizing compiler can usually fold the value into the generated code. TypeScript types are erased at compile time, so this is generally as fast as the `0.07` literal. For `export const`, importers see a live binding that may have to be resolved through the module's exports, which can be slower in tight loops. **Fix:** for hot paths, pin the constant locally:
function makeCalculator(rate = TAX_RATE) {
  return (x: number) => x * (1 + rate);
}
Or just inline the literal. JIT optimizers handle most of this; it only matters in extreme hot paths.

Patterns

| Refactor | Risk |
|---|---|
| Encapsulate Collection (copy) | Per-call alloc |
| Replace Data Value with Object | Per-instance alloc + header |
| Encapsulate Field | Almost free |
| Replace Type Code with State | Per-transition alloc; use singletons |
| Bidirectional + JSON | Stack overflow / wrong shape |
| ConcurrentMod over views | Need CoW or snapshot |
| Records with deep equals | Lookup cost |
| Python without slots | Per-instance overhead |
