Generics & Types — Optimize & Reconcile¶
Types are a contract checked by the compiler. The question is always: what survives to runtime, and what does the compiler pay to verify it? In TypeScript the answer is "nothing survives, but
tsccan choke." In Java it's "erasure survives asObjectplus autoboxing." In Go it's "monomorphization or a GC-shape dictionary." In Rust/C# it's "a fresh machine-code copy per type." Each model trades runtime cost, binary size, and build time differently. This file works through 12 scenarios where a type-safety decision has a measurable performance or build-cost consequence — and resolves each on principle, with numbers.
Table of Contents¶
List<Integer>autoboxing vsint[](Java)- Go generics: monomorphization vs GC-shape dictionary dispatch
- Rust/C# monomorphization code bloat
- TypeScript types are erased — but the compiler is not free
- Conditional/union type explosion and instantiation-depth limits (TS)
- Branded/nominal types are zero-cost at runtime (TS)
- Runtime validation (zod/io-ts) at boundaries vs trusting the types
- Specialized primitive collections — fastutil/Eclipse Collections (Java)
- Generic method dispatch: virtual vs monomorphic (Go interface vs type param)
- mypy / tsc CI time at scale
- Python generics are documentation —
TypeVarcosts nothing at runtime Optional<T>and wrapper allocation in hot loops (Java)- Rules of Thumb
- Related Topics
Scenario 1 — List<Integer> autoboxing vs int[] (Java)¶
You have a numeric-heavy path. Clean-code instinct says "use List<Integer>, it's generic and composes with the Collections API." Then a profile shows the loop is 4–6× slower than it should be and the heap is full of boxed integers.
// "Clean" generic version
List<Integer> values = new ArrayList<>();
for (int i = 0; i < 10_000_000; i++) values.add(i); // autobox: int -> Integer
long sum = 0;
for (int v : values) sum += v; // unbox: Integer -> int
// Primitive version
int[] values = new int[10_000_000];
for (int i = 0; i < values.length; i++) values[i] = i;
long sum = 0;
for (int v : values) sum += v;
Resolution
**Measurement.** An `Integer` on a 64-bit JVM with compressed oops is **16 bytes** (12-byte object header + 4-byte `int` + padding). `ArrayListScenario 2 — Go generics: monomorphization vs GC-shape dictionary dispatch¶
You wrote a generic Map[T, U any] helper. Coming from C++/Rust you assume Go fully monomorphizes (one specialized copy per type), so the call is as fast as a hand-written loop. A microbenchmark shows the generic version is slower than the non-generic equivalent for pointer/interface element types, and the disassembly shows an extra register holding a "dictionary."
func Map[T, U any](in []T, f func(T) U) []U {
out := make([]U, len(in))
for i, v := range in {
out[i] = f(v)
}
return out
}
Resolution
**How Go actually compiles generics.** Go does **not** monomorphize per concrete type. It monomorphizes per **GC shape** — a coarse equivalence class. All pointer-shaped types (`*T`, `[]T` headers, `string`, `interface`) share **one** instantiation; distinct value types of distinct sizes/layouts get their own. Inside a GC-shape instantiation, the function receives a hidden **dictionary** argument carrying the per-type metadata (method tables, type descriptors) it can't know statically. **The cost.** For pointer-shaped `T`, the dictionary means: - An extra implicit argument threaded through every generic call (register pressure, slightly larger stack frames). - Method calls on a type-parameter value go through the dictionary's itable — effectively an **indirect call**, not inlined or devirtualized. This is similar in cost to calling through an interface: roughly **a few ns per call** and, crucially, **not inlinable**, which blocks downstream optimizations. Measured: for trivial `f` over pointer types, generic `Map` can be **10–30% slower** than a non-generic hand-rolled loop because the closure call plus dictionary overhead dominates. For `int`/`float64` element types (distinct GC shapes), the body gets a more specialized instantiation and the gap narrows or vanishes. **Principled resolution.** - **Do not assume Go generics are zero-cost.** They are cheap, but for pointer-shaped types they behave closer to interface dispatch than to C++ templates. Verify with `go build -gcflags='-m'` (inlining decisions) and `go test -bench` + `-benchmem`. - For genuinely hot inner loops over a *single* concrete type, a **non-generic specialized function** can still win — and it's perfectly clean to keep a generic public API plus a specialized fast path. - The dictionary cost is real but small; the bigger trap is the **non-inlinable closure** `f`. If `f` is a hot, simple transform, a monomorphic loop with the transform inlined beats any generic abstraction. Reach for that only when a profile points at this exact loop. The type safety of `Map[T, U]` is excellent and the dispatch cost is usually invisible. Know the model so you don't mis-attribute a regression.Scenario 3 — Rust/C# monomorphization code bloat¶
The opposite of Go: Rust and C# (for value-type generics) fully monomorphize. Your generic data structure is instantiated with 40 different concrete types across the codebase, and now your release binary is megabytes larger and the build takes noticeably longer.
// One generic definition...
pub fn process<T: Serialize + Clone>(items: &[T]) -> Vec<u8> { /* ~300 instrs */ }
// ...instantiated 40 times across the crate graph:
process::<User>(&users);
process::<Order>(&orders);
process::<Invoice>(&invoices);
// ... 37 more
Resolution
**The mechanism.** Rust generates a **separate compiled copy** of `process` for every distinct `T`. Forty instantiations of a 300-instruction function = forty 300-instruction functions in the binary, each optimized independently. C# does the same for **value-type** generic instantiations (`Listpub fn process<T: Serialize + Clone>(items: &[T]) -> Vec<u8> {
// tiny generic shim: converts to a uniform representation, then calls the
// single monomorphic worker. Only this shim is duplicated per T.
let bytes: Vec<Box<dyn erased_serde::Serialize>> = /* ... */;
process_erased(&bytes) // one shared copy does the real work
}
fn process_erased(items: &[Box<dyn erased_serde::Serialize>]) -> Vec<u8> { /* ~300 instrs, ONE copy */ }
Scenario 4 — TypeScript types are erased — but the compiler is not free¶
A teammate worries that the elaborate generic types in the codebase will "slow down the app." They want to delete them for performance.
type DeepReadonly<T> = T extends (infer E)[]
? readonly DeepReadonly<E>[]
: T extends object
? { readonly [K in keyof T]: DeepReadonly<T[K]> }
: T;
function freeze<T>(x: T): DeepReadonly<T> { /* ... */ return x as DeepReadonly<T>; }
Resolution
**Runtime cost: exactly zero.** TypeScript types are **fully erased** by the compiler. `tsc` (and Babel/esbuild/swc in transpile-only mode) strip every type annotation, interface, `type` alias, generic parameter, and `as` cast before emitting JavaScript. `DeepReadonlyScenario 5 — Conditional/union type explosion and instantiation-depth limits (TS)¶
A "clever" utility type works on small inputs, then tsc either takes 40 seconds on one file or fails outright with TS2589.
// Builds a union of every dotted path through an object — quadratic-to-exponential.
type Paths<T, Prev extends string = ""> = {
[K in keyof T]: T[K] extends object
? Paths<T[K], `${Prev}${K & string}.`> | `${Prev}${K & string}`
: `${Prev}${K & string}`;
}[keyof T];
type AllPaths = Paths<EntireApplicationConfig>; // 200-field nested config
Resolution
**Why it explodes.** Each level of nesting multiplies the union. A config with `b` branching factor and `d` depth yields on the order of `b^d` member strings. The checker must materialize and dedupe this union; union types over a few thousand members make every assignability check that touches `AllPaths` slow, because assignability against a union is **per-member**. Template-literal types compound it — concatenating a 1,000-member union with a 50-member union is a 50,000-member cross product. **The hard ceiling.** TypeScript caps recursive type instantiation depth (historically around **50 levels** for the recursion guard, with a separate ~**100M** instantiation-count circuit breaker) and emits `TS2589: Type instantiation is excessively deep and possibly infinite`. This is a safety valve, not a bug — without it the checker could loop forever. **Measurement.** Use `tsc --extendedDiagnostics`: watch `Instantiations` and `Check time`. A healthy project is in the low millions of instantiations; a single explosive type can push it to tens of millions and add **5–40 s** to a previously sub-second check. `--generateTrace` will show `Paths` dominating the flame chart. **Principled resolution.** - **Cap the recursion explicitly.** Add a depth counter as a tuple and bail at a fixed level: - **Prefer a generated concrete type** for huge fixed shapes. If `AllPaths` is over a *known* config, generate the literal union at build time (codegen) instead of computing it in the type system on every check. Codegen moves the cost off the hot `tsc` path entirely. - **Narrow the input.** Computing paths over a 200-field config is almost always more than callers need; compute paths over the specific subtree a function touches. - If the type is fundamentally needed, **memoize via intermediate named aliases** so the checker caches instantiations rather than recomputing them at each use site. Type safety is achievable here; the unbounded version is just an O(b^d) algorithm running inside the type-checker. Bound it like any other algorithm.Scenario 6 — Branded/nominal types are zero-cost at runtime (TS)¶
You want to stop mixing up UserId and OrderId (both string) without paying for wrapper objects in a hot path that processes millions of IDs.
// Object-wrapper approach — allocates
class UserId { constructor(public readonly value: string) {} }
class OrderId { constructor(public readonly value: string) {} }
// Branded approach — zero allocation
type Brand<T, B extends string> = T & { readonly __brand: B };
type UserId = Brand<string, "UserId">;
type OrderId = Brand<string, "OrderId">;
const asUserId = (s: string): UserId => s as UserId; // erased cast, no runtime op
Resolution
**The runtime difference.** The `class` version creates a real object per ID: a heap allocation, a property access (`.value`) on every use, and GC pressure. Over 10M IDs that's 10M allocations. The **branded** type `BrandScenario 7 — Runtime validation (zod/io-ts) at boundaries vs trusting the types¶
A service parses every incoming request body with zod, and also re-validates the same objects on internal function calls "to be safe." A profile shows validation is a top CPU consumer.
const User = z.object({
id: z.string().uuid(),
email: z.string().email(),
roles: z.array(z.enum(["admin", "user", "guest"])),
createdAt: z.string().datetime(),
});
type User = z.infer<typeof User>;
// At the boundary — correct:
const user = User.parse(req.body);
// Deep in a hot internal loop — wasteful:
function score(u: unknown): number {
const user = User.parse(u); // re-validating already-validated data
return /* ... */;
}
Resolution
**Why this is needed at all.** TypeScript types are erased (Scenario 4), so `req.body as User` is a **lie** — nothing checks it at runtime. zod/io-ts/valibot exist to turn an untrusted `unknown` into a value whose shape is *actually verified*, then hand you a statically-typed result via `z.infer`. This is the correct way to bridge the erased-types gap at a trust boundary. **The cost, with numbers.** Runtime validation is real work: per field it does type checks, regex (`.email()`, `.uuid()`, `.datetime()`), array iteration, and object construction. Benchmarks routinely show schema validators processing on the order of **hundreds of thousands to a few million simple objects/sec**, dropping sharply with regex-heavy or deeply nested schemas. For a complex object with email+uuid+datetime regexes, a single `parse` can cost **microseconds to tens of microseconds**. Multiply by request volume and by *redundant* internal re-validation and it becomes a measurable slice of CPU. (valibot and `@sinclair/typebox` are markedly faster than zod for the same schema, often several-fold, because of tree-shakable, lower-overhead designs.) **Principled resolution — validate at the boundary, then trust the type.** - **Validate exactly once, at the edge** (HTTP handler, queue consumer, DB read of untrusted data). After `User.parse(req.body)` succeeds, the value *is* a `User`; downstream code takes `User`, not `unknown`, and **must not re-validate**. The boundary is the only place the data was untrusted. - This is the [Parse, Don't Validate](https://lexi-lambda.github.io/blog/2019/11/05/parse-don-t-validate/) discipline: parsing at the boundary produces a value whose *type* now encodes the invariant, so internal functions get safety from the type system at **zero runtime cost**. - Use `.parse` for unknown input; use `.safeParse` to avoid exception cost on expected-invalid paths; choose a faster validator (typebox/valibot) when the boundary is genuinely hot. - For ultra-hot ingestion, consider **compiled validators** (typebox compiles a schema to a specialized JS function via JIT-friendly code) which can be **an order of magnitude faster** than interpreted schema walking. The reconciliation: runtime validation is the *price of erasure*, and it is worth paying — **once, at the boundary**. Everywhere inside that boundary, the erased type is free and you trust it. Re-validating internally pays the price repeatedly for a guarantee you already have.Scenario 8 — Specialized primitive collections — fastutil/Eclipse Collections (Java)¶
A graph algorithm keeps adjacency lists as Map<Integer, List<Integer>>. It's correct and generic, but for 50M edges it's slow and uses several GB.
Map<Integer, List<Integer>> adjacency = new HashMap<>(); // boxes every vertex id
adjacency.computeIfAbsent(u, k -> new ArrayList<>()).add(v);
Resolution
**The boxing tax, compounded.** Every key and every list element is a boxed `Integer` (16 bytes + reference; Scenario 1). `HashMap.Node` itself is an object (~32–48 bytes) per entry. For 50M edges across millions of vertices you pay boxing on keys, boxing on every neighbor, node objects, and ArrayList headers — easily **2–4× the memory** of a primitive representation and far worse cache behavior, because every lookup chases references to scattered boxed objects. **Specialized collections.** Libraries like **fastutil** and **Eclipse Collections** provide `Int2ObjectOpenHashMap`, `IntArrayList`, `IntOpenHashSet`, etc., backed by **primitive arrays** with open-addressing (no per-entry node objects, no boxing):import it.unimi.dsi.fastutil.ints.*;
Int2ObjectMap<IntArrayList> adjacency = new Int2ObjectOpenHashMap<>();
adjacency.computeIfAbsent(u, k -> new IntArrayList()).add(v); // no boxing, contiguous storage
Scenario 9 — Generic method dispatch: virtual vs monomorphic (Go interface vs type param)¶
You need a Sum over anything addable. Two clean designs: an interface (interface{ Add(x) }) or a type-parameter constraint ([T Number]). The interface version is slower in a tight loop.
// Interface-based — dynamic dispatch per call
type Adder interface{ Value() float64 }
func SumIface(xs []Adder) float64 {
var s float64
for _, x := range xs { s += x.Value() } // virtual call, not inlined
return s
}
// Type-parameter constrained — specialized by GC shape
type Number interface{ ~int | ~int64 | ~float64 }
func Sum[T Number](xs []T) T {
var s T
for _, x := range xs { s += x } // direct add, inlinable for value shapes
return s
}
Resolution
**The dispatch difference.** The interface version makes an **indirect (virtual) call** through the itable for every element — the compiler can't inline `Value()` because it doesn't know the concrete type, and indirect calls also defeat branch prediction in tight loops. The generic `Sum[T Number]` over a concrete value type (`int`, `float64`) gets a GC-shape instantiation where `s += x` is a **direct primitive add** with **no dispatch and full inlining**. **Numbers.** For a trivial per-element operation, eliminating the virtual call and enabling inlining typically yields **2–10×** on the loop, because the operation itself is one instruction and the call overhead dominated. Verify with `go test -bench -benchmem` and inspect inlining via `-gcflags='-m'`. Note the nuance from Scenario 2: if `T` is **pointer-shaped**, the generic version routes through a dictionary and the advantage over an interface shrinks — generics help most for **distinct value shapes**. **Principled resolution.** - For **homogeneous numeric/value loops**, prefer the **type-parameter constraint** — it monomorphizes to direct, inlinable code and is also more precise (`Sum[int]` returns `int`, not `interface{}`). - For **genuinely heterogeneous** collections (mixed concrete types behind one behavior), an interface is the correct *and* idiomatic tool — the dynamic dispatch is the feature, not a bug. Don't contort generics to avoid an interface that models the domain. - Don't reach for either until the loop is shown hot. For a 100-element slice the dispatch cost is noise. Same principle as Scenario 2/3: generics buy speed when they let the compiler specialize and inline; they buy nothing (or cost a dictionary) when the type is uniform-pointer-shaped.Scenario 10 — mypy / tsc CI time at scale¶
Type checking is correctness insurance, but on a large monorepo tsc --noEmit and mypy each take 6–12 minutes and gate every PR. Engineers start typing # type: ignore and any to "make CI faster."
Resolution
**Reframe: this is a build-throughput problem, not a reason to abandon types.** The runtime is unaffected either way (erasure). Slow type-checking is fixable with caching and incrementality, not by deleting types. **TypeScript.** - **`tsc --incremental`** (and project references / `--build` mode) persists a `.tsbuildinfo` so unchanged projects are skipped. On large repos this turns a cold 10-minute check into a warm **tens-of-seconds** check. - Split the monorepo into **TypeScript project references**; each package type-checks independently and caches independently, so a one-line change rechecks one project, not the world. - Diagnose with `tsc --extendedDiagnostics` (look at `Check time`, `Instantiations`, `Memory used`) and `--generateTrace` to find the expensive types (Scenario 5). One pathological generic can dominate the whole check. - `skipLibCheck: true` skips checking `.d.ts` in `node_modules`, often a large, safe time saver. **mypy.** - Enable the **incremental cache** (on by default) and persist `.mypy_cache` between CI runs — cold vs warm mypy is frequently a **5–10×** difference. - For very large codebases, **`mypy --daemon` (`dmypy`)** keeps an in-memory state and re-checks only changed files in **seconds**; pyright (the engine behind Pylance, in Node) is also substantially faster on many large codebases. - Type-check **per package** in parallel CI jobs rather than one monolithic invocation. **Numbers.** Realistic outcomes: incremental + caching commonly cuts a multi-minute check to **under a minute** warm; daemon mode brings interactive checks to **single-digit seconds**. The fix is engineering the build, not weakening the types. **Principled resolution.** Never trade *type safety* for *CI seconds* — `any`/`# type: ignore` move the cost from a fast machine to a slow incident. Instead: enable incremental caching, use project references / daemon, parallelize per package, and fix the handful of pathological types that dominate the trace. The point of the checker is to be the cheap place a bug dies; keep it fast so the team keeps it on.Scenario 11 — Python generics are documentation — TypeVar costs nothing at runtime¶
A team hesitates to add TypeVar/Generic[T]/list[int] annotations to a hot numerical service, fearing runtime overhead.
from typing import TypeVar, Generic
T = TypeVar("T")
class Stack(Generic[T]):
def __init__(self) -> None:
self._items: list[T] = []
def push(self, x: T) -> None: self._items.append(x)
def pop(self) -> T: return self._items.pop()
Resolution
**Runtime cost: effectively zero for the annotations themselves.** Python type hints are **not enforced at runtime** by the interpreter. `list[T]`, `Generic[T]`, and `x: T` are checked only by **mypy/pyright** at lint time. At runtime `Stack[int]()` and `Stack()` behave identically — Python stores the parametrization on `__orig_class__` but does no checking. With `from __future__ import annotations` (or Python 3.14's deferred evaluation), annotations are stored as strings and never even evaluated unless something introspects them, so the import/definition cost is negligible. **The genuine micro-costs to know.** - Evaluating an annotation expression *at module load* (without lazy annotations) costs a little; `TypeVar`/`Generic` subscription builds small objects once at class-definition time, not per instance. This is a one-time, sub-millisecond cost — irrelevant to steady-state throughput. - `typing.get_type_hints()` and `@runtime_checkable` Protocols **do** cost at runtime if you actually call them in a hot path. Reflection-based dispatch (e.g., pydantic v1's per-field validation, `functools.singledispatch`) is where Python "type" machinery shows on a profile — and pydantic v2 moved its core to Rust precisely to cut that cost. **Principled resolution.** - **Annotate freely.** Static types in Python are documentation + a free correctness net via mypy; they don't slow the running program. The fear is misplaced. - If a profile shows type-related runtime cost, it is almost always **runtime validation/reflection** (pydantic, `dataclasses` with validators, `get_type_hints`), *not* the annotations — the same boundary-validation trade-off as Scenario 7. Validate at the edge, trust internally. - For numeric hot loops the real lever is representation (numpy arrays, not `list[int]`) — analogous to Java's `int[]` vs `ListScenario 12 — Optional<T> and wrapper allocation in hot loops (Java)¶
A clean API returns Optional<Customer> everywhere to avoid nulls. A profile of a 10M-iteration lookup loop shows Optional allocation as a measurable cost.
public Optional<Customer> find(long id) {
Customer c = cache.get(id);
return Optional.ofNullable(c); // allocates an Optional per call
}
// Hot loop:
for (long id : ids) { // 10M ids
find(id).ifPresent(this::process);
}
Resolution
**The cost.** Each `Optional.ofNullable` may allocate a wrapper object (`Optional.empty()` is a shared singleton, but every *present* value is a fresh `Optional`). Over 10M present lookups that's up to 10M short-lived allocations — minor-GC churn, even though escape analysis *can* scalarize an `Optional` that doesn't escape. The trouble: when the `Optional` is **returned** from `find`, it escapes the method, so EA often can't elide it, and the allocation is real. **Measurement.** For a tight loop dominated by trivial work, the per-call `Optional` allocation can add meaningful young-gen pressure and a measurable percentage to the loop; JMH plus `-prof gc` shows the allocation rate. If EA *does* scalarize (when `find` is inlined into the loop and the `Optional` provably doesn't escape), the cost vanishes — so always measure before reacting. **Principled resolution.** - **Keep `Optional` as the public API contract.** It's the right design: it makes absence explicit in the type, killing a class of NPEs. The chapter's whole point is that this type-safety is worth it. - For a **proven-hot internal** path, drop to a representation that doesn't wrap: return the nullable reference directly (documented `@Nullable`), or use a primitive-specialized variant. For primitive results, `OptionalInt`/`OptionalLong`/`OptionalDouble` avoid the *additional* boxing that `OptionalRules of Thumb¶
- Types are usually free at runtime. In TypeScript and Python they are fully erased — deleting them for "app speed" gains nothing. Optimize the compiler/checker, not the program.
- The three real costs of generics are: boxing, monomorphization bloat, and compile time. Name which one you're facing before touching anything.
- Boxing is a representation problem, not a type problem.
List<Integer>is correct;int[]/ fastutil is the right representation for bulk numerics. Same forlist[int]vs numpy in Python. - Branded/nominal types give maximal compile-time safety at zero runtime cost. Prefer them over wrapper objects in hot paths when you only need identity, not behavior.
- Validate at the boundary, trust the type inside it. zod/pydantic pay the price of erasure once, at the edge. Re-validating internally re-pays for a guarantee you already hold.
- Go generics are not C++ templates. For pointer-shaped types they dispatch through a dictionary (interface-like cost); they win for distinct value shapes by inlining. Verify with
-gcflags='-m'. - Rust/C# monomorphization trades binary size and build time for speed. When many instantiations bloat the binary, outline the heavy body behind one erased worker. Measure with
cargo llvm-lines/cargo bloat. - Cap recursive types. A conditional/mapped type with unbounded depth is an O(b^d) algorithm inside the checker; bound it or generate the concrete type at build time. Heed
TS2589. - Never trade type safety for CI seconds. Fix slow checking with incremental caches, project references, and daemons (
dmypy,tsc --build) — not withanyand# type: ignore. - Always measure before optimizing a type-related cost. Escape analysis may already elide
Optional/Coord; the JIT may already specialize. Profile (JFR,-benchmem,--extendedDiagnostics) before trading clarity for speed.
Related Topics¶
- find-bug.md — locate the type-safety hole or perf regression in a generic implementation before optimizing it.
- professional.md — production practices for generic APIs, type design, and build-pipeline hygiene.
- Chapter README — the positive rules for generics and types.
- Objects and Data Structures — when to expose data vs. behavior, which decides whether a wrapper type carries methods (Scenario 6) or just identity.
- Functional Programming — parse-don't-validate, immutability, and totality, which underpin boundary validation (Scenario 7) and branded types (Scenario 6).
In this topic