# Designing for Testability — Optimize & Reconcile

Testable design buys decoupling, fast feedback, and deterministic tests, and the recurring worry is that it costs runtime performance. It almost never does. Interface dispatch is a single-digit-nanosecond concern that JIT inlining or compile-time devirtualization usually erases; DI containers cost startup time, not steady-state throughput; and the real performance story runs the other way: a testable design is what turns an 8-minute integration suite into a 40-second unit suite. This file reconciles the two. For each scenario it gives the design choice, the measurement (with concrete numbers), and the principled resolution. The rule throughout: testable design is nearly free at runtime, so optimize only the rare hot-path indirection that a profile has actually flagged, and never trade away suite speed to chase a nanosecond.

## Table of Contents

- Scenario 1 — Interface dispatch vs concrete call in a hot loop (Go)
- Scenario 2 — Spring reflection startup vs compile-time DI (Java)
- Scenario 3 — Pure functional core tests in microseconds (Python)
- Scenario 4 — In-memory fake vs Testcontainers: 8 min → 40 s (Go)
- Scenario 5 — The Humble Object keeps the slow boundary out of the fast suite (Java)
- Scenario 6 — A seam adds an allocation in a hot path (Go)
- Scenario 7 — Over-mocking costs nothing at runtime but rots the suite (Python)
- Scenario 8 — Fake clock removes `sleep`-based flakiness and wall-clock waste (Go)
- Scenario 9 — `interface{}` / dynamic dispatch defeats JIT monomorphism (Java)
- Scenario 10 — God constructor does I/O, so every test pays for it (Python)
- Scenario 11 — Function injection vs struct-of-interfaces in a tight path (Go)
- Scenario 12 — Test parallelism unlocked by removing global state (Java)
- Scenario 13 — `unittest.mock` import & patch overhead at collection time (Python)
- Rules of Thumb
- Related Topics

## Scenario 1 — Interface dispatch vs concrete call in a hot loop (Go)

You introduce an interface so the dependency can be faked in tests. A reviewer objects: "interface calls are slower than concrete calls, and this is in a loop that runs millions of times."

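```go
type Hasher interface {
    Sum(b []byte) uint64
}

type fnvHasher struct{}

// Sum computes 64-bit FNV-1a inline, with no allocation per call.
func (fnvHasher) Sum(b []byte) uint64 {
    h := uint64(14695981039346656037) // FNV-1a offset basis
    for _, c := range b {
        h ^= uint64(c)
        h *= 1099511628211 // FNV-1a prime
    }
    return h
}

// Hot path: called per record in a 50M-row scan.
func bucketize(records [][]byte, h Hasher) []int {
    out := make([]int, len(records))
    for i, r := range records {
        out[i] = int(h.Sum(r) % 1024) // interface dispatch per record
    }
    return out
}
```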

### Resolution

**Measurement first.** A Go interface call is an indirect call through the itab (a pointer load plus an indirect `CALL`). On modern x86 with a warm branch-target predictor, that is roughly **1–2 ns** of overhead over a direct call, and the indirect branch is *monomorphic* here (only `fnvHasher` is ever passed at runtime), so the CPU predicts it perfectly. Benchmark it, with `data` a representative record:

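```go
var sink uint64                 // results land here so the compiler cannot drop the call
var hasher Hasher = fnvHasher{} // package-level, so the compiler cannot prove the concrete type and devirtualize
func BenchmarkConcrete(b *testing.B) { for i := 0; i < b.N; i++ { sink = fnvHasher{}.Sum(data) } }
func BenchmarkIface(b *testing.B)    { for i := 0; i < b.N; i++ { sink = hasher.Sum(data) } }
```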

Typical result: `Concrete 3.1 ns/op`, `Iface 4.4 ns/op`. The hash itself dominates; the dispatch is ~1.3 ns. Over 50M records that is ~65 ms, once per full scan, against ~150 ms of hashing, and both are small next to the memory bandwidth needed to stream 50M rows. The interface is **not** the bottleneck; memory bandwidth is.

**Principled resolution:** keep the interface. It is the seam that lets the test inject a deterministic stub hasher and assert bucket distribution without a real corpus. If profiling (`go test -bench=. -cpuprofile=cpu.out`, then `go tool pprof cpu.out`) ever shows dispatch as the top frame (it will not here), provide a *concrete fast path beside* the injectable one rather than deleting the seam:

```go
func bucketize(records [][]byte) []int { return bucketizeWith(records, fnvHasher{}) } // production
func bucketizeWith(records [][]byte, h Hasher) []int { // tests pass a stub Hasher here
    out := make([]int, len(records))
    for i, r := range records { out[i] = int(h.Sum(r) % 1024) }
    return out
}
```

Go's compiler can also recover the direct call: if it inlines `bucketizeWith` into `bucketize`, the concrete type `fnvHasher` becomes visible at the call site and `h.Sum` can be devirtualized (build with `-gcflags=-m` and look for a `devirtualizing h.Sum` line). You keep testability and the direct fast path simultaneously.
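
A minimal Python sketch of the same structure, seen from the test side (an illustration, not a translation of any real codebase): `FnvHasher`, `StubHasher`, `bucketize`, and `bucketize_with` are hypothetical names. The stub maps each record to a chosen hash value, so a test can assert exactly which bucket each record lands in without a real corpus, while `bucketize` binds the production hasher.

```python
from typing import Protocol, Sequence


class Hasher(Protocol):
    def sum(self, b: bytes) -> int: ...


class FnvHasher:
    """Production hasher: 64-bit FNV-1a."""

    def sum(self, b: bytes) -> int:
        h = 14695981039346656037  # FNV-1a 64-bit offset basis
        for c in b:
            h ^= c
            h = (h * 1099511628211) & 0xFFFFFFFFFFFFFFFF  # FNV prime, truncated to 64 bits
        return h


def bucketize_with(records: Sequence[bytes], h: Hasher, buckets: int = 1024) -> list[int]:
    """The seam: accepts any Hasher, including a test double."""
    return [h.sum(r) % buckets for r in records]


def bucketize(records: Sequence[bytes]) -> list[int]:
    """Production entry point: the concrete hasher is bound here."""
    return bucketize_with(records, FnvHasher())


class StubHasher:
    """Test double: returns a pre-chosen hash for each record."""

    def __init__(self, table: dict[bytes, int]) -> None:
        self.table = table

    def sum(self, b: bytes) -> int:
        return self.table[b]


def test_bucket_is_hash_mod_1024() -> None:
    stub = StubHasher({b"a": 0, b"b": 1025, b"c": 2047})
    assert bucketize_with([b"a", b"b", b"c"], stub) == [0, 1, 1023]
```

The test never depends on what FNV-1a happens to produce for real data, so it keeps passing if the production hash is swapped.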

## Scenario 2 — Spring reflection startup vs compile-time DI (Java)

Your service uses Spring with classpath scanning. Each test that boots `@SpringBootTest` takes ~6 s before the first assertion; a 300-test integration suite spends most of its wall time in container startup.

```java
@SpringBootTest                      // boots the full ApplicationContext
class OrderServiceTest {
    @Autowired OrderService service;  // wired by reflection at runtime
    @Test void placesOrder() { /* ... */ }
}
```

### Resolution

**Measurement.** Spring's runtime DI does classpath scanning, reflective bean instantiation, and proxy generation. A medium service's context cold-starts in **3–8 s**; with 300 `@SpringBootTest` classes that don't share a context, you pay it repeatedly: easily **20–30 minutes** of pure startup. Steady-state request handling is unaffected (beans are wired once at boot), so the cost is *startup*, not throughput.

Three levers, roughly in order of effort:

1. **Share the context.** Spring's test framework caches the `ApplicationContext` per unique configuration. Stop customizing config per test (no gratuitous `@MockBean` variations) so all tests reuse one cached context. This alone often cuts suite time by 5–10×.
2. **Slice it.** `@WebMvcTest` / `@DataJpaTest` boot a *sub-context* (~0.5–1 s) instead of the whole app.
3. **Don't boot Spring at all for unit tests.** This is the testability payoff: if `OrderService` takes its collaborators via *constructor injection*, you instantiate it with plain `new` and fakes. That means **0 ms of framework startup** and a test that runs in well under a millisecond:

```java
class OrderServiceTest {
    OrderService service = new OrderService(new InMemoryOrders(), new FakeClock());
    @Test void placesOrder() { /* runs in < 1 ms, no context */ }
}
```

**Compile-time DI as the structural fix.** Frameworks that wire at compile time eliminate reflection startup entirely: **Dagger** (Java/Kotlin) generates factory classes for `@Inject` constructors at build time; **Go's `wire`** generates injector functions from `wire.Build` declarations. A Dagger graph that Spring would build reflectively in seconds is built in **microseconds**, because it is just generated constructor calls: no scanning, no reflection. Quarkus and Micronaut apply the same build-time idea, which is part of how their native images start in tens of milliseconds.

**Principle:** constructor injection makes the *unit* test framework-free (the fastest possible); compile-time DI makes the *integration* boot cheap. Reflection-based DI is convenient, but you pay for it in startup, repeatedly, across the test suite.
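
To make "just generated constructor calls" concrete, here is a hand-written Python stand-in for what a compile-time injector produces: a composition root that calls constructors in dependency order. `OrderService`, `InMemoryOrders`, `SystemClock`, `FakeClock`, and `build_order_service` are hypothetical names chosen for this sketch; Dagger or Wire would generate equivalent wiring code in Java or Go.

```python
import time
from dataclasses import dataclass, field


class SystemClock:
    def now(self) -> float:
        return time.time()


@dataclass
class FakeClock:
    t: float = 0.0

    def now(self) -> float:
        return self.t


@dataclass
class InMemoryOrders:
    rows: dict[str, float] = field(default_factory=dict)

    def save(self, order_id: str, placed_at: float) -> None:
        self.rows[order_id] = placed_at


class OrderService:
    def __init__(self, orders, clock) -> None:  # constructor injection: no container involved
        self.orders = orders
        self.clock = clock

    def place(self, order_id: str) -> float:
        placed_at = self.clock.now()
        self.orders.save(order_id, placed_at)
        return placed_at


def build_order_service(orders) -> OrderService:
    """Composition root: plain constructor calls, which is all a generated injector
    amounts to. Production would pass its real repository as `orders`."""
    return OrderService(orders=orders, clock=SystemClock())


def test_places_order_at_clock_time() -> None:
    orders, clock = InMemoryOrders(), FakeClock(t=1_700_000_000.0)
    assert OrderService(orders, clock).place("o-1") == 1_700_000_000.0
    assert orders.rows == {"o-1": 1_700_000_000.0}
```

Unit tests never call the composition root; they build the small object graph by hand with fakes, which is why they need no framework boot at all.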

## Scenario 3 — Pure functional core tests in microseconds (Python)

A pricing rule was tangled with the database read and the HTTP response. Tests required a live DB. You extract a pure function.

```python
# Before: untestable without a DB and a request object.
def handle_quote(request, db):
    customer = db.fetch_customer(request.customer_id)
    discount = 0.0
    if customer.tier == "gold": discount = 0.15
    elif customer.tier == "silver": discount = 0.07
    if request.qty > 100: discount += 0.05
    db.log_quote(...)
    return Response(price=request.unit * request.qty * (1 - discount))

# After: pure core, no I/O.
def quote_price(unit: float, qty: int, tier: str) -> float:
    discount = {"gold": 0.15, "silver": 0.07}.get(tier, 0.0)
    if qty > 100: discount += 0.05
    return unit * qty * (1 - discount)
```

### Resolution

**Measurement.** A test against the original path had to open a DB connection or start a Testcontainers Postgres: **2–5 s** of setup per test class, plus per-test query latency (~1–10 ms). The pure-core test:

```python
def test_gold_bulk():
    assert quote_price(10.0, 200, "gold") == 10.0 * 200 * 0.80
```

runs in **~3 microseconds**. Against the 2–5 s setup that is roughly a **million-fold** difference, and even against the per-test query latency alone it is several hundred to a few thousand times faster, with zero flakiness: no network, no fixtures, no ordering.

**Why this is the central optimization of the whole chapter.** Test speed is not a runtime property of production code; it is a property of *what each test has to set up*. Pushing logic into a pure core (Functional Core, Imperative Shell) means the high-branch-count logic, exactly the part with the most cases to cover, is tested in microseconds with no doubles at all. You then write a *handful* of thin integration tests for the shell that does the I/O wiring.

**Runtime cost of the refactor: zero.** `quote_price` is plain arithmetic; pulling it out of the handler removed nothing and added no indirection. The shell still calls it directly. This is the ideal case: testability improved, runtime identical, suite time collapsed.

**Property testing becomes affordable.** Because the core is pure and fast, you can run thousands of generated inputs per test (`hypothesis`) in the time the DB version took to run one. See the [property-based testing](../../refactoring/README.md) angle in the broader suite.
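
A dependency-free sketch of that idea, using a seeded `random.Random` in place of `hypothesis` so it runs anywhere. The three properties are illustrative assumptions about what the pricing rule should guarantee, not requirements stated in the original; `quote_price` is copied from the refactor above so the block is self-contained.

```python
import random


def quote_price(unit: float, qty: int, tier: str) -> float:
    discount = {"gold": 0.15, "silver": 0.07}.get(tier, 0.0)
    if qty > 100:
        discount += 0.05
    return unit * qty * (1 - discount)


def test_pricing_properties(cases: int = 10_000, seed: int = 42) -> None:
    rng = random.Random(seed)  # seeded, so any failure reproduces exactly
    eps = 1e-9                 # tolerance for float rounding
    for _ in range(cases):
        unit = round(rng.uniform(0.01, 500.0), 2)
        qty = rng.randint(1, 1_000)
        gold, silver, plain = (quote_price(unit, qty, t) for t in ("gold", "silver", "none"))
        # 1. Never negative, never above list price.
        assert 0 <= plain <= unit * qty + eps
        # 2. A better tier never costs more.
        assert gold <= silver + eps and silver <= plain + eps
        # 3. Bulk never raises the per-unit price.
        if qty > 100:
            assert gold / qty <= quote_price(unit, 100, "gold") / 100 + eps


if __name__ == "__main__":
    test_pricing_properties()
```

With `hypothesis` installed, the hand-rolled loop would typically become `@given(...)` strategies, which also shrink a failing input to a minimal example.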

## Scenario 4 — In-memory fake vs Testcontainers: 8 min → 40 s (Go)

Every repository test spins up a real Postgres via Testcontainers. The suite takes 8 minutes; developers stop running it locally, and CI is the only place it runs.

```go
func TestOrderRepo(t *testing.T) {
    ctx := context.Background()
    pg, err := postgres.RunContainer(ctx) // ~2-4 s: container boot + readiness + migration, in every test
    if err != nil {
        t.Fatal(err)
    }
    defer pg.Terminate(ctx)
    repo := NewOrderRepo(connect(pg))
    // ... assertions
}
```

### Resolution

**Measurement.** A Testcontainers Postgres costs **~2–4 s** to start: the image pull is amortized across runs, but container boot, the readiness wait, and schema migration are paid every time. With ~120 repository tests each booting their own container, that is **~8 minutes** dominated entirely by container lifecycle, not by the SQL under test.

**The seam makes the fast path possible.** Define the dependency as an interface (a *port*) and provide two implementations:

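```go
type OrderStore interface {
    Save(ctx context.Context, o Order) error
    ByID(ctx context.Context, id string) (Order, error)
}

// ErrNotFound must mean the same thing in every implementation.
var ErrNotFound = errors.New("order not found")

// Production: PostgresStore (real SQL, mapping sql.ErrNoRows to ErrNotFound).
// Tests of *business logic*: an in-memory fake.
type memStore struct{ m map[string]Order }

func (s *memStore) Save(_ context.Context, o Order) error { s.m[o.ID] = o; return nil }
func (s *memStore) ByID(_ context.Context, id string) (Order, error) {
    o, ok := s.m[id]
    if !ok {
        return Order{}, ErrNotFound // a zero Order with a nil error would hide a missing row
    }
    return o, nil
}
```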

Now **~110** tests that exercise *service logic* use `memStore` (microseconds each), and **~10** tests that verify the *SQL itself* (query correctness, constraints, migrations) keep the real container. The suite drops from **8 min → ~40 s**, and the fast tests are runnable on every save.

**Don't fake away coverage you need.** The in-memory fake does not test SQL syntax, indexes, transaction isolation, or `ON CONFLICT` behavior. Keep a thin layer of real-DB tests, and guard the fake against drifting from real behavior with a *contract test*: one set of behavioral checks run against both the fake and Postgres. Faking is for *isolating logic*, not for pretending the database does not exist.

**Runtime cost:** the `OrderStore` interface adds one interface dispatch per DB call, utterly invisible next to a network round-trip of ~0.2–2 ms. The seam is free in production and worth ~7 minutes in the suite.
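
The contract test can be sketched in Python: one set of behavioral checks, run against both the in-memory fake and a SQL-backed store. Here the standard-library `sqlite3` module in `:memory:` mode stands in for Postgres purely so the sketch is self-contained (an assumption; a real contract suite would target the production database, for example through Testcontainers), and `Order`, `MemStore`, `SqlStore`, and `NotFound` are hypothetical names.

```python
import sqlite3
from dataclasses import dataclass


@dataclass(frozen=True)
class Order:
    id: str
    total_cents: int


class NotFound(LookupError):
    """The one 'missing order' signal every store must use."""


class MemStore:
    def __init__(self) -> None:
        self._m: dict[str, Order] = {}

    def save(self, o: Order) -> None:
        self._m[o.id] = o

    def by_id(self, order_id: str) -> Order:
        try:
            return self._m[order_id]
        except KeyError:
            raise NotFound(order_id) from None


class SqlStore:
    def __init__(self, conn: sqlite3.Connection) -> None:
        self._c = conn
        self._c.execute(
            "CREATE TABLE IF NOT EXISTS orders (id TEXT PRIMARY KEY, total_cents INTEGER NOT NULL)"
        )

    def save(self, o: Order) -> None:
        self._c.execute(
            "INSERT OR REPLACE INTO orders (id, total_cents) VALUES (?, ?)", (o.id, o.total_cents)
        )

    def by_id(self, order_id: str) -> Order:
        row = self._c.execute(
            "SELECT id, total_cents FROM orders WHERE id = ?", (order_id,)
        ).fetchone()
        if row is None:
            raise NotFound(order_id)
        return Order(*row)


def check_order_store_contract(store) -> None:
    """Behavior shared by every OrderStore; run it against each implementation."""
    store.save(Order("o-1", 1200))
    assert store.by_id("o-1") == Order("o-1", 1200)
    store.save(Order("o-1", 1500))  # saving an existing id overwrites it
    assert store.by_id("o-1").total_cents == 1500
    try:
        store.by_id("missing")
    except NotFound:
        pass
    else:
        raise AssertionError("a missing id must raise NotFound, not return a zero value")


def test_contract_holds_for_every_store() -> None:
    for make_store in (MemStore, lambda: SqlStore(sqlite3.connect(":memory:"))):
        check_order_store_contract(make_store())
```

The "missing id" check is the kind of drift the contract catches: a fake that quietly returns an empty value would pass every logic test and still disagree with the real store.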

## Scenario 5 — The Humble Object keeps the slow boundary out of the fast suite (Java)

A scheduled job mixes Quartz triggering, JDBC, and the reconciliation algorithm in one class. The only way to test the algorithm is to fire the real trigger against a real DB.

```java
class ReconciliationJob implements Job {
    @Override public void execute(JobExecutionContext ctx) {
        List<Txn> bank = jdbc.query("SELECT ... ");      // I/O
        List<Txn> ledger = jdbc.query("SELECT ... ");    // I/O
        // ... 200 lines of matching logic, the part with all the bugs ...
        jdbc.update("INSERT INTO breaks ...");           // I/O
    }
}
```

### Resolution

**Measurement.** Testing the matching logic via `execute` required a Quartz scheduler plus a DB: **~5 s** of setup, brittle, and serial. The matching logic (the bug-dense part) had ~30 branches and deserved 50 cases, but each case cost seconds.

**Humble Object pattern.** Make the boundary class *humble*: it does only the untestable plumbing and immediately delegates to a pure, testable object.

```java
// Humble: thin, almost no logic, not unit-tested (covered by 1-2 integration tests).
class ReconciliationJob implements Job {
    @Override public void execute(JobExecutionContext ctx) {
        var bank = repo.loadBank();
        var ledger = repo.loadLedger();
        var breaks = new Reconciler().match(bank, ledger);   // pure
        repo.saveBreaks(breaks);
    }
}

// Testable: pure, no framework, no I/O.
class Reconciler {
    List<Break> match(List<Txn> bank, List<Txn> ledger) { /* 200 lines, all branches */ }
}
```

Now the 50 matching-logic cases run against `Reconciler.match` as plain data-in/data-out tests: **microseconds each, no Quartz, no DB**. The humble `execute` gets *one or two* integration tests proving the wiring.

**This is the same idea as MVP/MVVM's "humble view"** and Feathers' boundary seams: behavior lives in objects you can call directly, and the parts that are awkward to test (frameworks, drivers, UI) are made so thin there is almost nothing in them to test.

**Runtime: unchanged.** `execute` still does exactly the same work; you only moved code across a method boundary. That is one extra call, which the JIT can inline when the callee is small and which is negligible next to the JDBC round-trips when it is not.
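
A Python sketch of the split, under an assumed matching rule (a bank and a ledger transaction match when both reference and amount agree; anything left unmatched on either side is a break). `Txn`, `Break`, `Reconciler`, `ReconciliationJob`, and the repository methods `load_bank`, `load_ledger`, and `save_breaks` are hypothetical names mirroring the Java above; the real 200-line algorithm would replace `match`.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Txn:
    ref: str
    amount_cents: int


@dataclass(frozen=True)
class Break:
    side: str  # "bank" or "ledger": where the unmatched transaction was found
    txn: Txn


class Reconciler:
    """Pure core: data in, data out. No scheduler, no database."""

    def match(self, bank: list[Txn], ledger: list[Txn]) -> list[Break]:
        bank_n, ledger_n = Counter(bank), Counter(ledger)  # multisets, so duplicates count
        breaks = [Break("bank", t) for t in (bank_n - ledger_n).elements()]
        breaks += [Break("ledger", t) for t in (ledger_n - bank_n).elements()]
        return breaks


class ReconciliationJob:
    """Humble shell: load, delegate, save. Covered by one or two integration tests."""

    def __init__(self, repo) -> None:
        self.repo = repo

    def execute(self) -> None:
        bank, ledger = self.repo.load_bank(), self.repo.load_ledger()
        self.repo.save_breaks(Reconciler().match(bank, ledger))


def test_duplicate_bank_entry_is_a_break() -> None:
    t = Txn("T1", 100)
    assert Reconciler().match([t, t], [t]) == [Break("bank", t)]
```

Each of the 50 cases becomes a two-line test like the one above: build two lists, call `match`, compare the breaks.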

## Scenario 6 — A seam adds an allocation in a hot path (Go)

You inject a logger interface for testability. Profiling a packet-processing loop shows the injected call is allocating: `interface{}` boxing on every log argument, in a path that runs per packet.

```go
type Logger interface { Debugf(format string, args ...any) }

func (p *Processor) handle(pkt Packet, log Logger) {
    log.Debugf("recv seq=%d len=%d", pkt.Seq, pkt.Len) // args... boxes ints to interface{} every call
    // ... actual work
}
```

### Resolution

**Measurement.** `args ...any` forces `pkt.Seq` and `pkt.Len` to be *boxed* into `interface{}` values on **every** call. Because the call goes through an interface, the compiler must assume the arguments escape, so each box is a heap allocation (Go skips it only in special cases, such as small integer values). That happens even when debug logging is disabled and the message is discarded. `go test -bench=. -benchmem` shows `2 allocs/op` and ~40 ns just for the boxing, per packet, at line rate. The interface dispatch is negligible; the *variadic boxing* is the cost.

**The fix is not to remove the seam; it is to guard the expensive part with a level check and provide a fast path:**

```go
type Logger interface {
    Enabled(Level) bool
    Debugf(format string, args ...any)
}

func (p *Processor) handle(pkt Packet, log Logger) {
    if log.Enabled(Debug) {                         // cheap bool check, no boxing
        log.Debugf("recv seq=%d len=%d", pkt.Seq, pkt.Len)
    }
    // ... actual work
}
```

When debug is off (the production default), the boxing never happens: **0 allocs/op**, ~1 ns for the `Enabled` call. The seam survives for tests, which set the level to `Debug` and assert on a fake logger.

**General principle: concrete fast path beside the injectable one.** When a seam genuinely costs an allocation in a measured hot path, do not delete the abstraction; (a) gate the expensive work behind a cheap predicate, or (b) keep the interface for the cold path and call a non-allocating concrete method on the hot path. Structured loggers expose approach (a) directly: `slog` has `Logger.Enabled(ctx, level)` and `zap` has `Logger.Check(level, msg)`. Their `Debug` methods also check the level internally, but only after the call site has already evaluated (and, for `slog`'s `...any` form, boxed) the arguments, so the explicit guard is what keeps a hot path allocation-free. You keep testability *and* zero hot-path allocation: measured, not assumed.
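
Python has no boxing, but the same trap exists: arguments are evaluated before `Logger.debug` checks the level. A sketch with the standard `logging` module, where `Logger.isEnabledFor` plays the role of the `Enabled` guard; `describe` is a hypothetical stand-in for expensive argument work, a counter shows whether it ran, and `ListHandler` is a hypothetical test double that collects messages.

```python
import logging

calls = {"describe": 0}  # counts how often the expensive argument is built


def describe(seq: int, length: int) -> str:
    """Stand-in for expensive argument work (hex dumps, serialization)."""
    calls["describe"] += 1
    return f"seq={seq} len={length}"


def handle(seq: int, length: int, log: logging.Logger) -> int:
    if log.isEnabledFor(logging.DEBUG):  # cheap level check before any argument work
        log.debug("recv %s", describe(seq, length))
    return length  # ... actual work


class ListHandler(logging.Handler):
    """Test double: keeps formatted messages instead of writing them anywhere."""

    def __init__(self) -> None:
        super().__init__()
        self.messages: list[str] = []

    def emit(self, record: logging.LogRecord) -> None:
        self.messages.append(record.getMessage())


def make_logger(name: str, level: int) -> tuple[logging.Logger, ListHandler]:
    log = logging.getLogger(name)
    log.handlers.clear()
    log.propagate = False
    log.setLevel(level)
    handler = ListHandler()
    log.addHandler(handler)
    return log, handler
```

With the logger at `INFO`, a thousand calls to `handle` never run `describe`; a test that sets `DEBUG` sees exactly one formatted message per call.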

## Scenario 7 — Over-mocking costs nothing at runtime but rots the suite (Python)

A test mocks every collaborator and asserts on the exact sequence of calls. It passes, runs in 1 ms, and costs nothing at runtime, yet it is a performance problem for the team.

```python
from unittest.mock import Mock

def test_checkout():
    cart = Mock(); pricing = Mock(); tax = Mock(); inv = Mock(); pay = Mock()
    pricing.subtotal.return_value = 100
    tax.compute.return_value = 8
    svc = Checkout(cart, pricing, tax, inv, pay)
    svc.run("order-1")
    pricing.subtotal.assert_called_once_with(cart)
    tax.compute.assert_called_once_with(100)
    inv.reserve.assert_called_once()      # pins internal call shape, not an outcome
    pay.charge.assert_called_once_with(108)
```

### Resolution

**Runtime cost: literally none.** Mocks are in-memory, and the test runs in ~1 ms. The cost is **maintenance velocity**, which is the suite's real performance metric over a project's life.

**The measurement is change amplification.** This test asserts *how* `run` works, not *what* it achieves. Rename `subtotal` to `compute_subtotal`, pass the cart to `tax.compute` instead of the subtotal, or push tax into pricing (all behavior-preserving) and this test breaks even though nothing observable changed. On a 2,000-test suite where ~40% are call-sequence mocks, a routine refactor can red-light **hundreds** of tests, turning a 1-hour refactor into a day of test surgery. That is the suite "rotting": it raises the cost of every future change.

**Resolution: test outcomes, and fake at real seams only.** Mock at *architectural boundaries* (the payment gateway, the clock) and let internal collaborators be real or simple in-memory fakes:

```python
def test_checkout_charges_total_with_tax():
    inv = InMemoryInventory(stock={"sku-1": 5})
    pay = FakeGateway()                          # the one true external boundary
    svc = Checkout(RealCart(["sku-1"]), RealPricing(), RealTax(rate=0.08), inv, pay)
    svc.run("order-1")
    assert pay.last_charge == 108               # asserts the *result*, not the call graph
    assert inv.stock["sku-1"] == 4
```

This survives any internal refactor that preserves behavior, and it exercises `RealPricing` and `RealTax` for free instead of stubbing their answers.

**Principle:** over-mocking is invisible to the profiler and lethal to the suite's maintainability. Mock roles, not collaborators; assert state and return values, not call order.
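
The outcome-based test needs very little scaffolding. One possible shape for the hypothetical collaborators it names (`InMemoryInventory`, `FakeGateway`, `RealCart`, `RealPricing`, `RealTax`, `Checkout`), assuming a catalog price of 100 for `sku-1` and integer amounts so the 8% tax rounds to exactly 8; none of this comes from a real codebase.

```python
from dataclasses import dataclass, field

PRICES = {"sku-1": 100}  # assumed catalog for this sketch


@dataclass
class RealCart:
    skus: list[str]


class RealPricing:
    def subtotal(self, cart: RealCart) -> int:
        return sum(PRICES[s] for s in cart.skus)


@dataclass
class RealTax:
    rate: float

    def compute(self, amount: int) -> int:
        return round(amount * self.rate)


@dataclass
class InMemoryInventory:
    stock: dict[str, int]

    def reserve(self, skus: list[str]) -> None:
        for s in skus:
            if self.stock.get(s, 0) <= 0:
                raise ValueError(f"out of stock: {s}")
            self.stock[s] -= 1


@dataclass
class FakeGateway:
    """The one true external boundary: records charges instead of calling a payment provider."""
    charges: list[int] = field(default_factory=list)

    def charge(self, amount: int) -> None:
        self.charges.append(amount)

    @property
    def last_charge(self) -> int:
        return self.charges[-1]


class Checkout:
    def __init__(self, cart, pricing, tax, inv, pay) -> None:
        self.cart, self.pricing, self.tax, self.inv, self.pay = cart, pricing, tax, inv, pay

    def run(self, order_id: str) -> None:
        subtotal = self.pricing.subtotal(self.cart)
        total = subtotal + self.tax.compute(subtotal)
        self.inv.reserve(self.cart.skus)  # reserve before charging
        self.pay.charge(total)


def test_checkout_charges_total_with_tax() -> None:
    inv = InMemoryInventory(stock={"sku-1": 5})
    pay = FakeGateway()
    svc = Checkout(RealCart(["sku-1"]), RealPricing(), RealTax(rate=0.08), inv, pay)
    svc.run("order-1")
    assert pay.last_charge == 108
    assert inv.stock["sku-1"] == 4
```

Only `FakeGateway` is a double; everything else is real, so renaming or reshuffling `Checkout`'s internals leaves the test green as long as the charge and the stock come out right.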

## Scenario 8 — Fake clock removes `sleep`-based flakiness and wall-clock waste (Go)

A cache-expiry test reads the real clock and calls `time.Sleep` to wait past the TTL. It is flaky under load and wastes real seconds.

```go
func TestExpiry(t *testing.T) {
    c := NewCache(50 * time.Millisecond)   // TTL
    c.Set("k", "v")
    time.Sleep(60 * time.Millisecond)      // wall-clock wait
    _, ok := c.Get("k")
    if ok { t.Fatal("should have expired") }
}
```

### Resolution

**Measurement: two distinct costs.**

1. **Wall-clock waste:** each such test burns ~60 ms of real time. A suite with 200 timing tests wastes **~12 s** doing nothing but sleeping, serially, every run.
2. **Flakiness:** the 10 ms margin between TTL (50 ms) and sleep (60 ms) assumes timers and goroutines run on time. On a loaded CI runner a goroutine may not be scheduled for 20+ ms, so a background expiry sweeper can miss the deadline, and the mirror-image assertion (sleep 40 ms, expect the key still present) fails whenever the test goroutine itself is delayed. Small failure rates compound: if 20 such tests each fail 1% of the time independently, some test fails in about 1 run in 5 (1 − 0.99^20 ≈ 0.18), eroding trust and triggering reruns (more wall-clock waste).

**Inject the clock: make time a dependency.**

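```go
type Clock interface{ Now() time.Time }

// realClock is what production passes in.
type realClock struct{}

func (realClock) Now() time.Time { return time.Now() }

// FakeClock only moves when a test calls Advance.
type FakeClock struct{ t time.Time }

func (f *FakeClock) Now() time.Time          { return f.t }
func (f *FakeClock) Advance(d time.Duration) { f.t = f.t.Add(d) }

type Cache struct{ ttl time.Duration; clk Clock; /* ... */ }

func NewCache(ttl time.Duration, clk Clock) *Cache { return &Cache{ttl: ttl, clk: clk} }

func TestExpiry(t *testing.T) {
    clk := &FakeClock{t: time.Unix(0, 0)}
    c := NewCache(50*time.Millisecond, clk)
    c.Set("k", "v")
    clk.Advance(60 * time.Millisecond)   // instant, deterministic
    if _, ok := c.Get("k"); ok { t.Fatal("should have expired") }
}
```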

The test now runs in **microseconds**, is **100% deterministic** (no scheduler dependence), and can test the *exact* boundary (`Advance(49ms)`: still present; `Advance(50ms)`: expired), which a sleep-based test could never assert precisely. Across the suite that recovers ~12 s of wall-clock time and eliminates an entire class of flakiness.

**Runtime cost in production:** the real clock's `Now()` still calls `time.Now()` (a vDSO call on Linux, roughly 15–25 ns), so the seam adds one interface call (~1–2 ns) on top of it: a few percent of the call it wraps. Determinism in tests; negligible cost in production. The same pattern injects randomness (seeded RNG) and UUID generation.
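
The same seam in Python, as a sketch: `Clock`, `MonotonicClock`, `FakeClock`, and `Cache` are hypothetical names. Time is kept in integer nanoseconds (via `time.monotonic_ns`) so boundary checks are exact, and an entry counts as expired once its age reaches the TTL, matching the 49 ms / 50 ms boundary described above.

```python
import time
from typing import Protocol


class Clock(Protocol):
    def now_ns(self) -> int: ...


class MonotonicClock:
    """Production clock."""

    def now_ns(self) -> int:
        return time.monotonic_ns()


class FakeClock:
    """Test clock: time moves only when the test says so."""

    def __init__(self, start_ns: int = 0) -> None:
        self._t = start_ns

    def now_ns(self) -> int:
        return self._t

    def advance(self, *, ms: int) -> None:
        self._t += ms * 1_000_000


class Cache:
    def __init__(self, ttl_ms: int, clock: Clock) -> None:
        self._ttl_ns = ttl_ms * 1_000_000
        self._clock = clock
        self._items: dict[str, tuple[object, int]] = {}

    def set(self, key: str, value: object) -> None:
        self._items[key] = (value, self._clock.now_ns())

    def get(self, key: str) -> tuple[object, bool]:
        """Return (value, True) while fresh; (None, False) once age >= ttl."""
        entry = self._items.get(key)
        if entry is None:
            return None, False
        value, stored_at = entry
        if self._clock.now_ns() - stored_at >= self._ttl_ns:
            del self._items[key]
            return None, False
        return value, True


def test_expiry_boundary() -> None:
    for elapsed_ms, still_present in ((49, True), (50, False)):
        clk = FakeClock()
        c = Cache(ttl_ms=50, clock=clk)
        c.set("k", "v")
        clk.advance(ms=elapsed_ms)  # instant: no sleeping
        assert c.get("k")[1] is still_present
```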

## Scenario 9 — `interface{}` / dynamic dispatch defeats JIT monomorphism (Java)

A reviewer claims your testability interface will prevent the JIT from inlining and tank a hot method. You need to know when that is actually true.

```java
interface PriceRule { long apply(long cents); }

// Hot accounting loop over 100M line items:
long total = 0;
for (LineItem li : items) {
    total += rule.apply(li.cents());   // virtual/interface call
}
```

### Resolution

**How HotSpot actually behaves.** The JIT profiles call sites and classifies them:

- **Monomorphic** (one implementing class ever seen here): the JIT inlines the target directly behind a type guard. Cost ≈ a direct call. This is the common case, since most injected dependencies have exactly one production implementation.
- **Bimorphic** (two classes): still inlined, with two guarded branches.
- **Megamorphic** (≥3 classes at the site): falls back to a vtable/itable lookup with **no inlining**. Cost ~2–5 ns plus the lost inlining, which can be the bigger loss (~10–30% on a tiny hot method).

**Measurement.** With one production `PriceRule`, JMH shows the interface loop within **noise** of a concrete loop, because the JIT inlined it (confirm with `-XX:+UnlockDiagnosticVMOptions -XX:+PrintInlining` and look for `inline (hot)`). The interface is free.

**When it bites, and how to keep testability anyway.** The site goes megamorphic only if **production** calls the loop with ≥3 different rule classes through the same site. Resolutions, in order:

1. **Confirm it's real.** Profile production, not a microbenchmark with artificial polymorphism. Usually the site is monomorphic and there is nothing to do.
2. **Hoist the dispatch out of the loop:** resolve the rule once, then loop on the concrete type:

```java
if (rule instanceof FlatRule fr) {        // pattern match to concrete
    for (LineItem li : items) total += fr.applyFlat(li.cents());  // monomorphic, inlined
} else {
    for (LineItem li : items) total += rule.apply(li.cents());
}
```

3. **Provide a concrete fast path** that production code uses while the interface remains for tests (the same structure as Scenarios 1 and 6).

**Python note:** every method call already looks up the attribute dynamically on the object's type, so dynamic dispatch is the baseline and an injected interface adds *nothing*; standard CPython has no inlining JIT to defeat. **Go note:** as in Scenario 1, itab dispatch costs ~1–2 ns and is devirtualized when the concrete type is known.

**Principle:** the seam is free at monomorphic sites (the vast majority); only measured megamorphism warrants the concrete fast path, and even then you keep the interface for tests.
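
A Python sketch of resolution 2 (hoisting the type check out of the loop), with the caveat from the Python note: CPython has no inlining to recover, so the saving here is only the per-item method call. The structural point carries over unchanged: the specialized path must agree with the generic one, and a single equivalence test pins that down. `PriceRule`, `FlatRule`, `PercentRule`, `total`, and `total_generic` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Protocol, Sequence


class PriceRule(Protocol):
    def apply(self, cents: int) -> int: ...


@dataclass(frozen=True)
class FlatRule:
    off_cents: int

    def apply(self, cents: int) -> int:
        return max(cents - self.off_cents, 0)


@dataclass(frozen=True)
class PercentRule:
    pct: int

    def apply(self, cents: int) -> int:
        return cents * (100 - self.pct) // 100


def total_generic(items: Sequence[int], rule: PriceRule) -> int:
    """The seam: one dynamic call per item; works with any rule or test double."""
    return sum(rule.apply(c) for c in items)


def total(items: Sequence[int], rule: PriceRule) -> int:
    """Dispatch resolved once, outside the loop; the hot loop is specialized."""
    if type(rule) is FlatRule:
        off = rule.off_cents
        return sum(c - off if c > off else 0 for c in items)
    return total_generic(items, rule)
```

A test that compares `total` with `total_generic` over random inputs keeps the fast path honest while the interface stays available to every other test.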

## Scenario 10 — God constructor does I/O, so every test pays for it (Python)

`ReportService.__init__` opens a DB connection, reads a config file, and warms a cache. Every test that needs a `ReportService`, even one testing pure formatting, pays for all of it.

```python
import psycopg2
import yaml

class ReportService:
    def __init__(self):
        self.db = psycopg2.connect(DSN)          # network on construction
        self.cfg = yaml.safe_load(open("/etc/report.yaml"))
        self.cache = self._warm_cache()           # more queries
    def format_row(self, row): ...                # pure, but you can't reach it cheaply
```

### Resolution

**Measurement.** Constructing `ReportService` costs a DB connect (~30–100 ms), a file read, and cache-warming queries (~hundreds of ms). A test of `format_row` (pure string work) should run in microseconds but instead pays **~200 ms** of construction, *and* fails entirely on machines without the DB or `/etc/report.yaml`. A 100-test suite inherits ~20 s of pure construction overhead and is unrunnable offline.

**Separate construction from work (no I/O in constructors).** The constructor should only *assign collaborators*; do the work lazily or in an explicit method, and inject the collaborators:

```python
import psycopg2
import yaml

class ReportService:
    def __init__(self, db, cfg, cache):     # plain assignment, no I/O
        self.db, self.cfg, self.cache = db, cfg, cache

    def format_row(self, row):              # pure: amounts are stored in cents
        return f"${row['amt'] / 100:,.2f}"

    @classmethod
    def bootstrap(cls):                     # the I/O lives in a factory, used by main()
        db = psycopg2.connect(DSN)
        with open("/etc/report.yaml") as f:
            cfg = yaml.safe_load(f)
        return cls(db, cfg, warm_cache(db))

def test_format_row():
    svc = ReportService(db=None, cfg={"locale": "en"}, cache={})  # no I/O
    assert svc.format_row({"amt": 1200}) == "$12.00"             # microseconds
```

The pure test no longer touches the DB or the filesystem: **~200 ms → ~5 µs**, and it runs offline. The I/O is concentrated in `bootstrap`, exercised by a single integration test.

**Runtime cost in production:** none. `bootstrap` does exactly the I/O the old constructor did, once, at startup; you only moved *where* it happens.

**Principle:** a constructor that does real work makes every test (and every alternate-mode use) pay for I/O it may not need; inject collaborators and push side effects into a factory.
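
The text also allows doing the work lazily. A sketch of that variant with `functools.cached_property`, assuming the connection is injected as a zero-argument `connect` factory (a hypothetical parameter, as are `LazyReportService`, `row_count`, and the `report_rows` table). The cursor calls follow DB-API 2.0, so a `psycopg2` or `sqlite3` connection both fit.

```python
from functools import cached_property
from typing import Any, Callable


class LazyReportService:
    """Construction only stores a connection factory; nothing is opened yet."""

    def __init__(self, connect: Callable[[], Any], cfg: dict) -> None:
        self._connect = connect
        self.cfg = cfg

    @cached_property
    def db(self) -> Any:
        return self._connect()  # runs once, on the first access to self.db

    def format_row(self, row: dict) -> str:  # pure: never touches self.db
        return f"${row['amt'] / 100:,.2f}"

    def row_count(self) -> int:
        cur = self.db.cursor()
        cur.execute("SELECT count(*) FROM report_rows")
        return cur.fetchone()[0]
```

A test can pass a factory that returns an in-memory `sqlite3` connection and assert that `format_row` never calls it, while `row_count` opens exactly one connection however often it runs.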

## Scenario 11 — Function injection vs struct-of-interfaces in a tight path (Go)

To make a routine testable you debate two seam styles: pass a `Notifier` interface, or pass a `func(Event)` callback. Someone worries the closure allocates.

```go
// Style A: interface
type Notifier interface{ Notify(Event) }
func process(events []Event, n Notifier) { for _, e := range events { n.Notify(e) } }

// Style B: function value (an alternative to Style A, hence the reused name)
func process(events []Event, notify func(Event)) { for _, e := range events { notify(e) } }
```

### Resolution

**Measurement.** Both compile to an indirect call (~1–2 ns). Allocation differs only at the *call site*, not in the loop:

- A `func(Event)` literal that **captures variables** is a closure. If escape analysis cannot keep it on the stack, it costs **1 heap allocation** when created (once, before the loop; 0 per iteration), and `-gcflags=-m` prints `func literal escapes to heap`.
- A top-level function or a non-capturing func literal is a static reference: **0 allocs**. A method value such as `n.Notify` binds its receiver, so it behaves like a capturing closure.
- The interface (Style A) allocates only if the concrete value must be boxed into the interface and escapes: again **once**, not per iteration.

So per-iteration cost is identical (~1–2 ns dispatch, 0 allocs) for both; the only possible difference is a single setup allocation. Over a 1M-event loop, that one allocation is ~0.0001% of the work.

**Resolution.** Choose on *design*, not on this non-difference:

- Use the **interface** when the seam has multiple methods or you want a named role that tests fake (`FakeNotifier` recording calls).
- Use the **function value** when it is a single operation and a test can pass `func(e Event) { recorded = append(recorded, e) }`, the lightest possible double with no type to declare.

If the rare hot path truly cannot afford even the setup allocation, pass a top-level function, or keep the state in a receiver and pass a pointer to it through the interface (storing a pointer in an interface needs no extra boxing allocation).

**Principle:** both seams are free per iteration; pick for clarity and test ergonomics, and only avoid *capturing* closures when a measured hot path's one-time allocation matters (it almost never does).

## Scenario 12 — Test parallelism unlocked by removing global state (Java)

The suite runs serially and takes 9 minutes. The reason tests can't run in parallel is a global singleton they each mutate — a testability defect that also caps throughput on a 16-core CI box.

```java
class CurrentTenant {                       // global mutable state
    private static String tenant;
    static void set(String t) { tenant = t; }
    static String get() { return tenant; }
}
// Tests each call CurrentTenant.set(...) and read it back, so they leak into each other.
```

### Resolution

**Measurement.** With the global, two tests running concurrently clobber each other's `tenant`, so the suite must stay serial: **9 minutes** on one core while 15 cores idle. The global is simultaneously a *correctness* hazard (test interdependence, order-dependent failures) and a *performance* ceiling.

**Inject the tenant; delete the global.** Pass it as a dependency (a constructor or method parameter, or a request-scoped object):

```java
class OrderService {
    private final TenantContext tenant;
    OrderService(TenantContext tenant) { this.tenant = tenant; }   // injected, per-instance
}
```

Each test constructs its own `TenantContext` — no shared state. Now the runner can parallelize:

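```java
// JUnit 5. Parallel execution must also be switched on in junit-platform.properties:
//   junit.jupiter.execution.parallel.enabled = true
//   junit.jupiter.execution.parallel.mode.classes.default = concurrent   (to run classes in parallel too)
@Execution(ExecutionMode.CONCURRENT)
class OrderServiceTest { /* each test isolated */ }
```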

On 16 cores with no shared mutable state, the wall-clock drops from **9 min → ~45 s** (near-linear, bounded by the longest single class). The same change that made the tests *isolated* (the testability goal) is what made them *parallelizable* (the speed goal); they are the same property.

**Runtime cost in production:** an instance field read instead of a static read, both on the order of a nanosecond, and the field is arguably more cache-friendly. If you genuinely need ambient context, use a `ScopedValue`/`ThreadLocal` owned by the request boundary, not a mutable static.

**Principle:** global mutable state is the classic testability anti-pattern, and its hidden tax is a serial suite; removing it buys isolation *and* parallel throughput at once.
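
A Python sketch of the isolation half of this argument: each "test" builds its own injected `TenantContext`, so running 200 of them concurrently on a thread pool has nothing shared to clobber. The names are hypothetical, and CPython threads will not show a 16-core speedup; a real Python suite would get process-level parallelism from a runner such as pytest-xdist (`pytest -n auto`).

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass(frozen=True)
class TenantContext:
    tenant: str


class OrderService:
    def __init__(self, tenant: TenantContext) -> None:  # injected per instance, no static
        self._tenant = tenant

    def table_for(self, entity: str) -> str:
        return f"{self._tenant.tenant}.{entity}"


def isolated_test(i: int) -> bool:
    """One 'test': builds its own context, so there is no shared state to race on."""
    svc = OrderService(TenantContext(f"tenant-{i}"))
    return svc.table_for("orders") == f"tenant-{i}.orders"


def run_suite_in_parallel(n_tests: int = 200, workers: int = 16) -> bool:
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return all(pool.map(isolated_test, range(n_tests)))
```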

## Scenario 13 — `unittest.mock` import & patch overhead at collection time (Python)

A 3,000-test pytest suite feels sluggish to start even before any test runs. Profiling collection shows heavy `@patch` usage and broad imports inflating startup.

```python
from unittest.mock import patch

@patch("app.services.payments.stripe")     # string target: its module is imported when the patch starts
@patch("app.services.email.smtplib")
@patch("app.services.storage.boto3")
def test_thing(mock_boto, mock_smtp, mock_stripe): ...  # decorators apply bottom-up, so boto3's mock comes first
```

### Resolution

**Measurement.** A string target such as `@patch("app.services.payments.stripe")` is resolved by importing `app.services.payments` when the patch starts, and the test module has usually imported the services under test already. Either way, services that import heavy SDKs (`boto3`, `stripe`) at module level drag those SDK import trees into the test process, and **boto3 alone imports in ~0.5–1 s** (check with `python -X importtime -c "import boto3"`). Multiply across modules and pytest *collection* (before a single assertion) takes **10–20 s**. Each patch's setup and teardown also adds ~tens of microseconds per test: small individually, but 3,000 tests with several patches each add up to a measurable slice of run time.

**This is the over-mock tax in a different guise, and the fix is the same seam.** If services took collaborators by injection, and the SDK imports lived in the adapter modules rather than the service modules, you would pass lightweight fakes and never import the SDK in unit tests at all:

```python
def test_thing():
    svc = OrderService(payments=FakeGateway(), email=FakeMailer(), storage=FakeStore())
    # no patching, no boto3 import, no SDK import tree
```

`FakeGateway` is ~10 lines; constructing it takes microseconds and pulls in nothing. Collection no longer imports `boto3`, so startup drops back to ~1–2 s, and tests don't pay per-`@patch` setup.

**When you must patch** (legacy code with hard-wired imports you can't yet refactor): patch the *narrowest* target, prefer `patch.object` on an object you already hold over a string path that has to be resolved by import, and move patches shared by many tests into a module- or session-scoped fixture so their setup is paid once rather than per test. But the durable fix is the seam: code designed for testability is patched rarely, so it neither rots (Scenario 7) nor pays import and patch overhead here.

**Principle:** heavy `@patch` usage signals missing injection seams; the same dependency injection that improves design also strips SDK imports and patch overhead out of the suite's startup and per-test cost.
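
A sketch of where the SDK import has to live for this to work: in the production adapter, not in the service module. `S3Storage`, `FakeStore`, `FakeGateway`, and `OrderService` are hypothetical names (the email port is omitted for brevity); `boto3.client("s3")` and `put_object` are real boto3 calls, but they run only if something constructs the real adapter, which a unit test never does.

```python
import sys
from dataclasses import dataclass, field


class S3Storage:
    """Production adapter. The SDK import lives here, not in the service module."""

    def __init__(self, bucket: str) -> None:
        import boto3  # deferred: paid only by code that builds the real adapter
        self._s3 = boto3.client("s3")
        self._bucket = bucket

    def put(self, key: str, data: bytes) -> None:
        self._s3.put_object(Bucket=self._bucket, Key=key, Body=data)


@dataclass
class FakeStore:
    objects: dict[str, bytes] = field(default_factory=dict)

    def put(self, key: str, data: bytes) -> None:
        self.objects[key] = data


@dataclass
class FakeGateway:
    charges: list[int] = field(default_factory=list)

    def charge(self, cents: int) -> None:
        self.charges.append(cents)


class OrderService:
    def __init__(self, payments, storage) -> None:  # ports, not SDK modules
        self.payments, self.storage = payments, storage

    def place(self, order_id: str, cents: int) -> None:
        self.payments.charge(cents)
        self.storage.put(f"receipts/{order_id}", str(cents).encode())


def test_place_order_without_sdks() -> None:
    pay, store = FakeGateway(), FakeStore()
    OrderService(payments=pay, storage=store).place("o-1", 1200)
    assert pay.charges == [1200]
    assert store.objects == {"receipts/o-1": b"1200"}
    assert "boto3" not in sys.modules  # nothing on this path imported the SDK
```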

Rules of Thumb¶

- **Measure before de-abstracting.** An interface call costs ~1–2 ns in Go (itab dispatch) and is inlined to nothing at monomorphic HotSpot JIT call sites. In Python it is simply the baseline, since every call is dynamic. Never delete a seam on the assumption that it is slow; profile first, and you will almost always find it is not.
- **Testable design is nearly free at runtime.** Constructor injection, ports, pure cores, and humble objects change *where* code runs, not how much of it runs. The JIT inlines the extra method boundaries, and escape analysis erases short-lived seams.
- **The performance win runs toward the test suite.** Pure cores test in microseconds, versus seconds for I/O-backed tests. In-memory fakes turn an 8-minute container suite into ~40 s. Fake clocks reclaim wall-clock waste and kill sleep-based flakiness.
- **DI cost is startup, not throughput, and it lands hardest on the test suite.** Reflection-based wiring (Spring) costs seconds per booted context. Share the context, slice it, prefer plain `new` in unit tests, and reach for compile-time DI (Dagger, Go's Wire) when boot time matters.
- **Optimize only the rare measured hot-path seam, and keep it.** When a seam truly costs time in a profiled hot path, you have three options:
  - add a concrete fast path beside the injectable one (Scenarios 1 and 6);
  - gate expensive work behind a cheap predicate;
  - hoist dispatch out of the loop.

  Don't delete the seam.
- **Watch variadic boxing, not dispatch.** In hot paths, the allocation from `args ...any` / `interface{}` boxing dwarfs the dispatch cost. Guard it with a level or predicate check.
- **Over-mocking has zero runtime cost and a steep maintenance cost.** Call-sequence mocks rot the suite by turning every refactor into test surgery. Mock roles at real boundaries, and assert on state and return values.
- **Fake the clock, randomness, and UUIDs.** Determinism removes flakiness and wall-clock sleeps, and the injected clock is no slower than `time.Now()` in production.
- **No I/O in constructors.** A god constructor makes every test pay for network or disk access it may not need. Inject collaborators and push side effects into a factory.
- **Removing global mutable state buys isolation and parallelism.** The same property that isolates tests lets the runner use every core, often a 9 min → 45 s suite win on its own.
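The clock rule can be sketched in Python terms as follows. `Session`, `retry`, and `FakeClock` are hypothetical names, and the production defaults are the real `time.monotonic` and `time.sleep`. Tests pass a fake whose time moves only when the test advances it, so expiry and backoff logic are checked without waiting on the wall clock.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class FakeClock:
    """Hypothetical test double: time moves only when the test advances it."""

    def __init__(self, start: float = 0.0) -> None:
        self.now = start
        self.sleeps: list = []

    def __call__(self) -> float:
        # Stands in for time.monotonic.
        return self.now

    def advance(self, seconds: float) -> None:
        self.now += seconds

    def sleep(self, seconds: float) -> None:
        # Stands in for time.sleep: records the delay and returns immediately.
        self.sleeps.append(seconds)
        self.now += seconds


class Session:
    """Hypothetical session whose expiry reads an injected clock."""

    def __init__(self, ttl: float, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._expires_at = clock() + ttl

    def expired(self) -> bool:
        return self._clock() >= self._expires_at


def retry(op: Callable[[], T], attempts: int, base_delay: float,
          sleep: Callable[[float], None] = time.sleep) -> T:
    """Retry op with exponential backoff; sleep is injected so tests never wait."""
    for attempt in range(attempts):
        try:
            return op()
        except ConnectionError:
            if attempt == attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)
    raise ValueError("attempts must be >= 1")
```

In production, callers omit `clock` and `sleep` and get the real functions. The only runtime difference is one extra attribute lookup or argument, which is the "nearly free" seam the rules describe.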

```mermaid
flowchart TD
    A[Testability change suspected of hurting performance?] --> B{Which axis?}
    B -->|Runtime| C[Profile the hot path]
    B -->|Test-suite speed| D[Testable design HELPS speed]
    C --> E{Seam is the top frame?}
    E -->|No, almost always| F[Keep the seam: it is free]
    E -->|Yes, measured| G[Concrete fast path beside injectable one /<br/>guard with cheap predicate /<br/>hoist dispatch out of loop]
    G --> H[Keep the seam for tests]
    D --> I[Pure core: µs tests, no I/O]
    D --> J[In-memory fake: 8min container suite to 40s]
    D --> K[Humble object: slow boundary out of fast suite]
    D --> L[Fake clock: no sleep flakiness, no wall-clock waste]
    D --> M[No global state: parallel suite, 9min to 45s]
    D --> N[Inject deps: no SDK import / patch overhead]
    F --> Z[Testable AND fast]
    H --> Z
    I --> Z
    J --> Z
    K --> Z
    L --> Z
    M --> Z
    N --> Z
```

Related Topics¶

find-bug.md — testability defects to spot in code under review (hidden deps, god constructors, untestable boundaries).
professional.md — the professional discipline of designing for testability up front.
Chapter README — the positive rules: injected abstractions, seams, pure cores, the humble object.
Unit Tests — writing the fast, isolated tests this design makes possible.
Refactoring — extracting pure cores and introducing seams are refactorings; property-based testing rides on a fast pure core.
