The Three Laws of TDD: Optimization Drills

Category: Craftsmanship Disciplines — write production code only to make a failing test pass, in the tightest possible loop.

10 drills to sharpen your TDD loop and your test suite. The optimization target here is the feedback loop — keeping the nano-cycle in seconds, keeping the suite fast, trustworthy, and non-brittle. A slow or brittle suite makes the three laws impossible to follow, so optimizing the suite is optimizing TDD.

Table of Contents

Drill 1: Make a Slow Test Suite Fast
Drill 2: Remove a Brittle Implementation-Coupled Test
Drill 3: De-Mock Toward Sociable Tests
Drill 4: Tighten the Cycle (Smaller Steps)
Drill 5: Replace a Real Boundary with a Fast Fake
Drill 6: Collapse Triangulated Tests into a Table
Drill 7: Kill a Flaky Test
Drill 8: Add the Missing Assertion (Coverage → Verification)
Drill 9: Re-Tier — Move Slow Tests Out of the Unit Suite
Drill 10: Parallelize and Isolate
Optimization Tips
Summary

Drill 1: Make a Slow Test Suite Fast

Before: every test spins up the whole application context

@SpringBootTest                          // boots the FULL context: DB, web, beans
class DiscountTest {
    @Autowired DiscountCalculator calc;  // ~4s startup PER test class
    @Test void goldTierGetsTenPercent() {
        assertThat(calc.forOrder(goldOrder)).isEqualTo(money(10));
    }
}

After: test the pure logic with a plain object, no framework

class DiscountTest {
    DiscountCalculator calc = new DiscountCalculator();   // microseconds
    @Test void goldTierGetsTenPercent() {
        assertThat(calc.forOrder(goldOrder)).isEqualTo(money(10));
    }
}

Gain: A real win — the per-class 4-second context boot disappears. DiscountCalculator is pure logic; it needs no Spring context. Reserve @SpringBootTest for the handful of integration tests that genuinely wire components. A unit suite that boots a framework per test cannot sustain a seconds-long nano-cycle.

Drill 2: Remove a Brittle Test

Before: asserts the call sequence; breaks on any refactor

@Test void processesPayment() {
    new Checkout(gateway, ledger).pay(order);
    InOrder inOrder = inOrder(gateway, ledger);
    inOrder.verify(gateway).authorize(any());
    inOrder.verify(gateway).capture(any());      // couples to exact internal steps
    inOrder.verify(ledger).record(any());
}

After: assert the observable outcome

@Test void successfulPaymentIsRecordedAsPaid() {
    var ledger = new InMemoryLedger();
    new Checkout(new FakeGateway(SUCCESS), ledger).pay(order);
    assertThat(ledger.entryFor(order).status()).isEqualTo(PAID);  // outcome, not steps
}

Gain: The test now survives any behavior-preserving refactor (merging authorize+capture, reordering internal calls). Brittle interaction tests turn the suite into an anchor that punishes refactoring — the opposite of what TDD is for. Assert what happened, not how.
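To make the refactor-survival claim concrete, here is a minimal Python sketch of the same idea. Everything in it is hypothetical (the fakes, the two checkout classes, the `charge` method, and the string statuses are illustrative stand-ins, not the Java code above): two differently structured implementations pass one and the same outcome-based check.

```python
class FakeGateway:
    """Hypothetical fake gateway: every call returns the configured result."""

    def __init__(self, result):
        self.result = result

    def authorize(self, order):
        return self.result

    def capture(self, order):
        return self.result

    def charge(self, order):
        # Hypothetical merged authorize+capture call used after the refactor.
        return self.result


class InMemoryLedger:
    """Hypothetical fake ledger that remembers the last status per order."""

    def __init__(self):
        self._status = {}

    def record(self, order, status):
        self._status[order] = status

    def status_for(self, order):
        return self._status.get(order)


class TwoStepCheckout:
    """Original shape: authorize, then capture, then record."""

    def __init__(self, gateway, ledger):
        self.gateway, self.ledger = gateway, ledger

    def pay(self, order):
        ok = (self.gateway.authorize(order) == "SUCCESS"
              and self.gateway.capture(order) == "SUCCESS")
        self.ledger.record(order, "PAID" if ok else "FAILED")


class OneStepCheckout(TwoStepCheckout):
    """Refactored shape: a single merged gateway call."""

    def pay(self, order):
        ok = self.gateway.charge(order) == "SUCCESS"
        self.ledger.record(order, "PAID" if ok else "FAILED")


def check_successful_payment_is_recorded_as_paid(checkout_cls):
    # Asserts only the observable outcome, never the call sequence.
    ledger = InMemoryLedger()
    checkout_cls(FakeGateway("SUCCESS"), ledger).pay("order-1")
    assert ledger.status_for("order-1") == "PAID"
```

Calling the check with `TwoStepCheckout` and then with `OneStepCheckout` passes both times, whereas a test that verified `authorize` followed by `capture` would fail on the second class even though its behavior is unchanged.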

Drill 3: De-Mock Toward Sociable Tests

Before: three mocks for collaborators that are fast and pure

def test_total():
    tax = Mock(); tax.compute.return_value = 8
    disc = Mock(); disc.compute.return_value = 5
    fmt = Mock(); fmt.render.side_effect = lambda x: f"${x}"
    cart = Cart(tax, disc, fmt)
    assert cart.total_str(100) == "$103"   # really just testing the mock wiring

After: use the real, fast collaborators (classicist / sociable)

def test_total():
    cart = Cart(TaxRule(0.08), PercentDiscount(0.05), MoneyFormatter())
    assert cart.total_str(100) == "$103"   # exercises the real computation

Gain: The mocked version tested that you wired mocks correctly; the sociable version tests the actual arithmetic end-to-end through real, fast objects. Fewer mocks means fewer false greens and a test that catches real bugs in TaxRule/PercentDiscount. Mock only slow/external/non-deterministic boundaries — not pure value logic. See Senior on classicist vs mockist.
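For readers who want to run the sociable test, the following is one possible shape for the real collaborators. The class names come from the test above; their bodies are assumptions, as is the rule that tax and discount are both computed on the original amount (chosen because it reproduces "$103") and the use of `Decimal` to avoid float rounding.

```python
from decimal import Decimal


class TaxRule:
    """Hypothetical: flat-rate tax on the given amount."""

    def __init__(self, rate):
        self.rate = Decimal(str(rate))

    def compute(self, amount):
        return Decimal(amount) * self.rate


class PercentDiscount:
    """Hypothetical: percentage discount on the given amount."""

    def __init__(self, rate):
        self.rate = Decimal(str(rate))

    def compute(self, amount):
        return Decimal(amount) * self.rate


class MoneyFormatter:
    """Hypothetical: renders a dollar amount without trailing zeros."""

    def render(self, amount):
        # normalize() drops trailing zeros; the "f" format avoids exponent notation.
        return "$" + format(amount.normalize(), "f")


class Cart:
    def __init__(self, tax, discount, formatter):
        self.tax = tax
        self.discount = discount
        self.formatter = formatter

    def total_str(self, amount):
        # Assumption: tax and discount both apply to the original amount.
        total = (Decimal(amount)
                 + self.tax.compute(amount)
                 - self.discount.compute(amount))
        return self.formatter.render(total)


def test_total():
    cart = Cart(TaxRule(0.08), PercentDiscount(0.05), MoneyFormatter())
    assert cart.total_str(100) == "$103"
```

If the business rule were instead "discount first, then tax the discounted amount", this test would go red, which is exactly the kind of real arithmetic bug the mocked version could never detect.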

Drill 4: Tighten the Cycle

Before: one giant leap (red to green took 25 minutes of debugging)

Test list (attempted in one step):
  [ ] parse a full config file with sections, comments, includes, and overrides
→ wrote 80 lines, ran, 6 failures, no idea which part is wrong, 25 min debugging

After: decompose into nano-sized steps

Test list (one failing test at a time):
  [ ] empty input -> empty config
  [ ] single "k=v" -> {k: v}
  [ ] comment line ignored
  [ ] two keys
  [ ] [section] prefixes keys
  [ ] include directive merges another file
→ each step: 2-5 lines, red→green in <60s, failures localized to the last step

Gain: No runtime change, but a massive loop improvement — debugging drops from 25 minutes to seconds because every failure is caused by the last 2–5 lines. The three laws force this; the optimization is recognizing when your step was too big and breaking it down. Big steps are the #1 reason TDD "feels slow."
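As an illustration, this is roughly where the code might stand after the first four items on the test list. The function name `parse_config` and the `#` comment marker are assumptions; the test list does not specify either. Each test was the only failing one when it was written, and the comments mark which lines each step forced into existence.

```python
def parse_config(text):
    """Parse 'k=v' lines into a dict, skipping blank and '#' comment lines."""
    config = {}                                   # step 1: empty input
    for line in text.splitlines():                # step 4: more than one line
        line = line.strip()
        if not line or line.startswith("#"):      # step 3: comments ignored
            continue
        key, _, value = line.partition("=")       # step 2: single "k=v"
        config[key.strip()] = value.strip()
    return config


def test_empty_input_gives_empty_config():
    assert parse_config("") == {}


def test_single_pair():
    assert parse_config("k=v") == {"k": "v"}


def test_comment_line_ignored():
    assert parse_config("# note\nk=v") == {"k": "v"}


def test_two_keys():
    assert parse_config("a=1\nb=2") == {"a": "1", "b": "2"}
```

Sections and includes would follow the same rhythm: one new failing test, a few lines to pass it, then a refactor on green.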

Drill 5: Replace a Real Boundary with a Fast Fake

Before: the test sleeps to wait for a real timer/clock

func TestSessionExpires(t *testing.T) {
    s := NewSession(100 * time.Millisecond)
    time.Sleep(150 * time.Millisecond)        // real wall-clock wait — slow + flaky
    if !s.Expired() { t.Fail() }
}

After: inject the clock; advance it instantly

func TestSessionExpires(t *testing.T) {
    clock := &FakeClock{now: t0}
    s := NewSessionWithClock(100*time.Millisecond, clock)
    clock.Advance(150 * time.Millisecond)     // instant, deterministic
    if !s.Expired() { t.Fatal("should be expired") }
}

Gain: Eliminates a 150ms-per-run sleep and the flakiness of timing-dependent tests. Time, randomness, network, and the filesystem are the four boundaries that make tests slow and non-deterministic — inject them so tests control them. The design improves too: the session is now testable against any clock.
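The same injection move works for the other boundaries. As a small sketch for randomness (in Python rather than Go, and with a hypothetical `pick_winner` function that is not part of the drill), the code under test accepts its random source as a parameter so the test can supply a seeded one:

```python
import random


def pick_winner(entrants, rng=None):
    """Pick one entrant; the RNG is injected so tests can control it."""
    if rng is None:
        rng = random.Random()          # production default: unseeded
    return rng.choice(entrants)


def test_pick_winner_is_reproducible_with_a_seeded_rng():
    entrants = ["ann", "bob", "cy"]
    first = pick_winner(entrants, random.Random(42))
    second = pick_winner(entrants, random.Random(42))
    assert first == second             # same seed, same draw
    assert first in entrants
```

The test deliberately asserts reproducibility rather than a specific name, so it does not depend on the exact sequence a given Python version produces for seed 42.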

Drill 6: Collapse Triangulated Tests into a Table

Before: five near-identical tests left over from triangulation

def test_1(): assert fizzbuzz(1) == "1"
def test_2(): assert fizzbuzz(2) == "2"
def test_3(): assert fizzbuzz(3) == "Fizz"
def test_5(): assert fizzbuzz(5) == "Buzz"
def test_15(): assert fizzbuzz(15) == "FizzBuzz"

After: one parametrized test (a refactor of the TESTS, on green)

@pytest.mark.parametrize("n,expected", [
    (1,"1"), (2,"2"), (3,"Fizz"), (5,"Buzz"), (15,"FizzBuzz"),
])
def test_fizzbuzz(n, expected):
    assert fizzbuzz(n) == expected

Gain: Same coverage, far less duplication, and a single place to add the next case. Important sequencing: you write these one at a time during red-green (Law 2 forbids ten failing tests at once), then collapse them into the table during the refactor beat. The test suite is code — deduplicate it on green like any other code.

Drill 7: Kill a Flaky Test

Before: passes ~90% of the time, depends on real "now" and ordering

def test_recent_events():
    log_event("a")
    log_event("b")
    events = recent(within_seconds=1)
    assert events == ["b", "a"]   # flaky: depends on system clock + insertion timing

After: control time, assert on a stable property

def test_recent_events():
    clock = FakeClock(t0)
    log = EventLog(clock)
    log.record("a"); clock.advance(0.1)
    log.record("b")
    assert log.recent(within_seconds=1) == ["b", "a"]   # deterministic ordering

Gain: A flaky test is worse than no test: it trains the team to ignore red, which destroys the trust the whole loop depends on. Removing nondeterminism (injected clock, fixed seeds for RNG, no reliance on real timing or ordering) makes the test trustworthy. Treat each flake as a severity-rated incident, not an annoyance. See Professional.
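The "after" test relies on a `FakeClock` and an `EventLog` that are not shown. A minimal sketch of what they might look like follows; the method names match the test, but the bodies, the numeric start time standing in for `t0`, and the newest-first ordering rule are assumptions.

```python
class FakeClock:
    """Hypothetical test clock: time moves only when the test says so."""

    def __init__(self, start=0.0):
        self._now = start

    def now(self):
        return self._now

    def advance(self, seconds):
        self._now += seconds


class EventLog:
    """Hypothetical log that timestamps events with an injected clock."""

    def __init__(self, clock):
        self._clock = clock
        self._events = []              # (timestamp, name), in insertion order

    def record(self, name):
        self._events.append((self._clock.now(), name))

    def recent(self, within_seconds):
        cutoff = self._clock.now() - within_seconds
        fresh = [name for ts, name in self._events if ts >= cutoff]
        return list(reversed(fresh))   # newest first


def test_recent_events():
    clock = FakeClock(0.0)
    log = EventLog(clock)
    log.record("a"); clock.advance(0.1)
    log.record("b")
    assert log.recent(within_seconds=1) == ["b", "a"]
```

Because the only source of time is the fake, the test can also advance the clock past the window and assert that the events have aged out, with no sleeping involved.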

Drill 8: Add the Missing Assertion

Before: coverage theater (runs the code, checks nothing)

@Test void exportRuns() {
    exporter.export(report);   // green if it doesn't throw; verifies no behavior
}

After: assert the actual output

@Test void exportWritesCsvHeader() {
    var sink = new StringWriter();
    exporter.export(report, sink);
    assertThat(sink.toString()).startsWith("id,name,total\n");  // real verification
}

Gain: Converts a line-coverage placebo into a real test. The "before" version inflates coverage, yet a mutation-testing run would show nearly every mutant surviving (a mutation score near 0%), because only a mutant that makes export throw can fail it. The optimization that matters is verification, not execution: assert behavior. See find-bug, Bug 2.
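The mutation argument can be checked by hand without any tooling. In the sketch below (Python, with a hypothetical `export` function and a hand-written mutant standing in for what a mutation tool would generate), the assertion-free check passes on the broken code while the asserting check rejects it.

```python
import io


def export(rows, sink):
    """Hypothetical exporter: writes a CSV header, then one line per row."""
    sink.write("id,name,total\n")
    for row in rows:
        sink.write(",".join(str(v) for v in row) + "\n")


def export_mutant(rows, sink):
    """A hand-made mutant: the header line has been deleted."""
    for row in rows:
        sink.write(",".join(str(v) for v in row) + "\n")


def runs_without_error(exporter):
    """The assertion-free 'test': passes as long as nothing throws."""
    exporter([(1, "ann", 10)], io.StringIO())
    return True


def writes_csv_header(exporter):
    """The real test: inspects what was written."""
    sink = io.StringIO()
    exporter([(1, "ann", 10)], sink)
    return sink.getvalue().startswith("id,name,total\n")
```

Here `runs_without_error(export_mutant)` is still true, so the mutant survives, while `writes_csv_header(export_mutant)` is false, so the mutant is killed. Both checks pass for the original `export`.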

Drill 9: Re-Tier

Before: the "unit" suite is 9 minutes because tests hit Postgres

# tests/unit/test_repo.py  (mislabeled — really integration)
def test_save_and_find():
    repo = UserRepo(real_postgres())   # 600ms each × 400 tests = 4 min of database time alone
    ...

After: split the tiers

# tests/unit/test_repo.py — fast, in-memory fake (milliseconds)
def test_save_and_find():
    repo = UserRepo(InMemoryStore())
    repo.save(User("alice")); assert repo.find("alice")

# tests/integration/test_repo_postgres.py — a FEW real-DB tests for SQL mapping
@pytest.mark.integration
def test_sql_mapping_round_trips():
    repo = UserRepo(real_postgres()); ...

Gain: The unit suite drops from 9 minutes to seconds, restoring the nano-cycle, while a small integration suite still verifies the real SQL. Developers run unit tests on every save again. This is the single highest-leverage fix when "TDD stopped working" on a team — the loop was killed by slow tests mislabeled as units. See Professional, Incident 1.
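For the unit tier to work, the repository has to accept its storage as a dependency. One possible arrangement is sketched here; `InMemoryStore`, its `put`/`get` interface, and the `User` dataclass are assumptions made for illustration, since the drill does not show them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class User:
    name: str


class InMemoryStore:
    """Hypothetical fake exposing the small interface the repo relies on."""

    def __init__(self):
        self._rows = {}

    def put(self, key, value):
        self._rows[key] = value

    def get(self, key):
        return self._rows.get(key)


class UserRepo:
    """Repository that talks to whatever store it is given."""

    def __init__(self, store):
        self._store = store

    def save(self, user):
        self._store.put(user.name, user)

    def find(self, name):
        return self._store.get(name)


def test_save_and_find():
    repo = UserRepo(InMemoryStore())
    repo.save(User("alice"))
    assert repo.find("alice") == User("alice")
    assert repo.find("bob") is None
```

With the `integration` marker registered in the pytest configuration, the inner loop can run `pytest -m "not integration"` and leave the Postgres-backed tests to a slower CI stage. The fake only stays honest if the few real-database tests exercise the same interface.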

Drill 10: Parallelize and Isolate

Before: tests can't run in parallel because of shared global state

TEMP_DIR = "/tmp/test"          # shared by all tests → collisions in parallel

def test_writes_file():
    write(TEMP_DIR + "/out.txt", data)
    assert read(TEMP_DIR + "/out.txt") == data

After: per-test isolation enables parallelism

def test_writes_file(tmp_path):           # pytest gives each test a fresh dir
    out = tmp_path / "out.txt"
    write(out, data)
    assert read(out) == data

Gain: With no shared mutable state, the runner can execute tests in parallel (pytest -n auto with the pytest-xdist plugin, go test -parallel for tests that call t.Parallel(), JUnit 5 parallel execution), often a near-multiplicative speedup on a multicore machine. Isolation is a prerequisite: parallelism on shared state produces flakiness, not speed. Isolated tests are both faster (parallel) and more trustworthy (order-independent). See Test Design & Fixtures.
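The link between isolation and parallelism can be demonstrated with the standard library alone. This sketch is not how pytest schedules tests; it simply runs a hypothetical `write_then_read` helper on several threads, each call working in its own temporary directory, to show that nothing collides when no path is shared.

```python
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def write_then_read(data):
    """Write data to a file in a fresh directory and read it back."""
    with tempfile.TemporaryDirectory() as d:   # unique directory per call
        out = Path(d) / "out.txt"
        out.write_text(data)
        return out.read_text()


def run_in_parallel(payloads):
    """Run write_then_read concurrently; results keep the input order."""
    with ThreadPoolExecutor(max_workers=4) as pool:
        return list(pool.map(write_then_read, payloads))
```

Every call uses the same file name, `out.txt`, yet each payload comes back unchanged because the directories differ. Pointing all calls at one fixed directory, as the "before" version does, would let concurrent writers overwrite each other.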

Optimization Tips

Where TDD optimization actually pays off

Suite speed is the master metric. The three laws require a seconds-long loop; anything that slows the unit suite (framework boot, real I/O, sleeps) directly attacks TDD. Optimize speed first.
Trust beats coverage. A fast, flake-free, assertion-rich suite that you believe is worth more than a slow one with a high coverage number. Audit with mutation testing, not the coverage gauge.
Brittleness is a hidden cost. Implementation-coupled and over-mocked tests block the refactoring the loop is supposed to enable — de-couple them to assert behavior.
Smaller steps localize failure. When red→green takes more than a minute, your step is too big; decompose the test list.

Optimization checklist

Anti-optimizations

❌ Chasing a coverage number — produces assertion-free tests; optimize trust (mutation score) instead.
❌ Mocking to make a slow test fast when the real object is already fast — you trade speed you didn't need for brittleness you don't want.
❌ Deleting "annoying" tests to speed up the suite — fix or re-tier them; don't lose the verification.
❌ One huge end-to-end test "to cover everything" — slow, brittle, and it can't drive the inner loop.

Summary

TDD optimization is overwhelmingly about the feedback loop: keep the unit suite fast enough that the nano-cycle stays in seconds. The big levers are removing framework boot and real I/O from unit tests (re-tier them), injecting boundaries (clock, RNG, network) so tests are fast and deterministic, de-coupling brittle implementation/mock-heavy tests to assert behavior instead, and isolating tests so they parallelize. The metric to chase is trust (mutation score, flake rate, suite wall-clock), never raw coverage — because a slow or untrustworthy suite makes the three laws unfollowable, and an unfollowable discipline delivers nothing.

The Three Laws of TDD suite complete. All 8 files: junior · middle · senior · professional · interview · tasks · find-bug · optimize.

Next discipline: Test Design & Fixtures.