Table-Driven Tests — Senior¶
Table of Contents¶
- Where We Pick Up
- Designing Tables for Complex Domains
- The Anatomy of a Good Row Struct
- Nested Tables — Subtests Within Subtests
- Matrix Tests and Cross-Products
- Programmatic Table Generation
- Splitting a Table — When and How
- Tables for State Machines
- Tables for Sequence Tests
- Tables and t.Parallel at Scale
- Property-Based vs Table-Driven
- Composing Tables Across Test Files
- Table-Driven Benchmarks
- Table-Driven Fuzz Seeds
- BDD-style Frameworks — When and Why Not
- Worked Example — Compiler Lexer Tests
- Worked Example — Distributed System State Transitions
- Anti-Patterns at This Level
- Decision Checklist
- What to Read Next
- Self-Check
Where We Pick Up¶
At this level the mechanics are routine. What changes is judgment: when is a table the right shape, when do you split, when do you nest, when do you generate, when do you walk away and write five separate functions?
Concretely:
- You can already write parallel, golden-file, error-matching, helper-using tables.
- You will now design tables for compilers, state machines, distributed systems, and large-domain validators.
- You will encounter tables that have grown into 1000-line monsters and decide whether to refactor or accept the cost.
Designing Tables for Complex Domains¶
Three properties make a table-driven test good:
- Each row reads top-to-bottom in seconds. A reviewer should be able to look at {name, in, want} and understand the case without scrolling to a helper function or another file.
- The struct shape doesn't bend to accommodate outliers. If 19 rows need fields A, B, C and one needs A, B, C, D, E, F, that one row doesn't belong.
- Failures are uniquely attributable. When TestFoo/case_42 fails, the reader knows which row and which assertion. No combined messages, no shared mutable state.
In a simple domain (Abs, ParseInt), these properties come free. In a complex domain (a compiler frontend, an SQL planner), you have to design for them.
Example — designing for an SQL planner¶
You're testing Plan(sql string, schema Schema) (Tree, error). Each case has:
- Input SQL.
- Schema fixture.
- Expected tree shape (or substructure).
- Expected cost ranges.
- Optional: expected warnings.
A naive struct:
type planCase struct {
name string
sql string
schema Schema
wantTree string
wantCost float64
wantWarn []string
}
A Schema quickly grows to 200 lines of Go literal. You don't want that inline in 50 rows. Solution: register schemas by name.
type planCase struct {
name string
sql string
schemaName string // looked up in schemaRegistry
wantTree string
wantCost float64
wantWarn []string
}
var schemaRegistry = map[string]Schema{
"users_orders": loadSchema("users_orders.yaml"),
"products_only": loadSchema("products.yaml"),
// ...
}
Now each row is one line. The schemas live in their own file. Reviewers reading the table see exactly the relevant inputs.
The Anatomy of a Good Row Struct¶
Pattern matrix for designing a row:
| Field role | Always have? | Example field name |
|---|---|---|
| Name | Yes | name string |
| Input | Yes | in T or args struct{...} |
| Expected result | Yes (or err) | want T |
| Expected error | If applicable | wantErr error or wantErrSubstr string |
| Pre-condition | Rarely | setup func() (smells; consider splitting) |
| Tags | Sometimes | tags []string for selective -run |
| Skip condition | Sometimes | skip func() bool |
Avoid:
- actual or expected: use got/want.
- params: use args or split into specific fields.
- Booleans whose name doesn't suggest direction. valid bool is fine; flag bool is not.
Field grouping¶
For wide tables, group related fields:
type roundtripCase struct {
name string
request struct {
method, path, body string
headers map[string]string
}
response struct {
code int
body string
}
}
Anonymous nested structs work great here. They make the row read like a config file. Use them when a row has 6+ fields.
Nested Tables — Subtests Within Subtests¶
A single t.Run is a one-level subtest. You can nest:
for _, group := range groups {
t.Run(group.name, func(t *testing.T) {
for _, tc := range group.cases {
t.Run(tc.name, func(t *testing.T) {
...
})
}
})
}
Output is hierarchical: TestX/group_a/case_1, TestX/group_a/case_2, TestX/group_b/case_1. You can filter with -run TestX/group_a to run a whole group.
When to nest:
- Genuine hierarchy — protocol versions × commands, time periods × operations.
- Shared per-group setup: the group's t.Run body builds a fixture; each case in the group reuses it.
- Categorization: you want to be able to run just the "validation" cases or just the "rendering" cases.
When not to nest:
- If groups exist only because the file got long. Splitting into separate Test* functions is cleaner.
- If users will only ever navigate by full path (-run TestX/group_a/case_1). The extra level then just adds typing cost.
Matrix Tests and Cross-Products¶
A matrix test exercises every combination of two or more dimensions:
drivers := []string{"sqlite", "postgres", "mysql"}
isolations := []sql.IsolationLevel{
sql.LevelReadCommitted,
sql.LevelRepeatableRead,
sql.LevelSerializable,
}
queries := []struct {
name string
sql string
}{
{"select_one", "SELECT 1"},
{"insert", "INSERT INTO t(v) VALUES(1)"},
{"update", "UPDATE t SET v=2 WHERE id=1"},
}
for _, driver := range drivers {
t.Run(driver, func(t *testing.T) {
for _, lvl := range isolations {
t.Run(lvl.String(), func(t *testing.T) {
for _, q := range queries {
t.Run(q.name, func(t *testing.T) {
runQuery(t, driver, lvl, q.sql)
})
}
})
}
})
}
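This generates 3 × 3 × 3 = 27 subtests with paths like TestX/postgres/Serializable/insert. The level names come from sql.IsolationLevel's String method, and t.Run rewrites spaces to underscores, so sql.LevelReadCommitted shows up as Read_Committed. Run all isolation levels for postgres with go test -run 'TestX/postgres'.
Run all serializable cases across drivers with go test -run 'TestX/.*/Serializable'.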
Flattened vs nested matrix¶
Flat:
for _, driver := range drivers {
for _, lvl := range isolations {
for _, q := range queries {
name := fmt.Sprintf("%s/%s/%s", driver, lvl, q.name)
t.Run(name, func(t *testing.T) { runQuery(...) })
}
}
}
This produces the same names from a single t.Run level, but it loses the per-group setup hook: the nested form lets you build a per-driver client once and reuse it for each isolation level.
Choose nested when there's per-group setup; flat when not.
Skipping unsupported combinations¶
t.Run(q.name, func(t *testing.T) {
if driver == "sqlite" && lvl == sql.LevelSerializable {
t.Skip("sqlite does not support serializable")
}
runQuery(...)
})
t.Skip reports the row as skipped (not failed). Good for combinations that are intentionally unsupported.
Programmatic Table Generation¶
Sometimes you want to generate cases:
Cross-product generator¶
type matrixRow struct {
name string
a, b int
op string
}
func gen() []matrixRow {
var rows []matrixRow
for _, a := range []int{-1, 0, 1, 100} {
for _, b := range []int{-1, 0, 1, 100} {
for _, op := range []string{"+", "-", "*", "/"} {
if op == "/" && b == 0 { continue }
rows = append(rows, matrixRow{
name: fmt.Sprintf("%d%s%d", a, op, b),
a: a, b: b, op: op,
})
}
}
}
return rows
}
func TestMatrix(t *testing.T) {
for _, tc := range gen() {
t.Run(tc.name, func(t *testing.T) { ... })
}
}
Generated cases from regulation/spec data¶
For tax, currency, locale, or other regulation-driven domains, the test cases come from official spec files. Generate Go code from the spec with a //go:generate directive, which runs when someone invokes go generate rather than as part of go build.
The generated cases_gen.go is a slice literal, checked into the repo so reviewers can read it.
Random with fixed seed¶
func TestSortRandom(t *testing.T) {
rng := rand.New(rand.NewSource(42))
for i := 0; i < 100; i++ {
n := rng.Intn(100) + 1
input := make([]int, n)
for j := range input { input[j] = rng.Intn(1000) }
t.Run(fmt.Sprintf("rand_%03d", i), func(t *testing.T) {
cp := slices.Clone(input)
sort.Ints(cp)
if !sort.IntsAreSorted(cp) {
t.Errorf("not sorted: %v", cp)
}
})
}
}
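A fixed seed makes the inputs reproducible across runs. A reviewer can run -run TestSortRandom/rand_042 and get the same input every time, because every input is still drawn from the seeded generator in loop order even when its subtest is filtered out.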
Splitting a Table — When and How¶
Indicators that a table should split:
- Multiple concerns share one struct. Half the rows test validation, the other half test rendering. The struct has fields used by only one half.
- Setup divergence. Some rows need a Docker container, others are pure. Filtering with skip flags clutters the table.
- Length exceeds ~150 rows. Reviewers stop reading carefully.
- The table's name doesn't describe what it tests. TestHandler with 50 unrelated rows is a god-table.
How to split cleanly:
// Before: one TestHandler with 50 rows
// After:
func TestHandler_Parsing(t *testing.T) { ... } // 15 rows
func TestHandler_Authorization(t *testing.T) { ... } // 10 rows
func TestHandler_RateLimit(t *testing.T) { ... } // 8 rows
func TestHandler_Rendering(t *testing.T) { ... } // 17 rows
Each smaller function:
- Has a focused name.
- Has a tightly scoped struct.
- Can use -run TestHandler_Parsing to debug.
- Has its own setup if needed.
If multiple split tests share fixtures, extract them to a helper in the same file:
Tables for State Machines¶
A state machine test exercises transitions. The natural table shape:
cases := []struct {
name string
from State
event Event
toWant State
errWant error
}{
{"new_to_ready", StateNew, EventValidate, StateReady, nil},
{"new_to_invalid", StateNew, EventReject, StateNew, ErrInvalid},
{"ready_to_active", StateReady, EventStart, StateActive, nil},
{"active_to_done", StateActive, EventFinish, StateDone, nil},
{"done_no_restart", StateDone, EventStart, StateDone, ErrTerminal},
}
for _, tc := range cases {
t.Run(tc.name, func(t *testing.T) {
m := &Machine{state: tc.from}
err := m.Handle(tc.event)
if !errors.Is(err, tc.errWant) { t.Errorf("err = %v, want %v", err, tc.errWant) }
if m.state != tc.toWant { t.Errorf("state = %v, want %v", m.state, tc.toWant) }
})
}
The table is one-step-at-a-time. For multi-step sequences, see the next section.
Tables for Sequence Tests¶
When a single test case involves a sequence of operations, embed the sequence as a slice inside the row:
type step struct {
op string
arg int
wantOK bool
}
cases := []struct {
name string
steps []step
wantFinal int
}{
{
name: "simple_increment",
steps: []step{
{"add", 1, true},
{"add", 1, true},
{"add", 1, true},
},
wantFinal: 3,
},
{
name: "rollback_on_error",
steps: []step{
{"add", 5, true},
{"add", -1000, false}, // rejected
{"add", 1, true},
},
wantFinal: 6, // not 5 + -1000 + 1
},
}
for _, tc := range cases {
t.Run(tc.name, func(t *testing.T) {
c := NewCounter()
for i, s := range tc.steps {
got := c.Do(s.op, s.arg)
if got != s.wantOK {
t.Errorf("step %d (%s %d): got OK=%v, want %v", i, s.op, s.arg, got, s.wantOK)
}
}
if c.Value() != tc.wantFinal {
t.Errorf("final = %d, want %d", c.Value(), tc.wantFinal)
}
})
}
This pattern handles arbitrarily long sequences while keeping each case self-contained.
Tables and t.Parallel at Scale¶
For a 500-row table where each row takes 50ms, sequential = 25s. Parallel with 8 cores ≈ 3s.
Considerations:
- Resource contention. By default at most GOMAXPROCS parallel subtests run at once; if you raise -parallel, hundreds of rows opening DB connections at the same time will exhaust a pool. Set -parallel from resource limits, not just CPU count.
- Test isolation. Each parallel row must not write to shared state. Cloning fixtures per row beats sharing.
- Output interleaving. go test -v output from parallel subtests can interleave. Add -v only when debugging; for CI use -json and post-process.
- t.Setenv rules out parallel. If even one row needs t.Setenv, that row cannot be parallel, and you have to decide whether to remove parallel from the whole table or split that row out.
Hybrid parallel/sequential¶
for _, tc := range cases {
tc := tc
t.Run(tc.name, func(t *testing.T) {
if !tc.needsEnv {
t.Parallel()
}
if tc.needsEnv {
t.Setenv("FOO", tc.env)
}
...
})
}
The env rows run sequentially, inline, as the loop reaches them; the non-env rows pause at t.Parallel and resume together once the enclosing test function has returned. Works, but adds complexity. Often cleaner to split into two tests.
Property-Based vs Table-Driven¶
Table-driven tests cover examples — specific inputs you wrote down. Property-based tests cover invariants — for all inputs of some shape, this property holds.
// Table
func TestReverse(t *testing.T) {
cases := []struct {
in string
want string
}{
{"", ""},
{"a", "a"},
{"abc", "cba"},
{"Hello, World!", "!dlroW ,olleH"},
}
for _, tc := range cases {
t.Run(tc.in, func(t *testing.T) {
if got := Reverse(tc.in); got != tc.want {
t.Errorf("Reverse(%q) = %q, want %q", tc.in, got, tc.want)
}
})
}
}
// Property (with testing/quick or testing.F)
func FuzzReverseInvolution(f *testing.F) {
f.Add("hello")
f.Add("")
f.Fuzz(func(t *testing.T, s string) {
if Reverse(Reverse(s)) != s {
t.Errorf("involution broken: %q", s)
}
})
}
Use both. Table-driven catches the specific edge cases you've thought of (empty string, unicode, palindrome). Property catches the generic ones (involution, length preservation) over thousands of generated inputs.
Composing Tables Across Test Files¶
Sometimes two packages should share test cases — a parser and a formatter that should roundtrip identical inputs.
internal/
testcases/
canonical.go // exports []Canonical
parser/
parser_test.go
formatter/
formatter_test.go
internal/testcases/canonical.go:
package testcases
type Canonical struct {
Name string
Pretty string
Compact string
}
var Cases = []Canonical{
{"simple", "{ a }", "{a}"},
{"nested", "{ a: { b } }", "{a:{b}}"},
}
Both consumers:
import "example.com/proj/internal/testcases"
for _, tc := range testcases.Cases {
t.Run(tc.Name, func(t *testing.T) {
got := parse(tc.Pretty)
if got.String() != tc.Compact { ... }
})
}
This guarantees parser and formatter cannot drift out of sync — adding a case forces both to update if needed.
Table-Driven Benchmarks¶
b.Run mirrors t.Run:
func BenchmarkSplit(b *testing.B) {
cases := []struct {
name string
in string
}{
{"empty", ""},
{"short", "a,b,c"},
{"medium", strings.Repeat("a,", 100)},
{"long", strings.Repeat("a,", 10000)},
}
for _, tc := range cases {
b.Run(tc.name, func(b *testing.B) {
b.ReportAllocs()
for i := 0; i < b.N; i++ {
_ = strings.Split(tc.in, ",")
}
})
}
}
Each sub-benchmark gets its own b.N calibration. Run with go test -bench=BenchmarkSplit.
Illustrative output (exact figures depend on the machine and Go version):
BenchmarkSplit/empty-8 100000000 12 ns/op 0 B/op 0 allocs/op
BenchmarkSplit/short-8 20000000 80 ns/op 64 B/op 1 allocs/op
BenchmarkSplit/medium-8 1000000 1500 ns/op 1600 B/op 1 allocs/op
BenchmarkSplit/long-8 10000 150000 ns/op 160000 B/op 1 allocs/op
You can see scaling immediately. Compare optimizations across all input sizes with one run.
Table-Driven Fuzz Seeds¶
Fuzz tests start with a seed corpus. The seeds are effectively a table:
func FuzzParse(f *testing.F) {
seeds := []string{
"",
"a",
"abc",
"{}",
"{\"key\": \"value\"}",
"[1, 2, 3]",
"{ malformed",
strings.Repeat("a", 1000),
}
for _, s := range seeds {
f.Add(s)
}
f.Fuzz(func(t *testing.T, s string) {
_, _ = Parse(s) // should never panic
})
}
When you run go test -fuzz=FuzzParse, the fuzzer starts from the seeds and mutates. Any input that triggers a failure is saved under testdata/fuzz/FuzzParse/ and becomes a new seed for future runs. The seeds + corpus directory together form a growing table.
BDD-style Frameworks — When and Why Not¶
Some teams adopt goblin, ginkgo, or write a homegrown DSL on top of testing. The case for them:
- Hierarchical describe/it blocks express setup nesting.
- Fluent assertions (expect(x).to.equal(y)) read like English.
- Familiar from RSpec/Jest backgrounds.
The case against:
- Not idiomatic. Most Go test code, including the standard library's, builds directly on the testing package, and much of it is table-driven. BDD-style libraries fight the language.
- DSLs hide control flow. A BeforeEach runs implicitly; new contributors don't see when it fires.
- Worse stack traces. Failures point into the DSL's matcher functions, not your test row.
- Tooling drift. go test -run doesn't compose cleanly with custom DSLs. IDEs and CI runners assume the stdlib testing model.
- Adoption tax. Every new contributor must learn the DSL before reading tests.
The senior view: stay with table-driven + t.Run. The single benefit of BDD (nested setup) is solved cleanly with nested t.Run. The "fluent assertion" benefit is mostly aesthetic and is largely covered by go-cmp diffs.
This is not zealotry — if your team has 200 existing Ginkgo tests, don't migrate just for purity. But for new code, default to table-driven.
Worked Example — Compiler Lexer Tests¶
Compiler frontends have rich test surfaces. Lexers turn source text into tokens. A canonical table:
type token struct {
kind TokenKind
value string
line int
}
func TestLexer(t *testing.T) {
cases := []struct {
name string
source string
want []token
}{
{
name: "empty",
source: "",
want: nil,
},
{
name: "single_int",
source: "42",
want: []token{{IntKind, "42", 1}},
},
{
name: "binary_op",
source: "x + 1",
want: []token{
{IdentKind, "x", 1},
{PlusKind, "+", 1},
{IntKind, "1", 1},
},
},
{
name: "multiline",
source: "x\ny",
want: []token{
{IdentKind, "x", 1},
{NewlineKind, "\n", 1},
{IdentKind, "y", 2},
},
},
{
name: "string_literal_with_escape",
source: `"hi\n"`,
want: []token{{StringKind, "hi\n", 1}},
},
}
for _, tc := range cases {
t.Run(tc.name, func(t *testing.T) {
got := Lex(tc.source)
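// token has unexported fields, so cmp must be told it may compare them;
// without this option cmp.Diff panics. Note that cmp treats nil and empty slices as different.
if diff := cmp.Diff(tc.want, got, cmp.AllowUnexported(token{})); diff != "" {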
t.Errorf("Lex(%q) mismatch (-want +got):\n%s", tc.source, diff)
}
})
}
}
Things to notice:
- want is a slice. The row doesn't fight the data shape.
- We use go-cmp for the diff; eyeballing token slices is painful.
- The source field is short for each case. If sources grew to multi-line YAML or 100-line programs, move them to testdata/lexer/<name>.input and read them in at test time.
This is how the Go stdlib's own go/parser tests work.
Worked Example — Distributed System State Transitions¶
For a Raft-style log replication library:
type event struct {
kind string
from int
term int
}
type expectation struct {
leader int // -1 means no leader
term int
committed []string
}
cases := []struct {
name string
nodes int
events []event
final expectation
}{
{
name: "leader_elected",
nodes: 3,
events: []event{
{"timeout", 0, 0},
{"vote", 1, 1},
{"vote", 2, 1},
},
final: expectation{leader: 0, term: 1},
},
{
name: "split_vote_resolved",
nodes: 3,
events: []event{
{"timeout", 0, 0},
{"timeout", 1, 0},
{"vote", 2, 1}, // 2 votes for 1 (or 0?)
// ... rest of the sequence
},
final: expectation{leader: 1, term: 2},
},
}
for _, tc := range cases {
t.Run(tc.name, func(t *testing.T) {
cluster := newCluster(tc.nodes)
defer cluster.Close()
for _, e := range tc.events {
cluster.Step(e)
}
if got := cluster.Leader(); got != tc.final.leader {
t.Errorf("leader = %d, want %d", got, tc.final.leader)
}
if got := cluster.Term(); got != tc.final.term {
t.Errorf("term = %d, want %d", got, tc.final.term)
}
})
}
The event sequence inside each row is what gives the table-driven pattern its power for stateful systems: each case is self-describing, the cluster is built fresh per row, and the final expectation is one small struct checked field by field (the loop above checks leader and term; committed is left unchecked).
Anti-Patterns at This Level¶
1. The God-Row¶
A single row with 25 fields, half of which are nil/zero per case. The struct is shaped for the union of all rows' needs. Split into multiple tests.
2. The Hidden Pre-condition¶
for _, tc := range cases {
setupGlobalState(tc.config) // mutates a package-level var
t.Run(tc.name, func(t *testing.T) {
result := DoThing()
...
})
}
The subtest body looks pure but actually depends on the loop body's side effects. Move setup inside t.Run or accept it as a documented sequential test.
3. The Magic want Function¶
cases := []struct{
name string
in int
wantFn func(int) int
}{
{"double", 5, func(n int) int { return n * 2 }},
{"square", 5, func(n int) int { return n * n }},
}
// ...
if got := f(tc.in); got != tc.wantFn(tc.in) { ... }
The "want" is now a function the reader has to mentally evaluate. Hardcode the result.
4. The 1500-Line Row Literal¶
When a single row has a 200-line literal want (a multi-page JSON structure), move it to a golden file.
5. Renaming Cases on Cleanup¶
If you rename tc.name from "empty_string" to "empty" to keep things short, you've broken anyone who had -run TestX/empty_string in their notes. Stable names are a contract. Rename only with intent.
Decision Checklist¶
Before adding a table-driven test, ask:
- Do all cases share the same assertion shape? If no, separate tests.
- Will future contributors add cases? If yes, table makes that one-line.
- Does any case need fundamentally different setup? If yes, that case doesn't belong.
- Will the table grow past 100 rows? If yes, plan for data files.
- Is each case independent of the others? If no, you have a sequence test (different pattern).
- Can the cases run in parallel? If yes, plan for t.Parallel from the start.
- Do I need golden files? Decide before writing the first case.
What to Read Next¶
- Professional: at-scale governance, data formats, CI.
- Optimize: measuring per-row overhead and reducing it.
- Specification: exact semantics of t.Run, -run, and the Go 1.22 loop-variable scope change.
Self-Check¶
- When should you nest t.Run calls vs flatten with name = fmt.Sprintf("%s/%s", a, b)?
- What makes a row struct "right-shaped"? Give three properties.
- What's the difference between a table test and a property-based test? When do you use each?
- How do you share test cases across two packages without copy-pasting?
- What is a god-row, and how do you refactor away from one?
Deep Dive — Tables for Concurrency Tests¶
Concurrency tests are notoriously hard to make repeatable. Tables help by encoding the schedule rather than relying on timing.
Pattern: the row carries a sequence of events that synchronize via channels:
type event struct {
actor int // which goroutine performs the action
action string // "send", "recv", "close", "wait"
value int // optional payload
}
cases := []struct {
name string
events []event
want map[int][]int // per-goroutine observed values
}{
{
name: "single_send_recv",
events: []event{
{actor: 0, action: "send", value: 42},
{actor: 1, action: "recv"},
},
want: map[int][]int{1: {42}},
},
{
name: "two_senders_one_receiver",
events: []event{
{actor: 0, action: "send", value: 1},
{actor: 1, action: "send", value: 2},
{actor: 2, action: "recv"},
{actor: 2, action: "recv"},
},
want: map[int][]int{2: {1, 2}}, // assuming ordered channel
},
}
The test runner schedules events according to the row's plan, then verifies observations. This is more deterministic than naive go funcA(); go funcB(); time.Sleep(...).
Deep Dive — Asserting on Order-Insensitive Output¶
Some functions produce sets, not sequences. Comparing them as slices fails on permutations:
got := f(input) // []string{"b", "a", "c"}
want := []string{"a", "b", "c"}
if !reflect.DeepEqual(got, want) { ... } // fails: same set, different order
Three strategies:
- Sort both sides before comparison:
gotSorted := slices.Clone(got)
wantSorted := slices.Clone(want) // don't reorder the table's own slice
sort.Strings(gotSorted)
sort.Strings(wantSorted)
if !slices.Equal(gotSorted, wantSorted) { ... }
- Use cmpopts.SortSlices:
opt := cmpopts.SortSlices(func(a, b string) bool { return a < b })
if diff := cmp.Diff(want, got, opt); diff != "" { ... }
- Compare as multisets: count occurrences, compare counts.
Pick (2) for production tests — it preserves the original slice and produces a clean diff.
Deep Dive — Asserting on Error Wrappings¶
Modern Go errors are wrapped chains: fmt.Errorf("upstream: %w", innerErr). Tests need to assert on the kind of error, not on the wrapping path.
cases := []struct {
name string
in Input
wantErr error
}{
{"valid", Input{}, nil},
{"db_unreachable", Input{...}, ErrDB},
{"validation", Input{Bad: true}, ErrValidation},
}
for _, tc := range cases {
t.Run(tc.name, func(t *testing.T) {
err := f(tc.in)
if !errors.Is(err, tc.wantErr) {
t.Errorf("err = %v, want wrapped %v", err, tc.wantErr)
}
})
}
If the test surface includes structured error fields (*ValidationError), use errors.As:
var ve *ValidationError
if !errors.As(err, &ve) { t.Fatalf("not a ValidationError: %v", err) }
if ve.Field != tc.wantField { ... }
A senior should never assert on err.Error() substrings except as a last resort — message text is documentation, not a contract.
Deep Dive — Tables That Test Equivalence Classes¶
Boundary testing and equivalence-class partitioning are formal QA techniques that map cleanly to tables.
For a function Grade(score int) string that returns "F", "D", "C", "B", or "A" for scores from 0 to 100 and "invalid" for anything outside that range:
type gradeCase struct {
name string
score int
want string
}
// Equivalence classes: F (0-59), D (60-69), C (70-79), B (80-89), A (90-100)
// Boundaries: -1, 0, 59, 60, 69, 70, 79, 80, 89, 90, 100, 101
cases := []gradeCase{
{"below_zero", -1, "invalid"},
{"zero", 0, "F"},
{"top_of_F", 59, "F"},
{"bottom_of_D", 60, "D"},
{"top_of_D", 69, "D"},
{"bottom_of_C", 70, "C"},
{"top_of_C", 79, "C"},
{"bottom_of_B", 80, "B"},
{"top_of_B", 89, "B"},
{"bottom_of_A", 90, "A"},
{"max", 100, "A"},
{"above_max", 101, "invalid"},
{"middle_F", 25, "F"},
{"middle_A", 95, "A"},
}
The table includes both boundaries (each transition point) and representative cases from each class. This is a complete behavioral spec in 14 rows.
Deep Dive — Combining Tables with Snapshots¶
For complex outputs (HTML, large JSON, generated SQL), each row produces output that's hard to embed inline. Combine table-driven tests with snapshot files:
for _, tc := range cases {
t.Run(tc.name, func(t *testing.T) {
got := Render(tc.in)
golden := filepath.Join("testdata", "TestRender", tc.name+".golden")
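// update is assumed to be a package-level flag, e.g. flag.Bool("update", false, "rewrite golden files").
if *update {
if err := os.WriteFile(golden, got, 0o644); err != nil { t.Fatal(err) }
return
}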
want, err := os.ReadFile(golden)
if err != nil { t.Fatal(err) }
if !bytes.Equal(got, want) {
t.Errorf("Render(%s):\n--- want ---\n%s\n--- got ---\n%s", tc.name, want, got)
}
})
}
The table now holds only inputs; outputs live on disk where humans can read them comfortably. Intentional output changes show up as ordinary diffs in git review.
Deep Dive — Avoid Tables When Tests Are Truly Different¶
A senior should resist the urge to table everything. If you find yourself writing:
cases := []struct {
name string
kind string // "parse" or "validate" or "render"
...
}{
{"parse_simple", "parse", ...},
{"validate_empty", "validate", ...},
{"render_html", "render", ...},
}
for _, tc := range cases {
t.Run(tc.name, func(t *testing.T) {
switch tc.kind {
case "parse": /* parse-specific logic */
case "validate": /* validate-specific logic */
case "render": /* render-specific logic */
}
})
}
You don't have a table — you have three tests squished into one. The switch is the smell. Split:
func TestParse(t *testing.T) { /* table of parse cases */ }
func TestValidate(t *testing.T) { /* table of validate cases */ }
func TestRender(t *testing.T) { /* table of render cases */ }
Each focused, each scannable, each can have its own setup.
A Sketch of Test-Suite Architecture for a Compiler¶
Suppose you're building a Go-like language. Your test architecture might look like:
internal/testcases/ # shared canonical inputs
program_samples.go
expression_samples.go
internal/lex/
lex_test.go # table of (source → tokens)
fuzz_test.go # fuzz seeds drawn from samples
internal/parse/
parse_test.go # table of (source → AST)
error_recovery_test.go # table of (broken source → recovered AST + errors)
internal/typecheck/
typecheck_test.go # table of (AST → type info)
matrix_test.go # all (declared type) × (used as) combos
internal/codegen/
codegen_test.go # table with golden file outputs
roundtrip_test.go # parse → codegen → parse, asserts AST equality
cmd/compile/
end_to_end_test.go # whole-program tests, smaller table, slower
Each test file is a focused table. Shared inputs live in internal/testcases. End-to-end tests are kept small because they're slow; unit tables are kept large because they're fast.
The Go compiler's own source tree (cmd/compile/internal/...) is split by phase in a broadly similar way, though the details differ.
When Tables Hurt Readability¶
Counterintuitively, a 30-row table where every row sets the same fields except one can be noisier than 30 individual tests for that one field's variation.
// Hard to scan — what's varying?
cases := []struct {
name string
inputUser *User
inputOrders []*Order
inputItems []*Item
config Config
want Result
}{
{"a", baseUser, baseOrders, baseItems, Config{Mode: "fast"}, baseResult},
{"b", baseUser, baseOrders, baseItems, Config{Mode: "safe"}, altResult},
{"c", baseUser, baseOrders, baseItems, Config{Mode: "audit"}, auditResult},
// ... 27 more, each varying only Mode
}
This is a table of just one variable (Mode) padded with redundant fixtures. Cleaner:
cases := []struct {
name string
mode string
want Result
}{
{"fast", "fast", baseResult},
{"safe", "safe", altResult},
{"audit", "audit", auditResult},
// ...
}
// Build inputs once outside the loop
input := buildInputs()
for _, tc := range cases {
t.Run(tc.name, func(t *testing.T) {
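in := input // per-row copy (assumes buildInputs returns a struct value) so rows don't mutate shared state
in.Config = Config{Mode: tc.mode}
got := Process(in)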
if got != tc.want { ... }
})
}
The table now shows only what varies. Reviewers see "we test three modes" at a glance.
Code Review Checklist for Table-Driven Tests¶
When reviewing a PR that adds a table:
- Are case names unique and descriptive?
- Is there at least one negative case (wantErr)?
- Are boundary values covered?
- Does the failure message include the inputs?
- If parallel: are rows independent?
- If time.Now or random data is used: is it deterministic?
- If golden files: is -update opt-in and default-false?
- Are helpers marked t.Helper?
- Is there shared mutable state across rows?
- Does the struct have any field used by only a tiny minority of rows?
These ten checks catch most of the issues you'll see in real PRs.
Architecting Tables for Long-Lived Test Suites¶
A test suite that lives ten years accumulates patterns. Some that survive:
Pattern — One table, one concern¶
Every Test* function has exactly one table. The table tests exactly one behavior. No "general test of feature X" tables with 50 unrelated rows.
// Good — focused
func TestParseInt_Valid(t *testing.T) { /* table of valid inputs */ }
func TestParseInt_Invalid(t *testing.T) { /* table of invalid inputs */ }
func TestParseInt_Overflow(t *testing.T) { /* table of overflow cases */ }
// Bad — diffuse
func TestParseInt(t *testing.T) { /* 80 rows mixing valid/invalid/overflow */ }
The split makes failures easier to triage (go test -run TestParseInt_Overflow) and lets each table evolve independently.
Pattern — Test data per package, helpers per repo¶
Each package's testdata/ directory holds only that package's test data. Cross-package helpers live in internal/testutil or internal/testcases. This avoids "where does my fixture live" debates.
Pattern — Generated test code is annotated¶
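Any _test.go file that contains generated rows starts with a clear marker. Go's convention is a comment line of the form // Code generated by <tool>; DO NOT EDIT. placed before the package clause.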
Reviewers know not to hand-edit. The generation command is also in //go:generate so anyone can regenerate.
Pattern — Public assertion helpers are typed¶
Instead of taking any, helpers accept the specific type:
// Good
func assertOrderEqual(t *testing.T, got, want Order) { ... }
// Worse
func assertEqual(t *testing.T, got, want any) { ... }
The typed version catches misuse at compile time and produces better failure messages.
A Senior's Heuristics for Table Length¶
- < 10 rows: anything goes. Table is fine; separate functions are fine.
- 10–50 rows: table almost always wins.
- 50–200 rows: table, but consider grouping with nested t.Run or splitting into 2–3 functions by category.
- 200–1000 rows: data file (JSON/YAML/CSV), schema validation, name-uniqueness test.
- 1000+ rows: generation script under //go:generate, sharded across multiple test functions, may need CI parallelism per shard.
These thresholds are rough but useful. Don't let "we have one table" become "we have one 5000-line table".
Tables and Mutation Testing¶
Mutation testing tools such as go-mutesting introduce small changes to your source (flipping < to <=, swapping + for -) and rerun your tests. A mutant that makes at least one test fail is killed; one that passes unnoticed survives. A suite that kills every mutant has no known gap of that kind.
Tables are naturally good at mutation testing because:
- Lots of edge cases catch off-by-one mutations.
- Boundary values catch < vs <= flips.
- Negative cases catch sign-flips.
If you run mutation testing and see survivors, the surviving mutations point at gaps in your table. Add a row to kill each survivor.
This is a senior-level practice. Not many teams do it, but for those that do, the list of surviving mutants is a concrete list of missing rows.
Tables and Code Review Discipline¶
When a contributor adds a new feature, the PR should include table rows for:
- The happy path.
- Each new edge case the feature introduces.
- Each error condition the feature can produce.
A senior reviewer asks: "what's the negative case for this change?" If the answer is "I don't know" or "no test", request rows for the failure modes.
This discipline is what keeps failure modes covered as a codebase grows. The table-driven idiom makes it cheap to follow: adding a row is often a single line, so there is little reason to skip it.
When Senior-Level Judgment Means Breaking the Rules¶
You've now learned the canonical patterns. Senior judgment is knowing when to violate them.
- A 5-line test for a one-off bug doesn't need a table.
- A test for a behavior that only fires once in the codebase doesn't need three negative cases.
- A 200-line setup function might be cleaner than threading 12 fields through every row.
Rules exist to compress decision-making. When a rule says "no" but your context says "yes", trust your context — and explain in a code comment why this test is shaped the way it is.
// This test is intentionally a single function, not table-driven:
// the setup involves spinning up a real Postgres container, and
// expressing per-row setup as a table field would obscure the
// actual test logic. See ADR-2024-15.
func TestPostgresDriver(t *testing.T) {
...
}
A comment like that turns a "violation" into documented intent.
A Senior Worked Example — Schema Validator With Migration¶
Suppose you're testing a schema validator that supports versioning. The validator takes a JSON document and a schema version, and validates the document against that version's rules.
import (
	"errors"
	"testing"
)
type ValidatorCase struct {
	name      string
	schemaVer int
	input     string
	wantErr   bool
	wantField string // for typed errors; checked only when wantErr is true
	wantCode  string
}
func TestValidator(t *testing.T) {
	cases := []ValidatorCase{
		// v1 schema cases
		{"v1/valid_simple", 1, `{"name":"Ada"}`, false, "", ""},
		{"v1/valid_full", 1, `{"name":"Ada","age":30}`, false, "", ""},
		{"v1/missing_name", 1, `{"age":30}`, true, "name", "required"},
		{"v1/unknown_field", 1, `{"name":"Ada","xyz":1}`, true, "xyz", "unknown"},
		// v2 schema cases (introduced "email" as required; "age" stays optional)
		{"v2/valid_full", 2, `{"name":"Ada","email":"a@b.co","age":30}`, false, "", ""},
		{"v2/valid_no_age", 2, `{"name":"Ada","email":"a@b.co"}`, false, "", ""},
		{"v2/missing_email", 2, `{"name":"Ada"}`, true, "email", "required"},
		// v3 schema cases (introduced "roles" array)
		{"v3/valid_one_role", 3, `{"name":"Ada","email":"a@b.co","roles":["admin"]}`, false, "", ""},
		{"v3/empty_roles", 3, `{"name":"Ada","email":"a@b.co","roles":[]}`, true, "roles", "min_length"},
		// Cross-version cases
		{"unknown_version", 99, `{"name":"Ada"}`, true, "_version", "unknown"},
		{"zero_version", 0, `{"name":"Ada"}`, true, "_version", "unknown"},
	}
	for _, tc := range cases {
		t.Run(tc.name, func(t *testing.T) {
			err := Validate(tc.schemaVer, []byte(tc.input))
			if (err != nil) != tc.wantErr {
				t.Fatalf("err = %v, wantErr = %v", err, tc.wantErr)
			}
			if !tc.wantErr {
				return
			}
			// From here on an error is expected: assert on its typed fields.
			var ve *ValidationError
			if !errors.As(err, &ve) {
				t.Fatalf("want *ValidationError, got %T (%v)", err, err)
			}
			if ve.Field != tc.wantField {
				t.Errorf("field = %q, want %q", ve.Field, tc.wantField)
			}
			if ve.Code != tc.wantCode {
				t.Errorf("code = %q, want %q", ve.Code, tc.wantCode)
			}
		})
	}
}
Observations:
- Prefixed names (v1/, v2/, v3/, plus a Cross-version block) make subtest filtering trivial: -run TestValidator/v2 runs just v2 cases.
- Typed errors via errors.As let us assert on Field and Code independently. The error message text is not tested: that's documentation, not contract.
- Cross-version cases test the validator's behavior when given a version it doesn't know.
When v4 ships, you add a new prefix group. The structure stays clean.
A Senior Worked Example — Distributed Lock Tests¶
This example tests a distributed lock that can be acquired, renewed, and released, and that expires when its holder does neither.
import (
	"testing"
	"time"
)
type lockEvent struct {
	actor    int           // which client
	action   string        // "acquire", "release", or "renew"; expiry is driven by the clock
	timeline time.Duration // when the event fires (relative to test start)
	wantOK   bool
}
type lockCase struct {
	name   string
	events []lockEvent
}
func TestLock(t *testing.T) {
	cases := []lockCase{
		{
			name: "single_acquire_release",
			events: []lockEvent{
				{actor: 0, action: "acquire", timeline: 0, wantOK: true},
				{actor: 0, action: "release", timeline: 100 * time.Millisecond, wantOK: true},
			},
		},
		{
			name: "contention_one_winner",
			events: []lockEvent{
				{actor: 0, action: "acquire", timeline: 0, wantOK: true},
				{actor: 1, action: "acquire", timeline: 10 * time.Millisecond, wantOK: false},
				{actor: 0, action: "release", timeline: 100 * time.Millisecond, wantOK: true},
				{actor: 1, action: "acquire", timeline: 110 * time.Millisecond, wantOK: true},
			},
		},
		{
			name: "expiration_releases_lock",
			events: []lockEvent{
				{actor: 0, action: "acquire", timeline: 0, wantOK: true},
				// no explicit release; lock expires
				{actor: 1, action: "acquire", timeline: 2 * time.Second, wantOK: true},
			},
		},
	}
	for _, tc := range cases {
		t.Run(tc.name, func(t *testing.T) {
			clock := newFakeClock()
			lock := NewLock(clock, time.Second)
			clients := make(map[int]Client)
			var elapsed time.Duration // fake time consumed so far
			for _, ev := range tc.events {
				// timeline is absolute, so advance only by the gap since the
				// previous event (assumes Advance takes a relative duration).
				clock.Advance(ev.timeline - elapsed)
				elapsed = ev.timeline
				c, ok := clients[ev.actor]
				if !ok {
					c = lock.Client()
					clients[ev.actor] = c
				}
				var got bool
				switch ev.action {
				case "acquire":
					got = c.Acquire()
				case "release":
					got = c.Release()
				case "renew":
					got = c.Renew()
				default:
					t.Fatalf("unknown action %q", ev.action)
				}
				if got != ev.wantOK {
					t.Errorf("event %s by actor %d at %v: ok = %v, want %v",
						ev.action, ev.actor, ev.timeline, got, ev.wantOK)
				}
			}
		})
	}
}
This is the "encode the schedule in the row" pattern from earlier, applied to a real distributed-systems test. The clock is fake (controllable), so the test is deterministic. The events describe what should happen and when, not the implementation.
Adding a new case is straightforward: name it, list its events, done.