Parallel Tests — Senior¶
This page is for engineers who have to make architecture-level decisions about test concurrency: designing fixtures that scale across hundreds of parallel tests, deciding when to share state and when to clone it, migrating a legacy serial suite without breaking it, debugging hard-to-reproduce flakes, and wiring parallel-test discipline into a team's workflow.
1. The mental model: tests are concurrent programs¶
A serial test suite is a script. A parallel test suite is a small concurrent program — with goroutines (one per t.Parallel test), synchronisation (the scheduler's parallel queue), shared resources (DB pools, ports), and a shared address space. Every design decision in the suite is a concurrency-design decision in miniature.
Senior engineers learn to read a _test.go file the same way they read a server's main.go: looking for shared mutable state, missing synchronisation, leaked goroutines, and resource limits. The skills transfer in both directions.
2. Fixture taxonomy¶
There are five classes of fixture in a real test suite. Each interacts with t.Parallel differently.
| Class | Example | Sharing strategy | t.Parallel |
|---|---|---|---|
| Immutable | Compiled regex, parsed schema | Shared via sync.Once | Yes |
| Per-test ephemeral | t.TempDir, fresh struct | Built per test | Yes |
| Per-test heavy | Postgres schema | Pooled, recycled | Yes |
| Process-global | Logger, metrics registry | Singleton; tests treat as constant | Yes |
| Process-mutable | os.Setenv, os.Chdir | Forbid t.Parallel | No |
The art of designing a fast parallel suite is moving as much as possible up the table — turning per-test fixtures into immutable shared ones, turning pooled ones into immutable ones, turning forbidden ones into per-test ephemeral ones (e.g., replace env-var-based config with explicit struct fields).
3. Immutable shared fixtures¶
Some fixtures are expensive to build but cheap to share once built. Examples: parsing a 10 MB JSON schema, compiling a complex regex, loading a static lookup table, building a deterministic random number stream.
The idiomatic pattern is sync.Once:
var (
schemaOnce sync.Once
schema *Schema
schemaErr error
)
func loadSchema(t *testing.T) *Schema {
t.Helper()
schemaOnce.Do(func() {
b, err := os.ReadFile("testdata/schema.json")
if err != nil {
schemaErr = err
return
}
schema, schemaErr = parseSchema(b)
})
if schemaErr != nil {
t.Fatalf("loadSchema: %v", schemaErr)
}
return schema
}
Properties:
- The first parallel test to call loadSchema triggers the parse; the others wait.
- After the parse, every test reads the same *Schema with negligible overhead.
- The fixture is implicitly immutable; mutating it in one test would race with its siblings.
Document the immutability invariant in a comment. If the type has mutation methods (schema.Add(...)), wrap it in a read-only accessor before sharing.
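A minimal sketch of the read-only accessor idea, assuming a stand-in Schema type with an Add mutation method. SchemaReader, Type, and sharedSchemaReader are illustrative names, not part of any real API:

```go
package main

import "sync"

// Schema is a stand-in for the parsed fixture; Add is its mutation method.
type Schema struct {
	fields map[string]string
}

func (s *Schema) Add(name, typ string) { s.fields[name] = typ }

func (s *Schema) Type(name string) (string, bool) {
	typ, ok := s.fields[name]
	return typ, ok
}

// SchemaReader exposes only the read path, so tests that receive it
// cannot call Add on the shared instance without a type assertion.
type SchemaReader interface {
	Type(name string) (string, bool)
}

var (
	sharedOnce   sync.Once
	sharedSchema *Schema
)

// sharedSchemaReader builds the fixture once and hands every caller the
// same instance behind the read-only interface.
func sharedSchemaReader() SchemaReader {
	sharedOnce.Do(func() {
		sharedSchema = &Schema{fields: map[string]string{}}
		sharedSchema.Add("id", "int") // mutation happens only during init
	})
	return sharedSchema
}
```

The interface does not make mutation impossible (a caller can still type-assert back to *Schema), but it makes accidental mutation visible in code review.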
4. Per-test ephemeral fixtures¶
These are the workhorse of parallel suites. Each call returns a fresh value with t.Cleanup registered:
func newCache(t *testing.T) *cache.Cache {
t.Helper()
c := cache.New()
t.Cleanup(c.Close)
return c
}
func TestCache(t *testing.T) {
t.Parallel()
c := newCache(t)
c.Set("k", "v")
if got, _ := c.Get("k"); got != "v" {
t.Errorf("got %q", got)
}
}
Two design rules:
- The helper takes *testing.T so it can register cleanup and call t.Fatal on setup errors.
- The helper returns a value, never stores it in a package-level var.
5. Pooled heavy fixtures¶
Postgres schemas, Redis databases, Kafka topics. These are expensive enough that per-test creation is too slow but stateful enough that they can't be shared without isolation.
Two approaches:
A. Namespacing. Each test owns a unique namespace (schema, prefix, topic). The database itself is shared but the namespace is per-test.
func newSchema(t *testing.T, db *sql.DB) string {
t.Helper()
name := "t_" + sanitize(t.Name()) + "_" + randHex(4)
mustExec(t, db, "CREATE SCHEMA "+name)
t.Cleanup(func() {
mustExec(t, db, "DROP SCHEMA "+name+" CASCADE")
})
return name
}
func TestUserStore(t *testing.T) {
t.Parallel()
schema := newSchema(t, sharedDB)
runMigrations(t, sharedDB, schema)
// use sharedDB scoped to schema
}
B. Pooling. A buffered channel hands out connections; tests return them on cleanup. Bounded budget, mandatory cleanup.
var dbPool chan *sql.DB
func acquireDB(t *testing.T) *sql.DB {
t.Helper()
db := <-dbPool
t.Cleanup(func() {
truncateAll(t, db)
dbPool <- db
})
return db
}
Pooling is faster but requires perfect cleanup discipline (no orphaned data between tests). Namespacing is slower but bulletproof: each test sees an empty schema.
In practice large suites combine the two: pool the connections, namespace the schemas.
6. Designing for parallel-safe assertions¶
A parallel test that asserts on shared state needs careful design. Example: a worker pool that processes jobs.
type Pool struct {
in chan Job
out chan Result
}
func (p *Pool) Submit(j Job) { p.in <- j }
func (p *Pool) Results() <-chan Result { return p.out }
Naive parallel test:
func TestPool(t *testing.T) {
p := newPool(4)
for i := 0; i < 10; i++ {
i := i
t.Run(fmt.Sprint(i), func(t *testing.T) {
t.Parallel()
p.Submit(Job{ID: i})
r := <-p.Results() // PROBLEM
if r.ID != i {
t.Errorf("got %d, want %d", r.ID, i)
}
})
}
}
The problem: p.Results() is a shared channel. Results arrive in completion order, so each subtest may receive a result that belongs to a different subtest's job. The bug is the shared channel, not parallelism as such.
Fix: collect all results in the parent, dispatch back per-job.
func TestPool(t *testing.T) {
p := newPool(4)
results := make(map[int]Result)
var mu sync.Mutex
// Single collector goroutine.
done := make(chan struct{})
go func() {
defer close(done)
for r := range p.Results() {
mu.Lock()
results[r.ID] = r
mu.Unlock()
}
}()
t.Run("group", func(t *testing.T) {
for i := 0; i < 10; i++ {
i := i
t.Run(fmt.Sprint(i), func(t *testing.T) {
t.Parallel()
p.Submit(Job{ID: i})
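// Wait until our result arrives. t.Context() is cancelled only when
// the test ends, so an explicit timer bounds the wait instead.
timeout := time.After(5 * time.Second)
for {
mu.Lock()
r, ok := results[i]
mu.Unlock()
if ok {
if r.Status != "done" {
t.Errorf("job %d: %s", i, r.Status)
}
return
}
select {
case <-timeout:
t.Fatalf("job %d: no result after 5s", i)
case <-time.After(10 * time.Millisecond):
}
}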
})
}
})
close(p.in) // shut down the pool, drain results
<-done
}
The senior takeaway: when the production code is concurrent, the test cannot assume FIFO ordering even when subtests look serial. Build a collector and dispatch.
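The collector-and-dispatch idea can also be written without polling. The sketch below is one possible variant, not the approach used above: each subtest registers a one-slot channel for its job ID, and a single collector routes every result to its waiter. The dispatcher type and its methods are illustrative names; Result mirrors the fields used earlier.

```go
package main

import "sync"

// Result stands in for the pool's result type.
type Result struct {
	ID     int
	Status string
}

// dispatcher routes each Result to the channel registered for its ID.
type dispatcher struct {
	mu      sync.Mutex
	waiters map[int]chan Result
}

func newDispatcher() *dispatcher {
	return &dispatcher{waiters: make(map[int]chan Result)}
}

// expect registers interest in a job ID. Call it before submitting the
// job so the result cannot arrive first and be dropped.
func (d *dispatcher) expect(id int) <-chan Result {
	ch := make(chan Result, 1) // buffered: the collector never blocks
	d.mu.Lock()
	d.waiters[id] = ch
	d.mu.Unlock()
	return ch
}

// run is the single collector: it drains results and hands each one to
// its waiter. It returns when results is closed.
func (d *dispatcher) run(results <-chan Result) {
	for r := range results {
		d.mu.Lock()
		ch, ok := d.waiters[r.ID]
		delete(d.waiters, r.ID)
		d.mu.Unlock()
		if ok {
			ch <- r
		}
	}
}
```

In a subtest this would read roughly: ch := d.expect(i), then p.Submit(Job{ID: i}), then a select on ch and a timeout. The wait becomes a channel receive rather than a 10 ms poll.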
7. Goroutine leak detection at scale¶
In a 200-test package, every goroutine that escapes a test compounds. Two-thousand stray goroutines later, the test process is fighting itself. Use goleak aggressively:
package mypkg_test
import (
"testing"
"go.uber.org/goleak"
)
func TestMain(m *testing.M) {
goleak.VerifyTestMain(m,
goleak.IgnoreTopFunction("github.com/some/lib.backgroundWorker"),
)
}
IgnoreTopFunction whitelists goroutines you know are intentionally long-lived (a metrics flusher, a connection-pool keepalive). Treat the whitelist as a deliberate decision, not a junk drawer.
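For per-test leak detection (catches leaks earlier, with better attribution), use goleak.VerifyNone. It inspects every goroutine in the process and cannot tell which test a goroutine belongs to, so it is incompatible with t.Parallel; reserve it for tests that run serially:
func TestThing(t *testing.T) {
// No t.Parallel here: VerifyNone would flag goroutines owned by sibling tests.
t.Cleanup(func() { goleak.VerifyNone(t) })
// test code
}
A leak detected per-test points at the exact test that introduced it, which is invaluable in CI. For parallel tests, rely on the VerifyTestMain check above.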
8. The race detector as a CI policy¶
Two contrasting team policies on -race:
Team A: "We run -race nightly. Failures are P2."
Team B: "We run -race on every PR. Failures block merge."
Team B catches races within hours of the offending commit, when the author still remembers the code. Team A often takes weeks. The cost is CI minutes; the benefit is faster bug closure and a healthier main.
When a race report lands:
- Don't dismiss it. A race that the detector caught is real. Production behaviour may differ in scheduling, but the memory-model violation exists.
- Don't paper over with mutex everywhere. Often the right fix is architectural — eliminating the shared state, or making the shared state read-only post-init.
- Don't assume the test is wrong. Sometimes the test is exercising a real race in production code, just rarely. Fix the production code.
9. Migrating a legacy serial suite¶
A common project: take a 10-year-old Go codebase where no test calls t.Parallel and turn it parallel-by-default. The wrong way is to add t.Parallel to everything in one commit. The right way is incremental:
Step 1: Baseline. Run go test -race -count=5 ./... and record every failure. Some are real races already, hidden by serial execution; fix them first.
Step 2: Triage by package. Pick a leaf package (no test dependencies). Add t.Parallel to all its tests. Run with -race -count=10. Fix what breaks.
Step 3: Move up the dependency graph. Each new package potentially shares fixtures with its dependencies. The leaf-first approach minimises blast radius.
Step 4: Tackle the suites that use env vars. These will resist parallelism. Refactor the production code: replace os.Getenv with Config.Get, and pass Config through constructors.
Step 5: Tackle the os.Chdir users. Replace with absolute paths, or with t.Chdir (Go 1.24+), which restores the directory on cleanup and cannot be used in a parallel test.
Step 6: Enable a linter that requires t.Parallel for new tests. Quarantine remaining non-parallel tests with a documented reason.
Plan 1–2 weeks for a medium repo (50K test lines). Expect a 3–10x reduction in CI wall-time.
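Step 4 is usually the largest refactor, so a small sketch may help. It assumes a Config type with a Get(key, fallback) method; that signature, the two constructors, and the Mailer component are all illustrative, since the text above only names Config.Get:

```go
package main

import "os"

// Config carries settings that the legacy code read from the environment.
type Config struct {
	values map[string]string
}

// ConfigFromEnv is called once, in main, so production behaviour is unchanged.
func ConfigFromEnv(keys ...string) Config {
	c := Config{values: make(map[string]string, len(keys))}
	for _, k := range keys {
		if v, ok := os.LookupEnv(k); ok {
			c.values[k] = v
		}
	}
	return c
}

// ConfigFromMap lets each test build a private Config with no
// process-wide side effects.
func ConfigFromMap(m map[string]string) Config {
	c := Config{values: make(map[string]string, len(m))}
	for k, v := range m {
		c.values[k] = v
	}
	return c
}

// Get returns the value for key, or fallback when it is unset.
func (c Config) Get(key, fallback string) string {
	if v, ok := c.values[key]; ok {
		return v
	}
	return fallback
}

// Mailer is a hypothetical component that used to call os.Getenv("SMTP_HOST").
type Mailer struct{ host string }

func NewMailer(c Config) *Mailer {
	return &Mailer{host: c.Get("SMTP_HOST", "localhost")}
}
```

Two parallel tests can now construct mailers with different hosts at the same time, because nothing touches the process environment after startup.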
10. Debugging a parallel-only flake¶
A test passes when run alone (go test -run=TestX), but fails 1 in 50 in the full suite. Classic parallel flake. Diagnostic protocol:
Step 1: Reproduce. go test -count=100 -parallel 16 -run=TestX ./pkg. If you can't reproduce, increase -count to 1000; if it still passes, drop -run so TestX runs alongside the siblings it may be interfering with.
Step 2: Add -race. Often surfaces the underlying race.
Step 3: Bisect parallelism. -parallel 8, -parallel 4, -parallel 2, -parallel 1. If the flake disappears at some level, it is a concurrency bug, and that level hints at how much contention it takes to trigger.
Step 4: Inspect shared state. Audit the test for package-level vars, env vars, file paths. Audit the production code for global registries.
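Step 5: Use GODEBUG=schedtrace=1000 to print a scheduler summary every 1000 ms. Idle processors alongside a test that never finishes suggest goroutines blocked on a lock or channel; a goroutine dump (SIGQUIT, or the panic that go test raises at -timeout) then shows where.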
Step 6: goleak to rule out a previous test's leaked goroutine corrupting state.
Step 7: Cleanup ordering. A t.Cleanup may run while a sibling parallel test is still active. Check who owns the resource being cleaned up.
Step 8: Time-based assertions. Any time.Sleep(N) or time.After(N) in production code is suspect. Under heavy parallel load, goroutines may be scheduled later than expected.
The vast majority of parallel-test flakes are one of: shared state, env-var leakage, time-based assumptions, or goroutine leaks. The above protocol catches all four.
11. Cleanup ordering with parallel subtests¶
t.Cleanup order across parallel subtests is more subtle than the LIFO docs suggest:
func TestX(t *testing.T) {
t.Cleanup(func() { t.Log("parent C") })
t.Run("a", func(t *testing.T) {
t.Parallel()
t.Cleanup(func() { t.Log("a C") })
})
t.Run("b", func(t *testing.T) {
t.Parallel()
t.Cleanup(func() { t.Log("b C") })
})
}
Order:
- a C and b C run when each child finishes (concurrent goroutines, non-deterministic order between them).
- parent C runs after both children finish.
If parent C depends on data only valid while both children are still alive, you have a bug — the parent cleanup runs after the children's data is gone. The fix is to scope the cleanup to the child that owns the resource.
A subtle case: a fixture acquired by the parent, used by parallel children, released by the parent's cleanup. That works, because the parent's cleanup runs after the children. But if a child stores a pointer to the fixture somewhere that outlives the parent test (a package-level var, say) and another test reads it later, that test is using a fixture that has already been released.
12. Resource budgets¶
A production-grade parallel suite declares resource budgets up front:
-parallel 32 with 16 DB connections means at most 16 tests touching the DB at once; the other 16 use only in-memory state.
Audit the budgets against external limits:
- Postgres default max_connections is 100. A test pool of 16 leaves headroom for the dev environment.
- File descriptors: ulimit -n on Linux is typically 1024. A test that opens 50 files at once cannot run at -parallel 50.
- Ephemeral ports: /proc/sys/net/ipv4/ip_local_port_range typically gives ~28K ports; not usually a constraint, but TIME_WAIT can exhaust them under churn.
When a budget is violated, the test fails with cryptic errors (pq: too many connections, too many open files, bind: address already in use). The cure is sizing the budget down, not increasing ulimit blindly.
13. Designing testable code for parallelism¶
The most senior move is to make the production code easier to test in parallel. Heuristics:
- No package-level mutable state except where necessary (e.g., init registrations). Replace with constructor-returned values.
- No init-driven side effects (registering metrics with a global default registry). Use explicit registration.
- Read config from a struct, not the env. s := NewServer(Config{Port: 8080}) beats os.Getenv("PORT").
- Time as a dependency. clock Clock injected, mockable in tests, real in production. Eliminates the need for time.Sleep in tests.
- Don't hold the working directory. Use absolute paths and a base-dir parameter.
A codebase that follows these is trivially parallel-testable. A codebase that doesn't requires teaching every test author to navigate the minefield.
14. Layered test packages¶
For repos with internal and external test packages (mypkg vs mypkg_test), both compile into the same test binary, so they share one process and one -parallel budget. The design implications differ, though:
- Internal tests (package mypkg): can touch unexported state, including unexported package vars. More care is needed, because it is easier to mutate shared state accidentally.
- External tests (package mypkg_test): see only the exported API. Less risk of accidental sharing, but you need the API to be testable (constructors that return values, not singletons).
Idiomatic guidance: prefer external test packages for everything; fall back to internal only when truly necessary (testing an unexported helper). The parallel-safety story is cleaner.
15. Subtests and resource scoping¶
When parallel subtests share a resource that the parent acquired, the parent's t.Cleanup runs after all subtests — so the resource lives long enough. But if a subtest also wants to register cleanup for the same resource, you need a lock on the resource or a counter:
func TestSharedDB(t *testing.T) {
db := acquireDB(t) // parent owns
t.Run("a", func(t *testing.T) {
t.Parallel()
_ = db
})
t.Run("b", func(t *testing.T) {
t.Parallel()
_ = db
})
}
Both subtests see the same db. The parent's acquireDB cleanup runs after both finish. No extra locking needed for read-only use. If both subtests mutate db, the production code's locking must cover it (or the test is wrong).
16. When -parallel 1 is the right call¶
Some packages genuinely cannot run in parallel. Signals:
- The package configures process-global state (signal.Notify, runtime.GOMAXPROCS adjustments).
- The package executes subprocesses with os/exec and depends on the working directory.
- The package is a test of init semantics, which must run alone.
In these cases, the build tag pattern works:
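The original snippet is not reproduced here. One common form of the pattern, sketched under the assumption of a tag named serial and a package named mypkg (both hypothetical), keeps the serial-only tests in a file that is compiled only when the tag is set:

```go
//go:build serial

// This file is compiled only when the (hypothetical) "serial" tag is set:
//
//	go test -tags serial -p 1 ./mypkg/...
package mypkg

import (
	"os"
	"testing"
)

// TestChdir mutates the process working directory, so it must not call
// t.Parallel.
func TestChdir(t *testing.T) {
	dir := t.TempDir()
	old, err := os.Getwd()
	if err != nil {
		t.Fatal(err)
	}
	if err := os.Chdir(dir); err != nil {
		t.Fatal(err)
	}
	// Registered after TempDir's own cleanup, so it runs first (LIFO)
	// and the directory is left before it is removed.
	t.Cleanup(func() { _ = os.Chdir(old) })
	// ... assertions that depend on the working directory ...
}
```

A plain go test ./... skips the file entirely, and a dedicated CI step opts in with the tag.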
Or simply don't call t.Parallel. Document the reason inline. The CI runs these packages serially by virtue of the absent t.Parallel; the rest of the suite stays parallel.
17. A debugging vignette¶
A team reports: "the test suite passes locally but flakes 3% of the time in CI". Investigation:
- The author runs go test -race -count=100 -parallel 16 ./... locally. No failures.
- CI uses -parallel 32 and GOMAXPROCS=2. Locally the author has GOMAXPROCS=16.
- The flake is a time.Sleep(100 * time.Millisecond) in test code, waiting for a goroutine to update a value. Under heavy contention, 100 ms isn't enough.
- Fix: replace the sleep with channel-based synchronisation. The test now reliably waits exactly until the work is done.
Two lessons:
- Local timing assumptions break under CI scheduling pressure.
- time.Sleep is almost always a bug in test code. Use channels, contexts, or polling bounded by an explicit deadline.
18. Test ordering after t.Parallel¶
Tests execute in source order until they call t.Parallel. After pausing, parallel tests are batched and resumed after all serial tests in the same level have finished.
func TestA(t *testing.T) {
t.Parallel()
t.Log("A")
}
func TestB(t *testing.T) {
t.Log("B")
}
func TestC(t *testing.T) {
t.Parallel()
t.Log("C")
}
Schedule:
- TestA starts, pauses on t.Parallel.
- TestB starts and runs serially: logs "B".
- TestC starts, pauses on t.Parallel.
- After all serial tests are done, TestA and TestC run concurrently.
The print order is therefore: "B" first; "A" and "C" interleave. Don't write assertions that depend on log order in parallel tests.
19. The convergence of test design and production design¶
The patterns that make a parallel test suite fast and reliable are the same patterns that make a production system scalable and maintainable:
- Dependency injection.
- Immutable shared state.
- Bounded resource pools.
- Explicit cancellation via context.
- No global env-driven config.
A senior engineer treats the test suite as a design pressure on the production code. If a test wants to be parallel and can't, that's a signal the production code has a shared-state problem. Fix it; both sides benefit.
20. Summary¶
- Tests are concurrent programs; design them with the same discipline.
- Fixture taxonomy: immutable, per-test ephemeral, per-test heavy (pooled or namespaced), process-global, process-mutable.
- Use sync.Once for immutable shared fixtures, t.Cleanup for per-test, pools or namespacing for heavy state.
- Goroutine leaks compound across parallel tests; use goleak aggressively.
- -race on every PR catches concurrency bugs while authors remember the code.
- Legacy suites migrate to parallel-by-default in 1–2 weeks, leaf-package-first.
- Debug parallel flakes via reproduction, parallelism bisection, -race, and goleak.
- The same design rules apply to production code: no globals, time as dependency, config in structs, no os.Chdir.
21. Designing a fixture API¶
Mature test suites have a small, principled set of fixture-creation helpers. Each helper:
- Takes *testing.T (or the testing.TB interface, so benchmarks can use it too).
- Returns a fully-initialised value.
- Registers t.Cleanup so the test author doesn't have to.
- Calls t.Helper() so error lines point at the caller.
- Is safe to call from a parallel test.
A reference set for a real service:
func newConfig(t testing.TB) *Config {
t.Helper()
return &Config{
Host: "127.0.0.1",
Port: 0, // OS-picked
}
}
func newDB(t testing.TB) *sql.DB {
t.Helper()
db := acquireFromPool(t)
// Schema migrations are idempotent and run once per pool.
return db
}
func newServer(t testing.TB, cfg *Config, db *sql.DB) *httptest.Server {
t.Helper()
srv := httptest.NewServer(buildHandler(cfg, db))
t.Cleanup(srv.Close)
return srv
}
func newClient(t testing.TB, srv *httptest.Server) *Client {
t.Helper()
return NewClient(srv.URL, srv.Client())
}
func newSuite(t testing.TB) *Suite {
t.Helper()
cfg := newConfig(t)
db := newDB(t)
srv := newServer(t, cfg, db)
cli := newClient(t, srv)
return &Suite{Cfg: cfg, DB: db, Server: srv, Client: cli}
}
A test then reads:
func TestUserFlow(t *testing.T) {
t.Parallel()
s := newSuite(t)
user, err := s.Client.CreateUser("alice")
if err != nil {
t.Fatal(err)
}
if user.Name != "alice" {
t.Errorf("got %q", user.Name)
}
}
One line of setup in the test; behind it, every helper is parallel-safe and cleanup-registered. The fixture API is the test code's most valuable asset.
22. Testing concurrent code with parallel tests¶
When the system under test is itself concurrent (a worker pool, a queue, a connection manager), the test needs to:
- Exercise the concurrency (multiple goroutines, ordered events).
- Synchronise the assertion (wait for the system to reach a known state).
- Cleanup deterministically (cancel goroutines, drain channels).
A worked example: a rate limiter.
type Limiter struct {
tokens chan struct{}
}
func NewLimiter(n int) *Limiter {
l := &Limiter{tokens: make(chan struct{}, n)}
for i := 0; i < n; i++ {
l.tokens <- struct{}{}
}
return l
}
func (l *Limiter) Acquire(ctx context.Context) error {
select {
case <-l.tokens:
return nil
case <-ctx.Done():
return ctx.Err()
}
}
func (l *Limiter) Release() {
l.tokens <- struct{}{}
}
Test:
func TestLimiterParallel(t *testing.T) {
t.Parallel()
l := NewLimiter(3)
ctx, cancel := context.WithTimeout(context.Background(), 2*time.Second)
defer cancel()
var (
wg sync.WaitGroup
active atomic.Int64
maxObserved atomic.Int64
)
for i := 0; i < 20; i++ {
wg.Add(1)
go func() {
defer wg.Done()
if err := l.Acquire(ctx); err != nil {
t.Errorf("acquire: %v", err)
return
}
n := active.Add(1)
for {
cur := maxObserved.Load()
if n <= cur || maxObserved.CompareAndSwap(cur, n) {
break
}
}
time.Sleep(5 * time.Millisecond)
active.Add(-1)
l.Release()
}()
}
wg.Wait()
if got := maxObserved.Load(); got > 3 {
t.Errorf("max active = %d, want <=3", got)
}
}
Key points:
- The whole test is parallel-marked.
- Twenty internal goroutines exercise the rate limiter.
- A context with timeout prevents the test from hanging forever.
- atomic operations track concurrent state without races.
- The assertion is on the invariant (max concurrent ≤3), not on a specific schedule.
This is the right shape for testing a concurrent component: exercise from many goroutines, observe via atomics, assert on the invariant.
23. Test code as production design pressure¶
When a test wants to be parallel and can't easily be, that's design feedback for the production code. Common signals:
- "Can't easily test in parallel because the production code uses a singleton." → Refactor to use dependency injection.
- "Can't easily test in parallel because the production code reads env vars at runtime." → Refactor to take config as a struct.
- "Can't easily test in parallel because the production code spawns goroutines without a cancellation mechanism." → Refactor to accept a context.Context.
- "Can't easily test in parallel because the production code mutates the working directory." → Refactor to take paths as parameters.
Senior engineers treat these signals as architectural input. The result is production code that's easier to test, easier to deploy, easier to debug — and incidentally faster-to-test.
24. Integration vs unit boundaries¶
A common confusion: should integration tests use t.Parallel? The answer depends on the integration target:
| Integration with | t.Parallel? | Notes |
|---|---|---|
| Local Postgres | yes | Use namespacing or pooling. |
| Local Redis | yes | Each test picks a separate logical DB. |
| Local Kafka | yes | Each test gets a unique topic prefix. |
| Shared cloud service (e.g., S3 in a shared account) | maybe | Need namespacing + budget. |
| Third-party API with rate limits | no | Rate limiting forces serial. |
| External hardware | no | Single device. |
Parallel integration tests are entirely possible — and worthwhile — when the integration target supports concurrent access. The key is namespacing: each test acts in a slice of the resource that won't collide with sibling tests.
25. Layered parallelism for monorepos¶
A 50-package Go monorepo benefits from layered parallelism:
- Package level: go test -p 8 ./... runs 8 test binaries simultaneously.
- Test level: each binary runs t.Parallel tests up to -parallel.
- Subtest level: each test may have parallel subtests.
At peak, this can mean hundreds of *testing.T goroutines alive at once. The runtime handles it fine, but external resources (DB, file descriptors, ports) may not.
To prevent system-level resource exhaustion:
- Each package declares its own resource budget via a config file.
- TestMain reads the budget and configures pools accordingly.
- CI loads the budgets and adjusts -p and -parallel per package.
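What the budget file looks like is not specified here. As one possible shape, the sketch below assumes a small JSON document with db_connections and parallel keys (both hypothetical) and shows how a TestMain could validate it and size a pool from it:

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
)

// Budget is a hypothetical per-package declaration of external resources.
type Budget struct {
	DBConnections int `json:"db_connections"`
	Parallel      int `json:"parallel"`
}

// loadBudget decodes a budget and rejects values that cannot work.
func loadBudget(data []byte) (Budget, error) {
	var b Budget
	dec := json.NewDecoder(bytes.NewReader(data))
	dec.DisallowUnknownFields() // a typo in the file should fail loudly
	if err := dec.Decode(&b); err != nil {
		return Budget{}, fmt.Errorf("decode budget: %w", err)
	}
	if b.DBConnections < 1 || b.Parallel < 1 {
		return Budget{}, fmt.Errorf("budget values must be positive: %+v", b)
	}
	return b, nil
}

// newPool pre-fills a buffered channel with one slot per budgeted
// connection; ints stand in for real connections.
func newPool(b Budget) chan int {
	pool := make(chan int, b.DBConnections)
	for i := 0; i < b.DBConnections; i++ {
		pool <- i
	}
	return pool
}
```

Keeping the file machine-readable lets CI read the same numbers when it chooses -p and -parallel for the package.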
26. The "test-local globals" anti-pattern¶
A common smell in legacy code:
var testServer *httptest.Server
func TestMain(m *testing.M) {
testServer = httptest.NewServer(handler)
code := m.Run()
testServer.Close()
os.Exit(code)
}
func TestFoo(t *testing.T) {
t.Parallel()
http.Get(testServer.URL)
}
This works — testServer is set before any test runs, so concurrent reads are safe. But it hides a fragility: if any test ever mutates testServer (replaces it, reconfigures it), all sibling tests race.
The cleaner pattern is a per-test server, even if it costs a few milliseconds to construct:
func TestFoo(t *testing.T) {
t.Parallel()
srv := httptest.NewServer(handler)
t.Cleanup(srv.Close)
http.Get(srv.URL)
}
The cost is small; the isolation guarantee is total. Save the shared-server pattern for cases where the construction is genuinely expensive (real DB connection pool, large in-memory dataset).
27. Refactoring a 50-line test into a parallel-friendly fixture¶
Before:
func TestEverything(t *testing.T) {
db, err := sql.Open("postgres", "...")
if err != nil {
t.Fatal(err)
}
defer db.Close()
if _, err := db.Exec("CREATE TABLE ..."); err != nil {
t.Fatal(err)
}
defer db.Exec("DROP TABLE ...")
srv := httptest.NewServer(handler(db))
defer srv.Close()
// 30 lines of assertions...
}
After:
func TestEverything(t *testing.T) {
t.Parallel()
s := newSuite(t)
// 30 lines of assertions using s.DB, s.Server, s.Client...
}
The 30 lines of assertions are unchanged. Everything else hoists into the fixture API. Now:
- Adding t.Parallel is mechanical.
- New tests reuse the suite helper.
- Cleanup is automatic via t.Cleanup inside the helper.
- The test reads top-to-bottom as business logic, not setup.
This refactor is the most impactful single change you can make to a legacy test suite.
28. Race-condition diagnosis: a case study¶
Real scenario: a Kubernetes-style scheduler test flakes 0.5% of the time in CI. The test:
func TestScheduler(t *testing.T) {
t.Parallel()
s := newScheduler()
s.Add(Pod{ID: "a"})
s.Add(Pod{ID: "b"})
go s.Run()
time.Sleep(100 * time.Millisecond)
if s.Scheduled() != 2 {
t.Errorf("got %d scheduled", s.Scheduled())
}
}
Steps to diagnose:
- Run go test -race -count=200. The race detector reports s.Scheduled() racing with s.Run().
- The bug is in production code: Scheduled() reads a counter that Run() writes without synchronisation.
- Fix the production code (atomic counter).
- Replace time.Sleep(100ms) with a deterministic wait: <-s.Done().
Both bugs (the data race and the timing assumption) surfaced because the test was parallel and -race was on. The race was fixed in production code, and the test is now stable.
The senior takeaway: a CI race report is a real production bug, even if the test happened to expose it.
29. Designing for testable concurrency¶
Production concurrent code should expose hooks that tests can synchronise on:
type Scheduler struct {
pods chan Pod
done chan struct{}
sched atomic.Int64
}
func (s *Scheduler) Run() {
for p := range s.pods {
schedule(p)
s.sched.Add(1)
}
close(s.done)
}
func (s *Scheduler) Done() <-chan struct{} { return s.done }
func (s *Scheduler) Scheduled() int64 { return s.sched.Load() }
Tests use <-s.Done() to wait for completion. No time.Sleep. No flakes. The same production code is correct under production loads.
This is the trick: production code that exposes <-Done channels and atomic counters is both production-grade and test-friendly. Tests don't need a special API; they use the same one.
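For completeness, here is a self-contained variant that a test can drive end to end. The constructor, Add, and Close are assumptions added for the sketch (the excerpt above shows only Run, Done, and Scheduled), and the scheduling work itself is elided:

```go
package main

import "sync/atomic"

type Pod struct{ ID string }

type Scheduler struct {
	pods  chan Pod
	done  chan struct{}
	sched atomic.Int64
}

// NewScheduler is an assumed constructor; buf bounds the pending queue.
func NewScheduler(buf int) *Scheduler {
	return &Scheduler{pods: make(chan Pod, buf), done: make(chan struct{})}
}

// Add blocks while the queue already holds buf pods.
func (s *Scheduler) Add(p Pod) { s.pods <- p }

// Close signals that no more pods will be added, which lets Run return.
func (s *Scheduler) Close() { close(s.pods) }

// Run drains the queue and closes the done channel when it is empty
// and closed.
func (s *Scheduler) Run() {
	defer close(s.done)
	for range s.pods {
		s.sched.Add(1) // real scheduling work would happen here
	}
}

func (s *Scheduler) Done() <-chan struct{} { return s.done }
func (s *Scheduler) Scheduled() int64      { return s.sched.Load() }
```

A test adds its pods, calls Close, starts Run in a goroutine, and receives from Done before asserting on Scheduled. In practice the receive would sit in a select with a timeout so that a regression fails fast instead of hanging.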
30. Migrations and parallel tests¶
For database-backed services, schema migrations interact with parallel tests:
- Per-test schema (namespacing): each test runs migrations into its own schema. Slow but bulletproof.
- Per-pool schema (pooled): migrations run once per pooled connection. Faster but assumes idempotent tests.
- Per-process schema (TestMain): migrations run once in TestMain. Fastest but all tests share state.
For most projects, per-process schema with transaction-rollback per test is the sweet spot:
func TestMain(m *testing.M) {
db = openDB()
runMigrations(db)
code := m.Run()
os.Exit(code)
}
func withTx(t *testing.T, fn func(*sql.Tx)) {
t.Helper()
tx, err := db.Begin()
if err != nil {
t.Fatal(err)
}
t.Cleanup(func() { tx.Rollback() })
fn(tx)
}
Each parallel test gets its own transaction, sees its own writes, and rolls back at cleanup. Sibling tests are isolated.
31. Detecting flaky tests across runs¶
Flakiness is hard to detect in a single run; you need historical data. Two practical approaches:
A. CI test result analytics. Tools like Buildkite Test Analytics or custom scripts on go test -json track per-test pass/fail across runs. After 100 runs, any test with <100% pass rate is flagged.
B. Local stress testing. Before merging, run go test -count=200 -parallel 16 -run=TestNew ./pkg. If it ever fails, the test is unreliable.
A flaky test is a defect, regardless of who wrote it. Quarantine first, fix root cause within a sprint.
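The "custom scripts on go test -json" option can be quite small. The sketch below assumes the logs from many runs have been concatenated into one byte slice and relies only on the Action, Package, and Test fields of the JSON events; flakyTests is an illustrative name:

```go
package main

import (
	"bytes"
	"encoding/json"
	"sort"
)

// event holds the fields of a go test -json record that matter here.
type event struct {
	Action  string
	Package string
	Test    string
}

// flakyTests scans concatenated -json output from many runs and returns,
// sorted by name, the tests that both passed and failed at least once.
func flakyTests(log []byte) []string {
	type tally struct{ pass, fail int }
	seen := map[string]*tally{}
	for _, line := range bytes.Split(log, []byte("\n")) {
		var ev event
		if json.Unmarshal(line, &ev) != nil || ev.Test == "" {
			continue // not JSON, or a package-level event
		}
		key := ev.Package + "." + ev.Test
		if seen[key] == nil {
			seen[key] = &tally{}
		}
		switch ev.Action {
		case "pass":
			seen[key].pass++
		case "fail":
			seen[key].fail++
		}
	}
	var flaky []string
	for name, t := range seen {
		if t.pass > 0 && t.fail > 0 {
			flaky = append(flaky, name)
		}
	}
	sort.Strings(flaky)
	return flaky
}
```

A test that fails on every run is deliberately not reported: that is a broken test, not a flaky one, and CI already surfaces it.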
32. Test parallelism in CI pipelines¶
CI parallelism stacks on top of test parallelism:
- CI shard 1: runs go test ./pkg-a/... with -parallel 16.
- CI shard 2: runs go test ./pkg-b/... with -parallel 16.
- ...
If your repo has 8 packages and 2 CI shards, each shard handles 4 packages. Within each package, 16 parallel tests run at peak. Wall-clock is determined by the slowest shard.
Optimise by balancing the shards: assign packages so each shard takes roughly the same wall time. The per-package Elapsed values in go test -json output (which tools like gotestsum can record) supply the timings to inform the balancing.
33. Resource budgets in detail¶
An implicit resource budget, scattered across test code as magic numbers, is hard to manage. Better: declare it explicitly as named constants.
const (
MaxDBConnections = 8
MaxHTTPClients = 32
MaxBackgroundWorkers = 4
PerTestTempDirQuotaMB = 100
)
Each parallel test budgets against these constants. When violated, the pool blocks (for DB connections) or the test fails (for disk quota). Code review catches budget bumps.
A team that has explicit budgets is far less likely to ship a "death by 1000 cuts" PR that quietly raises CI memory by 5% per week.
34. Cross-language coordination¶
When Go tests integrate with other languages (Python ML pipelines, JavaScript browsers via Playwright), the parallel model breaks down: each cross-language call has overhead far greater than t.Parallel saves.
For these:
- Mark cross-language tests with a build tag (//go:build integration).
- Run them serially in a dedicated CI job.
- Keep the rest of the Go suite parallel and fast.
Don't try to make every cross-language test parallel; the constants of cross-process communication dwarf the benefit.
35. The testing.TB interface¶
For helpers that should work in both tests and benchmarks, use the testing.TB interface:
func newServer(tb testing.TB) *httptest.Server {
tb.Helper()
srv := httptest.NewServer(handler)
tb.Cleanup(srv.Close)
return srv
}
testing.TB is the interface implemented by *testing.T, *testing.B, and *testing.F. The helper works in:
func TestThing(t *testing.T) { srv := newServer(t); ... }
func BenchmarkThing(b *testing.B) { srv := newServer(b); ... }
testing.TB doesn't expose Parallel (*testing.B has no Parallel method; its b.RunParallel is a different mechanism). Helpers can do everything except mark the test as parallel; the test itself must do that.
36. When -shuffle matters¶
go test -shuffle=on randomises test order. Most tests pass regardless, but some have hidden dependencies on order:
var counter int
func TestSetup(t *testing.T) {
counter = 1
}
func TestUse(t *testing.T) {
if counter != 1 {
t.Fatal("counter not set")
}
}
Without shuffle, this passes. With shuffle, TestUse may run first and fail. The bug is the implicit dependency on test order — exactly what t.Parallel would also break.
Run go test -shuffle=on -count=10 periodically; any failures expose hidden ordering assumptions. Fix by removing the assumption.
37. Long-running parallel tests¶
Tests that take seconds, not milliseconds, deserve special consideration:
- They block the parallel queue for sibling tests on the same -parallel budget.
- They consume CI minutes proportionally.
- They are usually integration tests.
Mitigations:
- Put them behind a //go:build integration build tag.
- Or behind testing.Short(): if testing.Short() { t.Skip("slow test") }.
- Or in a separate test package with its own CI job.
The goal: PR CI completes in 5 minutes regardless of how many slow tests exist. Slow tests get their own pipeline.
38. The Go runtime and the parallel-test scheduler¶
A subtle interaction: the Go runtime's goroutine scheduler is preemptive (since Go 1.14). A tight CPU loop in one parallel test won't starve siblings. Earlier versions could.
But: the runtime's GC pauses (very brief in modern Go) apply globally. A -race binary may pause more often. For tight wall-time budgets, profile and tune.
39. Test-data management¶
Large test datasets (megabytes of fixtures) interact with parallelism:
- Read-only fixtures are safe to share. Load once via sync.Once.
- Write-once fixtures (e.g., a generated database snapshot) can be cached on disk under testdata/ and loaded once.
- Per-test datasets should be generated cheaply (via factories) so each parallel test has its own.
The testdata/ directory is the Go convention for test fixtures. Subdirectories per test name keep things organized:
mypkg/
├── core.go
├── core_test.go
└── testdata/
├── TestParse/
│ ├── input.json
│ └── expected.json
└── TestEncode/
└── input.txt
Helpers read from testdata/<TestName>/...:
func loadFixture(t *testing.T, name string) []byte {
	t.Helper()
	path := filepath.Join("testdata", t.Name(), name)
	data, err := os.ReadFile(path)
	if err != nil {
		t.Fatal(err)
	}
	return data
}
Parallel-safe: read-only access to disk.
40. Stress testing parallel code¶
For libraries that exercise concurrency primitives (mutexes, channels, atomics), stress tests amplify rare race conditions:
func TestStress(t *testing.T) {
	if testing.Short() {
		t.Skip()
	}
	t.Parallel()
	const N = 100000
	var counter atomic.Int64
	var wg sync.WaitGroup
	for i := 0; i < runtime.GOMAXPROCS(0); i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for j := 0; j < N; j++ {
				counter.Add(1)
			}
		}()
	}
	wg.Wait()
	expected := int64(N * runtime.GOMAXPROCS(0))
	if got := counter.Load(); got != expected {
		t.Errorf("got %d, want %d", got, expected)
	}
}
Stress tests:
- Take seconds, not milliseconds. Skip under -short.
- Run with -race -count=10 to surface rare races.
- Exercise the same primitives many times to maximise the chance of hitting a bug.
For library authors writing primitives, stress tests are non-negotiable.
41. Senior summary¶
- Fixture APIs collapse setup boilerplate into one-line helpers.
- Production code's concurrency should expose hooks (channels, atomics) tests synchronise on, instead of time.Sleep.
- Migrate legacy suites leaf-package-first, with per-package PRs.
- -race reports are real production bugs; treat them accordingly.
- Resource budgets per package prevent CI death-by-thousand-cuts.
- Long-running tests live behind build tags or testing.Short().
- Stress tests exercise primitives at scale to catch rare races.
- Test order assumptions are bugs; -shuffle=on catches them.
41a. Architecting for parallelism: the boundary discipline¶
The principle "if I have N parallel tests, the union of their side effects must be permutation-invariant" gives a checklist for production code:
- No process-global mutable state that tests will exercise.
- No init functions that touch the world (network, files, env).
- No singleton with mutation methods.
- No flag-driven behavior beyond what's set before m.Run.
- No reliance on file descriptors at fixed paths.
- Every async operation accepts a cancellation context.
A codebase that scores high on this checklist tests parallel trivially. A codebase that scores low requires either refactoring or a tortured test layer. Refactoring is almost always the better investment.
41b. Fixtures as a domain language¶
The fixture API is the test layer's domain language. It should read as if it were business logic:
func TestPromotion(t *testing.T) {
	t.Parallel()
	user := newUser(t)
	user.AddCredits(100)
	promo := newPromotion(t, "SAVE10")
	user.Apply(promo)
	if user.Credits() != 110 {
		t.Errorf("got %d", user.Credits())
	}
}
newUser and newPromotion are domain fixtures that hide DB queries, ID generation, and cleanup. The test is readable to a product manager. The parallel safety is hidden inside the fixtures.
Designing this layer is a project. Once done, every new test inherits the discipline.
The professional page extends these architectural patterns into team policy, CI configuration, and flake budgets.
42. A two-week parallel-test migration plan¶
For a tech lead taking over a legacy Go codebase:
Week 1: discovery and quick wins.
- Day 1: read every test file in one leaf package. Categorise as "trivially parallel", "needs work", or "intentionally serial".
- Day 2: PR adding t.Parallel to the trivial ones. Run -race -count=20 locally. Measure wall-time before/after.
- Day 3: identify the top 5 production-code issues blocking parallelism (env vars, globals, etc.).
- Day 4-5: PR a refactor for one of them; pair with a teammate.
Week 2: scale up and institutionalise.
- Day 6: replicate the leaf-package pattern on 3 more packages.
- Day 7: write TESTING.md. Get a teammate to review it.
- Day 8: enable a linter in warning mode for new Test* functions without t.Parallel.
- Day 9: wire -race into CI as a required check.
- Day 10: announce in team standup. Demo CI wall-time improvement.
The visible result: CI is faster, tests are more reliable, the team has a documented norm. Subsequent weeks chip away at remaining packages.
43. Common architectural objections and responses¶
"We need to keep tests serial for debugging."
Use -parallel 1 in your local debug workflow. The same code runs both ways; you don't need to remove t.Parallel to debug.
"Adding t.Parallel is a lot of churn."
The churn is one line per test. Mechanical to add, mechanical to review. The benefit is sustained for the life of the project.
"We've never had a race condition in production."
You may have, but didn't notice. Races often show up as occasional inconsistencies, "weird logs", or rare incidents. Adding t.Parallel and -race surfaces them.
"We don't have CI minutes to run -race."
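-race makes tests several times slower; the Go documentation cites 2-20x execution time and 5-10x memory for a typical program. The cost is real. Quantify it: at 5-10x, a 30-second unit suite takes 2.5-5 minutes under -race. Compare that to the engineer time saved by catching races early. It is almost always worth it.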
"Tests already pass; why change?"
Tests passing on serial schedule prove serial correctness. Production runs concurrently. The test suite should match production conditions to catch bugs that production will hit.
44. The senior's role in test culture¶
A senior engineer's value in test culture isn't writing more tests. It's:
- Modelling good patterns (every new PR has parallel-safe tests).
- Reviewing junior PRs and explaining the patterns.
- Refactoring production code so its tests can be parallel.
- Investing in tooling (linters, dashboards, fixtures).
- Documenting the norms (TESTING.md, code-review checklists).
The cultural artefact outlasts any single engineer's tenure. A team with strong test culture continues to ship reliably even as members rotate.
45. Closing reflection¶
t.Parallel is a one-line API call. Behind it lies a model of concurrent program correctness, a set of architectural rules, an entire philosophy of test design. Senior engineers internalise the model; junior engineers learn by writing many tests; tech leads enforce the norms.
The reward is a test suite that runs in seconds, surfaces concurrency bugs before production does, and serves as design pressure on production code. Worth the investment.
45a. Pattern catalogue¶
For quick reference, the named patterns introduced on this page:
- Fixture helper: function taking *testing.T, returning a value, registering t.Cleanup.
- Connection pool: buffered channel handing out scarce resources, returned on cleanup.
- Schema namespacing: each test gets a unique DB schema; cleanup drops it.
- Transaction rollback: each test wraps work in a transaction, cleanup rolls back.
- sync.Once shared fixture: expensive but immutable; built once, shared.
- Group-then-fan-out: wrap parallel subtests in t.Run("group", ...) for sync.
- Channel-based wait: replace time.Sleep with <-done.
- Context-rooted cancellation: t.Context() cancels all derived work at test end.
- goleak per-test: detect leaks at the test that introduced them.
- Stress harness: exercise primitives at scale with -count and many goroutines.
Each pattern composes with the others. A typical complex test uses 3-5 of them.
45b. The senior's reading list¶
To deepen understanding beyond this page:
- src/testing/testing.go in the Go source (the t.Parallel and t.Run scheduling logic lives there).
- "The Go Memory Model" (go.dev/ref/mem) for understanding race detector reports.
- Kubernetes' pkg/scheduler/internal/queue tests for real fixture design.
- The goleak README for ignore patterns.
- Russ Cox's blog posts on Go testing philosophy.
A serious engineer reads at least the first two over a weekend; the rest becomes useful when specific problems arise.
45c. Anti-pattern catalogue (senior tier)¶
Subtler than the bugs on the find-bug page; these are architecture-level:
- The "test base class": simulating xUnit-style fixture inheritance in Go. Goroutines and method dispatch don't work like that; you get hard-to-debug interactions. Use composition (helpers returning values), not inheritance.
- The mega-TestMain: 500 lines of setup. Refactor into smaller per-package helpers; TestMain should be 20 lines max.
- The hidden integration test: marked as a unit test but reaches into a real service. Move to a tagged integration test.
- The fragile golden file: regenerated on every run; passing because the test wrote what it then read. Make golden files static and version-controlled.
- The cleanup chain: nested t.Cleanup registering more t.Cleanup registering more. A sign of mis-layered fixtures.
- The state-leaking fixture: a "fresh" DB that still has rows from a previous test's last-minute write. Add a verification step.
Recognising these is what separates a senior engineer from a mid-level one. Pattern matching builds with experience.
46. Appendix: a fixture-design checklist¶
When designing a new fixture helper, run through this list:
- Does it take *testing.T (or testing.TB)? If not, it can't call t.Cleanup or t.Helper.
- Does it call t.Helper() as the first line?
- Does it register all cleanups via t.Cleanup, not defer inside the helper?
- Is the returned value safe to use from a parallel test?
- If the helper acquires a shared resource (pool, file lock), does cleanup release it?
- Does it have a t.Fatal path for setup failures, with a descriptive message?
- Is it idempotent, i.e. is calling it twice safe?
- Does it document its parallel-safety properties in a comment?
A fixture that satisfies all eight is robust. One that misses any is a future flake.
47. Appendix: parallel-test anti-patterns¶
These show up in real codebases. Recognise and refactor:
- Implicit serial dependency: TestB assumes TestA ran first.
- Shared mutable test variable: var data at package level, written from tests.
- time.Sleep for synchronisation: ban these.
- defer for resource cleanup in a test with parallel subtests: the deferred call runs when the parent function returns, before the subtests finish; prefer t.Cleanup.
- os.Setenv direct (not via t.Setenv): breaks isolation.
- os.Chdir direct: same.
- Bare httptest.Server without cleanup: leaks listeners.
- Goroutines without exit mechanism: leaks goroutines.
- Hardcoded ports: collides at scale.
- Test fixtures in /tmp without t.TempDir: races on path.
A _test.go file containing none of the above is a healthy file.
48. Appendix: per-package parallel-test budget template¶
// budget_test.go (alongside the other test files of the package;
// a file in package mypkg_test must have a _test.go name)
package mypkg_test

const (
	// ParallelLimit is the recommended -parallel value for this package.
	// The package's tests use up to MaxDBConnections DB connections;
	// running more parallel tests would exceed the budget.
	ParallelLimit = 16

	// MaxDBConnections is the size of the test DB connection pool.
	// Postgres test instance is configured with max_connections=100;
	// 16 leaves headroom for other tests running concurrently in CI.
	MaxDBConnections = 16

	// MaxBackgroundWorkers caps the number of goroutines started by tests
	// in this package. Above this, the goleak verification becomes flaky
	// because the runtime takes too long to collect.
	MaxBackgroundWorkers = 32
)
The constants are documentation, even when not enforced programmatically. Reviewers know the budget; deviations require justification.
49. Appendix: layering integration tests¶
A typical pattern for separating fast unit tests from slow integration tests:
//go:build integration
// +build integration

package mypkg_test

// This file is compiled only with -tags=integration.
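CI runs the suites as two jobs: go test ./... for the unit suite, and go test -tags=integration ./... for the job that includes the integration tests.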
The unit suite runs on every PR; the integration suite runs nightly or on a separate, slower CI job. Both can be parallel within their domain.
50. Appendix: monitoring suite health metrics¶
Track these over time:
| Metric | Target | Action if drifting |
|---|---|---|
| Wall time (unit suite) | < 5 min | Profile, optimise top-20 slow tests |
| Wall time (race job) | < 15 min | Lower -parallel, optimise memory |
| Flake rate | < 3% in 7 days | Quarantine flaky tests, fix root cause |
| Goleak findings | 0 | Treat each as a P1 bug |
| Race detector findings | 0 | P1 bug; do not merge |
| Test count growth | 5-10%/quarter | Healthy growth; verify no orphans |
Dashboards keep the team honest. Without metrics, suite quality decays.
51. Appendix: the testing package's evolution¶
Brief timeline of testing package features relevant to parallelism:
- Go 1.0: t.Parallel.
- Go 1.7: t.Run and subtests, including parallel subtests.
- Go 1.14: t.Cleanup.
- Go 1.17: t.Setenv.
- Go 1.22: for-loop variable scope change eliminates the loop-capture bug.
- Go 1.24: t.Chdir, t.Context, b.Loop for benchmarks.
Each version's release notes are worth reading for testing-specific changes. The trajectory has been steadily toward more test-author ergonomics and stricter parallel-safety guarantees.
52. Appendix: industry references¶
Codebases worth reading for parallel-test patterns:
- Go standard library (encoding/json, net/http, sync): canonical.
- Kubernetes (pkg/scheduler, pkg/controller): real-world fixture design at scale.
- etcd (server/etcdserver): testing distributed consensus.
- gRPC-Go (internal/balancer): testing complex async code.
- Prometheus (tsdb/wlog): testing storage engines with concurrent writers.
Read the test files alongside the production code; the design discipline is visible in both.
53. Appendix: when to break the rules¶
Every rule on this page has exceptions. A senior engineer recognises when:
- A test genuinely cannot be parallel and the reason is documented.
- The fixture pattern is overkill for a 3-line test.
- The race detector is reporting a known-benign race (rare, but possible with unsafe).
- A time.Sleep is acceptable in a benchmark exploring scheduling behavior.
- Production code's globals are intentional and the tests work around them.
The rules are defaults. Treat them as the starting point; deviate with justification.
54. Appendix: troubleshooting matrix¶
Symptom → likely cause → first action:
| Symptom | Likely cause | First action |
|---|---|---|
| Test passes alone, flakes in suite | Race or shared state | Run -race -count=200 -parallel 16 |
| panic: t.Setenv called after t.Parallel | Order of calls | Remove t.Parallel or refactor away from env vars |
| WARNING: DATA RACE | Unsynchronised shared access | Read race report, fix shared state |
| Goleak finds N leaked goroutines | Missing cancellation | Add t.Context()-based shutdown |
| Test times out with -timeout | Hang in production code | Inspect goroutine dump in failure output |
| too many open files | Leaked file descriptors | Audit for missing Close() calls |
| bind: address already in use | Hardcoded port | Use :0 or httptest.NewServer |
| CI memory exhaustion under -race | Too many parallel tests | Lower -parallel for race job |
| Tests duplicate IDs | Shared counter race | Use atomic.Int64 or scope per-test |
| Subtests log "c" three times | Pre-1.22 loop-var bug | Add tc := tc or upgrade Go |
Print, laminate, stick on your desk.
55. Closing observations from real Go projects¶
After watching parallel-test discipline take hold in many Go projects, a few observations recur:
- Teams that adopt parallel-by-default in year one ship faster than teams that adopt it in year three. The compounding effect of fast CI is large.
- Engineers who write parallel tests routinely produce more thread-safe production code, because the design pressure transfers.
- Test code becomes a readable specification of the production code's contract — "given these inputs and concurrent access, the result is X".
- Refactoring becomes safer: races introduced by a refactor are caught by -race before they ship.
- New hires onboard faster when the test suite runs in seconds; they iterate more.
- Code review focus shifts from "did the author run the tests?" to "does the test express the intent?". A more productive conversation.
The discipline pays back, year over year. The investment is measured in hours; the dividend is measured in years of saved engineer time.
56. One last metaphor¶
Think of t.Parallel as the Go test framework asking: "would you be willing to swear an oath that this test never touches the world outside its own goroutine?" Saying yes is cheap (one method call). The framework then takes you at your word and runs the test concurrently with others.
If you swore the oath falsely (touched a shared variable, mutated the environment, leaked a goroutine), the truth surfaces in CI: the race detector reports the unsynchronised access, goleak reports the leaked goroutine, and the rest shows up as flakes.
Engineers who internalise this contract write better Go. They learn to spot non-isolated code on sight; they refactor production code that "wants" to be testable in parallel; they ship more confidently because their tests have run under realistic concurrent schedules thousands of times.
The framework's small API hides a large discipline. Learn it deeply; it pays back for every Go project you'll ever touch.
57. Bridging to the rest of the testing material¶
This subsection is one of nine in the testing and benchmarking section. Related subsections that build on this material:
- Benchmarking basics: b.RunParallel and b.SetParallelism use a different model from t.Parallel. The benchmark page explains why and shows the patterns.
- Fuzzing: a fuzz target may call t.Parallel so that seed-corpus entries run in parallel under plain go test. While actively fuzzing with -fuzz, t.Parallel has no effect, and -parallel instead sets the number of fuzzing worker processes.
- Integration testing: the build-tag separation pattern shown here is elaborated for integration tests.
- Test doubles and mocks: parallel tests amplify the need for per-test mock instances rather than shared singletons.
Read in any order. The parallel-tests material is a foundation: every other subsection assumes you can read a parallel test and predict its behaviour.
58. Final word¶
You now have 1400+ lines of context on a one-line API call. That is the right ratio. t.Parallel() is small; the discipline behind it is not.