Concurrent Fuzzing — Optimisation¶
Making the fuzzer fast: reducing per-iteration cost, tuning -fuzztime, sharding across CI, managing corpora at scale, and keeping -race overhead in check.
Table of Contents¶
- Why fuzzer speed matters
- Measuring iteration rate
- Reducing per-iteration cost
- Allocation reduction
- Goroutine overhead in the harness
- Reducing -race overhead
- Tuning -fuzztime and -fuzzminimizetime
- Sharding across CI
- Corpus management at scale
- Seed selection
- Sample optimisation case study
- Summary
Why fuzzer speed matters¶
A fuzzer that runs at 100,000 iterations/sec is ~100× more likely to find a rare bug in a fixed time window than one that runs at 1,000 iter/sec. The cost of optimising a fuzz target pays off in every subsequent run, every developer's local fuzz session, and every CI invocation.
Faster iteration rate means:
- Bugs are found faster.
- CI budgets can be smaller for the same coverage.
- Developers will actually run fuzz tests locally because they finish in seconds.
The goal is not to make the fuzz target "fast" in the sense of complete-test-suite optimisation. It is to make each iteration cheap enough that mutation can explore a million inputs in minutes.
Measuring iteration rate¶
The fuzzer prints its progress periodically:
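A representative progress line looks like the following (numbers are illustrative; exact wording varies across Go versions):

```
fuzz: elapsed: 3s, execs: 25190 (8396/sec), new interesting: 12 (total: 18)
```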
execs/sec is the per-process iteration rate. With -parallel=8, the total throughput is roughly 8× this number.
Baseline: what is "fast enough"?¶
- 100,000+ iter/sec per worker: ideal. Pure Go, no allocations, no goroutines.
- 10,000–100,000 iter/sec: good. Light parsing, some allocations.
- 1,000–10,000 iter/sec: acceptable for concurrent harnesses with goroutines.
- < 1,000 iter/sec: investigate. Likely heavy allocation, network, or mutex contention.
With -race, divide by 5–10.
Profiling a fuzz target¶
go test does not expose -cpuprofile cleanly during fuzzing because of the multi-process model. Instead, profile a single-input replay of a representative seed:
func TestProfileFuzzBody(t *testing.T) {
	f, err := os.Create("cpu.prof")
	if err != nil {
		t.Fatal(err)
	}
	defer f.Close()
	if err := pprof.StartCPUProfile(f); err != nil {
		t.Fatal(err)
	}
	defer pprof.StopCPUProfile()
	raw := []byte("...") // a representative input
	for i := 0; i < 10000; i++ {
		fuzzBody(raw) // same code your f.Fuzz callback runs
	}
}
Then go tool pprof cpu.prof. The hot spots are the same hot spots the fuzzer faces.
Reducing per-iteration cost¶
The most common per-iteration cost categories:
- Allocation.
- Goroutine spawn.
- sync.Pool misuse.
- string([]byte) conversion.
- fmt.Sprintf in hot paths.
- Logging (anything writing to stderr).
Allocation¶
Each make, each append past capacity, each string([]byte) conversion, each interface boxing has a cost. The fuzzer pays that cost a million times over. Minimise.
// BAD: allocates per iteration
f.Fuzz(func(t *testing.T, data []byte) {
s := string(data)
parts := strings.Split(s, ",")
_ = parts
})
// BETTER: avoid string conversion
f.Fuzz(func(t *testing.T, data []byte) {
parsePartsBytes(data) // operate on []byte directly
})
Goroutine spawn¶
A fuzz harness that spawns 4 goroutines per iteration may pay 1–4 microseconds per iteration just on goroutine creation. For a target that should run at 100k/sec, the whole budget is 10 microseconds; goroutine creation alone can consume up to half of it.
Two strategies:
- Pre-spawned worker pool. Construct N worker goroutines once. Each iteration sends a small task through a buffered channel. Workers run until the fuzz session ends.
- Reduce goroutine count per iteration. 4 goroutines find most concurrent bugs; 32 are usually overkill.
Worker-pool pattern¶
var workerPool struct {
	once sync.Once
	in   chan func()
	done chan struct{}
}

func ensureWorkerPool(n int) {
	workerPool.once.Do(func() {
		workerPool.in = make(chan func(), n)
		workerPool.done = make(chan struct{}, n) // must be initialised, or sends block forever
		for i := 0; i < n; i++ {
			go func() {
				for fn := range workerPool.in {
					fn()
					workerPool.done <- struct{}{}
				}
			}()
		}
	})
}

f.Fuzz(func(t *testing.T, data []byte) {
	ensureWorkerPool(4)
	ops := decode(data)
	for _, op := range ops {
		op := op
		workerPool.in <- func() { runOp(op) }
	}
	for range ops {
		<-workerPool.done
	}
})
Trade-off: the pool persists state across iterations. You must ensure each iteration constructs the SUT fresh and never relies on the pool's state.
String / byte conversions¶
Go's string([]byte) and []byte(string) allocate. The compiler optimises some cases (e.g. m[string(b)] for map lookup), but most cases allocate. Audit hot loops.
Allocation reduction¶
A fuzz body that allocates 10 bytes per iteration at 100,000 iter/sec is 1 MB/sec of garbage. The GC keeps up, but the fuzzer slows. Aim for zero allocations per iteration where feasible.
Use sync.Pool carefully¶
var bufferPool = sync.Pool{
	New: func() any {
		b := make([]byte, 0, 256)
		return &b // pool a pointer: storing a bare []byte in an interface allocates on every Put
	},
}

f.Fuzz(func(t *testing.T, data []byte) {
	bufp := bufferPool.Get().(*[]byte)
	defer func() {
		*bufp = (*bufp)[:0]
		bufferPool.Put(bufp)
	}()
	_ = doSomething(*bufp, data)
})
Caveat: sync.Pool is not free. For very short-lived buffers, on-stack allocation may be faster. Benchmark.
Reuse slices¶
Within an iteration, reuse a single slice for all operation lists:
var ops []op // declared at package level; reset each iteration
f.Fuzz(func(t *testing.T, data []byte) {
ops = ops[:0]
ops = decodeOpsInto(data, ops)
// ...
})
This is one of the few exceptions to "no cross-iteration state." The slice is capacity-only shared; the length is reset.
Avoid fmt.Sprintf in fuzz bodies¶
fmt.Sprintf allocates and parses format strings. In a fuzz body that runs a million times per minute, this is wasted CPU. Use strconv.AppendInt, bytes.Buffer, or a hand-rolled formatter.
Goroutine overhead in the harness¶
Spawning a goroutine costs ~1 microsecond on modern hardware. Joining via sync.WaitGroup costs another. For a fuzz harness that spawns 4 goroutines per iteration, the floor is ~5 microseconds, or 200,000 iter/sec — before any actual work.
Strategies¶
- Pre-spawned worker pool (see above).
- Single-goroutine harness for sequential property checking. If the SUT is sequential, you do not need goroutines at all.
- errgroup.Group for cleaner cancellation, at the same cost as raw goroutines.
- A reused sync.WaitGroup. Constructing a new WaitGroup per iteration costs little; reusing one across iterations is fine as long as the counter has returned to zero before the next Add.
Avoid time.Sleep in fuzz harnesses¶
Even a 1 ms Sleep caps iteration rate at 1,000/sec, two orders of magnitude below the 100,000/sec ideal. Use channels, WaitGroup, or pre-known synchronisation, never Sleep for "let things settle."
Reducing -race overhead¶
The race detector slows execution 5–15×. For fuzzing, this is usually worth it. But you can keep the overhead manageable.
Strategy 1: Two-phase fuzzing¶
Run two separate fuzz jobs:
- -fuzz=FuzzXxx -fuzztime=5m (no -race). High iteration rate; finds panics and invariant violations across many inputs.
- -fuzz=FuzzXxx -fuzztime=5m -race. Lower rate; finds races on whatever inputs the race build can mutate to.
Both share the persistent corpus, so the no-race job's discoveries seed the race job.
Strategy 2: Shorter iteration budget under -race¶
Under -race, each iteration costs more. Reduce per-iteration work:
- Spawn fewer goroutines (2–4 instead of 8–16).
- Run fewer operations per iteration (16 instead of 32).
- Use smaller input bound caps.
The race detector still catches races on smaller inputs; it just runs many more of them.
Strategy 3: Avoid -race in tight loops¶
If your fuzz function spawns thousands of goroutines per iteration, even one iteration may take seconds under -race. Cap the loop:
8 goroutines is enough to find most concurrency bugs; 800 just slows the detector.
Strategy 4: Memory bound checks¶
-race uses ~5–10× more memory than the base build. Workers may OOM. Set process limits and adjust -parallel downward if needed.
Tuning -fuzztime and -fuzzminimizetime¶
-fuzztime¶
| Setting | When to use |
|---|---|
| 10s | Smoke test before pushing |
| 30s | Quick local exploration |
| 5m | Reasonable per-PR check |
| 10m | Nightly CI per target |
| 1h | Pre-release validation |
| 24h | Critical security code |
| (unlimited) | OSS-Fuzz-style continuous fuzzing |
For most teams, 10m per nightly target is the sweet spot: long enough to discover meaningful coverage, short enough that the CI matrix completes before morning.
-fuzzminimizetime¶
Minimisation budget. The default 60s is reasonable. Increase to 5m if your inputs are large and minimisation matters; decrease to 10s if you do not care about minimal reproducers and want to maximise mutation time.
-fuzzminimizetime=0x disables minimisation entirely — useful when triaging known issues where the input shape is already understood.
Iteration-count vs duration¶
-fuzztime also accepts an iteration count via the Nx syntax, giving an equal budget across runs. This is useful for benchmarking iteration rate across changes: use -fuzztime=10000x to compare before and after a fuzz-target change.
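For example, to run exactly 10,000 iterations of the case-study target:

```
go test -run=^$ -fuzz=^FuzzParseRequest$ -fuzztime=10000x
```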
Sharding across CI¶
A single go test -fuzz runs GOMAXPROCS workers. To use more parallelism, shard at the CI level.
Shard by target¶
One CI job per fuzz target. Easy and effective. Each job runs go test -fuzz=FuzzXxx -fuzztime=10m -race. The matrix runs in parallel.
Shard by seed subset¶
If a single target has thousands of seeds, split them across multiple CI jobs. Each job loads a subset:
func FuzzParseShard(f *testing.F) {
shard := os.Getenv("FUZZ_SHARD")
patterns := map[string]string{
"a": "testdata/samples/[abc]*.bin",
"b": "testdata/samples/[def]*.bin",
"c": "testdata/samples/[ghi]*.bin",
}
matches, _ := filepath.Glob(patterns[shard])
for _, p := range matches {
b, _ := os.ReadFile(p)
f.Add(b)
}
f.Fuzz(/* ... */)
}
Each CI matrix entry sets FUZZ_SHARD=a, b, or c. Discovery is independent per shard, but they share the committed corpus.
Combine corpora at end¶
After each shard runs, archive its generated corpus. A periodic merge job downloads all shards' corpora and uploads a combined corpus as the seed for future runs.
- run: go test -run=^$ -fuzz=^FuzzXxx$ -fuzztime=10m -fuzzcachedir=$HOME/corpus -race
- uses: actions/upload-artifact@v4
  with:
    name: corpus-${{ matrix.shard }}
    path: $HOME/corpus
Corpus management at scale¶
A successful fuzzing program generates large corpora. Manage them deliberately.
Generated corpus¶
- Cleared with go clean -fuzzcache. Useful when:
  - Upgrading Go.
  - The corpus has grown to gigabytes.
  - Internal coverage instrumentation has changed.
- Per-package, per-target. Use -fuzzcachedir to share across CI runs.
Committed reproducers¶
- Stored under <package>/testdata/fuzz/FuzzXxx/.
- Each file represents a once-found failure that the team has since fixed.
- Periodically audit: if a reproducer covers a code path that no longer exists, delete it.
Curated samples¶
- Stored under <package>/testdata/samples/ (your convention).
- Real-world inputs to bootstrap the seed corpus.
- Refresh periodically. Strip PII and sensitive data.
Size targets¶
- testdata/fuzz/FuzzXxx/: typically < 100 entries per target. Each entry should map to a fixed bug.
- testdata/samples/: 100–1000 entries. Curated, representative.
- $GOCACHE/fuzz/...: bounded by go clean -fuzzcache and CI cache eviction. Can reach gigabytes.
Seed selection¶
The seed corpus is the fuzzer's launchpad. A good seed reaches deep coverage; a bad one wastes mutation budget.
Selection criteria¶
- Diversity. Cover as many distinct code paths as possible. Use coverage reports to verify.
- Minimality. Smaller seeds mutate faster and produce smaller reproducers.
- Realism. Real-world inputs explore code that random bytes would not.
- Boundary cases. Empty input, single byte, maximum length, all-zero, all-0xff.
Selecting from a corpus¶
When you have 10,000 candidate seeds but f.Add can hold only a few hundred efficiently, prune:
- Run all 10,000 with coverage instrumentation enabled.
- Greedy-select: pick the input that covers the most unseen edges; repeat until all edges are covered.
- The minimal covering set is your seed corpus.
This is a manual coverage-minimisation pass. The native fuzzer does not provide it; tools like dvyukov/go-fuzz and OSS-Fuzz infrastructure do.
Seeding from production¶
A pipeline that captures (sanitised) production inputs and feeds them to testdata/samples/ is the gold standard. Each new release has fresh corpora; coverage tracks real usage patterns.
Sample optimisation case study¶
Baseline target¶
func FuzzParseRequest(f *testing.F) {
f.Add([]byte("GET / HTTP/1.1\r\nHost: x\r\n\r\n"))
f.Fuzz(func(t *testing.T, data []byte) {
req, err := ParseRequest(data)
if err != nil {
return
}
out := req.Format()
req2, err := ParseRequest([]byte(out))
if err != nil {
t.Fatal(err)
}
if !reflect.DeepEqual(req, req2) {
t.Fatal("round-trip")
}
})
}
Initial measurement: 3,200 iter/sec under -race. Profile shows:
- string([]byte) conversion in out := req.Format() — 22% of time.
- reflect.DeepEqual — 14% of time.
- Repeated regexp.Compile inside ParseRequest — 19% of time.
Optimisations applied¶
- Cache the regex at package level: var headerRE = regexp.MustCompile(...). Eliminates 19%.
- Replace reflect.DeepEqual with a custom equality function specialised for the Request type. Saves 12%.
- Have Format write to a reusable bytes.Buffer instead of returning a string. Saves 18%.
Result¶
Iteration rate: 9,800 iter/sec under -race. 3× speedup. Same coverage, same invariants, same corpus.
Takeaway¶
Profiling identifies the easy wins. Most fuzz targets have at least one 2–3× speedup available within an hour of work.
Summary¶
Optimising fuzz targets is engineering work with high payoff. The cost is paid once; the benefit accrues forever. Aim for at least 10,000 iter/sec under -race; pursue 100,000+ when feasible. Reduce allocations, use worker pools instead of per-iteration goroutine spawn, cap loop bounds derived from input, and profile representative replays to find the hot spots. Tune -fuzztime to a budget that fits CI windows. Shard fuzz targets across CI jobs for near-linear speedup. Curate seed corpora deliberately. Manage $GOCACHE/fuzz/ with periodic cleans. The discipline pays for itself the first time the fuzzer finds a race in seconds instead of hours.