Concurrent Fuzzing — Optimisation¶
Making the fuzzer fast: reducing per-iteration cost, tuning -fuzztime, sharding across CI, managing corpora at scale, and keeping -race overhead in check.
Table of Contents¶
- Why fuzzer speed matters
- Measuring iteration rate
- Reducing per-iteration cost
- Allocation reduction
- Goroutine overhead in the harness
- Reducing -race overhead
- Tuning -fuzztime and -fuzzminimizetime
- Sharding across CI
- Corpus management at scale
- Seed selection
- Sample optimisation case study
- Summary
Why fuzzer speed matters¶
A fuzzer that runs at 100,000 iterations/sec is ~100× more likely to find a rare bug in a fixed time window than one that runs at 1,000 iter/sec. The cost of optimising a fuzz target pays off in every subsequent run, every developer's local fuzz session, and every CI invocation.
Faster iteration rate means:
- Bugs are found faster.
- CI budgets can be smaller for the same coverage.
- Developers will actually run fuzz tests locally because they finish in seconds.
The goal is not to make the fuzz target "fast" in the sense of complete-test-suite optimisation. It is to make each iteration cheap enough that mutation can explore a million inputs in minutes.
Measuring iteration rate¶
The fuzzer prints its progress periodically:
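A representative progress line looks like the following (numbers are illustrative; exact wording varies across Go versions):

```
fuzz: elapsed: 3s, execs: 25190 (8396/sec), new interesting: 12 (total: 18)
```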
execs/sec is the per-process iteration rate. With -parallel=8, the total throughput is roughly 8× this number.
Baseline: what is "fast enough"?¶
- 100,000+ iter/sec per worker: ideal. Pure Go, no allocations, no goroutines.
- 10,000–100,000 iter/sec: good. Light parsing, some allocations.
- 1,000–10,000 iter/sec: acceptable for concurrent harnesses with goroutines.
- < 1,000 iter/sec: investigate. Likely heavy allocation, network, or mutex contention.
With -race, divide by 5–10.
Profiling a fuzz target¶
go test does not expose -cpuprofile cleanly during fuzzing because of the multi-process model. Instead, profile a single-input replay of a representative seed:
func TestProfileFuzzBody(t *testing.T) {
	f, err := os.Create("cpu.prof")
	if err != nil {
		t.Fatal(err)
	}
	defer f.Close()
	if err := pprof.StartCPUProfile(f); err != nil {
		t.Fatal(err)
	}
	defer pprof.StopCPUProfile()
	raw := []byte("...") // a representative input
	for i := 0; i < 10000; i++ {
		fuzzBody(raw) // same code your f.Fuzz callback runs
	}
}
Then go tool pprof cpu.prof. The hot spots are the same hot spots the fuzzer faces.
Reducing per-iteration cost¶
The most common per-iteration cost categories:
- Allocation.
- Goroutine spawn.
- sync.Pool misuse.
- string([]byte) conversion.
- fmt.Sprintf in hot paths.
- Logging (anything writing to stderr).
Allocation¶
Each make, each append past capacity, each string([]byte) conversion, each interface boxing has a cost. The fuzzer pays that cost a million times over. Minimise.
// BAD: allocates per iteration
f.Fuzz(func(t *testing.T, data []byte) {
s := string(data)
parts := strings.Split(s, ",")
_ = parts
})
// BETTER: avoid string conversion
f.Fuzz(func(t *testing.T, data []byte) {
parsePartsBytes(data) // operate on []byte directly
})
Goroutine spawn¶
A fuzz harness that spawns 4 goroutines per iteration may pay 1–4 microseconds per iteration just on goroutine creation. For a target that should run at 100k/sec, the whole budget is 10 microseconds; goroutine creation alone can consume up to half of it.
Two strategies:
- Pre-spawned worker pool. Construct N worker goroutines once. Each iteration sends a small task through a buffered channel. Workers run until the fuzz session ends.
- Reduce goroutine count per iteration. 4 goroutines find most concurrent bugs; 32 are usually overkill.
Worker-pool pattern¶
var workerPool struct {
	once sync.Once
	in   chan func()
	done chan struct{}
}

func ensureWorkerPool(n int) {
	workerPool.once.Do(func() {
		workerPool.in = make(chan func(), n)
		workerPool.done = make(chan struct{}, n) // must be initialised, or sends block forever
		for i := 0; i < n; i++ {
			go func() {
				for fn := range workerPool.in {
					fn()
					workerPool.done <- struct{}{}
				}
			}()
		}
	})
}

f.Fuzz(func(t *testing.T, data []byte) {
	ensureWorkerPool(4)
	ops := decode(data)
	for _, op := range ops {
		op := op
		workerPool.in <- func() { runOp(op) }
	}
	for range ops {
		<-workerPool.done
	}
})
Trade-off: the pool persists state across iterations. You must ensure each iteration constructs the SUT fresh and never relies on the pool's state.
String / byte conversions¶
Go's string([]byte) and []byte(string) allocate. The compiler optimises some cases (e.g. m[string(b)] for map lookup), but most cases allocate. Audit hot loops.
Allocation reduction¶
A fuzz body that allocates 10 bytes per iteration at 100,000 iter/sec is 1 MB/sec of garbage. The GC keeps up, but the fuzzer slows. Aim for zero allocations per iteration where feasible.
Use sync.Pool carefully¶
var bufferPool = sync.Pool{
	New: func() any {
		b := make([]byte, 0, 256)
		return &b // pool a pointer: storing a bare []byte in an interface allocates on every Put
	},
}

f.Fuzz(func(t *testing.T, data []byte) {
	bufp := bufferPool.Get().(*[]byte)
	defer func() {
		*bufp = (*bufp)[:0]
		bufferPool.Put(bufp)
	}()
	_ = doSomething(*bufp, data)
})
Caveat: sync.Pool is not free. For very short-lived buffers, on-stack allocation may be faster. Benchmark.
Reuse slices¶
Within an iteration, reuse a single slice for all operation lists:
var ops []op // declared at package level; reset each iteration
f.Fuzz(func(t *testing.T, data []byte) {
ops = ops[:0]
ops = decodeOpsInto(data, ops)
// ...
})
This is one of the few exceptions to "no cross-iteration state." The slice is capacity-only shared; the length is reset.
Avoid fmt.Sprintf in fuzz bodies¶
fmt.Sprintf allocates and parses format strings. In a fuzz body that runs a million times per minute, this is wasted CPU. Use strconv.AppendInt, bytes.Buffer, or a hand-rolled formatter.
Goroutine overhead in the harness¶
Spawning a goroutine costs ~1 microsecond on modern hardware. Joining via sync.WaitGroup costs another. For a fuzz harness that spawns 4 goroutines per iteration, the floor is ~5 microseconds, or 200,000 iter/sec — before any actual work.
Strategies¶
- Pre-spawned worker pool (see above).
- Single-goroutine harness for sequential property checking. If the SUT is sequential, you do not need goroutines at all.
- errgroup.Group for cleaner cancellation, at the same cost as raw goroutines.
- A reused sync.WaitGroup. Constructing a new WaitGroup per iteration costs little; reusing one across iterations is fine as long as the counter has returned to zero before the next Add.
Avoid time.Sleep in fuzz harnesses¶
Even a 1 ms Sleep caps iteration rate at 1,000/sec, two orders of magnitude below the 100,000/sec ideal. Use channels, WaitGroup, or pre-known synchronisation, never Sleep for "let things settle."
Reducing -race overhead¶
The race detector slows execution 5–15×. For fuzzing, this is usually worth it. But you can keep the overhead manageable.
Strategy 1: Two-phase fuzzing¶
Run two separate fuzz jobs:
- -fuzz=FuzzXxx -fuzztime=5m (no -race). High iteration rate; finds panics and invariant violations across many inputs.
- -fuzz=FuzzXxx -fuzztime=5m -race. Lower rate; finds races on whatever inputs the race build can mutate to.
Both share the persistent corpus, so the no-race job's discoveries seed the race job.
Strategy 2: Shorter iteration budget under -race¶
Under -race, each iteration costs more. Reduce per-iteration work:
- Spawn fewer goroutines (2–4 instead of 8–16).
- Run fewer operations per iteration (16 instead of 32).
- Use smaller input bound caps.
The race detector still catches races on smaller inputs; it just runs many more of them.
Strategy 3: Avoid -race in tight loops¶
If your fuzz function spawns thousands of goroutines per iteration, even one iteration may take seconds under -race. Cap the loop:
8 goroutines is enough to find most concurrency bugs; 800 just slows the detector.
Strategy 4: Memory bound checks¶
-race uses ~5–10× more memory than the base build. Workers may OOM. Set process limits and adjust -parallel downward if needed.
Tuning -fuzztime and -fuzzminimizetime¶
-fuzztime¶
| Setting | When to use |
|---|---|
| 10s | Smoke test before pushing |
| 30s | Quick local exploration |
| 5m | Reasonable per-PR check |
| 10m | Nightly CI per target |
| 1h | Pre-release validation |
| 24h | Critical security code |
| (unlimited) | OSS-Fuzz-style continuous fuzzing |
For most teams, 10m per nightly target is the sweet spot: long enough to discover meaningful coverage, short enough that the CI matrix completes before morning.
-fuzzminimizetime¶
Minimisation budget. The default 60s is reasonable. Increase to 5m if your inputs are large and minimisation matters; decrease to 10s if you do not care about minimal reproducers and want to maximise mutation time.
-fuzzminimizetime=0x disables minimisation entirely — useful when triaging known issues where the input shape is already understood.
Iteration-count vs duration¶
-fuzztime also accepts an iteration count via the Nx syntax, giving an equal budget across runs. This is useful for benchmarking iteration rate across changes: use -fuzztime=10000x to compare before and after a fuzz-target change.
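For example, to run exactly 10,000 iterations of the case-study target:

```
go test -run=^$ -fuzz=^FuzzParseRequest$ -fuzztime=10000x
```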
Sharding across CI¶
A single go test -fuzz runs GOMAXPROCS workers. To use more parallelism, shard at the CI level.
Shard by target¶
One CI job per fuzz target. Easy and effective. Each job runs go test -fuzz=FuzzXxx -fuzztime=10m -race. The matrix runs in parallel.
Shard by seed subset¶
If a single target has thousands of seeds, split them across multiple CI jobs. Each job loads a subset:
func FuzzParseShard(f *testing.F) {
shard := os.Getenv("FUZZ_SHARD")
patterns := map[string]string{
"a": "testdata/samples/[abc]*.bin",
"b": "testdata/samples/[def]*.bin",
"c": "testdata/samples/[ghi]*.bin",
}
matches, _ := filepath.Glob(patterns[shard])
for _, p := range matches {
b, _ := os.ReadFile(p)
f.Add(b)
}
f.Fuzz(/* ... */)
}
Each CI matrix entry sets FUZZ_SHARD=a, b, or c. Discovery is independent per shard, but they share the committed corpus.
Combine corpora at end¶
After each shard runs, archive its generated corpus. A periodic merge job downloads all shards' corpora and uploads a combined corpus as the seed for future runs.
- run: go test -run=^$ -fuzz=^FuzzXxx$ -fuzztime=10m -fuzzcachedir=$HOME/corpus -race
- uses: actions/upload-artifact@v4
  with:
    name: corpus-${{ matrix.shard }}
    path: $HOME/corpus
Corpus management at scale¶
A successful fuzzing program generates large corpora. Manage them deliberately.
Generated corpus¶
- Cleared with go clean -fuzzcache. Useful when:
  - Upgrading Go.
  - The corpus has grown to gigabytes.
  - Internal coverage instrumentation has changed.
- Per-package, per-target. Use -fuzzcachedir to share across CI runs.
Committed reproducers¶
- Stored under <package>/testdata/fuzz/FuzzXxx/.
- Each file represents a once-found failure that the team has since fixed.
- Periodically audit: if a reproducer covers a code path that no longer exists, delete it.
Curated samples¶
- Stored under <package>/testdata/samples/ (your convention).
- Real-world inputs to bootstrap the seed corpus.
- Refresh periodically. Strip PII and sensitive data.
Size targets¶
- testdata/fuzz/FuzzXxx/: typically < 100 entries per target. Each entry should map to a fixed bug.
- testdata/samples/: 100–1000 entries. Curated, representative.
- $GOCACHE/fuzz/...: bounded by go clean -fuzzcache and CI cache eviction. Can reach gigabytes.
Seed selection¶
The seed corpus is the fuzzer's launchpad. A good seed reaches deep coverage; a bad one wastes mutation budget.
Selection criteria¶
- Diversity. Cover as many distinct code paths as possible. Use coverage reports to verify.
- Minimality. Smaller seeds mutate faster and produce smaller reproducers.
- Realism. Real-world inputs explore code that random bytes would not.
- Boundary cases. Empty input, single byte, maximum length, all-zero, all-0xff.
Selecting from a corpus¶
When you have 10,000 candidate seeds but f.Add can hold only a few hundred efficiently, prune:
- Run all 10,000 with coverage instrumentation enabled.
- Greedy-select: pick the input that covers the most unseen edges; repeat until all edges are covered.
- The minimal covering set is your seed corpus.
This is a manual coverage-minimisation pass. The native fuzzer does not provide it; tools like dvyukov/go-fuzz and OSS-Fuzz infrastructure do.
Seeding from production¶
A pipeline that captures (sanitised) production inputs and feeds them to testdata/samples/ is the gold standard. Each new release has fresh corpora; coverage tracks real usage patterns.
Sample optimisation case study¶
Baseline target¶
func FuzzParseRequest(f *testing.F) {
f.Add([]byte("GET / HTTP/1.1\r\nHost: x\r\n\r\n"))
f.Fuzz(func(t *testing.T, data []byte) {
req, err := ParseRequest(data)
if err != nil {
return
}
out := req.Format()
req2, err := ParseRequest([]byte(out))
if err != nil {
t.Fatal(err)
}
if !reflect.DeepEqual(req, req2) {
t.Fatal("round-trip")
}
})
}
Initial measurement: 3,200 iter/sec under -race. Profile shows:
- string([]byte) conversion in out := req.Format() — 22% of time.
- reflect.DeepEqual — 14% of time.
- Repeated regexp.Compile inside ParseRequest — 19% of time.
Optimisations applied¶
- Cache the regex at package level: var headerRE = regexp.MustCompile(...). Eliminates 19%.
- Replace reflect.DeepEqual with a custom equality function specialised for the Request type. Saves 12%.
- Have Format write to a reusable bytes.Buffer instead of returning a string. Saves 18%.
Result¶
Iteration rate: 9,800 iter/sec under -race. 3× speedup. Same coverage, same invariants, same corpus.
Takeaway¶
Profiling identifies the easy wins. Most fuzz targets have at least one 2–3× speedup available within an hour of work.
Summary¶
Optimising fuzz targets is engineering work with high payoff. The cost is paid once; the benefit accrues forever. Aim for at least 10,000 iter/sec under -race; pursue 100,000+ when feasible. Reduce allocations, use worker pools instead of per-iteration goroutine spawn, cap loop bounds derived from input, and profile representative replays to find the hot spots. Tune -fuzztime to a budget that fits CI windows. Shard fuzz targets across CI jobs for near-linear speedup. Curate seed corpora deliberately. Manage $GOCACHE/fuzz/ with periodic cleans. The discipline pays for itself the first time the fuzzer finds a race in seconds instead of hours.