
Fuzzing — Professional


Fuzzing in a hobby project is a curiosity. Fuzzing in a production codebase is a discipline. The transition is not about learning more f.Fuzz syntax — it is about learning how to choose targets, allocate CPU budget, triage findings, coordinate with a security team, and run the whole thing as boring, predictable infrastructure that catches real bugs before customers do.

This page is the operator manual. By the time you reach it you already know how to write a FuzzXxx function, manage a seed corpus, and read a crash file. The remaining questions are organizational: where do we point the fuzzer, how long do we run it, what do we do with the crashes, and how do we prove the program is worth the cycles it consumes?


1. What is worth fuzzing in production code

The single biggest mistake teams make when adopting fuzzing is "fuzz everything." Fuzzing has a cost — CPU time, engineer attention to triage, flaky CI minutes — and not every function pays it back. A useful target has three properties: a wide input surface, a meaningful invariant the fuzzer can detect, and a real adversary that might exercise it.

1.1 Parsers and decoders

Parsers are the canonical fuzz target. They consume untrusted byte sequences, they have deep state, and a bug typically presents as a panic or an infinite loop — both of which fuzzing detects natively. Within a typical Go service the parser surface looks like this:

// Strong fuzz candidates
package config

func ParseYAML(data []byte) (*Config, error)
func ParseTOML(data []byte) (*Config, error)
func ParseJSON5(data []byte) (*Config, error)

package auth

func DecodeJWT(token string) (Claims, error)
func ParseCookie(raw string) (Session, error)

package protocol

func DecodeFrame(b []byte) (Frame, error)
func ParseHTTPRequest(r *bufio.Reader) (*Request, error)

For each of these you can usually write a one-line oracle: "the decoder must not panic on any input." That alone catches dozens of bugs across a large codebase.
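A minimal sketch of that oracle follows. It assumes a hypothetical length-prefixed frame format, and the DecodeFrame body is an illustration rather than the real decoder from the protocol package. In a repository the fuzz function would live in a _test.go file; main is included only so the sketch runs on its own.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
	"testing"
)

// Frame is a hypothetical wire frame: a 2-byte big-endian length
// followed by that many payload bytes.
type Frame struct {
	Payload []byte
}

var errShort = errors.New("frame: truncated input")

// DecodeFrame is an illustrative decoder. Every slice access is preceded
// by a bounds check, which is the property the fuzz target exercises.
func DecodeFrame(b []byte) (Frame, error) {
	if len(b) < 2 {
		return Frame{}, errShort
	}
	n := int(binary.BigEndian.Uint16(b[:2]))
	if len(b)-2 < n {
		return Frame{}, errShort
	}
	return Frame{Payload: b[2 : 2+n]}, nil
}

// FuzzDecodeFrame relies on the implicit oracle: returning normally is a
// pass, and a panic inside DecodeFrame fails the run. The one explicit
// check is a cheap sanity bound on the decoded payload.
func FuzzDecodeFrame(f *testing.F) {
	f.Add([]byte{0x00, 0x01, 'x'}) // well-formed frame
	f.Add([]byte{0xff, 0xff})      // length prefix with no payload
	f.Fuzz(func(t *testing.T, data []byte) {
		fr, err := DecodeFrame(data)
		if err == nil && len(fr.Payload) > len(data) {
			t.Fatalf("payload longer than input: %d > %d", len(fr.Payload), len(data))
		}
	})
}

func main() {
	_, err := DecodeFrame([]byte{0xff, 0xff})
	fmt.Println("truncated frame:", err)
}
```

The target needs no knowledge of what a correct frame looks like, which is why this oracle scales across a codebase with so little effort per parser.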

1.2 Security boundaries

Anything that sits between an untrusted caller and a privileged operation is a fuzz target by default. Authentication token parsers, ACL evaluators, SQL query builders that accept user-supplied filters, URL parsers, path canonicalizers — these are the components where a bug is not just a crash but a potential CVE.

func FuzzAuthorize(f *testing.F) {
    seeds := []string{
        "GET /api/users",
        "POST /api/users",
        "GET /api/users/../admin",
        "GET /api/users/%2e%2e/admin",
    }
    for _, s := range seeds {
        f.Add(s)
    }

    f.Fuzz(func(t *testing.T, request string) {
        // Invariant: any request the parser accepts must satisfy the
        // canonical-path property. If it does not, we may have a path
        // traversal hidden behind encoding tricks.
        parsed, err := ParseRequest(request)
        if err != nil {
            return
        }
        if strings.Contains(parsed.Path, "..") {
            t.Fatalf("parser accepted traversal path: %q", parsed.Path)
        }
        if parsed.Path != path.Clean(parsed.Path) {
            t.Fatalf("non-canonical path slipped through: %q", parsed.Path)
        }
    })
}

The fuzzer is not looking for a panic here — it is looking for a violation of a security invariant. That is the most valuable kind of fuzz target in a production codebase, and it is also the easiest to justify to a security team when you are asking for CI budget.

1.3 State machines

Anything with a connection lifecycle — a database driver, an HTTP/2 framer, a gRPC stream, a websocket handler — is a hidden state machine, and the transition graph is where bugs hide. You can fuzz the transition function directly:

func FuzzConnectionLifecycle(f *testing.F) {
    f.Fuzz(func(t *testing.T, ops []byte) {
        c := newConn()
        for _, op := range ops {
            switch op % 6 {
            case 0:
                c.Handshake()
            case 1:
                c.Send([]byte("ping"))
            case 2:
                c.Receive()
            case 3:
                c.Close()
            case 4:
                c.Reset()
            case 5:
                c.Reconnect()
            }
        }
        // Invariant: after any sequence, internal counters must agree.
        if c.sent != c.acked+c.pending {
            t.Fatalf("counter drift: sent=%d acked=%d pending=%d",
                c.sent, c.acked, c.pending)
        }
    })
}

Each byte selects the next transition. The fuzzer rapidly explores the state-space graph and finds sequences a human writing example tests would never imagine.

1.4 Serialization round-trips

If your system marshals and unmarshals data — protobuf, msgpack, custom binary protocols, encryption envelopes — you almost certainly have a round-trip property:

func FuzzMarshalRoundTrip(f *testing.F) {
    f.Fuzz(func(t *testing.T, data []byte) {
        var v Message
        if err := v.Unmarshal(data); err != nil {
            return
        }
        out, err := v.Marshal()
        if err != nil {
            t.Fatalf("marshal after successful unmarshal failed: %v", err)
        }
        var v2 Message
        if err := v2.Unmarshal(out); err != nil {
            t.Fatalf("re-unmarshal failed: %v", err)
        }
        if !reflect.DeepEqual(v, v2) {
            t.Fatalf("round-trip mismatch:\n  in=%+v\n out=%+v", v, v2)
        }
    })
}

These tests are good at surfacing off-by-one errors, silent truncation, and forgotten fields. They are among the cheapest high-value fuzz tests you can write.


2. What NOT to fuzz

Equally important is recognizing what fuzzing is bad at.

  • Business logic with discrete cases. A function that takes OrderStatus and returns a boolean is not a fuzz target. It has six inputs. Write a table-driven test.
  • IO-bound code. A function that makes a network call cannot be fuzzed cheaply; you either burn time on the network or you stub the network and fuzz the parser around it. Fuzz the parser, not the IO.
  • Code that legitimately panics. Some functions (for example Must-style helpers) panic by contract. Either keep them out of the target or recover from the documented panic inside it; otherwise you drown in false positives.
  • Non-deterministic functions. If the same input produces different outputs across runs (clock, randomness, goroutine scheduling), the fuzzer cannot minimize a crash. Wrap the non-determinism behind an injectable seed.
  • Functions with effectively-unbounded resource use. A target that allocates 1GB on certain inputs will OOM the fuzzer. Add a guard on input size at the top of the fuzz function, and cap worker memory at the process level as a backstop.
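The last two bullets can be sketched in a few lines. Shuffler, NewShuffler, tooLarge, and the maxFuzzInput value are hypothetical names chosen for illustration. The point is only that the random source is injected and the size cap is checked before any expensive work.

```go
package main

import (
	"fmt"
	"math/rand"
)

// maxFuzzInput is an assumed cap; tune it per target.
const maxFuzzInput = 1 << 16

// Shuffler is a hypothetical component whose behaviour depends on
// randomness. The random source is injected so a test can pin it.
type Shuffler struct {
	rng *rand.Rand
}

// NewShuffler builds a Shuffler from an explicit seed.
func NewShuffler(seed int64) *Shuffler {
	return &Shuffler{rng: rand.New(rand.NewSource(seed))}
}

// Shuffle returns a permuted copy of data. With the same seed and the
// same input, the output is the same on every run.
func (s *Shuffler) Shuffle(data []byte) []byte {
	out := append([]byte(nil), data...)
	s.rng.Shuffle(len(out), func(i, j int) { out[i], out[j] = out[j], out[i] })
	return out
}

// tooLarge reports whether a fuzz input exceeds the size cap. Inside a
// fuzz function the caller would skip such inputs with t.Skip().
func tooLarge(data []byte) bool {
	return len(data) > maxFuzzInput
}

func main() {
	in := []byte("abcdef")
	a := NewShuffler(1).Shuffle(in)
	b := NewShuffler(1).Shuffle(in)
	fmt.Println("deterministic:", string(a) == string(b), "skip:", tooLarge(in))
}
```

A fuzz body built on this would construct the Shuffler with a fixed seed (or derive the seed from the fuzz input) so that a crashing input reproduces on replay.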

3. Fuzz budget allocation in a large monorepo

Suppose your Go monorepo has 400 packages. You cannot run every fuzz target every night for an hour each: even at one target per package and one core per target, that is 400 CPU-hours per day. You need a budget policy.

A reasonable starting allocation:

total CI budget for fuzz:        24 CPU-hours / 24h day
                                 -----
critical security boundaries:    12 CPU-hours, every commit, time-boxed 5min
core parsers (json, protobuf):    6 CPU-hours, nightly, 30min each
state machines & protocol:        4 CPU-hours, nightly, 15min each
new / changed targets in PR:      2 CPU-hours, per-PR, 60s each

The principle: spend CPU where bugs are most expensive. A bug in the JWT parser is worth far more attention than a bug in the metrics serializer. Be explicit about this in a written policy so engineers do not feel arbitrarily de-prioritized.

To enforce the policy mechanically, tag each target with a category:

// FuzzDecodeJWT: category security-critical, budget 5m per commit.
func FuzzDecodeJWT(f *testing.F) { /* ... */ }

A CI script reads the categories and dispatches runs accordingly. Do not let engineers hand-tune timeouts; centralize the policy.
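One possible shape for that dispatcher, as a sketch: the category names and per-target durations mirror the allocation table above, while the Category, Target, and Command names are hypothetical. The dispatcher only builds the command line; how targets get discovered and tagged is left open.

```go
package main

import (
	"fmt"
	"strings"
	"time"
)

// Category is a hypothetical label attached to each fuzz target.
type Category string

const (
	SecurityCritical Category = "security-critical"
	CoreParser       Category = "core-parser"
	StateMachine     Category = "state-machine"
	ChangedInPR      Category = "changed-in-pr"
)

// fuzzTime is the single, central place where budgets are set.
// The values follow the allocation table in this section.
var fuzzTime = map[Category]time.Duration{
	SecurityCritical: 5 * time.Minute,
	CoreParser:       30 * time.Minute,
	StateMachine:     15 * time.Minute,
	ChangedInPR:      60 * time.Second,
}

// Target identifies one fuzz function and its category.
type Target struct {
	Package  string
	Name     string
	Category Category
}

// Command returns the go test invocation for one target. An unknown
// category is an error, so a typo cannot silently get an unbounded run.
func Command(t Target) ([]string, error) {
	d, ok := fuzzTime[t.Category]
	if !ok {
		return nil, fmt.Errorf("target %s: unknown category %q", t.Name, t.Category)
	}
	return []string{
		"go", "test", t.Package,
		"-run=^$",
		"-fuzz=^" + t.Name + "$",
		"-fuzztime=" + d.String(),
	}, nil
}

func main() {
	args, err := Command(Target{Package: "./auth", Name: "FuzzDecodeJWT", Category: SecurityCritical})
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println(strings.Join(args, " "))
}
```

Because the duration comes from the map and never from the target itself, changing the policy is a one-line diff that is easy to review.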


4. Crash triage

When the fuzzer finds a crash, Go writes a reproducer to testdata/fuzz/FuzzXxx/<hash>. The file is a small text format:

go test fuzz v1
[]byte("AAAA\x00\xff\xff")

The first triage step is severity classification. A useful ladder:

  1. Cosmetic — a panic in a non-public helper that takes input only from trusted code. Log, fix in a normal PR, no urgency.
  2. Reliability — a panic in code reachable from network input that crashes the process. Fix this week, no security implications.
  3. Security-impacting — a panic, infinite loop, memory blowup, or invariant violation reachable from untrusted input where a remote attacker could trigger it. Trigger embargoed fix.
  4. Exploitable — a memory-safety bug (rare in pure Go but possible via unsafe or cgo) or an authentication bypass. Stop the line.

Each level dictates different process. Cosmetic bugs go through normal PR review. Security-impacting bugs go through a private branch, a CVE request, and coordinated disclosure.
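The ladder can be encoded so that routing is mechanical once a human has answered a few yes/no questions. This is a sketch under assumptions: the Finding fields are hypothetical triage answers, not something the fuzzer reports, and the mapping is one reading of the four levels above.

```go
package main

import "fmt"

// Severity follows the four-step ladder in this section.
type Severity int

const (
	Cosmetic Severity = iota + 1
	Reliability
	SecurityImpacting
	Exploitable
)

// Finding holds hypothetical triage answers filled in by a human.
type Finding struct {
	UntrustedInput bool // the crashing input can arrive from outside the trust boundary
	RemoteTrigger  bool // triage judged that a remote attacker could trigger it
	MemoryUnsafe   bool // memory-safety bug via unsafe or cgo
	AuthBypass     bool // authentication or authorization bypass
}

// Classify maps triage answers to a ladder level, highest level first.
func Classify(f Finding) Severity {
	switch {
	case f.MemoryUnsafe || f.AuthBypass:
		return Exploitable
	case f.UntrustedInput && f.RemoteTrigger:
		return SecurityImpacting
	case f.UntrustedInput:
		return Reliability
	default:
		return Cosmetic
	}
}

// Process returns the handling each level dictates, paraphrasing the ladder.
func Process(s Severity) string {
	switch s {
	case Exploitable:
		return "stop the line"
	case SecurityImpacting:
		return "embargoed fix on a private branch"
	case Reliability:
		return "fix this week"
	default:
		return "normal PR"
	}
}

func main() {
	f := Finding{UntrustedInput: true, RemoteTrigger: true}
	fmt.Println(Process(Classify(f)))
}
```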

4.1 Reading a crash file

// testdata/fuzz/FuzzDecodeJWT/8c9f7e
go test fuzz v1
string("eyJhbGciOiJub25lIn0..foo")

Open it, copy the literal into a regular table test, and confirm the bug reproduces under the normal test runner. If it does, you have a stable reproducer you can attach to a ticket and bisect against.

func TestDecodeJWT_Crash_8c9f7e(t *testing.T) {
    _, err := DecodeJWT("eyJhbGciOiJub25lIn0..foo")
    // Should error, must not panic.
    if err == nil {
        t.Fatal("expected error")
    }
}

This conversion — corpus file to named regression test — is the most important habit in production fuzzing. Every crash becomes a permanent test.
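The copy step can be automated. The sketch below reads the corpus file format shown above and returns the decoded values. It is deliberately partial: it handles only string(...) and []byte(...) lines and rejects every other type. ParseCorpusFile is a hypothetical helper name.

```go
package main

import (
	"errors"
	"fmt"
	"strconv"
	"strings"
)

// ParseCorpusFile decodes a "go test fuzz v1" corpus file. It supports
// only string and []byte values; any other line yields an error.
func ParseCorpusFile(content string) ([]any, error) {
	lines := strings.Split(strings.TrimRight(content, "\n"), "\n")
	if strings.TrimSpace(lines[0]) != "go test fuzz v1" {
		return nil, errors.New("corpus: missing version header")
	}
	var vals []any
	for _, line := range lines[1:] {
		line = strings.TrimSpace(line)
		if line == "" {
			continue
		}
		var lit string
		isBytes := false
		switch {
		case strings.HasPrefix(line, "string(") && strings.HasSuffix(line, ")"):
			lit = line[len("string(") : len(line)-1]
		case strings.HasPrefix(line, "[]byte(") && strings.HasSuffix(line, ")"):
			lit = line[len("[]byte(") : len(line)-1]
			isBytes = true
		default:
			return nil, fmt.Errorf("corpus: unsupported value %q", line)
		}
		// The literal is a quoted Go string; Unquote resolves escapes.
		s, err := strconv.Unquote(lit)
		if err != nil {
			return nil, fmt.Errorf("corpus: bad literal %q: %w", lit, err)
		}
		if isBytes {
			vals = append(vals, []byte(s))
		} else {
			vals = append(vals, s)
		}
	}
	return vals, nil
}

func main() {
	vals, err := ParseCorpusFile("go test fuzz v1\nstring(\"eyJhbGciOiJub25lIn0..foo\")\n")
	fmt.Println(vals, err)
}
```

A small generator can feed the decoded value into a test template, so the regression test is created in the same commit as the fix.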

4.2 Filing a CVE

If the bug is security-impacting and the affected code is published (a library used externally, or a service whose source is open), you must coordinate a CVE.

The Go security team has a public process via the vuln.go.dev database and GitHub Security Advisories. Steps:

  1. File a private security advisory on the repository (Security tab → Report a vulnerability).
  2. Request a CVE ID via the GitHub form or via MITRE if the repository is not on GitHub.
  3. Coordinate an embargo date with downstream consumers (typically 90 days, shorter for trivial fixes, longer if a large vendor needs lead time).
  4. Prepare the patch on a private fork.
  5. On embargo day, push the patch, publish the advisory, and notify golang-announce if the bug touches a stdlib-adjacent package.

Do not push the fix to a public branch before the embargo lifts. Do not name the bug in commit messages until disclosure. Do not file a public issue.


5. Working with a security team

Once fuzzing becomes part of the build, you have a continuous source of potential security findings. The engineering team and the security team need a shared playbook.

A working agreement that has held up in production at several companies:

  • All fuzz crashes from a designated security-critical category route automatically to a private Slack channel and a private issue tracker project. They never appear in the main bug tracker until triaged.
  • Security on-call has 24 hours to classify. After classification, ownership transfers back to the owning team for the fix.
  • The disclosure timeline is fixed at 90 days from confirmed reproduction, with an option to extend for coordinated fixes across multiple vendors.
  • Fixes ship with an internal-only post-mortem within 30 days. The post-mortem includes: how the input was reached, why the existing tests did not catch it, and what new fuzz target or invariant prevents regression.

Without this kind of agreement the security team treats every fuzz crash as an emergency, the engineering team treats every escalation as noise, and the program collapses within a quarter.


6. Continuous fuzzing in CI

There are three sensible cadences.

6.1 Per-commit time-boxed

For security-critical targets, fuzz on every push for a small budget (30–120 seconds per target). This is "validation fuzzing": it will not find deep bugs, but it catches regressions where a change reintroduces a crash on a known class of input. Combine with -run=^$ to suppress the normal test run and -fuzztime=60s to bound execution.

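# -fuzz accepts exactly one package and must match exactly one target,
# so name the package directly and anchor the pattern.
go test ./auth \
    -run=^$ \
    -fuzz=^FuzzDecodeJWT$ \
    -fuzztime=60s \
    -fuzzminimizetime=10s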

6.2 Nightly extended

For broader coverage, run each registered fuzz target for 15–60 minutes overnight on dedicated runners. Cache the resulting corpus between runs, and once a week promote a handful of its coverage-expanding inputs into the checked-in seed corpus. This is where most new bugs come from.

6.3 OSS-Fuzz integration

For open-source Go libraries, Google's OSS-Fuzz program runs continuous fuzzing on shared infrastructure free of charge. Onboarding is well documented and the value is enormous: OSS-Fuzz has independently found bugs in golang.org/x/text, golang.org/x/net, the standard library's encoding/gob, and many others. If your library is open source and security-sensitive, you should be on OSS-Fuzz.


7. Corpus management

The fuzz corpus is data. It has all the management challenges of any other dataset.

7.1 Checked-in seed corpus

The seed corpus lives in testdata/fuzz/FuzzXxx/ and is committed to the repository. Keep it small and curated: a handful of representative inputs, plus every previous crash reproducer. This is the "regression corpus" — it must be exercised on every CI run.

Bad practice:

testdata/fuzz/FuzzParseJSON/   # 40,000 generated files, 200MB total

Good practice:

testdata/fuzz/FuzzParseJSON/
  seed-empty-object
  seed-deep-nesting
  crash-2024-03-stack-overflow
  crash-2024-07-null-pointer
  crash-2025-01-utf16-surrogate

7.2 External corpus storage

The discovered corpus — the millions of inputs the fuzzer mutates into during long runs — should not live in git. It belongs in object storage (S3, GCS) and should be downloaded at fuzz time:

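# The cached corpus lives at $(go env GOCACHE)/fuzz/<package import path>/FuzzXxx.
# example.com/myrepo is a placeholder module path.
CORPUS="$(go env GOCACHE)/fuzz/example.com/myrepo/auth/FuzzDecodeJWT"
aws s3 sync s3://corpus-bucket/auth/jwt/ "$CORPUS/"
go test ./auth -run=^$ -fuzz=^FuzzDecodeJWT$ -fuzztime=1h
aws s3 sync "$CORPUS/" s3://corpus-bucket/auth/jwt/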

Compress before upload, deduplicate by hash, and rotate older corpus that no longer contributes coverage. A 5GB corpus that never grows the coverage map is waste.


8. War stories — real Go CVEs found by fuzz

The Go ecosystem has a healthy track record of CVEs found by native and external fuzzing. Read these as case studies in what fuzzing is good at.

  • encoding/xml infinite loop on crafted input. A crafted XML document caused the decoder to enter an infinite loop. Found by stdlib fuzz target. Fix shipped in a point release; CVE issued.
  • net/http request smuggling via chunked encoding. A combination of Content-Length and Transfer-Encoding headers parsed inconsistently between Go's server and certain proxies enabled request smuggling. Discovered by fuzzing the HTTP/1.1 parser.
  • golang.org/x/net/html quadratic complexity in malformed tags. A pathological tag-name pattern triggered O(n^2) work in the tokenizer. Performance bug, but reachable from any web crawler; treated as reliability.
  • crypto/x509 name constraint parsing panic. A specially-crafted certificate caused a nil-pointer dereference in name constraint checking. Found by fuzzing certificate parsing; classified as security-impacting because it affected TLS verification.

The shared lesson: each bug lived in a parser that handled adversarial input, each was missed by hand-written tests for years, and each was found in minutes once a fuzz target existed. The cost of fuzzing was paid back the first day the target shipped.


9. Combining fuzz with coverage

Fuzz exploration is guided by code coverage — the engine prefers inputs that exercise previously-uncovered branches. You can use the inverse signal too: after a fuzz run, dump the coverage profile and look for surviving uncovered regions.

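# 1. Fuzz as usual; new inputs land in the Go build cache.
go test ./parser \
    -run=^$ \
    -fuzz=^FuzzParse$ \
    -fuzztime=30m

# 2. Replay the corpus as ordinary subtests with coverage on. A profile
#    requested during the -fuzz run may miss what the worker processes ran.
#    Only f.Add seeds and testdata/fuzz/FuzzParse/ are replayed here; copy
#    cached inputs there first (without committing them) to include them.
go test ./parser -run=^FuzzParse$ -coverprofile=fuzz.cov

go tool cover -html=fuzz.cov -o fuzz.html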

Open the HTML report and look for red blocks inside the fuzz target's reachable code. Red regions after a long fuzz run usually mean one of three things: the fuzzer cannot construct inputs that reach the region (a hint to expand the seed corpus), the region is dead code, or the region depends on state the fuzzer cannot reach without external setup (a hint to refactor or add a separate test).

This habit — "what is still red after fuzzing?" — is one of the most productive uses of coverage data you can adopt.
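The same question can be answered without opening a browser. The sketch below reads the text coverage profile (one block per line: file, start and end position, statement count, hit count) and lists the blocks that were never hit. Uncovered and Block are hypothetical names, and the parser is a minimal one written for this example rather than the golang.org/x/tools/cover package.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// Block is one never-executed region from a coverage profile.
type Block struct {
	File      string
	StartLine int
	EndLine   int
	Stmts     int
}

// Uncovered returns every profile block whose hit count is zero.
// Each data line looks like:
//
//	path/file.go:startLine.startCol,endLine.endCol numStmts count
func Uncovered(profile string) ([]Block, error) {
	var out []Block
	sc := bufio.NewScanner(strings.NewReader(profile))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if line == "" || strings.HasPrefix(line, "mode:") {
			continue
		}
		colon := strings.LastIndex(line, ":")
		if colon < 0 {
			return nil, fmt.Errorf("cover: malformed line %q", line)
		}
		var sl, scol, el, ecol, stmts, count int
		_, err := fmt.Sscanf(line[colon+1:], "%d.%d,%d.%d %d %d",
			&sl, &scol, &el, &ecol, &stmts, &count)
		if err != nil {
			return nil, fmt.Errorf("cover: malformed line %q: %w", line, err)
		}
		if count == 0 {
			out = append(out, Block{File: line[:colon], StartLine: sl, EndLine: el, Stmts: stmts})
		}
	}
	return out, sc.Err()
}

func main() {
	profile := "mode: set\n" +
		"example.com/m/parser/parse.go:10.2,12.3 2 1\n" +
		"example.com/m/parser/parse.go:20.5,24.6 3 0\n"
	blocks, err := Uncovered(profile)
	if err != nil {
		fmt.Println(err)
		return
	}
	for _, b := range blocks {
		fmt.Printf("%s:%d-%d (%d statements never hit)\n", b.File, b.StartLine, b.EndLine, b.Stmts)
	}
}
```

Feeding this list into the nightly report turns "what is still red" into a tracked number per target instead of a manual inspection.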


10. Fuzz-as-validation in PR review

A useful PR-time check is: "for every changed function that has a fuzz target, run that target for 60 seconds against the diff." This catches regressions where a refactor silently weakens parsing logic.

A script that does this on a GitHub Actions runner:

name: fuzz-diff
on: pull_request
jobs:
  fuzz-diff:
    runs-on: ubuntu-latest
    timeout-minutes: 15
    steps:
      - uses: actions/checkout@v4
        with: { fetch-depth: 0 }  # full history, so origin/main...HEAD resolves
      - uses: actions/setup-go@v5
        with: { go-version: '1.22' }
      - name: detect changed packages
        id: changed
        run: |
          # One package per changed directory, joined onto a single line
          # because a step output written this way cannot span lines.
          pkgs=$(git diff --name-only origin/main...HEAD -- '*.go' \
                 | xargs -r -n1 dirname | sort -u \
                 | xargs -r -I{} go list ./{} 2>/dev/null | sort -u | tr '\n' ' ')
          echo "pkgs=$pkgs" >> "$GITHUB_OUTPUT"
      - name: fuzz changed packages
        env:
          PKGS: ${{ steps.changed.outputs.pkgs }}
        run: |
          for pkg in $PKGS; do
            targets=$(go test -list '^Fuzz' "$pkg" | grep '^Fuzz' || true)
            for t in $targets; do
              go test "$pkg" -run=^$ -fuzz="^${t}$" -fuzztime=60s || exit 1
            done
          done

Sixty seconds per target sounds tiny, but it is enough to surface input classes the existing corpus knew about. The job exists to enforce: "you cannot regress a known-good fuzz target without noticing."


11. Measuring fuzz effectiveness

Treat the fuzz program as you would any other engineering investment: it needs metrics, and the metrics need to be reported quarterly to whoever funds the CI bill.

Useful metrics:

  • Bugs per CPU-hour. Total unique bugs found, divided by total fuzz CPU hours. A healthy program lands somewhere in the range of 1 bug per 1,000–10,000 CPU-hours during steady state; the rate is much higher during the first month after a new target lands.
  • Coverage growth curve. Plot covered edges over wall-clock fuzz time per target. A target whose curve flattens within hours is "saturated" — reduce its budget and reinvest the cycles elsewhere.
  • Crash time-to-resolution. Median and p95 time from crash detection to merged fix. A growing p95 is a strong signal that triage has bottlenecked.
  • Regression rate. How often does a previously-fixed crash recur? Goal: zero. Non-zero means the crash regression test is not enforced.

Report all of this on the same dashboard as your normal test metrics. The fuzz program is part of the test program; do not silo its metrics.


12. Operating fuzz infrastructure

A few concrete recommendations from running fuzz at scale.

  • Dedicated runners. Do not share fuzz CPU with normal CI. Fuzzers will consume every cycle they can; co-tenanting them with build jobs causes unpredictable timeouts.
  • CPU isolation. Pin fuzz workers to specific cores. Allow only one fuzz worker per physical core; hyperthread siblings interfere measurably with coverage instrumentation throughput.
  • Memory limits. Bound each fuzz worker's memory with GOMEMLIMIT (a soft limit that the Go runtime tries to honor by collecting more aggressively) and an OS-level cgroup (the hard cap). A target that allocates wildly will starve siblings.
  • Watchdogs. Wrap each fuzz invocation in a hard wall-clock timeout via timeout(1). Go's -fuzztime is honored cooperatively and a stuck target can outlive it.
  • Artifact retention. Save every crash file forever. Even crashes that appear duplicate today may correspond to distinct root causes that only emerge during triage. Storage is cheap.
  • No GPU. Despite occasional internet folklore, native Go fuzzing does not benefit from GPU acceleration. It is CPU- and branch-prediction-bound. Spend on more cores instead.

# Example wrapper a CI runner might use to launch a single fuzz job
# with all the operational guardrails in place.
#
# timeout(1) is the outer kill-switch.
# -parallel sets the number of fuzz worker processes.
# GOMAXPROCS caps scheduler threads per process (and is the default for -parallel).
# GOMEMLIMIT sets a soft memory limit per process; pair it with a cgroup for a hard cap.
# -fuzztime bounds intended runtime.

timeout --signal=KILL 35m \
  env GOMAXPROCS=4 GOMEMLIMIT=4GiB \
  go test ./pkg \
    -run=^$ -fuzz=^FuzzParse$ \
    -fuzztime=30m \
    -fuzzminimizetime=2m \
    -parallel=4 \
    -timeout=35m

The discipline is simple: make every fuzz invocation reproducible from a single shell command, including resource limits. When something goes wrong at 03:00, the on-call engineer should be able to copy that command, change one flag, and reproduce the failure locally.


13. A worked example — adopting fuzz on an existing codebase

Pretend you inherit a 200-package Go monorepo that has never been fuzzed. A reasonable 90-day adoption plan:

Days 1–7. Inventory. Walk every package and tag each public function as parser, security boundary, state machine, serializer, or none-of-the-above. Output: a CSV. Most of the answers are "none-of-the-above" — that is fine.

Days 8–21. Pick the five highest-leverage targets. Almost always: the config parser, the auth token decoder, one binary protocol decoder, the URL or path normalizer, and the JSON-or-protobuf marshal round-trip. Write fuzz targets for each. Expect to find bugs within hours; do not be surprised.

Days 22–45. Build CI. Add the per-PR 60-second job, the nightly 30-minute job, and the corpus storage in S3. Wire crash output into the security private channel.

Days 46–60. Triage and fix. The first month produces the largest backlog; work it down. Convert every fixed crash into a regression test.

Days 61–90. Expand. Add five more targets. Tune budgets based on observed bug rates. Write the metrics dashboard. Brief leadership with numbers, not anecdotes.

By day 90 you have a real fuzz program, a curated set of high-value targets, a measurable bug rate, and an operational playbook. The next 90 days are incremental: more targets, longer corpus retention, eventual OSS-Fuzz integration if anything you maintain is public.
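The days 1–7 inventory can be bootstrapped with go/parser. The sketch below lists exported top-level functions in one source file and guesses a tag from the name prefix. Inventory, tagFor, and the prefix lists are assumptions, and the guess is only a first pass: state machines and anything tagged "none" still need a human to look at them.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
	"strings"
)

// tagFor guesses a category from a function name. The prefixes are a
// heuristic chosen for this sketch, not a convention Go enforces.
func tagFor(name string) string {
	switch {
	case strings.HasPrefix(name, "Parse"), strings.HasPrefix(name, "Decode"):
		return "parser"
	case strings.HasPrefix(name, "Marshal"), strings.HasPrefix(name, "Unmarshal"):
		return "serializer"
	case strings.HasPrefix(name, "Authorize"), strings.HasPrefix(name, "Verify"):
		return "security-boundary"
	default:
		return "none"
	}
}

// Inventory parses one Go source file and returns "pkg.Func,tag" rows
// for its exported top-level functions. Methods are skipped.
func Inventory(filename, src string) ([]string, error) {
	fset := token.NewFileSet()
	file, err := parser.ParseFile(fset, filename, src, 0)
	if err != nil {
		return nil, err
	}
	var rows []string
	for _, decl := range file.Decls {
		fn, ok := decl.(*ast.FuncDecl)
		if !ok || fn.Recv != nil || !fn.Name.IsExported() {
			continue
		}
		rows = append(rows, file.Name.Name+"."+fn.Name.Name+","+tagFor(fn.Name.Name))
	}
	return rows, nil
}

func main() {
	src := "package config\n" +
		"func ParseYAML(b []byte) error { return nil }\n" +
		"func helper() {}\n" +
		"func Run() {}\n"
	rows, err := Inventory("config.go", src)
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println(strings.Join(rows, "\n"))
}
```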


14. Final principles

Three rules that have survived every production deployment of Go fuzzing the author has seen.

  1. Treat the fuzzer as a colleague, not a magic wand. It will find bugs you would not have found, but only in places you point it, and only at invariants you teach it to check. The skill is target selection and invariant design, not running the command.
  2. Every crash becomes a regression test. Without this discipline the program leaks bugs back into the codebase faster than it removes them.
  3. Budget is finite; spend it where the adversary is. CPU you spend fuzzing your business logic is CPU you do not spend fuzzing your auth token parser. Be honest about the threat model and let it drive allocation.

Fuzzing is the cheapest, highest-yield testing technique Go has shipped in the last decade. Treated as a real engineering practice — with budgets, metrics, triage, and disclosure — it pays back its cost many times over. Treated as a curiosity, it fades out within a quarter. The difference is not technical; it is operational.

