Benchmarks — Tasks

Hands-on exercises. Do them in order: each one introduces a tool or technique you will need in the next.

Task 1 — Write your first benchmark

Goal. Produce a working benchmark and read its output.

Steps.

Create a new directory bench-task-01 and, inside it, run go mod init example/bench01.
Add a file sum.go:

package bench01

func Sum(xs []int) int {
    var s int
    for _, x := range xs {
        s += x
    }
    return s
}

Add a file sum_test.go:

package bench01

import "testing"

var input = make([]int, 1000)

func init() {
    for i := range input {
        input[i] = i
    }
}

func BenchmarkSum(b *testing.B) {
    for i := 0; i < b.N; i++ {
        _ = Sum(input)
    }
}

Run go test -bench=. -benchmem.

Deliverable. Paste the output line. Identify:

The chosen b.N.
The ns/op.
The B/op and allocs/op.

Expected observation. allocs/op should be 0 — Sum does not allocate.
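
The same four values can also be read programmatically. The sketch below assumes you run it as an ordinary program rather than through go test: it feeds the Task 1 loop to testing.Benchmark and prints the fields the deliverable asks for. The helper name measureSum and the sink variable are illustrative additions, not part of the task.

```go
package main

import (
	"fmt"
	"testing"
)

// Sum is the function from Task 1.
func Sum(xs []int) int {
	var s int
	for _, x := range xs {
		s += x
	}
	return s
}

// sink receives the result so the call to Sum cannot be discarded.
var sink int

// measureSum (hypothetical helper) benchmarks Sum on a slice of n ints
// and returns the raw result.
func measureSum(n int) testing.BenchmarkResult {
	xs := make([]int, n)
	for i := range xs {
		xs[i] = i
	}
	testing.Init() // registers the testing flags; no effect if already called
	return testing.Benchmark(func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			sink = Sum(xs)
		}
	})
}

func main() {
	r := measureSum(1000)
	fmt.Println("b.N:      ", r.N)
	fmt.Println("ns/op:    ", r.NsPerOp())
	fmt.Println("B/op:     ", r.AllocedBytesPerOp())
	fmt.Println("allocs/op:", r.AllocsPerOp())
}
```

The slice is built before testing.Benchmark is called, so its allocation is not part of the measurement.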

Task 2 — Convert to table-driven

Goal. Use b.Run so a single benchmark function exercises many input sizes.

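Refactor BenchmarkSum so it runs four sub-benchmarks: sizes 100, 1_000, 10_000, 100_000. Add "fmt" to the imports of sum_test.go, because the sub-benchmark names are built with fmt.Sprintf.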

func BenchmarkSum(b *testing.B) {
    sizes := []int{100, 1_000, 10_000, 100_000}
    for _, n := range sizes {
        xs := make([]int, n)
        for i := range xs {
            xs[i] = i
        }
        b.Run(fmt.Sprintf("n=%d", n), func(b *testing.B) {
            for i := 0; i < b.N; i++ {
                _ = Sum(xs)
            }
        })
    }
}

Run with -benchmem. Verify ns/op scales roughly linearly with n. If it does not, your benchmark has a bug — find it.

Task 3 — Add `b.SetBytes`

Sum reads n*8 bytes per call (assuming int is 8 bytes). Inside each sub-benchmark, call b.SetBytes(int64(len(xs) * 8)).

Deliverable. Output that now includes a MB/s column. The number should be roughly constant across sizes — that is the bandwidth your CPU can sustain on a tight integer-sum loop. Note the value.
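
The MB/s column is plain arithmetic on numbers you already have: the byte count passed to b.SetBytes divided by the time per operation. A minimal sketch of that calculation follows; the function name mbPerSec and the input values are hypothetical, chosen only to make the arithmetic easy to follow.

```go
package main

import "fmt"

// mbPerSec converts the per-iteration byte count given to b.SetBytes and
// the measured ns/op into megabytes (10^6 bytes) per second.
func mbPerSec(bytesPerOp int64, nsPerOp float64) float64 {
	seconds := nsPerOp / 1e9
	return float64(bytesPerOp) / 1e6 / seconds
}

func main() {
	// Hypothetical numbers, not a measurement: 1000 ints of 8 bytes each,
	// summed in 1000 ns per call.
	fmt.Printf("%.0f MB/s\n", mbPerSec(1000*8, 1000))
}
```

With these made-up inputs, 8000 bytes in one microsecond works out to 8000 MB/s. Use the function on your own ns/op values to check the column that go test prints.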

Task 4 — Setup excluded with `b.ResetTimer`

Build a benchmark whose setup is deliberately slow. Compare with and without b.ResetTimer.

func BenchmarkWithSetup(b *testing.B) {
    // Expensive setup we do NOT want timed.
    data := make([]byte, 10_000_000)
    for i := range data {
        data[i] = byte(i)
    }
    // Without ResetTimer: setup time inflates ns/op.
    // With ResetTimer: only the inner work is measured.
    b.ResetTimer()
    for i := 0; i < b.N; i++ {
        _ = data[i%len(data)]
    }
}

Run the benchmark twice — once with the b.ResetTimer() line, once without. Report the ratio of ns/op. Explain.

Task 5 — Compare two implementations with `benchstat`

Goal. Demonstrate a real comparison workflow.

Install benchstat:

go install golang.org/x/perf/cmd/benchstat@latest

Write two implementations of string concatenation:

package bench05

import "strings"

func ConcatPlus(parts []string) string {
    var s string
    for _, p := range parts {
        s += p
    }
    return s
}

func ConcatBuilder(parts []string) string {
    var b strings.Builder
    for _, p := range parts {
        b.WriteString(p)
    }
    return b.String()
}

Benchmark both with the same input, in a _test.go file of the same package that imports "testing":

var parts = make([]string, 100)

func init() {
    for i := range parts {
        parts[i] = "abcdef"
    }
}

func BenchmarkConcatPlus(b *testing.B) {
    b.ReportAllocs()
    for i := 0; i < b.N; i++ {
        _ = ConcatPlus(parts)
    }
}

func BenchmarkConcatBuilder(b *testing.B) {
    b.ReportAllocs()
    for i := 0; i < b.N; i++ {
        _ = ConcatBuilder(parts)
    }
}

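Run each ten times. benchstat pairs results by benchmark name, so give both result sets a common name (here with sed) before comparing; otherwise the two benchmarks are listed separately with no delta:

go test -bench=ConcatPlus -count=10 -benchmem | sed 's/ConcatPlus/Concat/' > plus.txt
go test -bench=ConcatBuilder -count=10 -benchmem | sed 's/ConcatBuilder/Concat/' > builder.txt
benchstat plus.txt builder.txt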

Deliverable. Paste the benchstat output. Identify the percentage delta and the p-value.
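
The percentage delta is the relative change between the two summary values benchstat prints for each file. As a rough sketch of that arithmetic (the function name percentDelta and the sample values are hypothetical, not measured results):

```go
package main

import "fmt"

// percentDelta returns the relative change from oldNs to newNs in percent.
// A negative value means the new implementation takes fewer ns/op.
func percentDelta(oldNs, newNs float64) float64 {
	return (newNs - oldNs) / oldNs * 100
}

func main() {
	// Hypothetical summary values in ns/op, not real measurements.
	fmt.Printf("%+.2f%%\n", percentDelta(2000, 500))
}
```

Going from 2000 ns/op to 500 ns/op is a delta of -75%. The p-value is a separate question: it tells you how likely a difference of that size is to appear from noise alone, which this arithmetic does not address.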

Task 6 — Identify a benchmark trap

The following benchmark gives 0.27 ns/op. Explain why and fix it.

package bench06

import "testing"

func square(x int) int { return x * x }

func BenchmarkSquare(b *testing.B) {
    for i := 0; i < b.N; i++ {
        square(i)
    }
}

Expected fix. Either assign to a package-level sink:

var sink int

func BenchmarkSquare(b *testing.B) {
    var s int
    for i := 0; i < b.N; i++ {
        s = square(i)
    }
    sink = s
}

Or use for b.Loop() on Go 1.24+. Run both forms; compare ns/op.
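
A possible shape for the b.Loop form is sketched below. It requires Go 1.24 or later. Running it through testing.Benchmark from a main function is a choice made here to keep the example self-contained; in the task you would put benchSquareLoop's body into BenchmarkSquare and run go test as usual.

```go
package main

import (
	"fmt"
	"testing"
)

func square(x int) int { return x * x }

// benchSquareLoop is the Task 6 benchmark rewritten with b.Loop.
// The name is illustrative; b.Loop needs Go 1.24 or later.
func benchSquareLoop(b *testing.B) {
	i := 0
	for b.Loop() {
		// No sink is used here: b.Loop is documented to keep calls in
		// the loop body from being optimized away.
		square(i)
		i++
	}
}

func main() {
	testing.Init() // registers the testing flags; no effect if already called
	r := testing.Benchmark(benchSquareLoop)
	fmt.Println(r.N, "iterations in", r.T)
}
```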

Task 7 — `RunParallel` on a mutex-protected counter

Implement two counters: one guarded by sync.Mutex, one using atomic.Int64 from sync/atomic (Go 1.19+). Benchmark both under contention with b.RunParallel. The test file needs to import sync, sync/atomic, and testing.

func BenchmarkMutexCounter(b *testing.B) {
    var (
        mu sync.Mutex
        n  int64
    )
    b.RunParallel(func(pb *testing.PB) {
        for pb.Next() {
            mu.Lock()
            n++
            mu.Unlock()
        }
    })
}

func BenchmarkAtomicCounter(b *testing.B) {
    var n atomic.Int64
    b.RunParallel(func(pb *testing.PB) {
        for pb.Next() {
            n.Add(1)
        }
    })
}

Run with -cpu=1,2,4,8 to see scaling. Report which scales better.
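
Before comparing speed, it helps to confirm that both counters are correct under concurrency. The sketch below wraps each counter in a small type and increments it from several goroutines. The type names MutexCounter and AtomicCounter and the helper hammer are hypothetical; the task's benchmarks inline the same logic.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// MutexCounter guards a plain int64 with a mutex.
type MutexCounter struct {
	mu sync.Mutex
	n  int64
}

func (c *MutexCounter) Inc() {
	c.mu.Lock()
	c.n++
	c.mu.Unlock()
}

func (c *MutexCounter) Value() int64 {
	c.mu.Lock()
	defer c.mu.Unlock()
	return c.n
}

// AtomicCounter uses atomic.Int64, available since Go 1.19.
type AtomicCounter struct{ n atomic.Int64 }

func (c *AtomicCounter) Inc()         { c.n.Add(1) }
func (c *AtomicCounter) Value() int64 { return c.n.Load() }

// hammer calls inc perG times from each of g goroutines and waits for
// all of them to finish.
func hammer(inc func(), g, perG int) {
	var wg sync.WaitGroup
	for i := 0; i < g; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for j := 0; j < perG; j++ {
				inc()
			}
		}()
	}
	wg.Wait()
}

func main() {
	var m MutexCounter
	var a AtomicCounter
	hammer(m.Inc, 8, 1000)
	hammer(a.Inc, 8, 1000)
	fmt.Println(m.Value(), a.Value()) // both should report 8000
}
```

Running this with go run -race is a cheap way to confirm that neither counter has a data race before you start timing them.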

Task 8 — Profile-collecting run

For the slower of the two counters from Task 7, collect a CPU profile and a mutex profile.

go test -bench=BenchmarkMutexCounter -cpuprofile=cpu.out -mutexprofile=mutex.out -count=1
go tool pprof -top cpu.out
go tool pprof -top mutex.out

Deliverable. The top three entries of each profile. Explain where contention shows up.

Task 9 — Reproducibility experiment

Run Task 5's benchmark five times on your laptop, each time with -count=10. Save each as run-N.txt. Then run:

benchstat run-1.txt run-2.txt run-3.txt run-4.txt run-5.txt

Question. Do the reported averages drift across runs? By how much? This is your laptop's noise floor: an improvement smaller than this drift cannot be told apart from noise.
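
One simple way to put a number on the drift is the spread of the per-run averages relative to their mean. This is a sketch of one possible metric, not something benchstat computes for you; the function name relativeSpread and the sample values are hypothetical.

```go
package main

import "fmt"

// relativeSpread returns (max - min) / mean of xs, in percent.
// It assumes xs is non-empty and its mean is non-zero.
func relativeSpread(xs []float64) float64 {
	lo, hi, sum := xs[0], xs[0], 0.0
	for _, x := range xs {
		if x < lo {
			lo = x
		}
		if x > hi {
			hi = x
		}
		sum += x
	}
	mean := sum / float64(len(xs))
	return (hi - lo) / mean * 100
}

func main() {
	// Hypothetical ns/op averages from five runs, not real measurements.
	runs := []float64{100, 102, 98, 101, 99}
	fmt.Printf("noise floor: about %.1f%%\n", relativeSpread(runs))
}
```

For these made-up values the spread is 4 ns/op around a mean of 100 ns/op, so a claimed speedup of 2% on that machine would not be credible without more controlled runs.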

Task 10 — Stretch goal: noise reduction

If you are on a Linux box:

Set the CPU governor to performance (root):

echo performance | sudo tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor

Disable turbo (Intel):

echo 1 | sudo tee /sys/devices/system/cpu/intel_pstate/no_turbo

Pin to one physical core:

taskset -c 3 go test -bench=. -count=10 -benchmem > pinned.txt

Compare pinned.txt to a normal run. The reduction in stddev is what professional benchmarkers buy with this setup.
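
To compare the two runs numerically, you can copy the ns/op values out of each file and compute the sample standard deviation yourself. The sketch below shows that calculation; the function name meanStddev and both data sets are hypothetical stand-ins for your own measurements.

```go
package main

import (
	"fmt"
	"math"
)

// meanStddev returns the mean and the sample standard deviation
// (n-1 denominator) of xs. It assumes len(xs) >= 2.
func meanStddev(xs []float64) (mean, sd float64) {
	for _, x := range xs {
		mean += x
	}
	mean /= float64(len(xs))
	var ss float64
	for _, x := range xs {
		d := x - mean
		ss += d * d
	}
	return mean, math.Sqrt(ss / float64(len(xs)-1))
}

func main() {
	// Hypothetical ns/op samples, not real measurements.
	normal := []float64{100, 110, 90, 105, 95}
	pinned := []float64{100, 101, 99, 100, 100}

	for name, xs := range map[string][]float64{"normal": normal, "pinned": pinned} {
		mean, sd := meanStddev(xs)
		fmt.Printf("%s: mean %.1f ns/op, stddev %.2f (%.1f%% of mean)\n",
			name, mean, sd, sd/mean*100)
	}
}
```

Expressing the standard deviation as a percentage of the mean makes the two runs comparable even if pinning also shifts the average.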
