
Generic Performance — Tasks

Exercise structure

  • 🟢 Easy — for beginners
  • 🟡 Medium — intermediate
  • 🔴 Hard — for senior engineers
  • 🟣 Expert — for specialists

A solution sketch for each exercise is provided at the end. Each task is designed to measure something — code that just compiles is not enough. Run the benchmarks.


Easy 🟢

Task 1 — Benchmark generic vs concrete Min

Write MinG[T cmp.Ordered], MinC(int, int) int, and benchmarks for both. Confirm they are within 5% of each other.

Task 2 — Benchmark generic vs interface Sum

Implement SumGen[T int | float64], SumIface(s []any) int, and SumConc(s []int) int. Benchmark on 1,000,000 elements. Report the relative cost.

Task 3 — Run -benchmem

Take Task 2's benchmarks and add b.ReportAllocs(). Identify which version allocates and how many bytes.

Task 4 — Inspect inlining

Run go build -gcflags="-m=2" . on a package that defines func Add[T int | float64](a, b T) T { return a + b }. Find the line that says it is inlined.

Task 5 — Read a stencil name

Write a tiny program that calls Find[int] and Find[string], build with go build -gcflags="-S", and locate the two distinct stencil symbols.


Medium 🟡

Task 6 — slices.Sort vs sort.Slice

Write benchmarks for sorting 10,000 random int values using both APIs. Confirm slices.Sort is at least 25% faster.

Task 7 — Generic vs interface cache

Build CacheG[K comparable, V any] and CacheI (uses map[any]any plus assertions). Benchmark Get/Set over 100,000 operations. Report ns/op and allocations.

Task 8 — Escape analysis surprise

Write a generic Wrap[T any](v T) *T { return &v }. Confirm via -gcflags="-m" that v escapes to the heap. Now write a non-generic WrapInt(v int) *int. Compare the escape decisions.

Task 9 — Pointer-shape penalty

Implement Find[T comparable](s []T, target T) int. Instantiate it for int, string, and three custom struct types. Benchmark each. Compare with hand-written equivalents for the slowest case.

Task 10 — Profile with pprof

Take a benchmark from Task 7 and run it with -cpuprofile=cpu.prof. Open the profile in go tool pprof. Locate the stencil-mangled symbol.

Task 11 — slices.SortFunc migration

Convert this code to use slices.SortFunc:

sort.Slice(users, func(i, j int) bool {
    return users[i].Age < users[j].Age
})
Benchmark before and after.

Task 12 — Allocation-free generic

Write Reverse[T any](s []T) []T that returns a new slice. Then write a variant ReverseInPlace[T any](s []T) that mutates. Benchmark both. Identify the allocation in the first one.

Task 13 — Map of generic struct keys

Build Set[T comparable]. Benchmark Add+Has for int keys vs Point{x, y int} keys. Explain the difference.

Task 14 — Compare with golang.org/x/exp/constraints

Implement Sum first with a custom Number constraint, then with constraints.Integer | constraints.Float. Confirm they produce the same machine code (go tool objdump -s 'Sum').


Hard 🔴

Task 15 — Specialize a hot path

Write a Cache[K comparable, V any] and a userCache wrapper specialised for Cache[string, *User]. Benchmark both. Demonstrate that the specialised wrapper inlines the generic body.

Task 16 — Detect dictionary indirection

Construct a program where Find[T comparable] is instantiated for 5 distinct pointer-shaped types. Verify in pprof that all five appear under the same stencil name. Measure the overhead vs a hand-written equivalent.

Task 17 — iter.Seq[T] vs chan T vs chan interface{}

Implement a producer that yields integers using each of the three patterns. Benchmark consumption of 1,000,000 values. Rank the three by speed.

Task 18 — Generic JSON decoder

Write Decode[T any](data []byte) (T, error) using encoding/json. Compare with the equivalent Decode(data []byte, v any) error. Discuss whether the generic version saves anything.

Task 19 — Benchstat with -count

Run go test -bench=. -count=10 > new.txt. Use benchstat to compare against an earlier baseline. Determine if the change is statistically significant.


Expert 🟣

Task 20 — PGO experiment

Capture a CPU profile of a generic-heavy service. Place it as default.pgo and rebuild. Re-benchmark. Show the speedup attributable to PGO.

Task 21 — Binary size accounting

Use go tool nm and objdump -h to measure the size attributable to generic stencils in a real binary. Identify the top 5 contributors.

Task 22 — Detect a regression in a Go upgrade

Run a benchmark suite under Go 1.21 and Go 1.22 (or any two adjacent versions). Identify any benchmark that regressed and explain a likely cause from the release notes.


Solutions

Solution 1

import "cmp"

func MinG[T cmp.Ordered](a, b T) T {
    if a < b { return a }
    return b
}
func MinC(a, b int) int {
    if a < b { return a }
    return b
}

func BenchmarkMinG(b *testing.B) { for i := 0; i < b.N; i++ { _ = MinG(3, 5) } }
func BenchmarkMinC(b *testing.B) { for i := 0; i < b.N; i++ { _ = MinC(3, 5) } }
Expected: identical or within 1-2 ns/op.

Solution 2

func SumGen[T int | float64](s []T) T {
    var t T
    for _, v := range s { t += v }
    return t
}
func SumIface(s []any) int {
    t := 0
    for _, v := range s { t += v.(int) }
    return t
}
func SumConc(s []int) int {
    t := 0
    for _, v := range s { t += v }
    return t
}
Expected: SumIface is 15-30× slower than the others.

Solution 3

Add b.ReportAllocs() at the start of each benchmark function. SumIface allocates only if its input is built by boxing; if the slice is pre-built, it does not. The point of the exercise is to see the difference.
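To see the boxing cost without a full benchmark run, testing.AllocsPerRun can count allocations directly. A minimal sketch, assuming illustrative names (sumIface, boxAll, the sink variables); the values are kept above 255 so the runtime's small-int interface cache does not hide the boxing:

```go
package main

import (
	"fmt"
	"testing"
)

var (
	sinkInt   int
	sinkSlice []any
)

// sumIface consumes a pre-built []any: no allocation happens here.
func sumIface(s []any) int {
	t := 0
	for _, v := range s {
		t += v.(int)
	}
	return t
}

// boxAll builds the []any: this is where the boxing allocations live.
func boxAll(vals []int) []any {
	out := make([]any, 0, len(vals))
	for _, v := range vals {
		out = append(out, v) // boxing v into an interface allocates
	}
	return out
}

func main() {
	vals := []int{1000, 2000, 3000, 4000}
	boxed := boxAll(vals)

	consume := testing.AllocsPerRun(100, func() { sinkInt = sumIface(boxed) })
	build := testing.AllocsPerRun(100, func() { sinkSlice = boxAll(vals) })
	// Consuming allocates nothing; building allocates one slice plus one box per int.
	fmt.Println(consume, build)
}
```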

Solution 4

The compiler prints lines like:

./code.go:5:6: can inline Add[float64]
./code.go:7:14: inlining call to Add[int]
Numeric generics inline aggressively.

Solution 5

After go build -gcflags="-S" 2>asm.txt, search for Find\[. You will see entries like:

"".Find[go.shape.int_0] STEXT ...
"".Find[go.shape.string] STEXT ...

Solution 6

func makeRandom(n int) []int {
    s := make([]int, n)
    for i := range s {
        s[i] = rand.Int()
    }
    return s
}

func BenchmarkSortSlice(b *testing.B) {
    for i := 0; i < b.N; i++ {
        b.StopTimer()
        s := makeRandom(10000)
        b.StartTimer()
        sort.Slice(s, func(i, j int) bool { return s[i] < s[j] })
    }
}
func BenchmarkSlicesSort(b *testing.B) {
    for i := 0; i < b.N; i++ {
        b.StopTimer()
        s := makeRandom(10000)
        b.StartTimer()
        slices.Sort(s)
    }
}
Stop the timer before building the input so only the sort is measured. Expected: slices.Sort is 25-40% faster.

Solution 7

type CacheG[K comparable, V any] struct{ m map[K]V }
type CacheI struct{ m map[any]any }

func (c *CacheG[K, V]) Get(k K) (V, bool) { v, ok := c.m[k]; return v, ok }
func (c *CacheI) Get(k any) (any, bool)    { v, ok := c.m[k]; return v, ok }
Expected: generic version 2-5× faster, no allocations on Get.

Solution 8

func Wrap[T any](v T) *T { return &v }
func WrapInt(v int) *int { return &v }
Both should escape to the heap — the generic version is no worse here. The point is to read the escape decision.

Solution 9

Multiple instantiations share the pointer-shape stencil. Hand-written findFoo outperforms Find[*Foo] by ~50% in tight loops because of dictionary indirection.
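A minimal sketch of the comparison, assuming an illustrative Foo struct for the pointer-shaped case:

```go
package main

import "fmt"

// Find is the generic version; all pointer instantiations share one stencil.
func Find[T comparable](s []T, target T) int {
	for i, v := range s {
		if v == target {
			return i
		}
	}
	return -1
}

type Foo struct{ name string }

// findFoo is the hand-written equivalent for *Foo: no dictionary, direct comparison.
func findFoo(s []*Foo, target *Foo) int {
	for i, v := range s {
		if v == target {
			return i
		}
	}
	return -1
}

func main() {
	a, b := &Foo{"a"}, &Foo{"b"}
	s := []*Foo{a, b}
	fmt.Println(Find(s, b), findFoo(s, b)) // same answer; benchmark the difference
}
```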

Solution 10

go test -bench=Cache -cpuprofile=cpu.prof
go tool pprof -http=:8080 cpu.prof
Look for CacheG[go.shape.string] (or similar) in the flame graph.

Solution 11

slices.SortFunc(users, func(a, b User) int { return cmp.Compare(a.Age, b.Age) })
Use cmp.Compare rather than a.Age - b.Age, which can overflow for extreme values. Expected: ~30-40% faster on large slices.

Solution 12

Reverse allocates a new slice (make([]T, n)). ReverseInPlace allocates nothing. Confirm with b.ReportAllocs().
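A sketch of the two variants — the make call in Reverse is exactly what b.ReportAllocs() will surface:

```go
package main

import "fmt"

// Reverse returns a reversed copy; the make call is the allocation.
func Reverse[T any](s []T) []T {
	out := make([]T, len(s))
	for i, v := range s {
		out[len(s)-1-i] = v
	}
	return out
}

// ReverseInPlace swaps elements in place and allocates nothing.
func ReverseInPlace[T any](s []T) {
	for i, j := 0, len(s)-1; i < j; i, j = i+1, j-1 {
		s[i], s[j] = s[j], s[i]
	}
}

func main() {
	fmt.Println(Reverse([]int{1, 2, 3})) // [3 2 1]
	s := []string{"a", "b", "c"}
	ReverseInPlace(s)
	fmt.Println(s) // [c b a]
}
```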

Solution 13

Set[int] is fast — int keys hit the fast path in the runtime hash. Set[Point] is slower — struct keys have no dedicated fast path and are hashed through the runtime's generic hasher, reached via the type information in the dictionary.
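A minimal Set sketch for the benchmark (NewSet is an illustrative constructor name):

```go
package main

import "fmt"

// Set is a minimal generic set over comparable keys.
type Set[T comparable] struct{ m map[T]struct{} }

func NewSet[T comparable]() *Set[T] { return &Set[T]{m: make(map[T]struct{})} }

func (s *Set[T]) Add(v T) { s.m[v] = struct{}{} }

func (s *Set[T]) Has(v T) bool { _, ok := s.m[v]; return ok }

type Point struct{ x, y int }

func main() {
	ints := NewSet[int]()
	ints.Add(42)
	pts := NewSet[Point]()
	pts.Add(Point{1, 2})
	fmt.Println(ints.Has(42), pts.Has(Point{1, 2}), pts.Has(Point{3, 4})) // true true false
}
```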

Solution 14

Use objdump -s 'Sum' and confirm the body is identical. The constraint type is just a compile-time gate.
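A stdlib-only sketch of the custom-constraint version; the constraints.Integer | constraints.Float variant from golang.org/x/exp/constraints gates the same type set and should produce the same body:

```go
package main

import "fmt"

// Number is a hand-rolled numeric constraint covering the same types
// as constraints.Integer | constraints.Float.
type Number interface {
	~int | ~int8 | ~int16 | ~int32 | ~int64 |
		~uint | ~uint8 | ~uint16 | ~uint32 | ~uint64 | ~uintptr |
		~float32 | ~float64
}

// Sum adds the elements of any numeric slice.
func Sum[T Number](s []T) T {
	var t T
	for _, v := range s {
		t += v
	}
	return t
}

func main() {
	fmt.Println(Sum([]int{1, 2, 3}), Sum([]float64{0.5, 1.5})) // 6 2
}
```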

Solution 15

type Cache[K comparable, V any] struct { m map[K]V }
func (c *Cache[K, V]) Get(k K) (V, bool) { v, ok := c.m[k]; return v, ok }

type userCache struct { c Cache[string, *User] }
func (uc *userCache) Get(k string) (*User, bool) { return uc.c.Get(k) }
The wrapper should inline Get for a single concrete type. Inspect with -gcflags="-m=2".

Solution 16

Define five distinct pointer types (for example *A through *E), instantiate Find for each, and run benchmarks. Confirm via pprof that all five collapse into a single go.shape pointer stencil served by five dictionaries. Measure the overhead vs a hand-written findExactType for each.
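One way to set it up — type names A through E are illustrative; every pointer instantiation collapses to the same go.shape stencil:

```go
package main

import "fmt"

type A struct{ n int }
type B struct{ s string }
type C struct{ f float64 }
type D struct{ b bool }
type E struct{ bs []byte }

// Find is instantiated for five distinct pointer types below;
// all of them share one pointer-shape stencil plus per-type dictionaries.
func Find[T comparable](s []T, target T) int {
	for i, v := range s {
		if v == target {
			return i
		}
	}
	return -1
}

func main() {
	a1, a2 := &A{1}, &A{2}
	b1 := &B{"x"}
	fmt.Println(
		Find([]*A{a1, a2}, a2),  // 1
		Find([]*B{b1}, b1),      // 0
		Find([]*C{}, (*C)(nil)), // -1
	)
	// Force the remaining instantiations so they show up in the binary.
	_ = Find[*D]
	_ = Find[*E]
}
```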

Solution 17

Roughly:

  • iter.Seq[int] — fastest; zero allocation.
  • chan int — second; channel synchronization is the cost.
  • chan interface{} — slowest; boxing on every send.

Solution 18

The generic Decode[T] saves the caller from allocating T and casting from any, but internally encoding/json uses reflection regardless. Net runtime cost is similar; ergonomic gain is real.
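A minimal sketch of the generic wrapper (the User type is illustrative):

```go
package main

import (
	"encoding/json"
	"fmt"
)

type User struct {
	Name string `json:"name"`
}

// Decode wraps json.Unmarshal; the type parameter saves the caller a
// variable declaration and a type assertion, nothing more — the decoding
// itself still goes through reflection inside encoding/json.
func Decode[T any](data []byte) (T, error) {
	var v T
	err := json.Unmarshal(data, &v)
	return v, err
}

func main() {
	u, err := Decode[User]([]byte(`{"name":"ada"}`))
	fmt.Println(u.Name, err) // ada <nil>
}
```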

Solution 19

go test -bench=. -count=10 > old.txt
# make a change
go test -bench=. -count=10 > new.txt
benchstat old.txt new.txt
The output shows mean, p-value, and delta. Treat changes with p < 0.05 as significant.

Solution 20

go test -bench=. -cpuprofile=default.pgo
go build -pgo=auto .
# benchmark again
Expected: 2-5% speedup on hot generic paths.

Solution 21

go tool nm -size ./bin/app | sort -k2 -n -r | head -50
Sort on the second column (the symbol size) and look for symbols containing go.shape. The cumulative size is the generic stencil cost.

Solution 22

Common cause: a compiler optimization changed in the new release. Read the release notes' "Compiler" section. Examples: PGO defaults, register allocator changes, or improvements to the inliner.


Final notes

Performance work is iterative and measured. The point of these tasks is not the answer — it is the habit of running the benchmark, reading the profile, and trusting numbers over intuition. A senior Go programmer measures first, theorises later.