Generic Types & Interfaces — Optimization¶

This file is about the cost of generic types and methods in Go, and how to measure and reduce it. We cover monomorphization vs dictionary passing (GC stenciling), memory layout, allocation behavior, and when a non-generic interface is the better choice.

Table of Contents¶

How the Go compiler implements generics
Monomorphization vs GC stenciling
Dictionary passing
Cost categories
Memory layout of instantiated types
Allocations and escape analysis
Inlining of generic methods
When to use a non-generic interface instead
Benchmarking generic code
Common micro-optimizations
Compile time and binary size
Decision rules
Summary

How the Go compiler implements generics¶

Three high-level options exist for implementing generics:

Full monomorphization — emit a separate compiled body per concrete type. Maximum performance, but blows up binary size.
Pure dictionary passing — emit one compiled body for all types; pass a "dictionary" of operations. Smallest binary, slower at runtime.
Hybrid (GC stenciling + dictionaries) — group types by their gcshape, emit one body per shape, pass dictionaries for per-instantiation differences.

Go (since 1.18) uses option 3. This means:

For each shape (e.g., "pointer-shaped types", "int-shaped types", "[]byte-shaped types") the compiler emits one function body.
A dictionary — a small struct of type metadata — is passed implicitly to each generic function call. It holds runtime info needed for operations like ==, allocation, type conversion, and reflection.

The result: most of the speed of monomorphization, most of the compactness of pure dictionary passing.

Monomorphization vs GC stenciling¶

What "shape" means¶

A gcshape groups types by:

Memory size (e.g., 8 bytes, 16 bytes).
GC properties (does the value contain pointers? where?).
Alignment.

Two types with the same shape can use the same compiled body — the compiler does not need to look at the type, only at the layout.

Concrete examples¶

Type set	Same shape?
`int32`, `uint32`	Yes — 4-byte non-pointer scalars
`int64`, `uint64`, `float64`	Yes — 8-byte non-pointer scalars
`User`, `Order`, `*Item`	Yes — all pointers (8 bytes, GC pointer)
`[]byte`, `[]int`, `[]string`	Slightly different — slice header is identical, but element shape may differ
`User`, `Order` (different structs)	Almost certainly different shapes

Practical consequence¶

A Stack[*User] and Stack[*Order] share one compiled body.
A Stack[int] and Stack[*User] use different bodies.

This is mostly invisible. You see it only when you read the disassembly or look at binary growth metrics.

Dictionary passing¶

The dictionary contains:

A *runtime._type for the type parameter (so equality, allocation, and type assertions work).
Pointers to specialized helper functions (e.g., a == implementation for the parameter type).
Sub-dictionaries for type parameters that themselves have type parameters.

The dictionary is passed as an extra argument by the caller. Its lookup adds a small but real cost — typically a few nanoseconds per generic call.

Example — what gets specialized¶

type Set[T comparable] struct { m map[T]struct{} }

func (s *Set[T]) Add(v T) { s.m[v] = struct{}{} }

For Set[int]: - The compiled body uses int hashing (specialized). For Set[*User]: - The body uses pointer hashing (specialized via dictionary).

The hash operation is one of the things the dictionary parameterizes.

Cost categories¶

Where do generic methods/types pay their costs?

Cost	Magnitude	When
Dictionary load	~1 ns	every generic call
Hash/equal via dictionary	~few ns	map ops, set ops
Boxing avoidance	NEGATIVE cost (savings)	replacing `interface{}`
Larger binary	hundreds of KB	many distinct shapes
Slower compile	tens of %	heavy use of generics
Less aggressive inlining	varies	generic call across pkg
Cache-friendliness	varies	complex generics over big structs

The savings are often larger than the costs — especially compared to interface{} baselines.

Memory layout of instantiated types¶

Stack[int] and Stack[string] have different memory layouts because their element types differ.

type Stack[T any] struct {
    items []T
}

Stack[int]: items is []int — slice header (24 bytes) pointing to int elements (8 bytes each on 64-bit).
Stack[string]: items is []string — slice header (24 bytes) pointing to string headers (16 bytes each: ptr + len).
Stack[*User]: items is []*User — slice header pointing to pointers (8 bytes each).

The compiler lays out fields exactly as if you had hand-written the struct.

Cache behavior¶

For value types of small size, generic code is at least as cache-friendly as hand-written, often more so (no boxing, contiguous storage). For pointer types, it depends on what the pointers chase to.

For very large T (huge structs), prefer *T:

type Cache[K comparable, V any] struct { m map[K]V }
// avoid: Cache[string, BigStruct]
// prefer: Cache[string, *BigStruct]

Otherwise, every Get and Set copies a big value.

Allocations and escape analysis¶

Generics do not change the rules of escape analysis, but they do interact with it.

Pattern 1 — local generic value, no escape¶

func Sum[T Numeric](xs []T) T {
    var s T
    for _, x := range xs { s += x }
    return s
}

s lives on the stack. No allocation.

Pattern 2 — generic struct returned by pointer¶

func NewStack[T any]() *Stack[T] { return &Stack[T]{items: nil} }

Stack[T]{} escapes through the return — heap allocation. Same as the non-generic case.

Pattern 3 — boxing into a generic interface¶

type Pusher[T any] interface { Push(T) }

func Use(p Pusher[int]) { p.Push(42) }

Calling Use(stack) boxes stack into the interface header — the same cost as a non-generic interface call.

Detecting escapes¶

go build -gcflags='-m' ./...

Read the output. Generic code shows up the same way as non-generic code in escape analysis output.

Inlining of generic methods¶

The Go compiler can inline generic methods, but with some caveats:

If the call site has all type parameters fixed (statically known), inlining is comparable to non-generic.
Cross-package calls with unknown gcshape may not inline.
Larger generic bodies are less likely to inline.

To check inlining decisions:

go build -gcflags='-m=2' ./...

For hot paths, structure the function to fit within the inliner's budget (default 80 nodes-ish).

When to use a non-generic interface instead¶

Generics are not always the right answer. Choose a non-generic interface when:

Heterogeneity matters. A []Shape mixing circles and squares cannot be []Circle or []Square. Generics enforce homogeneity.
Open extension. External users will plug in new implementations you do not know. Generic types lock you into a specific T; an interface lets anyone implement.
Plugin-style architecture. Decoupling at runtime via dynamic dispatch.
Reflection-heavy code. When the code already uses reflect, generics add ceremony without benefits.
Simple cases where the boxing cost is irrelevant. A handful of calls per request, with values already on the heap, gain nothing from generics.
You want a stable, narrow API surface. A non-generic interface is easier to evolve.

In short: generics for "container of T" or "operation on T"; interfaces for "any of these things".

Benchmarking generic code¶

Set up a fair benchmark:

func BenchmarkSetGeneric(b *testing.B) {
    s := NewSet[int]()
    b.ResetTimer()
    for i := 0; i < b.N; i++ {
        s.Add(i)
        _ = s.Has(i)
    }
}

func BenchmarkSetInterface(b *testing.B) {
    s := make(map[interface{}]struct{})
    b.ResetTimer()
    for i := 0; i < b.N; i++ {
        s[i] = struct{}{}
        _, _ = s[i]
    }
}

Run with allocations:

go test -run x -bench . -benchmem -cpu 1

Typical observations on a generic set vs interface set:

1.5×–3× faster for value-type elements.
30–60% less memory.

For a generic concurrent map vs sync.Map with type assertions:

Roughly equal speed.
Similar memory.
Cleaner code — that is the win.

Always run benchmarks on real data shapes, not toy examples.

Common micro-optimizations¶

1. Pre-allocate slices¶

func NewStack[T any]() *Stack[T] {
    return &Stack[T]{items: make([]T, 0, 16)}
}

Saves a few growth-and-copy steps for the first 16 pushes.

2. Pointer to large `T`¶

type Cache[K comparable, V any] struct{ m map[K]V }
// for big V, prefer Cache[K, *V]

Saves on per-Set/Get copies.

3. Avoid unnecessary interface wraps¶

If you have a *Stack[int], do not wrap it in Pusher[int] if you do not need polymorphism.

4. Reuse dictionaries¶

Calling many small generic functions in a loop with the same parameter type means the dictionary load happens repeatedly. Where it matters, consider hoisting the call into a non-generic helper.

5. Consider `cmp.Ordered` vs hand-rolled `Numeric`¶

cmp.Ordered is well-optimized in the standard library and recognized by tooling. Custom Numeric interfaces are fine but lose some standard-library affordances.

6. Use `slices` and `maps` packages¶

slices.Index, slices.Sort, maps.Clone are tuned by the standard library and avoid reinventing the wheel.

Compile time and binary size¶

Heavy generic use can:

Increase compile time by 20–50%.
Increase binary size by hundreds of KB.

Each unique gcshape requires its own compiled body. A package using Stack[int], Stack[string], Stack[*User], Stack[*Order] adds two or three bodies (pointers share a shape).

Mitigations:

Avoid creating tiny generic types that exist only to be instantiated once.
Prefer one big generic type with several methods over many small generic types.
For internal-only helpers, sometimes a non-generic specialized version is fine.

If binary size is a concern (embedded targets, lambdas), measure with:

go build -ldflags="-s -w" -o bin ./cmd/...
ls -la bin

And compare with and without your generic refactor.

Decision rules¶

A pragmatic flowchart:

Is the relationship "container of T" or "operation on T"?
├── No  → use a regular interface
└── Yes → continue

Are all elements the same concrete T at compile time?
├── No  → use a regular interface
└── Yes → continue

Will the value type be small (≤ 32 bytes) and copied a lot?
├── Yes → generic by value gives the biggest speed win
└── No  → generic by pointer; check if generics still help over interface

Do you need == on T?
├── Yes → constrain to comparable
└── No  → any

Are you on a hot path where every nanosecond counts?
├── Yes → benchmark generic vs hand-specialized
└── No  → ship the generic version

In 90% of cases, generic types and interfaces are simply the right choice — clearer, safer, and faster than the interface{} alternative. Only the remaining 10% need careful tuning.

Summary¶

Go uses GC stenciling with dictionaries — a hybrid of monomorphization and dictionary passing.
Pointer-shaped types share one compiled body; many value types share another.
The dictionary cost is small but real (~ns per call).
Generic code typically beats interface{} for value types (no boxing) and matches it for pointer types.
Choose non-generic interfaces for heterogeneity, open extension, and plugin-style architectures.
Always benchmark on real workloads. Measure compile time and binary size on heavy generic refactors.

End of optimize.md.