Generic Limitations — Optimize¶
Table of Contents¶
- The cost of each workaround
- Free function vs method
- Type-switch via any cost
- Element-by-element copy cost
- Reflection vs interface vs codegen
- Choosing the lowest-cost workaround
- Profiling for limit-driven hot spots
- Summary
The cost of each workaround¶
Every generic limit forces you toward a workaround. Each workaround has a different runtime profile. The table is the cheat sheet:
| Workaround | Typical overhead | Allocations | Notes |
|---|---|---|---|
| Free function (replacing method) | 0 ns | 0 | Compiles to identical code |
| any(v).(type) switch | 1-5 ns + possible alloc | 0-1 | Boxing if T is a value type |
| Element-by-element copy | O(n) | 1 (the new slice) | Cannot be amortized |
| Cached reflection | ~50 ns first call, ~10 ns cached | 0 cached | Depends on cache hit rate |
| Codegen | 0 ns | 0 | Build-time cost only |
| Interface dispatch | ~1-2 ns | 0-1 | One v-table call |
The right workaround minimizes the overhead given the call frequency and the data size.
Free function vs method¶
This is the cheapest workaround by far. The Go compiler generates identical code for:
// Box is the container type this example assumes.
type Box[T any] struct{ V T }

// Method version (forbidden: a method cannot introduce a new type parameter)
// func (b Box[T]) Map[U any](f func(T) U) Box[U] { return Box[U]{V: f(b.V)} }

// Free function workaround
func MapBox[T, U any](b Box[T], f func(T) U) Box[U] {
	return Box[U]{V: f(b.V)}
}
The receiver b becomes a regular first parameter. The dictionary lookup mechanism is unchanged. The only "cost" is at the call site: MapBox(b, f) instead of b.Map(f). There is no runtime difference.
Benchmark sketch¶
func BenchmarkMethodCall(b *testing.B) {
s := &Stack[int]{}
for i := 0; i < 1000; i++ { s.Push(i) }
b.ResetTimer()
for i := 0; i < b.N; i++ {
_ = s.Len() // method call
}
}
func BenchmarkFreeFunction(b *testing.B) {
s := &Stack[int]{}
for i := 0; i < 1000; i++ { s.Push(i) }
b.ResetTimer()
for i := 0; i < b.N; i++ {
_ = LenStack(s) // equivalent free function
}
}
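For completeness, a minimal Stack and LenStack these benchmarks could run against (the exact definitions are an assumption, not taken from the text above):
type Stack[T any] struct {
	items []T
}

func (s *Stack[T]) Push(v T) { s.items = append(s.items, v) }
func (s *Stack[T]) Len() int { return len(s.items) }

// LenStack is the free-function equivalent of (*Stack[T]).Len.
func LenStack[T any](s *Stack[T]) int { return len(s.items) }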
Both run at the same speed. The compiler emits the same instructions.
Conclusion¶
Never worry about the perf of switching from a method to a free function. The "cost" is purely ergonomic.
Type-switch via any cost¶
This is where surprises happen. any(v).(type) looks innocuous but carries:
- Conversion to interface — for value types, this allocates if escape analysis cannot prove the boxed value stays on the stack.
- Type tag dispatch — the type switch compares the runtime type tag against each case.
- Loss of constant-folding — the compiler cannot fold the switch at compile time.
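In code, the pattern looks like this (valueLen is an illustrative name, not from the text):
// A generic function that branches on the dynamic type of v by converting it
// to any first; the any(v) conversion is where boxing can occur.
func valueLen[T any](v T) int {
	switch x := any(v).(type) {
	case string:
		return len(x)
	case []byte:
		return len(x)
	default:
		return 0
	}
}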
Benchmark numbers (illustrative)¶
For T = int:
| Approach | ns/op | allocs |
|---|---|---|
| Direct branch on a constraint | 0.5 | 0 |
| switch any(v).(type) | 4-8 | 0-1 |
| Interface method call | 2-3 | 0 |
For T = struct{a, b int}:
| Approach | ns/op | allocs |
|---|---|---|
| Direct field access | 0.3 | 0 |
| switch any(v).(type) | 8-15 | 1 (boxing) |
| Interface method call | 3-5 | 1 |
The boxing cost is the killer for value types. Inside a hot loop, this can dominate.
Mitigations¶
- Hoist the switch out of the loop — do the type analysis once, then loop with the resolved branch.
- Use an interface if the per-type behaviour is real polymorphism.
- Specialize the hot path — write a non-generic helper for the common type.
// Slow: boxes and type-switches on every iteration
for _, v := range items {
	switch x := any(v).(type) {
	case int:
		total += x
	case string:
		total += len(x)
	}
}

// Faster: dispatch once on the slice's dynamic type, then loop with the
// resolved branch (any(items) has dynamic type []int when T is int)
switch s := any(items).(type) {
case []int:
	for _, v := range s {
		total += v
	}
case []string:
	/* etc. */
}
Element-by-element copy cost¶
When invariance forces you to convert []Cat to []Animal, the cost is unavoidable:
For each element:
- One interface conversion (boxing).
- One slice store.
For n elements: O(n) operations and one allocation for the result slice. If Cat is a value type, n boxings happen. For pointer types it is just n stores.
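A sketch of the conversion, assuming illustrative Cat and Animal types (they are not defined in this section):
type Animal interface{ Sound() string }

type Cat struct{ Name string }

func (c Cat) Sound() string { return "meow" }

func toAnimals(cats []Cat) []Animal {
	out := make([]Animal, len(cats)) // one allocation for the result slice
	for i, c := range cats {
		out[i] = c // one interface conversion (boxing) per element
	}
	return out
}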
When this hurts¶
- A hot middleware that converts request slices on every call.
- A serialization layer that re-types elements before encoding.
- A data-pipeline node that must adapt slice element types.
Mitigations¶
- Design with the broader type from the start — accept []Animal in the API, not []Cat.
- Avoid the conversion — work with []Cat throughout if downstream consumers are flexible.
- Cache the converted slice if the input does not change.
- Use unsafe as a last resort to reinterpret memory — but only if the layouts genuinely match and you accept the safety cost. (Discouraged.)
Reflection vs interface vs codegen¶
When the limit pushes toward a runtime mechanism, three choices remain. Compare:
Reflection¶
Cost per call: ~50-200 ns uncached, ~10 ns cached.
Best for one-shot operations on dynamic input (decode, validate). Disastrous in inner loops.
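A sketch of the cached pattern, assuming reflect and sync are imported; fieldNames and the cache layout are illustrative, and v is assumed to be a struct:
var fieldCache sync.Map // reflect.Type -> []string

func fieldNames(v any) []string {
	t := reflect.TypeOf(v)
	if cached, ok := fieldCache.Load(t); ok {
		return cached.([]string) // cached path: roughly a map lookup
	}
	names := make([]string, t.NumField()) // uncached path: full reflection walk
	for i := range names {
		names[i] = t.Field(i).Name
	}
	fieldCache.Store(t, names)
	return names
}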
Interface¶
Cost per call: ~1-2 ns. One v-table lookup, no allocation if the value is already an interface.
Best for any time you can express the operation as a method.
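As a sketch (Sizer and totalSize are illustrative names), the same per-type behaviour expressed as a method call rather than a type switch:
type Sizer interface{ Size() int }

func totalSize(items []Sizer) int {
	n := 0
	for _, it := range items {
		n += it.Size() // one dynamic dispatch per element, no boxing in the loop
	}
	return n
}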
Codegen¶
Cost per call: 0 ns. Pre-compiled to direct calls.
Best for stable types where method sets must vary.
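One common shape of this, using the real stringer generator (Kind is an illustrative type); the generated String method compiles to ordinary direct calls:
//go:generate stringer -type=Kind
type Kind int

const (
	KindA Kind = iota
	KindB
)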
The hierarchy¶
Use interfaces first. Reach for codegen when method sets vary per type. Use reflection only when the input is genuinely dynamic.
Mixing the three is normal in mature codebases — typed interface API on top, reflection for serialization, codegen for client stubs.
Choosing the lowest-cost workaround¶
A practical decision tree:
You hit a limit. Which workaround?
├─ Method needs new type param?
│ └─ Free function — 0 cost, no thought required.
│
├─ Type-switch on T?
│ ├─ Per-type behaviour really differs? → Interface (~2 ns)
│ └─ Just formatting / one-off? → any(v).(type) at boundary
│
├─ Container covariance?
│ ├─ Slice is small or static? → Element-by-element copy
│ └─ Slice is large + hot? → Restructure to use the broader type from the start
│
├─ HKT abstraction?
│ └─ Per-container free functions. Verbose, no perf cost.
│
├─ Specialization for hot type?
│ ├─ Profile-guided? → PGO; no code change
│ ├─ Critical path? → Hand-write a non-generic helper
│ └─ One-shot? → Inline the hot branch with any(v).(type)
│
└─ Dynamic per-type metadata?
├─ Stable types, small set? → Codegen
├─ Open type space? → Cached reflection
└─ Single dispatch? → Interface
The rule of thumb: prefer the workaround that compiles to direct code. Free functions and codegen do; reflection and any(v) do not.
Profiling for limit-driven hot spots¶
When you suspect a limit-driven workaround is hurting performance:
1. CPU profile¶
Look for:
- runtime.convT* functions — boxing for any(v) conversions.
- runtime.typeswitch* — type-switch dispatch.
- reflect.* calls — reflection overhead.
If they appear in your hot path, the workaround cost is real.
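One way to surface those symbols from a benchmark run (the benchmark name is illustrative; the commands are standard Go tooling):
go test -bench=BenchmarkHotPath -cpuprofile=cpu.prof
go tool pprof -top cpu.prof | grep -E 'convT|reflect'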
2. Allocation profile¶
Look for allocations matching the rate of your any(v) calls. Each allocation is a boxing.
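A corresponding allocation-profile run might look like this (again, the benchmark name is illustrative):
go test -bench=BenchmarkHotPath -benchmem -memprofile=mem.prof
go tool pprof -alloc_objects -top mem.prof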
3. -gcflags="-m" to inspect escape analysis¶
Look for messages like:
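(The file name and positions below are placeholders; the phrases to watch for are "escapes to heap" and "moved to heap".)
./hotpath.go:42:17: v escapes to heap
./hotpath.go:40:6: moved to heap: v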
If v escapes only because of the workaround, consider restructuring the call so the boxing happens once outside the loop.
4. PGO¶
# Record a CPU profile (in production this usually comes from net/http/pprof; here via go test)
go test -cpuprofile=prod.prof
# Build with PGO
go build -pgo=prod.prof .
PGO can devirtualize hot generic dispatches automatically. Even if you do not change a line of code, the compiler may specialize for the dominant instantiation.
Summary¶
The generic limitations have predictable workaround costs:
| Workaround | Cost level |
|---|---|
| Free function | None |
| Codegen | None (build-time only) |
| Element copy | Linear in slice size |
| Interface | Small (1-2 ns) |
| any(v).(type) | Small per call, big in hot loops |
| Reflection (cached) | Medium |
| Reflection (uncached) | Big |
Choose the lightest workaround for your use case:
- Lift methods to free functions without hesitation. No cost.
- Use interfaces when the limit pushes toward polymorphism.
- Reach for any(v).(type) only at boundaries, never inside hot loops.
- Restructure away from element-by-element copy when the hot path demands it.
- Cache reflection if you must use it.
- Codegen when method sets vary per type or when the perf budget rules out alternatives.
The biggest performance lesson: the workaround you choose has more impact on speed than the limit itself. A good workaround is invisible; a bad one shows up at the top of every flame graph. Profile, choose deliberately, and let the design follow the data.