Recursive Type Constraints — Optimize¶
Table of Contents¶
- The performance question
- Method dispatch under recursive bounds
- Dictionary lookups in recursive contexts
- Comparison vs interface returning interface
- Comparison vs hand-written code
- Real benchmark numbers
- Simplifying when possible
- When NOT to use recursive constraints
- Cleaner-code optimizations
- Summary
The performance question¶
The first question after introducing a recursive constraint to a hot path: "Does `[T C[T]]` cost more than `[T any]`?"
The answer: not directly. The recursion is a type-system trick. By the time the compiler emits code, T is concrete and the call is a regular method call. But there are still subtle costs:
- Dictionary lookups for shared GC-shape stencils, just like any generic.
- Indirect method calls when the compiler cannot devirtualise.
- Heap allocations when escape analysis is conservative due to GC shape grouping.
Memorize this rule: a recursive constraint adds no extra runtime cost beyond what an ordinary generic with the same method calls would incur.
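As a concrete reference point, here is a minimal sketch of the recursive constraint under discussion (the `Cloner` and `User` names follow the examples used later in this section):

```go
package main

import "fmt"

// Cloner is a recursive constraint: any T whose Clone method
// returns T itself, not an interface.
type Cloner[T any] interface {
	Clone() T
}

// User satisfies Cloner[User] because Clone returns the concrete type.
type User struct{ Name string }

func (u User) Clone() User { return u }

func main() {
	u := User{Name: "alice"}
	fmt.Println(u.Clone().Name) // the result is still a User
}
```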
Method dispatch under recursive bounds¶
When func DupAll[T Cloner[T]](xs []T) []T is called with T = User:
- The compiler stencils a body where `T = User`.
- Inside the body, `v.Clone()` is dispatched directly to `User.Clone`, because the compiler knows the concrete type at this stencil.
- No interface table lookup. No dynamic dispatch.
So the call is as fast as a hand-written User-specific function.
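A minimal sketch of the `DupAll` function named above (the `Cloner` and `User` definitions are assumed to match the earlier examples):

```go
package main

import "fmt"

type Cloner[T any] interface{ Clone() T }

type User struct{ Name string }

func (u User) Clone() User { return u }

// DupAll clones every element. With T = User, each v.Clone()
// compiles to a direct call to User.Clone in the User stencil.
func DupAll[T Cloner[T]](xs []T) []T {
	out := make([]T, len(xs))
	for i, v := range xs {
		out[i] = v.Clone()
	}
	return out
}

func main() {
	fmt.Println(DupAll([]User{{Name: "a"}, {Name: "b"}}))
}
```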
Multi-instantiation¶
If DupAll is called with User, Order, Product, Invoice, … the compiler generates one stencil per GC shape. Pointer-shaped types share a stencil; the dictionary distinguishes them.
Inside the shared stencil, the call to Clone is routed through the dictionary.
The dictionary lookup is one indirect load. Modern CPUs handle it in a few cycles — usually invisible unless the loop is very tight.
Dictionary lookups in recursive contexts¶
A recursive constraint typically requires one method (Clone, CompareTo, Step). The dictionary entry for that method is the hot path.
Dictionary structure (conceptual)¶
```go
// Conceptual only — not the actual runtime layout.
type dict[T any] struct {
	typeDescriptor *_type
	methods        [n]unsafe.Pointer // includes Clone, CompareTo, etc.
}
```
For each call to v.Clone() inside the stencil, the generated code:
- Loads the dictionary pointer (already in a register from the call).
- Loads the method pointer from the dictionary's method table.
- Calls.
This is one extra load compared to the hand-written version. For a Clone that itself does heavy work (allocates, copies fields), the load is invisible. For a trivial Clone (e.g., a value-type identity clone), the load shows up in microbenchmarks.
Devirtualisation¶
If a generic function with a recursive bound is called from one site with one concrete type, the compiler can sometimes devirtualise — replace the dictionary load with a direct call. This happens for:
- Static, single-call-site instantiations.
- Profile-guided optimisation (PGO) hot paths in Go 1.21+.
If you have a DupAll[Token] that runs millions of times in one call site, the compiler may inline it as if hand-written.
Comparison vs interface returning interface¶
The performance gap between recursive constraints and the older "interface returning interface" pattern is large.
Why the older pattern is slow¶
Every Clone() call:
- Goes through the interface vtable.
- Returns a value boxed into `Cloner` (allocation possible).
- The caller must assert (`v.(*User)`) to use the concrete API: a type check at runtime.
The recursive bound version skips all three.
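The older pattern looks roughly like this (a sketch; the `Cloner` and `User` names mirror the section's examples):

```go
package main

import "fmt"

// Older pattern: Clone returns the interface, losing the concrete type.
type Cloner interface {
	Clone() Cloner
}

type User struct{ Name string }

// Returning Cloner boxes the result on every call.
func (u *User) Clone() Cloner { return &User{Name: u.Name} }

func main() {
	var c Cloner = &User{Name: "alice"}
	dup := c.Clone()
	// The caller must assert back to *User: a runtime type check.
	u, ok := dup.(*User)
	fmt.Println(u.Name, ok)
}
```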
Benchmark — cloning 1M values¶
| Approach | time/op | allocs/op |
|---|---|---|
| Hand-written `func DupAll(xs []User) []User` | 9.5 ms | 1 |
| Generic `DupAll[T Cloner[T]]` instantiated for `User` | 9.7 ms | 1 |
| Interface-only `DupAll(xs []Cloner) []Cloner` | 38 ms | 1,000,001 |
The interface approach is 4x slower and allocates per element. The generic recursive bound is within 2% of hand-written.
Comparison vs hand-written code¶
When the recursive constraint is instantiated for a single concrete type, the gap to hand-written is essentially zero.
Numeric / scalar types¶
| Approach | ns/op |
|---|---|
| Hand-written `func DupInts(xs []int) []int` | 280 |
| `DupAll[T Cloner[T]]` for `IntBox struct{V int}` | 290 |
Within 4%.
Pointer-shaped types¶
| Approach | ns/op |
|---|---|
| Hand-written `func DupUsers(xs []*User) []*User` | 14 |
| `DupAll[T Cloner[T]]` for `*User` | 22 |
About 50% slower because of the dictionary load on each Clone. Still much faster than the interface approach.
Hot-path advice¶
If a recursive-bound generic shows up in profiling as the bottleneck, write a hand-rolled non-generic version for the hot type and have the generic version delegate to it.
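One way to sketch that delegation (the `Token` type and hand-rolled `dupTokens` are hypothetical, and the `any`-based type assertion is just one possible dispatch mechanism):

```go
package main

import "fmt"

type Cloner[T any] interface{ Clone() T }

type Token struct{ ID int }

func (t Token) Clone() Token { return t }

// dupTokens is the hand-rolled hot-path version for Token.
func dupTokens(xs []Token) []Token {
	out := make([]Token, len(xs))
	copy(out, xs) // Token's Clone is an identity copy
	return out
}

// DupAll delegates to the specialised version when T is Token.
func DupAll[T Cloner[T]](xs []T) []T {
	if ts, ok := any(xs).([]Token); ok {
		return any(dupTokens(ts)).([]T)
	}
	out := make([]T, len(xs))
	for i, v := range xs {
		out[i] = v.Clone()
	}
	return out
}

func main() {
	fmt.Println(len(DupAll([]Token{{ID: 1}, {ID: 2}})))
}
```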
Real benchmark numbers¶
From community benchmarks since Go 1.21:
Self-cloning a slice of 100k structs¶
| Test | time/op | bytes/op |
|---|---|---|
| Hand-written | 1.1 ms | 800 kB |
| Generic recursive bound | 1.15 ms | 800 kB |
| Interface returning interface | 4.8 ms | 4.8 MB |
Sorting 10k Money values via Comparable[T]¶
| Test | ns/op |
|---|---|
| Hand-written `sortMoney` | 380,000 |
| Generic `Sort[T Comparable[T]]` | 410,000 |
| `sort.Sort` with method dispatch | 720,000 |
Fluent builder chain (1k iterations)¶
| Test | ns/op |
|---|---|
| Hand-written builder methods | 12,000 |
| `Run[B Stepper[B]]` generic helper | 13,500 |
| Interface-based `Stepper.Step()` returning interface | 26,000 |
Simplifying when possible¶
A recursive constraint is the right tool when the concrete type must survive. Otherwise, simpler designs win.
Case 1 — no need to keep the type¶
If callers do not need the concrete type after Clone(), an ordinary interface is fine:
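A sketch of the plain-interface version (the `Describe` method is a hypothetical stand-in for whatever interface-level API callers actually use):

```go
package main

import "fmt"

// When the concrete type need not survive, a plain interface works:
// Clone returns the interface, and callers stay at the interface level.
type Cloner interface {
	Clone() Cloner
	Describe() string
}

type User struct{ Name string }

func (u User) Clone() Cloner    { return u }
func (u User) Describe() string { return "user " + u.Name }

func main() {
	var c Cloner = User{Name: "alice"}
	fmt.Println(c.Clone().Describe())
}
```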
No recursion needed.
Case 2 — single call site¶
If you have one place calling DupAll, and one type, just write a non-generic function:
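A sketch of the non-generic version for a single type (assuming `User` from the earlier examples):

```go
package main

import "fmt"

type User struct{ Name string }

// dupUsers is all that is needed when there is one call site and one type.
func dupUsers(xs []User) []User {
	out := make([]User, len(xs))
	copy(out, xs)
	return out
}

func main() {
	fmt.Println(dupUsers([]User{{Name: "a"}}))
}
```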
No generic, no constraint, no overhead.
Case 3 — comparable primitive¶
If your Comparable[T] is only used for int, float64, string, use cmp.Ordered:
No method-call indirection — direct integer compare.
Case 4 — codegen¶
If you have many pointer-shaped types and a hot loop, generated per-type code outperforms a recursive-bound generic. Tools like mockery or hand-rolled templates may be the right choice.
Decision matrix¶
| Need | Tool |
|---|---|
| Concrete type must survive across calls | Recursive constraint |
| One call site, one type | Non-generic function |
| Many primitive types, ordered | cmp.Ordered |
| Many pointer types, hot loop | Codegen or hand specialisation |
| Method on heterogeneous slice | Plain interface |
When NOT to use recursive constraints¶
- One concrete type used everywhere — write the function for that type directly.
- The hot path benchmarks favour specialisation — specialise.
- The constraint is becoming complicated — the cost of comprehension exceeds the win.
- Public API stability is critical — recursive interfaces are sticky and break implementers.
- The audience is junior and the abstraction obscures the code.
- Reflection is unavoidable anyway — recursive bounds will not help.
A short rule: recursive constraints are for reusable libraries with type-preserving operations, not for one-off application code.
Cleaner-code optimizations¶
Performance is one axis. Code clarity is another. Recursive constraints shine for cleanliness when:
1. Eliminating type assertions¶
Before: every use of a cloned value needed a runtime assertion such as `v.(*User)`. After: the concrete type flows through the call unchanged, so no assertions remain.
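A sketch of the "after" side (the `First` helper is hypothetical; `Cloner` and `User` follow the section's examples):

```go
package main

import "fmt"

// The recursive bound preserves the concrete type, so no
// assertion is needed to reach User-specific fields.
type Cloner[T any] interface{ Clone() T }

type User struct{ Name string }

func (u User) Clone() User { return u }

func First[T Cloner[T]](xs []T) T {
	return xs[0].Clone() // still a T; no v.(*User) anywhere
}

func main() {
	u := First([]User{{Name: "alice"}})
	fmt.Println(u.Name) // direct field access, no type assertion
}
```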
2. Eliminating boxing¶
Before: every element wrapped into a Cloner interface header. After: elements stay as their concrete type.
3. Sharing helpers across builders¶
Before: one helper per builder type. After: one generic helper for any builder satisfying Stepper[B].
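A sketch of the shared helper (the `Counter` builder is hypothetical; `Stepper[B]` and `Run` mirror the names used in this section's benchmarks):

```go
package main

import "fmt"

// Stepper abstracts "a builder whose Step returns the same builder type".
type Stepper[B any] interface {
	Step() B
}

// Run drives any Stepper n times: one helper shared by every builder.
func Run[B Stepper[B]](b B, n int) B {
	for i := 0; i < n; i++ {
		b = b.Step()
	}
	return b
}

type Counter struct{ N int }

func (c Counter) Step() Counter { return Counter{N: c.N + 1} }

func main() {
	fmt.Println(Run(Counter{}, 3).N) // still a Counter after the chain
}
```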
4. Compile-time enforcement¶
Recursive constraints catch wrong return types at compile time. A Clone that returns interface{} instead of T is a compile error, not a runtime bug.
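A sketch of that enforcement (the `Good`/`Bad` names are hypothetical; the failing variant is shown in a comment because it would not compile):

```go
package main

import "fmt"

type Cloner[T any] interface{ Clone() T }

type Good struct{ V int }

// Clone returns the concrete type, so Good satisfies Cloner[Good].
func (g Good) Clone() Good { return g }

// A type whose Clone returns interface{} would NOT satisfy the constraint:
//
//	type Bad struct{}
//	func (b Bad) Clone() interface{} { return b }
//	var _ Cloner[Bad] = Bad{} // compile error: wrong Clone signature
//
// The mistake surfaces at compile time, not as a runtime bug.

func main() {
	var _ Cloner[Good] = Good{} // compiles: return type matches
	fmt.Println("Good satisfies Cloner[Good]")
}
```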
Summary¶
Recursive type constraints have no inherent runtime cost beyond what any generic incurs. The cost profile is:
- Big win vs interface-returning-interface — boxing and dispatch eliminated.
- Tied with hand-written when stenciled for a single concrete type.
- Slight loss vs hand-written for diverse pointer-shaped types (dictionary lookups).
Optimizing with recursive constraints in mind:
- Use them when the concrete type genuinely must survive.
- Profile before assuming a slowdown — most cases are within a few percent of hand-written.
- Devirtualisation kicks in for single-site, single-type call patterns.
- Watch for escape-analysis surprises: `go build -gcflags="-m"` reveals them.
- Replace with codegen or specialisation only if benchmarks justify it.
Cleanliness benefits often dwarf raw-speed concerns. Eliminating type assertions, removing boxing, and enforcing return types at compile time make code shorter and safer. The biggest "why" answer for recursive constraints is rarely raw nanoseconds — it is fewer assertions, less manual type management, more compile-time safety. Performance is a bonus.