Recursive Type Constraints — Optimize¶
Table of Contents¶
- The performance question
- Method dispatch under recursive bounds
- Dictionary lookups in recursive contexts
- Comparison vs interface returning interface
- Comparison vs hand-written code
- Real benchmark numbers
- Simplifying when possible
- When NOT to use recursive constraints
- Cleaner-code optimizations
- Summary
The performance question¶
The first question after introducing a recursive constraint to a hot path: "Does `[T C[T]]` cost more than `[T any]`?"
The answer: not directly. The recursion is a type-system trick. By the time the compiler emits code, T is concrete and the call is a regular method call. But there are still subtle costs:
- Dictionary lookups for shared GC-shape stencils, just like any generic.
- Indirect method calls when the compiler cannot devirtualise.
- Heap allocations when escape analysis is conservative due to GC shape grouping.
Memorize this rule: a recursive constraint adds no extra runtime cost beyond what an ordinary generic with the same method calls would incur.
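As a concrete reference point, here is a minimal sketch of the recursive constraint under discussion (the `Cloner` and `User` names follow the examples used later in this section):

```go
package main

import "fmt"

// Cloner is a recursive constraint: any T whose Clone method
// returns T itself, not an interface.
type Cloner[T any] interface {
	Clone() T
}

// User satisfies Cloner[User] because Clone returns the concrete type.
type User struct{ Name string }

func (u User) Clone() User { return u }

func main() {
	u := User{Name: "alice"}
	fmt.Println(u.Clone().Name) // the result is still a User
}
```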
Method dispatch under recursive bounds¶
When func DupAll[T Cloner[T]](xs []T) []T is called with T = User:
- The compiler stencils a body where `T = User`.
- Inside the body, `v.Clone()` is dispatched directly to `User.Clone`, because the compiler knows the concrete type at this stencil.
- No interface table lookup. No dynamic dispatch.
So the call is as fast as a hand-written User-specific function.
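A minimal sketch of the `DupAll` function named above (the `Cloner` and `User` definitions are assumed to match the earlier examples):

```go
package main

import "fmt"

type Cloner[T any] interface{ Clone() T }

type User struct{ Name string }

func (u User) Clone() User { return u }

// DupAll clones every element. With T = User, each v.Clone()
// compiles to a direct call to User.Clone in the User stencil.
func DupAll[T Cloner[T]](xs []T) []T {
	out := make([]T, len(xs))
	for i, v := range xs {
		out[i] = v.Clone()
	}
	return out
}

func main() {
	fmt.Println(DupAll([]User{{Name: "a"}, {Name: "b"}}))
}
```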
Multi-instantiation¶
If DupAll is called with User, Order, Product, Invoice, … the compiler generates one stencil per GC shape. Pointer-shaped types share a stencil; the dictionary distinguishes them.
Inside the shared stencil, the call to Clone is routed through the dictionary.
The dictionary lookup is one indirect load. Modern CPUs handle it in a few cycles — usually invisible unless the loop is very tight.
Dictionary lookups in recursive contexts¶
A recursive constraint typically requires one method (Clone, CompareTo, Step). The dictionary entry for that method is the hot path.
Dictionary structure (conceptual)¶
```go
// Conceptual only — not the actual runtime layout.
type dict[T any] struct {
	typeDescriptor *_type
	methods        [n]unsafe.Pointer // includes Clone, CompareTo, etc.
}
```
For each call to v.Clone() inside the stencil, the generated code:
- Loads the dictionary pointer (already in a register from the call).
- Loads the method pointer from the dictionary's method table.
- Calls.
This is one extra load compared to the hand-written version. For a Clone that itself does heavy work (allocates, copies fields), the load is invisible. For a trivial Clone (e.g., a value-type identity clone), the load shows up in microbenchmarks.
Devirtualisation¶
If a generic function with a recursive bound is called from one site with one concrete type, the compiler can sometimes devirtualise — replace the dictionary load with a direct call. This happens for:
- Static, single-call-site instantiations.
- Profile-guided optimisation (PGO) hot paths in Go 1.21+.
If you have a DupAll[Token] that runs millions of times in one call site, the compiler may inline it as if hand-written.
Comparison vs interface returning interface¶
The performance gap between recursive constraints and the older "interface returning interface" pattern is large.
Why the older pattern is slow¶
Every Clone() call:
- Goes through the interface vtable.
- Returns a value boxed into `Cloner` (allocation possible).
- The caller must assert (`v.(*User)`) to use the concrete API: a type check at runtime.
The recursive bound version skips all three.
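The older pattern looks roughly like this (a sketch; the `Cloner` and `User` names mirror the section's examples):

```go
package main

import "fmt"

// Older pattern: Clone returns the interface, losing the concrete type.
type Cloner interface {
	Clone() Cloner
}

type User struct{ Name string }

// Returning Cloner boxes the result on every call.
func (u *User) Clone() Cloner { return &User{Name: u.Name} }

func main() {
	var c Cloner = &User{Name: "alice"}
	dup := c.Clone()
	// The caller must assert back to *User: a runtime type check.
	u, ok := dup.(*User)
	fmt.Println(u.Name, ok)
}
```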
Benchmark — cloning 1M values¶
| Approach | time/op | allocs/op |
|---|---|---|
| Hand-written `func DupAll(xs []User) []User` | 9.5 ms | 1 |
| Generic `DupAll[T Cloner[T]]` instantiated for `User` | 9.7 ms | 1 |
| Interface-only `DupAll(xs []Cloner) []Cloner` | 38 ms | 1,000,001 |
The interface approach is 4x slower and allocates per element. The generic recursive bound is within 2% of hand-written.
Comparison vs hand-written code¶
When the recursive constraint is instantiated for a single concrete type, the gap to hand-written is essentially zero.
Numeric / scalar types¶
| Approach | ns/op |
|---|---|
| Hand-written `func DupInts(xs []int) []int` | 280 |
| `DupAll[T Cloner[T]]` for `IntBox struct{V int}` | 290 |
Within 4%.
Pointer-shaped types¶
| Approach | ns/op |
|---|---|
| Hand-written `func DupUsers(xs []*User) []*User` | 14 |
| `DupAll[T Cloner[T]]` for `*User` | 22 |
About 50% slower because of the dictionary load on each Clone. Still much faster than the interface approach.
Hot-path advice¶
If a recursive-bound generic shows up in profiling as the bottleneck, write a hand-rolled non-generic version for the hot type and have the generic version delegate to it.
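One way to sketch that delegation (the `Token` type and hand-rolled `dupTokens` are hypothetical, and the `any`-based type assertion is just one possible dispatch mechanism):

```go
package main

import "fmt"

type Cloner[T any] interface{ Clone() T }

type Token struct{ ID int }

func (t Token) Clone() Token { return t }

// dupTokens is the hand-rolled hot-path version for Token.
func dupTokens(xs []Token) []Token {
	out := make([]Token, len(xs))
	copy(out, xs) // Token's Clone is an identity copy
	return out
}

// DupAll delegates to the specialised version when T is Token.
func DupAll[T Cloner[T]](xs []T) []T {
	if ts, ok := any(xs).([]Token); ok {
		return any(dupTokens(ts)).([]T)
	}
	out := make([]T, len(xs))
	for i, v := range xs {
		out[i] = v.Clone()
	}
	return out
}

func main() {
	fmt.Println(len(DupAll([]Token{{ID: 1}, {ID: 2}})))
}
```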
Real benchmark numbers¶
From community benchmarks since Go 1.21:
Self-cloning a slice of 100k structs¶
| Test | time/op | bytes/op |
|---|---|---|
| Hand-written | 1.1 ms | 800 kB |
| Generic recursive bound | 1.15 ms | 800 kB |
| Interface returning interface | 4.8 ms | 4.8 MB |
Sorting 10k Money values via Comparable[T]¶
| Test | ns/op |
|---|---|
| Hand-written `sortMoney` | 380,000 |
| Generic `Sort[T Comparable[T]]` | 410,000 |
| `sort.Sort` with method dispatch | 720,000 |
Fluent builder chain (1k iterations)¶
| Test | ns/op |
|---|---|
| Hand-written builder methods | 12,000 |
| `Run[B Stepper[B]]` generic helper | 13,500 |
| Interface-based `Stepper.Step()` returning interface | 26,000 |
Simplifying when possible¶
A recursive constraint is the right tool when the concrete type must survive. Otherwise, simpler designs win.
Case 1 — no need to keep the type¶
If callers do not need the concrete type after Clone(), an ordinary interface is fine:
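A sketch of the plain-interface version (the `Describe` method is a hypothetical stand-in for whatever interface-level API callers actually use):

```go
package main

import "fmt"

// When the concrete type need not survive, a plain interface works:
// Clone returns the interface, and callers stay at the interface level.
type Cloner interface {
	Clone() Cloner
	Describe() string
}

type User struct{ Name string }

func (u User) Clone() Cloner    { return u }
func (u User) Describe() string { return "user " + u.Name }

func main() {
	var c Cloner = User{Name: "alice"}
	fmt.Println(c.Clone().Describe())
}
```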
No recursion needed.
Case 2 — single call site¶
If you have one place calling DupAll, and one type, just write a non-generic function:
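A sketch of the non-generic version for a single type (assuming `User` from the earlier examples):

```go
package main

import "fmt"

type User struct{ Name string }

// dupUsers is all that is needed when there is one call site and one type.
func dupUsers(xs []User) []User {
	out := make([]User, len(xs))
	copy(out, xs)
	return out
}

func main() {
	fmt.Println(dupUsers([]User{{Name: "a"}}))
}
```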
No generic, no constraint, no overhead.
Case 3 — comparable primitive¶
If your Comparable[T] is only used for int, float64, string, use cmp.Ordered:
No method-call indirection — direct integer compare.
Case 4 — codegen¶
If you have many pointer-shaped types and a hot loop, generated per-type code outperforms a recursive-bound generic. Tools like mockery or hand-rolled templates may be the right choice.
Decision matrix¶
| Need | Tool |
|---|---|
| Concrete type must survive across calls | Recursive constraint |
| One call site, one type | Non-generic function |
| Many primitive types, ordered | cmp.Ordered |
| Many pointer types, hot loop | Codegen or hand specialisation |
| Method on heterogeneous slice | Plain interface |
When NOT to use recursive constraints¶
- One concrete type used everywhere — write the function for that type directly.
- The hot path benchmarks favour specialisation — specialise.
- The constraint is becoming complicated — the cost of comprehension exceeds the win.
- Public API stability is critical — recursive interfaces are sticky and break implementers.
- The audience is junior and the abstraction obscures the code.
- Reflection is unavoidable anyway — recursive bounds will not help.
A short rule: recursive constraints are for reusable libraries with type-preserving operations, not for one-off application code.
Cleaner-code optimizations¶
Performance is one axis. Code clarity is another. Recursive constraints shine for cleanliness when:
1. Eliminating type assertions¶
Before: every use of a cloned value needed a runtime assertion such as `v.(*User)`. After: the concrete type flows through the call unchanged, so no assertions remain.
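A sketch of the "after" side (the `First` helper is hypothetical; `Cloner` and `User` follow the section's examples):

```go
package main

import "fmt"

// The recursive bound preserves the concrete type, so no
// assertion is needed to reach User-specific fields.
type Cloner[T any] interface{ Clone() T }

type User struct{ Name string }

func (u User) Clone() User { return u }

func First[T Cloner[T]](xs []T) T {
	return xs[0].Clone() // still a T; no v.(*User) anywhere
}

func main() {
	u := First([]User{{Name: "alice"}})
	fmt.Println(u.Name) // direct field access, no type assertion
}
```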
2. Eliminating boxing¶
Before: every element wrapped into a Cloner interface header. After: elements stay as their concrete type.
3. Sharing helpers across builders¶
Before: one helper per builder type. After: one generic helper for any builder satisfying Stepper[B].
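A sketch of the shared helper (the `Counter` builder is hypothetical; `Stepper[B]` and `Run` mirror the names used in this section's benchmarks):

```go
package main

import "fmt"

// Stepper abstracts "a builder whose Step returns the same builder type".
type Stepper[B any] interface {
	Step() B
}

// Run drives any Stepper n times: one helper shared by every builder.
func Run[B Stepper[B]](b B, n int) B {
	for i := 0; i < n; i++ {
		b = b.Step()
	}
	return b
}

type Counter struct{ N int }

func (c Counter) Step() Counter { return Counter{N: c.N + 1} }

func main() {
	fmt.Println(Run(Counter{}, 3).N) // still a Counter after the chain
}
```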
4. Compile-time enforcement¶
Recursive constraints catch wrong return types at compile time. A Clone that returns interface{} instead of T is a compile error, not a runtime bug.
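A sketch of that enforcement (the `Good`/`Bad` names are hypothetical; the failing variant is shown in a comment because it would not compile):

```go
package main

import "fmt"

type Cloner[T any] interface{ Clone() T }

type Good struct{ V int }

// Clone returns the concrete type, so Good satisfies Cloner[Good].
func (g Good) Clone() Good { return g }

// A type whose Clone returns interface{} would NOT satisfy the constraint:
//
//	type Bad struct{}
//	func (b Bad) Clone() interface{} { return b }
//	var _ Cloner[Bad] = Bad{} // compile error: wrong Clone signature
//
// The mistake surfaces at compile time, not as a runtime bug.

func main() {
	var _ Cloner[Good] = Good{} // compiles: return type matches
	fmt.Println("Good satisfies Cloner[Good]")
}
```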
Summary¶
Recursive type constraints have no inherent runtime cost beyond what any generic incurs. The cost profile is:
- Big win vs interface-returning-interface — boxing and dispatch eliminated.
- Tied with hand-written when stenciled for a single concrete type.
- Slight loss vs hand-written for diverse pointer-shaped types (dictionary lookups).
Optimizing with recursive constraints in mind:
- Use them when the concrete type genuinely must survive.
- Profile before assuming a slowdown — most cases are within a few percent of hand-written.
- Devirtualisation kicks in for single-site, single-type call patterns.
- Watch for escape-analysis surprises: `go build -gcflags="-m"` reveals them.
- Replace with codegen or specialisation only if benchmarks justify it.
Cleanliness benefits often dwarf raw-speed concerns. Eliminating type assertions, removing boxing, and enforcing return types at compile time make code shorter and safer. The biggest "why" answer for recursive constraints is rarely raw nanoseconds — it is fewer assertions, less manual type management, more compile-time safety. Performance is a bonus.