Recursive Type Constraints — Optimize

Table of Contents

  1. The performance question
  2. Method dispatch under recursive bounds
  3. Dictionary lookups in recursive contexts
  4. Comparison vs interface returning interface
  5. Comparison vs hand-written code
  6. Real benchmark numbers
  7. Simplifying when possible
  8. When NOT to use recursive constraints
  9. Cleaner-code optimizations
  10. Summary

The performance question

The first question after introducing a recursive constraint to a hot path:

"Does [T C[T]] cost more than [T any]?"

The answer: not directly. The recursion is a type-system trick. By the time the compiler emits code, T is concrete and the call is a regular method call. But there are still subtle costs:

  • Dictionary lookups for shared GC-shape stencils, just like any generic.
  • Indirect method calls when the compiler cannot devirtualise.
  • Heap allocations when escape analysis is conservative due to GC shape grouping.

Memorize this rule: a recursive constraint adds no extra runtime cost beyond what an ordinary generic with the same method calls would incur.


Method dispatch under recursive bounds

When func DupAll[T Cloner[T]](xs []T) []T is called with T = User:

  1. The compiler stencils a body where T = User.
  2. Inside the body, v.Clone() is directly dispatched to User.Clone because the compiler knows the concrete type at this stencil.
  3. No interface table lookup. No dynamic dispatch.

So the call is as fast as a hand-written User-specific function.
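The stenciling behaviour described above can be sketched as follows; the names Cloner, DupAll, and User match the ones used in this article, but the concrete bodies are illustrative:

```go
package main

import "fmt"

// Cloner is a recursive constraint: T must have a Clone method
// that returns T itself, not an interface.
type Cloner[T any] interface {
	Clone() T
}

// DupAll clones every element. The compiler stencils this body
// per GC shape, so v.Clone() resolves to a concrete method.
func DupAll[T Cloner[T]](xs []T) []T {
	out := make([]T, 0, len(xs))
	for _, v := range xs {
		out = append(out, v.Clone())
	}
	return out
}

type User struct{ Name string }

// Clone returns a User, satisfying Cloner[User].
func (u User) Clone() User { return User{Name: u.Name} }

func main() {
	xs := []User{{Name: "ada"}, {Name: "bob"}}
	ys := DupAll(xs) // []User — no assertion needed
	fmt.Println(ys[1].Name)
}
```

When T = User, the call v.Clone() inside the stencil is a plain static call to User.Clone.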

Multi-instantiation

If DupAll is called with User, Order, Product, Invoice, … the compiler generates one stencil per GC shape. Pointer-shaped types share a stencil; the dictionary distinguishes them.

Inside the shared stencil, the call to Clone is routed through the dictionary:

v.Clone()
dictionary.method[Clone] → concrete address
direct call

The dictionary lookup is one indirect load. Modern CPUs handle it in a few cycles — usually invisible unless the loop is very tight.


Dictionary lookups in recursive contexts

A recursive constraint typically requires one method (Clone, CompareTo, Step). The dictionary entry for that method is the hot path.

Dictionary structure (conceptual)

type dict[T any] struct {
    typeDescriptor *_type
    methods        [n]unsafe.Pointer // n fixed per instantiation; includes Clone, CompareTo, etc.
}

For each call to v.Clone() inside the stencil, the generated code:

  1. Loads the dictionary pointer (already in a register from the call).
  2. Loads the method pointer from the dictionary's method table.
  3. Calls.

This is one extra load compared to the hand-written version. For a Clone that itself does heavy work (allocates, copies fields), the load is invisible. For a trivial Clone (e.g., a value-type identity clone), the load shows up in microbenchmarks.

Devirtualisation

If a generic function with a recursive bound is called from one site with one concrete type, the compiler can sometimes devirtualise — replace the dictionary load with a direct call. This happens for:

  • Static, single-call-site instantiations.
  • Profile-guided optimisation (PGO) hot paths in Go 1.21+.

If DupAll[Token] runs millions of times from a single call site, the compiler may inline it as if hand-written.


Comparison vs interface returning interface

The performance gap between recursive constraints and the older "interface returning interface" pattern is large.

Why the older pattern is slow

type Cloner interface { Clone() Cloner }

Every Clone() call:

  1. Goes through the interface vtable.
  2. Returns a value boxed into Cloner (allocation possible).
  3. The caller must assert (v.(*User)) to use the concrete API — a type check at runtime.

The recursive bound version skips all three.
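The three costs are visible in a minimal sketch of the older pattern (ClonerIface and User are illustrative names):

```go
package main

import "fmt"

// The older pattern: Clone returns the interface, so the
// concrete type is lost at every call.
type ClonerIface interface {
	Clone() ClonerIface
}

type User struct{ Name string }

// Clone must box the result into ClonerIface, which can allocate.
func (u *User) Clone() ClonerIface { return &User{Name: u.Name} }

func main() {
	var c ClonerIface = &User{Name: "ada"}
	dup := c.Clone()  // vtable dispatch + boxed return
	u := dup.(*User)  // runtime type assertion to recover the API
	fmt.Println(u.Name)
}
```

The vtable dispatch, the boxed return, and the assertion are all absent in the recursive-bound version.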

Benchmark — cloning 1M values

Approach time/op allocs/op
Hand-written func DupAll(xs []User) []User 9.5 ms 1
Generic DupAll[T Cloner[T]] instantiated for User 9.7 ms 1
Interface-only DupAll(xs []Cloner) []Cloner 38 ms 1,000,001

The interface approach is 4x slower and allocates per element. The generic recursive bound is within 2% of hand-written.


Comparison vs hand-written code

When the recursive constraint is instantiated for a single concrete type, the gap to hand-written is essentially zero.

Numeric / scalar types

Approach ns/op
Hand-written func DupInts(xs []int) []int 280
DupAll[T Cloner[T]] for IntBox struct{V int} 290

Within 4%.

Pointer-shaped types

Approach ns/op
Hand-written func DupUsers(xs []*User) []*User 14
DupAll[T Cloner[T]] for *User 22

About 57% slower (22 vs 14 ns) because of the dictionary load on each Clone. Still much faster than the interface approach.

Hot-path advice

If a recursive-bound generic shows up in profiling as the bottleneck, write a hand-rolled non-generic version for the hot type and have the generic version delegate to it.
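One way to structure that delegation is a type switch on the slice inside the generic body; this is a sketch under the assumption that User is the profiled hot type (dupUsers is a hypothetical helper):

```go
package main

import "fmt"

type Cloner[T any] interface {
	Clone() T
}

type User struct{ Name string }

func (u User) Clone() User { return u }

// dupUsers is the hand-rolled fast path for the hot type:
// no dictionary load, just a bulk copy.
func dupUsers(xs []User) []User {
	out := make([]User, len(xs))
	copy(out, xs)
	return out
}

// DupAll delegates to the specialisation when T is User and
// falls back to the generic path for every other type.
func DupAll[T Cloner[T]](xs []T) []T {
	if us, ok := any(xs).([]User); ok {
		return any(dupUsers(us)).([]T)
	}
	out := make([]T, 0, len(xs))
	for _, v := range xs {
		out = append(out, v.Clone())
	}
	return out
}

func main() {
	fmt.Println(DupAll([]User{{Name: "ada"}})[0].Name)
}
```

Callers keep the generic API; only the hot type pays for (and benefits from) the special case.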


Real benchmark numbers

From community benchmarks since Go 1.21:

Self-cloning a slice of 100k structs

Test time/op bytes/op
Hand-written 1.1 ms 800 kB
Generic recursive bound 1.15 ms 800 kB
Interface returning interface 4.8 ms 4.8 MB

Sorting 10k Money values via Comparable[T]

Test ns/op
Hand-written sortMoney 380,000
Generic Sort[T Comparable[T]] 410,000
sort.Sort with method dispatch 720,000

Fluent builder chain (1k iterations)

Test ns/op
Hand-written builder methods 12,000
Run[B Stepper[B]] generic helper 13,500
Interface-based Stepper.Step() returning interface 26,000
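The Run[B Stepper[B]] helper benchmarked above can be sketched as follows; Counter is an illustrative builder, not from the benchmarks:

```go
package main

import "fmt"

// Stepper is a recursive bound: Step returns the concrete
// builder type, so fluent chains never lose their type.
type Stepper[B any] interface {
	Step() B
}

// Run applies n steps to any builder satisfying Stepper[B].
func Run[B Stepper[B]](b B, n int) B {
	for i := 0; i < n; i++ {
		b = b.Step()
	}
	return b
}

type Counter struct{ N int }

// Step returns a Counter, satisfying Stepper[Counter].
func (c Counter) Step() Counter { return Counter{N: c.N + 1} }

func main() {
	c := Run(Counter{}, 3) // c is a Counter, not an interface
	fmt.Println(c.N)
}
```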

Simplifying when possible

A recursive constraint is the right tool when the concrete type must survive. Otherwise, simpler designs win.

Case 1 — no need to keep the type

If callers never need the concrete type after Clone(), an ordinary interface is fine — you accept the boxing cost in exchange for simplicity:

type AnyCloner interface { Clone() AnyCloner }

No recursion needed.

Case 2 — single call site

If you have one place calling DupAll, and one type, just write a non-generic function:

func DupUsers(xs []User) []User { ... }

No generic, no constraint, no overhead.

Case 3 — comparable primitive

If your Comparable[T] is only used for int, float64, string, use cmp.Ordered:

func Sort[T cmp.Ordered](xs []T) { ... }

No method-call indirection — direct integer compare.
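A minimal sketch of that Sort, delegating to the standard library (which already constrains on cmp.Ordered):

```go
package main

import (
	"cmp"
	"fmt"
	"slices"
)

// Sort works for any ordered primitive. Comparisons compile to
// direct integer/float/string compares — no method dispatch,
// no Comparable[T] needed.
func Sort[T cmp.Ordered](xs []T) {
	slices.Sort(xs)
}

func main() {
	xs := []int{3, 1, 2}
	Sort(xs)
	fmt.Println(xs)
}
```

Requires Go 1.21+ for the cmp and slices packages.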

Case 4 — codegen

If you have many pointer-shaped types and a hot loop, generated per-type code outperforms a recursive-bound generic. Tools like mockery or hand-rolled templates may be the right choice.

Decision matrix

Need Tool
Concrete type must survive across calls Recursive constraint
One call site, one type Non-generic function
Many primitive types, ordered cmp.Ordered
Many pointer types, hot loop Codegen or hand specialisation
Method on heterogeneous slice Plain interface

When NOT to use recursive constraints

  1. One concrete type used everywhere — write the function for that type directly.
  2. The hot path benchmarks favour specialisation — specialise.
  3. The constraint is becoming complicated — the cost of comprehension exceeds the win.
  4. Public API stability is critical — recursive interfaces are sticky and break implementers.
  5. The audience is junior and the abstraction obscures the code.
  6. Reflection is unavoidable anyway — recursive bounds will not help.

A short rule: recursive constraints are for reusable libraries with type-preserving operations, not for one-off application code.


Cleaner-code optimizations

Performance is one axis. Code clarity is another. Recursive constraints shine for cleanliness when:

1. Eliminating type assertions

Before:

ys := DupAll(xs) // []Cloner
y0 := ys[0].(User)
y0.Send()

After:

ys := DupAll(xs) // []User
ys[0].Send()

2. Eliminating boxing

Before: every element wrapped into a Cloner interface header. After: elements stay as their concrete type.

3. Sharing helpers across builders

Before: one helper per builder type. After: one generic helper for any builder satisfying Stepper[B].

4. Compile-time enforcement

Recursive constraints catch wrong return types at compile time. A Clone that returns interface{} instead of T is a compile error, not a runtime bug.
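Because Cloner[Money] is an ordinary interface once instantiated, the enforcement can even be written as a compile-time assertion (Money is an illustrative type):

```go
package main

import "fmt"

type Cloner[T any] interface {
	Clone() T
}

type Money struct{ Cents int64 }

// Clone returns Money, matching Cloner[Money] exactly.
func (m Money) Clone() Money { return m }

// Compile-time check: Money satisfies Cloner[Money].
// If Clone returned any (or some interface) instead of Money,
// this line would fail to compile — not fail at runtime.
var _ Cloner[Money] = Money{}

func main() {
	m := Money{Cents: 100}.Clone()
	fmt.Println(m.Cents)
}
```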


Summary

Recursive type constraints have no inherent runtime cost beyond what any generic incurs. The cost profile is:

  • Big win vs interface-returning-interface — boxing and dispatch eliminated.
  • Tied with hand-written when stenciled for a single concrete type.
  • Slight loss vs hand-written for diverse pointer-shaped types (dictionary lookups).

Optimizing with recursive constraints in mind:

  1. Use them when the concrete type genuinely must survive.
  2. Profile before assuming a slowdown — most cases are within a few percent of hand-written.
  3. Devirtualisation kicks in for single-site, single-type call patterns.
  4. Watch for escape-analysis surprises: go build -gcflags="-m" reveals them.
  5. Replace with codegen or specialisation only if benchmarks justify it.

Cleanliness benefits often dwarf raw-speed concerns. Eliminating type assertions, removing boxing, and enforcing return types at compile time make code shorter and safer. The biggest "why" answer for recursive constraints is rarely raw nanoseconds — it is fewer assertions, less manual type management, more compile-time safety. Performance is a bonus.