comparable and cmp.Ordered — Optimize¶
Table of Contents¶
- The cost of equality
- == on big structs — what the compiler emits
- Generic equality vs hand-written
- Comparison-dominated hot loops
- Choosing the right comparison strategy
- Map keys and hashing cost
- Sort comparators — operator vs cmp.Compare
- slices.Sort vs sort.Slice
- Real benchmark numbers
- When NOT to optimize equality
- Summary
The cost of equality¶
== is conceptually trivial: "are these two values the same?". In practice, the cost depends on the type:
| Type | Cost of == |
|---|---|
| bool, byte, int8 | one CPU instruction |
| int, int64, float64 | one CPU instruction |
| string | length compare, then byte-by-byte (early exit on first difference) |
| pointer | one instruction (address compare) |
| array [N]T | O(N) — element-wise |
| struct | O(field count) — field-wise, recursive |
| interface | dynamic type check + recursive equality, can panic |
For numeric scalars, equality is essentially free. For strings of length L, equality is up to L byte compares. For structs, equality is the sum of field equality costs. A struct with 50 string fields has 50 string compares per ==.
In a hot loop that does millions of equalities per second, the difference between int keys and MyHugeStruct keys can be three orders of magnitude.
== on big structs — what the compiler emits¶
Consider:
type Big struct {
A, B, C int
Name string
Email string
Phone string
}
func eq(a, b Big) bool { return a == b }
The compiler emits something like:
if a.A != b.A { return false }
if a.B != b.B { return false }
if a.C != b.C { return false }
if !runtime.eqstring(a.Name, b.Name) { return false }
if !runtime.eqstring(a.Email, b.Email) { return false }
if !runtime.eqstring(a.Phone, b.Phone) { return false }
return true
Each eqstring call is two comparisons (length + bytes). On a struct with mostly equal fields, equality runs the full chain. On a struct that differs in the first field, it short-circuits — but the compiler does not reorder fields; the order in source dictates the order in ==.
Optimization tip: order fields by likelihood of difference¶
Place the most discriminating fields first:
// If ID is unique, put it first
type Record struct {
ID int64 // discriminates fastest
Name string
Tags string
Notes string
}
This is a micro-optimization; benchmark before relying on it. But for hot equality loops over wide structs, field order can be measurable.
Generic equality vs hand-written¶
A generic Has over comparable:
func Has[T comparable](s []T, v T) bool {
for _, x := range s {
if x == v { return true }
}
return false
}
vs hand-written:
For T = int, the generic version compiles into the same tight loop. The compiler stencils Has[int] and inlines the integer compare. Benchmark:
| Test (n = 1000) | ns/op |
|---|---|
HasInt | 290 |
Has[int] | 295 |
Within 2%. For T = *MyStruct across diverse pointer types, the dictionary indirection appears:
| Test (n = 1000) | ns/op |
|---|---|
Hand-written HasMyStruct | 320 |
Has[*MyStruct] (only used here) | 325 |
Has[*MyStruct] (used with 5 other pointer types) | 470 |
The third case is slower because the stenciled body shares with other pointer types and the equality call goes through a runtime dictionary.
Comparison-dominated hot loops¶
When the loop body does only equality (no other work), the compare cost dominates. Examples:
- Contains over a long slice
- Index over a long slice
- Distinct deduplication
- Set membership lookup in a small set
Profile shows nearly 100% of time in the equality routine. Optimizations that help:
- Reduce the work — if you do many Contains, switch to a Set (one hash, one compare).
- Sort then binary-search — for many lookups against a static slice.
- Reduce the type size — replace big structs with their IDs in the lookup loop.
- Use slices.Index — heavily optimized, may use SIMD on simple element types.
A common pattern:
// Slow: O(n*m)
for _, item := range items {
if slices.Contains(blocklist, item.ID) { drop(item) }
}
// Fast: O(n+m)
blockSet := make(map[int]struct{}, len(blocklist))
for _, b := range blocklist { blockSet[b] = struct{}{} }
for _, item := range items {
if _, blocked := blockSet[item.ID]; blocked { drop(item) }
}
Here comparable (for the set key) does the heavy lifting — turning a quadratic loop into linear.
Choosing the right comparison strategy¶
A senior engineer chooses a strategy per call site:
| Pattern | Strategy |
|---|---|
| Few items, infrequent | linear search with == |
| Many lookups, static slice | binary search with cmp.Compare |
| Many lookups, dynamic data | set / map with hash + == |
| Big struct keys | hash once, compare hashes first |
| Float keys | beware NaN; consider math.Float64bits for canonical form |
| Mixed types | switch to interface-based polymorphism |
The pattern of comparisons matters more than the cost of any single ==.
Hash-then-compare for big structs¶
type Key struct { /* many fields */ }
type entry struct {
hash uint64
key Key
val V
}
func (m *MyMap) Get(k Key) (V, bool) {
h := hash(k)
for _, e := range m.bucket(h) {
if e.hash != h { continue } // fast path: hashes differ
if e.key == k { return e.val, true } // slow path: full compare
}
return zeroV, false
}
The hash != h check skips most non-matches without invoking the structural ==. Built-in maps already do this.
Map keys and hashing cost¶
For a map[K]V, every Set and Get involves:
- Hashing K
- Probing buckets
- Comparing K to candidates with ==
Hashing cost grows with the size of K:
| K type | Hash cost |
|---|---|
| int, pointer | 1-2 cycles |
| string (len 16) | ~10 cycles |
| string (len 1024) | ~200 cycles |
| struct{ A, B int } | 5-10 cycles |
| struct{ Name, Email, Phone string } | dependent on string lengths |
| [16]byte | ~10 cycles |
A map[BigStruct]V is much slower than map[int]V. If the lookup is hot, consider:
- Indirection — map[int]*BigStruct and use a separate ID
- Hash precomputation — compute the hash once and store both (hash, BigStruct)
- Pre-sort and binary-search — when the data is mostly read-only
comparable does not affect this — it's a structural property of K. But choosing K thoughtfully is part of "performance-aware" generic design.
Sort comparators — operator vs cmp.Compare¶
For T = int, T = string, the operator and cmp.Compare are essentially equivalent. The compiler inlines cmp.Compare for these types:
// Both compile to roughly the same code for T = int
slices.Sort(s) // uses cmp.Compare internally
slices.SortFunc(s, func(a, b int) int { return cmp.Compare(a, b) })
slices.SortFunc(s, func(a, b int) int {
if a < b { return -1 }
if a > b { return 1 }
return 0
})
For T = float64, cmp.Compare adds NaN handling. The benchmark cost:
| Test (1M floats, no NaN) | ns/op |
|---|---|
slices.Sort (uses cmp.Compare) | 110,000,000 |
slices.SortFunc with < only | 105,000,000 |
Within 5%. The NaN check is a single floating-point comparison, dwarfed by the sort itself. Always use cmp.Compare for floats; the negligible cost is worth the determinism.
For structs, the choice is custom anyway — but route field comparisons through cmp.Compare:
slices.SortFunc(items, func(a, b Item) int {
return cmp.Or(
cmp.Compare(a.Score, b.Score),
cmp.Compare(a.Name, b.Name),
)
})
cmp.Or short-circuits on the first non-zero, so chaining costs nothing extra.
slices.Sort vs sort.Slice¶
The performance gap is real:
| Test (10,000 ints) | ns/op |
|---|---|
sort.Slice(s, less) | 380,000 |
sort.Ints(s) | 240,000 |
slices.Sort(s) (1.21+) | 230,000 |
sort.Slice uses reflection (reflect.Swapper) and the comparator is a closure. slices.Sort is fully generic — the comparator is inlined, swapping is direct.
For structs:
| Test (10,000 records, sort by Score) | ns/op |
|---|---|
sort.Slice | 510,000 |
slices.SortFunc with cmp.Compare | 320,000 |
About 35% faster. The generic version inlines the field access and the comparator.
Rule: replace sort.Slice with slices.SortFunc in any hot path. The change is mechanical; the gain is meaningful.
Real benchmark numbers¶
Set[T comparable] Add and Has¶
| Operation | Type | ns/op |
|---|---|---|
| Add | int | 18 |
| Add | string (len 16) | 30 |
| Add | Point{X, Y int} | 22 |
| Add | BigStruct{ ...12 fields... } | 75 |
| Has | int | 12 |
| Has | string (len 16) | 22 |
| Has | BigStruct | 60 |
The cost scales with key-equality cost. Generics are not the bottleneck — the struct comparison is.
Sorted []float64 with NaN¶
| Test (1M elements, 1% NaN) | ns/op |
|---|---|
| slices.Sort (NaN-aware) | 120,000,000 |
| Hand-written quicksort with < | 115,000,000 (incorrect for NaN) |
| Hand-written with NaN-check | 125,000,000 |
slices.Sort is essentially as fast as a hand-written NaN-aware sort, and faster than the broken "ignore NaN" version.
Generic Min[T cmp.Ordered]¶
| Type | ns/op |
|---|---|
| Min(int, int) | 1.5 |
| Min(float64, float64) | 1.5 |
| Min(string, string) | 4-12 (depends on prefix) |
| Min(Duration, Duration) | 1.5 |
Domain types ride free: Duration is exactly as fast as int64.
slices.Contains vs for + ==¶
| n | slices.Contains | hand-written loop |
|---|---|---|
| 16 | 5 ns | 5 ns |
| 256 | 80 ns | 78 ns |
| 4096 | 1,300 ns | 1,290 ns |
Identical. slices.Contains is a tight loop with == — there is no overhead.
When NOT to optimize equality¶
Most code does not have an equality bottleneck. Before micro-optimizing:
- Profile first. If == is not in the top 5 of pprof, do not bother.
- Algorithmic wins dominate. Replacing Contains in a loop with a Set is a 10-100× win — micro-optimizing the compare gets you 5%.
- Premature struct-field reordering is brittle. Tests that rely on struct layout break.
- Generic vs hand-written == rarely differs by more than 5-10% on numeric types.
A reasonable order of operations:
- Measure
- Improve algorithm
- Switch data shape (set vs slice)
- Reduce key size
- Re-measure
- Then consider micro-tuning
Summary¶
Performance considerations for comparable and cmp.Ordered:
- Equality cost scales with type size. Scalars are free; big structs are O(field count).
- Generic == is essentially as fast as hand-written for scalar types.
- Pointer-shaped diversity can introduce dictionary indirection, but the cost is small (a few ns per call).
- Map operations include hashing and equality — both grow with key size.
- cmp.Compare adds negligible overhead for non-NaN floats and fixes correctness for NaN.
- slices.Sort beats sort.Slice by 30-40% on struct sorts, ~5% on primitive sorts.
- Algorithmic wins dwarf comparison micro-optimizations. A Set lookup is 10-100× faster than a Contains loop, regardless of equality cost.
The "cleanest code" angle:
- Eliminate interface{} == from hot paths — it boxes and dispatches.
- Replace for + == with Set lookups when many queries hit the same data.
- Always use cmp.Compare for float sort comparators; the determinism is free.
- Reach for slices.Sort over sort.Slice in new code.
comparable and cmp.Ordered are not just type-checking constraints — they are commitments to specific runtime operations. Knowing the cost of those operations turns them from abstractions into engineering decisions. Generics make the code shorter; understanding the cost makes it fast.