Stdlib Generic Packages — Optimize¶
Table of Contents¶
- Pre-allocation
- Choose Compact over CompactFunc when possible
- BinarySearch over Index for sorted data
- maps.Clone over manual copy loops
- slices.Concat over chained append
- Avoiding closure overhead
- Iterator vs slice tradeoffs
- Summary
Pre-allocation¶
Most performance wins with slices come from telling Go the final size upfront.
Slice from map keys¶
```go
// Slow: starts at zero capacity, grows multiple times
keys := []string{}
for k := range m {
	keys = append(keys, k)
}

// Fast: pre-sized
keys := make([]string, 0, len(m))
for k := range m {
	keys = append(keys, k)
}

// Shortest (1.23+)
keys := slices.Collect(maps.Keys(m))
```
slices.Collect is the most concise option, but note that iter.Seq carries no length information: Collect grows its result with append, so it can reallocate just like the naive loop. When the map is large and the allocation matters, the pre-sized make remains the fastest path.
Concatenating slices¶
```go
// Slow: may reallocate at each append
all := []int{}
all = append(all, a...)
all = append(all, b...)
all = append(all, c...)

// Fast: one allocation (1.22+)
all := slices.Concat(a, b, c)
```
slices.Concat walks all input slices to compute total length, allocates the destination once, and copies. Saves up to 30% on hot paths that build large slices from multiple sources.
Choose Compact over CompactFunc when possible¶
For comparable types, slices.Compact uses == directly:
For non-comparable types or custom equality, use CompactFunc:
Performance difference: Compact inlines the comparison for primitive types. CompactFunc calls a function value, which is almost always slower unless the closure inlines.
Benchmark sketch¶
| Operation | ns/op |
|---|---|
| slices.Compact([]int{...}) | ~1.0n |
| slices.CompactFunc([]int{...}, func(a, b int) bool { return a == b }) | ~1.4n |
Roughly 30-40% slower. Negligible for small slices, noticeable for large ones.
Rule¶
If == works for your type, use Compact. Reach for CompactFunc only when:

- The type is non-comparable (slice, map, or function as element)
- You want a domain-specific equality (strings.EqualFold, ID-only matching, etc.)
BinarySearch over Index for sorted data¶
slices.Index is O(n). slices.BinarySearch is O(log n). The crossover is around n = 20.
Decision table¶
| Slice state | Use |
|---|---|
| Sorted | BinarySearch |
| Unsorted, will be searched once | Index |
| Unsorted, will be searched many times | Sort first, then BinarySearch |
| Predicate-based (not exact) | IndexFunc |
| Sorted by a derived key | BinarySearchFunc |
Benchmark sketch (n = 1,000)¶
| Operation | ns/op |
|---|---|
| slices.Index(s, target) | ~250 |
| slices.BinarySearch(s, target) | ~10 |
A 25× speedup once you pay the one-time sort cost.
Don't sort just to search once¶
If you only need to search once, plain Index is correct. Sorting is O(n log n) — more expensive than the linear scan it replaces. The win only materializes for repeated searches.
maps.Clone over manual copy loops¶
The runtime fast path in maps.Clone knows the internal hash table layout:
```go
// Slow
dst := make(map[K]V, len(src))
for k, v := range src {
	dst[k] = v
}

// Fast (2-3× faster on large maps)
dst := maps.Clone(src)
```
The runtime helper copies bucket layouts directly, skipping the hash recomputation that the for-loop version triggers.
When manual is needed¶
Manual copy is required when:

- You want a deep copy (the runtime helper is shallow)
- You want to filter or transform during copy
- You have a partial map view
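For instance, filtering during the copy (the predicate here is a made-up example):

```go
package main

import "fmt"

func main() {
	src := map[string]int{"a": 1, "b": -2, "c": 3}

	// Keep only entries with positive values; maps.Clone cannot do this.
	dst := make(map[string]int, len(src))
	for k, v := range src {
		if v > 0 {
			dst[k] = v
		}
	}
	fmt.Println(len(dst)) // 2
}
```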
For all other cases, maps.Clone is faster and shorter.
slices.Concat over chained append¶
Chained append reallocates whenever capacity is exceeded. slices.Concat walks all inputs once to sum their lengths, then allocates exactly once (see the code in the Concatenating slices section above).
Benchmark sketch¶
| Operation | allocations |
|---|---|
| Chained append (4 slices, total len 1000) | 2-3 |
| slices.Concat | 1 |
The improvement scales with the number of input slices.
Avoiding closure overhead¶
SortFunc, IndexFunc, CompactFunc, EqualFunc, etc., take func values. The Go compiler tries to inline them, but does not always succeed.
Patterns that inline well¶
```go
slices.SortFunc(s, func(a, b Person) int {
	return cmp.Compare(a.Age, b.Age) // simple closure — usually inlines
})
```
Patterns that inline less reliably¶
```go
slices.SortFunc(s, func(a, b Person) int {
	if a.Age == b.Age && a.Name == b.Name {
		return 0
	}
	// ... complex body
}) // less likely to inline
```
Tip¶
Pass cmp.Compare directly when the slice element type is cmp.Ordered:
The compiler treats this as a direct reference and inlines aggressively.
Verify with -gcflags¶
Build with go build -gcflags=-m and look for "inlining call to ..." messages. Generic functions show up with shape suffixes.
Iterator vs slice tradeoffs¶
In Go 1.23+, maps.Keys returns iter.Seq[K]. You have two paths:
Use the iterator directly¶
No allocation. The iterator yields keys lazily.
Materialize into a slice¶
One allocation, but you can sort, search, or pass the slice to other functions.
Decision¶
| Need | Choice |
|---|---|
| One-shot loop | iterator |
| Sort or search | materialize |
| Pass to other functions | materialize |
| Very large map, memory-sensitive | iterator |
For most application code the difference is invisible. For library hot paths, the iterator can save the slice allocation entirely.
Summary¶
The biggest performance levers when using slices, maps, and cmp:
1. Pre-allocate with make([]T, 0, n), or build in one step with slices.Concat / slices.Collect.
2. Prefer Compact over CompactFunc when == works.
3. Switch from Index to BinarySearch for sorted data and n > 20.
4. Use maps.Clone instead of manual copy loops.
5. Pass cmp.Compare directly to *Func APIs when possible — better inlining.
6. Range over iterators to skip the slice allocation when you only need a one-shot loop.
Most teams gain a measurable percentage off their CPU profile by applying just rules 1, 3, and 4. The remaining rules matter for hot paths in libraries and middleware.
A practical workflow:
1. Write the obvious code with slices / maps / cmp.
2. Profile with pprof.
3. If a stdlib helper shows up hot, check the variant table — there is often a faster choice.
4. Pre-allocate where the size is known.
5. Re-profile.
The stdlib generic packages are well-optimized. Most performance work is choosing which function to call, not optimizing the function itself.