8.20 strconv — Optimize¶
Ten optimizations grounded in benchmarks. Each is a before/after pair with measurable improvement. Run
go test -bench=. -benchmem -benchtime=1s to verify.
O1 — Replace fmt.Sprintf("%d", x) with strconv.Itoa¶
Before¶
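A representative before; formatID is a hypothetical helper name used for illustration:

```go
import "fmt"

// fmt.Sprintf parses the format string at runtime and boxes x into an
// interface{} before formatting it.
func formatID(x int) string {
	return fmt.Sprintf("%d", x)
}
```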
After¶
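The same hypothetical helper, converted:

```go
import "strconv"

// strconv.Itoa formats the integer directly: no format-string parsing,
// no interface boxing.
func formatID(x int) string {
	return strconv.Itoa(x)
}
```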
6× faster, half the allocations.
O2 — Replace Itoa with AppendInt into a pooled buffer¶
Before¶
func encode(values []int) string {
	var b strings.Builder
	for _, v := range values {
		b.WriteString(strconv.Itoa(v))
		b.WriteByte(',')
	}
	return b.String()
}
Each Itoa allocates a fresh string. For 100 values, 100 transient allocations.
After¶
func encode(values []int) string {
	var b strings.Builder
	b.Grow(len(values) * 6)
	var buf [20]byte // large enough for any base-10 int64
	for _, v := range values {
		b.Write(strconv.AppendInt(buf[:0], int64(v), 10))
		b.WriteByte(',')
	}
	return b.String()
}
buf is stack-allocated (escape analysis keeps it there). One allocation per encode call (the final string).
O3 — Pre-validate before parsing¶
Before¶
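A sketch of the naive version, matching the signature of the After:

```go
import "strconv"

// Every malformed string pays for a *strconv.NumError allocation on the
// error path before the caller can reject it.
func parsePort(s string) (int, error) {
	return strconv.Atoi(s)
}
```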
When frequently called with malformed input (e.g., from probe clients), each failure allocates a *NumError (~50 bytes).
After¶
func parsePort(s string) (int, error) {
	if len(s) == 0 || len(s) > 5 {
		return 0, errInvalidPort
	}
	for _, c := range []byte(s) {
		if c < '0' || c > '9' {
			return 0, errInvalidPort
		}
	}
	return strconv.Atoi(s)
}
The fast-rejection path avoids the NumError allocation entirely. For hostile traffic mostly producing failures, this halves CPU and removes GC pressure.
O4 — Use Atoi instead of ParseInt(s, 10, 0)¶
Before¶
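A sketch, using a hypothetical parseCount helper:

```go
import "strconv"

// bitSize 0 means "fits in int", but the general ParseInt path still runs;
// the int64 result must be converted back by hand.
func parseCount(s string) (int, error) {
	v, err := strconv.ParseInt(s, 10, 0)
	return int(v), err
}
```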
After¶
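The same helper via Atoi (which the strconv docs define as equivalent to ParseInt(s, 10, 0) converted to int):

```go
import "strconv"

// Atoi hits an inline fast path for short decimal strings before
// falling back to the general parser.
func parseCount(s string) (int, error) {
	return strconv.Atoi(s)
}
```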
Atoi has an inline fast path for short decimals; ParseInt always goes through the general path. ~2× faster for short inputs.
O5 — Choose the right bitSize¶
Before¶
v, err := strconv.ParseInt(s, 10, 64)
if err != nil { return 0, err }
if v > math.MaxInt32 || v < math.MinInt32 {
	return 0, errOverflow
}
return int32(v), nil
Two range checks: one inside ParseInt, one outside.
After¶
v, err := strconv.ParseInt(s, 10, 32)
if err != nil { return 0, err } // includes overflow
return int32(v), nil
ParseInt does the range check itself when given bitSize=32. One fewer comparison; clearer intent.
O6 — Avoid string(b) round-trip for parsing¶
Before¶
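A sketch of the allocating version, reusing the parseField name from the After:

```go
import "strconv"

// string(b) copies the bytes just to satisfy Atoi's signature:
// one extra allocation per call.
func parseField(b []byte) (int, error) {
	return strconv.Atoi(string(b))
}
```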
After¶
For Go 1.20+:
func parseField(b []byte) (int, error) {
	return strconv.Atoi(unsafe.String(unsafe.SliceData(b), len(b)))
}
Or, more portably, take advantage of the compiler optimization that elides string(b) when used as a function argument:
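The portable version is the same source as the naive one; the difference is whether escape analysis keeps the temporary string's backing array off the heap, which depends on the compiler version — hence the check below:

```go
import "strconv"

// Identical code to the naive version; when the compiler proves the
// temporary string does not escape Atoi, the copy stays on the stack.
func parseField(b []byte) (int, error) {
	return strconv.Atoi(string(b))
}
```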
Confirm with -gcflags="-m": you want to see "does not escape" for the string(b) expression.
O7 — Replace FormatFloat(_, 'f', -1, 64) with 'g' when possible¶
Before¶
'f' with prec=-1 produces non-scientific output, but for very small or very large values, it can produce extremely long strings:
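For example, via an illustrative wrapper:

```go
import "strconv"

// 'f' with prec=-1 spells out every decimal place of the shortest
// round-tripping representation.
func formatMicro(v float64) string {
	return strconv.FormatFloat(v, 'f', -1, 64)
}

// formatMicro(1e-17) == "0.00000000000000001" (19 bytes)
```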
After¶
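The same wrapper with 'g':

```go
import "strconv"

// 'g' uses scientific notation for very large or very small exponents
// and plain notation otherwise, always with the fewest digits that
// round-trip.
func formatMicro(v float64) string {
	return strconv.FormatFloat(v, 'g', -1, 64)
}

// formatMicro(1e-17) == "1e-17" (5 bytes)
```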
'g' switches to scientific notation when it's shorter. Smaller output, fewer bytes to allocate, faster downstream parsing.
Pick 'f' only when the consumer can't parse scientific notation.
O8 — Batch parse with worker pool¶
For parsing 10M strings to ints:
Before¶
results := make([]int, len(inputs))
for i, s := range inputs {
	v, _ := strconv.Atoi(s)
	results[i] = v
}
Single-threaded, ~50M ops/sec on a single core.
After¶
results := make([]int, len(inputs))
var wg sync.WaitGroup
chunk := (len(inputs) + runtime.NumCPU() - 1) / runtime.NumCPU()
for w := 0; w < runtime.NumCPU(); w++ {
	start := w * chunk
	end := start + chunk
	if end > len(inputs) { end = len(inputs) }
	wg.Add(1)
	go func(start, end int) {
		defer wg.Done()
		for i := start; i < end; i++ {
			results[i], _ = strconv.Atoi(inputs[i])
		}
	}(start, end)
}
wg.Wait()
Parallelizes across cores. For a CPU-bound workload, scales nearly linearly to NumCPU.
Caveat: writing to results[i] from multiple goroutines is safe only because each writes a unique index. Sharing a single slice header is fine; sharing a single index would race.
O9 — Reuse []byte for AppendInt output¶
Before¶
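A sketch; values and process are the same placeholders the After uses, wrapped in a function here for illustration:

```go
import "strconv"

// Passing nil as the destination forces AppendInt to allocate a fresh
// backing array on every call.
func emit(values []int, process func([]byte)) {
	for _, v := range values {
		process(strconv.AppendInt(nil, int64(v), 10))
	}
}
```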
Each call allocates a fresh slice.
After¶
buf := make([]byte, 0, 32)
for _, v := range values {
	buf = strconv.AppendInt(buf[:0], int64(v), 10)
	process(buf)
}
One allocation total (the initial make). The buf[:0] reslice keeps the capacity; AppendInt writes into it.
Caveat: if process retains the slice past the next iteration, it must copy. The contract here is "valid for this iteration only".
O10 — Profile-driven Sprintf audit¶
This isn't an optimization recipe — it's the workflow.
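One iteration of that workflow might look like this (BenchmarkEncode is an illustrative benchmark name, not one from this chapter):

```shell
go test -bench=BenchmarkEncode -cpuprofile=cpu.out
go tool pprof -top cpu.out    # inspect the hottest frames
```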
If the top frames include:
- fmt.(*pp).doPrintf → look for Sprintf in the calling code.
- runtime.convT64 or runtime.convT32 → look for interface{} boxing (often from fmt.Sprint*).
- runtime.mallocgc from strconv.FormatInt → switch to AppendInt.
- runtime.mallocgc from strconv.NumError → fast-reject before parsing.
Apply the fix. Re-profile. Repeat until the top frames are your business logic, not number conversions.
Bonus — When fmt is actually fine¶
The above optimizations matter when:
- Sustained throughput > 100k calls/sec on the path.
- A profile pinpoints fmt/strconv as the top frames.
- The code is reviewed and tested (no regression risk).
If your code converts numbers only on startup, in error messages, or once per HTTP request at < 1k req/s, the difference between fmt.Sprintf and strconv.Itoa is unmeasurable. Pick the clearest version.
Production rules:
| Path | Tool |
|---|---|
| Hot serialization (> 10k/sec) | Append* into pooled buffer |
| Warm code (1k–10k/sec) | Itoa/FormatFloat |
| Cold code (< 100/sec) | fmt.Sprintf for readability |
Checklist¶
After optimizing a number-heavy path:
- No fmt.Sprintf for single-value conversions.
- Append* used in inner loops, results written to pooled buffers or stack arrays.
- bitSize parameter matches destination type.
- Errors checked, not silently discarded.
- No unnecessary string(b)/[]byte(s) conversions.
- Benchmark shows the expected improvement.
- Race detector (go test -race) passes if concurrency was added.
- The change is documented (commit message or comment) explaining the why.