net/http — Optimization
1. How to use this file
Fourteen scenarios where net/http code allocates more, blocks longer, or wastes connections versus what the stdlib actually offers. Each entry has a Before (code + benchmark) and a collapsible After (optimized code + benchmark + why + trade-offs + when NOT).
Anchored at Go 1.23, amd64, loopback. The numbers show the shape of each result, not absolute values; run go test -bench=. -benchmem on your own hardware before quoting them.
net/http cost is dominated by four things: per-request connection setup, per-request allocations (buffers, headers, request IDs), unread response bodies blocking keep-alive, and timeouts that were never set. Most wins remove one of those four from the hot path.
Reading order: Ex. 1, 3, 10, 12 (the connection-reuse cluster), then Ex. 4, 5, 8 (the allocation cluster), then the rest in any order. Ex. 1, 12, and 13 are the ones senior reviewers flag most often.
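2. Exercise 1 — Per-request http.Client allocation
A worker calls a downstream API in a loop, building a fresh http.Client with its own http.Transport per call. The Transport owns the connection pool, so every request opens a new TCP+TLS connection and pooling never engages. (A bare &http.Client{} with a nil Transport falls back to the shared http.DefaultTransport and would still pool.)
func fetch(url string) ([]byte, error) {
	// A new Transport per call means a new, empty connection pool per call.
	client := &http.Client{Transport: &http.Transport{}, Timeout: 5 * time.Second}
	resp, err := client.Get(url)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	return io.ReadAll(resp.Body)
}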
After
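A minimal sketch of the shared-client shape for this exercise. The name `httpClient` and the 5-second timeout are carried over from the Before code as assumptions; `countConns` is a hypothetical helper that exists only to show how many TCP connections a run of sequential calls opens against a local test server.

```go
package main

import (
	"fmt"
	"io"
	"net"
	"net/http"
	"net/http/httptest"
	"sync/atomic"
	"time"
)

// httpClient is shared by every caller. Its nil Transport means
// http.DefaultTransport, which owns the idle-connection pool.
var httpClient = &http.Client{Timeout: 5 * time.Second}

// fetch reads the body to EOF, which lets the connection return to the pool.
func fetch(url string) ([]byte, error) {
	resp, err := httpClient.Get(url)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	return io.ReadAll(resp.Body)
}

// countConns (hypothetical helper) issues n sequential fetches against a
// local test server and reports how many TCP connections the server accepted.
func countConns(n int) (int64, error) {
	var opened atomic.Int64
	srv := httptest.NewUnstartedServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		io.WriteString(w, "ok")
	}))
	srv.Config.ConnState = func(c net.Conn, s http.ConnState) {
		if s == http.StateNew {
			opened.Add(1)
		}
	}
	srv.Start()
	defer srv.Close()

	for i := 0; i < n; i++ {
		if _, err := fetch(srv.URL); err != nil {
			return 0, err
		}
	}
	return opened.Load(), nil
}

func main() {
	conns, err := countConns(10)
	fmt.Println("connections opened:", conns, "err:", err)
}
```

Swapping `httpClient` for a client built inside `fetch` around a new `http.Transport` should make the count climb with `n`, which is the Before behavior.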
One package-level client, shared. `http.Client` and its `Transport` are safe for concurrent use; the `Transport` is what holds the idle-connection pool. ~70× faster, ~4× less garbage.
**Why faster:** The keep-alive pool lives in the `Transport`, so a fresh `Transport` per call starts with an empty pool every time. Sharing the client means request 2+ reuses an idle TCP connection: no `connect()`, no TLS handshake, no new goroutines for `readLoop`/`writeLoop`. The per-request cost drops from "open a socket" to "pull a `persistConn` off the idle list."
**Trade-off:** A buggy long-lived `Transport` (leaked goroutines, stuck `persistConn`s) now affects every caller, and one bad host can starve the per-host pool. Mitigate with `MaxIdleConnsPerHost` and `IdleConnTimeout` (see Ex. 3, 9).
**When NOT:** One-shot CLIs that exit after one request, where the savings vanish. Tests where you want isolation between cases. Code calling many unrelated hosts, where pool sizing tuned for one host hurts another.
3. Exercise 2 — io.ReadAll(resp.Body) for every response
A proxy reads every upstream response fully into memory before forwarding. Even 200 MB responses get buffered before a single byte goes to the downstream writer.
func proxy(w http.ResponseWriter, upstream *http.Response) error {
	body, err := io.ReadAll(upstream.Body)
	if err != nil {
		return err
	}
	w.WriteHeader(upstream.StatusCode)
	_, err = w.Write(body)
	return err
}
After
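A sketch of the streaming version, under the assumption that the proxy only forwards status and body as in the Before code. Returning the byte count alongside the error is an addition for illustration, and `demo` is a hypothetical helper that feeds `proxy` an in-memory upstream.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"strings"
)

// proxy streams the upstream body to w instead of buffering it, and returns
// the number of body bytes written (usable where len(body) was logged).
func proxy(w http.ResponseWriter, upstream *http.Response) (int64, error) {
	defer upstream.Body.Close()
	w.WriteHeader(upstream.StatusCode)
	return io.Copy(w, upstream.Body)
}

// demo (hypothetical helper) runs proxy against an in-memory upstream of the
// given size and reports what the downstream recorder saw.
func demo(size int) (status int, written int64, received int, err error) {
	upstream := &http.Response{
		StatusCode: http.StatusAccepted,
		Body:       io.NopCloser(strings.NewReader(strings.Repeat("x", size))),
	}
	rec := httptest.NewRecorder()
	written, err = proxy(rec, upstream)
	return rec.Code, written, rec.Body.Len(), err
}

func main() {
	fmt.Println(demo(1 << 20))
}
```

A real proxy would also copy the upstream headers before `WriteHeader`; that step is left out here because the original snippet omits it too.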
Stream with `io.Copy`. The `bufio`-backed response body and the `ResponseWriter` already buffer, so there's no reason to materialize the full payload. ~10× faster, ~6400× less memory.
**Why faster:** `io.ReadAll` grows a `[]byte` with append: repeated `growslice` realloc/copy cycles, and peak memory proportional to the body size. `io.Copy` pumps a 32 KB buffer (or hands off to `ReadFrom`/`WriteTo` when the writer or reader implements one), so peak memory stays bounded regardless of body size. Bytes hit the wire as they arrive, so client-perceived latency drops too.
**Trade-off:** You can't retry mid-stream, because partial bytes have already left the writer. Mid-stream errors from upstream are visible downstream only as truncated responses. Logging the response size means using the byte count `io.Copy` returns rather than `len(body)`.
**When NOT:** You need the full body for signing/verification before forwarding. The body is small enough (< 64 KB) that the handful of `io.ReadAll` reallocations is noise. You're parsing JSON, where `json.NewDecoder` already reads incrementally (Ex. 12 shows the shape).
4. Exercise 3 — Default MaxIdleConnsPerHost = 2
A service fans out to one downstream API at high concurrency. The default Transport caps idle connections per host at 2, so the 3rd+ concurrent request closes its connection after use. The next request re-handshakes.
After
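One way the tuned `Transport` for this exercise might look. The helper name `newFanoutTransport`, the concurrency figure of 256, and the 10-second client timeout are illustrative assumptions, not values from the benchmark.

```go
package main

import (
	"fmt"
	"net/http"
	"time"
)

// newFanoutTransport (hypothetical name) sizes the idle pool for one busy
// upstream that is called with up to `concurrency` requests in flight.
func newFanoutTransport(concurrency int) *http.Transport {
	// Clone keeps the stock proxy, dialer, and TLS settings. The type
	// assertion assumes nobody has replaced http.DefaultTransport.
	t := http.DefaultTransport.(*http.Transport).Clone()
	t.MaxIdleConnsPerHost = concurrency
	// MaxIdleConns is the total across all hosts (0 would mean no limit).
	// Raise it when it would otherwise undercut the per-host value.
	if t.MaxIdleConns != 0 && t.MaxIdleConns < concurrency {
		t.MaxIdleConns = concurrency
	}
	t.MaxConnsPerHost = 0 // no hard ceiling on connections per host
	t.IdleConnTimeout = 90 * time.Second
	return t
}

// httpClient is built once and shared by every caller.
var httpClient = &http.Client{
	Transport: newFanoutTransport(256),
	Timeout:   10 * time.Second,
}

func main() {
	t := httpClient.Transport.(*http.Transport)
	fmt.Println(t.MaxIdleConnsPerHost, t.MaxIdleConns, t.IdleConnTimeout)
}
```

Cloning the default transport rather than starting from a zero `http.Transport` is a deliberate choice in this sketch: a zero value drops the default dial timeout, proxy lookup, and HTTP/2 attempt.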
Set `MaxIdleConnsPerHost` to match your concurrency. `MaxConnsPerHost` caps the absolute ceiling; leave it 0 (unlimited) unless you're protecting the upstream. ~16× faster.
**Why faster:** With the default cap of 2, a connection that finishes while two are already idle for that host is rejected by the idle pool and closed, so later requests pay the full handshake again. Raising the cap lets `getConn` find an idle connection instead of dialing. The cross-host `MaxIdleConns` total (100 on `http.DefaultTransport`) still applies, so raise it too if the per-host value exceeds it.
**Trade-off:** Each idle connection holds a file descriptor, two goroutines, and its read/write buffers (4 KB each by default, plus TLS state). 100 idle conns × 10 hosts = 1000 FDs you must size `ulimit` for. With `IdleConnTimeout` at 90 s, the server's idle timeout should be ≥ 90 s, or you'll race a server-side close. Some load balancers (older AWS NLBs) silently drop idle conns at 350 s; set `IdleConnTimeout` lower if affected.
**When NOT:** Calling many unrelated hosts at low concurrency, which wastes FDs. Calling APIs behind aggressive idle-killers (some serverless gateways close at 5 s), where a shorter `IdleConnTimeout` matters more. Calling HTTP/2 servers where one connection multiplexes many streams; tune `MaxConcurrentStreams` server-side instead.
5. Exercise 4 — json.Marshal then Write
An API handler marshals a response struct into a []byte, then writes it. The intermediate slice is allocated, populated, and immediately discarded.
func handler(w http.ResponseWriter, r *http.Request) {
	resp := buildResponse(r)
	data, err := json.Marshal(resp)
	if err != nil {
		http.Error(w, err.Error(), 500)
		return
	}
	w.Header().Set("Content-Type", "application/json")
	w.Write(data)
}
After
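A small sketch of the encoder-based handler. The `response` type and its field values stand in for the original `buildResponse(r)` and are purely illustrative; `demo` is a hypothetical helper that drives the handler through `httptest.NewRecorder`.

```go
package main

import (
	"encoding/json"
	"fmt"
	"log"
	"net/http"
	"net/http/httptest"
)

// response is a hypothetical payload standing in for buildResponse(r).
type response struct {
	ID   int    `json:"id"`
	Name string `json:"name"`
}

func handler(w http.ResponseWriter, r *http.Request) {
	resp := response{ID: 7, Name: "gopher"}
	// Headers must be set before Encode, because Encode calls Write.
	w.Header().Set("Content-Type", "application/json")
	if err := json.NewEncoder(w).Encode(resp); err != nil {
		// The status line may already be on the wire, so only log.
		log.Printf("encode response: %v", err)
	}
}

// demo (hypothetical helper) returns the Content-Type and body the handler wrote.
func demo() (contentType, body string) {
	rec := httptest.NewRecorder()
	handler(rec, httptest.NewRequest(http.MethodGet, "/", nil))
	return rec.Header().Get("Content-Type"), rec.Body.String()
}

func main() {
	ct, body := demo()
	fmt.Printf("%s %q\n", ct, body)
}
```

Note the trailing newline `Encode` adds to the body; clients that compare bytes exactly need to account for it.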
`json.NewEncoder(w).Encode(resp)` encodes into a pooled internal buffer and writes that buffer to `w` directly. No caller-owned full-payload slice. ~1.6× faster, ~3× less garbage.
**Why faster:** `json.Marshal` encodes into a pooled buffer, then copies the result into a freshly allocated `[]byte` to return, and the handler hands that slice to `Write`. `Encoder.Encode` uses the same kind of pooled buffer but writes it straight to the writer, so the payload-sized allocation and copy disappear.
**Trade-off:** `Encoder.Encode` appends a trailing newline (often desired, sometimes not; strip it if your contract forbids it). Headers must be set before the first `Encode` call, because it calls `Write`, which locks in the status. A write error partway through (client gone, deadline hit) leaves a truncated body behind a 200 that was already sent: log it, don't retry.
**When NOT:** You need the marshaled bytes for signing (webhook payloads, JWT). You're caching the result: marshal once, write many. Very small responses (< 256 B) where the intermediate slice fits in one alloc and the encoder's setup cost dominates.
6. Exercise 5 — bytes.Buffer per request to slurp the body
A handler reads the request body into a fresh bytes.Buffer per call to compute a hash before parsing. Every request starts from an empty buffer and repeats the same grow-and-copy sequence as the body streams in.
func handler(w http.ResponseWriter, r *http.Request) {
	var buf bytes.Buffer
	if _, err := io.Copy(&buf, r.Body); err != nil {
		http.Error(w, err.Error(), 400)
		return
	}
	sum := sha256.Sum256(buf.Bytes())
	process(buf.Bytes(), sum)
}
After
`sync.Pool` of `*bytes.Buffer`. Reset and return to the pool on the way out.
var bufPool = sync.Pool{New: func() any { return new(bytes.Buffer) }}
func handler(w http.ResponseWriter, r *http.Request) {
	buf := bufPool.Get().(*bytes.Buffer)
	buf.Reset()
	defer bufPool.Put(buf) // process must not keep buf.Bytes() after it returns
	if _, err := io.Copy(buf, r.Body); err != nil {
		http.Error(w, err.Error(), 400)
		return
	}
	sum := sha256.Sum256(buf.Bytes())
	process(buf.Bytes(), sum)
}
7. Exercise 6 — strings.Contains routing
A handler dispatches by inspecting r.URL.Path with strings.Contains / HasPrefix. Every request walks the conditional chain, so dispatch cost is O(routes), and the strings.Contains case matches /health anywhere in the path.
func handler(w http.ResponseWriter, r *http.Request) {
	switch {
	case strings.HasPrefix(r.URL.Path, "/api/v1/users/"):
		handleUsers(w, r)
	case strings.HasPrefix(r.URL.Path, "/api/v1/orders/"):
		handleOrders(w, r)
	case strings.Contains(r.URL.Path, "/health"):
		handleHealth(w, r)
	default:
		http.NotFound(w, r)
	}
}
After
Go 1.22+ `ServeMux` patterns: method-aware, path-parametric, exact-match-first.
mux := http.NewServeMux()
mux.HandleFunc("GET /api/v1/users/{id}", handleUser)
mux.HandleFunc("POST /api/v1/users", createUser)
mux.HandleFunc("GET /api/v1/orders/{id}", handleOrder)
mux.HandleFunc("GET /health", handleHealth)
http.ListenAndServe(":8080", mux)
func handleUser(w http.ResponseWriter, r *http.Request) {
	id := r.PathValue("id")
	// ...
}
8. Exercise 7 — Headers set via map literal
A client request builder constructs headers as a map[string][]string literal, then assigns to req.Header. The literal allocates a second map plus a 1-element slice per key, even though http.NewRequest has already given req.Header a usable empty map.
func send(url, token, traceID string) (*http.Response, error) {
	req, err := http.NewRequest("POST", url, nil)
	if err != nil {
		return nil, err
	}
	// The literal below replaces the map NewRequest already allocated.
	req.Header = http.Header{
		"Authorization": {"Bearer " + token},
		"X-Trace-Id":    {traceID},
		"Content-Type":  {"application/json"},
		"Accept":        {"application/json"},
	}
	return httpClient.Do(req)
}
After
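A sketch of the `Set`-based builder. It stops at constructing the request so it runs without a network; the function name `newRequest` and the example URL are assumptions, and the lowercase `x-trace-id` key is there only to show canonicalization.

```go
package main

import (
	"fmt"
	"net/http"
)

// newRequest builds the request by filling the header map that
// http.NewRequest already allocated, instead of replacing it.
func newRequest(url, token, traceID string) (*http.Request, error) {
	req, err := http.NewRequest(http.MethodPost, url, nil)
	if err != nil {
		return nil, err
	}
	req.Header.Set("Authorization", "Bearer "+token)
	req.Header.Set("x-trace-id", traceID) // stored under the canonical key X-Trace-Id
	req.Header.Set("Content-Type", "application/json")
	req.Header.Set("Accept", "application/json")
	return req, nil
}

func main() {
	req, err := newRequest("https://api.example.com/v1/items", "token", "trace-123")
	if err != nil {
		panic(err)
	}
	fmt.Println(req.Header)
}
```

Passing the result to a shared client's `Do` completes the original `send` function.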
Use `Set`/`Add` on the already-allocated `req.Header`. `Set` canonicalizes the key via `textproto.CanonicalMIMEHeaderKey`; a literal stores keys exactly as written, so a non-canonical key silently breaks lookups through `Get`. ~1.8× faster, ~2× less garbage. Also: case-correct.
**Why faster:** The literal `http.Header{...}` allocates a brand-new map, discarding the one `NewRequest` already created. `Set` reuses the existing map and grows it in place. Both forms still allocate one `[]string` per key, so the saving is the second map, not the value slices.
**Trade-off:** `Set` overwrites; `Add` appends. Mixing them on the same key is a bug source. Multi-valued headers (`Set-Cookie`, `Vary`) need `Add`. Canonicalization changes `x-trace-id` to `X-Trace-Id`, which is fine for HTTP but surprises stringly-typed downstream code.
**When NOT:** Building a header set once at startup that you immediately copy onto every request (literal + `for k,v := range h { req.Header[k] = v }` is the same cost). Code paths where < 100 req/s makes 2 μs invisible.
9. Exercise 8 — bufio.NewReader(body) per call
A handler wraps r.Body in bufio.NewReader per request to do line-by-line parsing. The 4 KB default buffer allocates every time.
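func handler(w http.ResponseWriter, r *http.Request) {
	br := bufio.NewReader(r.Body) // allocates a new 4 KB buffer on every request
	for {
		line, err := br.ReadString('\n')
		if err != nil && !errors.Is(err, io.EOF) {
			http.Error(w, err.Error(), 400)
			return
		}
		if len(line) > 0 {
			processLine(line) // includes a final line that has no trailing newline
		}
		if err != nil {
			break // io.EOF
		}
	}
}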
After
Pool the `*bufio.Reader`. `Reset(io.Reader)` rebinds the underlying source without reallocating the buffer.
var brPool = sync.Pool{
	New: func() any { return bufio.NewReaderSize(nil, 4096) },
}
func handler(w http.ResponseWriter, r *http.Request) {
	br := brPool.Get().(*bufio.Reader)
	br.Reset(r.Body)
	defer func() {
		br.Reset(nil) // drop the reference to r.Body before pooling
		brPool.Put(br)
	}()
	for {
		line, err := br.ReadString('\n')
		if err != nil && !errors.Is(err, io.EOF) {
			http.Error(w, err.Error(), 400)
			return
		}
		if len(line) > 0 {
			processLine(line) // includes a final line that has no trailing newline
		}
		if err != nil {
			break // io.EOF
		}
	}
}
10. Exercise 9 — Default IdleTimeout = 0
A server uses http.Server{} with no IdleTimeout. A zero IdleTimeout falls back to ReadTimeout, and with both at zero idle keep-alive connections live forever from the server's side, while clients (or their middleboxes) close them at unpredictable times. Worse: with ReadTimeout also 0, a half-open connection from a vanished client never closes.
After
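A configuration sketch for this exercise. The helper name `newServer` and the 5-second `ReadHeaderTimeout` are assumptions added for completeness; only the 90-second `IdleTimeout` comes from the discussion here.

```go
package main

import (
	"fmt"
	"net/http"
	"time"
)

// newServer (hypothetical helper) returns a server whose idle keep-alive
// connections are closed on a known schedule.
func newServer(addr string, h http.Handler) *http.Server {
	return &http.Server{
		Addr:    addr,
		Handler: h,
		// Maximum wait for the next request on a keep-alive connection.
		// If left at zero, the server uses ReadTimeout instead.
		IdleTimeout: 90 * time.Second,
		// Bounds the time allowed to read request headers (see Ex. 13).
		ReadHeaderTimeout: 5 * time.Second,
	}
}

func main() {
	srv := newServer(":8080", http.NewServeMux())
	fmt.Println(srv.Addr, srv.IdleTimeout, srv.ReadHeaderTimeout)
	// srv.ListenAndServe() would start serving; omitted so the sketch exits.
}
```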
Set `IdleTimeout` to match the keep-alive policy you want. 90 s pairs well with the 90 s `IdleConnTimeout` on Go's default client `Transport`. ~2.7× faster on a keep-alive-heavy workload.
**Why faster:** With `IdleTimeout` set, the server decides how long a connection may sit between requests instead of leaving that to the OS, the client, or a middlebox. Clients keep reusing the TCP+TLS context within that window, and the server closes idle connections on a known schedule rather than carrying zombies. The `ConnState` callback reports each transition to `StateIdle` if you want to observe it.
**Trade-off:** Setting it too low (5 s) kills keep-alive, and clients reconnect constantly. Setting it too high (10 min) ties up FDs on memory-poor servers. Coordinate with load balancer / proxy idle timeouts: NLB defaults to 350 s, ALB to 60 s. A mismatch causes 502s on reused-but-closed connections.
**When NOT:** Internal-only services behind a service mesh that handles keep-alive at the proxy layer (set server `IdleTimeout` short, let the mesh hold the pool). Short-lived servers such as `httptest` servers in tests, where the defaults are fine.
11. Exercise 10 — TLS handshake per request
A client calls https://api.example.com but builds a fresh *http.Client with a fresh Transport on each call (compounding Ex. 1). Every call does a full TLS 1.3 handshake: an extra network round trip (~30 ms on a typical link) plus the CPU cost of key exchange and certificate verification.
func fetch(url string) ([]byte, error) {
	// Fresh Transport: no pooled connection and no TLS session to resume.
	client := &http.Client{Transport: &http.Transport{}}
	resp, err := client.Get(url)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	return io.ReadAll(resp.Body)
}
After
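A sketch of a shared client with TLS session resumption enabled. The helper name `newTLSTransport`, the cache capacity of 256, the TLS 1.2 floor, and the 10-second timeout are all illustrative assumptions.

```go
package main

import (
	"crypto/tls"
	"fmt"
	"net/http"
	"time"
)

// newTLSTransport (hypothetical helper) clones the default transport and
// adds a session cache so reconnects can resume instead of starting cold.
func newTLSTransport(sessions int) *http.Transport {
	// The type assertion assumes nobody has replaced http.DefaultTransport.
	t := http.DefaultTransport.(*http.Transport).Clone()
	t.TLSClientConfig = &tls.Config{
		MinVersion: tls.VersionTLS12,
		// With a nil ClientSessionCache the client never attempts resumption.
		ClientSessionCache: tls.NewLRUClientSessionCache(sessions),
	}
	return t
}

// httpClient is built once and shared, so keep-alive covers the warm path
// and the session cache covers reconnects.
var httpClient = &http.Client{
	Transport: newTLSTransport(256),
	Timeout:   10 * time.Second,
}

func main() {
	t := httpClient.Transport.(*http.Transport)
	fmt.Println(t.TLSClientConfig.ClientSessionCache != nil, t.ForceAttemptHTTP2)
}
```

Because the clone inherits `ForceAttemptHTTP2` from the default transport, setting a custom `TLSClientConfig` here should not turn HTTP/2 off.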
Shared client + tuned `Transport`. With keep-alive, only the first request pays the handshake; subsequent requests reuse the established TLS connection. Even on cold connections, `Transport.TLSClientConfig.ClientSessionCache` lets resumption kick in. ~290× faster on warm connections.
**Why faster:** A TLS handshake costs two RTTs in TLS 1.2 and one in TLS 1.3, plus a CPU spike for ECDHE and certificate chain validation (~5 ms on common hardware). Keep-alive skips the whole thing for request 2+. `ClientSessionCache` reuses session tickets across reconnects, so a resumed handshake skips the certificate exchange and chain validation. It still takes a round trip: `crypto/tls` does not send 0-RTT early data.
**Trade-off:** Sharing sessions across goroutines is fine, but the default `ClientSessionCache` capacity of 64 means high-churn workloads still miss. Pinning HTTP/2 with `ForceAttemptHTTP2` means one TCP connection per host carries every stream, so a stalled connection stalls all of them. Some servers send `Connection: close` periodically; those connections are not returned to the pool, so expect an occasional fresh handshake.
**When NOT:** You're calling a unique host every time (web crawler), where keep-alive doesn't help and the cache fills with one-shot sessions. mTLS environments where session resumption is disabled for compliance. Tests where you want isolated TLS state per case.
12. Exercise 11 — httputil.DumpRequest on the hot path
A logging middleware uses httputil.DumpRequest(r, true) to print every request. The body is read into memory, replaced with a fresh reader, and the whole thing is formatted via fmt.
func loggingMiddleware(h http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		dump, err := httputil.DumpRequest(r, true)
		if err == nil {
			log.Printf("req:\n%s", dump)
		}
		h.ServeHTTP(w, r)
	})
}
After
Structured logging with explicit fields. No body buffering, no full-request formatting; capture only what a debugger needs.
func loggingMiddleware(h http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		start := time.Now()
		sw := &statusWriter{ResponseWriter: w, status: 200}
		h.ServeHTTP(sw, r)
		slog.Info("http",
			"method", r.Method,
			"path", r.URL.Path,
			"status", sw.status,
			"bytes", sw.written,
			"dur_ms", time.Since(start).Milliseconds(),
			"remote", r.RemoteAddr,
		)
	})
}
type statusWriter struct {
	http.ResponseWriter
	status, written int
}
func (s *statusWriter) WriteHeader(c int) { s.status = c; s.ResponseWriter.WriteHeader(c) }
func (s *statusWriter) Write(b []byte) (int, error) { n, e := s.ResponseWriter.Write(b); s.written += n; return n, e }
13. Exercise 12 — defer resp.Body.Close() without draining
A client reads resp.Body into JSON via Decode, then returns on the first parse error. The body is closed but not drained; the underlying connection cannot return to the keep-alive pool.
func fetch(url string) (*Payload, error) {
	resp, err := httpClient.Get(url)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	var p Payload
	if err := json.NewDecoder(resp.Body).Decode(&p); err != nil {
		return nil, err // body still has trailing bytes; connection killed
	}
	return &p, nil
}
After
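A sketch of drain-then-close, with connection reuse observed through `net/http/httptrace`. The `Payload` fields, the `maxDrain` cap, the extra `client` parameter, the `reused` return value, and the `demo` helper are assumptions added so the effect is visible against a local test server that deliberately returns invalid JSON.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"net/http/httptrace"
	"strings"
)

// Payload is a hypothetical response shape.
type Payload struct {
	Name string `json:"name"`
}

const maxDrain = 64 << 10 // past 64 KB of leftovers, let the connection go

// fetch decodes one Payload and reports whether the request ran on a
// reused connection (tracked only for the demo).
func fetch(client *http.Client, url string) (p *Payload, reused bool, err error) {
	req, err := http.NewRequest(http.MethodGet, url, nil)
	if err != nil {
		return nil, false, err
	}
	trace := &httptrace.ClientTrace{
		GotConn: func(info httptrace.GotConnInfo) { reused = info.Reused },
	}
	req = req.WithContext(httptrace.WithClientTrace(req.Context(), trace))

	resp, err := client.Do(req)
	if err != nil {
		return nil, false, err
	}
	defer func() {
		// Drain what Decode left behind, then close. Both steps run on
		// every return path, including the decode-error one.
		io.CopyN(io.Discard, resp.Body, maxDrain)
		resp.Body.Close()
	}()

	p = new(Payload)
	if err := json.NewDecoder(resp.Body).Decode(p); err != nil {
		return nil, reused, err
	}
	return p, reused, nil
}

// demo (hypothetical helper) makes two failing calls and reports whether
// the second one reused the first call's connection.
func demo() (bool, error) {
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		io.WriteString(w, `{"name": oops}`)            // invalid JSON
		io.WriteString(w, strings.Repeat(" ", 16<<10)) // trailing bytes left unread after the syntax error
	}))
	defer srv.Close()

	client := srv.Client()
	if _, _, err := fetch(client, srv.URL); err == nil {
		return false, errors.New("expected a decode error")
	}
	_, reused, _ := fetch(client, srv.URL)
	return reused, nil
}

func main() {
	fmt.Println(demo())
}
```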
Drain after Decode (or on error). `io.Copy(io.Discard, resp.Body)` pulls any trailing bytes (chunked terminators, whitespace, extra JSON in error responses) so the `persistConn` can be returned. ~13× faster on a keep-alive workload.
**Why faster:** `Transport.tryPutIdleConn` only gets a chance to pool the connection once the body has been read to EOF. Closing an undrained body forces the transport to close the underlying TCP connection, so the next request has to dial and handshake again. Draining is cheap (the body is usually small at this point); reconnects are not.
**Trade-off:** Draining a body that's actually huge (10 MB error page) wastes bandwidth. Cap it with `io.CopyN(io.Discard, resp.Body, 64*1024)` if you're worried: if there's more left, the connection isn't worth saving anyway. Forgetting to close (for example, placing the deferred `Close` after an early return) is still a leak, because drain ≠ close.
**When NOT:** Streaming downloads where you `io.Copy` to disk, since the body is already drained. Servers known not to support keep-alive (rare). One-shot connections where the next request is far enough away that the idle conn would close anyway.
14. Exercise 13 — ReadHeaderTimeout = 0
A public-facing server runs with no ReadHeaderTimeout. A slowloris-style attacker opens 10k connections and sends headers one byte per minute. Each connection holds a goroutine + buffer until the OS closes the socket.
srv := &http.Server{
	Addr:    ":8080",
	Handler: mux,
	// ReadHeaderTimeout: 0 — wait forever
}
srv.ListenAndServe()
After
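A sketch that exercises `ReadHeaderTimeout` against a stalled client. It uses a local `httptest` server and a short 200 ms timeout so it finishes quickly; the helper names `stalledHeaderDropped` and `demo`, and both durations, are illustrative assumptions. A production server would set the field on its own `http.Server` with a value like the 5 s discussed here.

```go
package main

import (
	"errors"
	"fmt"
	"io"
	"net"
	"net/http"
	"net/http/httptest"
	"time"
)

// stalledHeaderDropped (hypothetical helper) starts a local test server with
// the given ReadHeaderTimeout, sends an unfinished header block, and reports
// whether the server hung up before the client ran out of patience.
func stalledHeaderDropped(headerTimeout, patience time.Duration) (bool, error) {
	srv := httptest.NewUnstartedServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		io.WriteString(w, "ok")
	}))
	srv.Config.ReadHeaderTimeout = headerTimeout
	srv.Start()
	defer srv.Close()

	conn, err := net.Dial("tcp", srv.Listener.Addr().String())
	if err != nil {
		return false, err
	}
	defer conn.Close()

	// No terminating blank line: the request head never completes.
	if _, err := io.WriteString(conn, "GET / HTTP/1.1\r\nHost: example\r\n"); err != nil {
		return false, err
	}

	conn.SetReadDeadline(time.Now().Add(patience))
	// io.Copy returns nil once the server closes its side of the connection.
	_, err = io.Copy(io.Discard, conn)

	var ne net.Error
	if errors.As(err, &ne) && ne.Timeout() {
		return false, nil // our own deadline fired first; the server kept waiting
	}
	return true, nil // EOF or reset: the server gave up on the stalled headers
}

// demo runs the check with a 200 ms header timeout and 3 s of client patience.
func demo() (bool, error) {
	return stalledHeaderDropped(200*time.Millisecond, 3*time.Second)
}

func main() {
	fmt.Println(demo())
}
```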
Set `ReadHeaderTimeout` to a small value (5 s is generous for real clients, hostile for attackers). It is separate from `ReadTimeout`, so headers are bounded even if the body legitimately needs longer. ~13500× lower attack throughput; legitimate traffic unaffected.
**Why faster (and safer):** With no `ReadHeaderTimeout` (and no `ReadTimeout` to fall back on), the accept loop hands each connection to a goroutine that blocks on `Read` until the client finishes its headers. Slowloris exploits this: 10k cheap connections eat 10k goroutines plus their stacks. Setting the timeout closes incomplete header reads after 5 s; the goroutine exits, the connection is released, and the next legitimate request gets the slot. `ReadHeaderTimeout` has been its own field since Go 1.8, which is what lets streaming-upload servers bound headers tightly without bounding the body.
**Trade-off:** Mobile clients on weak networks may legitimately need > 5 s to upload headers under packet loss. Bump to 10–15 s if you serve users on 2G/3G. Behind a TLS-terminating proxy (nginx, ALB) you may end up setting timeouts twice; pick one layer as authoritative. `MaxHeaderBytes` defaults to 1 MB; cap it tighter if you know your protocol.
**When NOT:** Internal services on trusted networks behind a firewall + load balancer that already bounds headers, where adding it doesn't hurt but isn't load-bearing. Long-poll endpoints are not an exception: those clients should send headers fast and then wait, so `ReadHeaderTimeout` still applies cleanly.
15. Exercise 14 — time.Now() per request for ID
A middleware generates a request ID via time.Now().UnixNano(). time.Now() itself is cheap (~20 ns on Linux) and does not allocate; the allocation comes from formatting the ID as a string. The real problem under high concurrency is collisions, which are possible if two requests hit the same nanosecond on different cores.
func requestIDMiddleware(h http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		id := strconv.FormatInt(time.Now().UnixNano(), 10)
		ctx := context.WithValue(r.Context(), "reqID", id) // string key: can collide with other packages
		h.ServeHTTP(w, r.WithContext(ctx))
	})
}
After
Atomic counter for monotonic IDs; collision-free and faster than `time.Now()`. If you need globally unique IDs across instances, pre-generate UUIDs in batches and pop from a ring buffer.
var reqCounter atomic.Uint64
func requestIDMiddleware(h http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		n := reqCounter.Add(1)
		id := strconv.FormatUint(n, 36) // base36: shorter, still URL-safe
		ctx := context.WithValue(r.Context(), reqIDKey{}, id)
		h.ServeHTTP(w, r.WithContext(ctx))
	})
}
type reqIDKey struct{}
16. When NOT to optimize
net/http cost dominates only when the service is on the hot path of a high-frequency operation. If your handler serves 10 req/s, every optimization here saves microseconds you can't measure: admin endpoints, internal tools, cron-triggered webhooks, batch processors that make one HTTP call per minute. Build with defaults; profile when traffic shows up.
Profile first. net/http overhead has four signatures across CPU and goroutine profiles: crypto/tls.(*Conn).Handshake on every request → Ex. 1, 3, 10 (connection reuse broken); runtime.mallocgc under bytes.growSlice or bufio.NewReader → Ex. 5, 8 (pool the allocation); a high count of goroutines parked in runtime.netpollblock → Ex. 13 (slowloris-shaped, or unbounded body reads); net/http.(*Transport).dialConn showing up on paths that should be warm → Ex. 12 (undrained bodies killing keep-alive).
Common premature optimizations: pooling buffers (Ex. 5, 8) on a 100 req/s service, where the pool buys nothing measurable; tuning MaxIdleConnsPerHost (Ex. 3) when you only call one host at low concurrency, where the default 2 is fine; replacing io.ReadAll with streaming (Ex. 2) for 1 KB bodies, which ReadAll handles in a couple of small allocations; pre-allocated UUIDs (Ex. 14) for a service whose ID generation isn't even on a hot path.
Correctness gaps disguised as optimizations: shared client (Ex. 1) without IdleConnTimeout set, so idle connections accumulate until the server drops them and the next request on each stale connection fails or has to be retried; tuned MaxIdleConnsPerHost (Ex. 3) higher than your FD ulimit, so dials and accepts start failing with "too many open files"; Encoder.Encode (Ex. 4) where headers were set after Write, which silently sends 200 on an error path; ServeMux patterns (Ex. 6) with conflicting routes that "happen to work" until a new registration panics at startup in production; draining a body (Ex. 12) when the error response is huge, where draining wastes more bandwidth than reconnecting; ReadHeaderTimeout too aggressive (Ex. 13), so mobile users on flaky networks see dropped connections during real outages; atomic counter IDs (Ex. 14) that reset across deploys and collide with stored historical IDs.
Two perennial myths. "http.Client is expensive to construct" — false; it's the Transport's connection pool that matters. You can construct a Client{Transport: sharedTransport} per request and pay nothing. "io.ReadAll is always wrong" — also false; for known-small bodies it's the clearest code, and ReadAll's growslice shape is fine when total bytes < 16 KB.
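The first myth can be checked with a short sketch: a new `Client` value per call, all wrapping one `Transport`. The names `sharedTransport`, `get`, and `demo`, and the pool settings, are hypothetical; reuse is observed through `net/http/httptrace` against a local test server.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"net/http/httptrace"
	"time"
)

// sharedTransport owns the connection pool for every Client built around it.
var sharedTransport = &http.Transport{
	MaxIdleConnsPerHost: 16,
	IdleConnTimeout:     90 * time.Second,
}

// get builds a brand-new Client per call and reports whether the request
// ran on a connection taken from the shared pool.
func get(url string) (reused bool, err error) {
	client := &http.Client{Transport: sharedTransport, Timeout: 5 * time.Second}

	req, err := http.NewRequest(http.MethodGet, url, nil)
	if err != nil {
		return false, err
	}
	trace := &httptrace.ClientTrace{
		GotConn: func(info httptrace.GotConnInfo) { reused = info.Reused },
	}
	req = req.WithContext(httptrace.WithClientTrace(req.Context(), trace))

	resp, err := client.Do(req)
	if err != nil {
		return false, err
	}
	defer resp.Body.Close()
	// Read to EOF so the connection goes back to the pool.
	_, err = io.Copy(io.Discard, resp.Body)
	return reused, err
}

// demo (hypothetical helper) makes two calls and returns both reuse flags.
func demo() (first, second bool, err error) {
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		io.WriteString(w, "ok")
	}))
	defer srv.Close()

	if first, err = get(srv.URL); err != nil {
		return false, false, err
	}
	second, err = get(srv.URL)
	return first, second, err
}

func main() {
	fmt.Println(demo())
}
```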
17. Summary
Always-ship wins (default in any new net/http code): one package-level *http.Client with a tuned Transport (Ex. 1, 3); set ReadHeaderTimeout, ReadTimeout, WriteTimeout, IdleTimeout on every server (Ex. 9, 13); json.NewEncoder(w).Encode(v) instead of marshal-then-write (Ex. 4); drain response bodies before close (Ex. 12); use Set/Add on req.Header (Ex. 7); use Go 1.22+ ServeMux patterns for routing (Ex. 6).
Wins behind a profile (when measurements justify them): pooled bytes.Buffer / bufio.Reader (Ex. 5, 8, when mallocgc shows in the handler); raised MaxIdleConnsPerHost (Ex. 3, when tls.Handshake shows on warm paths); io.Copy instead of io.ReadAll (Ex. 2, when peak memory matters); structured logging instead of DumpRequest (Ex. 11, when log middleware tops the CPU profile); atomic counter IDs (Ex. 14, when middleware allocations dominate).
Specialty (only when the design calls for it): pre-allocated UUID batches with background refill for tamper-resistant IDs at high QPS; per-host Transport instances when one upstream's tuning hurts another; HTTP/2-pinned clients with ForceAttemptHTTP2 and MaxConcurrentStreams capped for chatty APIs; http.Hijacker upgrades to raw conns for protocols that outgrow ResponseWriter.
net/http cost is connection setup, allocation, undrained bodies, and unset timeouts. Strip those four from the hot path by reusing the right primitive: one Client per process, one Transport per pool policy, one buffer per worker, one timeout per failure mode. The stdlib's defaults are designed for correctness, not throughput, so every optimization here is a move from "works" to "scales." Profile, then pick the lever; the four signatures above tell you which one.