Steady-State — Middle¶
Table of Contents¶
- Where we are
- Bounded queues — three shedding policies
- The worker pool, revisited
- Per-tenant resource isolation
- Goroutine budgets via semaphore.Weighted
- Connection pool steady-state — sql.DB
- Connection pool steady-state — http.Transport
- Connection pool steady-state — gRPC
- Leak budgets and accounting
- Cache TTL and bounded LRUs
- Background work — tickers and timers, done right
- Putting it together — a steady-state service
- Tricky points
- Common middle-tier mistakes
- Self-assessment
- Summary
Where we are¶
The junior page taught the three habits: bound the queue, cap the goroutines, pair every acquire with a release. With those three you avoid most production incidents. The middle page is about the next layer: making those habits robust under real workloads where tenants are unequal, upstreams are flaky, and queues are not the only resources you have to manage.
After this page you will:
- Choose between shed-on-full, block-on-full, and load-shedding based on workload semantics.
- Implement a per-tenant semaphore that bounds blast radius.
- Tune sql.DB, http.Transport, and gRPC ClientConn for steady-state behaviour.
- Define a leak budget and an alert that fires when the budget is exceeded.
- Build a cache that has a real bound, not just a hope.
Bounded queues — three shedding policies¶
When a queue is full, the producer faces a decision. There are three good policies; pick the one that matches your workload.
Policy 1 — Shed (drop)¶
The producer drops the new item and returns an error or increments a counter.
select {
case jobs <- job:
return nil
case <-ctx.Done():
return ctx.Err()
default:
droppedCounter.Inc()
return ErrQueueFull
}
Use when:
- Upstream can retry (or the data is best-effort: metrics, sampled logs, analytics events).
- Latency matters more than completeness.
- The cost of holding the item is high (e.g., a request that holds a transactional lock).
Policy 2 — Block¶
The producer waits until a slot opens. Latency degrades; no data is lost.
Use when:
- Data is mandatory (financial transactions, audit logs).
- Upstream can absorb back-pressure (a synchronous RPC handler).
- The context has a deadline so the wait is bounded.
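Policy 2 is the same select as Policy 1 without the default arm. The following is a minimal sketch, assuming a hypothetical Job type and a queue passed in as a parameter; the demo function exists only to show the two outcomes.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// Job is a placeholder payload type for this sketch.
type Job struct{ ID int }

// submitBlocking waits for queue space and gives up when ctx is done.
func submitBlocking(ctx context.Context, jobs chan<- Job, job Job) error {
	select {
	case jobs <- job:
		return nil
	case <-ctx.Done():
		return ctx.Err()
	}
}

// demo submits two jobs to a queue of size one that nobody drains:
// the first fits, the second blocks until the deadline passes.
func demo() (first, second error) {
	jobs := make(chan Job, 1)
	ctx, cancel := context.WithTimeout(context.Background(), 50*time.Millisecond)
	defer cancel()
	first = submitBlocking(ctx, jobs, Job{ID: 1})
	second = submitBlocking(ctx, jobs, Job{ID: 2})
	return first, second
}

func main() {
	first, second := demo()
	fmt.Println(first, second)
}
```

Because there is no default arm, the deadline on the context is the only thing that bounds the wait, which is why the third bullet above matters.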
Policy 3 — Load-shed (admission control)¶
A variant of shed: the producer rejects items probabilistically based on queue depth. Above a low watermark, ramp up the rejection probability. This smooths the transition from "fully accepting" to "fully rejecting."
func shedProbability(depth, cap int) float64 {
low := cap * 7 / 10
high := cap * 9 / 10
if depth < low {
return 0
}
if depth >= high {
return 1
}
return float64(depth-low) / float64(high-low)
}
func submitWithLoadShed(job Job) error {
if rand.Float64() < shedProbability(len(jobs), cap(jobs)) {
return ErrShed
}
select {
case jobs <- job:
return nil
default:
return ErrQueueFull
}
}
Use when:
- You want a graceful degradation curve instead of a cliff.
- Tail-latency matters and you can tolerate a small rejection rate to preserve headroom.
In practice, load-shedding shows up at the API gateway, not inside individual workers. But the principle applies anywhere there is a hot queue.
The worker pool, revisited¶
The junior pool was a fixed number of workers and a bounded queue. The middle pool adds:
- Per-worker metrics. In-flight count, processing time histogram, errors.
- Configurable policy. Shed, block, or block-with-deadline.
- Graceful stop. Drain in-flight jobs, deny new ones, bounded by a deadline.
type Pool struct {
jobs chan func()
wg sync.WaitGroup
metrics PoolMetrics
closed atomic.Bool
}
type PoolMetrics struct {
InFlight atomic.Int64
Submitted atomic.Int64
Dropped atomic.Int64
Completed atomic.Int64
DurationNS atomic.Int64
}
func NewPool(workers, queueSize int) *Pool {
p := &Pool{jobs: make(chan func(), queueSize)}
for i := 0; i < workers; i++ {
p.wg.Add(1)
go p.worker()
}
return p
}
func (p *Pool) worker() {
defer p.wg.Done()
for job := range p.jobs {
p.metrics.InFlight.Add(1)
start := time.Now()
job()
p.metrics.DurationNS.Add(int64(time.Since(start)))
p.metrics.Completed.Add(1)
p.metrics.InFlight.Add(-1)
}
}
type SubmitMode int
const (
Shed SubmitMode = iota
Block
BlockWithCtx
)
func (p *Pool) Submit(ctx context.Context, mode SubmitMode, job func()) error {
if p.closed.Load() {
return ErrClosed
}
p.metrics.Submitted.Add(1)
switch mode {
case Shed:
select {
case p.jobs <- job:
return nil
default:
p.metrics.Dropped.Add(1)
return ErrQueueFull
}
case Block:
p.jobs <- job
return nil
case BlockWithCtx:
select {
case p.jobs <- job:
return nil
case <-ctx.Done():
p.metrics.Dropped.Add(1)
return ctx.Err()
}
}
return errors.New("unknown mode")
}
func (p *Pool) Stop(ctx context.Context) error {
if !p.closed.CompareAndSwap(false, true) {
return ErrAlreadyClosed
}
close(p.jobs)
done := make(chan struct{})
go func() {
p.wg.Wait()
close(done)
}()
select {
case <-done:
return nil
case <-ctx.Done():
return ctx.Err()
}
}
What changed¶
- Submit takes a mode argument. The caller decides the policy.
- Metrics are atomic counters, exported separately.
- Stop is bounded by a context, so a stuck worker cannot block forever.
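- A closed flag rejects new submissions after Stop. The check is best-effort: a Submit that races with Stop can pass the check and then send on the already-closed channel, which panics, so stop producers before calling Stop (or guard both paths with a mutex).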
What to expose¶
Wire the metrics to your monitoring system:
- pool_in_flight: current concurrent jobs.
- pool_submitted_total: counter of all submits.
- pool_dropped_total: counter of all drops.
- pool_completed_total: counter of all completions.
- pool_duration_seconds: histogram of job duration.
Then dashboards can show: drop rate as a fraction of submit rate, in-flight as a fraction of worker count, p99 duration over time.
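The two ratios mentioned above can be derived from counter snapshots. This is a small sketch; the snapshot type and its field names are illustrative and not part of the Pool shown earlier.

```go
package main

import "fmt"

// snapshot holds counter values read at one scrape (illustrative type).
type snapshot struct {
	Submitted int64
	Dropped   int64
	InFlight  int64
}

// dropRatio returns drops as a fraction of submits between two snapshots.
func dropRatio(prev, cur snapshot) float64 {
	submitted := cur.Submitted - prev.Submitted
	if submitted <= 0 {
		return 0 // no traffic in the window
	}
	return float64(cur.Dropped-prev.Dropped) / float64(submitted)
}

// utilisation returns in-flight jobs as a fraction of the worker count.
func utilisation(cur snapshot, workers int) float64 {
	if workers <= 0 {
		return 0
	}
	return float64(cur.InFlight) / float64(workers)
}

func main() {
	prev := snapshot{Submitted: 100, Dropped: 5}
	cur := snapshot{Submitted: 300, Dropped: 55, InFlight: 3}
	fmt.Println(dropRatio(prev, cur), utilisation(cur, 4))
}
```

In most setups the monitoring system computes these from the raw counters, so exporting the counters themselves is usually enough.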
Per-tenant resource isolation¶
A service that handles many tenants must not let one tenant's pathology consume all the resources. Otherwise, a single misbehaving customer's traffic spike raises latency for every other customer.
The semaphore-per-tenant pattern¶
Give each tenant a weighted semaphore. Calls from tenant T must Acquire the tenant's semaphore before doing work. If tenant T is oversubscribed, only T's calls queue. Other tenants are unaffected.
import "golang.org/x/sync/semaphore"
type TenantLimit struct {
mu sync.Mutex
sems map[string]*semaphore.Weighted
perTenant int64
lastUsed map[string]time.Time
idleTTL time.Duration
}
func NewTenantLimit(perTenant int64, idleTTL time.Duration) *TenantLimit {
t := &TenantLimit{
sems: make(map[string]*semaphore.Weighted),
perTenant: perTenant,
lastUsed: make(map[string]time.Time),
idleTTL: idleTTL,
}
go t.gc()
return t
}
func (t *TenantLimit) get(tenant string) *semaphore.Weighted {
t.mu.Lock()
defer t.mu.Unlock()
sem, ok := t.sems[tenant]
if !ok {
sem = semaphore.NewWeighted(t.perTenant)
t.sems[tenant] = sem
}
t.lastUsed[tenant] = time.Now()
return sem
}
func (t *TenantLimit) Do(ctx context.Context, tenant string, fn func()) error {
sem := t.get(tenant)
if err := sem.Acquire(ctx, 1); err != nil {
return err
}
defer sem.Release(1)
fn()
return nil
}
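// gc removes tenants that haven't been used for idleTTL.
// Without this, the map grows forever for any service whose
// tenant IDs rotate.
// Caveats of this sketch: the goroutine runs until the process exits
// (pass a context if TenantLimit is not process-wide), and a call that
// holds its slot for longer than idleTTL can have its semaphore dropped,
// so a fresh one is created and the tenant briefly exceeds its limit.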
func (t *TenantLimit) gc() {
ticker := time.NewTicker(t.idleTTL / 2)
defer ticker.Stop()
for range ticker.C {
cutoff := time.Now().Add(-t.idleTTL)
t.mu.Lock()
for tenant, last := range t.lastUsed {
if last.Before(cutoff) {
delete(t.sems, tenant)
delete(t.lastUsed, tenant)
}
}
t.mu.Unlock()
}
}
Why the GC matters¶
The first version of this pattern in many production services forgot the GC. It looked like:
func (t *TenantLimit) get(tenant string) *semaphore.Weighted {
if sem, ok := t.sems[tenant]; ok {
return sem
}
sem := semaphore.NewWeighted(t.perTenant)
t.sems[tenant] = sem
return sem
}
This works perfectly until tenant IDs rotate (session IDs, request IDs, ephemeral users). Then sems grows by one entry per request, forever. A classic steady-state bug. The GC goroutine is the price of admission.
When to use it¶
- Any multi-tenant service.
- Any service whose hot path can be parameterised by a "shard key" — customer ID, organisation ID, API key.
- Services that have explicit SLOs per tenant (you can give "premium" tenants a higher weight).
When not to¶
- A service with a small, fixed number of tenants (the per-tenant semaphore overhead may exceed the benefit).
- A service where tenants are effectively trusted to behave (intra-service calls between trusted teams).
Goroutine budgets via semaphore.Weighted¶
semaphore.Weighted is a common way to cap concurrent goroutines for a given workload, either as an alternative to a fixed worker pool or in combination with one.
import "golang.org/x/sync/semaphore"
var globalSem = semaphore.NewWeighted(100) // 100 concurrent goroutines
func handler(ctx context.Context) error {
if err := globalSem.Acquire(ctx, 1); err != nil {
return err
}
defer globalSem.Release(1)
// do work
return nil
}
Compared with a worker pool¶
- A worker pool has fixed goroutines pulling from a queue. Best when jobs are short and the per-spawn cost matters.
- A semaphore allows the caller to choose when to spawn. Best when the caller already has work-specific code and just needs concurrency control.
Weighted variants¶
The "1" in Acquire(ctx, 1) is the weight. You can also have heterogeneous jobs:
// Small jobs cost 1, large jobs cost 10.
var sem = semaphore.NewWeighted(100)

func small(ctx context.Context) error {
	if err := sem.Acquire(ctx, 1); err != nil {
		return err // never Release a weight that was not acquired
	}
	defer sem.Release(1)
	// ...
	return nil
}

func large(ctx context.Context) error {
	if err := sem.Acquire(ctx, 10); err != nil {
		return err
	}
	defer sem.Release(10)
	// ...
	return nil
}
This lets you express "at most ten large jobs concurrently, or one hundred small jobs, or any combination that sums to one hundred."
Connection pool steady-state — sql.DB¶
database/sql.DB is a connection pool, not a connection. Configuration is your responsibility.
The four knobs¶
db.SetMaxOpenConns(25)
db.SetMaxIdleConns(25)
db.SetConnMaxLifetime(30 * time.Minute)
db.SetConnMaxIdleTime(5 * time.Minute)
- MaxOpenConns: hard cap on total connections (in-use + idle). Beyond this, callers block inside the query call until a connection is returned or their context is cancelled.
- MaxIdleConns: pool of warm idle connections. Setting it lower than MaxOpenConns is wasteful under steady load (the pool closes connections it will soon need again); equal is the most common choice.
- ConnMaxLifetime: max age before recycling. Defends against stale connections after database-side restarts and credential rotations.
- ConnMaxIdleTime: max idle time before closing. Defends against pools that grow during a burst and never shrink.
Picking MaxOpenConns¶
Start from the database's max_connections divided by the number of replicas of your service. If Postgres is set to 200 and your service has eight replicas, you have twenty-five connections per replica before you risk denying service to other clients.
Then, within that budget, pick a number large enough to saturate your workload. A simple heuristic: if your peak QPS is Q and the median query time is T seconds, you need about Q * T concurrent connections. Add headroom (maybe 50%).
Example: Q = 200 RPS, T = 10 ms. Concurrent connections ≈ 200 × 0.010 = 2; with 50% headroom, 3. Rounding up to 5 leaves room for queries slower than the median. Set MaxOpenConns = 5. This is much less than most teams default to, and it tends to be correct.
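The heuristic can be written down as a small helper. This is a sketch under stated assumptions: the floor parameter (a minimum pool size) and the budget parameter (the per-replica share of max_connections) are names introduced here for illustration.

```go
package main

import (
	"fmt"
	"math"
)

// poolSize estimates MaxOpenConns from peak QPS, typical query time in
// seconds, and fractional headroom, then clamps the result to [floor, budget].
func poolSize(qps, querySeconds, headroom float64, floor, budget int) int {
	n := int(math.Ceil(qps * querySeconds * (1 + headroom)))
	if n < floor {
		n = floor
	}
	if n > budget {
		n = budget
	}
	return n
}

func main() {
	// 200 RPS, 10 ms queries, 50% headroom, floor of 5, budget of 25 per replica.
	fmt.Println(poolSize(200, 0.010, 0.5, 5, 25))
}
```

The clamp to the budget encodes the first rule in this section: no matter what the workload asks for, a replica should not take more than its share of the database's connections.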
Picking ConnMaxLifetime¶
Half an hour is a reasonable default. Lower (5–10 minutes) if your database has frequent restarts or your DNS routing is dynamic. Higher (an hour) if dial cost is high and your environment is stable.
Monitoring¶
db.Stats() returns a sql.DBStats struct:
type DBStats struct {
MaxOpenConnections int
OpenConnections int
InUse int
Idle int
WaitCount int64 // cumulative
WaitDuration time.Duration // cumulative
MaxIdleClosed int64
MaxIdleTimeClosed int64
MaxLifetimeClosed int64
}
The two that matter for steady-state: WaitCount and WaitDuration. If they are climbing, your pool is too small. Export them as counters and compute the rate.
A common bug — leaked rows¶
rows, err := db.QueryContext(ctx, "SELECT ...")
if err != nil {
return err
}
// BUG: missing defer rows.Close()
for rows.Next() {
...
}
return nil
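rows.Next() returns false at the end of a result set and closes the rows automatically, which releases the underlying connection back to the pool. But if an error in the loop body causes an early return before the loop finishes, the rows stay open and the connection is never returned to the pool. Always defer rows.Close() immediately after the error check.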
The deferred Close is a no-op if the rows are already closed, so it is safe to add unconditionally.
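A corrected version of the loop might look like the sketch below. The users table, the name column, and the listNames helper are hypothetical; the "pgx" driver name matches the one used elsewhere on this page and must be imported separately for sql.Open to succeed.

```go
package main

import (
	"context"
	"database/sql"
	"log"
	"os"
)

// listNames reads one string column and always releases the connection.
func listNames(ctx context.Context, db *sql.DB) ([]string, error) {
	rows, err := db.QueryContext(ctx, "SELECT name FROM users") // hypothetical table
	if err != nil {
		return nil, err
	}
	defer rows.Close() // runs on every return path, including early ones

	var names []string
	for rows.Next() {
		var name string
		if err := rows.Scan(&name); err != nil {
			return nil, err // early return: the deferred Close still runs
		}
		names = append(names, name)
	}
	return names, rows.Err() // surface errors that ended iteration early
}

func main() {
	// Without a registered driver this returns an error, which is logged.
	db, err := sql.Open("pgx", os.Getenv("DATABASE_URL"))
	if err != nil {
		log.Printf("open: %v", err)
		return
	}
	defer db.Close()
	names, err := listNames(context.Background(), db)
	log.Println(names, err)
}
```

Checking rows.Err() after the loop is a separate habit worth pairing with the deferred Close: Next also returns false when iteration fails partway through.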
Connection pool steady-state — http.Transport¶
The default http.Client uses a shared http.DefaultTransport. Its defaults are unsuitable for most production services.
The default¶
// What http.DefaultTransport looks like, approximately:
&http.Transport{
MaxIdleConns: 100,
MaxIdleConnsPerHost: 2, // very low
IdleConnTimeout: 90 * time.Second,
TLSHandshakeTimeout: 10 * time.Second,
}
The killer is MaxIdleConnsPerHost = 2. If your service makes a hundred RPS to a single upstream, you can keep at most two connections warm. Every other call dials a fresh TCP+TLS connection, and you pay the latency and FD cost.
A production baseline¶
tr := &http.Transport{
MaxIdleConns: 200,
MaxIdleConnsPerHost: 50,
MaxConnsPerHost: 100, // hard cap on concurrent in-flight + idle
IdleConnTimeout: 90 * time.Second,
TLSHandshakeTimeout: 5 * time.Second,
ResponseHeaderTimeout: 10 * time.Second,
ExpectContinueTimeout: 1 * time.Second,
}
client := &http.Client{
Transport: tr,
Timeout: 30 * time.Second,
}
Why each value:
- MaxIdleConnsPerHost = 50: enough to keep most calls on warm connections at typical fan-out.
- MaxConnsPerHost = 100: hard cap; calls queue inside Transport if exceeded. Prevents runaway connection counts during burst.
- IdleConnTimeout = 90s: long enough to amortise TLS handshakes across reasonable gaps, short enough that idle hosts release FDs.
- TLSHandshakeTimeout = 5s: protects against slow upstreams blocking new dials.
- Client.Timeout = 30s: a backstop. Every request must complete or fail within thirty seconds.
Always drain the body¶
The most common steady-state bug with http.Client is forgetting to drain the response body:
resp, err := client.Do(req)
if err != nil {
return err
}
defer func() {
io.Copy(io.Discard, resp.Body)
resp.Body.Close()
}()
Without the drain, the connection cannot return to the keep-alive pool. The next call dials a new connection. The pool fragments.
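An unbounded drain can itself become a problem if an upstream returns a very large body. One possible refinement, sketched here with a hypothetical drainAndClose helper and an arbitrary 64 KiB cap, is to limit how much is read before closing. The trackedBody type is only a stand-in for resp.Body so the example runs without a network.

```go
package main

import (
	"fmt"
	"io"
	"strings"
)

// maxDrain caps how much of an unread body is read just to keep the
// connection reusable (64 KiB is an arbitrary choice for this sketch).
const maxDrain = 64 << 10

// drainAndClose discards up to limit bytes and then closes the body.
// If more than limit bytes remain, the body is closed unread, which
// typically means the connection is dropped rather than reused.
func drainAndClose(body io.ReadCloser, limit int64) (int64, error) {
	n, err := io.Copy(io.Discard, io.LimitReader(body, limit))
	if cerr := body.Close(); err == nil {
		err = cerr
	}
	return n, err
}

// trackedBody stands in for resp.Body and records whether Close was called.
type trackedBody struct {
	io.Reader
	closed bool
}

func (b *trackedBody) Close() error { b.closed = true; return nil }

func newTrackedBody(s string) *trackedBody {
	return &trackedBody{Reader: strings.NewReader(s)}
}

func main() {
	b := newTrackedBody("leftover response bytes")
	n, err := drainAndClose(b, maxDrain)
	fmt.Println(n, err, b.closed)
}
```

In a handler this would replace the deferred closure with `defer drainAndClose(resp.Body, maxDrain)`. The trade-off is that oversized leftovers cost a new dial instead of an unbounded read.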
Per-host transport¶
If your service talks to many upstreams with different characteristics, give each its own transport. A slow upstream's connections won't be evicted by traffic to a fast upstream.
type UpstreamClient struct {
Slow *http.Client // tuned for slow upstream
Fast *http.Client // tuned for fast upstream
}
Connection pool steady-state — gRPC¶
gRPC multiplexes many streams over one HTTP/2 connection. The pool is implicit: one ClientConn per upstream.
Keepalive¶
The most important configuration for steady-state:
import "google.golang.org/grpc/keepalive"
conn, err := grpc.Dial(addr,
grpc.WithTransportCredentials(creds),
grpc.WithKeepaliveParams(keepalive.ClientParameters{
Time: 30 * time.Second, // ping every 30s
Timeout: 10 * time.Second, // wait 10s for ack
PermitWithoutStream: false, // only ping if streams in flight
}),
)
Why:
- Without keepalive, an HTTP/2 connection can sit idle for hours, traverse a NAT timeout, and become a half-open connection. The next call hangs.
- The server has matching keepalive.ServerParameters and EnforcementPolicy. They must agree, or the server will disconnect you for pinging too often.
Connection reuse¶
Each ClientConn is itself a small pool internally. Reuse it across calls; never create one per request.
Load balancing¶
If your upstream has multiple instances, use grpc.WithDefaultServiceConfig to configure a load balancing policy (round_robin is a common choice; without one, grpc-go defaults to pick_first). Otherwise the client may pin to a single upstream pod and miss load balancing entirely.
Idle connections¶
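Older grpc-go releases have no IdleConnTimeout equivalent; newer releases add channel idleness (tunable with the grpc.WithIdleTimeout dial option), which shuts down a ClientConn's transports after a period with no RPCs and reconnects on the next call. In both cases the ClientConn itself lives until you explicitly Close() it. For long-running services this is fine; for services that create connections dynamically (e.g., one per tenant), you must track and close them.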
Leak budgets and accounting¶
A leak budget is an explicit acceptance that some growth is allowed. The discipline is:
- Pick a budget. "Memory growth no greater than fifty megabytes per day."
- Verify deploys reset it. "We deploy at least weekly, so the maximum drift is three hundred fifty megabytes — well below the container limit."
- Alert on excursions. "Growth greater than the budget for two consecutive hours is a page."
Why leak budgets are useful¶
Without a budget, you have two options:
- "No growth ever." Hard to achieve, often requires more engineering than the leak is worth.
- "Growth is fine." No alert ever fires, until the day the leak rate spikes.
A budget converts a binary decision into a continuous one. You can ship the simpler code that has a small leak, as long as the leak is small and bounded.
Implementing the budget¶
A budget needs a measurement and an alert:
# Prometheus example: alert on heap growth beyond budget
deriv(go_memstats_heap_inuse_bytes[1h]) > 50 * 1024 * 1024 / 86400
# i.e., growth rate above 50 MB per day
The deriv function computes the slope. Multiply your budget by the appropriate time unit. Alert when the slope is sustained over an observation window.
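The same check can be reproduced outside Prometheus, for example in a soak test. The sketch below fits a least-squares line through heap samples, which is roughly what deriv does, and compares the slope against a daily budget. The sample type and function names are illustrative.

```go
package main

import "fmt"

// sample is one heap reading: seconds since some origin, and bytes in use.
type sample struct{ t, bytes float64 }

// slope returns the least-squares growth rate in bytes per second.
func slope(s []sample) float64 {
	if len(s) < 2 {
		return 0
	}
	n := float64(len(s))
	var sumT, sumB, sumTB, sumTT float64
	for _, p := range s {
		sumT += p.t
		sumB += p.bytes
		sumTB += p.t * p.bytes
		sumTT += p.t * p.t
	}
	den := n*sumTT - sumT*sumT
	if den == 0 {
		return 0 // all samples share one timestamp
	}
	return (n*sumTB - sumT*sumB) / den
}

// overBudget reports whether growth exceeds budgetBytesPerDay.
func overBudget(s []sample, budgetBytesPerDay float64) bool {
	return slope(s) > budgetBytesPerDay/86400
}

func main() {
	readings := []sample{{0, 1000}, {60, 7000}, {120, 13000}, {180, 19000}}
	fmt.Println(slope(readings), overBudget(readings, 50*1024*1024))
}
```

A single window is noisy because of GC cycles, so in practice the comparison is made over a long range and required to hold for a sustained period, as the alert rule above does.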
When budgets break down¶
- Deploy cadence is irregular. A budget that assumes weekly deploys is invalidated by a two-week pause.
- Leak rate is non-linear. A leak that grows exponentially can be inside budget for a week and then explode in a day.
- Multiple leaks compose. Two budgets of fifty megabytes per day each become a hundred megabytes per day combined.
Treat the budget as a tool, not a guarantee. Periodically audit the actual leak rate; investigate any change.
Cache TTL and bounded LRUs¶
A map[K]V used as a cache is a leak unless you bound it. There are two ways to bound: by size or by age. Most production caches use both.
A bounded LRU¶
github.com/hashicorp/golang-lru/v2 is a widely used choice. It is fixed-size and evicts the least-recently-used entry on overflow.
import lru "github.com/hashicorp/golang-lru/v2"
cache, _ := lru.New[string, []byte](10000) // hard cap at 10000 entries
cache.Add("foo", []byte("bar"))
v, ok := cache.Get("foo")
The cache is steady-state by construction: it cannot grow past 10000 entries.
TTL-based eviction¶
github.com/dgraph-io/ristretto or your own ticker-based sweeper:
type TTLCache struct {
mu sync.RWMutex
m map[string]ttlEntry
}
type ttlEntry struct {
value []byte
expiresAt time.Time
}
func (c *TTLCache) Set(k string, v []byte, ttl time.Duration) {
c.mu.Lock()
c.m[k] = ttlEntry{value: v, expiresAt: time.Now().Add(ttl)}
c.mu.Unlock()
}
func (c *TTLCache) Get(k string) ([]byte, bool) {
c.mu.RLock()
e, ok := c.m[k]
c.mu.RUnlock()
if !ok || time.Now().After(e.expiresAt) {
return nil, false
}
return e.value, true
}
func (c *TTLCache) Sweep() {
c.mu.Lock()
defer c.mu.Unlock()
now := time.Now()
for k, e := range c.m {
if now.After(e.expiresAt) {
delete(c.m, k)
}
}
}
func (c *TTLCache) RunSweeper(ctx context.Context, interval time.Duration) {
t := time.NewTicker(interval)
defer t.Stop()
for {
select {
case <-ctx.Done():
return
case <-t.C:
c.Sweep()
}
}
}
Lazy TTL alone is not steady-state — entries that are written and never read again will live until something evicts them. The sweeper is required.
Combining size and TTL¶
For most production caches, combine: an LRU bounds the worst-case memory, a TTL bounds the staleness. The ristretto library does both; so does groupcache/lru with a sweeper added.
Background work — tickers and timers, done right¶
time.Ticker — always Stop¶
t := time.NewTicker(1 * time.Second)
defer t.Stop() // required
for {
select {
case <-t.C:
// ...
case <-ctx.Done():
return
}
}
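Without Stop, a ticker created before Go 1.23 is never garbage-collected: the runtime keeps its timer alive and firing until the program exits. Go 1.23 and later can reclaim an unreferenced ticker, but calling Stop is still the safe habit. If you create tickers in request handlers and forget Stop, this is a slow leak.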
time.After — avoid in loops¶
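A select loop with a case <-time.After(5 * time.Second) arm allocates a new timer on every iteration. Whenever another arm wins (for example, a receive from ch), the timer from that iteration is abandoned; before Go 1.23 the runtime keeps it alive until it fires five seconds later. Under a high ch rate, you accumulate thousands of pending timers.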
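For contrast with the reusable-timer version, the shape being warned against looks roughly like the reconstruction below. The function name and the running total (standing in for handle(x)) are illustrative.

```go
package main

import (
	"fmt"
	"time"
)

const idleTimeout = 5 * time.Second

// sumUntilClosed shows the anti-pattern: every pass through the loop
// calls time.After and therefore allocates a fresh timer.
func sumUntilClosed(ch <-chan int) int {
	total := 0
	for {
		select {
		case x, ok := <-ch:
			if !ok {
				return total
			}
			total += x // stands in for handle(x)
		case <-time.After(idleTimeout): // BAD: new timer per iteration
			fmt.Println("heartbeat")
		}
	}
}

func main() {
	ch := make(chan int, 3)
	ch <- 1
	ch <- 2
	ch <- 3
	close(ch)
	fmt.Println(sumUntilClosed(ch))
}
```

The code is functionally correct, which is what makes the problem easy to miss: the cost only shows up as timer churn under load.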
// GOOD
t := time.NewTimer(5 * time.Second)
defer t.Stop()
for {
select {
case x := <-ch:
handle(x)
if !t.Stop() {
select {
case <-t.C:
default:
}
}
t.Reset(5 * time.Second)
case <-t.C:
// heartbeat
t.Reset(5 * time.Second)
}
}
This is uglier but reuses a single timer across iterations.
time.AfterFunc — also leaks on early exit¶
timer := time.AfterFunc(5*time.Second, func() { ... })
// later, when no longer needed:
timer.Stop()
Always stop the timer when you no longer want it to fire. Otherwise the function still runs, in its own goroutine, when the timer expires.
Putting it together — a steady-state service¶
package main
import (
"context"
"database/sql"
"io"
"log"
"net/http"
"os"
"os/signal"
"runtime/debug"
"syscall"
"time"
_ "net/http/pprof"
_ "github.com/jackc/pgx/v5/stdlib"
)
func main() {
debug.SetMemoryLimit(3686 << 20) // about 3.6 GiB; int64(3.6 * 1024 * 1024 * 1024) does not compile because the constant is fractional
ctx, cancel := signal.NotifyContext(context.Background(), syscall.SIGTERM, syscall.SIGINT)
defer cancel()
// Database with steady-state-friendly settings.
db, err := sql.Open("pgx", os.Getenv("DATABASE_URL"))
if err != nil {
log.Fatal(err)
}
defer db.Close()
db.SetMaxOpenConns(25)
db.SetMaxIdleConns(25)
db.SetConnMaxLifetime(30 * time.Minute)
db.SetConnMaxIdleTime(5 * time.Minute)
// HTTP client with steady-state-friendly transport.
tr := &http.Transport{
MaxIdleConns: 200,
MaxIdleConnsPerHost: 50,
MaxConnsPerHost: 100,
IdleConnTimeout: 90 * time.Second,
}
client := &http.Client{Transport: tr, Timeout: 30 * time.Second}
// Worker pool with bounded queue and shed-on-full.
pool := NewPool(4, 16)
defer pool.Stop(context.Background())
// Per-tenant isolation.
tenants := NewTenantLimit(10, 5*time.Minute)
// Localhost-only pprof.
go func() {
_ = http.ListenAndServe("127.0.0.1:6060", nil)
}()
// Application HTTP server.
srv := &http.Server{
Addr: ":8080",
Handler: handler(db, client, pool, tenants),
}
go func() {
if err := srv.ListenAndServe(); err != http.ErrServerClosed {
log.Println(err)
}
}()
<-ctx.Done()
// Graceful shutdown.
shutdownCtx, shutdownCancel := context.WithTimeout(context.Background(), 25*time.Second)
defer shutdownCancel()
_ = srv.Shutdown(shutdownCtx)
_ = pool.Stop(shutdownCtx)
}
func handler(db *sql.DB, client *http.Client, pool *Pool, tenants *TenantLimit) http.Handler {
return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
tenant := r.Header.Get("X-Tenant-ID")
if tenant == "" {
http.Error(w, "missing tenant", http.StatusBadRequest)
return
}
if err := tenants.Do(r.Context(), tenant, func() {
// do tenant-isolated work
_, _ = db.ExecContext(r.Context(), "SELECT 1")
resp, err := client.Get("https://upstream.example.com/")
if err != nil {
return
}
io.Copy(io.Discard, resp.Body)
resp.Body.Close()
}); err != nil {
http.Error(w, err.Error(), http.StatusServiceUnavailable)
}
})
}
Notice every steady-state habit is here:
- GOMEMLIMIT set explicitly.
- Database with sized pool and lifetime.
- HTTP transport with sized pool and timeouts.
- Worker pool with bounded queue.
- Per-tenant isolation with idle GC.
- pprof on localhost only.
- Graceful shutdown bounded by a deadline.
This is roughly five hundred lines of boilerplate spread across utility packages. Every production Go service has some version of it.
Tricky points¶
MaxIdleConns interaction¶
If MaxIdleConnsPerHost = 50 but MaxIdleConns = 10, you can't actually keep fifty idle connections per host. The global cap dominates. Make sure your numbers are consistent.
MaxConnsPerHost = 0 means unlimited¶
This is the default. The first time you forget to set it, your service opens a hundred connections to one upstream during a burst and trips a rate limiter. Always set it.
sql.DB.Close() does not stop in-flight queries¶
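db.Close() prevents new queries from starting, but it does not cancel queries that are already running; those finish (or fail) on their own unless their context is cancelled. In a graceful shutdown, drain request handlers first, and use db.Stats().InUse to confirm nothing is still checked out before you call Close().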
Tenant semaphore weights interact with downstream¶
A tenant semaphore of weight 10 lets one tenant use up to 10 concurrent goroutines. If those 10 goroutines each take a database connection, you have committed 10 of the 25 database connections to that one tenant. Plan the math.
time.NewTimer.Reset is tricky¶
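Before Go 1.23, the safe pattern is to call Stop and, if it returns false, drain the channel before calling Reset, as in the reusable-timer loop shown earlier.
In Go 1.23+, Reset was simplified: calling it on a fired or stopped timer is safe. Check your Go version.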
Common middle-tier mistakes¶
Mistake — Sizing the pool from the high-water mark¶
You measured peak concurrent queries at 50, so you set MaxOpenConns = 50. But the high-water mark was during an unusual incident. Most of the time, 25 is enough. Now your service permanently holds 25 connections it does not need, costing memory on the database side.
The right approach: size from steady-state, with headroom for short bursts. Use a circuit breaker to handle bursts that exceed the headroom.
Mistake — Tenant semaphore without GC¶
The single most common production bug in tenant isolation: the map grows forever. Always include a GC sweeper. Test the GC sweeper.
Mistake — IdleConnTimeout = 0 (unlimited)¶
Means idle connections live forever. Each one holds a TCP socket and a TLS session. Under traffic that fans out to many hosts, you accumulate hundreds or thousands of idle connections. Set a timeout, even if a generous one.
Mistake — Forgetting defer rows.Close()¶
Easy to forget, especially on error returns. Add a linter rule. Or use db.QueryRowContext (single row) whenever possible — it doesn't expose the leak path.
Self-assessment¶
- What are the three shedding policies, and when do you choose each?
- Why does a tenant semaphore map need a GC sweeper?
- What is the difference between MaxIdleConns and MaxIdleConnsPerHost?
- Why is MaxIdleConnsPerHost = 2 (the default) usually wrong?
- What is a leak budget, and how do you alert on it?
- Why is time.After in a hot loop a leak?
- What does sql.DBStats.WaitCount tell you?
- What's the difference between a worker pool and a semaphore.Weighted for capping concurrency?
Backpressure in detail¶
Backpressure is the mechanism by which a slow consumer slows down its producer. Without backpressure, a fast producer with a slow consumer produces unbounded queueing — exactly the steady-state failure mode we are trying to prevent.
Where backpressure happens¶
A typical pipeline looks like this: ingress → queue 1 → stage A → queue 2 → final (slow) consumer.
Each queue is a place where backpressure can apply. If queue 2 is full, stage A's submit blocks. If stage A's input is slow because of that, queue 1 fills. If queue 1 is full, ingress's accept blocks. The slow consumer at the end has propagated backpressure all the way to the front.
Implementing backpressure in Go¶
The simplest backpressure: a buffered channel acting as a queue, with the producer blocking on full.
ch := make(chan Job, 16)
// Producer:
ch <- job // blocks if full
// Consumer:
for j := range ch {
process(j)
}
For HTTP servers, backpressure means: return a 503 (or hold the connection) when the worker pool's queue is full.
func handler(w http.ResponseWriter, r *http.Request) {
err := pool.Submit(r.Context(), Shed, func() {
// work
})
if errors.Is(err, ErrQueueFull) {
w.Header().Set("Retry-After", "1")
http.Error(w, "service overloaded", http.StatusServiceUnavailable)
return
}
}
A 503 with Retry-After tells clients and intermediaries to back off and retry later. The load balancer may also shift load, or remove the pod from rotation if 503s persist.
Kubernetes readiness gates¶
A pod that is overloaded can flip its readiness probe to fail:
var isReady atomic.Bool
isReady.Store(true)
http.HandleFunc("/healthz", func(w http.ResponseWriter, r *http.Request) {
w.WriteHeader(http.StatusOK)
})
http.HandleFunc("/readyz", func(w http.ResponseWriter, r *http.Request) {
if !isReady.Load() {
w.WriteHeader(http.StatusServiceUnavailable)
return
}
w.WriteHeader(http.StatusOK)
})
// Monitor queue depth; flip readiness if sustained overload:
go func() {
for {
time.Sleep(1 * time.Second)
if pool.QueueDepth() > pool.Capacity()*8/10 {
isReady.Store(false)
} else {
isReady.Store(true)
}
}
}()
The load balancer removes the pod when readiness fails, restores it when readiness recovers. This is fleet-level backpressure.
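The loop above flips readiness on a single sample, which can make a pod flap in and out of rotation. One way to require sustained overload is a small gate with hysteresis, sketched here. The readinessGate type, the sustain knob, and the 50% recovery mark are assumptions introduced for illustration.

```go
package main

import "fmt"

// readinessGate goes not-ready only after `sustain` consecutive overloaded
// samples, and becomes ready again once depth falls below a lower mark.
type readinessGate struct {
	sustain    int // consecutive overloaded samples required
	overloaded int // current streak of overloaded samples
	ready      bool
}

func newReadinessGate(sustain int) *readinessGate {
	return &readinessGate{sustain: sustain, ready: true}
}

// observe records one queue-depth sample and returns the readiness state.
func (g *readinessGate) observe(depth, capacity int) bool {
	high := capacity * 8 / 10 // same 80% mark as the loop above
	low := capacity * 5 / 10  // recovery mark; 50% is an arbitrary choice
	switch {
	case depth > high:
		g.overloaded++
		if g.overloaded >= g.sustain {
			g.ready = false
		}
	case depth < low:
		g.overloaded = 0
		g.ready = true
	default:
		g.overloaded = 0 // between the marks: keep the state, reset the streak
	}
	return g.ready
}

func main() {
	g := newReadinessGate(3)
	for _, depth := range []int{90, 90, 90, 60, 40} {
		fmt.Println(depth, g.observe(depth, 100))
	}
}
```

The monitor goroutine would call observe once per tick and store the result in isReady. The gap between the two marks is what prevents rapid toggling.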
Background-task scheduling¶
Long-running services have periodic work: cache eviction, log rotation, snapshot compaction, metric aggregation. Each of these is a small steady-state engineering problem of its own.
The basic pattern¶
func runPeriodic(ctx context.Context, name string, interval time.Duration, work func(context.Context) error) {
t := time.NewTicker(interval)
defer t.Stop()
for {
select {
case <-ctx.Done():
return
case <-t.C:
// Bound the work duration so the next tick is not delayed.
workCtx, cancel := context.WithTimeout(ctx, interval/2)
if err := work(workCtx); err != nil {
log.Printf("%s: %v", name, err)
}
cancel()
}
}
}
Three rules baked in:
- The ticker is bounded by defer t.Stop(), so there is no leak when the function returns.
- The work runs with a context budget, so work that honours its context cannot delay the next tick.
- Errors are logged rather than treated as fatal, so the task survives transient failures.
Stagger across instances¶
If you have ten replicas all running a "snapshot every hour" task, they all run at minute zero of the hour — a synchronised spike that may overwhelm a downstream. Stagger by hashing the pod name:
hash := fnv.New64a()
hash.Write([]byte(os.Getenv("POD_NAME")))
offset := time.Duration(hash.Sum64() % uint64(interval))
time.Sleep(offset)
runPeriodic(ctx, name, interval, work)
Each pod's task fires at a different offset. The downstream sees smooth load.
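The snippet above sleeps with time.Sleep, which ignores shutdown. A variant that separates the offset calculation from a cancellable wait could look like this sketch; staggerOffset, sleepCtx, and the pod name are illustrative.

```go
package main

import (
	"context"
	"fmt"
	"hash/fnv"
	"time"
)

const snapshotInterval = time.Hour

// staggerOffset maps a pod name to a stable offset in [0, interval).
func staggerOffset(podName string, interval time.Duration) time.Duration {
	if interval <= 0 {
		return 0
	}
	h := fnv.New64a()
	h.Write([]byte(podName))
	return time.Duration(h.Sum64() % uint64(interval))
}

// sleepCtx waits for d but returns early with ctx.Err() if ctx is cancelled.
func sleepCtx(ctx context.Context, d time.Duration) error {
	t := time.NewTimer(d)
	defer t.Stop()
	select {
	case <-t.C:
		return nil
	case <-ctx.Done():
		return ctx.Err()
	}
}

func main() {
	off := staggerOffset("api-7c9f", snapshotInterval) // hypothetical pod name
	fmt.Println("offset:", off)

	ctx, cancel := context.WithCancel(context.Background())
	cancel() // simulate shutdown arriving during the stagger wait
	fmt.Println(sleepCtx(ctx, off))
}
```

With this split, a pod that receives SIGTERM during its initial offset exits promptly instead of sleeping for up to a full interval.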
Idle-time work¶
Some tasks should run only when the system is idle: a background reindex, a cache pre-warm. The pattern is to gate the work on a load signal.
func runIdle(ctx context.Context, work func(context.Context) error) {
t := time.NewTicker(30 * time.Second)
defer t.Stop()
for {
select {
case <-ctx.Done():
return
case <-t.C:
if !isIdle() {
continue
}
work(ctx)
}
}
}
func isIdle() bool {
var stats runtime.MemStats
runtime.ReadMemStats(&stats)
// simple heuristic: idle if GC pause is low
return stats.PauseNs[(stats.NumGC+255)%256] < uint64(time.Millisecond)
}
For more sophisticated load signals, sample the request queue depth, the worker pool's in-flight count, or the scheduler's latency from runtime/metrics.
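As one example of such a signal, the runtime/metrics package exposes a scheduler latency histogram under the name /sched/latencies:seconds. The sketch below reads it and approximates a percentile; the quantile helper and the choice of the 99th percentile are illustrative, and any idle threshold built on it would need tuning per service.

```go
package main

import (
	"fmt"
	"math"
	"runtime/metrics"
)

// quantile returns an upper bound for quantile q from bucketed counts.
// buckets holds len(counts)+1 boundaries, as in metrics.Float64Histogram.
func quantile(counts []uint64, buckets []float64, q float64) float64 {
	var total uint64
	for _, c := range counts {
		total += c
	}
	if total == 0 {
		return 0
	}
	target := q * float64(total)
	var cum uint64
	for i, c := range counts {
		cum += c
		if float64(cum) >= target {
			if upper := buckets[i+1]; !math.IsInf(upper, 1) {
				return upper
			}
			return buckets[i] // open-ended last bucket: use its lower bound
		}
	}
	return buckets[len(buckets)-1]
}

// schedLatencyP99 approximates the 99th percentile of the time goroutines
// spent runnable before running, in seconds, since process start.
func schedLatencyP99() float64 {
	s := []metrics.Sample{{Name: "/sched/latencies:seconds"}}
	metrics.Read(s)
	if s[0].Value.Kind() != metrics.KindFloat64Histogram {
		return 0 // metric not supported by this Go version
	}
	h := s[0].Value.Float64Histogram()
	return quantile(h.Counts, h.Buckets, 0.99)
}

func main() {
	fmt.Println("sched latency p99 (s):", schedLatencyP99())
}
```

The histogram is cumulative since process start, so a real idle check would diff two reads to see recent behaviour. Unlike runtime.ReadMemStats, metrics.Read does not stop the world.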
Saturation metrics for the middle layer¶
The senior page goes deep on the USE method. At the middle level, you need a small set of metrics per resource:
| Resource | Metric |
|---|---|
| Worker pool | queue depth |
| Worker pool | in-flight count |
| Worker pool | drop count (cumulative) |
| Worker pool | per-job duration histogram |
| Database pool | Stats().InUse |
| Database pool | Stats().WaitCount (cumulative) |
| Database pool | Stats().WaitDuration (cumulative) |
| Database pool | Stats().MaxLifetimeClosed (cumulative) |
| HTTP transport | outstanding requests |
| HTTP transport | dial failures (cumulative) |
| HTTP transport | response time histogram |
| Tenant semaphore | acquired weight by tenant |
| Tenant semaphore | acquire wait time histogram |
Wire each to your monitoring system. The dashboard then has one row per resource and one panel per metric. A glance at the dashboard tells you which resource is closest to saturation.
Building it once¶
A small library can wrap each resource and expose the metrics consistently:
package steady
import (
"context"
"database/sql"
"time"
)
type DBStatsExporter struct {
DB *sql.DB
Prefix string
Send func(name string, value float64)
}
func (e *DBStatsExporter) Start(ctx context.Context, interval time.Duration) {
t := time.NewTicker(interval)
defer t.Stop()
var lastWaitCount int64
var lastWaitDuration time.Duration
for {
select {
case <-ctx.Done():
return
case <-t.C:
s := e.DB.Stats()
e.Send(e.Prefix+"_in_use", float64(s.InUse))
e.Send(e.Prefix+"_idle", float64(s.Idle))
e.Send(e.Prefix+"_max_open", float64(s.MaxOpenConnections))
e.Send(e.Prefix+"_wait_count_per_sec",
float64(s.WaitCount-lastWaitCount)/interval.Seconds())
e.Send(e.Prefix+"_wait_seconds_per_sec",
(s.WaitDuration-lastWaitDuration).Seconds()/interval.Seconds())
lastWaitCount = s.WaitCount
lastWaitDuration = s.WaitDuration
}
}
}
Compose: one exporter per database, one per HTTP client, one per worker pool. Each pod ships consistent metrics.
Connection pool warm-up¶
After process start, the connection pool is empty. The first N requests pay the dial cost. If you serve a thousand RPS and MaxIdleConns = 25, the first twenty-five requests are slow; the rest are fast.
To smooth this out, pre-warm at startup:
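// prewarmDB opens up to n connections. Keep n at or below MaxOpenConns,
// otherwise the extra db.Conn calls wait until the timeout expires.
func prewarmDB(ctx context.Context, db *sql.DB, n int) {
ctx, cancel := context.WithTimeout(ctx, 5*time.Second)
defer cancel()
conns := make(chan *sql.Conn, n) // buffered: senders never block
var wg sync.WaitGroup
for i := 0; i < n; i++ {
wg.Add(1)
go func() {
defer wg.Done()
conn, err := db.Conn(ctx) // dials when no idle connection exists
if err != nil {
return // best effort: the pool just ends up less warm
}
conns <- conn
}()
}
wg.Wait()
close(conns)
// Release only after every dial has finished. Closing inside the goroutine
// would let a later db.Conn reuse that connection instead of dialing.
for conn := range conns {
conn.Close() // returns to pool
}
}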
func main() {
db, _ := sql.Open(...)
db.SetMaxOpenConns(25)
db.SetMaxIdleConns(25)
prewarmDB(context.Background(), db, 25)
// pool is now warm
}
Same idea for HTTP. Send a few warmup requests to each upstream during startup, before declaring readiness.
The trade-off¶
Warm-up delays "ready" by a second or two. In exchange, the first hundred user requests are fast, not slow. Acceptable for most services.
In Kubernetes, warm-up belongs in the window where the livenessProbe already passes (pod is alive) but the readinessProbe does not yet (pod can serve). If readinessProbe returns 200 only after warm-up, the load balancer never sends a slow first request.
Resource ownership patterns¶
A subtle middle-tier concern: who owns a resource determines who closes it. Get this wrong and you have leaks.
Pattern: caller owns¶
func MakeClient() *http.Client {
return &http.Client{...}
}
func main() {
client := MakeClient()
// caller is responsible for not abandoning it
}
The function returns the resource; the caller decides its lifetime. Simple, but it places the burden on the caller.
Pattern: callee owns, returns handle¶
type Service struct {
db *sql.DB
}
func NewService() (*Service, error) {
db, err := sql.Open(...)
if err != nil {
return nil, err
}
return &Service{db: db}, nil
}
func (s *Service) Close() error {
return s.db.Close()
}
The callee owns the resource and exposes Close. The caller calls Close. This is the most common pattern for long-lived services.
Pattern: scoped (functional)¶
func WithDB(fn func(db *sql.DB) error) error {
db, err := sql.Open(...)
if err != nil {
return err
}
defer db.Close()
return fn(db)
}
The resource lives only inside the call. Caller cannot forget to close. Useful for short-lived resources (a database used only at startup, or a temporary file).
Pick the pattern that fits the lifetime of the resource. Mismatches (long-lived resource with functional scoping; short-lived resource with caller ownership) are common steady-state bugs.
A walk through realistic numbers¶
Let's design a hypothetical service end-to-end and pick every number.
The workload¶
A user-profile API. Peak QPS: 500. p50 latency target: 20 ms. p99 latency target: 200 ms. Container memory limit: 2 GiB. Fleet maximum: 20 pods.
The math¶
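- Concurrent in-flight at peak ≈ 500 RPS × 0.02 s = 10 requests per pod (Little's law: arrival rate × latency, reading the 500 QPS as the per-pod peak and the p50 target as the typical latency).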
- With burst absorption of 2x: 20 concurrent in-flight per pod.
The numbers¶
// GOMEMLIMIT: 90% of 2 GiB = 1.8 GiB. Integer arithmetic is required:
// int64(0.9 * ...) is a fractional constant and does not compile.
debug.SetMemoryLimit(9 * (2 << 30) / 10)
// Worker pool: 20 workers, queue twice that.
pool := NewPool(20, 40)
// Database pool: fleet max 20 pods, database max_connections 500;
// fair share is 25, take 80% to leave headroom: 20.
db.SetMaxOpenConns(20)
db.SetMaxIdleConns(20)
db.SetConnMaxLifetime(30 * time.Minute)
// HTTP transport to upstream A (a service we call about 100 times/s):
// concurrent in-flight ≈ 100 * 0.02 = 2; with 2x burst, 4. Round up to 8.
trA := &http.Transport{
MaxIdleConnsPerHost: 8,
MaxConnsPerHost: 20,
IdleConnTimeout: 90 * time.Second,
}
// Per-tenant semaphore: 5 concurrent in-flight per tenant.
tenants := NewTenantLimit(5, 5*time.Minute)
// Server timeouts.
srv := &http.Server{
ReadHeaderTimeout: 5 * time.Second,
ReadTimeout: 30 * time.Second,
WriteTimeout: 30 * time.Second,
IdleTimeout: 120 * time.Second,
MaxHeaderBytes: 1 << 16,
}
Each number has a justification. None is arbitrary. This is the discipline.
Edge cases at the middle tier¶
Bursty traffic at the queue limit¶
The queue is sized for typical burst, not extreme burst. An extreme burst causes drops or blocks. This is correct behaviour — the alternative is unbounded queueing and OOM. The team needs to be comfortable seeing occasional drops in the metric.
Tenant with weight > pool capacity¶
A tenant configured with weight 30 in a pool with capacity 20 cannot use all 30 even alone. The math has to be consistent. Validate at startup.
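A startup check along these lines would catch the mismatch before traffic arrives. `validateWeights` and the tenant names are hypothetical; the sketch assumes weights live in a map keyed by tenant and that a weight of 0 means "disabled" and is therefore allowed.

```go
package main

import (
	"fmt"
	"sort"
)

// validateWeights rejects weights that can never be satisfied: negative
// values and values larger than the pool capacity.
func validateWeights(capacity int64, weights map[string]int64) error {
	var bad []string
	for tenant, w := range weights {
		if w < 0 || w > capacity {
			bad = append(bad, fmt.Sprintf("%s=%d", tenant, w))
		}
	}
	if len(bad) == 0 {
		return nil
	}
	sort.Strings(bad) // deterministic message regardless of map order
	return fmt.Errorf("tenant weights outside [0, %d]: %v", capacity, bad)
}

func main() {
	err := validateWeights(20, map[string]int64{"acme": 5, "globex": 30})
	fmt.Println(err) // tenant weights outside [0, 20]: [globex=30]
}
```

Failing the process at startup is usually cheaper than discovering at peak that one tenant can never acquire its full weight.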
Connection pool with MaxOpenConns < MaxIdleConns¶
Go silently caps MaxIdleConns to MaxOpenConns. Don't rely on the silent fix; set them consistently.
Per-tenant pool of zero¶
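A tenant configured with weight 0 (disabled) should fail fast on acquire, not block forever. With semaphore.NewWeighted(0), TryAcquire(1) returns false immediately, but Acquire(ctx, 1) waits until ctx is done because the request exceeds the semaphore's size. Reject disabled tenants before acquiring, or always pass a context with a deadline; a hand-rolled semaphore needs the same care.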
Multi-goroutine Close¶
What happens when two goroutines both call pool.Stop()? You may panic on close(jobs). Guard with atomic.Bool.CompareAndSwap:
func (p *Pool) Stop(ctx context.Context) error {
if !p.closed.CompareAndSwap(false, true) {
return ErrAlreadyClosed
}
close(p.jobs)
// ...
}
Best practices¶
A consolidated list:
- Every long-lived resource has a name, a configured size, and a metric.
- Every queue has a capacity and a shedding policy.
- Every goroutine has an exit condition (context or close).
- Every Open/Acquire is followed by a defer Close/Release.
- Every cache has a bound (size, TTL, or both).
- Every periodic task has a ticker that is stopped and a context-bounded run budget.
- Every connection pool is pre-warmed at startup.
- Every per-tenant resource has an idle GC.
- Every metric is exported with its saturation indicator (queue depth, wait count, etc.).
- Every alert has a runbook entry.
If your service follows all ten, you have a middle-tier steady-state service. The senior page builds on this to make the service stable under adversarial conditions; the professional page hardens it against the rare and ugly failure modes.
Summary¶
Middle-tier steady-state is the layer above the basics:
- Bounded queues with explicit policies (shed, block, load-shed).
- Per-tenant isolation with idle GC.
- Connection pools sized to match the workload, not defaults.
- Cache structures with real bounds (LRU + TTL).
- Tickers and timers that are always Stopped.
- Leak budgets with alerts on excursions.
- Backpressure that propagates from slow consumer to fast producer.
- Background tasks that are staggered and budget-aware.
- Resource ownership made explicit.
The pattern across all of these is: every long-lived resource has a name, a bound, a metric, and an alert. The job of the middle-tier engineer is to enforce that discipline across the service. The senior page goes deeper into the architecture-level decisions; the professional page tells you what happens when one of these levers is set wrong in production.
Annex — Reusable building blocks¶
A few small utility types that every middle-tier service ends up writing. Lightly battle-tested versions of these exist in most teams' internal libraries.
A small runtime/metrics exporter¶
package rtm
import (
"context"
"runtime/metrics"
"time"
)
type Sink interface {
Gauge(name string, value float64)
Histogram(name string, mean, p99 float64)
}
func Run(ctx context.Context, sink Sink, interval time.Duration) {
samples := []metrics.Sample{
{Name: "/memory/classes/heap/objects:bytes"},
{Name: "/memory/classes/heap/free:bytes"},
{Name: "/memory/classes/heap/released:bytes"},
{Name: "/memory/classes/total:bytes"},
{Name: "/sched/goroutines:goroutines"},
{Name: "/gc/cycles/total:gc-cycles"},
{Name: "/gc/pauses:seconds"},
{Name: "/cpu/classes/gc/total:cpu-seconds"}, // cumulative GC CPU time (Go 1.20+)
}
t := time.NewTicker(interval)
defer t.Stop()
for {
select {
case <-ctx.Done():
return
case <-t.C:
metrics.Read(samples)
for _, s := range samples {
switch s.Value.Kind() {
case metrics.KindUint64:
sink.Gauge(s.Name, float64(s.Value.Uint64()))
case metrics.KindFloat64:
sink.Gauge(s.Name, s.Value.Float64())
case metrics.KindFloat64Histogram:
h := s.Value.Float64Histogram()
sink.Histogram(s.Name, histMean(h), histQuantile(h, 0.99))
}
}
}
}
}
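func histMean(h *metrics.Float64Histogram) float64 {
var total, count float64
for i, c := range h.Counts {
if c == 0 {
// Skip empty buckets. The first and last boundaries can be -Inf/+Inf,
// and Inf*0 is NaN, which would poison the sum. A populated open-ended
// bucket still yields an infinite mean.
continue
}
mid := (h.Buckets[i] + h.Buckets[i+1]) / 2
total += mid * float64(c)
count += float64(c)
}
if count == 0 {
return 0
}
return total / count
}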
func histQuantile(h *metrics.Float64Histogram, q float64) float64 {
var total uint64
for _, c := range h.Counts {
total += c
}
target := uint64(float64(total) * q)
var seen uint64
for i, c := range h.Counts {
seen += c
if seen >= target {
return h.Buckets[i+1]
}
}
return h.Buckets[len(h.Buckets)-1]
}
A small DBStats exporter¶
package dbm
import (
"context"
"database/sql"
"time"
)
type Sink interface {
Gauge(name string, value float64)
}
func Run(ctx context.Context, db *sql.DB, prefix string, sink Sink, interval time.Duration) {
t := time.NewTicker(interval)
defer t.Stop()
var lastWait int64
var lastWaitDur time.Duration
for {
select {
case <-ctx.Done():
return
case <-t.C:
s := db.Stats()
sink.Gauge(prefix+".in_use", float64(s.InUse))
sink.Gauge(prefix+".idle", float64(s.Idle))
sink.Gauge(prefix+".max", float64(s.MaxOpenConnections))
sink.Gauge(prefix+".wait_count_delta", float64(s.WaitCount-lastWait))
sink.Gauge(prefix+".wait_duration_seconds_delta", (s.WaitDuration-lastWaitDur).Seconds())
lastWait = s.WaitCount
lastWaitDur = s.WaitDuration
}
}
}
A small bounded LRU wrapper¶
If you don't want to pull in hashicorp/golang-lru, here is a minimal implementation:
package lru
import "sync"
type LRU[K comparable, V any] struct {
mu sync.Mutex
cap int
items map[K]*node[K, V]
head *node[K, V] // most recent
tail *node[K, V] // least recent
}
type node[K comparable, V any] struct {
key K
val V
prev *node[K, V]
next *node[K, V]
}
func New[K comparable, V any](cap int) *LRU[K, V] {
return &LRU[K, V]{cap: cap, items: make(map[K]*node[K, V])}
}
func (l *LRU[K, V]) Get(k K) (V, bool) {
l.mu.Lock()
defer l.mu.Unlock()
n, ok := l.items[k]
if !ok {
var zero V
return zero, false
}
l.moveToFront(n)
return n.val, true
}
func (l *LRU[K, V]) Set(k K, v V) {
l.mu.Lock()
defer l.mu.Unlock()
if n, ok := l.items[k]; ok {
n.val = v
l.moveToFront(n)
return
}
n := &node[K, V]{key: k, val: v}
l.items[k] = n
l.pushFront(n)
if len(l.items) > l.cap {
old := l.tail
l.remove(old)
delete(l.items, old.key)
}
}
func (l *LRU[K, V]) moveToFront(n *node[K, V]) {
l.remove(n)
l.pushFront(n)
}
func (l *LRU[K, V]) pushFront(n *node[K, V]) {
if l.head == nil {
l.head, l.tail = n, n
return
}
n.next = l.head
l.head.prev = n
l.head = n
}
func (l *LRU[K, V]) remove(n *node[K, V]) {
if n.prev != nil {
n.prev.next = n.next
} else {
l.head = n.next
}
if n.next != nil {
n.next.prev = n.prev
} else {
l.tail = n.prev
}
n.prev, n.next = nil, nil
}
Under a hundred lines. Acceptable for a small service; prefer an established library such as hashicorp/golang-lru for anything bigger.
A small graceful-shutdown helper¶
package gs
import (
"context"
"os/signal"
"syscall"
"time"
)
type Closer interface {
Close(ctx context.Context) error
}
func RunUntilSignal(timeout time.Duration, run func(ctx context.Context) error, closers ...Closer) error {
ctx, stop := signal.NotifyContext(context.Background(),
syscall.SIGTERM, syscall.SIGINT)
defer stop()
errCh := make(chan error, 1)
go func() {
errCh <- run(ctx)
}()
select {
case err := <-errCh:
return err
case <-ctx.Done():
}
shutdownCtx, cancel := context.WithTimeout(context.Background(), timeout)
defer cancel()
for _, c := range closers {
_ = c.Close(shutdownCtx)
}
return nil
}
Wire your service up like:
Each of these utilities is small enough to copy into any service. They embed steady-state discipline without requiring a big library dependency.
Pattern catalogue summary¶
This middle page covered, in roughly this order:
- Three shedding policies (shed, block, load-shed).
- Worker pool with metrics and graceful stop.
- Per-tenant semaphore with idle GC.
- Goroutine budgets via semaphore.Weighted.
- sql.DB tuning (MaxOpenConns, MaxIdleConns, ConnMaxLifetime).
- http.Transport tuning (MaxIdleConnsPerHost, MaxConnsPerHost).
- gRPC keepalive.
- Leak budgets and slope-based alerts.
- Bounded LRU caches with TTL.
- Tickers and timers done right.
- Backpressure with readiness gates.
- Background task scheduling.
- Saturation metrics per resource.
- Connection pool warm-up.
- Resource ownership patterns.
Each pattern stands on its own. The senior page composes them into architecture; the professional page tells the war stories of what happens when they are missing.
A worked example — adding steady-state to an existing service¶
To make the patterns concrete, here is how you might convert an existing "works on my laptop" Go service into a steady-state-ready production service, shown before and after.
The before — works but drifts¶
package main
import (
"database/sql"
"encoding/json"
"log"
"net/http"
_ "github.com/jackc/pgx/v5/stdlib"
)
var db *sql.DB
func main() {
var err error
db, err = sql.Open("pgx", "postgres://...")
if err != nil {
log.Fatal(err)
}
http.HandleFunc("/profile", handleProfile)
log.Fatal(http.ListenAndServe(":8080", nil))
}
func handleProfile(w http.ResponseWriter, r *http.Request) {
userID := r.URL.Query().Get("user_id")
var name string
err := db.QueryRow("SELECT name FROM users WHERE id=$1", userID).Scan(&name)
if err != nil {
http.Error(w, err.Error(), 500)
return
}
resp, _ := http.Get("https://upstream/" + userID)
var data map[string]any
_ = json.NewDecoder(resp.Body).Decode(&data)
json.NewEncoder(w).Encode(map[string]any{
"name": name,
"data": data,
})
}
What's wrong:
- No GOMEMLIMIT.
- No connection pool sizing.
- http.Get uses the default transport with MaxIdleConnsPerHost = 2.
- resp.Body is not drained or closed.
- No timeouts on the upstream call.
- No graceful shutdown.
- No metrics.
- No pprof.
The service may work in development. In production it will drift, leak FDs, and accumulate goroutines under sustained traffic.
The after — steady-state ready¶
package main
import (
"context"
"database/sql"
"encoding/json"
"io"
"log"
"net/http"
"os"
"os/signal"
"runtime/debug"
"syscall"
"time"
_ "github.com/jackc/pgx/v5/stdlib"
_ "net/http/pprof"
)
var (
db *sql.DB
client *http.Client
)
func main() {
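// 90% of 2 GiB = 1.8 GiB. Integer arithmetic: a fractional float constant
// cannot be converted to int64 and would not compile.
debug.SetMemoryLimit(9 * (2 << 30) / 10)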
var err error
db, err = sql.Open("pgx", os.Getenv("DATABASE_URL"))
if err != nil {
log.Fatal(err)
}
db.SetMaxOpenConns(20)
db.SetMaxIdleConns(20)
db.SetConnMaxLifetime(30 * time.Minute)
db.SetConnMaxIdleTime(5 * time.Minute)
defer db.Close()
client = &http.Client{
Transport: &http.Transport{
MaxIdleConns: 100,
MaxIdleConnsPerHost: 25,
MaxConnsPerHost: 50,
IdleConnTimeout: 90 * time.Second,
},
Timeout: 5 * time.Second,
}
// Localhost-only pprof
go func() {
_ = http.ListenAndServe("127.0.0.1:6060", nil)
}()
ctx, stop := signal.NotifyContext(context.Background(),
syscall.SIGTERM, syscall.SIGINT)
defer stop()
srv := &http.Server{
Addr: ":8080",
Handler: http.HandlerFunc(handleProfile),
ReadHeaderTimeout: 5 * time.Second,
ReadTimeout: 30 * time.Second,
WriteTimeout: 30 * time.Second,
IdleTimeout: 120 * time.Second,
MaxHeaderBytes: 1 << 16,
}
go func() {
if err := srv.ListenAndServe(); err != http.ErrServerClosed {
log.Println(err)
stop() // a failed listener must not leave main blocked on ctx.Done()
}
}()
<-ctx.Done()
shutdownCtx, cancel := context.WithTimeout(
context.Background(), 25*time.Second)
defer cancel()
_ = srv.Shutdown(shutdownCtx)
}
func handleProfile(w http.ResponseWriter, r *http.Request) {
userID := r.URL.Query().Get("user_id")
if userID == "" {
http.Error(w, "missing user_id", http.StatusBadRequest)
return
}
ctx, cancel := context.WithTimeout(r.Context(), 3*time.Second)
defer cancel()
var name string
err := db.QueryRowContext(ctx,
"SELECT name FROM users WHERE id=$1", userID).Scan(&name)
if err != nil {
http.Error(w, err.Error(), http.StatusInternalServerError)
return
}
req, _ := http.NewRequestWithContext(ctx, "GET",
"https://upstream/"+userID, nil)
resp, err := client.Do(req)
if err != nil {
http.Error(w, err.Error(), http.StatusBadGateway)
return
}
defer func() {
io.Copy(io.Discard, resp.Body)
resp.Body.Close()
}()
var data map[string]any
if err := json.NewDecoder(resp.Body).Decode(&data); err != nil {
http.Error(w, err.Error(), http.StatusBadGateway)
return
}
w.Header().Set("Content-Type", "application/json")
json.NewEncoder(w).Encode(map[string]any{
"name": name,
"data": data,
})
}
What changed:
- GOMEMLIMIT set at startup.
- Database pool explicitly sized with sane defaults.
- Custom http.Transport with higher per-host idle conn limit.
- Response body always drained and closed.
- Explicit per-call context with 3-second timeout.
- Server-side timeouts for read/write/idle.
- signal.NotifyContext triggers graceful shutdown.
- srv.Shutdown bounded by a 25-second deadline.
- pprof on a localhost listener.
The added code is about fifty lines. The service is now production-grade.
What's still missing¶
This service is good but not great. For "great," add:
- Metrics: per-route latency, queue depth if added.
- Saturation alerts: db.Stats().WaitCount, etc.
- A worker pool with per-tenant isolation (only if multi-tenant).
- A circuit breaker for the upstream call.
- A chaos harness running in CI.
- Runbooks for each alert.
The senior page covers each of these. For most middle-tier services, the "after" code above is already a major upgrade.
Trade-off: simplicity vs robustness¶
The "after" code is more complex than the "before." This is the price of admission to steady-state engineering. Three observations:
- The complexity is largely boilerplate. Every service has a similar shape. Extract it into a shared library and the per-service cost drops.
- The complexity prevents real incidents. Each line corresponds to a class of failure that the "before" code would suffer.
- The complexity is teachable. Once you have written it once, you can write it again in fifteen minutes. The first time is the expensive one.
The trade-off is a one-time engineering cost against an ongoing operational benefit. For services with non-trivial lifetimes, the trade pays off quickly.
Closing for the middle tier¶
The middle page has been long because the middle tier is wide. The junior page is three habits applied to one thing; the middle page is the same habits applied to many things, with the trade-offs that emerge.
If you finish the middle page and feel "I understand each pattern but the connection is unclear," that is normal. The connection is the senior page's subject: architecture, alerts, harnesses, the design of a service that is steady at the system level, not just the line level.
Move on to the senior page when you are ready to think about steady-state as a property of systems, not of code.