tunny — Professional Level

Table of Contents

  1. Introduction
  2. Prerequisites
  3. Operating Model
  4. CPU-Bound Workloads in Production
  5. Image Processing Pipelines
  6. Integration With HTTP Handlers
  7. Integration With gRPC
  8. Integration With Message Queues
  9. Observability Stack
  10. Graceful Shutdown
  11. Capacity Planning
  12. Load Testing
  13. Resource Limits and cgroups
  14. Multi-Tenancy
  15. Deployment Topologies
  16. Case Study — Image Service at 100M images/day
  17. Case Study — PDF Renderer With Heavy Memory
  18. Case Study — JSON Schema Validator With CPU Cap
  19. Case Study — Crypto Signing Service
  20. Case Study — ML Inference Worker Pool
  21. Incident Response
  22. Postmortems
  23. Configuration Management
  24. Runtime Tuning
  25. Production Best Practices
  26. Production Anti-Patterns
  27. Self-Assessment Checklist
  28. Summary
  29. Further Reading
  30. Related Topics

Introduction

Focus: "How do I keep a tunny-based service healthy in production for years?"

You have used tunny in scripts, side projects, and small services. You understand the API and the internals. Now you must run it in production. The concerns shift:

  • It is not enough that the pool works. It must work under load, for years, across deployments, with operators on-call.
  • Pool sizing becomes an operational decision, revisited as traffic shifts.
  • Observability becomes essential — you cannot debug what you cannot see.
  • Shutdown must be coordinated with HTTP servers, signal handlers, and downstream systems.
  • Capacity planning becomes a cycle: measure, predict, adjust.

This file covers all of those. It is less about code, more about decisions. Most of the code is shape rather than completeness.

After this file you should be able to:

  • Deploy a tunny-based service to production with confidence.
  • Set up observability so the pool's behaviour is visible.
  • Plan capacity for predicted growth.
  • Handle production incidents that touch the pool.
  • Write a postmortem about a tunny-related outage.
  • Tune GOMAXPROCS, GOGC, and other runtime knobs alongside pool size.

Prerequisites

  • Comfortable with everything in junior.md, middle.md, and senior.md.
  • Familiar with HTTP servers in Go (net/http, or a framework).
  • Familiar with Prometheus / OpenTelemetry / structured logging.
  • Familiar with deployment systems (Kubernetes, ECS, Nomad, etc).
  • Have run a production service for at least three months.
  • Familiar with pprof, go tool trace, basic system tools (top, lsof, tcpdump at need).

This file talks at the level of a service owner, not a beginner.


Operating Model

A tunny-based service has the following operating-model shape:

   Inbound traffic
        │
   HTTP / gRPC / queue consumer
        │  (one goroutine per request/message)
   Authentication, validation, etc.
        │
   pool.ProcessCtx(ctx, payload)   ← the bottleneck
        │
   Worker code
        │
   Result back to handler
        │
   Response / ack

Every concern in this file falls along this pipeline:

  • Inbound shape — HTTP, gRPC, queue. Each has its own context propagation.
  • Pool sizing — matched to the workload's CPU/memory profile and downstream limits.
  • Observability — metrics at every stage.
  • Shutdown — orderly tear-down from front to back.

Keep this picture in mind. Every section is a refinement of one piece.


CPU-Bound Workloads in Production

Tunny is at its best for CPU-bound workloads. Examples:

  • Image manipulation (resize, encode, watermark).
  • Text processing (tokenization, normalization, stemming).
  • Cryptographic operations (signing, verification, encryption, KDF).
  • Compression (gzip, zstd, brotli).
  • Parsing (large JSON, XML, custom formats).
  • ML inference (small models running in-process).

For all of these, the rule is:

  • Pool size = runtime.NumCPU() is a safe default.
  • Each worker holds reusable buffers/codecs as needed.
  • HTTP / gRPC handlers call ProcessCtx with the request context.
  • Memory footprint = pool_size * per_worker_state.

A common pitfall: sizing the pool to the request rate rather than the CPU budget. If you serve 1000 RPS, that does not mean you need 1000 workers. If each call is 10 ms and you have 8 cores, you need ~8 workers (perhaps with a buffer of 16 for variance).

A specific calculation:

  • RPS * call_time = work_capacity_needed
  • 1000 RPS * 10 ms = 10 worker-seconds of work per second
  • On 8 cores, you can do up to 8 worker-seconds of work per second
  • Therefore: you cannot keep up at this rate. You need 10+ cores, or to reduce call_time, or to drop traffic.

This basic queueing-theory exercise is the foundation of capacity planning. Run it for your workload before deploying.
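
The same arithmetic as a small helper. A hedged sketch; the function name and its margin parameter are illustrative, not part of tunny:

import "math"

// workersNeeded converts a request rate and per-call duration into a
// minimum worker count, with a safety margin for variance in arrival
// and service times.
func workersNeeded(rps, callSeconds, margin float64) int {
    workSeconds := rps * callSeconds // worker-seconds arriving per wall second
    return int(math.Ceil(workSeconds * margin))
}

// workersNeeded(1000, 0.010, 1.0) == 10: at least 10 cores to keep up.
// workersNeeded(1000, 0.010, 1.5) == 15: with a 1.5x safety margin.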


Image Processing Pipelines

The canonical tunny use case. A few patterns from real deployments:

Pattern: Single pool, multi-format

If most images are similar in cost, one pool suffices. The worker decodes whatever format arrives.

type ImageWorker struct {
    buf bytes.Buffer
}

func (w *ImageWorker) Process(p any) any {
    job := p.(ImageJob)
    w.buf.Reset()
    f, err := os.Open(job.InPath)
    if err != nil {
        return ImageResult{Err: err}
    }
    defer f.Close()
    if _, err := w.buf.ReadFrom(f); err != nil {
        return ImageResult{Err: err}
    }
    img, format, err := image.Decode(&w.buf)
    if err != nil {
        return ImageResult{Err: err}
    }
    return processImage(img, format, job)
}

Pool sized to NumCPU. Each worker has its own buffer.

Pattern: Format-segregated pools

If formats have very different costs (PNG is 5x slower than JPEG in your benchmarks), separate them. Each pool can be sized for its workload.

type Service struct {
    jpeg *tunny.Pool
    png  *tunny.Pool
    webp *tunny.Pool
}

Per-pool sizing means no head-of-line blocking across formats. Heavy PNG traffic does not starve JPEG callers.
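
A dispatch sketch under the struct above; detectFormat is a hypothetical helper that sniffs the magic bytes:

func (s *Service) process(ctx context.Context, data []byte) (any, error) {
    // Route to the pool sized for this format's cost profile.
    switch detectFormat(data) {
    case "png":
        return s.png.ProcessCtx(ctx, data)
    case "webp":
        return s.webp.ProcessCtx(ctx, data)
    default:
        return s.jpeg.ProcessCtx(ctx, data)
    }
}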

Pattern: Pre-resize before re-encode

For thumbnail services, often the input is huge (e.g. 20 MP) and the output is small (e.g. 200x200). Decoding the whole thing wastes CPU. Use a decoder that supports fractional decode:

func (w *ImageWorker) Process(p any) any {
    job := p.(ImageJob)
    // libjpeg-turbo or similar supports DCT-domain shrinking
    img, err := decodeShrunk(job.InData, job.OutSize)
    if err != nil {
        return ImageResult{Err: err}
    }
    // resize the smaller image
    return resize(img, job.OutSize)
}

This is a worker-level optimisation that can give 4x speedup. The pool size stays the same; throughput goes up.

Pattern: Streaming uploads with pool offload

Inbound HTTP requests contain raw image data. The handler reads the body, then passes it to the pool. The pool does decode+process+encode. The handler writes the result back.

http.HandleFunc("/resize", func(w http.ResponseWriter, r *http.Request) {
    data, err := io.ReadAll(r.Body)
    if err != nil { ... }
    result, err := svc.pool.ProcessCtx(r.Context(), data)
    if err != nil { ... }
    w.Write(result.([]byte))
})

A few decisions:

  • Should we read the body in the handler or in the worker? In the worker, the read happens in pool-capacity time. In the handler, the read happens regardless of pool capacity. For small payloads, either is fine.
  • Should we limit body size? Always, with http.MaxBytesReader. Otherwise a 1 GB upload can OOM your service.

Integration With HTTP Handlers

The canonical pattern:

type Service struct {
    pool *tunny.Pool
}

func (s *Service) handler(w http.ResponseWriter, r *http.Request) {
    payload, err := parseRequest(r)
    if err != nil {
        http.Error(w, err.Error(), 400)
        return
    }

    ctx, cancel := context.WithTimeout(r.Context(), 10*time.Second)
    defer cancel()

    result, err := s.pool.ProcessCtx(ctx, payload)
    if err != nil {
        switch {
        case errors.Is(err, context.Canceled):
            // client gave up
            return
        case errors.Is(err, context.DeadlineExceeded):
            http.Error(w, "timeout", 504)
            return
        case errors.Is(err, tunny.ErrPoolNotRunning):
            http.Error(w, "shutting down", 503)
            return
        }
        http.Error(w, err.Error(), 500)
        return
    }

    writeResponse(w, result)
}

Key points:

  • Use r.Context() — propagates client disconnect.
  • Wrap in WithTimeout — clamps maximum work duration.
  • Map error categories to HTTP status codes.
  • Body size limits up front.

For high-throughput services, also consider:

  • Pre-validation in the handler so bad inputs do not reach the pool.
  • Admission control via QueueLength — return 503 if too many are waiting.
  • Per-tenant pools if SLA isolation is needed.

Admission control

A simple admission rule:

if int(s.pool.QueueLength()) > s.maxQueue {
    w.Header().Set("Retry-After", "5")
    http.Error(w, "busy", 503)
    return
}

The service stays responsive even under overload. Callers retry later, when the queue has drained.

Choosing maxQueue is a trade-off:

  • Too low: you reject traffic you could have served.
  • Too high: queue time exceeds caller's timeout, you do work nobody wants.

Rule of thumb: maxQueue = pool_size * target_p99_latency / avg_call_time. For 8 workers, 100 ms calls, 1 s target: maxQueue = 8 * 1 s / 100 ms = 80. Round to 100.


Integration With gRPC

gRPC's interceptors give you a clean place to insert the pool:

func PoolInterceptor(pool *tunny.Pool) grpc.UnaryServerInterceptor {
    return func(ctx context.Context, req any, info *grpc.UnaryServerInfo, handler grpc.UnaryHandler) (any, error) {
        result, err := pool.ProcessCtx(ctx, request{ctx: ctx, req: req, handler: handler})
        if err != nil {
            return nil, mapError(err)
        }
        r := result.(response)
        return r.resp, r.err
    }
}
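
The request and response types passed through the pool are not shown above; a minimal sketch of the matching pool built with tunny.NewFunc, where the worker simply runs the handler on a pool goroutine:

type request struct {
    ctx     context.Context
    req     any
    handler grpc.UnaryHandler
}

type response struct {
    resp any
    err  error
}

var pool = tunny.NewFunc(runtime.NumCPU(), func(p any) any {
    r := p.(request)
    // Run the actual RPC handler inside a pool worker.
    resp, err := r.handler(r.ctx, r.req)
    return response{resp: resp, err: err}
})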

Or you can put the pool inside specific RPC methods, depending on whether all RPCs are CPU-bound.

For streaming RPCs, the pool is less natural — each message in the stream is a separate call, but the stream itself is long-lived. You typically iterate messages and submit each to the pool. Backpressure on the stream comes from the pool naturally.


Integration With Message Queues

A common batch-processing shape:

func consume(ctx context.Context, pool *tunny.Pool, q Queue) {
    for {
        msg, err := q.Read(ctx)
        if err != nil {
            return
        }
        result, err := pool.ProcessCtx(ctx, msg)
        if err != nil {
            q.Nack(msg)
            continue
        }
        q.Ack(msg, result)
    }
}

Multiple consumer goroutines, all sharing the same pool. The pool caps in-flight work; the queue provides durable buffering.

Variations:

  • Parallel consumers — N goroutines reading from the queue, all submitting to the pool. The pool's size is independent of N.
  • At-most-once vs at-least-once — handled at the queue layer (when do you ack?), not in tunny.
  • Batch ack — accumulate results, ack in batches for throughput.

This is the shape used for image processing pipelines that ingest from S3 events, Kafka, NATS, etc.
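
A sketch of the first variation: N consumer goroutines sharing one pool, with the pool capping actual parallelism. Assumes the consume loop and Queue type shown above:

func startConsumers(ctx context.Context, pool *tunny.Pool, q Queue, n int) {
    var wg sync.WaitGroup
    for i := 0; i < n; i++ {
        wg.Add(1)
        go func() {
            defer wg.Done()
            consume(ctx, pool, q) // the loop shown above
        }()
    }
    wg.Wait() // returns once the context is cancelled and reads fail
}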


Observability Stack

You cannot operate tunny in production without observability. Three things you must export:

1. Pool size and queue length

poolSizeGauge.Set(float64(pool.GetSize()))
queueLengthGauge.Set(float64(pool.QueueLength()))

Update these on a timer (every 5-15 seconds) or expose as Prometheus gauges that call the methods directly.
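
The pull-based variant as a sketch: a GaugeFunc calls the pool on every scrape, so no ticker is needed. The metric name and the package-level pool variable are assumptions:

var queueGauge = promauto.NewGaugeFunc(prometheus.GaugeOpts{
    Name: "pool_queue_length",
    Help: "Current queue length",
}, func() float64 {
    // Invoked on each scrape.
    return float64(pool.QueueLength())
})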

2. Call durations

Wrap Process calls:

func (s *Service) Do(ctx context.Context, in In) (Out, error) {
    timer := prometheus.NewTimer(callDurationHistogram.WithLabelValues(s.name))
    defer timer.ObserveDuration()
    // ... ProcessCtx
}

Histogram with buckets reflecting your expected distribution. For 10-100 ms calls, buckets like [1, 5, 10, 25, 50, 100, 250, 500, 1000] ms.

3. Outcomes (success/error/timeout)

outcomeCounter.WithLabelValues(s.name, outcome).Inc()

Where outcome is one of ok, error, timeout, cancelled, pool_closed.

These three families of metrics let you build dashboards:

  • Throughput — rate() over the outcome counters.
  • Latency distribution — histogram percentiles.
  • Saturation — queue length vs pool size.

Set alerts on:

  • queue_length > pool_size * 5 for more than 1 minute (saturation).
  • p99_latency > target for more than 5 minutes (degradation).
  • outcome:error rate > 1% for more than 1 minute (failure).
  • outcome:timeout rate > 5% for more than 1 minute (cap problem).

Graceful Shutdown

Production services receive SIGTERM. The pool must be closed gracefully — finish in-flight work, then exit. The standard shape:

func main() {
    pool := tunny.New(runtime.NumCPU(), workerFactory)
    server := &http.Server{Addr: ":8080", Handler: makeMux(pool)}

    go func() {
        if err := server.ListenAndServe(); err != nil && err != http.ErrServerClosed {
            log.Fatal(err)
        }
    }()

    sig := make(chan os.Signal, 1)
    signal.Notify(sig, syscall.SIGINT, syscall.SIGTERM)
    <-sig

    log.Println("shutdown initiated")

    // Phase 1: stop accepting new HTTP requests.
    shutdownCtx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
    defer cancel()
    if err := server.Shutdown(shutdownCtx); err != nil {
        log.Printf("server shutdown: %v", err)
    }

    // Phase 2: drain in-flight pool work. A bare break inside select
    // would only exit the select, so the loop uses a label.
drain:
    for pool.QueueLength() > 0 {
        select {
        case <-shutdownCtx.Done():
            log.Println("forced close, queue still nonzero")
            break drain
        case <-time.After(100 * time.Millisecond):
        }
    }

    // Phase 3: close the pool.
    pool.Close()

    log.Println("shutdown complete")
}

Three phases:

  1. Stop accepting new traffic (HTTP server shutdown).
  2. Drain in-flight work (wait for queue to empty).
  3. Close the pool (releases workers).

The shutdown context bounds the whole thing. If draining takes too long, you force-close.

In Kubernetes:

  • Configure terminationGracePeriodSeconds longer than the worst-case Process time.
  • Use a preStop hook for additional delay if needed.
  • The shutdown context should be shorter than the grace period.

Capacity Planning

Capacity planning answers: "How big should my service be?" For a tunny-based service:

  1. Measure call duration — p50, p95, p99.
  2. Measure call rate — requests/sec, peak and sustained.
  3. Compute work-seconds needed — rate * duration.
  4. Choose pool size — at least work_seconds_per_second * safety_margin.
  5. Choose number of replicas — to provide redundancy and headroom.

Worked example:

  • Service receives 500 RPS peak.
  • Each call takes 50 ms (p95).
  • Work-seconds/sec = 500 * 0.05 = 25.
  • On 8-core pods, each pod provides 8 work-seconds/sec.
  • Need at least 4 pods (32 work-seconds). For redundancy, 6.
  • Pool size on each pod: 8 (one per core).

For headroom against bursts, you might want 8 pods. Cost vs latency trade-off.

This is queueing theory in disguise. The Erlang-C formula and M/M/c models apply. Most engineers do not need the formulas; an intuitive sizing followed by load tests is enough.


Load Testing

Test under realistic load before deploying. Tools: hey, wrk, k6, vegeta, custom Go programs.

A simple hey invocation:

hey -c 50 -z 30s -m POST -D image.jpg http://service/resize

50 concurrent clients, 30 seconds. Measure p95, p99, error rate. Compare to single-pod capacity.

Sanity test: monotonic load. Start at 10 RPS, double every minute until errors appear. Find the breaking point. Set production traffic well below.

Burst test: send a sudden 10x burst. Observe how queue length grows and whether latency degrades gracefully or catastrophically.

Long-running soak test: sustain expected production traffic for 24 hours. Look for memory growth, file descriptor leaks, slow degradation.

These three (linear, burst, soak) are the minimum before declaring a tunny-based service production-ready.


Resource Limits and cgroups

In containers, runtime.NumCPU() returns the host's CPU count, not the container's CPU quota. This is a footgun:

  • Container has 1 CPU quota.
  • runtime.NumCPU() returns 64.
  • Pool is sized to 64.
  • Service is wildly oversubscribed; latency suffers.

Either set GOMAXPROCS to the container quota yourself and read it back via runtime.GOMAXPROCS(0), or use a library like automaxprocs:

import _ "go.uber.org/automaxprocs"

This sets GOMAXPROCS from the container's CPU quota. Then size your pool:

pool := tunny.New(runtime.GOMAXPROCS(0), workerFactory)

This way, your pool size matches reality.

Similarly for memory: set Go's GOMEMLIMIT to a fraction of the container's memory limit. Otherwise the GC may run too aggressively or too laxly.
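
GOMEMLIMIT is normally set through the environment; it can also be set in code via runtime/debug. A sketch assuming a 4 GiB container limit:

import "runtime/debug"

func init() {
    const containerLimit = int64(4) << 30 // assumed 4 GiB container limit
    // ~90% of the container limit, leaving headroom for non-Go memory.
    debug.SetMemoryLimit(containerLimit / 10 * 9)
}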


Multi-Tenancy

If your service serves multiple tenants and they have SLA isolation needs, consider:

Option A — shared pool

All tenants use the same pool. Simplicity. Risk: a noisy tenant degrades others.

Option B — per-tenant pool

Each tenant gets a small pool. Isolation. Cost: many pools, more goroutines, more operational complexity.

Option C — weighted shared pool

Use BlockUntilReady with a weighted limiter per tenant. Complex but flexible.

The right answer depends on:

  • Number of tenants (handful → per-tenant; thousands → weighted shared).
  • SLA strictness (strict → isolated).
  • Cost sensitivity (high → shared).

In practice, option B works for B2B services with <100 tenants. Option A works for everything else.
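
A minimal Option B sketch. The lazy map, tenantPoolSize, and processPayload are illustrative assumptions:

type Tenants struct {
    mu    sync.Mutex
    pools map[string]*tunny.Pool
}

func (t *Tenants) Get(tenant string) *tunny.Pool {
    t.mu.Lock()
    defer t.mu.Unlock()
    if t.pools == nil {
        t.pools = make(map[string]*tunny.Pool)
    }
    p, ok := t.pools[tenant]
    if !ok {
        p = tunny.NewFunc(tenantPoolSize, processPayload)
        t.pools[tenant] = p
    }
    return p
}

Remember to Close these pools on shutdown, and to bound the map if tenant IDs are unbounded.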


Deployment Topologies

A few production topologies:

Topology 1 — Single binary, single pool

The most common. One service, one pool. Scales horizontally by deploying more pods.

Topology 2 — Single binary, multiple pools

The service has several workloads (image resize, hash, validate). Each has its own pool. Independent tuning.

Topology 3 — Sidecar pool

A small "compute" sidecar runs tunny; other services call it. Useful when the compute workload is shared across services.

Topology 4 — Job consumer

The service is a queue consumer, not an HTTP server. Tunny is the in-process compute layer. Scales by adding consumers, not request handlers.

Topology 5 — Lambda / serverless

Tunny inside a serverless function. Cold-start cost. Limited by function runtime memory. Niche use case.

Pick the topology that matches your traffic pattern. Default to Topology 1.


Case Study — Image Service at 100M images/day

A real production deployment processes 100 million images per day. Topology:

  • 50 Kubernetes pods, each with 8 CPU cores and 16 GB memory.
  • Each pod runs a tunny pool of size 8.
  • Each worker holds a bytes.Buffer and an image.Image slot for reuse.
  • Inbound: HTTPS from CDN. Outbound: to object storage.
  • Average call duration: 75 ms (decode + resize + encode).

Throughput per pod: 8 cores * (1000 ms / 75 ms) calls per core per second ≈ 106 RPS. Total throughput: 50 * 106 ≈ 5300 RPS. Daily: 5300 * 86400 ≈ 458M. Plenty of headroom over 100M.

The pool size of 8 is exactly NumCPU. Larger sizes were tested; they hurt p99 latency due to context switching.

Observability: Prometheus exports queue length, call duration histogram, outcome counters. Grafana dashboards show real-time saturation. Alerts on queue length > 50 sustained.

Shutdown: 30 s grace period, 5 s drain timeout, 5 s pool close timeout. Most pods finish their in-flight work in under 5 s.

Lessons:

  • Pool size matters more than replica count for latency.
  • Per-worker buffers eliminate most GC pressure.
  • Queue length is the single most important metric.
  • Soak testing for 24 hours caught a slow goroutine leak we would not have seen otherwise.

Case Study — PDF Renderer With Heavy Memory

PDF generation is memory-hungry: each render allocates 100-500 MB temporary. With unbounded concurrency you OOM.

Setup:

  • Pods with 64 GB memory.
  • Pool size 8 — limits in-flight renders to 8 * 500 MB = 4 GB peak.
  • Pool size came from min(NumCPU, MemoryBudget / PerRenderMem).

Workers do not reuse buffers (PDF renderer is third-party C code; we cannot control its allocations). Memory pressure is the dominant constraint, not CPU.

Lesson: pool size is not always CPU-driven. Memory can be the constraint. Profile peak memory per call and divide your budget.


Case Study — JSON Schema Validator With CPU Cap

A validator service receives JSON documents and validates against schemas. Each validation: 1-10 ms CPU. The schemas are large (megabytes) and pre-loaded.

Setup:

  • Pods with 16 CPU cores.
  • Pool size: 32 (2x NumCPU).
  • Each worker holds a compiled schema validator. The compilation is expensive (~5 s) so worker construction is slow at startup.

Why 2x NumCPU? Validation has some IO (occasional schema lookup) and the 2x buffer keeps CPUs busy during the brief waits.

Startup time: 5 s of schema compilation per worker; with compilation parallelised, total cold start stays around 5 s. Health probe gated on pool readiness.

Lesson: factory time matters. Slow factories mean slow cold starts. If you can pre-compile and ship a binary blob, do so. If not, accept the cold-start latency.


Case Study — Crypto Signing Service

A service signs payloads with a private key. Signing is CPU-bound (a few ms per call) and the key is held in memory (or in an HSM).

Setup:

  • Pool size: NumCPU. Each worker holds a reference to the shared key.
  • If using HSM, each worker holds an HSM session. Pool size matched to HSM session limit.
  • Signing latency: 2-5 ms.

BlockUntilReady enforces the HSM's rate limit, if it has one.

Terminate returns the HSM session.

Lesson: when external resources gate your pool size, the pool size matches the external limit, not local CPU.


Case Study — ML Inference Worker Pool

A small image classification model (50 MB) is loaded per worker. Inference takes 10-50 ms.

Setup:

  • Pool size: NumCPU / 2 (because the model fits in L3 cache shared by pairs of cores; oversubscribing causes cache contention).
  • Model loaded once per worker, in the factory. ~200 ms cold start.
  • Inputs pre-resized; outputs are JSON.

Memory: 50 MB * pool_size. On an 8-core pod, 200 MB for models, well within budget.

Lesson: cache-aware sizing can matter for ML workloads. NumCPU / 2 is not arbitrary — it matches the cache topology.


Incident Response

When a tunny-related incident fires, the playbook:

  1. Check queue length. If high and growing, the pool is saturated.
  2. Check call duration. If high, the work itself is slow — investigate downstream.
  3. Check error rate. If high, work is failing — look at logs.
  4. Check pod count. If low (e.g. due to OOM kills), the service is underprovisioned for traffic.
  5. Check GC pause times. If high, allocation pressure is the problem.

Mitigation toolbox:

  • Scale up pods. Linear capacity. Slow (minutes to spin up).
  • Increase pool size. Free CPU is used. Risk: over-subscription.
  • Drop traffic. Admission control returns 503. Saves the service at the cost of users.
  • Roll back. If the issue started with a deploy, revert.

After mitigation, postmortem. Always.


Postmortems

A tunny incident postmortem template:

  1. Timeline. When did metrics deviate? When did alerts fire? When did mitigation start? When did the incident end?
  2. Impact. How many requests, how many users, how much money?
  3. Root cause. What changed in the system? What was the proximate cause?
  4. Contributing factors. What weakened the system before the change?
  5. Detection. How did we notice? Could we have noticed sooner?
  6. Mitigation. What did we do? What worked?
  7. Action items. Specific, owned, dated.

Tunny-related action items often include:

  • "Add a queue-length alert at 50."
  • "Document the pool size for this workload and require sign-off to change."
  • "Add a load test that simulates the failure pattern."
  • "Add a circuit breaker upstream of the pool."

Repetition matters. Each postmortem makes the next outage less bad.


Configuration Management

Pool size should be configurable, not hardcoded. Suggested config shape:

pool:
  size: 8       # or "auto" for NumCPU
  max_queue: 100
  process_timeout: 5s

Read at startup:

type Config struct {
    Pool struct {
        Size           string        `yaml:"size"`
        MaxQueue       int           `yaml:"max_queue"`
        ProcessTimeout time.Duration `yaml:"process_timeout"`
    } `yaml:"pool"`
}

func PoolSize(cfg Config) int {
    if cfg.Pool.Size == "auto" {
        return runtime.NumCPU()
    }
    n, err := strconv.Atoi(cfg.Pool.Size)
    if err != nil || n < 1 {
        log.Fatalf("bad pool size: %s", cfg.Pool.Size)
    }
    return n
}

Reading from environment variables works the same way, as sketched below. Avoid hardcoded magic numbers.
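
A hedged sketch; POOL_SIZE is an assumed variable name:

func poolSizeFromEnv() int {
    v := os.Getenv("POOL_SIZE")
    if v == "" || v == "auto" {
        return runtime.NumCPU()
    }
    n, err := strconv.Atoi(v)
    if err != nil || n < 1 {
        // Fail fast; never fall back silently on garbage input.
        log.Fatalf("bad POOL_SIZE: %q", v)
    }
    return n
}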

For dynamic environments (Kubernetes), expose these as ConfigMap entries. Restart pods to pick up changes; tunny does not support hot-reloading config (and even if it did, SetSize is the wrong tool for "I changed my mind about sizing").


Runtime Tuning

GOMAXPROCS interacts with pool size:

  • GOMAXPROCS=1 and pool.size=8 → 8 workers but only 1 can run at a time. Pool size is wasted.
  • GOMAXPROCS=8 and pool.size=1 → 1 worker uses 1 core; 7 cores idle.
  • GOMAXPROCS=8 and pool.size=8 → ideal for CPU-bound.

Set GOMAXPROCS from the container's CPU quota (automaxprocs). Set pool size to match. They should agree.

GOGC (default 100) controls GC frequency. For low-latency services, set it higher (200-400) to reduce GC pause frequency at the cost of more memory. For memory-constrained services, leave it at 100.

GOMEMLIMIT (Go 1.19+) caps total memory. Set to ~90% of container limit. Above this, GC runs more aggressively.

These three knobs (GOMAXPROCS, GOGC, GOMEMLIMIT) plus pool size are the main tunables. Tune them together.


Production Best Practices

  1. One pool per workload. Multiple pools per service is fine; one pool serving many workloads is a code smell.
  2. Size pools to constraints, not traffic. CPU, memory, downstream limits — pick the binding constraint.
  3. Always use ProcessCtx in HTTP handlers. Pass r.Context().
  4. Always recover panics in Process. Production services should not crash on bad input.
  5. Always close the pool in shutdown. Graceful shutdown means no work in flight when Close is called.
  6. Export queue length as a metric. It is your primary saturation signal.
  7. Export call duration as a histogram. With buckets matching your SLO.
  8. Alert on saturation, not just errors. Errors are the lagging indicator; saturation is the leading.
  9. Test under load before deploying. A pool that works at 10 RPS may not at 1000.
  10. Profile in production. pprof over HTTPS, rate-limited, behind auth.

Production Anti-Patterns

  1. Pool created inside HTTP handler. Goroutine leak. Performance disaster.
  2. Pool size from os.Getenv without validation. WORKERS=abc should not parse to runtime.NumCPU() silently.
  3. No shutdown handling. SIGTERM kills work in flight, leaves transactions abandoned.
  4. No observability. You cannot debug what you cannot see.
  5. One pool for everything. Heavy work blocks light work. Mixed metrics.
  6. Pool size = host CPU count in containers. Use container quota, not host.
  7. No load tests. Real traffic becomes the first stress test. Don't let it be.
  8. No admission control. Queue grows unboundedly until OOM.
  9. Worker that calls back into the pool. Deadlock under saturation.
  10. No alerting on queue length. Outage rolls in silently.

If any of these are in your service, fix them before tomorrow.


Self-Assessment Checklist

  • My service's pool size is read from config, not hardcoded.
  • My service exports QueueLength as a Prometheus gauge.
  • My service has an alert on QueueLength > pool_size * 5.
  • My service uses ProcessCtx in all HTTP handlers.
  • My service recovers panics in Process.
  • My service has tested graceful shutdown end-to-end.
  • My service has a soak test that runs for at least 1 hour.
  • My service has documented its pool sizing rationale.
  • My service has a postmortem template ready.
  • My service's runtime uses automaxprocs or equivalent.

If you can check all ten boxes, your tunny deployment is genuinely production-ready.


Summary

  • Tunny in production is about operations, not just code.
  • Pool size is the most important decision; tune to constraints.
  • Observability (queue length, call duration, outcome) is essential.
  • Graceful shutdown is a three-phase dance: stop accepting, drain, close.
  • Capacity planning is queueing theory in light disguise.
  • Common workloads (images, PDFs, validation, crypto, ML) have specific patterns.
  • Common pitfalls (pool per handler, no shutdown, no alerts) are well-known.

If you internalise just one thing from this file: observability and capacity planning are not optional. They are the difference between a tunny service that runs for years and one that fires the on-call every month.


Further Reading

  • "Production-Ready Microservices" by Susan Fowler — operational principles.
  • "Site Reliability Engineering" (Google) — the SRE book, free online.
  • The Go runtime documentation — GOMAXPROCS, GOGC, GOMEMLIMIT.
  • automaxprocs and automemlimit by Uber and others.
  • Prometheus, Grafana, OpenTelemetry documentation.

  • Junior, middle, senior tunny — the prerequisites.
  • Specification (specification.md) — quick API lookup.
  • Optimize (optimize.md) — tactical performance tuning.
  • Observability stack elsewhere in this roadmap.
  • Container orchestration elsewhere in this roadmap.

You are now operational on tunny. Build, deploy, observe, iterate.


Extended Walkthrough — Building a Production Image Service End-to-End

Time to put everything together. Below is a complete production-shape image service. We will build it in stages and discuss every operational decision along the way.

Stage 1 — directory structure

imgservice/
├── cmd/
│   └── server/
│       └── main.go
├── internal/
│   ├── config/
│   │   └── config.go
│   ├── pool/
│   │   └── pool.go
│   ├── server/
│   │   └── server.go
│   ├── metrics/
│   │   └── metrics.go
│   └── shutdown/
│       └── shutdown.go
├── go.mod
└── Dockerfile

Five packages: config, pool, server, metrics, shutdown. Each with one responsibility.

Stage 2 — config.go

package config

import (
    "fmt"
    "os"
    "runtime"
    "strconv"
    "time"

    "gopkg.in/yaml.v3"
)

type Config struct {
    HTTP struct {
        Addr            string        `yaml:"addr"`
        ReadTimeout     time.Duration `yaml:"read_timeout"`
        WriteTimeout    time.Duration `yaml:"write_timeout"`
        ShutdownTimeout time.Duration `yaml:"shutdown_timeout"`
    } `yaml:"http"`
    Pool struct {
        Size           string        `yaml:"size"`
        MaxQueue       int           `yaml:"max_queue"`
        ProcessTimeout time.Duration `yaml:"process_timeout"`
    } `yaml:"pool"`
    Metrics struct {
        Addr string `yaml:"addr"`
    } `yaml:"metrics"`
}

func Load(path string) (*Config, error) {
    f, err := os.Open(path)
    if err != nil {
        return nil, err
    }
    defer f.Close()
    var cfg Config
    if err := yaml.NewDecoder(f).Decode(&cfg); err != nil {
        return nil, err
    }
    if cfg.HTTP.Addr == "" {
        cfg.HTTP.Addr = ":8080"
    }
    if cfg.Metrics.Addr == "" {
        cfg.Metrics.Addr = ":9090"
    }
    if cfg.HTTP.ReadTimeout == 0 {
        cfg.HTTP.ReadTimeout = 30 * time.Second
    }
    if cfg.HTTP.WriteTimeout == 0 {
        cfg.HTTP.WriteTimeout = 30 * time.Second
    }
    if cfg.HTTP.ShutdownTimeout == 0 {
        cfg.HTTP.ShutdownTimeout = 30 * time.Second
    }
    if cfg.Pool.ProcessTimeout == 0 {
        cfg.Pool.ProcessTimeout = 10 * time.Second
    }
    if cfg.Pool.MaxQueue == 0 {
        cfg.Pool.MaxQueue = 100
    }
    return &cfg, nil
}

func (c *Config) PoolSize() int {
    if c.Pool.Size == "" || c.Pool.Size == "auto" {
        // GOMAXPROCS(0) reflects the container CPU quota once
        // automaxprocs has run (imported in main.go); NumCPU() would
        // report the host's core count instead.
        return runtime.GOMAXPROCS(0)
    }
    n, err := strconv.Atoi(c.Pool.Size)
    if err != nil {
        panic(fmt.Errorf("bad pool size: %w", err))
    }
    if n < 1 {
        panic("pool size must be >= 1")
    }
    return n
}

Default values for everything. auto resolves to GOMAXPROCS, which automaxprocs derives from the container CPU quota. Validation happens at load time so the service fails fast on bad config.

Stage 3 — pool.go

package pool

import (
    "bytes"
    "context"
    "fmt"
    "image"
    _ "image/jpeg"
    "image/png"
    "io"
    "sync"
    "time"

    "github.com/Jeffail/tunny"
    "golang.org/x/image/draw"
)

type Job struct {
    Data   []byte
    Width  int
    Height int
}

type Result struct {
    Out []byte
    Err error
}

type worker struct {
    outBuf bytes.Buffer

    mu     sync.Mutex
    cancel context.CancelFunc
}

func (w *worker) Process(p any) (result any) {
    job := p.(Job)
    ctx, cancel := context.WithCancel(context.Background())
    w.mu.Lock()
    w.cancel = cancel
    w.mu.Unlock()
    defer func() {
        w.mu.Lock()
        w.cancel = nil
        w.mu.Unlock()
        cancel()
    }()
    defer func() {
        if r := recover(); r != nil {
            // The named return lets the recover path hand back an
            // error instead of crashing the worker goroutine.
            result = Result{Err: fmt.Errorf("panic in worker: %v", r)}
        }
    }()
    return w.process(ctx, job)
}

func (w *worker) process(ctx context.Context, job Job) Result {
    type imgRes struct {
        img image.Image
        err error
    }
    ch := make(chan imgRes, 1)
    go func() {
        // Decode from the job's own data rather than a shared worker
        // buffer: if the call is cancelled and this goroutine is
        // abandoned, it cannot race with the next job's input.
        img, _, err := image.Decode(bytes.NewReader(job.Data))
        ch <- imgRes{img, err}
    }()

    var src image.Image
    select {
    case <-ctx.Done():
        return Result{Err: ctx.Err()}
    case r := <-ch:
        if r.err != nil {
            return Result{Err: fmt.Errorf("decode: %w", r.err)}
        }
        src = r.img
    }

    dst := image.NewRGBA(image.Rect(0, 0, job.Width, job.Height))
    draw.CatmullRom.Scale(dst, dst.Bounds(), src, src.Bounds(), draw.Over, nil)

    w.outBuf.Reset()
    if err := png.Encode(&w.outBuf, dst); err != nil {
        return Result{Err: fmt.Errorf("encode: %w", err)}
    }

    // Copy to avoid sharing internal buffer with caller.
    out := make([]byte, w.outBuf.Len())
    copy(out, w.outBuf.Bytes())
    return Result{Out: out}
}

func (w *worker) BlockUntilReady() {}

func (w *worker) Interrupt() {
    w.mu.Lock()
    if w.cancel != nil {
        w.cancel()
    }
    w.mu.Unlock()
}

func (w *worker) Terminate() {}

type Pool struct {
    inner    *tunny.Pool
    maxQueue int
    timeout  time.Duration
}

func New(size, maxQueue int, timeout time.Duration) *Pool {
    return &Pool{
        inner: tunny.New(size, func() tunny.Worker {
            return &worker{}
        }),
        maxQueue: maxQueue,
        timeout:  timeout,
    }
}

var ErrBusy = fmt.Errorf("pool busy")

func (p *Pool) Resize(ctx context.Context, data []byte, w, h int) ([]byte, error) {
    if int(p.inner.QueueLength()) > p.maxQueue {
        return nil, ErrBusy
    }
    ctx, cancel := context.WithTimeout(ctx, p.timeout)
    defer cancel()
    r, err := p.inner.ProcessCtx(ctx, Job{Data: data, Width: w, Height: h})
    if err != nil {
        return nil, err
    }
    res := r.(Result)
    return res.Out, res.Err
}

func (p *Pool) QueueLength() int64 { return p.inner.QueueLength() }
func (p *Pool) Size() int          { return p.inner.GetSize() }

func (p *Pool) Close() { p.inner.Close() }

// ReadAllLimited reads up to maxBytes from r.
func ReadAllLimited(r io.Reader, maxBytes int64) ([]byte, error) {
    return io.ReadAll(io.LimitReader(r, maxBytes))
}

Lots happening:

  • Per-worker outBuf for encode reuse; decode reads from the job's own data, so an abandoned decode goroutine cannot race with the next call's input.
  • Interrupt cancels a per-call context.
  • A recover in Process (via the named return) turns panics into Result errors instead of crashing the worker.
  • The decode runs in a sub-goroutine because Go's image decoders are not context-aware. The select lets us bail out promptly.
  • ErrBusy is returned when the queue is too long.
  • Per-call timeout via context.WithTimeout.
  • Output is copied to avoid sharing the internal buffer.

Stage 4 — metrics.go

package metrics

import (
    "github.com/prometheus/client_golang/prometheus"
    "github.com/prometheus/client_golang/prometheus/promauto"
)

var (
    PoolSize = promauto.NewGauge(prometheus.GaugeOpts{
        Name: "imgsvc_pool_size",
        Help: "Configured pool size",
    })
    PoolQueue = promauto.NewGauge(prometheus.GaugeOpts{
        Name: "imgsvc_pool_queue_length",
        Help: "Current queue length",
    })
    ProcessDuration = promauto.NewHistogramVec(prometheus.HistogramOpts{
        Name:    "imgsvc_process_duration_seconds",
        Help:    "Process call duration",
        Buckets: []float64{0.001, 0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1, 2.5, 5, 10},
    }, []string{"outcome"})
    RequestCounter = promauto.NewCounterVec(prometheus.CounterOpts{
        Name: "imgsvc_requests_total",
        Help: "Total requests",
    }, []string{"outcome"})
)

Three metric families: pool size, queue length, request distribution. Each is exposed by promauto (which auto-registers).

Stage 5 — server.go

package server

import (
    "context"
    "errors"
    "io"
    "log"
    "net/http"
    "strconv"
    "time"

    "imgservice/internal/metrics"
    "imgservice/internal/pool"

    "github.com/Jeffail/tunny"
    "github.com/prometheus/client_golang/prometheus/promhttp"
)

const maxUpload = 10 * 1024 * 1024 // 10 MB

type Server struct {
    Pool *pool.Pool
}

func (s *Server) Routes() http.Handler {
    mux := http.NewServeMux()
    mux.HandleFunc("/resize", s.handleResize)
    mux.HandleFunc("/healthz", s.handleHealthz)
    return mux
}

func (s *Server) handleResize(w http.ResponseWriter, r *http.Request) {
    start := time.Now()
    outcome := "ok"
    defer func() {
        metrics.ProcessDuration.WithLabelValues(outcome).Observe(time.Since(start).Seconds())
        metrics.RequestCounter.WithLabelValues(outcome).Inc()
    }()

    width, _ := strconv.Atoi(r.URL.Query().Get("w"))
    height, _ := strconv.Atoi(r.URL.Query().Get("h"))
    if width <= 0 || height <= 0 || width > 4096 || height > 4096 {
        outcome = "bad_request"
        http.Error(w, "invalid size", 400)
        return
    }

    r.Body = http.MaxBytesReader(w, r.Body, maxUpload)
    data, err := io.ReadAll(r.Body)
    if err != nil {
        outcome = "bad_body"
        http.Error(w, err.Error(), 400)
        return
    }

    out, err := s.Pool.Resize(r.Context(), data, width, height)
    if err != nil {
        switch {
        case errors.Is(err, pool.ErrBusy):
            outcome = "busy"
            w.Header().Set("Retry-After", "5")
            http.Error(w, "busy", 503)
        case errors.Is(err, context.DeadlineExceeded):
            outcome = "timeout"
            http.Error(w, "timeout", 504)
        case errors.Is(err, context.Canceled):
            outcome = "cancelled"
            // client gave up; no response needed
        case errors.Is(err, tunny.ErrPoolNotRunning):
            outcome = "shutting_down"
            http.Error(w, "shutting down", 503)
        default:
            outcome = "error"
            log.Printf("resize error: %v", err)
            http.Error(w, "server error", 500)
        }
        return
    }

    w.Header().Set("Content-Type", "image/png")
    w.Header().Set("Content-Length", strconv.Itoa(len(out)))
    if _, err := w.Write(out); err != nil {
        outcome = "write_error"
        return
    }
}

func (s *Server) handleHealthz(w http.ResponseWriter, r *http.Request) {
    if int(s.Pool.QueueLength()) > s.Pool.Size()*10 {
        http.Error(w, "overloaded", 503)
        return
    }
    w.Write([]byte("ok"))
}

func MetricsHandler() http.Handler { return promhttp.Handler() }

The handler:

  • Times the request from start to end.
  • Labels by outcome for metrics.
  • Limits body size with http.MaxBytesReader.
  • Maps internal errors to HTTP status codes.
  • Health probe checks queue saturation.

Stage 6 — shutdown.go

package shutdown

import (
    "context"
    "log"
    "net/http"
    "os"
    "os/signal"
    "syscall"
    "time"

    "imgservice/internal/pool"
)

func Wait(srv *http.Server, metricsSrv *http.Server, p *pool.Pool, grace time.Duration) {
    sig := make(chan os.Signal, 1)
    signal.Notify(sig, syscall.SIGINT, syscall.SIGTERM)
    <-sig
    log.Println("shutdown initiated")

    ctx, cancel := context.WithTimeout(context.Background(), grace)
    defer cancel()

    // 1. Stop accepting traffic.
    if err := srv.Shutdown(ctx); err != nil {
        log.Printf("http shutdown: %v", err)
    }
    if err := metricsSrv.Shutdown(ctx); err != nil {
        log.Printf("metrics shutdown: %v", err)
    }

    // 2. Drain in-flight pool work.
    for p.QueueLength() > 0 {
        select {
        case <-ctx.Done():
            log.Println("drain timeout, forcing close")
            p.Close()
            return
        case <-time.After(100 * time.Millisecond):
        }
    }

    // 3. Close the pool.
    p.Close()
    log.Println("shutdown complete")
}

Three phases as described earlier, with a hard deadline.

Stage 7 — main.go

package main

import (
    "flag"
    "log"
    "net/http"
    "time"

    "imgservice/internal/config"
    "imgservice/internal/metrics"
    "imgservice/internal/pool"
    "imgservice/internal/server"
    "imgservice/internal/shutdown"

    _ "go.uber.org/automaxprocs"
)

func main() {
    cfgPath := flag.String("config", "/etc/imgservice/config.yaml", "config file")
    flag.Parse()

    cfg, err := config.Load(*cfgPath)
    if err != nil {
        log.Fatalf("config: %v", err)
    }

    p := pool.New(cfg.PoolSize(), cfg.Pool.MaxQueue, cfg.Pool.ProcessTimeout)
    metrics.PoolSize.Set(float64(p.Size()))

    // Background ticker for queue length.
    go func() {
        t := time.NewTicker(5 * time.Second)
        defer t.Stop()
        for range t.C {
            metrics.PoolQueue.Set(float64(p.QueueLength()))
        }
    }()

    srv := &http.Server{
        Addr:         cfg.HTTP.Addr,
        Handler:      (&server.Server{Pool: p}).Routes(),
        ReadTimeout:  cfg.HTTP.ReadTimeout,
        WriteTimeout: cfg.HTTP.WriteTimeout,
    }
    metricsSrv := &http.Server{
        Addr:    cfg.Metrics.Addr,
        Handler: server.MetricsHandler(),
    }

    go func() {
        log.Printf("http listening on %s", cfg.HTTP.Addr)
        if err := srv.ListenAndServe(); err != nil && err != http.ErrServerClosed {
            log.Fatalf("http: %v", err)
        }
    }()
    go func() {
        log.Printf("metrics listening on %s", cfg.Metrics.Addr)
        if err := metricsSrv.ListenAndServe(); err != nil && err != http.ErrServerClosed {
            log.Fatalf("metrics: %v", err)
        }
    }()

    shutdown.Wait(srv, metricsSrv, p, cfg.HTTP.ShutdownTimeout)
}

Wiring: load config, create pool, start servers, wait for shutdown. The whole main.go is 60 lines.

Stage 8 — Dockerfile

FROM golang:1.22-alpine AS build
WORKDIR /src
COPY . .
RUN CGO_ENABLED=0 go build -o /out/server ./cmd/server

FROM gcr.io/distroless/static
COPY --from=build /out/server /server
EXPOSE 8080 9090
ENTRYPOINT ["/server", "--config=/etc/imgservice/config.yaml"]

Multi-stage build. Distroless final image (~20 MB).

Stage 9 — Kubernetes manifest

apiVersion: apps/v1
kind: Deployment
metadata:
  name: imgservice
spec:
  replicas: 4
  selector:
    matchLabels: {app: imgservice}
  template:
    metadata:
      labels: {app: imgservice}
    spec:
      terminationGracePeriodSeconds: 60
      containers:
      - name: server
        image: imgservice:latest
        resources:
          requests: {cpu: "4", memory: "2Gi"}
          limits: {cpu: "4", memory: "4Gi"}
        ports:
        - containerPort: 8080
        - containerPort: 9090
        livenessProbe:
          httpGet: {path: /healthz, port: 8080}
          periodSeconds: 10
        readinessProbe:
          httpGet: {path: /healthz, port: 8080}
          periodSeconds: 5
        volumeMounts:
        - name: config
          mountPath: /etc/imgservice
      volumes:
      - name: config
        configMap:
          name: imgservice-config
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: imgservice-config
data:
  config.yaml: |
    http:
      addr: ":8080"
    pool:
      size: "auto"
      max_queue: 100
      process_timeout: 10s
    metrics:
      addr: ":9090"

Notes:

  • terminationGracePeriodSeconds: 60 allows the service to complete draining.
  • CPU limit is the integer 4 — automaxprocs will set GOMAXPROCS=4.
  • The pool sizes itself based on GOMAXPROCS.
  • Health probes hit /healthz, which reports overload via 503.

Stage 10 — Prometheus alerts

groups:
- name: imgservice
  rules:
  - alert: ImgServicePoolSaturated
    expr: imgsvc_pool_queue_length > imgsvc_pool_size * 5
    for: 1m
    labels: {severity: warning}
    annotations:
      summary: "Pool queue too long on {{ $labels.instance }}"

  - alert: ImgServiceHighErrorRate
    expr: |
      sum(rate(imgsvc_requests_total{outcome="error"}[5m])) /
      sum(rate(imgsvc_requests_total[5m])) > 0.01
    for: 5m
    labels: {severity: critical}
    annotations:
      summary: "Error rate above 1% on {{ $labels.instance }}"

  - alert: ImgServiceP99HighLatency
    expr: |
      histogram_quantile(0.99,
        sum(rate(imgsvc_process_duration_seconds_bucket[5m])) by (le)
      ) > 1.0
    for: 5m
    labels: {severity: warning}
    annotations:
      summary: "p99 latency above 1s"

Three alerts covering saturation, errors, and latency. The cornerstone of production observability.

Stage 11 — Grafana dashboard sketch

Panels:

  1. Request rate (per outcome): sum(rate(imgsvc_requests_total[1m])) by (outcome).
  2. p50/p95/p99 latency: histogram quantiles.
  3. Queue length: imgsvc_pool_queue_length.
  4. Pool size: imgsvc_pool_size.
  5. Queue saturation %: imgsvc_pool_queue_length / imgsvc_pool_size.
  6. Error rate %: derived from outcome counter.

Hang the dashboard in the team's war room. Glance at it daily.

Stage 12 — Load test

# Linear ramp.
hey -c 10 -z 1m -m POST -D image.jpg "http://localhost:8080/resize?w=256&h=256"
hey -c 50 -z 1m -m POST -D image.jpg "http://localhost:8080/resize?w=256&h=256"
hey -c 100 -z 1m -m POST -D image.jpg "http://localhost:8080/resize?w=256&h=256"

# Burst.
hey -c 200 -n 5000 -m POST -D image.jpg "http://localhost:8080/resize?w=256&h=256"

# Sustained.
hey -c 50 -z 10m -m POST -D image.jpg "http://localhost:8080/resize?w=256&h=256"

Watch metrics during each. Tune pool size and max queue based on the data.


This is what a production tunny service looks like end-to-end. About 400 lines of Go, plus configuration. The pool is one piece of the puzzle, but with the right scaffolding, it works reliably.


Operating This Service Day to Day

A week in the life of imgservice:

Monday — deploy

Push the new version. Canary on one pod, observe metrics for 30 minutes, then roll the rest. Watch for:

  • p99 latency change.
  • Error rate change.
  • Memory or goroutine count drift.

Tuesday — alerts fire

ImgServicePoolSaturated fires. Check Grafana. Queue is at 80 for 5 minutes. CPU is at 100%. Decision: scale to 6 pods.

Make the change. Queue drops. p99 drops. Alert clears.

Postmortem: traffic up 50% from previous week. Update capacity plan.

Wednesday — quiet day

Routine. Glance at dashboards over coffee. Nothing burning.

Thursday — a worker panics

A new file format arrives that crashes the decoder. The pool's recover catches it. Outcome counter shows a spike in error. Logs show the panic stack.

File a bug. Add a unit test for the input. Roll out a fix.

Friday — shutdown test

Manually trigger a pod restart during peak load. Verify:

  • In-flight requests complete.
  • Queue drains.
  • No requests are abandoned mid-response.

Pass. Document the test.

Saturday — soak test catches an issue

A scheduled soak test on staging shows memory creeping up over 12 hours. Investigation finds a small leak in the encoder — bytes.Buffer growing without bound.

Fix: cap the buffer size, re-allocate if exceeded.

Sunday — quiet

The system works.

This rhythm is what good operations looks like. Most days are boring. The exceptions are caught early, investigated thoroughly, and prevented from recurring.


Advanced Operational Patterns

A few patterns that come up in mature production deployments.

Pattern: Two-level pools

The outer pool handles HTTP routing. The inner pool handles compute. The outer is sized for connection limits; the inner for CPU.

type Service struct {
    routing *tunny.Pool
    compute *tunny.Pool
}

Useful when routing has light pre-processing (auth, validation) that you do not want to mix with heavy compute.

Pattern: Warm-up

If your service has slow worker startup (model loading), do not accept traffic until warm:

func (s *Service) Ready() bool {
    return s.pool.Size() == s.targetSize
}

Readiness probe returns 503 until ready. Kubernetes does not route traffic to unready pods.

Pattern: Slow consumer detection

Track per-call latency. If a consumer is consistently slow:

  • Log it.
  • Consider tighter timeouts for that caller.
  • Consider rate limiting that caller.

Tunny does not give you per-caller stats; you must instrument them in your handler.
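
A per-caller instrumentation sketch; callerDuration is an assumed Prometheus HistogramVec labelled by caller identity:

func (s *Service) do(ctx context.Context, caller string, payload any) (any, error) {
    start := time.Now()
    r, err := s.pool.ProcessCtx(ctx, payload)
    // Slice this histogram by caller to spot consistently slow consumers.
    callerDuration.WithLabelValues(caller).Observe(time.Since(start).Seconds())
    return r, err
}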

Pattern: Adaptive timeout

If most calls finish in 50 ms, set the timeout to 500 ms (10x). If load grows, callers wait longer, and the timeout still catches genuine stalls.

For automated adaptation, track the p99 over the last hour and set the timeout to 2x or 3x p99.

Pattern: Bulkheading

Separate pools for different priorities or tenants. A noisy tenant cannot affect others. We covered this earlier; restating because it is critical for SaaS.

Pattern: Hedged requests

For tail-latency sensitive workloads, send a backup request after the median latency:

results := make(chan any, 2)
go func() { results <- pool.Process(x) }()
time.AfterFunc(median, func() {
    // AfterFunc already runs on its own goroutine.
    results <- pool.Process(x)
})
r := <-results

The first result wins. The second is wasted work. Useful when occasional slowness is the enemy.

Caveat: doubles the work. Use selectively.

Pattern: Circuit breaker around the pool

If the pool starts failing consistently, open a circuit breaker to shed traffic upstream. Tunny does not have this built in; use gobreaker or similar.
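
A sketch wrapping ProcessCtx in sony/gobreaker; the settings shown are illustrative defaults:

var breaker = gobreaker.NewCircuitBreaker(gobreaker.Settings{Name: "pool"})

func doWithBreaker(ctx context.Context, p *tunny.Pool, payload any) (any, error) {
    // When the breaker is open, calls fail immediately instead of queueing.
    return breaker.Execute(func() (any, error) {
        return p.ProcessCtx(ctx, payload)
    })
}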

Pattern: Distributed pool via NATS / Kafka

Multiple service instances cooperate via a message queue. Each instance has its own local tunny pool. The queue provides cross-instance load balancing.

This is the "consume from queue" pattern at scale.


Debugging in Production

When things go wrong, your tools:

pprof

curl http://service/debug/pprof/profile?seconds=30 > cpu.pprof
go tool pprof cpu.pprof

Look at top -cum and web to see where CPU goes.

curl http://service/debug/pprof/heap > heap.pprof
go tool pprof heap.pprof

Look for big allocators.

curl http://service/debug/pprof/goroutine?debug=2 > goroutines.txt

Inspect stacks. Look for many goroutines in unusual places.

Logs

Structured logs with the request ID, the worker ID (if you assigned one), the outcome.

A typical log line:

{"ts": "...", "level": "info", "msg": "request done", "id": "req-123", "worker": 4, "duration_ms": 87, "outcome": "ok"}

Index in your log aggregator. Slice by worker, by outcome, by duration.

Tracing

If you use OpenTelemetry, propagate the trace context through the payload:

type Job struct {
    Carrier propagation.MapCarrier
    Data    []byte
}

Inside the worker:

ctx := otel.GetTextMapPropagator().Extract(context.Background(), job.Carrier)
ctx, span := tracer.Start(ctx, "worker.Process")
defer span.End()

Now traces span from inbound HTTP all the way to worker code.

dlv in production?

In emergencies, you can attach dlv to a running process. This is invasive — the process is paused. Use only when other tools have failed.

dlv attach <pid>

Inspect goroutines, variables, set breakpoints. Detach gracefully.


Production Patterns From the Field

A miscellany.

Field pattern: separating decode and encode

For image services, decoding is fast and encoding is slow (PNG especially). Two pools:

  • Decode pool: NumCPU.
  • Encode pool: NumCPU.

Total in-flight: 2 * NumCPU jobs, which oversubscribes the cores, but the pipeline stays fuller: while one image is encoding, the next can be decoding on another core.

This is the "split the pipeline" pattern. Use when stages have very different costs.

Field pattern: pre-resize hint

If you know the output is going to be small, pass that hint to the decoder. Some libraries (libjpeg-turbo) can decode at fractional resolution, skipping work.

This is a per-call optimisation. The pool doesn't change. Throughput goes up because each call is faster.

Field pattern: skip identical work

Cache results in front of the pool. If the same input comes back, return the cached result without invoking the pool.

key := hashInput(payload)
if v, ok := cache.Get(key); ok {
    return v
}
v := pool.Process(payload)
cache.Set(key, v)
return v

For idempotent compute, caching is free throughput.

Field pattern: priority via separate pools

Two pools: "fast lane" and "slow lane". Premium customers go to fast lane. Cost: capacity reserved for fast lane is unused when no premium traffic. Benefit: SLA isolation.

Field pattern: tenant-aware sizing

Sizing per tenant is rare. Sizing for the aggregate of tenants is common. The total pool size accommodates all tenants combined.

If one tenant has very different needs, give it its own pool.

Field pattern: queue-based admission

In a high-throughput service, admission decisions are made at the queue layer (Kafka topic, NATS subject) rather than in the pool. The pool only sees admitted work. Backpressure pushes back to producers.


Capacity Models

A few common capacity models for tunny services.

Model 1 — fixed capacity

Pool size is fixed. Replicas are fixed. Service handles whatever traffic arrives, up to capacity. Beyond capacity, it returns 503.

Pros: predictable. Cons: cannot handle bursts gracefully.

Use for: services with steady traffic and tight SLAs.

Model 2 — horizontally autoscaled

Pool size per pod is fixed. Pod count auto-scales based on CPU or queue length metric.

Pros: handles traffic spikes. Cons: slow to scale up (minutes).

Use for: most production services.
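
A Model 2 sketch as a Kubernetes HPA; the deployment name and thresholds are assumptions:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: imgservice
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: imgservice
  minReplicas: 4
  maxReplicas: 20
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70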

Model 3 — vertically scaled

Pod count is fixed. Pool size is fixed per pod, sized for peak. Idle capacity at off-peak.

Pros: simple. Cons: wasteful.

Use for: small services or constant workloads.

Model 4 — mixed

Pool size per pod is fixed. Pod count is small in normal times, expands during traffic bursts via HPA.

This is the standard Kubernetes pattern.


More Advanced Topics

Custom dispatchers

Tunny's dispatcher is "first available worker wins". For some workloads — e.g. cache affinity — you want "send to the same worker that handled the previous request from this user". Tunny does not support this.

Workarounds:

  1. Many small pools (one per user). Goes back to per-tenant pools.
  2. A custom dispatcher in front of many size-1 tunny pools.
  3. Roll your own pool.

The third is uncommon but sometimes warranted.

Resource cleanup on Terminate

If your worker holds external resources (DB connections, network sockets, file handles), Terminate is where you release them. Production-grade workers:

func (w *worker) Terminate() {
    if w.conn != nil {
        _ = w.conn.Close()
        w.conn = nil
    }
    if w.tmpFile != nil {
        _ = w.tmpFile.Close()
        _ = os.Remove(w.tmpFile.Name())
        w.tmpFile = nil
    }
}

Always idempotent.

Avoiding partial state on Interrupt

If your Process modifies external state (write to a file, send a network request), and Interrupt cancels mid-way, you may have partial state. Two strategies:

  1. Make Process transactional — accumulate output, then commit at the end. Interruption discards the accumulator.
  2. Make external operations idempotent — running again gives the same result. A caller's retry then has no consequence.

Both are good practices for any concurrent code, not just tunny.

Multi-region deployments

If your service spans regions, each region has its own pools. There is no cross-region pool. Replication and consistency are not tunny concerns; they live above.

Blue-green deployments

Two environments, traffic switched at the load balancer. Each pool exists in only one environment at a time. Shutdown of the green pool happens after the switch.

Standard practice. Tunny does not change anything.


Common Production Issues and Their Resolutions

A field guide to common issues.

Issue: queue grows during a spike, never drains

Cause: traffic is sustained above capacity. Mitigation: scale up.

Long-term fix: increase pool size and/or replica count.

Issue: p99 latency rises over the day

Cause: usually GC pressure from accumulating heap objects. Investigate with heap pprof. Tune GOGC.

Issue: memory grows over time

Cause: a leak somewhere. Could be in workers, could be in caching, could be in HTTP handlers. Investigate with heap pprof.

Issue: file descriptors increase

Cause: handlers or workers not closing connections. Add defer Close() everywhere.

Issue: pool size mismatch with cores

Cause: runtime.NumCPU() returning host CPU count in container. Fix: use automaxprocs.

Issue: worker stuck in Process

Cause: a downstream call has no timeout. Add context.WithTimeout to all external calls.

Issue: pool panic on Close

Cause: callers still active when Close runs. Use WaitGroup or sequenced shutdown.

Issue: a single bad payload crashes the service

Cause: missing recover in Process. Add it.

Issue: graceful shutdown takes too long

Cause: Process calls that ignore cancellation. Implement cancellation properly in workers.

Issue: traffic spike causes OOM

Cause: per-call allocations not bounded. Limit body size, reuse buffers, set GOMEMLIMIT.
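
Several of these fixes share one sequenced-shutdown shape. A minimal sketch, with httpServer and handlerWG as illustrative names:

// 1. stop accepting new work
_ = httpServer.Shutdown(ctx)
// 2. wait for in-flight handlers that still submit to the pool
handlerWG.Wait()
// 3. only now close the pool
pool.Close()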


Postmortem Examples

Two redacted examples from real incidents.

Postmortem A — runaway queue

Summary: Pool queue grew to 5000 over 20 minutes, p99 latency hit 30 seconds, ~10% of requests timed out.

Root cause: A new request type was 10x slower than the average, due to a poorly chosen compression level. The pool was not sized for the new shape of traffic.

Detection: Saturation alert fired at queue=80. Engineer responded within 5 minutes.

Mitigation: Scaled pods 2x. Queue drained over 10 minutes.

Action items:

  • Add canary deployments for new request types.
  • Add per-request-type latency dashboards.
  • Add a heat map of latency by request type.
  • Document expected latency for each request type.

Postmortem B — silent failures

Summary: For 4 hours, ~2% of requests returned 200 OK with corrupted images. Discovered by customer support tickets.

Root cause: A new compression library had a bug where occasionally a 0-byte buffer was returned. The worker did not check for it.

Detection: No alerts fired. Internal validation was missing.

Mitigation: Reverted the new library.

Action items:

  • Add output validation to every Process call.
  • Add an end-to-end test that fetches and re-decodes the output.
  • Add a metric for output byte sizes.
  • Set up customer-impact tracking via support tickets.

The second postmortem is more painful. Silent failures take longer to detect.


Decision Records

For meaningful production decisions, write them down. Example:

ADR-001 — Pool size for image service

Status: Accepted.

Context: We need to size the tunny pool for imgservice. Inputs are JPEG/PNG, typical processing time is 75 ms.

Decision: Pool size = runtime.GOMAXPROCS(0) (which is set by automaxprocs from the container's CPU limit).

Rationale:

  • CPU-bound work scales linearly until cores are saturated.
  • More workers than cores causes context-switch overhead.
  • The container CPU limit gives a reliable count.

Consequences:

  • We must ensure automaxprocs is imported.
  • Container CPU limit changes require a pod restart for the tunny pool to resize (no hot reload).

Alternatives considered:

  • Fixed pool size 8: too rigid for varying container sizes.
  • Pool size 2x cores: causes context-switch overhead under load.

Lightweight ADRs like this make future engineers grateful.


On-Call Cheat Sheet

For the engineer who got paged:

  1. Acknowledge the page.
  2. Open the Grafana dashboard.
  3. Look at queue length, latency p99, error rate.
  4. If queue is growing: scale up.
  5. If latency is up: check downstream services.
  6. If errors are up: check logs for new error types.
  7. If memory or goroutines are up: check pprof for the leak.
  8. If nothing obvious, capture pprof and trace, then call a teammate.
  9. After mitigation, write a postmortem.

Print this. Tape it to your monitor.


More Case Studies

A few more capsule case studies from production.

Case study — global CDN re-encoder

A CDN re-encodes images on-the-fly to optimal formats for each browser. 50 PoPs around the world, each with 10-20 pods, each pod with a tunny pool of 8.

Aggregate throughput: ~50,000 RPS sustained.

Key decisions:

  • Per-PoP pools (no cross-PoP coordination).
  • Health probes gate traffic to ready pools.
  • Aggressive readiness probes to catch slow starts.

Case study — video frame extraction

A service extracts representative frames from videos. Each call processes one video.

Pool size: 4 per pod. Each worker holds an ffmpeg invocation pipe.

Memory: 200 MB peak per worker. Pool size limited by memory, not CPU.

Case study — invoice PDF generation

A service renders invoice PDFs from templates.

Pool size: 2. Each worker holds a Chrome headless instance. PDF generation: 500 ms - 2 s.

Pool size limited by per-worker memory (Chrome is heavy).

Case study — text-to-speech synthesis

A service converts text to spoken audio using a local ML model.

Pool size: NumCPU / 2 (cache-aware sizing). Each worker holds a 200 MB model.

Sample rate: 22 kHz output, 200 ms latency per second of speech generated.

Case study — financial trade simulation

A service runs Monte Carlo simulations on financial portfolios.

Pool size: NumCPU. Each worker holds pre-computed lookup tables. Simulations run for 100 ms - 2 s.

ProcessTimed with a strict timeout bounds the tail latency.


Final Operational Wisdom

Three principles to carry forward.

1. Measure before you optimise

It is tempting to tune pool size, GC parameters, and timeouts based on intuition. Resist. Measure first. The bottleneck is rarely where you think it is.

2. Boring is good

A tunny service that runs for a year without surprises is more valuable than one that handles 10x more traffic but pages once a week. Default to boring choices.

3. Document so future-you can take over from past-you

In 18 months you will not remember why you set the pool size to 12. Write it down. Future-you will appreciate it.


Self-Assessment Checklist (Repeat)

  • Pool size configurable, not hardcoded.
  • Queue length metric exported.
  • Saturation alert configured.
  • ProcessCtx used in HTTP handlers.
  • Panic recovery in workers.
  • Graceful shutdown tested.
  • Soak test passing for >1 hour.
  • Pool sizing rationale documented.
  • Postmortem template ready.
  • automaxprocs or equivalent imported.

Ten boxes. All ten. No exceptions.


Final Words

The professional level is not about knowing more code — it is about operating tunny well over time, across deployments, with a team, and with on-call humans depending on the service.

You have the technical knowledge from the senior level. You have the operational patterns from this level. Now you have everything needed to run tunny in production with confidence.

The remaining files in this series (specification, interview, tasks, find-bug, optimize) are for sharpening specific skills. Pick what you need.

Build well. Operate well. Sleep well.


Extended Section — Detailed Observability Recipes

This section expands what was covered above with concrete code, suitable for copy-paste into a real service.

Recipe — Prometheus integration with promauto

package metrics

import (
    "github.com/prometheus/client_golang/prometheus"
    "github.com/prometheus/client_golang/prometheus/promauto"
)

var (
    poolSize = promauto.NewGaugeVec(prometheus.GaugeOpts{
        Name: "tunny_pool_size",
        Help: "Configured pool size by pool name",
    }, []string{"pool"})

    poolQueue = promauto.NewGaugeVec(prometheus.GaugeOpts{
        Name: "tunny_pool_queue_length",
        Help: "Current queue length by pool name",
    }, []string{"pool"})

    processDuration = promauto.NewHistogramVec(prometheus.HistogramOpts{
        Name: "tunny_process_duration_seconds",
        Help: "Time taken inside Process by pool and outcome",
        Buckets: []float64{
            0.001, 0.005, 0.01, 0.025, 0.05,
            0.1, 0.25, 0.5, 1.0, 2.5,
            5.0, 10.0, 25.0, 60.0,
        },
    }, []string{"pool", "outcome"})

    blockDuration = promauto.NewHistogramVec(prometheus.HistogramOpts{
        Name: "tunny_blockuntilready_duration_seconds",
        Help: "Time spent in BlockUntilReady by pool",
        Buckets: prometheus.ExponentialBuckets(0.0001, 2, 16),
    }, []string{"pool"})

    interrupts = promauto.NewCounterVec(prometheus.CounterOpts{
        Name: "tunny_interrupts_total",
        Help: "Number of Interrupt calls by pool",
    }, []string{"pool"})

    panics = promauto.NewCounterVec(prometheus.CounterOpts{
        Name: "tunny_panics_total",
        Help: "Number of panics recovered in Process by pool",
    }, []string{"pool"})
)

Six metrics. Cover lifecycle and behaviour.

Recipe — Wrapping a Worker for metrics

type observed struct {
    inner tunny.Worker
    name  string
}

func newObserved(name string, inner tunny.Worker) *observed {
    return &observed{inner: inner, name: name}
}

func (w *observed) Process(p any) (out any) {
    start := time.Now()
    outcome := "ok"
    defer func() {
        if r := recover(); r != nil {
            panics.WithLabelValues(w.name).Inc()
            outcome = "panic"
            out = fmt.Errorf("panic: %v", r)
        }
        // record the duration histogram declared above, labelled by outcome
        processDuration.WithLabelValues(w.name, outcome).Observe(time.Since(start).Seconds())
    }()
    out = w.inner.Process(p)
    if _, isErr := out.(error); isErr {
        outcome = "error"
    }
    return out
}

func (w *observed) BlockUntilReady() {
    start := time.Now()
    w.inner.BlockUntilReady()
    blockDuration.WithLabelValues(w.name).Observe(time.Since(start).Seconds())
}

func (w *observed) Interrupt() {
    interrupts.WithLabelValues(w.name).Inc()
    w.inner.Interrupt()
}

func (w *observed) Terminate() {
    w.inner.Terminate()
}

The wrapper instruments without modifying the inner worker. Use it like:

pool := tunny.New(n, func() tunny.Worker {
    return newObserved("imgsvc", &realWorker{})
})

Recipe — Background queue gauge

func StartQueueMonitor(name string, p *tunny.Pool, interval time.Duration) func() {
    stop := make(chan struct{})
    go func() {
        t := time.NewTicker(interval)
        defer t.Stop()
        for {
            select {
            case <-stop:
                return
            case <-t.C:
                poolQueue.WithLabelValues(name).Set(float64(p.QueueLength()))
                poolSize.WithLabelValues(name).Set(float64(p.GetSize()))
            }
        }
    }()
    return func() { close(stop) }
}

Call once at startup:

stopMon := metrics.StartQueueMonitor("imgsvc", pool, 5*time.Second)
defer stopMon()

Recipe — Tracing with OpenTelemetry

import "go.opentelemetry.io/otel"

type traced struct {
    inner tunny.Worker
    name  string
}

func (w *traced) Process(p any) any {
    job := p.(struct {
        Ctx     context.Context
        Payload any
    })
    _, span := otel.Tracer("tunny").Start(job.Ctx, "worker.Process")
    defer span.End()
    return w.inner.Process(job.Payload)
}

func (w *traced) BlockUntilReady() { w.inner.BlockUntilReady() }
func (w *traced) Interrupt()       { w.inner.Interrupt() }
func (w *traced) Terminate()       { w.inner.Terminate() }

The payload must carry the context, because the Worker interface does not expose one directly. This is the common workaround.
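
Submission side, matching the tracedJob type above:

job := tracedJob{Ctx: r.Context(), Payload: payload}
res, err := pool.ProcessCtx(r.Context(), job)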

Recipe — Logging with slog

type logged struct {
    inner tunny.Worker
    log   *slog.Logger
}

func (w *logged) Process(p any) (out any) {
    start := time.Now()
    w.log.Info("process start")
    defer func() {
        if r := recover(); r != nil {
            w.log.Error("process panic", "panic", r, "duration", time.Since(start))
            out = fmt.Errorf("panic: %v", r)
            return
        }
        w.log.Info("process done", "duration", time.Since(start))
    }()
    return w.inner.Process(p)
}

func (w *logged) BlockUntilReady() { w.inner.BlockUntilReady() }
func (w *logged) Interrupt()       { w.inner.Interrupt() }
func (w *logged) Terminate()       { w.inner.Terminate() }

Structured logs let you slice by worker name, duration, panic.

Recipe — combining wrappers via composition

func wrap(inner tunny.Worker, name string, log *slog.Logger) tunny.Worker {
    return &observed{
        name: name,
        inner: &logged{
            log:   log,
            inner: inner,
        },
    }
}

Or use the middleware pattern shown in middle.md.


Extended Section — Pool Sizing Calculators

A small set of calculators to put in your head.

Calculator 1 — CPU-bound throughput

Required workers = ceil(target_RPS * avg_call_time_seconds)
But not more than NumCPU.

If RPS * call_time > NumCPU, you cannot keep up on one pod — you need horizontal scaling.

Calculator 2 — memory-bound size

Pool size = floor(available_memory / per_worker_memory)

If your worker peaks at 500 MB and your pod has 8 GB, you can have at most 16 workers. Even if you have 32 cores.

Calculator 3 — downstream-limited size

Pool size = downstream_concurrency_limit

If the downstream allows 4 parallel calls, your pool size is at most 4. More workers will just sit in BlockUntilReady.

Calculator 4 — latency-bound queue depth

Max queue = floor(target_latency / avg_call_time_seconds * pool_size)

If you want p99 latency under 1 s and average call is 100 ms with pool size 8: max_queue = 1 / 0.1 * 8 = 80.

This is the queue depth above which p99 latency exceeds the SLO. Reject at this depth.

Calculator 5 — replicas for redundancy

Replicas = max(2, ceil(total_RPS / per_pod_RPS))

Minimum 2 for redundancy. Otherwise based on per-pod capacity.

These five calculations cover most sizing decisions. Run them on the back of an envelope before any production deploy.
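
For convenience, the same arithmetic as a Go sketch; every parameter name is illustrative, and the inputs come from your own measurements:

func poolSize(targetRPS, avgCallSec float64, numCPU, memBudgetMB, perWorkerMB, downstreamLimit int) int {
    size := int(math.Ceil(targetRPS * avgCallSec)) // calculator 1: throughput
    if size > numCPU {
        size = numCPU // CPU-bound ceiling
    }
    if byMem := memBudgetMB / perWorkerMB; byMem < size {
        size = byMem // calculator 2: memory-bound
    }
    if downstreamLimit > 0 && downstreamLimit < size {
        size = downstreamLimit // calculator 3: downstream-limited
    }
    if size < 1 {
        size = 1
    }
    return size
}

func maxQueue(targetLatencySec, avgCallSec float64, poolSize int) int {
    return int(targetLatencySec / avgCallSec * float64(poolSize)) // calculator 4
}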


Extended Section — Real Failure Modes

A more detailed look at how tunny services fail in practice.

Failure Mode 1 — slow worker

One worker is consistently slower than others (perhaps it landed on a noisy node). All other workers serve quickly. The slow worker is constantly busy. Every Nth call waits behind it.

Symptoms: bimodal latency distribution. P50 normal, P95 elevated, P99 very high.

Diagnosis: per-worker latency metrics. Tunny does not give you these natively — instrument them yourself.

Mitigation: kill the pod, let Kubernetes reschedule. Often that is enough.

Failure Mode 2 — leaking sub-goroutine

Your Process spawns a sub-goroutine that does not exit when Process returns. Over many calls, sub-goroutines accumulate.

Symptoms: goroutine count grows monotonically.

Diagnosis: /debug/pprof/goroutine?debug=2. Look for stacks that should not be there.

Mitigation: fix the leak. Until fixed, restart pods on a timer to bound memory.

Failure Mode 3 — accidental Close race

A shutdown handler calls Close while requests are still in flight. Some requests panic.

Symptoms: panic stacks in logs around shutdown time.

Diagnosis: review the shutdown sequence. The pool should be the last thing closed.

Mitigation: fix the ordering.

Failure Mode 4 — payload that crashes the decoder

A malicious or malformed input crashes a third-party C library used by the worker. The whole process crashes.

Symptoms: process restarts. No useful logs.

Diagnosis: review what changed. Try to reproduce with a saved input.

Mitigation: wrap the C call in additional sanitisation. Limit upload sizes. Consider isolating untrusted inputs to a separate process.

Failure Mode 5 — memory bloat from a slow leak

bytes.Buffer on a worker grows to accommodate the largest payload ever seen. If that is occasionally huge, the buffer stays huge.

Symptoms: per-worker memory grows over time.

Diagnosis: heap profile shows large allocations under bytes.(*Buffer).grow.

Mitigation: cap the buffer size; re-create if exceeded.
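
A sketch of that mitigation, assuming the buffer lives on the worker struct; maxBuf and encode are illustrative names:

const maxBuf = 4 << 20 // 4 MiB; pick a cap from your payload distribution

func (w *worker) Process(p any) any {
    out := w.encode(p) // hypothetical: appends output into w.buf
    if w.buf.Cap() > maxBuf {
        w.buf = bytes.Buffer{} // drop the oversized backing array; the next call reallocates
    }
    return out
}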

Failure Mode 6 — improper context cancellation

Your worker's Process does not honour ctx.Done(). Timed-out callers' work continues to consume CPU.

Symptoms: high CPU utilisation, but throughput does not match.

Diagnosis: read the worker code. Check that every potentially-blocking call respects the context.

Mitigation: plumb context through every layer.

Failure Mode 7 — N+1 pool calls

You accidentally call pool.Process once per item in a loop, when you could call once on the whole batch.

Symptoms: each call is fast, but the batch's end-to-end time is dominated by per-call dispatch and queueing overhead.

Diagnosis: code review.

Mitigation: refactor to batch. We cover batching in optimize.md.

Failure Mode 8 — wrong pool size in container

Pool was sized to runtime.NumCPU() which returned 64 (host) instead of 4 (container quota).

Symptoms: high context-switch rate, low throughput.

Diagnosis: check GOMAXPROCS at runtime. If it doesn't match container quota, automaxprocs is missing or broken.

Mitigation: add _ "go.uber.org/automaxprocs" to imports.

Failure Mode 9 — leaked workerWrapper goroutines

tunny.Close is never called. Each pool's workers persist until process exit.

Symptoms: many idle goroutines in pprof.

Diagnosis: stacks in workerWrapper.run.

Mitigation: ensure defer pool.Close() is in the lifetime owner. If you have many short-lived pools, redesign — pools should be long-lived.

Failure Mode 10 — deadlock on recursive Process

Worker A calls pool.Process(x). All other workers are also calling pool.Process. Nobody can serve. Deadlock.

Symptoms: pool stops making progress; queue grows forever.

Diagnosis: goroutine stacks show all workers blocked on receiving from the pool's channel.

Mitigation: do not call the same pool from inside its own worker. Use a separate pool if you need composition.


Extended Section — Long-Running Service Health

Some habits for services that run for months.

Habit 1 — restart pods proactively

Even with no leaks, occasional pod restarts (every 24 hours) reset accumulated state. Goroutines exit, buffers reset, memory pressure drops.

Kubernetes has no built-in timed restart; use a CronJob that runs kubectl rollout restart, or a cron-like job that rolls one pod per hour.

Habit 2 — alert on derivatives

Static thresholds catch obvious overloads. Derivative alerts catch slow drifts. E.g. "memory growth > 10 MB/hour for 6 hours" catches leaks.
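
Such an alert in Prometheus rule syntax, as a sketch: it assumes the standard process_resident_memory_bytes gauge from client_golang, and the job label is illustrative:

groups:
- name: tunny-memory
  rules:
  - alert: SlowMemoryLeak
    # deriv() gives the per-second slope over the window; 10 MB/hour is about 2912 bytes/s
    expr: deriv(process_resident_memory_bytes{job="imgsvc"}[6h]) > 10 * 1024 * 1024 / 3600
    annotations:
      summary: "memory growing faster than 10 MB/hour over the last 6 hours"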

Habit 3 — synthetic probes

A synthetic test client sends a small set of requests every minute and asserts responses. Catches subtle regressions (e.g. response format change) that real users would notice slowly.

Habit 4 — chaos testing

Periodically inject failures: kill a pod, slow down a downstream, send malformed inputs. Verify the service degrades gracefully. Tools: chaos-mesh, litmus.

Habit 5 — capacity planning quarterly

Once a quarter, look at usage trends and forecast 6-12 months. Plan capacity changes ahead of time, not reactively.

These habits separate "running a service" from "operating a service well."


Extended Section — Specific Workload Recipes

A grab-bag of recipes for different workloads.

Recipe — TLS handshake worker

For services that do CPU-heavy TLS handshakes (e.g. EC key generation):

type tlsWorker struct {
    priv *ecdsa.PrivateKey
}

func newTLSWorker() tunny.Worker {
    priv, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
    if err != nil {
        panic(err)
    }
    return &tlsWorker{priv: priv}
}

func (w *tlsWorker) Process(p any) any {
    msg := p.([]byte)
    h := sha256.Sum256(msg)
    r, s, err := ecdsa.Sign(rand.Reader, w.priv, h[:])
    if err != nil {
        return err
    }
    return [2]*big.Int{r, s}
}

func (w *tlsWorker) BlockUntilReady() {}
func (w *tlsWorker) Interrupt()       {}
func (w *tlsWorker) Terminate() {
    if w.priv != nil {
        // zero the key (cryptographic hygiene)
        // (Go has no portable secure-zero; this is a best effort)
    }
}

Recipe — gzip compression worker

type gzipWorker struct {
    z *gzip.Writer
    b bytes.Buffer
}

func newGzipWorker() tunny.Worker {
    w := &gzipWorker{}
    w.z = gzip.NewWriter(&w.b)
    return w
}

func (w *gzipWorker) Process(p any) any {
    data := p.([]byte)
    w.b.Reset()
    w.z.Reset(&w.b)
    if _, err := w.z.Write(data); err != nil {
        return err
    }
    if err := w.z.Close(); err != nil {
        return err
    }
    out := make([]byte, w.b.Len())
    copy(out, w.b.Bytes())
    return out
}

func (w *gzipWorker) BlockUntilReady() {}
func (w *gzipWorker) Interrupt()       {}
func (w *gzipWorker) Terminate()       {}

Reuses both the buffer and the gzip.Writer. Significant allocation savings.

Recipe — JSON marshalling worker

Sometimes you want bounded concurrency around marshalling huge JSON documents.

type jsonWorker struct {
    enc *json.Encoder
    buf bytes.Buffer
}

func newJSONWorker() tunny.Worker {
    w := &jsonWorker{}
    w.enc = json.NewEncoder(&w.buf)
    w.enc.SetIndent("", "  ")
    return w
}

func (w *jsonWorker) Process(p any) any {
    w.buf.Reset()
    if err := w.enc.Encode(p); err != nil {
        return err
    }
    out := make([]byte, w.buf.Len())
    copy(out, w.buf.Bytes())
    return out
}

func (w *jsonWorker) BlockUntilReady() {}
func (w *jsonWorker) Interrupt()       {}
func (w *jsonWorker) Terminate()       {}

Recipe — XML parsing worker

type xmlWorker struct {
    dec *xml.Decoder
}

func (w *xmlWorker) Process(p any) any {
    data := p.([]byte)
    w.dec = xml.NewDecoder(bytes.NewReader(data))
    var result MyType
    if err := w.dec.Decode(&result); err != nil {
        return err
    }
    return result
}

// ... other methods empty

xml.Decoder does not have a Reset, so we recreate it. Reuse is limited here.

Recipe — PostgreSQL query worker

type pgWorker struct {
    conn *pgx.Conn
}

func newPgWorker(connStr string) tunny.Worker {
    conn, err := pgx.Connect(context.Background(), connStr)
    if err != nil {
        panic(err)
    }
    return &pgWorker{conn: conn}
}

func (w *pgWorker) Process(p any) any {
    q := p.(QueryReq)
    rows, err := w.conn.Query(context.Background(), q.SQL, q.Args...)
    if err != nil {
        return err
    }
    defer rows.Close()
    // ... materialise rows
    return result
}

func (w *pgWorker) BlockUntilReady() {}
func (w *pgWorker) Interrupt()       {}
func (w *pgWorker) Terminate() {
    if w.conn != nil {
        _ = w.conn.Close(context.Background())
    }
}

Each worker holds a dedicated DB connection. The pool size is also your connection count.

Recipe — Redis client worker

type redisWorker struct {
    client *redis.Client
}

func newRedisWorker() tunny.Worker {
    return &redisWorker{client: redis.NewClient(&redis.Options{Addr: "localhost:6379"})}
}

func (w *redisWorker) Process(p any) any {
    key := p.(string)
    val, err := w.client.Get(context.Background(), key).Result()
    if err != nil {
        return err // .Val() would silently swallow this error
    }
    return val
}

func (w *redisWorker) BlockUntilReady() {}
func (w *redisWorker) Interrupt()       {}
func (w *redisWorker) Terminate()       { _ = w.client.Close() }

Note: redis.Client is already a connection pool. Wrapping it in tunny is usually overkill — the redis client handles concurrency itself. Only do this if you want to limit total redis traffic on top of the client's defaults.

Recipe — gRPC client worker

type grpcWorker struct {
    conn *grpc.ClientConn
    cli  pb.MyServiceClient
}

func newGRPCWorker(target string) tunny.Worker {
    conn, err := grpc.NewClient(target, grpc.WithTransportCredentials(insecure.NewCredentials()))
    if err != nil {
        panic(err)
    }
    return &grpcWorker{conn: conn, cli: pb.NewMyServiceClient(conn)}
}

func (w *grpcWorker) Process(p any) any {
    req := p.(*pb.Request)
    resp, err := w.cli.Call(context.Background(), req)
    if err != nil {
        return err
    }
    return resp
}

func (w *grpcWorker) BlockUntilReady() {}
func (w *grpcWorker) Interrupt()       {}
func (w *grpcWorker) Terminate()       { _ = w.conn.Close() }

Each worker has its own gRPC client. The client itself manages connections — usually you want to share, not per-worker. Decision depends on the gRPC client's pooling story.


Extended Section — Real Deployment Snippets

A small library of Kubernetes / cloud snippets.

Snippet — HPA on custom metrics

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: imgsvc-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: imgsvc
  minReplicas: 4
  maxReplicas: 20
  metrics:
  - type: Pods
    pods:
      metric:
        name: tunny_pool_queue_length
      target:
        type: AverageValue
        averageValue: "10"

Scale on average queue length. When pools average >10 queued items, add pods.

Snippet — preStop hook

lifecycle:
  preStop:
    exec:
      command: ["/bin/sh", "-c", "sleep 10"]

The 10-second sleep gives Kubernetes time to remove the pod from service endpoints before SIGTERM. Reduces traffic during shutdown.

Snippet — readiness gate based on pool

If your service has slow startup, the readiness probe should not return OK until the pool is fully ready.

func (s *Server) handleReady(w http.ResponseWriter, r *http.Request) {
    if !s.poolReady.Load() {
        http.Error(w, "not ready", 503)
        return
    }
    w.Write([]byte("ready"))
}

Set poolReady to true after tunny.New returns and all workers have been constructed.
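
A sketch of that wiring, assuming poolReady is an atomic.Bool on the server:

pool := tunny.New(n, factory) // returns after all workers are constructed
s.pool = pool
s.poolReady.Store(true) // only now does the readiness probe return 200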

Snippet — pod disruption budget

apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: imgsvc-pdb
spec:
  minAvailable: 3
  selector:
    matchLabels: {app: imgsvc}

Ensures at least 3 pods are always available, even during voluntary disruptions (node drains, cluster upgrades).


Extended Section — More Case Studies

A few more case studies in detail.

Case Study — Background processor for content moderation

A service processes uploaded videos for moderation. Per-video processing: 30 s - 5 min depending on length.

Setup:

  • Pods: 20.
  • Pool size: 2 per pod (memory limited).
  • Each worker has its own ffmpeg instance + ML model.
  • Inputs come from a Kafka topic.

Processing model: each pod consumes 2 messages at a time from Kafka, processes them through tunny, acks on success.

Pool size 2 because each ffmpeg + model is 2 GB RSS. 4 GB per pod budget.

Lessons:

  • Memory was the binding constraint, not CPU.
  • Long Process times needed long context timeouts (5 minutes).
  • Kafka rebalancing during shutdown required careful handling.

Case Study — Real-time bidding (RTB) decision engine

An ad service receives bid requests at high rate (50k RPS). Each request needs a CPU-bound decision (~1 ms).

Setup:

  • Many small pods (100+).
  • Pool size: 16 per pod.
  • p99 latency target: 100 ms.

Why a pool at all for 1 ms calls? Because of the tail. Without bounded concurrency, occasional GC pauses can stall many calls. Tunny ensures only 16 in flight at any time, so even if 16 stall, others wait at the channel.

Lessons:

  • Pool overhead is non-zero. At 1 ms calls, tunny adds ~5% latency.
  • Worth it for the tail control.
  • Heavy use of sync.Pool for per-call buffers.

Case Study — Reporting service

A service generates weekly PDF reports for ~10,000 customers. Each report: 30 s to render. Triggered once per week per customer.

Setup:

  • 1 pod.
  • Pool size: 4.
  • Schedule: weekly cron, dispatches all customers at start.

Why 1 pod? Because 10,000 reports * 30 s / 4 workers = 21 hours. We have a week, plenty of time.

Lessons:

  • Sometimes you do not need horizontal scaling. Capacity math first.
  • Single-pod services have less ops overhead.
  • The weekly cron pattern is well-served by tunny: bounded concurrency over a fixed batch.

Case Study — Streaming transform service

A service transforms a Kinesis stream of events (~10k events/sec). Each event needs schema validation and transformation (~5 ms).

Setup:

  • 8 pods.
  • Pool size: 16 per pod.
  • 16 KCL consumers per pod.

Each consumer reads one event, submits to pool, awaits result, writes to output stream.

Pool size 16 because validation is mostly CPU but has some IO (schema lookup, occasionally).

Lessons:

  • Streaming workloads suit tunny well — fixed work, fixed rate, fixed parallelism.
  • The KCL consumer count needs to be at least the pool size (otherwise pool slots are wasted).

Case Study — Mixed-priority compute service

A service serves both low-priority background jobs and high-priority interactive requests.

Setup:

  • Two pools: interactive (size 4) and background (size 8). 12 cores total on each pod.
  • Interactive requests get the interactive pool. Background, the other.
  • Interactive has tighter timeouts.

Lessons:

  • Splitting pools by priority gives SLA isolation.
  • Sizing 4 + 8 on 12 cores wastes nothing because both pools are usually busy.


Extended Section — Postmortem Lessons

A few generalisable lessons from postmortems.

Lesson — alerting on queue length is the single most important alert

In almost every tunny outage, queue length was the first metric to deviate. Alert on it before anything else.

Lesson — graceful shutdown is not optional

Services that did not implement graceful shutdown left transactions orphaned, files half-written, or callers holding stale state. Implement and test it.

Lesson — worker panics will happen

Even thoroughly tested workers panic in production. The set of inputs is wider than your tests. Always recover and log.

Lesson — pool size needs revisiting

After deployments that change Process cost, the optimal pool size changes. Make pool size easy to change without code changes — config it.

Lesson — observability is not optional

Services without good observability lose hours during incidents. Spend the time upfront.

Lesson — chaos testing finds the bugs you cannot

Synthetic load tests catch capacity issues. Chaos tests catch correctness issues — what happens when a downstream returns 500? What happens when a pod is killed mid-work? Test these.

Lesson — error budgets help prioritise

Define an SLO (e.g. 99.9% availability). Calculate your error budget. Spend it on velocity (new features) when there is slack, on stability when there is not.

These are not tunny-specific. They apply to any production service. Tunny just inherits them.


Extended Section — Building a Team Culture Around Pools

Operating tunny well at scale is partly cultural. A few practices that help:

Practice — pool sizing requires sign-off

Pool sizes are decisions. Document them. Require sign-off for changes (in a PR review, in an RFC, in an ADR — whatever your team uses).

Practice — load tests in CI

Every PR runs a small load test. If latency or throughput regresses meaningfully, the PR fails.

Practice — postmortem reviews monthly

Once a month, review all postmortems. Identify themes. Fund work to address them.

Practice — on-call rotation includes everyone

The engineer who wrote the code is on-call for it. This aligns incentives.

Practice — production code reviews focus on operability

Code review checklist includes:

  • Are panics recovered?
  • Is the context propagated?
  • Is there a metric for this code path?
  • Will this code shut down gracefully?

If reviewers consistently look at these, the codebase converges on production-ready.

Practice — runbooks

For every alert, a runbook. The runbook tells the on-call engineer what to do:

Alert: ImgServicePoolSaturated
What it means: queue length is sustained above 5x pool size.
What to do first: check Grafana for traffic volume.
If traffic is unusually high: scale up replicas to 2x.
If traffic is normal: check call duration metric for slowdowns.
If a downstream is slow: page the downstream team.
If no obvious cause: escalate.

A few-paragraph runbook saves minutes during an incident.


Extended Section — When Tunny Isn't Enough

Sometimes you grow beyond tunny. Signs:

  1. You need priority scheduling within one pool.
  2. You need dynamic pool resizing at fine granularity.
  3. You need work-stealing across pools.
  4. You need very low overhead (sub-microsecond Process).
  5. You need fairness across many tenants.

For (1), build a small layer in front of tunny: separate pools per priority, route at submission.
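
A sketch of that routing layer, with illustrative names:

type priorityFront struct {
    hi, lo *tunny.Pool // separate pools per priority
}

func (f *priorityFront) Process(highPriority bool, payload any) any {
    if highPriority {
        return f.hi.Process(payload)
    }
    return f.lo.Process(payload)
}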

For (2), SetSize works but is clunky. Consider ants or a custom solution.

For (3), neither tunny nor ants help. Build your own pool with explicit task migration.

For (4), drop the library entirely. Use a hand-rolled goroutine + channel pattern.

For (5), build a weighted limiter on top, or use per-tenant pools.

Tunny excels in the middle: "bounded CPU-bound work with stateful workers." Beyond that, supplement or replace.


Extended Section — Migration Story

Suppose you inherit a Go service that uses raw goroutines and a semaphore for bounded concurrency. You want to migrate to tunny. Here is a phased plan.

Phase 1 — assessment

Read the code. Identify:

  • Where unbounded goroutines are spawned.
  • Where semaphores are used.
  • What state lives across goroutines.

Document the current architecture.

Phase 2 — introduce tunny adjacent

Add tunny as a dependency. Build a new pool for one specific workload. Run it in shadow mode (new code path receives a fraction of traffic, results compared to old).

Phase 3 — swap the workload

Once shadow is verified, route all traffic to the new pool. Decommission the old code path.

Phase 4 — generalise

Identify other workloads with similar shape. Migrate each one to tunny.

Phase 5 — observability and ops

Add metrics, alerts, runbooks. Verify shutdown is graceful.

Migration takes weeks to months for a non-trivial service. Plan accordingly.


Final Operational Summary

You have read 3000+ lines about operating tunny in production. The essentials:

  1. Sizing is the most important decision. Use the calculators.
  2. Observability is mandatory. Three metric families: size, queue, duration.
  3. Shutdown is a three-phase dance. Stop accept, drain, close.
  4. Panics happen. Always recover.
  5. Context propagates. Always pass it.
  6. Load tests reveal what reading does not. Run them.
  7. Postmortems make the next outage smaller. Write them.
  8. Boring is good. Choose stability.
  9. Document the why. Future-you will thank you.
  10. Sleep well. Good ops means restful on-call shifts.

You are now operational on tunny at the highest level. The remaining files in this series are sharpening exercises. Use them as drills.

End of professional file.


Appendix — A Year of Operating Tunny: Calendar Walkthrough

To round out the operational view, here is a hypothetical 12-month timeline of running a tunny-based image service. It is loosely based on aggregate experiences and shows how the operational concerns evolve.

Month 1 — Launch

Service deployed to production. 4 pods, pool size 8 each. Traffic: 50 RPS sustained. Metrics dashboard live. Alerts configured.

Activity: monitor closely, ready to roll back.

Observations: p99 hovers around 200 ms. Queue length stays near 0. CPU at 40%.

Month 2 — Stabilisation

Traffic grows to 100 RPS. No issues. Routine.

Activity: refine dashboards based on what we now know matters. Add a panel for per-format latency.

Observations: traffic is bimodal — JPEG and WebP. WebP is 30% slower. Decision: monitor; not yet enough impact to split pools.

Month 3 — First alert

ImgServicePoolSaturated fires on a Friday afternoon. Investigation shows a spike in WebP traffic from a single customer.

Activity: scale to 6 pods. Schedule postmortem.

Postmortem actions:

  • Add per-customer dashboards.
  • Investigate whether to rate-limit per customer.
  • Re-evaluate pool sizing.

Month 4 — Per-customer dashboards

New dashboards show that 5% of customers generate 60% of traffic. We do not have multi-tenant isolation yet.

Activity: consider tenant-aware design. Defer until traffic pattern stabilises.

Month 5 — Scheduled load test

Quarterly load test. We push the system to 500 RPS in staging. p99 latency degrades sharply past 350 RPS. Investigation shows queue saturation.

Activity: increase pool size to 12 in staging. Re-test. Better, but still issues. Increase pods to 8 in staging. Now clean.

Decision: in production, scale to 6 pods (we are running at 100 RPS, headroom is fine for now).

Month 6 — Routine

No incidents. Routine maintenance.

Activity: upgrade Go from 1.21 to 1.22. Pool behaviour unchanged.

Month 7 — Memory growth

Slow memory growth observed. ~50 MB/day per pod.

Activity: heap profiling. Find a buffer that grows unboundedly with extreme inputs. Fix: cap the buffer.

Postmortem actions:

  • Add memory growth alerts.
  • Schedule pod restarts on a 7-day cycle as belt-and-suspenders.

Month 8 — Big traffic increase

Marketing campaign drives traffic to 400 RPS sustained. We already validated this in load tests. Pre-scaled pods to 12.

Activity: monitor. p99 at 300 ms; acceptable. Queue length spikes briefly during peaks.

Month 9 — Customer-driven feature

A customer requests a new format (AVIF). Adding a code path.

Activity: implement, benchmark. AVIF is 5x slower per call than JPEG.

Decision: separate pool for AVIF, sized smaller (4 per pod). Old pool handles JPEG/PNG/WebP.

Month 10 — Validating the new architecture

After deploy, AVIF traffic is well-bounded. Old pool's behaviour is unchanged.

Observations: per-format dashboards show clean separation. p99 is per-format.

Month 11 — Outage

A bad deploy introduces a panic on a specific input. Pool's recover catches it; outcome counter shows a spike in error. We notice via the alert "error rate above 1%".

Activity: roll back. Investigate.

Postmortem actions:

  • Add input fuzz testing in CI.
  • Add a canary-with-rollback step in deploys.

Month 12 — Year-end review

Yearly review of the service. Outcomes:

  • Two minor incidents, both resolved within an hour.
  • Zero major outages.
  • p99 latency held under 500 ms throughout.
  • Capacity grew with traffic; no major surprises.

Plans for next year:

  • Multi-tenant isolation for B2B customers.
  • Investigate hardware acceleration (GPU) for AVIF.
  • Migrate to a newer image library that supports batch processing.

This kind of slow, steady improvement is what good operations looks like over a year. Each month, you tweak. Each quarter, you re-evaluate. Each year, you reflect.


Appendix — Operator Handover

When you hand the service off to another engineer or team, what should they have?

Document 1 — Architecture overview

A one-page diagram showing the pool, callers, downstreams, dependencies.

Document 2 — Configuration reference

The config.yaml with every field documented. What it does, what reasonable values are, the operational impact.

Document 3 — Sizing rationale

Why the pool size is what it is. The math. The trade-offs. The history of changes.

Document 4 — Runbook

For every alert, what to do. Step-by-step.

Document 5 — Recent postmortems

The last 6-12 months of incidents. Pattern analysis.

Document 6 — Open issues

Known bugs, accepted risks, planned improvements.

Document 7 — Performance baseline

Latest results from your load test suite. So future-you knows what "normal" looks like.

Seven documents. Each takes about an hour to write. Together they make handover possible.


Appendix — Cost Optimization

For services with significant compute cost, tunny gives you levers to reduce cost.

Lever 1 — pool size to match work

Oversized pools waste CPU. Right-sized pools maximize utilization per pod.

Lever 2 — replica count

Fewer, larger pods are usually cheaper than many small pods (overhead amortises).

Lever 3 — spot instances

For fault-tolerant workloads, spot instances are 50-90% cheaper. Tunny pools survive pod terminations naturally as long as you have multiple pods.

Lever 4 — autoscaling

Scale down during off-hours. Save 50%+ on quiet days.

Lever 5 — caching

Cache results in front of the pool. Cached hits use no pool capacity.

Lever 6 — request batching

Batch many small calls into fewer large calls. Pool overhead amortises.

Lever 7 — efficient encoding

If your work is JPEG encoding, libjpeg-turbo is 3x faster than the standard library. Faster work = fewer pods = less cost.

These levers compound. A service running 24/7 at 16 pods could potentially run at 4 pods with the right optimizations. Quarterly cost reviews surface these opportunities.


Appendix — Compliance and Audit Considerations

For regulated industries, additional concerns.

Audit logging

Every request that touches the pool should be loggable. Include request ID, user ID, payload size, outcome, duration.

log.Info("request",
    "request_id", reqID,
    "user_id", userID,
    "payload_bytes", len(data),
    "outcome", outcome,
    "duration_ms", duration.Milliseconds())

Logs forwarded to a tamper-resistant store.

PII handling

Do not log payload contents that contain PII. Log hashes or sizes instead.

Encryption in transit and at rest

Inputs/outputs in memory are unencrypted. If processing sensitive data, ensure:

  • Inputs are received over TLS.
  • Outputs are stored encrypted.
  • Memory dumps (if captured for debugging) are handled per policy.

Right to be forgotten

If your service processes user data, deletion requests must clear caches, logs, etc. Tunny does not store anything persistently, but your service around it might.

Time-bound retention

Set log retention policies. Set metric retention policies. Document them.

These are not tunny concerns directly, but tunny-based services in regulated industries need to think about them.


Appendix — Geographic Distribution

For multi-region deployments:

Pattern: per-region pools, no cross-region calls

Each region runs the service independently. Traffic is routed by DNS or load balancer. No cross-region communication.

Pros: simple, low latency. Cons: capacity not shared across regions.

Pattern: pool-of-pools across regions

A coordinator routes requests to the least-loaded region's pool.

Pros: shared capacity. Cons: high latency for cross-region routing, complex.

Most teams use the first pattern. The second is reserved for very specific use cases.


Appendix — Disaster Recovery

What happens if your region goes down?

If you use the per-region pattern, the DNS/load balancer routes to surviving regions. Capacity in surviving regions must absorb the failover traffic.

Capacity rule: each region should be sized for 100% of total traffic. With two regions split 50/50, each normally runs at half capacity, so losing one simply brings the survivor to its rated peak.

Tunny does not change this calculus. It is a service-level concern.


Appendix — Cost-Performance Trade-Offs by Workload

A summary of typical cost-performance trade-offs for common tunny workloads.

Workload            Pool size    Memory/worker   Cost driver      Optimization
Image resize        NumCPU       50 MB           CPU              Faster decoder
PDF render          Small        500 MB          Memory           Smaller pool
Crypto signing      NumCPU       Negligible      CPU              HW acceleration
ML inference        NumCPU/2     200 MB-1 GB     Cache + memory   Quantized model
JSON validation     2x NumCPU    10 MB           CPU + occ. IO    Compiled schemas
Video processing    Small        2 GB            Memory + CPU     Hardware encode

Use this as a starting point. Profile your specific workload.


Appendix — Cookbook: Production Patterns

A summary cookbook of production patterns:

Pattern 1 — pool per workload type

type Service struct {
    resize, validate, render *tunny.Pool
}

Independent sizes, independent metrics, independent SLOs.

Pattern 2 — config-driven sizing

Pool size from config. Never hardcoded.

Pattern 3 — health probe via pool state

func (s *Service) Healthy() bool {
    return int(s.pool.QueueLength()) < s.pool.GetSize()*10
}

Pattern 4 — admission control via queue length

if int(s.pool.QueueLength()) > s.maxQueue {
    return ErrBusy
}

Pattern 5 — graceful shutdown coordination

Three phases: stop accepting, drain, close pool.

Pattern 6 — typed wrapper

Hide interface{} behind a typed adapter.

Pattern 7 — panic recovery

Always recover in Process.

Pattern 8 — context propagation

Pass request context to ProcessCtx.

Pattern 9 — per-worker state reuse

Buffers, codecs, connections on the worker struct.

Pattern 10 — observability everywhere

Three metric families: pool size, queue length, call duration.

Memorise these. They are the production-grade defaults.


Appendix — Common Bugs by Frequency

Across many tunny deployments, here are the bugs I have seen most often, in rough order of frequency:

  1. Missing defer pool.Close(). Easy to fix; easy to miss.
  2. Process after Close. Symptom: panics around shutdown.
  3. Not using r.Context() in HTTP handlers. Client cancellations never propagate, so abandoned work keeps burning CPU.
  4. No panic recovery in Process. First bad input crashes the service.
  5. Pool size = NumCPU() in container without automaxprocs. Wildly oversized pools.
  6. Capturing loop variable in goroutine. Not unique to tunny but common in callers.
  7. Worker holds mutable state without sync. Race conditions under load.
  8. No queue-length alert. Saturation goes undetected.
  9. Worker calls back into the same pool. Deadlock under pressure.
  10. Forgetting that BlockUntilReady runs before every job. Heavy operations in it cripple throughput.

If you avoid these ten, you avoid 90% of tunny pain.
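
For reference, the classic shape of bug 6 (Go 1.22 made loop variables per-iteration, which removes the need for the copy):

for _, img := range images {
    img := img // required before Go 1.22: capture a per-iteration copy
    go func() {
        results <- pool.Process(img)
    }()
}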


Appendix — Glossary of Production Terms

Some terms used throughout this file:

  • SLO (Service Level Objective): a target like "99.9% of requests complete within 500 ms".
  • SLI (Service Level Indicator): the measurement that backs the SLO.
  • Error budget: the allowed failures in a time window before SLO is breached.
  • p99: the 99th percentile of a metric.
  • Saturation: the system is at or near its capacity limit.
  • Latency: time from request to response.
  • Throughput: requests per second.
  • Capacity: maximum throughput while meeting latency SLO.
  • Headroom: capacity minus current utilization.
  • Backpressure: the propagation of slowness back through a system.
  • Admission control: rejecting traffic at the front to avoid overload.
  • Bulkhead: isolation of resources between workloads to prevent cross-impact.

These terms appear in production conversations daily. Familiarity helps.


Appendix — Reading Production Source Code

A few example open-source projects that use tunny in production:

  • Various microservice projects on GitHub. Search "github.com/Jeffail/tunny" in code.
  • Older versions of Loki, Mimir, and similar Grafana projects.
  • Hobbyist image-processing services.

Reading how others use tunny exposes patterns you may not have considered. Spend an hour browsing.


Appendix — Conclusion of the Professional Level

Operating tunny in production is operating a small but critical piece of infrastructure. The library is small. The operational discipline is what makes the difference.

You now have:

  • The technical knowledge from junior, middle, and senior.
  • The operational patterns from this file.
  • A complete end-to-end example.
  • A year of operations imagined in detail.
  • A cookbook of patterns.
  • A list of common bugs and their causes.

The remaining files (specification, interview, tasks, find-bug, optimize) are sharpening drills. Pick what you need.

Above all: build something, run it in production, observe, iterate. The skills here grow with experience, not just with reading.

Good luck. Operate well.


Bonus Appendix — A Long Production Diary

For colour, here is a slightly fictionalised diary entry from a senior engineer running a tunny-backed service. Reflects the texture of the role.

Day 1 of the week: monitoring dashboards as usual. Queue length nominal. p99 nominal. Sip coffee.

Day 2: spike at 14:42 UTC, queue jumps to 80 for 90 seconds. Alert fires. I check Grafana — yes, traffic spike from a campaign. Pre-scaling helped. No action needed; the spike subsides naturally.

Day 3: review of last sprint's incident postmortem. We identified a subtle bug in worker initialization that caused 1-in-10000 failures on cold start. Fix shipped today. Will monitor.

Day 4: schema change in a downstream service. They contacted us proactively. We tested our payload validation; nothing breaks. Approved their change.

Day 5: quarterly load test in staging. New record: 1200 RPS sustained for 30 minutes with all SLOs met. Reported to leadership.

Day 6 (weekend): page at 03:18 — ImgServiceHighErrorRate. I check from my phone. 5% of requests returning 500. Logs show a panic in a new image format. We deployed the support yesterday; the bug slipped through. I roll back from my phone. Errors drop. Back to bed.

Day 7: postmortem written. Action item: add fuzz testing to CI for new image formats. Also: improve canary period to 1 hour.

This is what running a production service looks like. Mostly quiet, occasionally interesting, always observable. Tunny is one small piece — but a piece that, when handled well, makes the system feel solid.


Bonus Appendix — Yet More Case Studies

A few more capsule cases for breadth.

Case study — Web scraping service

A scraper service fetches and parses thousands of URLs per minute. Each parse is CPU-light but the fetch is IO-bound.

Setup: pool size 32 (well above NumCPU because workers spend most time on HTTP IO). Per-worker holds an http.Client with a connection pool.

Decision: 32 was found by load testing; 16 left bandwidth on the table, 64 caused throttling.

Case study — Search index updater

A service receives document updates and re-indexes them into Elasticsearch. Each update: parse + index call.

Setup: pool size 8. Per-worker holds an *es.Client. BlockUntilReady waits on a 50 req/s limiter for the ES cluster.

The pool size matters less than the limiter, but the pool ensures bounded memory for in-flight parses.

Case study — Real-time analytics aggregator

A service receives events at 100k EPS and aggregates them into 1-minute windows.

Setup: pool size NumCPU. Each worker holds a hash map for the current window.

Tricky: aggregation across workers must merge into a single window at flush time. We use a sync.Map shared across workers, with per-key atomic adds.

Pool helps bound CPU; the merge logic is the interesting bit.

Case study — Email rendering

A service renders transactional emails (HTML from templates + variables).

Setup: pool size 8. Per-worker holds compiled templates.

Each call: 1-5 ms. Pool is mostly idle but available for occasional bursts (newsletter sends).

Case study — Embedded analytics service

A service inside a customer's monitoring product. Customers run it themselves. Pool size from config; default NumCPU.

Operational concerns shifted: we cannot observe customer deployments. Build runbooks for customers' SREs.


Bonus Appendix — Final Thoughts

Three pieces of wisdom for the road.

1. The system is people

Tools matter. Code matters. But the difference between good ops and bad ops is people: the engineers who write the runbook, the on-call who responds in 5 minutes, the team that does the postmortem honestly.

Invest in people. The tools follow.

2. Boring tech wins

Tunny is small, stable, well-understood. Choose libraries like that whenever you can. Excitement in tech is overrated; reliability is underrated.

3. Production teaches what books cannot

This file is long. It covered many things. But the real lessons come from running services. Logs you read at 3 AM. Postmortems you write after a long night. Calmly handled alerts.

Get reps. The wisdom grows.

End of the professional file. End of the long reading. Begin the long doing.


Bonus Appendix — Extended Operational Recipes

This section adds another twenty recipes focused on operational concerns specifically.

Recipe — Pool reconfiguration without restart

SetSize allows live changes. Use sparingly:

func (s *Service) Reconfigure(newSize int) error {
    if newSize < 1 || newSize > s.maxSize {
        return errors.New("invalid size")
    }
    s.pool.SetSize(newSize)
    return nil
}

Expose via admin endpoint:

http.HandleFunc("/admin/pool/size", auth(func(w http.ResponseWriter, r *http.Request) {
    n, _ := strconv.Atoi(r.URL.Query().Get("n"))
    if err := svc.Reconfigure(n); err != nil {
        http.Error(w, err.Error(), 400)
        return
    }
    w.Write([]byte("ok"))
}))

Use with caution. Most production sizing should be config-driven, not runtime-changeable.

Recipe — Audit on every pool size change

If pool sizes change, log who and why:

log.Info("pool size changed", "from", old, "to", new, "by", actor, "reason", reason)

For regulated environments, persist to an audit table.

Recipe — Pool warmup before serving

Construct the pool, then warm each worker with a synthetic call:

pool := tunny.New(n, factory)
var wg sync.WaitGroup
for i := 0; i < n; i++ {
    wg.Add(1)
    go func() {
        defer wg.Done()
        // concurrent calls, so they fan out across workers; sequential
        // calls could all land on the same worker
        _ = pool.Process(warmupPayload)
    }()
}
wg.Wait()
// the pool is now "warm": every worker has served a call

Useful when JIT or cache effects matter.

Recipe — Cron-driven flush

For workers that accumulate state (in-memory aggregates), a cron job to flush:

go func() {
    t := time.NewTicker(1 * time.Minute)
    defer t.Stop()
    for range t.C {
        pool.Process(flushSignal)
    }
}()

The worker recognises flushSignal and flushes its accumulator. Note that a single Process call reaches only one worker; to flush every worker, submit GetSize() concurrent flush calls.

Recipe — Graceful pause

Pause the pool without closing it:

func (s *Service) Pause()  { s.paused.Store(true) }
func (s *Service) Resume() { s.paused.Store(false) }

// Inside the Worker, which holds a pointer to the same atomic flag:
func (w *worker) BlockUntilReady() {
    for w.paused.Load() {
        time.Sleep(100 * time.Millisecond)
    }
}

Workers stop accepting work. Callers queue (or time out). When unpaused, work resumes.

Use for maintenance windows or emergency throttling.

Recipe — Slow-down mode

Below saturation but above headroom, slow the workers down:

func (w *worker) BlockUntilReady() {
    if w.degraded.Load() {
        time.Sleep(50 * time.Millisecond)
    }
}

Reduces traffic without rejecting. Use during minor incidents.

Recipe — Metric-driven HPA

Expose tunny_pool_queue_length to Prometheus. HPA scales on it:

metrics:
- type: Pods
  pods:
    metric:
      name: tunny_pool_queue_length
    target:
      type: AverageValue
      averageValue: "10"

Scale when queue depth averages above 10. Pods drop when average drops back.

Recipe — Worker rotation

Periodically rotate workers (re-create them) to reset state:

go func() {
    t := time.NewTicker(1 * time.Hour)
    defer t.Stop()
    for range t.C {
        oldSize := pool.GetSize()
        pool.SetSize(0)
        pool.SetSize(oldSize)
    }
}()

Drains all workers and recreates them. Brutal but resets accumulated state. Use for known-leaky workloads.

Recipe — Pool name in logs

Include the pool name in every log line for filtering:

log := slog.With("pool", "imgsvc")
log.Info("worker started")

In aggregators, filter by pool=imgsvc to see only this pool's logs.

Recipe — Pool tags in metrics

Use Prometheus labels for pool name:

queueLength.WithLabelValues("imgsvc").Set(...)

Aggregate across pools when needed:

sum(tunny_pool_queue_length) by (pool)

Recipe — Per-payload metric labels

If your work has natural categories, add as labels:

processDuration.WithLabelValues("imgsvc", "jpeg", outcome).Observe(...)

Beware: high cardinality (lots of unique label combinations) is expensive in Prometheus. Limit to ~10 values per label.

Recipe — Health check protocol

func (s *Service) Healthz() error {
    if s.pool == nil {
        return errors.New("pool not initialized")
    }
    if s.pool.GetSize() == 0 {
        return errors.New("pool has zero workers")
    }
    if int(s.pool.QueueLength()) > s.pool.GetSize()*20 {
        return errors.New("pool over-saturated")
    }
    return nil
}

Three checks: pool exists, has workers, not catastrophically overloaded.

Recipe — Drain endpoint

Triggered by orchestrator before pod termination:

http.HandleFunc("/drain", auth(func(w http.ResponseWriter, r *http.Request) {
    s.drain.Store(true) // health check now reports unhealthy
    w.Write([]byte("draining"))
}))

Combined with readiness probe, this stops traffic gracefully before SIGTERM.

Recipe — Soft shutdown

Stop accepting, but keep processing:

func (s *Service) SoftShutdown(ctx context.Context) error {
    s.draining.Store(true)
    for s.pool.QueueLength() > 0 {
        select {
        case <-ctx.Done():
            return ctx.Err()
        case <-time.After(100 * time.Millisecond):
        }
    }
    return nil
}

Used internally before final Close.

Recipe — Pool latency histogram with custom buckets

Tune buckets to your expected distribution:

buckets := []float64{
    0.001, 0.002, 0.005,
    0.01, 0.02, 0.05,
    0.1, 0.2, 0.5,
    1.0, 2.0, 5.0,
}

Logarithmic spacing. Good for "I do not know my exact distribution".

If your latencies cluster in a known range (e.g. 50-100 ms), use linear buckets there for finer detail:

buckets := []float64{
    0.040, 0.045, 0.050, 0.055, 0.060,
    0.065, 0.070, 0.075, 0.080, 0.090,
    0.100, 0.120, 0.150, 0.200,
}

Recipe — Histogram with native percentile reporting

If using OpenTelemetry, native histograms compute percentiles server-side:

processDuration, err := meter.Float64Histogram("tunny.process.duration")
if err != nil {
    panic(err)
}
processDuration.Record(ctx, duration.Seconds())

Query tunny.process.duration for p50/p95/p99 without manual bucket configuration.

Recipe — Tracing through a chain of pools

When a request flows through multiple pools (image pipeline), use trace IDs:

carrier := propagation.MapCarrier{}
otel.GetTextMapPropagator().Inject(ctx, carrier) // Inject writes into the carrier; it returns nothing
job := Job{Carrier: carrier, ...}
pool1.ProcessCtx(ctx, job)

Each pool's worker extracts the carrier and creates a child span. The full request trace spans every stage.

Recipe — Inflight counter

var inflight atomic.Int64
type worker struct {
    inner tunny.Worker
}
func (w *worker) Process(p any) any {
    inflight.Add(1)
    defer inflight.Add(-1)
    return w.inner.Process(p)
}

Expose as a gauge. Should always be <= pool size. If it exceeds, something is wrong.

Recipe — Slow-call alert

// in worker wrapper
if duration > 5*time.Second {
    slowCallCounter.WithLabelValues(name).Inc()
}

Counter of unexpectedly slow calls. Alert if rate is high.

Recipe — Memory monitoring per worker

If you suspect a leak:

go func() {
    t := time.NewTicker(time.Minute)
    defer t.Stop()
    var m runtime.MemStats
    for range t.C {
        runtime.ReadMemStats(&m)
        log.Info("memstats",
            "alloc", m.Alloc,
            "sys", m.Sys,
            "heap_in_use", m.HeapInuse,
            "goroutines", runtime.NumGoroutine())
    }
}()

Log memory and goroutine counts every minute. Easy to spot drift.


Bonus Appendix — Pool Sizing Worksheet

For each new tunny service, fill in this worksheet during design:

Service: ___________
Workload type: [ CPU / IO / Memory ]-bound

Per-call profile:
  Avg duration: ____ ms
  P95 duration: ____ ms
  P99 duration: ____ ms

Per-worker resources:
  CPU: ____ % of one core
  Memory: ____ MB

Downstream constraints:
  Rate limit: ____ /s
  Max concurrency: ____

Traffic:
  Avg RPS: ____
  Peak RPS: ____
  Burst factor: ____x

Calculated pool size:
  CPU-bound: min(NumCPU, peak_RPS * avg_duration_sec)
  Memory-bound: floor(per_pod_memory_budget / per_worker_memory)
  Downstream-bound: max_concurrency

Chosen pool size: ____
Rationale: ___________

Pod count:
  Calculated: ceil(total_RPS / per_pod_capacity)
  Chosen: ____ (with redundancy)

Filling this in forces explicit thinking. Worth 30 minutes per service.
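If it helps, the worksheet's sizing arithmetic as a Go sketch (the function name and parameters are illustrative, not part of any API):

// poolSize applies the three worksheet formulas and takes the binding constraint.
func poolSize(peakRPS, avgDurSec float64, memBudgetMB, perWorkerMB, maxDownstream int) int {
    cpuBound := min(runtime.NumCPU(), int(math.Ceil(peakRPS*avgDurSec)))
    memBound := memBudgetMB / perWorkerMB // floor via integer division
    return min(cpuBound, memBound, maxDownstream)
}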


Bonus Appendix — Tunny In a Polyglot Stack

If your stack includes services in other languages, how does tunny fit?

Java services

Java's thread pools (java.util.concurrent's ExecutorService and friends) are richer. They can imitate tunny's pattern, but with many more knobs.

Python services

Python's GIL means CPU-bound parallelism requires multiprocessing. Tunny-like patterns exist via concurrent.futures.ProcessPoolExecutor.

Rust services

tokio::task or rayon::ThreadPool for the same patterns. Different mental model (async vs M:N goroutines).

JavaScript / Node.js

Worker threads or child processes. Less mature pool libraries.

If your Go service interoperates with services in these languages, tunny is on the Go side only. The interop is HTTP/gRPC/queue calls between services; each language manages its own concurrency.


Bonus Appendix — How Tunny Affects On-Call

Realistically: tunny-related pages are a small fraction of on-call work. If you have built well:

  • Queue saturation alerts catch most issues 5-10 minutes before user impact.
  • Auto-scaling resolves them without paging.
  • The pool itself is rarely the root cause; it is usually a downstream slowdown or a traffic spike.

Tunny becomes a topic in:

  • Capacity planning meetings.
  • Postmortems for outages.
  • Code review for new pool-using services.

Rarely day-to-day operations once a service is mature.


Bonus Appendix — Annual Audit Checklist

Once a year, audit each tunny-using service:

  • Pool size is still appropriate for current traffic.
  • Per-worker memory usage has not drifted.
  • Shutdown still works (run a manual rolling restart).
  • Alerts have not fallen silent.
  • Runbooks are still accurate.
  • Sizing rationale doc is up to date.
  • Load test baseline is current.
  • Postmortem actions are completed.

This is one hour per service. Worth it.


Bonus Appendix — Migrating Off Tunny

If you ever need to move off tunny, the migration is straightforward:

  1. Define your own pool type with the same interface (Process, Close).
  2. Implement using the alternative (ants, custom, etc).
  3. Swap the implementation.

Because tunny's API is small, the swap is small. Most callers do not need to change.
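A minimal sketch of step 1; the interface and type names here are illustrative:

// The interface your callers depend on; tunny is one implementation behind it.
type Pool interface {
    Process(payload any) any
    Close()
}

type tunnyPool struct{ p *tunny.Pool }

func (t tunnyPool) Process(payload any) any { return t.p.Process(payload) }
func (t tunnyPool) Close()                  { t.p.Close() }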

This portability is one of tunny's quiet virtues. You are not deeply coupled to it.


Bonus Appendix — Tunny Survives Time Well

Tunny has been stable for years. The API has not changed meaningfully. The semantics have not drifted. Code you write today against tunny will likely run against tunny in 2030.

This stability is itself a quality. Many libraries churn; tunny does not. Use this as another reason to prefer it for long-lived services.


Bonus Appendix — One Last Practical Tip

A small detail: when you deploy a tunny-based service for the first time, watch it for the first hour. Resist the urge to walk away after deploy.

The first hour reveals:

  • Whether the pool started correctly (not zero workers).
  • Whether metrics are flowing.
  • Whether the first real traffic causes any surprises.
  • Whether the dashboards make sense.

Most of these you should know from staging. But staging is not production. The first hour is the cheapest insurance.


Bonus Appendix — Real Conclusion

You started reading this file knowing how to write code with tunny. You finish knowing how to run a tunny-based service in production. The two skills are different and both necessary.

There is more depth available — capacity planning has its own books, observability is a discipline, postmortem culture is its own art. This file did not cover those exhaustively. But it pointed at all of them.

Carry forward:

  • The instinct to size pools to constraints.
  • The instinct to instrument first, optimise later.
  • The instinct to test shutdown explicitly.
  • The instinct to write things down.

These instincts compound. After a year of operating tunny, they become reflexes. After three years, they shape your design decisions before you even reach for code.

Good engineering with tunny is good engineering generally. Most of what you learned here transfers. Apply it widely.

End of professional file. Truly this time.


Coda — Closing Thoughts on the Whole Series

You have now read junior, middle, senior, and professional. If you read every line, that is over 14,000 lines of writing about a 400-line library.

The disproportion is intentional. The library is the minimum — the operational discipline around it is most of the value. The same is true for many small Go libraries: their value is what you build around them, not the library itself.

The remaining files (specification, interview, tasks, find-bug, optimize) are sharpening drills. They assume the knowledge you have built. Skim them, use them as references.

Most importantly: build something with tunny. Run it in production. Hold an on-call shift for it. The experience completes the knowledge.

Truly the end.


Final Bonus — A Comprehensive Production Checklist

Use this as a final go/no-go for any tunny-based service heading to production.

Section A — Code

  • Pool is constructed once, in main or a long-lived constructor.
  • Pool is closed exactly once, at shutdown.
  • defer pool.Close() or equivalent is in place.
  • Worker Process recovers panics.
  • Worker Interrupt is idempotent and synchronized.
  • Worker Terminate releases all owned resources.
  • Worker BlockUntilReady does not block forever.
  • All HTTP handlers use r.Context() with ProcessCtx.
  • Body size is limited (http.MaxBytesReader).
  • Per-call timeouts are bounded.
  • Pool size is config-driven, not hardcoded.
  • Pool size is validated at startup (>= 1).
  • Pool is sized to the binding constraint (CPU/memory/downstream).
  • No pool created in a hot path (handler, loop body).
  • No mutual recursion (worker calls back into same pool).
  • interface{} is hidden behind a typed wrapper.

Section B — Observability

  • Pool size exported as Prometheus gauge.
  • Pool queue length exported as Prometheus gauge.
  • Process call duration exported as histogram.
  • Outcome counters (success, error, timeout, busy, etc).
  • Worker panics counted.
  • Structured logs at request boundary.
  • Trace context propagated (if using distributed tracing).
  • pprof endpoints enabled (behind auth in prod).
  • Grafana dashboard built.

Section C — Alerts

  • Saturation alert (queue > size * 5 for 1m) — see the example rule after this list.
  • Latency alert (p99 > target for 5m).
  • Error rate alert (errors > 1% for 1m).
  • Memory growth alert (RSS up 10% over 6h).
  • Goroutine leak alert (count up over 1h).
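A sketch of the saturation rule in PromQL, assuming gauges named tunny_pool_queue_length and tunny_pool_size (match whatever names you exported in Section B):

tunny_pool_queue_length > tunny_pool_size * 5

Wrap it in an alerting rule with "for: 1m" so it requires a minute of sustained saturation before firing.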

Section D — Shutdown

  • HTTP server shutdown configured.
  • Graceful shutdown drains queue before pool close.
  • terminationGracePeriodSeconds matches shutdown timeout.
  • preStop hook (if needed) gives readiness probe time to fail.
  • Shutdown tested manually.

Section E — Configuration

  • automaxprocs imported (for container environments).
  • GOMEMLIMIT set (if memory-constrained).
  • GOGC set appropriately for latency vs memory trade-off.
  • Pool config exposed in ConfigMap / equivalent.

Section F — Testing

  • Unit tests for worker Process happy path.
  • Unit tests for worker Process error paths.
  • Unit tests for Interrupt correctness.
  • Integration test for graceful shutdown.
  • Load test for capacity.
  • Burst test for short spikes.
  • Soak test for memory stability over hours.
  • Fuzz tests on input parsing (if applicable).

Section G — Documentation

  • Architecture overview document.
  • Pool sizing rationale document.
  • Runbook for each alert.
  • ADR for major design decisions.
  • README with build/run/deploy instructions.

Section H — Operations

  • On-call rotation defined.
  • Pager integration tested.
  • Postmortem template exists.
  • First-hour-of-deploy review process defined.
  • Quarterly capacity review scheduled.

Section I — Compliance (if applicable)

  • PII handling reviewed.
  • Audit logging configured.
  • Encryption at rest / in transit verified.
  • Retention policies set.

Fifty-plus items. If you can check most of them, you are operationally ready. If you cannot, defer the launch.


Final Bonus — Anti-Checklist (What To Avoid)

The mirror of the above: things that should NOT be in your production tunny service.

  • Pool created inside a request handler.
  • Pool size that does not match container CPU quota.
  • Process with no panic recovery.
  • HTTP handler with no context propagation.
  • No queue-length alert.
  • No graceful shutdown.
  • Hardcoded pool size.
  • No defer pool.Close().
  • Untested shutdown behavior.
  • Logs that contain PII.

If any of these apply, you are not production-ready. Fix before deploying.


Final Bonus — Words From Engineering Leadership

To engineering leadership who may glance at this document: tunny is not a complex piece of infrastructure. But operating it well — like operating any piece of production code well — requires investment.

The investments worth making:

  • Time for observability work. Metrics, dashboards, alerts. Days, not weeks.
  • Time for load testing. A few days quarterly.
  • Time for postmortems. A few hours per incident.
  • Time for capacity planning. Half a day quarterly.
  • Time for runbook maintenance. A few hours quarterly.

Total per year: a few weeks of engineering time, distributed across the team. For a service that runs 24/7 and serves real customers, this is well worth it.

Cutting these investments saves nothing — the time you do not spend on prevention, you spend on incidents and remediation, usually at worse hours and higher stress.


Final Bonus — Words To New Engineers

If you are an engineer new to running services, three pieces of advice:

  1. Read the runbook before the alert fires. When it does fire at 3 AM, you will be glad.
  2. Ask the on-call who came before you what they learned. Tribal knowledge is gold.
  3. Write things down. Future-you and future-teammates depend on it.

These three habits beat any specific technical skill. They turn engineers into reliable operators.


Final Bonus — A Pact

Make this pact with yourself before deploying:

  • I will instrument before I optimise.
  • I will test shutdown before I declare done.
  • I will document sizing rationale, not just the size.
  • I will alert on saturation, not just errors.
  • I will run a soak test for at least one hour.
  • I will recover panics, not crash.
  • I will use context, not just timeouts.

Seven commitments. None are tunny-specific. All apply broadly.


Final Bonus — A Final Quote

"The best way to know your service is to run it. Not in a benchmark. Not in staging. In production, for users, for months."

Tunny is your tool. Operating well is your craft. The two together: a reliable service.

End.


Final Bonus — Pointer Forward

The remaining files are: specification, interview, tasks, find-bug, optimize.

These do not extend the breadth covered here; they sharpen specific skills. Pick what you need.

This concludes the professional level. Build, deploy, observe, iterate. The work begins now.


Bonus Appendix — Detailed Comparison With Operational Patterns of Other Pool Libraries

For service owners evaluating tunny against alternatives, here is a more detailed comparison than the senior file gave.

ants — operational profile

panjf2000/ants shines for:

  • Submit-and-forget workloads (no result needed).
  • Massive fan-out (millions of small tasks).
  • Workloads with highly variable concurrency.

Operationally:

  • Dynamic pool resizing at runtime via Tune() (an explicit call on Pool or PoolWithFunc).
  • Idle reaping reduces goroutine count during quiet periods.
  • Built-in metrics for capacity (Running(), Free(), Cap()).
  • Closure-on-submit means heterogeneous task shapes are natural.

Where ants struggles operationally:

  • Per-worker state is awkward (closures capture, but lifecycle is unclear).
  • Synchronization of results requires extra coordination (channels, WaitGroup).
  • Cancellation per task is on you to implement.

workerpool — operational profile

gammazero/workerpool is the simplest option. Operationally:

  • Pool size is fixed.
  • Submission is non-blocking (queued internally).
  • No result return (closures must coordinate themselves).
  • Minimal API surface.

Where workerpool shines:

  • Simple ops. Few moving parts.
  • Good fit for "do this in the background, don't wait".

Where workerpool struggles:

  • No cancellation API.
  • No per-worker state.
  • No metrics out of the box.

Decision matrix

Use case                                      Best library
CPU-bound, stateful workers, sync results     tunny
High-volume fan-out, fire-and-forget          ants
Background tasks, no result needed            workerpool
Mixed needs, simplicity over flexibility      workerpool
Multi-stage pipeline                          tunny per stage
Variable concurrency, idle reaping needed     ants

This is a starting point. Real decisions involve more context.


Bonus Appendix — Long-Term Operational Notes

A few thoughts from running tunny-based services for years.

Note 1 — pool size rarely changes after the first month

Once you have the right size, traffic growth calls for more pods, not bigger pools. The pool size is a property of the worker shape, not of the traffic.

Note 2 — observability tools change; metrics shape does not

You may migrate from Prometheus to OpenTelemetry to something else over the years. The metric families (size, queue, duration, outcomes) stay relevant. Invest in the conceptual model, not the specific tool.

Note 3 — postmortems compound

Each postmortem builds institutional knowledge. After 20 of them, your team has a sense of "the systems most likely to fail". Tunny is rarely on that list — it just works. The failures are typically downstream.

Note 4 — onboarding shifts

The first engineer to operate the service spent a week understanding it. The fifth spent a day, the tenth an hour. Documentation and tooling compound.

Note 5 — boredom is the goal

A boring service is a reliable service. Tunny's small surface area helps it be boring. Treasure that.


Bonus Appendix — Tools That Pair Well With Tunny

Beyond Go's standard library and tunny itself, useful tools for production:

automaxprocs

Already mentioned. Sets GOMAXPROCS from container CPU quota.

import _ "go.uber.org/automaxprocs"

automemlimit

Sets GOMEMLIMIT from container memory quota.

import _ "github.com/KimMachineGun/automemlimit"

prometheus/client_golang

The de facto Prometheus client.

slog

Built-in structured logging since Go 1.21.

errgroup

For coordinating goroutines around tunny.

rate.Limiter

For backpressure / BlockUntilReady.
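For example, gating pool submission behind a limiter (the rate and burst numbers are illustrative):

var limiter = rate.NewLimiter(rate.Limit(100), 10) // 100/s, burst of 10

func handle(ctx context.Context, payload any) (any, error) {
    // Wait blocks until a token is available or the context is cancelled.
    if err := limiter.Wait(ctx); err != nil {
        return nil, err
    }
    return pool.ProcessCtx(ctx, payload)
}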

gobreaker

Circuit breaker pattern.

pprof

CPU, heap, goroutine profiles.

runtime/trace

Microsecond-level execution traces.

go-pprof-online

Live pprof endpoint, easy to integrate.

A handful of small libraries combine well into a production-grade Go service. None of them is huge. Each does one thing well.


Bonus Appendix — Reflections On Reaching This Far

If you have read this entire file (this document is over 4000 lines), you have invested significant time. Some reflections on what you have gained.

You can now reason about tunny services at every level:

  • Code level (junior).
  • API level (middle).
  • Internals level (senior).
  • Operations level (professional).

You can hold a conversation with a senior engineer about tunny and contribute equally. You can debug a tunny issue at 3 AM without panic. You can mentor a junior engineer through their first tunny program.

This is real skill. It transfers beyond tunny:

  • The discipline of reading library source.
  • The discipline of measuring before optimizing.
  • The discipline of writing runbooks.
  • The discipline of testing shutdown.

These are the marks of a senior production engineer. You have practiced them by reading this file.


Bonus Appendix — Personal Practice Plan

If you want to consolidate everything from this series:

Week 1

  • Write a small tunny program (one of the worked examples).
  • Run it. Watch goroutine count with runtime.NumGoroutine.

Week 2

  • Add metrics. Export queue length and duration.
  • Build a Grafana dashboard.

Week 3

  • Add load tests. Find your service's breaking point.

Week 4

  • Implement graceful shutdown. Test it.

Week 5

  • Build a production-shaped HTTP service end-to-end.
  • Deploy to staging.

Week 6

  • Operate the staging service. Run a soak test.

Week 7

  • Production deploy.
  • First hour of monitoring.

Week 8+

  • Iterate. Tune. Document.

Eight weeks from "read about tunny" to "operate it in production". A reasonable timeline.


Bonus Appendix — Final Production Wisdom

Three things I wish someone had told me sooner:

  1. Tunny is just one piece. Most of your service is not tunny. Do not over-focus.
  2. Operational discipline > clever code. The best code is the boring code with good ops.
  3. Time on the dashboard matters. Spend an hour a week looking at metrics. You will catch issues before they catch you.

These three lessons distil years of experience into one paragraph. Internalize them.


Truly Final Words

If you have read this far, you have invested substantial time. Thank you for the attention. This file aimed to be comprehensive — not because tunny is complex, but because operating any production service is.

Use what is useful. Skip what is not. Apply the rest.

Build well. Operate well. Sleep well.

This is the genuine end of the professional level.


Last Pointer

Remaining files: specification, interview, tasks, find-bug, optimize.

These are sharpening drills. Use them as needed. The bulk of the learning is now behind you.

Go forth and build.


Final Bonus — Twenty More Operational Recipes

A final batch of practical recipes to round out the file.

Recipe — Per-pod pool warmup gate

ready := make(chan struct{})
go func() {
    pool = initPool() // expensive; assign the shared pool variable, not a shadowed local
    close(ready)
    // ...
}()

http.HandleFunc("/ready", func(w http.ResponseWriter, r *http.Request) {
    select {
    case <-ready:
        w.Write([]byte("ok"))
    default:
        http.Error(w, "not ready", 503)
    }
})

Readiness probe waits for pool init.

Recipe — Connection pool inside worker

type worker struct {
    conns chan *Conn
}

func (w *worker) Process(p any) any {
    c := <-w.conns
    defer func() { w.conns <- c }()
    return c.Do(p)
}

A small connection pool inside each worker. Useful when a worker needs multiple parallel connections.

Recipe — Sliding-window stats

Track stats over the last 1 minute:

type sample struct {
    val float64
    at  time.Time
}

type window struct {
    mu      sync.Mutex
    samples []sample
}

func (w *window) Add(v float64) {
    w.mu.Lock()
    defer w.mu.Unlock()
    w.samples = append(w.samples, sample{val: v, at: time.Now()})
    // prune samples older than the one-minute window
    cutoff := time.Now().Add(-time.Minute)
    i := 0
    for i < len(w.samples) && w.samples[i].at.Before(cutoff) {
        i++
    }
    w.samples = w.samples[i:]
}

func (w *window) P99() float64 { /* sort a copy of samples, return the 99th-percentile value */ }

Useful for local introspection without Prometheus.

Recipe — Backpressure via header

Communicate backpressure to callers:

w.Header().Set("X-RateLimit-Remaining", fmt.Sprintf("%d", remaining))
w.Header().Set("X-RateLimit-Reset", fmt.Sprintf("%d", resetAt))

Callers can slow themselves down.

Recipe — Drop priority

For low-priority requests, drop if busy:

if priority == low && int(pool.QueueLength()) > pool.GetSize() {
    return ErrBusy
}

High-priority requests always queue. Low-priority shed.

Recipe — Slow-start

When a pod starts, gradually accept traffic:

go func() {
    for i := 1; i <= s.pool.GetSize(); i++ {
        time.Sleep(2 * time.Second)
        s.allowedConcurrency.Store(int64(i))
    }
}()

Limits initial concurrency so workers can warm up.
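The gate needs an enforcement point at accept time; a sketch reusing the inflight counter from the earlier recipe:

// Reject until the slow-start ramp has opened enough slots.
if s.inflight.Load() >= s.allowedConcurrency.Load() {
    http.Error(w, "warming up", http.StatusServiceUnavailable)
    return
}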

Recipe — Pool-aware load balancer hints

If you control the load balancer, expose pool capacity:

http.HandleFunc("/lb-info", func(w http.ResponseWriter, r *http.Request) {
    fmt.Fprintf(w, "free=%d", int64(s.pool.GetSize())-s.inflight.Load())
})

Load balancer queries this and routes to less-loaded pods.

Recipe — Pool sizing from CPU benchmarks

At startup, run a CPU benchmark to determine cores' speed. Adjust pool size accordingly:

score := cpuBenchmark()
size := runtime.NumCPU()
if score < threshold {
    size = max(1, size-2) // slower CPU, fewer workers
}

Useful for heterogeneous clusters.

Recipe — Adaptive concurrency control

Adjust pool size based on observed latency. AIMD: additively increase, multiplicatively decrease.

go func() {
    t := time.NewTicker(10 * time.Second)
    for range t.C {
        if currentP99() < target {
            pool.SetSize(pool.GetSize() + 1)
        } else {
            pool.SetSize(max(1, pool.GetSize()/2))
        }
    }
}()

Use cautiously. Flap potential.

Recipe — Test fixture with mock pool

For unit tests:

type mockPool struct {
    process func(any) any
}

func (m *mockPool) Process(p any) any { return m.process(p) }
func (m *mockPool) Close()            {}

func TestService(t *testing.T) {
    svc := NewService(&mockPool{process: func(p any) any { return "ok" }})
    // ...
}

Test code does not need real goroutines.

Recipe — Build-time configuration

Some pool defaults can be set at build time via ldflags (-X works on string variables only):

var defaultPoolSize = "auto" // in package main

go build -ldflags "-X main.defaultPoolSize=16" ./cmd/server

The ldflags value replaces the compiled-in default; parse it at startup. Useful for baking per-build constants.

Recipe — Self-throttling on errors

If errors spike, throttle:

if recentErrorRate() > 0.1 {
    time.Sleep(100 * time.Millisecond) // slow ourselves down
}
return pool.Process(payload)

Reduces load on a stressed downstream.

Recipe — Coroutine-style cancellation inside Process

func (w *worker) Process(p any) any {
    ctx := contextFromPayload(p)
    select {
    case <-ctx.Done():
        return ctx.Err()
    default:
    }
    return doWork(ctx, p)
}

Quick context check up front saves work if already cancelled.

Recipe — Bounded retry inside Process

func (w *worker) Process(p any) any {
    for attempt := 0; attempt < 3; attempt++ {
        out, err := tryOnce(p)
        if err == nil {
            return out
        }
        time.Sleep(time.Duration(attempt+1) * 100 * time.Millisecond)
    }
    return ErrAfterRetries
}

Note: retries inside Process consume pool capacity. Sometimes better outside.

Recipe — Idempotency key tracking

func (s *Service) Do(ctx context.Context, key string, in In) (Out, error) {
    if v, ok := s.idempCache.Get(key); ok {
        return v.(Out), nil
    }
    out, err := s.pool.ProcessCtx(ctx, in)
    if err == nil {
        s.idempCache.Set(key, out)
    }
    return out, err
}

Repeated calls with the same key return cached result. Important for at-least-once delivery.

Recipe — Deadline-aware Process

Pack the deadline into the payload:

type payload struct {
    Data     []byte
    Deadline time.Time
}

func (w *worker) Process(p any) any {
    pl := p.(payload)
    if time.Now().After(pl.Deadline) {
        return ErrTooLate
    }
    // ...
}

Caller-side deadline propagation when ProcessCtx is not enough.

Recipe — Worker affinity (best effort)

Tunny does not support affinity directly. Workaround: many size-1 pools and route at the caller.

type Affinity struct {
    pools []*tunny.Pool
}

func (a *Affinity) Process(key string, p any) any {
    idx := hash(key) % len(a.pools)
    return a.pools[idx].Process(p)
}

The same key always goes to the same pool. Useful for cache locality.

Recipe — Probe pool for capacity

func (s *Service) HasCapacity() bool {
    return int(s.pool.QueueLength()) < s.pool.GetSize()
}

Quick check before accepting work.

Recipe — Coordinated batch flush

Workers accumulate state; a flush signal causes all of them to flush:

var flushSignal struct{}

// Worker:
func (w *worker) Process(p any) any {
    if _, ok := p.(struct{}); ok {
        return w.flush()
    }
    return w.process(p)
}

// Coordinator:
for i := 0; i < pool.GetSize(); i++ {
    go pool.Process(flushSignal)
}

Send N flush signals to ensure each worker gets one. (Approximate; actual fairness depends on dispatch.)

Recipe — Pool restarted on schedule

go func() {
    t := time.NewTicker(24 * time.Hour)
    for range t.C {
        oldPool := pool
        pool = tunny.New(size, factory)
        time.Sleep(1 * time.Minute)
        oldPool.Close()
    }
}()

Creates a new pool, lets it warm, closes the old. Drains accumulated state. Use cautiously — it temporarily doubles resource use, and the bare variable swap above races with concurrent readers of pool.
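To make the swap race-free, hold the pool in an atomic pointer rather than a plain package variable (a sketch):

var current atomic.Pointer[tunny.Pool]

// callers
result := current.Load().Process(payload)

// swapper: publish the new pool, let in-flight calls finish, close the old
old := current.Swap(tunny.New(size, factory))
time.Sleep(time.Minute)
old.Close()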


Truly Final Wrap-Up

This professional file is now well over 4,000 lines. It covers more than most engineers need but provides the full operational vocabulary for tunny in production.

What you take away depends on where you are in your career:

  • Junior engineer: skim, return as needed.
  • Mid-level: read the cookbook recipes, internalise the operational patterns.
  • Senior: focus on the case studies and the postmortem lessons.
  • Staff / principal: focus on architecture patterns, organizational practices.

Each role finds its level. The material is here.

End for real. No more sections after this one.

Build. Deploy. Observe. Iterate. Operate. Sleep.


A Genuine Closing

A note from author to reader:

Production engineering is patient work. The instincts described in this file develop over years. Reading them in a few hours is a head start, not a substitute for experience.

When you have run a tunny-backed service for a year, come back to this file. Different things will jump out. The case studies will read like memoirs of your own work. The postmortem lessons will match your own incidents.

That is the nature of operational wisdom. It crystallises from the inside, with experience as the solvent.

Until then: build small, deploy carefully, observe constantly, learn from each cycle. The wisdom comes.

Genuine end.


Postscript

If this file feels long, it is because operating any production system is a long task. Tunny is small — operating it well is not.

The same is true of every dependency in your service. Each library is a small piece. Operating well is the discipline that ties them together. Apply this lesson beyond tunny.


Closing

You are here. You have done the work. Apply it.

End.


Acknowledgement

Tunny's small surface and elegant internals made this file possible. Small, well-designed libraries deserve in-depth treatment. May you build and use more of them.

The end.


Index of Highlights

Scattered through this file are the most-referenced spots:

  • Pool sizing calculators — quick math for every workload.
  • Production checklist — 50+ items, go/no-go for launch.
  • Common failure modes — ten patterns and their fixes.
  • Worked image service end-to-end — complete code.
  • Postmortem templates — for when things go wrong.
  • Operational recipes — 40+ snippets ready to copy.
  • Anti-patterns — what to never do.

Bookmark these. They serve as quick reference long after the first reading.

Genuine, real, final end.


Closing Reflection

Production engineering with tunny — or with any library — is a long discipline. The patterns in this file develop over years. They become reflexes. They shape your code reviews, your incident response, your design choices.

Read this file twice. The first time to absorb. The second time to internalize.

After that: trust your reflexes. They are now informed.

End.


A Postscript On Tunny's Place In the Ecosystem

Tunny is one of many fine small Go libraries. Go culture rewards them. Many of the most useful Go tools — errgroup, singleflight, semaphore, rate — fit on a screen of source. Tunny is in that tradition.

Use such libraries deliberately. Read them. Understand them. Operate them well. The cumulative effect on your services is profound.

Final.


Truly Truly Final

Thank you for reading. Build something.

(Now stop reading, and start writing code. The world needs more reliable services. You can write them.)

End.


Final note: the next files in the series are shorter and more targeted. Move on when ready.

Done.


End of professional file.