Dynamic Worker Scaling — Specification¶
Table of Contents¶
- Introduction
- Worker Pool Semantics
- Resize Operation Contract
- ants v2 Public API
- tunny Public API
- pond Public API
- Channel Operations
- Atomic Operation Guarantees
- GOMAXPROCS Interaction
- Runtime Knobs
- Memory Model Implications
- References
Introduction¶
This specification defines the formal contracts of dynamic worker pool operations: what Resize/Tune guarantees, what they do not guarantee, how external coordination interacts with Go's runtime, and the public APIs of major pool libraries.
Where applicable, this references the Go language specification, the Go memory model, and the documentation of panjf2000/ants, Jeffail/tunny, and alitto/pond.
Worker Pool Semantics¶
Definition¶
A worker pool is a set of goroutines that consume tasks from a shared queue or per-worker mailbox. The number of workers is bounded.
A dynamic worker pool is a worker pool whose worker count (capacity) can change at runtime.
Invariants¶
For a correctly-implemented dynamic worker pool:
- The number of in-flight workers at any time is at most the configured capacity.
- A task submitted via Submit either runs to completion exactly once, or fails immediately with an error/panic. It is never partially run.
- Resize(n) is idempotent within the same value: two Resize(5) calls have the same effect as one.
- Resize operations are linearizable: there is a global order across all resize and worker operations.
- After Close returns, no new tasks will run.
Non-invariants¶
The following are not guaranteed:
- Resize takes effect immediately. Shrink in particular is often opportunistic.
- Two tasks submitted in order will execute in order.
- A task submitted before Resize(smaller) will run before the resize takes effect.
- The pool's live worker count equals its capacity at every instant.
Resize Operation Contract¶
Grow¶
When called with target > current:
- New workers MAY be spawned synchronously (before Resize returns) or asynchronously.
- New workers are functionally identical to existing workers.
- The pool's capacity is updated atomically before Resize returns.
Shrink¶
When called with target < current:
- The pool's capacity is updated atomically before Resize returns.
- Existing workers SHOULD exit when they next become idle. They MUST NOT abandon in-flight tasks.
- The shrink completes only as in-flight tasks finish; with long-running tasks it may take arbitrarily long.
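The grow and shrink contracts above can be sketched in a few lines — a minimal illustration of the spec, not any particular library's implementation:

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// pool is a minimal dynamic pool: capacity is updated atomically,
// grow is synchronous, shrink is opportunistic.
type pool struct {
	capacity atomic.Int64
	live     atomic.Int64
	tasks    chan func()
}

func newPool(n int) *pool {
	p := &pool{tasks: make(chan func(), 64)}
	p.capacity.Store(int64(n))
	for i := 0; i < n; i++ {
		p.spawn()
	}
	return p
}

func (p *pool) spawn() {
	p.live.Add(1)
	go func() {
		defer p.live.Add(-1)
		for task := range p.tasks {
			task() // in-flight tasks are never abandoned
			// Opportunistic shrink: exit once idle if over capacity.
			// This check races benignly: the live count may briefly
			// dip below capacity, matching the non-invariants above.
			if p.live.Load() > p.capacity.Load() {
				return
			}
		}
	}()
}

// Resize updates the capacity atomically before returning; on grow,
// new workers are spawned synchronously here.
func (p *pool) Resize(n int) {
	old := p.capacity.Swap(int64(n))
	for i := old; i < int64(n); i++ {
		p.spawn()
	}
}

func main() {
	p := newPool(4)
	p.Resize(2) // capacity is 2 now; workers exit as they next go idle
	fmt.Println(p.capacity.Load()) // 2
}
```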
Bounds¶
A pool MAY enforce internal bounds (floor, ceiling). If Resize(target) exceeds bounds, the pool MAY clamp silently or return an error.
Concurrency¶
Multiple concurrent Resize calls SHOULD be serialized (via a mutex or atomic swap). The final capacity SHOULD reflect the last call in the serialization order.
Atomicity¶
The pool's Size() method returns the current live worker count, which may not match the most recent Resize target due to opportunistic shrink.
ants v2 Public API¶
From panjf2000/ants v2 (the most widely-used Go pool library):
Pool¶
type Pool struct { ... }
func NewPool(size int, options ...Option) (*Pool, error)
Constructs a pool with initial capacity size. Returns a pool ready to accept submissions.
Methods¶
func (p *Pool) Submit(task func()) error
Submits a task. Returns:
- nil on success
- ErrPoolClosed if the pool is closed
- ErrPoolOverload if the pool is at capacity and non-blocking
func (p *Pool) Tune(size int)
Atomically updates the pool's capacity to size. Lazy shrink: workers exit opportunistically. Lazy grow: new workers spawn on demand. Non-positive sizes are ignored; unlimited capacity is set at construction with NewPool(-1).
func (p *Pool) Running() int
Returns the number of workers currently executing tasks.
func (p *Pool) Cap() int
Returns the current capacity (as set by NewPool or Tune).
func (p *Pool) Free() int
Returns Cap() - Running(). May be negative briefly during transitions.
func (p *Pool) Release()
Closes the pool. Existing tasks finish; no new tasks are accepted.
func (p *Pool) ReleaseTimeout(timeout time.Duration) error
Like Release() but waits for workers to exit. Returns an error if the pool is not drained in time.
func (p *Pool) IsClosed() bool
Returns true after Release was called.
Options¶
type Option func(*Options)
func WithPreAlloc(preAlloc bool) Option
func WithExpiryDuration(expiryDuration time.Duration) Option
func WithNonblocking(nonblocking bool) Option
func WithMaxBlockingTasks(maxBlockingTasks int) Option
func WithPanicHandler(panicHandler func(interface{})) Option
func WithLogger(logger Logger) Option
func WithDisablePurge(disable bool) Option
| Option | Default | Effect |
|---|---|---|
| PreAlloc | false | Allocate workers upfront vs on demand |
| ExpiryDuration | 1s | Idle worker exit timer |
| Nonblocking | false | Submit returns error vs blocks when at cap |
| MaxBlockingTasks | 0 (unlimited) | Limit on Submit-blocking goroutines |
| PanicHandler | nil | Function called when worker panics |
| Logger | nil | Custom logger |
| DisablePurge | false | Disable idle worker purging |
PoolWithFunc¶
type PoolWithFunc struct { ... }
func NewPoolWithFunc(size int, pf func(interface{}), options ...Option) (*PoolWithFunc, error)
Pool where every task is the same function. Faster (no closure allocation per task).
func (p *PoolWithFunc) Invoke(args interface{}) error
Submits an invocation with args as the argument. Returns ErrPoolOverload, ErrPoolClosed, etc.
MultiPool¶
type MultiPool struct { ... }
func NewMultiPool(size, sizePerPool int, ls LoadBalancingStrategy, options ...Option) (*MultiPool, error)
Sharded pool: N internal pools, work distributed by load-balancing strategy.
Default pool¶
ants exposes a package-level default pool for convenience: ants.Submit, ants.Running, ants.Cap, and ants.Release operate on it.
tunny Public API¶
From Jeffail/tunny:
Pool¶
type Pool struct { ... }
func New(n int, ctor func() Worker) *Pool
func NewFunc(n int, f func(interface{}) interface{}) *Pool
func NewCallback(n int) *Pool
New takes a Worker factory. NewFunc wraps a plain function. NewCallback creates a pool whose payloads are cast to func() and invoked directly.
Methods¶
func (p *Pool) Process(payload interface{}) interface{}
Synchronously submits and waits for the result. Blocks if the pool is busy.
func (p *Pool) ProcessTimed(payload interface{}, timeout time.Duration) (interface{}, error)
Like Process but with a timeout. Returns ErrJobTimedOut if the job does not complete in time.
func (p *Pool) ProcessCtx(ctx context.Context, payload interface{}) (interface{}, error)
Process with a context. Returns ctx.Err() if cancelled.
func (p *Pool) SetSize(n int)
Changes the worker count. Eager: shrinking blocks until excess workers terminate.
func (p *Pool) GetSize() int
Returns the current number of workers.
func (p *Pool) QueueLength() int64
Returns the number of tasks queued (waiting for an idle worker).
func (p *Pool) Close()
Closes the pool. All workers are terminated.
Worker¶
type Worker interface {
Process(payload interface{}) interface{}
BlockUntilReady()
Interrupt()
Terminate()
}
Implementations provide per-worker state and lifecycle hooks.
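A type satisfying this interface might look like the following sketch; connWorker is a hypothetical example, not part of tunny:

```go
package main

import "fmt"

// Worker reproduces the interface above so the sketch is self-contained.
type Worker interface {
	Process(payload interface{}) interface{}
	BlockUntilReady()
	Interrupt()
	Terminate()
}

// connWorker is a hypothetical implementation holding per-worker state
// (a counter standing in for e.g. a pooled database connection).
type connWorker struct {
	processed int
}

func (w *connWorker) Process(payload interface{}) interface{} {
	w.processed++
	return payload
}
func (w *connWorker) BlockUntilReady() {} // no per-job setup required
func (w *connWorker) Interrupt()       {} // nothing cancellable in flight
func (w *connWorker) Terminate()       {} // no resources to release

var _ Worker = (*connWorker)(nil) // compile-time interface check

func main() {
	w := &connWorker{}
	fmt.Println(w.Process("job"), w.processed) // job 1
}
```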
pond Public API¶
From alitto/pond:
Pool¶
func New(maxWorkers, maxCapacity int, options ...Option) *WorkerPool
Constructs a pool with up to maxWorkers workers and a buffered queue of maxCapacity tasks.
Methods¶
func (p *WorkerPool) Submit(task func())
Submits a task. Blocks if the queue is full (unless configured otherwise).
func (p *WorkerPool) TrySubmit(task func()) bool
Non-blocking submit. Returns false if the queue is full.
func (p *WorkerPool) SubmitAndWait(task func())
Submits a task and waits for it to complete.
func (p *WorkerPool) Stop()
func (p *WorkerPool) StopAndWait()
Close the pool. StopAndWait blocks until all queued tasks complete.
func (p *WorkerPool) RunningWorkers() int
func (p *WorkerPool) IdleWorkers() int
func (p *WorkerPool) SubmittedTasks() uint64
func (p *WorkerPool) CompletedTasks() uint64
func (p *WorkerPool) WaitingTasks() uint64
Various counters and gauges.
Groups¶
func (p *WorkerPool) Group() *TaskGroup
func (p *WorkerPool) GroupContext(ctx context.Context) (*TaskGroupWithContext, context.Context)
Task groups (like errgroup) for batch operations.
Options¶
type Option func(*WorkerPool)
func IdleTimeout(d time.Duration) Option
func MinWorkers(n int) Option
func PanicHandler(f func(interface{})) Option
Channel Operations¶
Dynamic pools rely on channel semantics:
Send¶
- Blocks if channel is unbuffered and no receiver is ready
- Blocks if channel is buffered and full
- Panics if channel is closed
- Atomic with respect to other operations on the same channel
Receive¶
- Blocks if channel is empty
- Returns (zero, false) if channel is closed and drained
- Returns (value, true) otherwise
Close¶
- Subsequent sends panic
- Subsequent receives return (zero, false) once buffer is drained
- Closing a closed channel panics
Select¶
- Picks one ready case; if none, default (if present) or blocks
- Selection is pseudo-random among ready cases
These semantics define how pools' job channels behave under contention.
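The receive, close, and select semantics above can be observed directly:

```go
package main

import "fmt"

func main() {
	ch := make(chan int, 2)
	ch <- 1
	ch <- 2
	close(ch)

	// Receives drain the buffer first, then report closed via ok == false.
	v, ok := <-ch
	fmt.Println(v, ok) // 1 true
	v, ok = <-ch
	fmt.Println(v, ok) // 2 true
	v, ok = <-ch
	fmt.Println(v, ok) // 0 false

	// A receive on a closed, drained channel is always ready, so select
	// picks it over default.
	select {
	case v, ok := <-ch:
		fmt.Println(v, ok) // 0 false
	default:
		fmt.Println("no case ready")
	}
}
```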
Atomic Operation Guarantees¶
From sync/atomic:
Loads and Stores¶
- Sequentially consistent: the Go memory model (as revised for Go 1.19) specifies that atomic operations behave as if sequentially consistent
- Slower than non-atomic access but much cheaper than a mutex
Compare-and-swap¶
- Atomic check-and-update: CompareAndSwap(addr, old, new) replaces *addr with new only if it currently equals old
- Returns true if the swap happened, false otherwise (no write occurs)
Add¶
- Atomic increment, returns new value
- Combines load, modify, store into one operation
Typed atomic (Go 1.19+)¶
The typed wrappers (atomic.Int32, atomic.Int64, atomic.Bool, atomic.Pointer[T]) expose the same operations as methods. Cleaner syntax, same semantics.
Pools use atomics for counters (running, cap, busy). The combination of atomic + mutex (for cond, free list) is standard.
GOMAXPROCS Interaction¶
runtime.GOMAXPROCS(n) sets the maximum number of OS threads simultaneously executing Go code.
Worker pools and GOMAXPROCS interact:
- CPU-bound pool: ideal size is approximately GOMAXPROCS. More workers cause scheduler overhead.
- I/O-bound pool: workers spend time blocked on syscalls. Can have many more workers than GOMAXPROCS.
- The Go scheduler multiplexes goroutines onto threads. If you have 1000 workers but GOMAXPROCS=4, at most 4 workers run simultaneously; the rest wait.
In containerized environments, GOMAXPROCS defaults to the host CPU count, which may exceed the container's CFS quota. Use uber-go/automaxprocs to align with container limits.
For pool autoscaling decisions:
- CPU-bound: ceiling ≈ GOMAXPROCS * 1.5 (allow brief over-subscription)
- I/O-bound: ceiling based on memory and downstream limits, often much higher
Runtime Knobs¶
Go runtime knobs that affect worker pools:
GOGC¶
GOGC sets the garbage-collection target percentage (default 100). Higher = less frequent GC, more memory. Lower = more frequent GC, less memory.
Pools with many short-lived allocations benefit from tuning GOGC. Try GOGC=200 for less GC pressure.
GOMEMLIMIT¶
Soft memory limit (Go 1.19+). The GC runs more aggressively as total memory approaches the limit; it is not a hard cap.
For pools, GOMEMLIMIT prevents OOM during burst-grow.
GODEBUG¶
GODEBUG=schedtrace=1000 (print scheduler trace every 1000 ms)
GODEBUG=gctrace=1 (print GC trace each cycle)
Useful for diagnosing scheduler or GC issues.
runtime/debug¶
debug.SetMaxThreads(n) // max OS threads
debug.SetGCPercent(n) // set GOGC at runtime
debug.SetMemoryLimit(b) // set GOMEMLIMIT at runtime
For dynamic tuning from within the program.
Memory Model Implications¶
The Go memory model (go.dev/ref/mem) defines when one goroutine's writes are visible to another.
Happens-before¶
A Resize(n) that updates the capacity via an atomic store establishes a happens-before relationship with a worker's subsequent atomic load of the capacity.
For pool implementations:
- atomic.Store(&capacity, n) happens-before an atomic.Load(&capacity) that returns n
- A channel send happens-before the corresponding receive
- A sync.Mutex.Unlock() happens-before the next Lock() that succeeds
Implications¶
If your pool uses non-atomic writes to capacity but atomic reads in workers, you have a data race. The race detector will flag this.
Always use atomic operations or hold a mutex consistently for shared state.
sync.Cond¶
Wait releases the cond's mutex and blocks. Signal wakes one waiter; Broadcast wakes all.
ants uses sync.Cond to coordinate between submitters and workers when blocking is enabled.
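A minimal cond-based handshake in that style — a sketch of the pattern, not ants's actual code:

```go
package main

import (
	"fmt"
	"sync"
)

// waitQueue sketches the submitter/worker handshake: submitters wait on
// a cond until a worker signals that a capacity slot is free.
type waitQueue struct {
	mu   sync.Mutex
	cond *sync.Cond
	free int
}

func newWaitQueue(capacity int) *waitQueue {
	q := &waitQueue{free: capacity}
	q.cond = sync.NewCond(&q.mu)
	return q
}

// acquire blocks until a slot is free (submitter side).
func (q *waitQueue) acquire() {
	q.mu.Lock()
	for q.free == 0 { // always re-check the condition after waking
		q.cond.Wait() // atomically unlocks, blocks, relocks on wake
	}
	q.free--
	q.mu.Unlock()
}

// release frees a slot and wakes one waiting submitter (worker side).
func (q *waitQueue) release() {
	q.mu.Lock()
	q.free++
	q.mu.Unlock()
	q.cond.Signal()
}

func main() {
	q := newWaitQueue(1)
	q.acquire()
	done := make(chan struct{})
	go func() {
		q.acquire() // blocks until main releases the slot
		close(done)
	}()
	q.release()
	<-done
	fmt.Println("handshake complete")
}
```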
References¶
- Go Language Specification: https://go.dev/ref/spec
- Go Memory Model: https://go.dev/ref/mem
- sync/atomic package: https://pkg.go.dev/sync/atomic
- sync package: https://pkg.go.dev/sync
- runtime package: https://pkg.go.dev/runtime
- runtime/debug package: https://pkg.go.dev/runtime/debug
- panjf2000/ants: https://github.com/panjf2000/ants
- Jeffail/tunny: https://github.com/Jeffail/tunny
- alitto/pond: https://github.com/alitto/pond
- uber-go/automaxprocs: https://github.com/uber-go/automaxprocs
Related specifications¶
- AWS Auto Scaling (cluster-level reference)
- Kubernetes HPA design documents
- TCP congestion control (AIMD): RFC 5681
- Linux CFS (Completely Fair Scheduler) for thread-level scheduling context
Implementation references¶
- ants/pool.go: pool struct, lifecycle
- ants/worker.go: worker struct, run loop
- ants/worker_loop_queue.go: ring buffer free list
- ants/worker_stack.go: LIFO free list
- tunny/tunny.go: pool implementation
- pond/pond.go: pool implementation
These are the authoritative sources. Consult them when in doubt about exact behavior.