Wait-for-Empty-Channel — Optimization Scenarios¶
Nine scenarios where polling-based code can be optimized by replacing it with event-driven synchronization. Each scenario presents the slow version, the fast version, expected performance characteristics, and a brief discussion.
Numbers are indicative of typical results; your workload may vary. Always measure your own.
Scenario 1: Replace Polling Drain with Range¶
Before¶
```go
func drainSlow(ch chan int) {
	for len(ch) > 0 {
		<-ch
		time.Sleep(time.Microsecond) // small delay to avoid pegging CPU
	}
}
```
After¶
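A minimal sketch of the range-based drain, with a demo producer; it returns a count purely for demonstration, and it assumes the producer closes the channel, as the discussion below notes:

```go
package main

import "fmt"

// drainFast blocks on the channel instead of polling len(ch); the
// scheduler parks the goroutine whenever the channel is empty, and
// the range loop exits when the producer closes ch.
func drainFast(ch chan int) int {
	n := 0
	for range ch {
		n++
	}
	return n
}

func main() {
	ch := make(chan int, 100)
	go func() {
		for i := 0; i < 1000; i++ {
			ch <- i
		}
		close(ch) // required: without close, range never terminates
	}()
	fmt.Println(drainFast(ch))
}
```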
Performance¶
| Metric | Polling | Range |
|---|---|---|
| Time to drain 1M items | ~1.5s | ~30ms |
| CPU usage | High (busy poll) | Low (block) |
| Allocation overhead | Same | Same |
Discussion¶
The polling version both races (may miss items added concurrently) and wastes CPU. The range version is correct and uses the scheduler to suspend when no items are available. 50x speedup on drain time, near-zero CPU when idle.
The trick: the producer must close the channel for range to terminate. If your code does not close, that is the actual bug to fix first.
Scenario 2: Replace Polling-Based Wait with WaitGroup¶
Before¶
```go
func processSlow(items []int) []int {
	out := make(chan int, len(items))
	for _, item := range items {
		go func(item int) {
			out <- compute(item)
		}(item)
	}
	for len(out) < len(items) {
		time.Sleep(time.Millisecond)
	}
	var result []int
	for i := 0; i < len(items); i++ {
		result = append(result, <-out)
	}
	return result
}
```
After¶
```go
func processFast(items []int) []int {
	out := make(chan int)
	var wg sync.WaitGroup
	wg.Add(len(items))
	for _, item := range items {
		go func(item int) {
			defer wg.Done()
			out <- compute(item)
		}(item)
	}
	go func() {
		wg.Wait()
		close(out)
	}()
	var result []int
	for v := range out {
		result = append(result, v)
	}
	return result
}
```
Performance¶
| Metric | Polling | WaitGroup |
|---|---|---|
| Latency (P50) | +5 ms | <1 ms |
| Latency (P99) | +25 ms | <1 ms |
| CPU during wait | 50% of a core | <1% of a core |
| Items lost (race) | 0-3 per call | 0 |
Discussion¶
The polling version's latency tail is set by the poll interval plus scheduler delay: completion is only observed at the next 1 ms check, and under load the sleeping goroutine may be rescheduled late. The WaitGroup version exits as soon as the work is done. Both correctness and performance improve.
Scenario 3: Replace Polling Worker Pool with errgroup¶
Before¶
```go
type SlowPool struct {
	jobs    chan Job
	stopped int32
}

func (p *SlowPool) worker() {
	for atomic.LoadInt32(&p.stopped) == 0 {
		select {
		case j := <-p.jobs:
			process(j)
		default:
			time.Sleep(time.Millisecond)
		}
	}
}

func (p *SlowPool) Stop() {
	atomic.StoreInt32(&p.stopped, 1)
	for len(p.jobs) > 0 {
		time.Sleep(10 * time.Millisecond)
	}
}
```
After¶
```go
type FastPool struct {
	jobs chan Job
	g    *errgroup.Group
	ctx  context.Context
}

func NewFastPool(parent context.Context, workers int) *FastPool {
	g, ctx := errgroup.WithContext(parent)
	p := &FastPool{
		jobs: make(chan Job),
		g:    g,
		ctx:  ctx,
	}
	for i := 0; i < workers; i++ {
		g.Go(p.worker)
	}
	return p
}

func (p *FastPool) worker() error {
	for {
		select {
		case <-p.ctx.Done():
			return nil
		case j, ok := <-p.jobs:
			if !ok {
				return nil
			}
			process(j)
		}
	}
}

func (p *FastPool) Stop() error {
	close(p.jobs)
	return p.g.Wait()
}
```
Performance¶
| Metric | Slow | Fast |
|---|---|---|
| Throughput (jobs/sec) | 12,000 | 28,000 |
| Worker CPU when idle (per pool) | 8% / core | <0.1% |
| Shutdown time P99 | 3.5s | 25ms |
| Lines of code | ~50 | ~45 |
Discussion¶
The select/default polling in the worker is the bottleneck. Each iteration spins through a no-op default branch, sleeps 1ms, repeats. Even with no work, the pool burns CPU. Worse, Stop can hang forever: once stopped is set the workers exit, so nothing drains p.jobs, and if any jobs remain queued the len poll spins indefinitely.
The event-driven version uses select without default, so the goroutine parks until an event arrives. CPU goes to zero when idle.
Scenario 4: Replace Polling Shutdown with Bounded Wait¶
Before¶
```go
func (s *Server) ShutdownSlow() {
	s.stop()
	for s.activeRequests() > 0 {
		time.Sleep(100 * time.Millisecond)
	}
}
```
After¶
```go
func (s *Server) ShutdownFast(ctx context.Context) error {
	return s.srv.Shutdown(ctx) // stdlib graceful shutdown: returns when active requests finish or ctx expires
}
```
Performance¶
| Metric | Slow | Fast |
|---|---|---|
| Shutdown time P99 | 2.5s | 50ms |
| Shutdown time P99.9 | 12s | 200ms |
| Times deadline exceeded | 0.5% of runs | 0 |
Discussion¶
The polling version is quantized by its sleep interval: even when all requests complete, the wait persists until the next poll. The stdlib Shutdown returns as soon as the last request finishes.
Scenario 5: Replace Polling Drain with Token-Return Pattern¶
Before¶
```go
type SlowService struct {
	inFlight atomic.Int64
}

func (s *SlowService) Drain() {
	for s.inFlight.Load() > 0 {
		time.Sleep(time.Millisecond)
	}
}
```
After¶
```go
type FastService struct {
	tokens chan struct{}
}

func New(max int) *FastService {
	s := &FastService{
		tokens: make(chan struct{}, max),
	}
	for i := 0; i < max; i++ {
		s.tokens <- struct{}{}
	}
	return s
}

func (s *FastService) Do(fn func()) {
	<-s.tokens
	defer func() { s.tokens <- struct{}{} }()
	fn()
}

func (s *FastService) Drain() {
	for i := 0; i < cap(s.tokens); i++ {
		<-s.tokens
	}
}
```
Performance¶
| Metric | Slow | Fast |
|---|---|---|
| Drain time | Up to poll * N | Bounded |
| CPU during drain | 50% of core | <1% |
| Correctness | Racy | Deterministic |
Discussion¶
The polling version is racy: inFlight.Load() == 0 is checked atomically, but between the check and the next operation, new work can arrive (or in-flight work can re-enter). The token-return pattern receives exactly max tokens and is deterministic.
This pattern works for bounded resources (connection pools, rate limiters, semaphore-style limits).
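The token-return pattern above, runnable with a demo workload (the counter and goroutine count are illustrative):

```go
package main

import (
	"fmt"
	"sync"
)

type FastService struct{ tokens chan struct{} }

func NewFastService(max int) *FastService {
	s := &FastService{tokens: make(chan struct{}, max)}
	for i := 0; i < max; i++ {
		s.tokens <- struct{}{} // seed the semaphore with max tokens
	}
	return s
}

func (s *FastService) Do(fn func()) {
	<-s.tokens                                // acquire a token
	defer func() { s.tokens <- struct{}{} }() // return it when done
	fn()
}

// Drain receives exactly cap(tokens) tokens: it returns only when no
// work is in flight, with no polling and no race window.
func (s *FastService) Drain() {
	for i := 0; i < cap(s.tokens); i++ {
		<-s.tokens
	}
}

func main() {
	s := NewFastService(3)
	var mu sync.Mutex
	count := 0
	var wg sync.WaitGroup
	for i := 0; i < 10; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			s.Do(func() { mu.Lock(); count++; mu.Unlock() })
		}()
	}
	wg.Wait()
	s.Drain() // deterministic: collects all 3 tokens
	fmt.Println(count)
}
```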
Scenario 6: Replace Polling-Based Backpressure with Channel Backpressure¶
Before¶
```go
func produceSlow(ch chan<- int, source []int) {
	for _, v := range source {
		for len(ch) > 80 {
			time.Sleep(time.Millisecond)
		}
		ch <- v
	}
}
```
After¶
```go
func produceFast(ctx context.Context, ch chan<- int, source []int) error {
	for _, v := range source {
		select {
		case ch <- v:
		case <-ctx.Done():
			return ctx.Err()
		}
	}
	return nil
}
```
Performance¶
| Metric | Slow | Fast |
|---|---|---|
| Throughput | Limited by poll | Native |
| Latency P99 | +10 ms | <100 μs |
| CPU during backpressure | Wasted | None |
Discussion¶
The polling version checks "is there room?" and sleeps if not. The fast version sends and lets the channel block when full. The block is the backpressure. No CPU is wasted on polling; the goroutine is suspended by the scheduler.
For very high throughput pipelines, this single change can yield 2-3x throughput gains.
Scenario 7: Replace Polling-Based Initialization with sync.Once¶
Before¶
```go
var (
	initialized int32
	config      *Config
)

func GetConfig() *Config {
	for atomic.LoadInt32(&initialized) == 0 {
		time.Sleep(time.Millisecond)
	}
	return config
}
```
After¶
```go
var (
	configOnce sync.Once
	initDone   = make(chan struct{})
	config     *Config
)

func InitConfig() {
	configOnce.Do(func() {
		config = loadConfig()
		close(initDone)
	})
}

func GetConfig() *Config {
	<-initDone
	return config
}
```
Performance¶
| Metric | Polling | Once+Channel |
|---|---|---|
| First call latency | +0.5 ms | <1 μs |
| Subsequent call latency | negligible | <50 ns |
| CPU per call | Wasted poll | None |
Discussion¶
The polling version makes any caller that arrives before initialization sleep in 1 ms steps, paying up to one extra poll interval after init actually completes. The Once-based version wakes every waiter the instant initDone is closed, and later calls cost only a receive on a closed channel.
If init failure should be retried, use singleflight instead of Once.
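A runnable version of the Once-plus-channel pattern, with a string config standing in for the real Config type (the stand-in is illustrative):

```go
package main

import (
	"fmt"
	"sync"
)

var (
	configOnce sync.Once
	initDone   = make(chan struct{})
	config     string
)

func InitConfig() {
	configOnce.Do(func() {
		config = "loaded" // stand-in for loadConfig()
		close(initDone)   // wakes every goroutine parked in GetConfig
	})
}

func GetConfig() string {
	<-initDone // parks until init completes; no polling
	return config
}

func main() {
	var wg sync.WaitGroup
	for i := 0; i < 5; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			_ = GetConfig() // each goroutine parks until InitConfig runs
		}()
	}
	InitConfig()
	wg.Wait()
	fmt.Println(GetConfig())
}
```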
Scenario 8: Replace Polling Queue Depth Check with Metrics Gauge¶
Before¶
```go
func reportDepth(ch chan int) {
	for {
		if len(ch) > 0 {
			metrics.Counter("non-zero").Inc()
		}
		time.Sleep(time.Millisecond)
	}
}
```
After¶
```go
func reportDepth(ctx context.Context, ch chan int) {
	ticker := time.NewTicker(time.Second)
	defer ticker.Stop()
	for {
		select {
		case <-ctx.Done():
			return
		case <-ticker.C:
			metrics.Gauge("queue.depth").Set(float64(len(ch)))
		}
	}
}
```
Performance¶
| Metric | Polling 1ms | Ticker 1s |
|---|---|---|
| Reads per second | 1000 | 1 |
| CPU cost | 5% of core | Negligible |
| Metric usefulness | Same | Same (sampled) |
Discussion¶
The polling version reads len 1000 times per second to detect non-zero. The ticker version reads once per second and emits a gauge that downstream metrics infra can aggregate. The 1000x reduction in len calls is a meaningful CPU saving.
Even more important: the polling version was wrong (it was control flow, not observability); the ticker version is right (it is observability).
Scenario 9: Replace Polling Inter-Service Health Check with Server-Sent Events¶
Before¶
```go
func waitForReadiness(addr string) {
	for {
		resp, err := http.Get(addr + "/health")
		if err == nil && resp.StatusCode == 200 {
			return
		}
		time.Sleep(time.Second)
	}
}
```
After¶
If the service supports it, use Server-Sent Events or a long-poll endpoint:
```go
func waitForReadiness(ctx context.Context, addr string) error {
	req, err := http.NewRequestWithContext(ctx, "GET", addr+"/wait-for-ready", nil)
	if err != nil {
		return err
	}
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return fmt.Errorf("not ready: %s", resp.Status)
	}
	return nil // server responds 200 when ready
}
```
Server-side:
```go
http.HandleFunc("/wait-for-ready", func(w http.ResponseWriter, r *http.Request) {
	select {
	case <-readiness:
		w.WriteHeader(200)
	case <-r.Context().Done():
		// client cancelled
	case <-time.After(60 * time.Second):
		w.WriteHeader(504) // long-poll timeout
	}
})
```
Performance¶
| Metric | Polling 1s | Long-poll |
|---|---|---|
| Latency to detect ready | 0-1000ms | <50ms |
| Requests per minute | 60 | 0-1 |
| Server load | 60 req/min | 1 held request |
Discussion¶
The polling version sends a request every second. The long-poll version holds one open request that completes when the event happens. Lower latency and dramatically less load on both sides.
Server-Sent Events are similar but for continuous streams; long-poll is right for one-shot "is it ready?" semantics.
Closing Summary¶
Across nine scenarios:
| Scenario | Throughput change | Latency change | CPU change |
|---|---|---|---|
| 1. Polling drain → range | 50x | -1.5s | -99% |
| 2. Polling wait → WG | n/a | -25ms P99 | -98% |
| 3. Polling pool → errgroup | 2.3x | -3.5s P99 shutdown | -98% |
| 4. Polling shutdown → Shutdown | n/a | -2.5s P99 | -95% |
| 5. Polling drain → token | n/a | Bounded | -95% |
| 6. Polling backpressure → block | 2-3x | -10ms P99 | -100% |
| 7. Polling init → Once | n/a | -0.5ms | -100% |
| 8. Polling depth → gauge | n/a | n/a (informational) | -99% |
| 9. Polling health → long-poll | n/a | -500ms P99 | -98% |
The pattern is consistent: replacing polling with event-driven primitives yields near-total CPU savings during waits (95-100% in the table above), large latency improvements (especially at the tail), and often throughput gains. The cost: a few lines of code change per instance.
This is the optimization payoff. Multiplied across an entire codebase, it is measurable in cloud bills, customer experience, and incident frequency.