Wait-for-Empty-Channel — Optimization Scenarios¶
Nine scenarios where polling-based code can be optimized by replacing it with event-driven synchronization. Each scenario presents the slow version, the fast version, expected performance characteristics, and a brief discussion.
Numbers are indicative of typical results; your workload may vary. Always measure your own.
Scenario 1: Replace Polling Drain with Range¶
Before¶
```go
func drainSlow(ch chan int) {
	for len(ch) > 0 {
		<-ch
		time.Sleep(time.Microsecond) // small delay to avoid pegging CPU
	}
}
```
After¶
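A minimal sketch of the range-based drain, with a demo producer; it returns a count purely for demonstration, and it assumes the producer closes the channel, as the discussion below notes:

```go
package main

import "fmt"

// drainFast blocks on the channel instead of polling len(ch); the
// scheduler parks the goroutine whenever the channel is empty, and
// the range loop exits when the producer closes ch.
func drainFast(ch chan int) int {
	n := 0
	for range ch {
		n++
	}
	return n
}

func main() {
	ch := make(chan int, 100)
	go func() {
		for i := 0; i < 1000; i++ {
			ch <- i
		}
		close(ch) // required: without close, range never terminates
	}()
	fmt.Println(drainFast(ch))
}
```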
Performance¶
| Metric | Polling | Range |
|---|---|---|
| Time to drain 1M items | ~1.5s | ~30ms |
| CPU usage | High (busy poll) | Low (block) |
| Allocation overhead | Same | Same |
Discussion¶
The polling version both races (may miss items added concurrently) and wastes CPU. The range version is correct and uses the scheduler to suspend when no items are available. 50x speedup on drain time, near-zero CPU when idle.
The trick: the producer must close the channel for range to terminate. If your code does not close, that is the actual bug to fix first.
Scenario 2: Replace Polling-Based Wait with WaitGroup¶
Before¶
```go
func processSlow(items []int) []int {
	out := make(chan int, len(items))
	for _, item := range items {
		go func(item int) {
			out <- compute(item)
		}(item)
	}
	for len(out) < len(items) {
		time.Sleep(time.Millisecond)
	}
	var result []int
	for i := 0; i < len(items); i++ {
		result = append(result, <-out)
	}
	return result
}
```
After¶
```go
func processFast(items []int) []int {
	out := make(chan int)
	var wg sync.WaitGroup
	wg.Add(len(items))
	for _, item := range items {
		go func(item int) {
			defer wg.Done()
			out <- compute(item)
		}(item)
	}
	go func() {
		wg.Wait()
		close(out)
	}()
	var result []int
	for v := range out {
		result = append(result, v)
	}
	return result
}
```
Performance¶
| Metric | Polling | WaitGroup |
|---|---|---|
| Latency (P50) | +5 ms | <1 ms |
| Latency (P99) | +25 ms | <1 ms |
| CPU during wait | 50% of a core | <1% of a core |
| Items lost (race) | 0-3 per call | 0 |
Discussion¶
The polling version's latency tail is set by the poll interval plus scheduler delay: completion is only observed at the next 1 ms check, and under load the sleeping goroutine may be rescheduled late. The WaitGroup version exits as soon as the work is done. Both correctness and performance improve.
Scenario 3: Replace Polling Worker Pool with errgroup¶
Before¶
```go
type SlowPool struct {
	jobs    chan Job
	stopped int32
}

func (p *SlowPool) worker() {
	for atomic.LoadInt32(&p.stopped) == 0 {
		select {
		case j := <-p.jobs:
			process(j)
		default:
			time.Sleep(time.Millisecond)
		}
	}
}

func (p *SlowPool) Stop() {
	atomic.StoreInt32(&p.stopped, 1)
	for len(p.jobs) > 0 {
		time.Sleep(10 * time.Millisecond)
	}
}
```
After¶
```go
type FastPool struct {
	jobs chan Job
	g    *errgroup.Group
	ctx  context.Context
}

func NewFastPool(parent context.Context, workers int) *FastPool {
	g, ctx := errgroup.WithContext(parent)
	p := &FastPool{
		jobs: make(chan Job),
		g:    g,
		ctx:  ctx,
	}
	for i := 0; i < workers; i++ {
		g.Go(p.worker)
	}
	return p
}

func (p *FastPool) worker() error {
	for {
		select {
		case <-p.ctx.Done():
			return nil
		case j, ok := <-p.jobs:
			if !ok {
				return nil
			}
			process(j)
		}
	}
}

func (p *FastPool) Stop() error {
	close(p.jobs)
	return p.g.Wait()
}
```
Performance¶
| Metric | Slow | Fast |
|---|---|---|
| Throughput (jobs/sec) | 12,000 | 28,000 |
| Worker CPU when idle (per pool) | 8% / core | <0.1% |
| Shutdown time P99 | 3.5s | 25ms |
| Lines of code | ~50 | ~45 |
Discussion¶
The select/default polling in the worker is the bottleneck. Each iteration spins through a no-op default branch, sleeps 1ms, repeats. Even with no work, the pool burns CPU. Worse, Stop can hang forever: once stopped is set the workers exit, so nothing drains p.jobs, and if any jobs remain queued the len poll spins indefinitely.
The event-driven version uses select without default, so the goroutine parks until an event arrives. CPU goes to zero when idle.
Scenario 4: Replace Polling Shutdown with Bounded Wait¶
Before¶
```go
func (s *Server) ShutdownSlow() {
	s.stop()
	for s.activeRequests() > 0 {
		time.Sleep(100 * time.Millisecond)
	}
}
```
After¶
```go
func (s *Server) ShutdownFast(ctx context.Context) error {
	return s.srv.Shutdown(ctx) // stdlib graceful shutdown: returns when active requests finish or ctx expires
}
```
Performance¶
| Metric | Slow | Fast |
|---|---|---|
| Shutdown time P99 | 2.5s | 50ms |
| Shutdown time P99.9 | 12s | 200ms |
| Times deadline exceeded | 0.5% of runs | 0 |
Discussion¶
The polling version is quantized by its sleep interval: even when all requests complete, the wait persists until the next poll. The stdlib Shutdown returns as soon as the last request finishes.
Scenario 5: Replace Polling Drain with Token-Return Pattern¶
Before¶
```go
type SlowService struct {
	inFlight atomic.Int64
}

func (s *SlowService) Drain() {
	for s.inFlight.Load() > 0 {
		time.Sleep(time.Millisecond)
	}
}
```
After¶
```go
type FastService struct {
	tokens chan struct{}
}

func New(max int) *FastService {
	s := &FastService{
		tokens: make(chan struct{}, max),
	}
	for i := 0; i < max; i++ {
		s.tokens <- struct{}{}
	}
	return s
}

func (s *FastService) Do(fn func()) {
	<-s.tokens
	defer func() { s.tokens <- struct{}{} }()
	fn()
}

func (s *FastService) Drain() {
	for i := 0; i < cap(s.tokens); i++ {
		<-s.tokens
	}
}
```
Performance¶
| Metric | Slow | Fast |
|---|---|---|
| Drain time | Up to poll * N | Bounded |
| CPU during drain | 50% of core | <1% |
| Correctness | Racy | Deterministic |
Discussion¶
The polling version is racy: inFlight.Load() == 0 is checked atomically, but between the check and the next operation, new work can arrive (or in-flight work can re-enter). The token-return pattern receives exactly max tokens and is deterministic.
This pattern works for bounded resources (connection pools, rate limiters, semaphore-style limits).
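The token-return pattern above, runnable with a demo workload (the counter and goroutine count are illustrative):

```go
package main

import (
	"fmt"
	"sync"
)

type FastService struct{ tokens chan struct{} }

func NewFastService(max int) *FastService {
	s := &FastService{tokens: make(chan struct{}, max)}
	for i := 0; i < max; i++ {
		s.tokens <- struct{}{} // seed the semaphore with max tokens
	}
	return s
}

func (s *FastService) Do(fn func()) {
	<-s.tokens                                // acquire a token
	defer func() { s.tokens <- struct{}{} }() // return it when done
	fn()
}

// Drain receives exactly cap(tokens) tokens: it returns only when no
// work is in flight, with no polling and no race window.
func (s *FastService) Drain() {
	for i := 0; i < cap(s.tokens); i++ {
		<-s.tokens
	}
}

func main() {
	s := NewFastService(3)
	var mu sync.Mutex
	count := 0
	var wg sync.WaitGroup
	for i := 0; i < 10; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			s.Do(func() { mu.Lock(); count++; mu.Unlock() })
		}()
	}
	wg.Wait()
	s.Drain() // deterministic: collects all 3 tokens
	fmt.Println(count)
}
```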
Scenario 6: Replace Polling-Based Backpressure with Channel Backpressure¶
Before¶
```go
func produceSlow(ch chan<- int, source []int) {
	for _, v := range source {
		for len(ch) > 80 {
			time.Sleep(time.Millisecond)
		}
		ch <- v
	}
}
```
After¶
```go
func produceFast(ctx context.Context, ch chan<- int, source []int) error {
	for _, v := range source {
		select {
		case ch <- v:
		case <-ctx.Done():
			return ctx.Err()
		}
	}
	return nil
}
```
Performance¶
| Metric | Slow | Fast |
|---|---|---|
| Throughput | Limited by poll | Native |
| Latency P99 | +10 ms | <100 μs |
| CPU during backpressure | Wasted | None |
Discussion¶
The polling version checks "is there room?" and sleeps if not. The fast version sends and lets the channel block when full. The block is the backpressure. No CPU is wasted on polling; the goroutine is suspended by the scheduler.
For very high throughput pipelines, this single change can yield 2-3x throughput gains.
Scenario 7: Replace Polling-Based Initialization with sync.Once¶
Before¶
```go
var (
	initialized int32
	config      *Config
)

func GetConfig() *Config {
	for atomic.LoadInt32(&initialized) == 0 {
		time.Sleep(time.Millisecond)
	}
	return config
}
```
After¶
```go
var (
	configOnce sync.Once
	initDone   = make(chan struct{})
	config     *Config
)

func InitConfig() {
	configOnce.Do(func() {
		config = loadConfig()
		close(initDone)
	})
}

func GetConfig() *Config {
	<-initDone
	return config
}
```
Performance¶
| Metric | Polling | Once+Channel |
|---|---|---|
| First call latency | +0.5 ms | <1 μs |
| Subsequent call latency | negligible | <50 ns |
| CPU per call | Wasted poll | None |
Discussion¶
The polling version makes any caller that arrives before initialization sleep in 1 ms steps, paying up to one extra poll interval after init actually completes. The Once-based version wakes every waiter the instant initDone is closed, and later calls cost only a receive on a closed channel.
If init failure should be retried, use singleflight instead of Once.
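A runnable version of the Once-plus-channel pattern, with a string config standing in for the real Config type (the stand-in is illustrative):

```go
package main

import (
	"fmt"
	"sync"
)

var (
	configOnce sync.Once
	initDone   = make(chan struct{})
	config     string
)

func InitConfig() {
	configOnce.Do(func() {
		config = "loaded" // stand-in for loadConfig()
		close(initDone)   // wakes every goroutine parked in GetConfig
	})
}

func GetConfig() string {
	<-initDone // parks until init completes; no polling
	return config
}

func main() {
	var wg sync.WaitGroup
	for i := 0; i < 5; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			_ = GetConfig() // each goroutine parks until InitConfig runs
		}()
	}
	InitConfig()
	wg.Wait()
	fmt.Println(GetConfig())
}
```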
Scenario 8: Replace Polling Queue Depth Check with Metrics Gauge¶
Before¶
```go
func reportDepth(ch chan int) {
	for {
		if len(ch) > 0 {
			metrics.Counter("non-zero").Inc()
		}
		time.Sleep(time.Millisecond)
	}
}
```
After¶
```go
func reportDepth(ctx context.Context, ch chan int) {
	ticker := time.NewTicker(time.Second)
	defer ticker.Stop()
	for {
		select {
		case <-ctx.Done():
			return
		case <-ticker.C:
			metrics.Gauge("queue.depth").Set(float64(len(ch)))
		}
	}
}
```
Performance¶
| Metric | Polling 1ms | Ticker 1s |
|---|---|---|
| Reads per second | 1000 | 1 |
| CPU cost | 5% of core | Negligible |
| Metric usefulness | Same | Same (sampled) |
Discussion¶
The polling version reads len 1000 times per second to detect non-zero. The ticker version reads once per second and emits a gauge that downstream metrics infra can aggregate. The 1000x reduction in len calls is a meaningful CPU saving.
Even more important: the polling version was wrong (it was control flow, not observability); the ticker version is right (it is observability).
Scenario 9: Replace Polling Inter-Service Health Check with Server-Sent Events¶
Before¶
```go
func waitForReadiness(addr string) {
	for {
		resp, err := http.Get(addr + "/health")
		if err == nil && resp.StatusCode == 200 {
			return
		}
		time.Sleep(time.Second)
	}
}
```
After¶
If the service supports it, use Server-Sent Events or a long-poll endpoint:
```go
func waitForReadiness(ctx context.Context, addr string) error {
	req, err := http.NewRequestWithContext(ctx, "GET", addr+"/wait-for-ready", nil)
	if err != nil {
		return err
	}
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return fmt.Errorf("not ready: %s", resp.Status)
	}
	return nil // server responds 200 when ready
}
```
Server-side:
```go
http.HandleFunc("/wait-for-ready", func(w http.ResponseWriter, r *http.Request) {
	select {
	case <-readiness:
		w.WriteHeader(200)
	case <-r.Context().Done():
		// client cancelled
	case <-time.After(60 * time.Second):
		w.WriteHeader(504) // long-poll timeout
	}
})
```
Performance¶
| Metric | Polling 1s | Long-poll |
|---|---|---|
| Latency to detect ready | 0-1000ms | <50ms |
| Requests per minute | 60 | 0-1 |
| Server load | 60 req/min | 1 held request |
Discussion¶
The polling version sends a request every second. The long-poll version holds one open request that completes when the event happens. Lower latency and dramatically less load on both sides.
Server-Sent Events are similar but for continuous streams; long-poll is right for one-shot "is it ready?" semantics.
Closing Summary¶
Across nine scenarios:
| Scenario | Throughput change | Latency change | CPU change |
|---|---|---|---|
| 1. Polling drain → range | 50x | -1.5s | -99% |
| 2. Polling wait → WG | n/a | -25ms P99 | -98% |
| 3. Polling pool → errgroup | 2.3x | -3.5s P99 shutdown | -98% |
| 4. Polling shutdown → Shutdown | n/a | -2.5s P99 | -95% |
| 5. Polling drain → token | n/a | Bounded | -95% |
| 6. Polling backpressure → block | 2-3x | -10ms P99 | -100% |
| 7. Polling init → Once | n/a | -0.5ms | -100% |
| 8. Polling depth → gauge | n/a | n/a (informational) | -99% |
| 9. Polling health → long-poll | n/a | -500ms P99 | -98% |
The pattern is consistent: replacing polling with event-driven primitives yields near-total CPU savings during waits (95-100% in the table above), large latency improvements (especially at the tail), and often throughput gains. The cost: a few lines of code change per instance.
This is the optimization payoff. Multiplied across an entire codebase, it is measurable in cloud bills, customer experience, and incident frequency.