When to Use a Pool — Hands-On Tasks¶
18 core exercises, plus a handful of bonus tasks and a final mini-project. Some are coding, some are analysis, some are design. Each builds judgment for pool decisions.
Task 1: Benchmark all three libraries¶
Implement the same workload — process 100k tasks, each doing a 100 μs CPU burst — with raw goroutines, errgroup, ants, tunny, and workerpool. Use go test -bench. Report:
- Total wall time.
- CPU time used.
- Allocations per task.
- Peak heap.
Compare the numbers. Explain the differences.
Expected outcome: ants should win on small CPU tasks at high task counts due to lower per-task overhead and near-zero per-task allocations. The differences shrink as per-task work grows.
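If you want a starting point, here is a minimal sketch of two of the five variants (the others follow the same shape). It assumes ants v2 (github.com/panjf2000/ants/v2) and golang.org/x/sync/errgroup; `burn` is a hypothetical busy-loop standing in for the 100 μs CPU burst.

```go
package poolbench

import (
	"runtime"
	"sync"
	"testing"
	"time"

	"github.com/panjf2000/ants/v2"
	"golang.org/x/sync/errgroup"
)

// burn spins for roughly d of CPU time (a stand-in for real work).
func burn(d time.Duration) {
	for start := time.Now(); time.Since(start) < d; {
	}
}

func BenchmarkErrgroup(b *testing.B) {
	g := new(errgroup.Group)
	g.SetLimit(runtime.NumCPU())
	b.ReportAllocs()
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		g.Go(func() error {
			burn(100 * time.Microsecond)
			return nil
		})
	}
	_ = g.Wait()
}

func BenchmarkAnts(b *testing.B) {
	pool, _ := ants.NewPool(runtime.NumCPU())
	defer pool.Release()
	var wg sync.WaitGroup
	b.ReportAllocs()
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		wg.Add(1)
		_ = pool.Submit(func() {
			burn(100 * time.Microsecond)
			wg.Done()
		})
	}
	wg.Wait()
}
```

Run with go test -bench=. -benchmem for wall time and allocations per op; add -cpuprofile and -memprofile for CPU time and heap.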
Task 2: Build a decision matrix¶
For your team's services (pick 3-5), build a decision matrix. For each service:
- What is the workload (RPS, latency, bound)?
- What pool tool is currently used?
- Is it the right tool?
- If not, what should it be?
Write up your findings. Compare with teammates' assessments.
Task 3: Migrate a service from ants to errgroup¶
Pick a service in your codebase that uses ants for a fan-out without measurable benefit. Rewrite the fan-out using errgroup.SetLimit. Measure:
- Lines of code change.
- Throughput before/after.
- p99 latency before/after.
- Dependency count change.
If errgroup is equivalent or better, merge. If worse, document why and keep ants.
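A before/after sketch of the shape of the change; Item and handle are hypothetical stand-ins for your service's types:

```go
package migrate

import (
	"context"
	"sync"

	"github.com/panjf2000/ants/v2"
	"golang.org/x/sync/errgroup"
)

// Item and handle stand in for the service's real payload and work function.
type Item struct{ ID string }

func handle(ctx context.Context, it Item) error { return nil }

// Before: ants pool plus a manual WaitGroup for a one-shot fan-out.
// Note that errors from handle are silently dropped in this shape.
func fanOutAnts(ctx context.Context, items []Item) error {
	pool, err := ants.NewPool(32)
	if err != nil {
		return err
	}
	defer pool.Release()
	var wg sync.WaitGroup
	for _, it := range items {
		it := it
		wg.Add(1)
		if err := pool.Submit(func() { defer wg.Done(); _ = handle(ctx, it) }); err != nil {
			wg.Done()
		}
	}
	wg.Wait()
	return nil
}

// After: errgroup with the same bound; error propagation and ctx cancellation come for free.
func fanOutErrgroup(ctx context.Context, items []Item) error {
	g, ctx := errgroup.WithContext(ctx)
	g.SetLimit(32)
	for _, it := range items {
		it := it
		g.Go(func() error { return handle(ctx, it) })
	}
	return g.Wait()
}
```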
Task 4: Implement a custom priority pool¶
Build a 100-line pool that supports two priority levels. Submit takes a priority. High-priority tasks run before low-priority. Test under a workload with both.
Skeleton:
```go
package prioritypool

import (
	"errors"
	"sync"
)

// ErrFull is returned when the target queue has no free slot.
var ErrFull = errors.New("priority pool: queue full")

type PriorityPool struct {
	high chan func()
	low  chan func()
	wg   sync.WaitGroup
}

func New(workers, queueSize int) *PriorityPool {
	p := &PriorityPool{
		high: make(chan func(), queueSize),
		low:  make(chan func(), queueSize),
	}
	for i := 0; i < workers; i++ {
		p.wg.Add(1)
		go p.work()
	}
	return p
}

// Submit enqueues a task without blocking; priority > 0 targets the high queue.
func (p *PriorityPool) Submit(priority int, task func()) error {
	if priority > 0 {
		select {
		case p.high <- task:
			return nil
		default:
			return ErrFull
		}
	}
	select {
	case p.low <- task:
		return nil
	default:
		return ErrFull
	}
}

func (p *PriorityPool) work() {
	defer p.wg.Done()
	for {
		// Drain the high queue first; only fall through to low when high is empty.
		select {
		case t, ok := <-p.high:
			if !ok {
				return
			}
			t()
		default:
			select {
			case t, ok := <-p.high:
				if !ok {
					return
				}
				t()
			case t, ok := <-p.low:
				if !ok {
					return
				}
				t()
			}
		}
	}
}

// Close/drain (close both channels, then wg.Wait) is left as part of the exercise.
```
Verify: high-priority tasks run before low when both are pending.
Task 5: Implement a sharded pool¶
Build a pool with N shards. Submit routes to a shard based on hash. Each shard has its own queue and workers. Compare against a single-queue pool under high producer contention.
Expected: the sharded pool shows lower measured contention and higher throughput with 100+ concurrent producers.
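A minimal sketch of the sharded side, assuming key-based routing with FNV-1a; metrics and error handling are left to you:

```go
package sharded

import (
	"hash/fnv"
	"sync"
)

// ShardedPool routes each key to one of N shards; each shard has its own
// queue and workers, so producers contend on separate channels.
type ShardedPool struct {
	shards []chan func()
	wg     sync.WaitGroup
}

func NewSharded(shards, workersPerShard, queueSize int) *ShardedPool {
	p := &ShardedPool{shards: make([]chan func(), shards)}
	for i := range p.shards {
		ch := make(chan func(), queueSize)
		p.shards[i] = ch
		for w := 0; w < workersPerShard; w++ {
			p.wg.Add(1)
			go func() {
				defer p.wg.Done()
				for task := range ch {
					task()
				}
			}()
		}
	}
	return p
}

// Submit hashes the key to pick a shard; it blocks if that shard's queue is full.
func (p *ShardedPool) Submit(key string, task func()) {
	h := fnv.New32a()
	h.Write([]byte(key))
	p.shards[h.Sum32()%uint32(len(p.shards))] <- task
}

// Close stops accepting work and waits for in-flight tasks to finish.
func (p *ShardedPool) Close() {
	for _, ch := range p.shards {
		close(ch)
	}
	p.wg.Wait()
}
```

Benchmark it against a single shared channel with the same total worker count.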
Task 6: Build a pool wrapper with metrics¶
Implement the WrappedPool from professional.md Appendix P1. Verify all metrics are exported correctly against a Prometheus registry, and that they update while a workload runs.
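If you don't have the appendix to hand, the rough shape is below; type, field, and metric names here are illustrative, not the appendix's:

```go
package poolmetrics

import (
	"github.com/panjf2000/ants/v2"
	"github.com/prometheus/client_golang/prometheus"
)

// MeteredPool wraps an ants pool and counts submissions and rejections.
type MeteredPool struct {
	pool      *ants.Pool
	submitted prometheus.Counter
	rejected  prometheus.Counter
}

func NewMetered(size int, reg prometheus.Registerer) (*MeteredPool, error) {
	pool, err := ants.NewPool(size)
	if err != nil {
		return nil, err
	}
	m := &MeteredPool{
		pool:      pool,
		submitted: prometheus.NewCounter(prometheus.CounterOpts{Name: "pool_tasks_submitted_total"}),
		rejected:  prometheus.NewCounter(prometheus.CounterOpts{Name: "pool_tasks_rejected_total"}),
	}
	// Gauge read lazily from the pool on every scrape.
	running := prometheus.NewGaugeFunc(prometheus.GaugeOpts{Name: "pool_workers_running"},
		func() float64 { return float64(pool.Running()) })
	reg.MustRegister(m.submitted, m.rejected, running)
	return m, nil
}

func (m *MeteredPool) Submit(task func()) error {
	if err := m.pool.Submit(task); err != nil {
		m.rejected.Inc()
		return err
	}
	m.submitted.Inc()
	return nil
}
```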
Task 7: Load-test a pool¶
Pick a pool in your codebase. Write a load test that submits at increasing rates: 100/sec, 500/sec, 1000/sec, 5000/sec. Plot throughput vs offered load. Find the knee.
Identify the bottleneck at the knee: CPU, memory, downstream, or pool internal lock.
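One way to drive the offered load is golang.org/x/time/rate; driveLoad below is a hypothetical helper, and submit is whatever your pool exposes:

```go
package loadtest

import (
	"context"
	"sync/atomic"
	"time"

	"golang.org/x/time/rate"
)

// driveLoad submits work at a fixed offered rate for duration d and returns
// how many tasks completed within the window.
func driveLoad(ctx context.Context, perSec float64, d time.Duration, submit func(func()) error) int64 {
	var done int64
	limiter := rate.NewLimiter(rate.Limit(perSec), 1)
	deadline := time.Now().Add(d)
	for time.Now().Before(deadline) {
		if err := limiter.Wait(ctx); err != nil {
			break
		}
		_ = submit(func() { atomic.AddInt64(&done, 1) })
	}
	return atomic.LoadInt64(&done)
}
```

Call it once per rate (100, 500, 1000, 5000/sec) and plot completed/sec against offered rate; the knee is where the two diverge.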
Task 8: Simulate a downstream slowdown¶
Mock a downstream call that takes 10 ms normally but 5 seconds 1% of the time (slow tail). Submit 10k tasks to a pool of K=50. Plot p50, p99, p999 latency.
Compare with K=100, K=500. Find the K that minimises p99.
Lesson: tail latency is dominated by slow tasks. Increasing K helps up to a point.
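A sketch of the mock and the measurement loop, assuming errgroup for the bound. Latency is measured from the moment the producer tries to submit, so queueing behind slow calls is included:

```go
package tailsim

import (
	"math/rand"
	"time"

	"golang.org/x/sync/errgroup"
)

// slowTailCall simulates a downstream that answers in ~10 ms normally but
// stalls for 5 s about 1% of the time.
func slowTailCall() {
	if rand.Float64() < 0.01 {
		time.Sleep(5 * time.Second)
		return
	}
	time.Sleep(10 * time.Millisecond)
}

// run pushes n tasks through a limit-K errgroup and returns per-task latency.
func run(n, k int) []time.Duration {
	lat := make([]time.Duration, n)
	g := new(errgroup.Group)
	g.SetLimit(k)
	for i := 0; i < n; i++ {
		i := i
		start := time.Now() // includes time spent waiting for a free slot
		g.Go(func() error {
			slowTailCall()
			lat[i] = time.Since(start)
			return nil
		})
	}
	_ = g.Wait()
	return lat
}
```

Sort the returned slice and read the 50th, 99th, and 99.9th percentiles for each K.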
Task 9: Drain test¶
Write a test that:
- Submits 1000 tasks to a pool of K=10.
- After 100 tasks complete, sends SIGTERM (or simulates the equivalent).
- Drains with a 5-second timeout.
- Asserts at least 100 tasks completed; counts unfinished.
Verify that the drain timeout actually limits wait time.
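A sketch of the test, assuming ants v2's ReleaseTimeout as the drain mechanism (substitute your own drain if you use a different pool); SIGTERM is simulated by simply calling the drain, and task durations are illustrative:

```go
package draintest

import (
	"sync/atomic"
	"testing"
	"time"

	"github.com/panjf2000/ants/v2"
)

func TestDrainOnShutdown(t *testing.T) {
	pool, err := ants.NewPool(10)
	if err != nil {
		t.Fatal(err)
	}
	var completed int64
	for i := 0; i < 1000; i++ {
		// Submit from separate goroutines so the test itself is not
		// back-pressured by the blocking Submit.
		go func() {
			_ = pool.Submit(func() {
				time.Sleep(5 * time.Millisecond)
				atomic.AddInt64(&completed, 1)
			})
		}()
	}
	// Wait until at least 100 tasks are done, then "receive SIGTERM".
	for atomic.LoadInt64(&completed) < 100 {
		time.Sleep(time.Millisecond)
	}
	// Drain: wait up to 5 s for running workers to finish.
	start := time.Now()
	_ = pool.ReleaseTimeout(5 * time.Second)
	got := atomic.LoadInt64(&completed)
	if got < 100 {
		t.Fatalf("expected at least 100 completed, got %d", got)
	}
	t.Logf("drained in %v, unfinished: %d", time.Since(start), 1000-got)
}
```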
Task 10: Decision document for a new service¶
You are designing a new service: receives webhook callbacks from a payment provider, validates, writes to DB.
Volume: 500/sec average, 5000/sec at settlement runs. DB: 50-connection pool.
Write a one-page design doc covering:
- Pool choice.
- K rationale.
- Backpressure shape.
- Failure modes.
- Metrics list.
- Alerts list.
Use the template from professional.md Appendix P22.
Task 11: Compare tunny vs errgroup for warm state¶
Build a "fake PDF renderer" that:
- On construction: sleeps 200ms (simulating font cache load).
- On Process: sleeps 50ms (simulating render).
Run 100 renders with:
- errgroup.SetLimit(4): each task constructs its own renderer.
- tunny.New(4, ...) with a custom Worker whose constructor builds the renderer once, so renderers are per-worker.
Compare total wall time. Verify tunny wins by roughly 3-4x.
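A sketch of both variants, assuming Jeffail/tunny's Worker interface (Process, BlockUntilReady, Interrupt, Terminate); the renderer is a fake:

```go
package warmstate

import (
	"sync"
	"time"

	"github.com/Jeffail/tunny"
	"golang.org/x/sync/errgroup"
)

// fakeRenderer stands in for a renderer with expensive warm-up state.
type fakeRenderer struct{}

func newFakeRenderer() *fakeRenderer {
	time.Sleep(200 * time.Millisecond) // simulate loading a font cache
	return &fakeRenderer{}
}

func (r *fakeRenderer) render() { time.Sleep(50 * time.Millisecond) }

// Variant A: errgroup; every task pays the 200 ms construction cost.
func renderWithErrgroup(n int) {
	g := new(errgroup.Group)
	g.SetLimit(4)
	for i := 0; i < n; i++ {
		g.Go(func() error {
			newFakeRenderer().render()
			return nil
		})
	}
	_ = g.Wait()
}

// Variant B: tunny with a per-worker renderer built once in the constructor.
type renderWorker struct{ r *fakeRenderer }

func (w *renderWorker) Process(interface{}) interface{} { w.r.render(); return nil }
func (w *renderWorker) BlockUntilReady()                {}
func (w *renderWorker) Interrupt()                      {}
func (w *renderWorker) Terminate()                      {}

func renderWithTunny(n int) {
	pool := tunny.New(4, func() tunny.Worker { return &renderWorker{r: newFakeRenderer()} })
	defer pool.Close()
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			pool.Process(nil) // blocks until one of the 4 warm workers is free
		}()
	}
	wg.Wait()
}
```

Time renderWithErrgroup(100) against renderWithTunny(100).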
Task 12: Build a fan-out / fan-in pipeline¶
A 3-stage pipeline:
- Stage 1: read JSON files.
- Stage 2: transform.
- Stage 3: write to a sink.
Use errgroup at each stage with different K. Connect stages via bounded channels.
Verify: cancellation via ctx flows through all stages.
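A sketch of the wiring, with hypothetical readFile/transform/writeSink stubs and illustrative per-stage limits:

```go
package pipeline

import (
	"context"

	"golang.org/x/sync/errgroup"
)

// Record, readFile, transform, and writeSink stand in for the real stages.
type Record struct{ Data []byte }

func readFile(ctx context.Context, path string) (Record, error) { return Record{}, nil }
func transform(ctx context.Context, r Record) (Record, error)   { return r, nil }
func writeSink(ctx context.Context, r Record) error             { return nil }

func runPipeline(ctx context.Context, paths []string) error {
	g, ctx := errgroup.WithContext(ctx)
	raw := make(chan Record, 64) // stage 1 -> stage 2, bounded for backpressure
	out := make(chan Record, 64) // stage 2 -> stage 3

	// Stage 1: read files with up to 4 concurrent readers.
	g.Go(func() error {
		defer close(raw)
		s, ctx := errgroup.WithContext(ctx)
		s.SetLimit(4)
		for _, p := range paths {
			p := p
			s.Go(func() error {
				rec, err := readFile(ctx, p)
				if err != nil {
					return err
				}
				select {
				case raw <- rec:
					return nil
				case <-ctx.Done():
					return ctx.Err()
				}
			})
		}
		return s.Wait()
	})

	// Stage 2: transform with up to 8 concurrent workers.
	g.Go(func() error {
		defer close(out)
		s, ctx := errgroup.WithContext(ctx)
		s.SetLimit(8)
		for rec := range raw {
			rec := rec
			s.Go(func() error {
				t, err := transform(ctx, rec)
				if err != nil {
					return err
				}
				select {
				case out <- t:
					return nil
				case <-ctx.Done():
					return ctx.Err()
				}
			})
		}
		return s.Wait()
	})

	// Stage 3: write to the sink with up to 2 concurrent writers.
	g.Go(func() error {
		s, ctx := errgroup.WithContext(ctx)
		s.SetLimit(2)
		for rec := range out {
			rec := rec
			s.Go(func() error { return writeSink(ctx, rec) })
		}
		return s.Wait()
	})

	return g.Wait()
}
```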
Task 13: Implement a pool admission controller¶
Wrap a pool with an admission controller: before Submit, check per-tenant quota. Reject (with metric) if over quota.
```go
type Tenant string

var ErrOverQuota = errors.New("tenant over quota")

type AdmissionPool struct {
	pool   *ants.Pool
	quotas map[Tenant]*rate.Limiter // one rate limiter per known tenant
}

func (p *AdmissionPool) Submit(tenant Tenant, task func()) error {
	lim, ok := p.quotas[tenant]
	if !ok || !lim.Allow() { // unknown tenants are rejected, not nil-dereferenced
		return ErrOverQuota
	}
	return p.pool.Submit(task)
}
```
Test under load with two tenants, one noisy, one quiet. Verify the quiet tenant's throughput is not affected.
Task 14: Implement panic recovery for errgroup¶
Write a helper that wraps an errgroup task with panic recovery:
```go
// wrap converts a panic inside a task into a returned error.
// Requires "fmt" and "runtime/debug".
func wrap(f func() error) func() error {
	return func() (err error) {
		defer func() {
			if r := recover(); r != nil {
				err = fmt.Errorf("panic: %v\n%s", r, debug.Stack())
			}
		}()
		return f()
	}
}

// Usage:
g.Go(wrap(func() error { return work(ctx, x) }))
```
Test with a task that panics. Verify the errgroup continues and the panic is converted to an error.
Task 15: Profile a pool¶
Pick a service with a pool. Run it under a load test. Collect:
- CPU profile (30 seconds).
- Goroutine profile.
- Heap profile.
- Mutex profile.
Open each in go tool pprof. Identify:
- Where the pool spends CPU.
- How many goroutines exist (and their states).
- Heap allocations attributable to pool tasks.
- Mutex contention in pool internals.
Write a one-paragraph report on findings.
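If the service does not already expose pprof, the standard-library way to do it is a blank import of net/http/pprof on a loopback listener; the port and the mutex sampling rate below are illustrative:

```go
package main

import (
	"log"
	"net/http"
	_ "net/http/pprof" // registers /debug/pprof/* handlers on the default mux
	"runtime"
)

func main() {
	// Mutex profiling is off by default; sample roughly 1 in 5 contention events.
	runtime.SetMutexProfileFraction(5)

	// Serve the profiling endpoints on a loopback-only port, separate from
	// the service's public listener. Collect profiles with, for example:
	//   go tool pprof http://localhost:6060/debug/pprof/profile?seconds=30
	//   go tool pprof http://localhost:6060/debug/pprof/goroutine
	//   go tool pprof http://localhost:6060/debug/pprof/heap
	//   go tool pprof http://localhost:6060/debug/pprof/mutex
	log.Fatal(http.ListenAndServe("localhost:6060", nil))
}
```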
Task 16: Migrate raw goroutines to a bounded approach¶
Find code that uses raw go f() for fan-out without a bound. Compute the worst-case goroutine count at peak. If it could OOM, migrate to errgroup.SetLimit with appropriate K.
Document the change rationale.
Task 17: Implement a per-tier pool¶
For a SaaS service, implement:
- Pool tiers: Premium (K=100), Standard (K=50), Free (K=10).
- Submit routes to the right pool based on tenant tier.
- Metrics per pool.
Verify: under load, Premium tenants get higher throughput than Standard, and Standard higher than Free.
Task 18: End-to-end exercise¶
Take any moderate-sized Go service (real or hypothetical). Do all of:
- Inventory: list every concurrency primitive (raw, errgroup, semaphore, pool).
- Assess: for each, is it the right tool for its workload?
- Plan: migrate at least one to a better tool (or simplify).
- Implement: write the migration PR with benchmark.
- Document: ADR for the change.
- Operate: add metrics, alerts, runbook entry.
This is the full senior-to-professional loop on one piece of code.
Task Difficulty Map¶
| Task | Difficulty | Time |
|---|---|---|
| 1 | Easy | 2h |
| 2 | Medium | 4h |
| 3 | Medium | 6h |
| 4 | Hard | 4h |
| 5 | Hard | 6h |
| 6 | Easy | 2h |
| 7 | Medium | 4h |
| 8 | Medium | 4h |
| 9 | Easy | 2h |
| 10 | Medium | 4h |
| 11 | Easy | 2h |
| 12 | Medium | 6h |
| 13 | Medium | 4h |
| 14 | Easy | 1h |
| 15 | Medium | 4h |
| 16 | Easy | 2h |
| 17 | Medium | 4h |
| 18 | Hard | 12h |
Total: ~75 hours for the core tasks. Don't do them all at once; spread them over a few months.
What Each Task Teaches¶
- Task 1: Benchmarking discipline. Numbers > opinions.
- Task 2: Critical assessment. Question existing choices.
- Task 3: Refactoring with measurement. The most valuable code change is removing code.
- Task 4: Pool design. Understand what a pool actually is.
- Task 5: Performance optimization. Sharding to reduce contention.
- Task 6: Instrumentation. The professional baseline.
- Task 7: Empirical sizing. Find the knee of the curve.
- Task 8: Tail-latency thinking. p99 vs p50.
- Task 9: Lifecycle handling. Drain on shutdown.
- Task 10: Design documentation. The professional artifact.
- Task 11: Worker-state thinking. When tunny wins.
- Task 12: Pipeline design. Multiple stages, multiple bounds.
- Task 13: Multi-tenant fairness. Admission control.
- Task 14: Panic recovery. Robust error handling.
- Task 15: Profiling. Real-world diagnosis.
- Task 16: Migration. Reducing risk in legacy code.
- Task 17: Tier-based isolation. Product feature design.
- Task 18: End-to-end. The professional loop.
Task 19 (Bonus): Build a "self-tuning" pool¶
Wrap an ants pool with a control loop that adjusts K based on utilization. Implement hysteresis (don't flap).
```go
type AutoTunedPool struct {
	pool *ants.Pool
	minK int
	maxK int
	step int
}

func (p *AutoTunedPool) tune() {
	for range time.Tick(10 * time.Second) {
		util := float64(p.pool.Running()) / float64(p.pool.Cap())
		switch {
		case util > 0.85 && p.pool.Cap() < p.maxK:
			p.pool.Tune(p.pool.Cap() + p.step)
		case util < 0.40 && p.pool.Cap() > p.minK:
			p.pool.Tune(p.pool.Cap() - p.step)
		}
	}
}
```
Test under varying load: ramp from 100/sec to 5000/sec and back down to 100/sec. Plot K vs time.
Verify:
- K rises during high load.
- K falls during low load.
- K stays stable when load is steady.
If K oscillates rapidly, add more hysteresis (longer averaging window, larger threshold gap).
Task 20 (Bonus): Write an integration test for backpressure¶
For your service, write a test that:
- Starts the service with K=10.
- Submits 100 tasks rapidly.
- Verifies the producer is blocked (or returns 503, depending on policy).
- Lets tasks complete.
- Verifies all 100 eventually complete.
This proves backpressure works as designed.
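A sketch for the block-the-producer policy, assuming an ants pool with default (blocking) Submit; durations and thresholds are illustrative:

```go
package backpressure

import (
	"sync"
	"sync/atomic"
	"testing"
	"time"

	"github.com/panjf2000/ants/v2"
)

func TestBackpressureBlocksProducer(t *testing.T) {
	pool, err := ants.NewPool(10) // default ants pool: Submit blocks when all workers are busy
	if err != nil {
		t.Fatal(err)
	}
	defer pool.Release()

	var completed int64
	var wg sync.WaitGroup
	start := time.Now()
	for i := 0; i < 100; i++ {
		wg.Add(1)
		if err := pool.Submit(func() {
			defer wg.Done()
			time.Sleep(50 * time.Millisecond)
			atomic.AddInt64(&completed, 1)
		}); err != nil {
			t.Fatalf("submit %d: %v", i, err)
		}
	}
	submitTime := time.Since(start)

	// With K=10 and 50 ms tasks, 100 blocking submits cannot return instantly:
	// the producer must have been back-pressured for several batches.
	if submitTime < 300*time.Millisecond {
		t.Errorf("expected producer to block, submits took only %v", submitTime)
	}

	wg.Wait()
	if got := atomic.LoadInt64(&completed); got != 100 {
		t.Errorf("expected 100 completed, got %d", got)
	}
}
```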
Task 21 (Bonus): Implement a queue depth alert simulation¶
Write a test that simulates the conditions for a "queue depth high" alert. Submit tasks fast enough that queue grows. Check the alert metric fires after the expected duration.
```go
func TestQueueDepthAlert(t *testing.T) {
	pool, _ := ants.NewPool(2) // default pool: Submit blocks when all workers are busy
	defer pool.Release()
	for i := 0; i < 100; i++ {
		// Submit from separate goroutines so blocked submitters pile up and
		// show up in pool.Waiting(); a single submitting goroutine would
		// never let the queue grow.
		go func() {
			_ = pool.Submit(func() {
				time.Sleep(100 * time.Millisecond)
			})
		}()
	}
	// Queue should be growing by now.
	time.Sleep(50 * time.Millisecond)
	if pool.Waiting() < 50 {
		t.Errorf("expected queue depth > 50, got %d", pool.Waiting())
	}
}
```
Task 22 (Bonus): Reproduce the "K=0" bug¶
A common production bug: K is read from an environment variable; when the variable is missing, K parses as 0 and the pool either refuses all work or misbehaves in a library-specific way.
Reproduce with a deliberate test. Validate at construction.
```go
func validateK(k int) error {
	if k <= 0 {
		return fmt.Errorf("invalid K: %d (must be > 0)", k)
	}
	return nil
}
```
Add this check to your team's standard wrapper.
Task 23 (Bonus): Cross-team coordination¶
Pretend you operate three services that share a downstream with a 200-concurrency limit. Each service has 4 replicas.
Compute K per replica for each service such that the cluster total stays under 200.
If services have different priorities (one is critical, two are batch), redistribute.
Document your allocation in a per-downstream config:
```yaml
downstream: payments-api
limit: 200
allocations:
  service-a: 100  # 50%, critical
  service-b: 60   # 30%, normal
  service-c: 40   # 20%, batch
```
Each service reads this config; K = allocation / replicas.
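For example, with 4 replicas per service, the config above yields per-replica K of 100/4 = 25 for service-a, 60/4 = 15 for service-b, and 40/4 = 10 for service-c; the cluster-wide worst case is 4 × (25 + 15 + 10) = 200, exactly the downstream limit.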
Task 24 (Bonus): Write a pool linter¶
Use Go's AST package to scan a codebase for pool-related anti-patterns:
- `errgroup.WithContext` without a subsequent `SetLimit`.
- `ants.NewPool` without `defer Release()`.
- `pool.Submit(...)` where the return error is ignored.
Build a CLI tool that prints findings.
```go
// pool-lint: scans Go files for pool anti-patterns
package main

// ... AST walking similar to Appendix P39
```
Use the linter in CI for new code.
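A minimal sketch of just the third check (ignored Submit errors), using go/ast; it keys on the method name alone, so expect false positives on unrelated Submit methods:

```go
// poolcheck flags pool.Submit(...) calls whose return value is discarded
// (used as a bare statement). Extending it to the errgroup and Release
// checks is the rest of the task.
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
	"os"
)

func main() {
	fset := token.NewFileSet()
	for _, path := range os.Args[1:] {
		f, err := parser.ParseFile(fset, path, nil, 0)
		if err != nil {
			fmt.Fprintln(os.Stderr, err)
			continue
		}
		ast.Inspect(f, func(n ast.Node) bool {
			stmt, ok := n.(*ast.ExprStmt) // a call used as a statement: result ignored
			if !ok {
				return true
			}
			call, ok := stmt.X.(*ast.CallExpr)
			if !ok {
				return true
			}
			if sel, ok := call.Fun.(*ast.SelectorExpr); ok && sel.Sel.Name == "Submit" {
				fmt.Printf("%s: Submit error ignored\n", fset.Position(call.Pos()))
			}
			return true
		})
	}
}
```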
Task 25 (Bonus): Performance regression test¶
Write a benchmark that fails CI if the pool's Submit cost regresses by >20%:
```go
func BenchmarkPoolSubmit(b *testing.B) {
	pool, _ := ants.NewPool(100)
	defer pool.Release()
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		_ = pool.Submit(func() {})
	}
}
```
Run a baseline and save the result. CI runs the benchmark on each PR and compares against the baseline, failing if it is more than 20% slower.
Catches accidental performance regressions.
Final Mini-Project: Full Service With Pool¶
Build a real (or stub) service from scratch:
- HTTP API: POST /process.
- Each request creates K sub-tasks (K is part of the request).
- Tasks are CPU-bound (e.g., compute SHA-512 of random data).
- Service uses an ants pool sized to runtime.NumCPU().
- Service exports all standard pool metrics.
- Service handles SIGTERM gracefully.
- Service has tests (unit + integration).
- Service has a Dockerfile.
- Service has Prometheus rules + Grafana dashboard.
- Service has a runbook.
This is one week of focused work. End result: a portfolio-quality demonstration of pool engineering.
Task Completion Tracking¶
Make a checklist:
- Task 1 — benchmark
- Task 2 — decision matrix
- Task 3 — ants → errgroup migration
- Task 4 — priority pool
- Task 5 — sharded pool
- Task 6 — wrapper with metrics
- Task 7 — load test
- Task 8 — downstream slowdown simulation
- Task 9 — drain test
- Task 10 — design doc
- Task 11 — tunny vs errgroup
- Task 12 — pipeline
- Task 13 — admission controller
- Task 14 — panic recovery
- Task 15 — profile
- Task 16 — migrate raw goroutines
- Task 17 — per-tier pool
- Task 18 — end-to-end
- Task 19 (bonus) — self-tuning
- Task 20 (bonus) — backpressure test
- Task 21 (bonus) — queue depth alert
- Task 22 (bonus) — K=0 bug
- Task 23 (bonus) — cross-team
- Task 24 (bonus) — linter
- Task 25 (bonus) — perf regression
- Final mini-project
Pick the ones most relevant to your team's needs. Skip the rest.
End of tasks.md.