Concurrent Counters — Hands-On Tasks¶
A graded set of exercises. Each is a small, self-contained Go project. Estimated time in parentheses.
Junior Tasks¶
Task J1: The broken example (15 min)¶
Save and run the following:
package main

import (
	"fmt"
	"sync"
)

func main() {
	var count int
	var wg sync.WaitGroup
	for i := 0; i < 1000; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			count++
		}()
	}
	wg.Wait()
	fmt.Println(count)
}
Run with go run main.go ten times. Note the variation. Then run with go run -race main.go and read the race detector report.
Task J2: Fix with mutex (15 min)¶
Take the broken example. Add a sync.Mutex. Run again and verify the result is always 1000.
Task J3: Fix with atomic (15 min)¶
Take the broken example. Replace int with atomic.Int64 and count++ with count.Add(1). Verify the result is always 1000. Compare performance with the mutex version (eyeball the elapsed time).
Task J4: Build a request counter (30 min)¶
Write an HTTP server with one handler that increments a counter on each request. Add a /metrics handler that returns the count as plain text. Test with curl http://localhost:8080/ and curl http://localhost:8080/metrics.
Task J5: Build a gauge (30 min)¶
Add an "in-flight requests" gauge to your server. Increment on entry; decrement on exit (with defer). Verify it stays bounded under wrk -c 64 http://localhost:8080/.
Task J6: Per-request statistics (30 min)¶
Inside a handler, track the number of database calls made for that request. Use a struct of atomic.Int64 fields stored in the request context. Log the values when the request finishes.
Task J7: Reset and report (30 min)¶
Add a goroutine that every 10 seconds reads the request count via Swap(0) and logs "requests in last 10s: X". Verify it reports correctly under load.
Task J8: Wire up expvar (30 min)¶
Use expvar.NewInt for two counters in your service. Curl /debug/vars and inspect the JSON output.
Middle Tasks¶
Task M1: CAS-based max tracker (30 min)¶
Implement AtomicMax with Observe(int64) and Get() int64. Use a CAS loop. Test with 1000 goroutines submitting random values; verify the result equals the actual max.
Task M2: CAS-based capped counter (30 min)¶
Implement Capped with Inc() bool (returns false if at cap). Use a CAS loop. Test: start 1000 goroutines that each try to Inc; with cap=100, exactly 100 should succeed.
Task M3: Multi-counter snapshot (45 min)¶
Three counters (requests, errors, in-flight). Use atomic.Pointer[Snapshot] to publish a consistent snapshot once per second. Add a /status HTTP handler that prints the current snapshot.
Task M4: expvar.Map for per-route counts (45 min)¶
Add expvar.NewMap("routes") to your server. For each request, routes.Add(r.URL.Path, 1). Curl /debug/vars to see the per-route counts.
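A minimal sketch of the per-route map, assuming hypothetical helper names (recordRoute, routeCount, countRoutes) that are not part of expvar. It drives the handler with httptest instead of a real listener so it runs to completion. Note that keying on the raw URL path gives unbounded cardinality if clients can request arbitrary paths.

```go
package main

import (
	"expvar"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// routes is published under the name "routes" and appears in /debug/vars.
var routes = expvar.NewMap("routes")

// recordRoute bumps the counter for one path; the map creates the entry on first use.
func recordRoute(path string) {
	routes.Add(path, 1)
}

// routeCount reads one entry back and returns 0 for a path never seen.
func routeCount(path string) int64 {
	if v, ok := routes.Get(path).(*expvar.Int); ok {
		return v.Value()
	}
	return 0
}

// countRoutes is a hypothetical middleware that records every request path.
func countRoutes(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		recordRoute(r.URL.Path)
		next.ServeHTTP(w, r)
	})
}

func main() {
	h := countRoutes(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.Write([]byte("ok\n"))
	}))
	// Simulate three requests without opening a socket.
	for _, p := range []string{"/a", "/a", "/b"} {
		h.ServeHTTP(httptest.NewRecorder(), httptest.NewRequest("GET", p, nil))
	}
	fmt.Println(routes.String()) // JSON object of path -> count
}
```

In a real server the same middleware would wrap the mux, and importing expvar registers /debug/vars on http.DefaultServeMux.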
Task M5: Basic sharded counter (1 h)¶
Implement a counter with 64 shards, each an atomic.Int64. Inc() picks a shard at random. Get() sums all shards. Benchmark vs single atomic at -cpu=1,4,16.
Task M6: Sliding-window rate (1 h)¶
Implement a counter that reports "requests in the last 60 seconds". Use a ring buffer of 60 per-second counters with a 1-second tick.
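One possible shape for the ring, as a sketch under assumptions: the type name Window and its methods are hypothetical, Tick is expected to be called from a single goroutine (for example one driven by time.NewTicker(time.Second)), and an increment that races with a tick may land in the slot being cleared, so the total is approximate at slot boundaries.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

const windowSlots = 60 // one slot per second

// Window counts events in the current second plus the previous 59.
type Window struct {
	slots [windowSlots]atomic.Int64
	cur   atomic.Int64 // index of the slot receiving increments
}

// Inc records one event in the current slot.
func (w *Window) Inc() {
	w.slots[w.cur.Load()].Add(1)
}

// Tick clears the oldest slot and makes it the current one.
func (w *Window) Tick() {
	next := (w.cur.Load() + 1) % windowSlots
	w.slots[next].Store(0)
	w.cur.Store(next)
}

// Sum adds up every slot in the ring.
func (w *Window) Sum() int64 {
	var total int64
	for i := range w.slots {
		total += w.slots[i].Load()
	}
	return total
}

func main() {
	var w Window
	w.Inc()
	w.Inc()
	w.Tick()
	w.Inc()
	fmt.Println(w.Sum()) // prints 3
}
```

Calling Tick by hand, as main does, also makes the expiry behaviour easy to unit-test without sleeping.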
Task M7: HTTP handler with full metrics (1 h)¶
Combine: request counter, error counter, in-flight gauge, per-status-code labeled counter. All exposed via expvar.
Task M8: Replace mutex with atomic in existing code (1 h)¶
Find a sync.Mutex-wrapped counter in an open-source Go project (or your own code). Replace with atomic.Int64. Verify behaviour with -race. Submit a PR (or just keep the diff).
Senior Tasks¶
Task S1: Detect false sharing (1 h)¶
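Write a benchmark with [64]atomic.Int64 (no padding). Run at -cpu=16. Note the throughput. Now wrap each element in a struct that also holds a _ cpu.CacheLinePad field (from golang.org/x/sys/cpu) so that neighbouring counters no longer share a cache line. Benchmark again. Note the improvement.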
Task S2: Padded sharded counter (1 h)¶
Implement a sharded counter where each cell is a struct with cpu.CacheLinePad before and after the atomic.Int64. Power-of-2 size. Random shard selection via rand.Uint64(). Benchmark vs unpadded.
Task S3: Per-P counter with procPin (2 h)¶
Use //go:linkname to access runtime.procPin and runtime.procUnpin. Implement a per-P counter sized to GOMAXPROCS. Benchmark vs random-shard. Show near-linear scaling.
Task S4: Sloppy counter (1 h)¶
Implement Sloppy with Local accumulators that flush at threshold. Each goroutine in your test should call local := s.Local(1024); defer local.Flush(); for ...{ local.Inc() }. Verify exact correctness post-flush.
Task S5: Diagnose a slow benchmark (1 h)¶
Given the unpadded sharded counter from Task M5, profile it with go test -bench=. -cpu=16 -cpuprofile=cpu.prof and inspect go tool pprof cpu.prof. Identify the hot spot. Confirm by adding padding.
Task S6: LongAdder-style auto-growing counter (3 h)¶
Implement a counter that starts unsharded and grows shards under contention. Track CAS failures; when a threshold is exceeded, install a sharded cell array. Use atomic.Pointer[Cells] for atomic resize.
Task S7: Multi-counter coherent snapshot (2 h)¶
Three sharded counters (requests, errors, in-flight). Publish a coherent snapshot via a publisher goroutine and atomic.Pointer[Snapshot]. The publisher reads each counter, builds the snapshot, swaps the pointer. Readers see consistent values.
Task S8: NUMA-aware shard placement (3 h)¶
If you have access to a multi-socket machine (or can simulate via container CPU restrictions): pin Go workers to specific sockets, allocate per-socket shard arrays, increment locally. Compare to single shared sharded counter.
Professional Tasks¶
Task P1: Build an HDR-backed latency metric (2 h)¶
Use github.com/HdrHistogram/hdrhistogram-go. Implement LatencyMetric with sharded (per-P) HDR histograms. Observe(time.Duration), Quantile(0.99) int64. Wrap in HTTP middleware. Verify p99 reporting on a synthetic workload.
Task P2: Sliding-window HDR histogram (2 h)¶
A ring of HDR histograms, one per minute, covering 5 minutes. Observe(int64) records to the current bucket. Tick() advances and resets the next bucket. Snapshot() returns the merged histogram of the last 5 minutes.
Task P3: Full Prometheus integration (3 h)¶
Take your in-house counter library. Wire it to prometheus/client_golang via CounterFunc and GaugeFunc. Curl /metrics and verify Prometheus-format output. Run a local Prometheus instance and query the metrics.
Task P4: Cardinality-bounded labeled counter (2 h)¶
Implement a labeled counter that refuses new label combinations beyond a configured limit. The "dropped" count is itself a counter. Test by feeding random label values and observing the drop counter rising.
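A sketch of one way to bound cardinality, assuming a single string label and hypothetical names (Bounded, NewBounded, Dropped). It uses a read lock for the common case and re-checks under the write lock before creating a new series.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// Bounded is a labeled counter that refuses new labels past a limit.
type Bounded struct {
	mu      sync.RWMutex
	series  map[string]*atomic.Int64
	limit   int
	dropped atomic.Int64 // increments refused because the limit was reached
}

func NewBounded(limit int) *Bounded {
	return &Bounded{series: make(map[string]*atomic.Int64), limit: limit}
}

// Inc adds 1 to the label's series. It returns false, and counts a drop,
// when the label is new and the limit has already been reached.
func (b *Bounded) Inc(label string) bool {
	b.mu.RLock()
	c := b.series[label]
	b.mu.RUnlock()
	if c == nil {
		b.mu.Lock()
		c = b.series[label] // another goroutine may have created it
		if c == nil {
			if len(b.series) >= b.limit {
				b.mu.Unlock()
				b.dropped.Add(1)
				return false
			}
			c = new(atomic.Int64)
			b.series[label] = c
		}
		b.mu.Unlock()
	}
	c.Add(1)
	return true
}

// Get returns the count for a label, or 0 if the label has no series.
func (b *Bounded) Get(label string) int64 {
	b.mu.RLock()
	defer b.mu.RUnlock()
	if c := b.series[label]; c != nil {
		return c.Load()
	}
	return 0
}

// Dropped returns how many increments were refused.
func (b *Bounded) Dropped() int64 { return b.dropped.Load() }

func main() {
	b := NewBounded(2)
	for _, l := range []string{"a", "b", "a", "c", "d"} {
		b.Inc(l)
	}
	fmt.Println(b.Get("a"), b.Get("b"), b.Dropped()) // prints 2 1 2
}
```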
Task P5: Multi-format exposition (3 h)¶
A single counter exposed via expvar (JSON), Prometheus (text), and OpenTelemetry (OTLP). Verify all three produce correct, equivalent outputs.
Task P6: SLO budget tracker (3 h)¶
Two counters (requests, errors). Implement a tracker that computes "budget remaining" over a 30-day window. Wire to an alert that fires when budget is consumed too fast.
Task P7: Counter telemetry (2 h)¶
Add counters about your counters: metrics_inc_total, metrics_dropped_total, metrics_registry_size. Expose them; alert on rapid growth.
Task P8: A production-grade metrics subsystem (5+ h)¶
Combine all of the above into a single internal library. Document it. Use it in a real service (yours or a sandbox). Operate it for at least a week. Iterate based on what operators tell you.
Bonus Long-Form Tasks¶
Task B1: Read and summarise sync.Pool source¶
Read sync/pool.go and the runtime side it links to (sync_runtime_procPin in runtime/proc.go). Write a 500-word summary of how sync.Pool uses per-P shards. Identify the patterns that map to per-P counters.
Task B2: Read and summarise expvar source¶
Read expvar.go. Write a 500-word summary of how the JSON exposition works. Replicate the design in your own minimal "expvar-like" package.
Task B3: Read hdrhistogram-go source¶
Read the package. Identify the bucket-index calculation. Write a function that, given a value and HDR parameters, returns the bucket index, replicating the library's logic.
Task B4: Read Java's LongAdder and translate to Go¶
Read OpenJDK's LongAdder.java and Striped64.java. Port the algorithm to Go. Benchmark vs your fixed-shard counter.
Task B5: Build a custom metric backend¶
Write a tiny "metric backend": a process that accepts OTLP pushes or performs Prometheus-style scrapes and stores the metrics in memory. Visualise them with a simple HTML/JS dashboard. The exercise shows how a backend actually consumes the metrics you emit.
Setup Requirements¶
All tasks require:
- Go 1.19 or newer (Go 1.22 or newer if you use math/rand/v2 functions such as rand.IntN)
- A Linux or Mac dev machine
- For senior tasks: perf (Linux only)
- For Prometheus tasks: a local Prometheus install
- For HDR tasks: go get github.com/HdrHistogram/hdrhistogram-go
Approach¶
Do tasks in order within a level; pick a level that matches your current skill. Skip tasks that feel too easy; revisit ones that feel too hard. The order is roughly progressive but not strictly sequential.
For each task, ship working code. Test with -race. Benchmark where the task calls for it. Commit to a public repo (or local). Reflect on what you learned.
Time Budget¶
- Junior: 3-4 hours total
- Middle: 6-8 hours total
- Senior: 10-15 hours total
- Professional: 20+ hours total
Spread over weeks if needed. Counters are a deep topic; rushing produces shallow understanding.
Verification¶
Each task should result in:
- Code that compiles
- Tests (with -race)
- A benchmark where relevant
- A short README describing what you learned
After finishing a level, you should be able to answer the corresponding interview questions confidently.
End. Build well.
Extended Tasks: Counter Lab Notebook¶
For each task above, keep a lab notebook entry. Sample format:
Task: J1 - The broken example
Date: 2026-03-15
Time spent: 25 min
What I did:
- Ran the code 10 times. Outputs: 1000, 992, 1000, 987, 1000, 994, 1000, 998, 1000, 990
- Ran with -race; got immediate WARNING: DATA RACE report
- Line numbers in the race report point to count++ inside the goroutine
What I learned:
- count++ is non-atomic across goroutines
- The race detector reports the racy access locations precisely
- Even when "lucky" (1000 result), the program is still wrong
Surprises:
- Most runs gave 1000; the variance is small enough that without -race
you might not notice in casual testing
Doing this for every task builds:
- A personal reference you can return to
- Evidence of deliberate practice
- A portfolio for job interviews
Extended Tasks: Counter Code Reviews¶
For each task above, also do a code review of your own work after a week. Look for:
- Race conditions you missed
- API choices you regret
- Performance you can improve
- Tests you should add
Self-review after a delay reveals what you have actually internalised and what was only passing knowledge.
Extended Tasks: Pair Programming Variations¶
Some tasks are better with a partner:
Pair Task PP1: Race-detect-driven development¶
One partner writes a counter with a deliberate race. The other partner runs -race and identifies the race. Swap. Iterate. Builds intuition for what -race catches.
Pair Task PP2: Performance regression hunting¶
One partner makes a small change to a counter implementation. The other partner runs benchmarks and identifies whether throughput improved or regressed. Swap. Iterate. Builds intuition for what helps and what doesn't.
Pair Task PP3: API design debate¶
One partner argues for Counter returning the new value from Inc(). The other argues for no return value. Debate trade-offs. Switch. Reach consensus. Builds API-design discipline.
Pair Task PP4: Postmortem writing¶
Given an outage scenario (e.g., "counter cardinality bombed and Prometheus OOMed"), one partner plays the on-call engineer; the other plays the investigator. Write the postmortem together. Builds operational mindset.
Extended Tasks: Reading Assignments¶
Companion reading for each level:
Junior reading¶
- Go memory model: https://go.dev/ref/mem
- sync/atomic docs
- "What is a data race?" (official Go article)
Middle reading¶
- expvar source
- sync.Map source
- Prometheus client_golang README
Senior reading¶
- Java LongAdder source
- sync.Pool source
- Linux percpu_counter.c
- Gil Tene, "How Not to Measure Latency" (video)
Professional reading¶
- HdrHistogram-go source
- OpenTelemetry metrics SDK source
- Google SRE Book (chapters on monitoring)
- "Site Reliability Workbook" practical examples
Spend 1-2 hours per level on reading. The patterns will start to repeat; that is the goal.
Extended Tasks: Teaching as Mastery¶
The final mastery test: teach the topic to someone else.
Teaching Task T1: Whiteboard the broken counter¶
Stand at a whiteboard. Explain to a colleague: "Why does count++ give wrong answers from many goroutines?" Use no notes. Watch their face for confusion; clarify.
Teaching Task T2: Walk through atomic.Int64¶
Open Go source for sync/atomic. Show your audience: the type definition, the methods, the asm files. Explain how Add becomes a hardware instruction.
Teaching Task T3: Demo false sharing¶
Set up a live benchmark showing throughput collapse with naive sharding, then recovery with padding. Watch your audience's eyebrows rise.
Teaching Task T4: Architect the metrics subsystem¶
Whiteboard the full observability subsystem from this series. Discuss trade-offs. Field questions. The questions will reveal gaps in your own understanding.
Teaching Task T5: Write a blog post¶
Public-facing teaching. Choose a counter topic; write 1500-2000 words; publish. Engage with comments. Refine for future versions.
If you can teach the topic clearly to a colleague, you have mastered it.
Extended Tasks: Production Verification¶
For the brave: deploy your counter work to production.
Production Task PR1: Replace a counter¶
Find a counter in your team's production code. Replace it with your improved version (padded, sharded, etc.). Verify in monitoring that it behaves correctly. Measure performance impact.
Production Task PR2: Add a metric and alert¶
Add a new counter to a production service. Wire it to a Grafana dashboard. Wire to an alert. Document the runbook. Wait for the alert to fire (hopefully not!) and iterate.
Production Task PR3: Conduct a counter audit¶
Pick a service. List every counter. Audit each (cardinality, naming, exposition, alerting). Write a report. Propose fixes. Implement them.
Production Task PR4: Mentor a colleague¶
Pair with a more junior colleague on a counter task. Watch them work; offer guidance only when needed. Reflect on what was hard for them.
Production tasks are the highest-fidelity verification of your counter knowledge.
Final Word on Tasks¶
Knowledge without practice is fragile. The tasks here turn the documentation into capability. Do them. Reflect. Iterate. Teach.
Counters are a deep topic; mastery takes years. Start now.
Appendix: Solution Hints¶
Brief hints (not full solutions) for the trickier tasks.
J3 hint¶
Use var count atomic.Int64 and count.Add(1). The print at the end becomes count.Load().
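A minimal sketch of the hint applied to the broken example. The helper name countTo is hypothetical and exists only so the result can be checked.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// countTo starts n goroutines that each add 1 to a shared atomic counter
// and returns the final value once all of them have finished.
func countTo(n int) int64 {
	var count atomic.Int64
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			count.Add(1) // atomic read-modify-write, so no update is lost
		}()
	}
	wg.Wait()
	return count.Load()
}

func main() {
	fmt.Println(countTo(1000)) // prints 1000
}
```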
J7 hint¶
A goroutine launched in main reading via Swap(0) every 10 seconds. Use time.NewTicker(10 * time.Second).
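A sketch of the reporting loop, assuming hypothetical names (requests, drain, report) and a shortened interval so the program finishes quickly. Swap(0) reads and resets in one atomic step, so an increment that races with the report is counted in the next interval rather than lost.

```go
package main

import (
	"context"
	"log"
	"sync/atomic"
	"time"
)

// requests stands in for the counter your handlers increment.
var requests atomic.Int64

// drain returns the current count and resets the counter to zero atomically.
func drain(c *atomic.Int64) int64 {
	return c.Swap(0)
}

// report logs the per-interval count until ctx is cancelled.
func report(ctx context.Context, every time.Duration) {
	t := time.NewTicker(every)
	defer t.Stop()
	for {
		select {
		case <-ctx.Done():
			return
		case <-t.C:
			log.Printf("requests in last %s: %d", every, drain(&requests))
		}
	}
}

func main() {
	// 100ms instead of 10s, purely so the demo ends in under a second.
	ctx, cancel := context.WithTimeout(context.Background(), 350*time.Millisecond)
	defer cancel()
	go report(ctx, 100*time.Millisecond)
	for i := 0; i < 50; i++ {
		requests.Add(1)
		time.Sleep(5 * time.Millisecond)
	}
	<-ctx.Done()
}
```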
M1 hint¶
CAS loop pattern:
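A minimal sketch of that pattern for the max tracker, assuming the zero value is an acceptable starting maximum (so inputs are expected to be non-negative). The helper maxOf is hypothetical and only drives the test.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// AtomicMax tracks the largest value observed so far.
type AtomicMax struct {
	v atomic.Int64
}

// Observe raises the stored maximum to x if x is larger.
func (m *AtomicMax) Observe(x int64) {
	for {
		cur := m.v.Load()
		if x <= cur {
			return // nothing to update
		}
		if m.v.CompareAndSwap(cur, x) {
			return // we installed the new maximum
		}
		// Another goroutine changed the value first; reload and retry.
	}
}

// Get returns the current maximum.
func (m *AtomicMax) Get() int64 { return m.v.Load() }

// maxOf observes 1..n from n goroutines and returns the tracked maximum.
func maxOf(n int) int64 {
	var m AtomicMax
	var wg sync.WaitGroup
	for i := 1; i <= n; i++ {
		wg.Add(1)
		go func(x int64) {
			defer wg.Done()
			m.Observe(x)
		}(int64(i))
	}
	wg.Wait()
	return m.Get()
}

func main() {
	fmt.Println(maxOf(1000)) // prints 1000
}
```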
M2 hint¶
CAS loop with cap check before the swap.
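A sketch of the capped counter along those lines. The constructor NewCapped and the helper tryAll are hypothetical names used for illustration.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// Capped is a counter that never exceeds limit.
type Capped struct {
	n     atomic.Int64
	limit int64
}

func NewCapped(limit int64) *Capped { return &Capped{limit: limit} }

// Inc adds 1 and returns true, or returns false if the counter is at the cap.
func (c *Capped) Inc() bool {
	for {
		cur := c.n.Load()
		if cur >= c.limit {
			return false // cap check happens before the swap
		}
		if c.n.CompareAndSwap(cur, cur+1) {
			return true
		}
		// Lost the race; reload and re-check the cap.
	}
}

// tryAll runs Inc once from each goroutine and returns how many succeeded.
func tryAll(goroutines int, limit int64) int64 {
	c := NewCapped(limit)
	var ok atomic.Int64
	var wg sync.WaitGroup
	for i := 0; i < goroutines; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			if c.Inc() {
				ok.Add(1)
			}
		}()
	}
	wg.Wait()
	return ok.Load()
}

func main() {
	fmt.Println(tryAll(1000, 100)) // prints 100
}
```

A plain Add(1) followed by a check would overshoot the cap under contention, which is why the check and the update are fused into one CAS.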
M3 hint¶
atomic.Pointer[Snapshot]. Publisher reads all three counters, builds snapshot, calls Store.
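A sketch of the publish step, with hypothetical type and method names (Stats, publish, Current). The three loads are not taken at one instant; what readers gain is that every field of a given snapshot comes from the same publish call and never changes afterwards.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// Snapshot is immutable once published.
type Snapshot struct {
	Requests, Errors, InFlight int64
}

// Stats holds the live counters and the most recently published snapshot.
type Stats struct {
	requests, errors, inFlight atomic.Int64
	snap                       atomic.Pointer[Snapshot]
}

// publish reads all three counters and swaps in a fresh snapshot.
func (s *Stats) publish() {
	s.snap.Store(&Snapshot{
		Requests: s.requests.Load(),
		Errors:   s.errors.Load(),
		InFlight: s.inFlight.Load(),
	})
}

// Current returns the latest snapshot, or nil before the first publish.
func (s *Stats) Current() *Snapshot { return s.snap.Load() }

func main() {
	var s Stats
	s.requests.Add(3)
	s.errors.Add(1)
	s.inFlight.Add(2)
	s.publish()
	fmt.Printf("%+v\n", *s.Current())
}
```

In the task itself, publish would be called from a goroutine driven by a one-second ticker, and the /status handler would only call Current.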
M5 hint¶
[64]atomic.Int64 array. Shard by rand.IntN(64) or by hash of goroutine identity.
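A sketch of the random-shard variant. It assumes math/rand/v2 (Go 1.22 or newer) for rand.IntN; the names Sharded and hammer are hypothetical. This version is deliberately unpadded.

```go
package main

import (
	"fmt"
	"math/rand/v2"
	"sync"
	"sync/atomic"
)

const numShards = 64

// Sharded spreads increments over numShards independent atomics.
type Sharded struct {
	shards [numShards]atomic.Int64
}

// Inc adds 1 to a randomly chosen shard.
func (s *Sharded) Inc() {
	s.shards[rand.IntN(numShards)].Add(1)
}

// Get sums all shards. The result is exact once writers have stopped
// and may be slightly stale while they are still running.
func (s *Sharded) Get() int64 {
	var sum int64
	for i := range s.shards {
		sum += s.shards[i].Load()
	}
	return sum
}

// hammer calls Inc perG times from each of the given number of goroutines.
func hammer(goroutines, perG int) int64 {
	var s Sharded
	var wg sync.WaitGroup
	for g := 0; g < goroutines; g++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := 0; i < perG; i++ {
				s.Inc()
			}
		}()
	}
	wg.Wait()
	return s.Get()
}

func main() {
	fmt.Println(hammer(16, 1000)) // prints 16000
}
```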
S1 hint¶
Compare benchmarks of [64]atomic.Int64{} vs [64]struct{_ cpu.CacheLinePad; v atomic.Int64; _ cpu.CacheLinePad}{}.
S3 hint¶
Use the P id returned by procPin inside Inc to pick the shard index.
S4 hint¶
Local accumulator that flushes when local.n >= local.flushAt. Always pair with defer local.Flush().
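A sketch of the sloppy counter under that description. Field and helper names (global, parent, run) are hypothetical; a Local is meant to be owned by one goroutine, which is why its n field needs no synchronisation.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// Sloppy is a counter whose global value lags behind by unflushed local counts.
type Sloppy struct {
	global atomic.Int64
}

// Local is a per-goroutine accumulator; do not share it between goroutines.
type Local struct {
	parent  *Sloppy
	n       int64
	flushAt int64
}

// Local returns a new accumulator that flushes every flushAt increments.
func (s *Sloppy) Local(flushAt int64) *Local {
	return &Local{parent: s, flushAt: flushAt}
}

// Inc counts locally and flushes once the threshold is reached.
func (l *Local) Inc() {
	l.n++
	if l.n >= l.flushAt {
		l.Flush()
	}
}

// Flush moves the pending local count into the global counter.
func (l *Local) Flush() {
	if l.n != 0 {
		l.parent.global.Add(l.n)
		l.n = 0
	}
}

// Get returns the flushed total.
func (s *Sloppy) Get() int64 { return s.global.Load() }

// run increments perG times from each goroutine and returns the total.
func run(goroutines, perG int, flushAt int64) int64 {
	var s Sloppy
	var wg sync.WaitGroup
	for g := 0; g < goroutines; g++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			local := s.Local(flushAt)
			defer local.Flush() // runs before wg.Done, so nothing is left behind
			for i := 0; i < perG; i++ {
				local.Inc()
			}
		}()
	}
	wg.Wait()
	return s.Get()
}

func main() {
	fmt.Println(run(8, 10000, 1024)) // prints 80000
}
```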
S6 hint¶
atomic.Pointer[[]Cell] for the cell array; CAS to detect contention; mutex to coordinate growth.
P1 hint¶
Per-P shards of hdrhistogram.New(1, 60_000_000_000, 3), each protected by its own mutex. Merge at quantile-read time.
P5 hint¶
Use expvar.Func, prometheus.CounterFunc, and OpenTelemetry ObservableCounter — each reads the same underlying atomic.Int64.
P6 hint¶
rate(errors[30d]) / rate(requests[30d]) in PromQL; compare to 1 - target.
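The same arithmetic can be sketched in Go from the two raw counters. The function name budgetRemaining and its return convention are assumptions made for illustration: 1 means the budget is untouched, 0 means it is exactly spent, and a negative value means it is overspent.

```go
package main

import "fmt"

// budgetRemaining returns the unspent fraction of the error budget for a
// window in which `requests` were served and `errors` of them failed.
// target is the SLO as a fraction, for example 0.999.
func budgetRemaining(requests, errors int64, target float64) float64 {
	allowed := (1 - target) * float64(requests) // failures the SLO tolerates
	if allowed <= 0 {
		// No traffic, or a 100% target: any error exhausts the budget.
		if errors == 0 {
			return 1
		}
		return 0
	}
	return 1 - float64(errors)/allowed
}

func main() {
	// 1,000,000 requests at a 99.9% target tolerate about 1000 errors;
	// 250 errors leave roughly three quarters of the budget.
	fmt.Printf("%.2f\n", budgetRemaining(1_000_000, 250, 0.999))
}
```

An alert on "consumed too fast" would compare how this value changes over a short window against the rate that would exhaust it before the 30 days are up.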
The full solutions are in the various deep dives in the senior and professional files. Refer back when stuck.
Appendix: How to Grade Yourself¶
After each task:
- Pass: code works, tests pass with -race, you can explain it.
- Partial: code works but you cannot explain a part of it. Revisit the docs.
- Fail: code does not work. Find the bug; understand it before moving on.
Honest self-grading is essential. Self-deception leads to false confidence.
Appendix: The Tasks as Curriculum¶
The tasks are designed to be completed in order within a level, with optional skipping. A team's onboarding might use this as a curriculum:
- Week 1: Junior tasks
- Week 2-3: Middle tasks
- Week 4-6: Senior tasks
- Week 7-10: Professional tasks
- Ongoing: production tasks and teaching
Use the tasks for new-hire ramp-up. Use them for skill-up sessions. Use them for self-study.
Final end.