Concurrent Counters — Junior Level¶
Table of Contents¶
- Introduction
- Prerequisites
- Glossary
- Core Concepts
- Real-World Analogies
- Mental Models
- Pros & Cons
- Use Cases
- Code Examples
- Coding Patterns
- Clean Code
- Product Use / Feature
- Error Handling
- Security Considerations
- Performance Tips
- Best Practices
- Edge Cases & Pitfalls
- Common Mistakes
- Common Misconceptions
- Tricky Points
- Test
- Tricky Questions
- Cheat Sheet
- Self-Assessment Checklist
- Summary
- What You Can Build
- Further Reading
- Related Topics
- Diagrams & Visual Aids
Introduction¶
Focus: "Why does
count++from many goroutines give the wrong number? What is the simplest correct counter?"
You have probably written this Go program at least once in your life, or something just like it:
package main
import (
"fmt"
"sync"
)
func main() {
var count int
var wg sync.WaitGroup
for i := 0; i < 1000; i++ {
wg.Add(1)
go func() {
defer wg.Done()
count++
}()
}
wg.Wait()
fmt.Println(count)
}
You expect 1000. You get 987, or 942, or sometimes 1000. Run it again — a different answer. Run it under go run -race main.go and the Go race detector prints a long, scary report.
This file is about why that program is broken, and the smallest amount of theory and code you need to fix it. By the end you will:
- Know what a race condition is, and why
count++is the textbook example - Use
sync.Mutexto protect a counter (the safe, obvious solution) - Know about
sync/atomicand useatomic.AddInt64/atomic.Int64for a faster, simpler solution - Understand the difference between monotonic counters (only go up) and gauge counters (go up and down)
- Recognise the
intvsint64choice and why atomic operations require a fixed-width type - Know the three ways an
intincrement can go wrong concurrently: lost updates, torn reads, and visibility delays - Be able to write a simple
RequestCountmetric that survives 100 concurrent goroutines hammering it - Know when to reach for a counter and when to reach for a richer primitive
You do not need to know about cache lines, false sharing, sharded counters, sloppy counters, HDR histograms, or expvar yet. Those come at the middle, senior, and professional levels. This file is about the moment you accept that "one number, many writers" needs help.
Prerequisites¶
- Required: A Go installation, version 1.19 or newer. Versions before 1.19 do not have the new
atomic.Int64/atomic.Uint64typed wrappers — only the olderatomic.AddInt64(&x, 1)functions. Both work, but the newer API is friendlier. - Required: Comfort writing a
mainfunction and usinggo func(). You should be able to run the broken example above by yourself. - Required: Familiarity with
sync.WaitGroup. You will use it in nearly every example here to make sure the program does not exit before all the increments finish. - Helpful: Awareness that modern computers run instructions out of order, and that a value written by one CPU core may sit in a cache for some time before another core sees it. You do not need to understand cache coherence yet.
- Helpful: Some experience with
-race(the Go race detector). If you have never rungo test -race, try it once before continuing.
If go run hello.go works on your machine, and you can already write var wg sync.WaitGroup; wg.Add(1); go func() { defer wg.Done() }(); wg.Wait() from memory, you are ready.
Glossary¶
| Term | Definition |
|---|---|
| Counter | A number that goes up (and sometimes down) to record how many times something has happened. The simplest concurrent primitive in the world, and one of the trickiest to make fast and correct. |
| Monotonic counter | A counter that only ever increases. Examples: request count, bytes received, errors, total bytes written to disk. Decreasing a monotonic counter is almost always a bug. |
| Gauge | A number that goes both up and down. Examples: in-flight requests, queue depth, open connections. Gauges need both increment and decrement; monotonic counters do not. |
| Race condition | When the result of a program depends on the relative timing of operations from multiple goroutines. The classic example is two goroutines doing count++ and one of the increments being lost. |
| Lost update | The specific failure mode of count++ under concurrency: two goroutines read the same old value, both add one, and both write back the same new value. One increment effectively vanishes. |
| Torn read | Reading a value while it is being written, getting half the old bytes and half the new bytes. On 64-bit CPUs reading a 64-bit aligned int64 is normally atomic, but on 32-bit CPUs it is not — and the language spec does not guarantee atomicity for plain int64 even on 64-bit. |
sync.Mutex | A mutual exclusion lock. Wraps a critical section so only one goroutine at a time may execute it. Slow for short operations like count++, but always correct. |
sync/atomic | A package of functions and types that perform single-word memory operations in a way the CPU guarantees to be atomic — no other thread can see a partial state. |
atomic.Int64 | A typed wrapper around an atomically-accessed int64, introduced in Go 1.19. Has Load, Store, Add, CompareAndSwap, Swap methods. The modern way to write a simple atomic counter. |
atomic.AddInt64 | The older package-level function that adds a delta to a *int64. Still works, still common in older code. Slightly easier to misuse because it takes a pointer to a plain int64. |
| Critical section | A region of code that must execute atomically with respect to other goroutines. With a mutex, the critical section is between Lock() and Unlock(). With an atomic, the critical section is a single machine instruction. |
| Race detector | The -race flag for go run and go test. Instruments memory accesses and reports any race it observes during the run. It does not catch every race, but it catches almost every real one if your test is even a little adversarial. |
Core Concepts¶
Why count++ is not safe¶
count++ looks atomic. In English it is a single verb: "increment". In machine code it is three operations:
- Read the current value of
countfrom memory into a CPU register - Add one to the register
- Write the new value back to memory
In Go's runtime, even on a single CPU, the scheduler may pause a goroutine between any of these steps. On multiple CPUs, two goroutines may execute step 1 at the same instant, both reading the same old value. Both then add one to it. Both write the same new value back. The counter has gone up by one, not by two. One increment is lost.
Concretely, imagine count starts at 5 and two goroutines both run count++:
Goroutine A Goroutine B
read count -> 5
read count -> 5
add 1 -> 6
add 1 -> 6
write 6 to count
write 6 to count
End result: count == 6, not 7. The longer the program runs and the more goroutines you spawn, the more increments are silently dropped.
This is called a lost update or, more generally, a read–modify–write race.
Three ways the wrong thing can happen¶
When multiple goroutines touch the same variable without synchronisation, three distinct problems can occur. All three apply to counters:
- Lost updates: described above. Two writers step on each other's read-modify-write.
- Torn reads / writes: a reader sees half the old bytes and half the new bytes. This is rare on a 64-bit CPU for a 64-bit aligned value but is not guaranteed by the Go memory model. On 32-bit ARM, writing an
int64is two 32-bit stores and a concurrent reader can absolutely see a torn value. - Stale visibility: even after writer A has written, reader B may not see the new value for some unbounded amount of time, because the write sits in A's CPU cache. Without a memory barrier (which atomic operations provide), there is no guarantee about when B will see the update.
A correct concurrent counter must address all three.
Solution 1: sync.Mutex¶
The simplest fix: wrap the increment in a mutex.
type SafeCounter struct {
mu sync.Mutex
count int64
}
func (c *SafeCounter) Inc() {
c.mu.Lock()
c.count++
c.mu.Unlock()
}
func (c *SafeCounter) Get() int64 {
c.mu.Lock()
defer c.mu.Unlock()
return c.count
}
This is correct under any concurrency, on any architecture. It is also slow for such a small operation — a mutex involves at least one atomic instruction itself, plus the bookkeeping of holding and releasing the lock, plus potential goroutine sleeps if the lock is contended.
For a counter that is hit a million times a second from 64 goroutines, a mutex is the wrong choice. But it is always a correct first step. Make it work, then make it fast.
Solution 2: sync/atomic¶
The standard library gives us a much faster primitive: atomic operations. On any modern CPU, an atomic add on a 64-bit aligned address is a single hardware instruction (LOCK XADD on x86, LDADD on ARMv8.1, or a load-linked / store-conditional loop on older ARM). It is faster than a mutex by an order of magnitude.
The modern, type-safe API since Go 1.19:
import "sync/atomic"
type AtomicCounter struct {
count atomic.Int64
}
func (c *AtomicCounter) Inc() {
c.count.Add(1)
}
func (c *AtomicCounter) Get() int64 {
return c.count.Load()
}
atomic.Int64 is a struct that wraps an int64 and gives it atomic methods. Its zero value is a usable counter at zero. It is safe to copy only before any goroutine has touched it; once it is in use, you must pass it by pointer.
The older, still-supported API:
type AtomicCounter struct {
count int64
}
func (c *AtomicCounter) Inc() {
atomic.AddInt64(&c.count, 1)
}
func (c *AtomicCounter) Get() int64 {
return atomic.LoadInt64(&c.count)
}
Both are correct. The new API is preferred for new code because it makes the contract visible at the type level — you cannot accidentally write c.count++ next to atomic.LoadInt64(&c.count) and get a silent race.
Monotonic vs gauge¶
A counter that only goes up (request count, bytes received) needs only Add(1) and Load(). A counter that can also go down (in-flight requests, queue depth) needs both Add(+1) and Add(-1). atomic.Int64 supports both with a single method:
Decrementing an atomic.Uint64 requires a small trick: you cannot pass -1 to an unsigned add. Use Add(^uint64(0)) (the two's-complement of one) or, more readably, choose atomic.Int64 if you need decrement.
What you do not yet need¶
You do not need to think about:
- Cache lines and false sharing (senior level)
- Per-CPU sharding (senior level)
- HDR histograms (professional level)
expvarand metric registration (middle level)atomic.Valueoratomic.Pointer(different topic)
For everything you will write as a junior — a request counter for an HTTP handler, an error tally for a worker pool, a "bytes processed" meter for a log shipper — atomic.Int64 and atomic.Uint64 are the right answer.
Real-World Analogies¶
A scoreboard at a sports stadium¶
Imagine a stadium scoreboard. Many referees can call "add one point". If two referees press the "+1" button on a board that simply reads the current number, adds one, and writes it back, two simultaneous presses might only increase the score by one.
A mutex is like saying: "only one referee may touch the scoreboard at a time; everyone else queues up." Correct, but slow.
An atomic operation is like a special button that atomically increments the displayed number — the hardware itself guarantees both presses count, regardless of timing.
Hotel front-desk room-count¶
Multiple receptionists check guests in and out. The hotel has a "current occupied rooms" gauge. If receptionist A reads "47", receptionist B reads "47", both check in a guest, and both write "48", one guest is invisible to the count.
A shared variable with a lock is one receptionist with a master ledger that everyone passes around. An atomic gauge is an electronic counter that all receptionists trigger but which itself runs in hardware.
Newsroom ticker tape¶
In old newsrooms a single "wire ticker" produced sentences character by character. If two operators tried to add words at the same moment, the tape became gibberish. The fix was either a shared editor (mutex) or a special device that accepted whole words at a time and serialised them internally (atomic).
Mental Models¶
The read-modify-write cycle¶
Every counter increment is a read-modify-write cycle. Either:
- No one else may read or write while my cycle runs (mutex), or
- The entire cycle is one indivisible hardware step (atomic).
There is no third option. Anything else is a race.
"Atomic" means "no one sees the in-between"¶
It does not mean fast. It does not mean lock-free in the abstract sense. It means: from the perspective of every other goroutine, the operation either has not happened or has fully happened. There is no observable partial state.
Counter, gauge, and the difference¶
| Pattern | What it measures | Direction | Examples |
|---|---|---|---|
| Counter | A monotonic total | Up only | requests_total, bytes_received_total |
| Gauge | A snapshot value | Up and down | inflight_requests, queue_depth |
| Distribution | A set of observations | N/A | request latency, response size |
For a junior, the first two are everything. Distributions (histograms, summaries) are a senior/professional concern.
Pros & Cons¶
sync.Mutex counter¶
Pros - Simple to reason about; no atomic-memory subtleties - Works for any operation, not just a single increment (e.g. count++; lastEvent = time.Now() together) - Works on every architecture without alignment worries - Easy to extend — add a field, no API change
Cons - Slower for hot single-counter increments (often 5-20× slower than a single atomic) - Lock contention scales poorly: 64 cores hammering one mutex is a disaster - Holding a lock for any unrelated logic blocks other increments
sync/atomic counter¶
Pros - Very fast — one CPU instruction in the uncontended case - No lock to forget to release - The modern atomic.Int64 type makes intent clear in the type system
Cons - Each method call is its own atomic; you cannot atomically do "load, compute, store" without CompareAndSwap or a mutex - Still scales poorly under heavy contention because the cache line ping-pongs between cores (senior topic: false sharing) - Requires 64-bit alignment of the value (the typed wrappers handle this; the raw int64+package-level functions do not on 32-bit ARM)
Plain int with no protection¶
Pros - None worth mentioning
Cons - Races, lost updates, torn reads, and the Go race detector hates it - The compiler is free to optimise it in surprising ways because it sees no synchronisation
Use Cases¶
HTTP request counter¶
Every web framework has one. Each request increments a counter; an admin endpoint or /metrics page exposes the total.
var requestsTotal atomic.Int64
func handler(w http.ResponseWriter, r *http.Request) {
requestsTotal.Add(1)
// ... handle the request
}
Error count per worker¶
A worker pool that processes jobs. Whenever a job fails, increment an error counter. After the pool finishes, log the count.
type Pool struct {
errors atomic.Int64
}
func (p *Pool) runJob(j Job) {
if err := j.Do(); err != nil {
p.errors.Add(1)
}
}
Bytes-processed meter¶
A log shipper that reads bytes off a TCP connection. Each goroutine periodically adds its share to a global meter.
In-flight requests gauge¶
For a server that wants to gracefully drain on shutdown, an atomic.Int64 of in-flight requests is exactly the right tool. Increment on entry, decrement on exit, refuse new requests when set to "draining", wait for the gauge to reach zero, then exit.
"Test instrumentation"¶
Tests often want to assert "the callback was invoked N times". An atomic counter is the easiest way:
var called atomic.Int64
client.OnEvent = func() { called.Add(1) }
// ... do work
if got := called.Load(); got != 3 {
t.Errorf("expected 3 calls, got %d", got)
}
A "we have started" sentinel¶
Sometimes you do not need a count but a flag. Atomics work here too:
var started atomic.Bool
if started.CompareAndSwap(false, true) {
// first arrival; run startup logic
}
Code Examples¶
The broken program, fully spelled out¶
package main
import (
"fmt"
"sync"
)
func main() {
var count int
var wg sync.WaitGroup
for i := 0; i < 1000; i++ {
wg.Add(1)
go func() {
defer wg.Done()
count++
}()
}
wg.Wait()
fmt.Println("count =", count) // probably not 1000
}
Run it. Run it ten times. Note the variation. Then run it with go run -race main.go and notice the screaming.
Fix 1: mutex¶
package main
import (
"fmt"
"sync"
)
func main() {
var (
mu sync.Mutex
count int
)
var wg sync.WaitGroup
for i := 0; i < 1000; i++ {
wg.Add(1)
go func() {
defer wg.Done()
mu.Lock()
count++
mu.Unlock()
}()
}
wg.Wait()
fmt.Println("count =", count) // always 1000
}
Fix 2: atomic, modern API¶
package main
import (
"fmt"
"sync"
"sync/atomic"
)
func main() {
var count atomic.Int64
var wg sync.WaitGroup
for i := 0; i < 1000; i++ {
wg.Add(1)
go func() {
defer wg.Done()
count.Add(1)
}()
}
wg.Wait()
fmt.Println("count =", count.Load()) // always 1000
}
Fix 3: atomic, older API¶
package main
import (
"fmt"
"sync"
"sync/atomic"
)
func main() {
var count int64
var wg sync.WaitGroup
for i := 0; i < 1000; i++ {
wg.Add(1)
go func() {
defer wg.Done()
atomic.AddInt64(&count, 1)
}()
}
wg.Wait()
fmt.Println("count =", atomic.LoadInt64(&count)) // always 1000
}
Note: this version is safe on 64-bit platforms but on 32-bit ARM it requires count to be 64-bit aligned. The typed atomic.Int64 handles alignment for you.
A reusable counter type¶
Most codebases wrap their counter in a small named type with a clear API:
package metrics
import "sync/atomic"
// Counter is a monotonically-increasing 64-bit value safe for
// concurrent use from any number of goroutines.
type Counter struct {
v atomic.Int64
}
// Inc increments the counter by one and returns the new value.
func (c *Counter) Inc() int64 {
return c.v.Add(1)
}
// Add adds delta to the counter and returns the new value.
// For a monotonic counter, delta should be non-negative; for
// a gauge use the Gauge type instead.
func (c *Counter) Add(delta int64) int64 {
return c.v.Add(delta)
}
// Get returns the current value.
func (c *Counter) Get() int64 {
return c.v.Load()
}
// Reset sets the counter back to zero and returns the previous
// value. Useful for periodic flushing.
func (c *Counter) Reset() int64 {
return c.v.Swap(0)
}
package metrics
import "sync/atomic"
// Gauge is a 64-bit value that may increase or decrease.
type Gauge struct {
v atomic.Int64
}
func (g *Gauge) Inc() { g.v.Add(1) }
func (g *Gauge) Dec() { g.v.Add(-1) }
func (g *Gauge) Add(d int64) { g.v.Add(d) }
func (g *Gauge) Set(v int64) { g.v.Store(v) }
func (g *Gauge) Get() int64 { return g.v.Load() }
In-flight request guard¶
type Server struct {
inflight atomic.Int64
drain atomic.Bool
}
func (s *Server) Handle(w http.ResponseWriter, r *http.Request) {
if s.drain.Load() {
http.Error(w, "shutting down", http.StatusServiceUnavailable)
return
}
s.inflight.Add(1)
defer s.inflight.Add(-1)
// ... real handler
}
func (s *Server) Shutdown(ctx context.Context) error {
s.drain.Store(true)
for {
if s.inflight.Load() == 0 {
return nil
}
select {
case <-ctx.Done():
return ctx.Err()
case <-time.After(50 * time.Millisecond):
}
}
}
Counter that gets snapshotted periodically¶
func report(c *Counter, interval time.Duration, stop <-chan struct{}) {
ticker := time.NewTicker(interval)
defer ticker.Stop()
for {
select {
case <-stop:
return
case <-ticker.C:
n := c.Reset()
log.Printf("requests in last interval: %d", n)
}
}
}
Note the use of Swap(0) (via Reset) instead of Load() + Store(0). Doing those two operations separately would lose any increments that occur between them.
Coding Patterns¶
Pattern: defer-decrement for gauges¶
If you forget defer, a panic between the two lines will leak the increment forever. Always pair with defer.
Pattern: take a snapshot, do not poll¶
If you need to log or expose a counter, do one Load() and use that value for everything in the line:
// Bad: two reads, value may change between them
log.Printf("requests=%d, percent_of_max=%.2f", c.Get(), float64(c.Get())/float64(max))
// Good: one read
n := c.Get()
log.Printf("requests=%d, percent_of_max=%.2f", n, float64(n)/float64(max))
Pattern: counter per category¶
If you want to count kinds of errors, use one counter per kind, not a map[string]*Counter keyed by string — map access is not safe for concurrent writes, and locking the map defeats the purpose.
For dynamic keys you would use sync.Map or a sync.Mutex over a map; that is a middle-level pattern.
Pattern: counter as method receiver¶
Always make counter methods on a pointer receiver. Copying a counter loses safety:
// Wrong: value receiver. Each call sees its own copy.
func (c Counter) Inc() { c.v.Add(1) }
// Right: pointer receiver. All callers share the same atomic.
func (c *Counter) Inc() { c.v.Add(1) }
Pattern: never vet-suppress an atomic value¶
go vet flags any copy of atomic.Int64. Do not silence the warning — fix the code by passing a *Counter instead of a Counter.
Clean Code¶
Name the unit¶
requestsTotal, bytesProcessed, inflightRequests. Avoid count, total, n — they tell the reader nothing about what is being counted or in what unit.
Suffix _total for monotonic counters¶
A widely-adopted convention (Prometheus, OpenMetrics) is to suffix monotonically-increasing counters with _total: http_requests_total, bytes_received_total. The suffix signals intent.
Suffix nothing for gauges¶
Gauges name the thing, not the action: queue_depth, inflight_requests, goroutines. No _total.
Wrap atomics in a named type¶
A bare atomic.Int64 field in a struct invites mistakes. A named Counter type with methods is harder to misuse and clearer at call sites.
Document the unit in the type or the field comment¶
// BytesSent counts bytes written to the wire, including headers
// and TLS overhead. It is monotonically increasing.
type BytesSent struct{ v atomic.Int64 }
Product Use / Feature¶
Rate-limiting¶
A request counter that resets every second, combined with a check if c.Get() > limit { reject() }, is the simplest token bucket. The senior version uses sliding windows and is more accurate.
Health-check signal¶
A counter that records "last successful operation" timestamp is a one-line liveness probe. Use atomic.Int64 to store unix nanos and compare against time.Now().UnixNano().
Backpressure¶
if inflight.Load() > maxInflight { return Busy } is the textbook backpressure mechanism. The counter is the only thing standing between you and an overload.
Per-feature analytics¶
Counters per feature flag, per request type, per cache hit/miss — every observability story starts with counters.
Error Handling¶
Atomic operations never return an error. They cannot fail. The contract is: the operation either happens entirely or has not yet been issued; there is no in-between failure mode.
This is one reason atomics are nice — there is no error to forget to handle. But it also means atomics cannot tell you anything went wrong. If Inc() is called from a goroutine that panics afterwards, your gauge will be one too high forever. That is why gauges should always use defer.
For recoverable errors involving counters, the typical pattern is:
Use this sparingly. It is correct under any concurrency, but if you find yourself "undoing" counters often, you might want CompareAndSwap-based admission control instead — a middle-level topic.
Security Considerations¶
Counter overflow¶
int64 overflows after about 9.2 × 10^18 increments. At a billion increments per second, that is ~292 years. Practically irrelevant for monotonic counters, but watch out for delta arithmetic: subtracting two large uint64 values can underflow if the order is wrong.
Side-channel leaks¶
Counters exposed to untrusted clients (e.g. /metrics endpoints) can leak information about request volumes, errors, and internal state. Always authenticate /metrics.
Counter-based authorisation is almost always wrong¶
Logic like "the third request from this IP is privileged" using an atomic counter has subtle races and should be reviewed carefully. Prefer explicit per-session state.
expvar exposes counters by default¶
By importing expvar, you implicitly register /debug/vars on http.DefaultServeMux. Do not expose DefaultServeMux to the public internet without thinking about what it leaks. (More on this at the middle and professional levels.)
Performance Tips¶
These are all you need at junior level. The deeper performance work belongs to the senior file.
- Prefer
sync/atomicoversync.Mutexfor single-word increments. Atomic is typically 5–20× faster. - Use the typed
atomic.Int64API in new code. It avoids the alignment trap of the old API on 32-bit ARM and reads better. - Do not allocate a new counter per request. Allocate once, reuse across all goroutines.
- Do not read a counter inside a tight inner loop if you do not have to. A
Load()is cheap but not free; cache it locally if you reuse it many times. - Bench before you tune. A 5 ns difference per increment may be irrelevant if your handler runs in 5 ms.
Best Practices¶
- Always use a pointer receiver for methods on a struct containing an atomic.
- Always pair
Add(+1)withdefer Add(-1)for gauges. - Use
Swap(0)to reset, neverLoad() + Store(0)— the latter loses increments. - Name counters with units (
bytes,requests,_total). - Wrap the atomic in a named type rather than exposing the field directly.
- Run your tests with
-race. It will catch the day-one mistakes for free. - Prefer
atomic.Int64overatomic.Uint64unless you genuinely need unsigned semantics and never need to decrement. - Document the monotonicity in a comment near the field.
Edge Cases & Pitfalls¶
Copying a struct that contains an atomic¶
type Counter struct{ v atomic.Int64 }
a := Counter{}
b := a // copy! b.v is now an independent atomic with value 0
a.v.Add(1)
fmt.Println(b.v.Load()) // 0, not 1
go vet warns. Listen to it. Always pass *Counter, never Counter by value.
Counter inside a value-receiver method¶
The receiver is a copy. The increment goes into a value about to be thrown away. Use *Counter.
32-bit ARM alignment¶
On 32-bit ARM, calling atomic.AddInt64(&someField, 1) requires the field to be 64-bit aligned. If your struct layout puts the int64 at an odd offset, the program panics. The typed atomic.Int64 handles this automatically by embedding alignment-forcing padding. The package-level functions do not. The standard fix: put the int64 first in the struct, or migrate to atomic.Int64.
Mixing atomic and non-atomic access¶
Once a variable is written atomically by anyone, every read must also be atomic. The race detector will catch this.
Counters in maps¶
A map[string]*Counter is fine if the map itself never changes (build it once at startup). A map[string]*Counter you write to concurrently is not safe — the map is not goroutine-safe even if the pointer values are.
Load() then act-on-the-value¶
Two goroutines may both see > 0, both decrement, and double-spend the slot. Use CompareAndSwap (middle level) or a mutex.
Forgetting to start at zero¶
The zero value of atomic.Int64 is a usable counter at zero. You do not need an explicit constructor. Some teams write one anyway for symmetry with other types — that is fine but not required.
atomic.Uint64 decrement¶
Surprising and ugly. If you need decrement, use atomic.Int64.
Common Mistakes¶
- Reading without
Load. Once a variable is touched atomically anywhere, every read must beLoad. - Writing without
Store. Likewise for every write. - Holding a mutex around an atomic. Redundant and slow.
- Copying the struct. Loses safety, loses identity.
- Adding
_totalto a gauge. Confuses operators reading metrics. - Using
intinstead ofint64and then passing&intValuetoatomic.AddInt64. Compile error —intandint64are different types in Go. - Treating an atomic as a transaction. Two atomic operations are not atomic together. Use
CompareAndSwapor a mutex. - Forgetting the race detector. It is the cheapest tool you will ever own.
Common Misconceptions¶
- "Atomic means lock-free." They are related but not the same. All single-instruction atomics are lock-free, but a lock-free data structure is usually much more than one atomic.
- "Atomics are slow." Compared to a mutex they are fast. Compared to a plain memory write they cost a CPU pipeline stall. Both are true.
- "On a single core I do not need atomics." Wrong. The Go scheduler can preempt a goroutine in the middle of a read-modify-write even on one core. Atomics are still required.
- "
int64reads and writes are atomic on x86-64." Reads of an alignedint64are atomic at the hardware level on x86-64, but the Go memory model does not guarantee this for plain assignments. The compiler is free to introduce intermediate writes or reorder. Useatomic.LoadInt64/atomic.StoreInt64. - "
sync.Oncesolves my counter problem." It runs once. A counter runs many times. Different tool.
Tricky Points¶
atomic.Int64{} is zero-initialized¶
You do not need to call a constructor. The zero value works.
atomic.Int64.Add returns the new value¶
This avoids a second Load() after the add and is also race-free with respect to other increments.
atomic.Int64.Swap returns the previous value¶
Useful for "report and reset" patterns:
atomic.Int64.CompareAndSwap returns a bool¶
Returns true if the value was 0 and is now 1; false otherwise. The simplest building block for higher-level concurrent algorithms.
Load/Store are not "free"¶
They are cheaper than Add but they still cross the memory barrier on most architectures. In a billion-iteration inner loop you may be able to cache a counter's value locally and Add the total at the end. That is a middle-level optimisation — but worth being aware of.
Negative gauges are a bug signal¶
If you ever see a gauge go below zero, somebody decremented without incrementing. Add if g.Add(-1) < 0 { log.Panic("gauge underflow") } in tests.
Test¶
package metrics
import (
"sync"
"testing"
)
func TestCounter_Inc(t *testing.T) {
var c Counter
const N = 10_000
var wg sync.WaitGroup
wg.Add(N)
for i := 0; i < N; i++ {
go func() {
defer wg.Done()
c.Inc()
}()
}
wg.Wait()
if got := c.Get(); got != N {
t.Errorf("expected %d, got %d", N, got)
}
}
func TestGauge_AddDec(t *testing.T) {
var g Gauge
const N = 1_000
var wg sync.WaitGroup
wg.Add(N * 2)
for i := 0; i < N; i++ {
go func() { defer wg.Done(); g.Inc() }()
go func() { defer wg.Done(); g.Dec() }()
}
wg.Wait()
if got := g.Get(); got != 0 {
t.Errorf("expected 0, got %d", got)
}
}
func TestCounter_Reset(t *testing.T) {
var c Counter
c.Add(42)
if prev := c.Reset(); prev != 42 {
t.Errorf("expected swap to return 42, got %d", prev)
}
if got := c.Get(); got != 0 {
t.Errorf("expected reset to 0, got %d", got)
}
}
Run with -race:
If you see any DATA RACE report from this, your atomics are not actually atomic — go check that you used Load/Add/Store everywhere.
Tricky Questions¶
Q: Is count++ ever safe across goroutines? A: Only if exactly one goroutine ever writes count and no other goroutine reads it. As soon as a second writer or any concurrent reader exists, it is a race.
Q: Why does count++ sometimes give the right answer in a small test? A: With few goroutines and a fast machine, the OS scheduler may serialise them. The race is latent. Run it under -race, or with millions of iterations, and it will surface.
Q: Does sync.Mutex protect against torn reads? A: Yes — every reader and writer holds the lock, so no one ever sees an intermediate state.
Q: Does atomic.Int64 protect a struct containing an int64? A: No. It protects only the single int64. A struct with multiple fields needs a mutex or a atomic.Pointer[Snapshot] swap pattern.
Q: Can I use atomic.Int64 on a 32-bit system? A: Yes. The typed wrapper enforces alignment. The old atomic.AddInt64(&x, 1) against a plain int64 field may panic if the field is not 64-bit aligned.
Q: What is the cost of an atomic add on modern x86-64? A: Roughly 10–20 ns when no contention, rising sharply with the number of contending cores because the cache line bounces.
Q: Does atomic.Int64.Load() require a write barrier? A: It requires whatever the platform needs to guarantee sequential consistency for that load. On x86-64 it is usually a plain MOV (the architecture is strong enough). On ARM it is a load-acquire instruction.
Q: Are expvar counters atomic? A: Yes. expvar.Int.Add(int64) uses atomic.AddInt64 internally. You learn that at the middle level.
Cheat Sheet¶
import "sync/atomic"
// declare
var c atomic.Int64
// increment
c.Add(1)
// decrement (only for gauges; never on monotonic counters)
c.Add(-1)
// read
v := c.Load()
// reset (atomic exchange to zero, returns previous)
prev := c.Swap(0)
// "first arrival" — set to 1 only if currently 0
first := c.CompareAndSwap(0, 1)
// old API (still works, still common)
var raw int64
atomic.AddInt64(&raw, 1)
v := atomic.LoadInt64(&raw)
atomic.StoreInt64(&raw, 0)
When in doubt, choose:
| Need | Type |
|---|---|
| Counter that only goes up | atomic.Int64 |
| Gauge that goes up and down | atomic.Int64 |
| Unsigned semantics, never decrement | atomic.Uint64 |
| Two related fields | sync.Mutex |
| A whole snapshot replaced atomically | atomic.Pointer[T] |
| Boolean flag | atomic.Bool |
Self-Assessment Checklist¶
- I can explain in one sentence why
count++is unsafe across goroutines - I can fix it with
sync.Mutex - I can fix it with
atomic.Int64 - I know the difference between a monotonic counter and a gauge
- I know to use
defer Add(-1)for gauges - I know to use
Swap(0)to reset, notLoad()+Store(0) - I know that copying a struct containing an atomic is a bug
go vetwill flag - I run my tests with
-race
Summary¶
A concurrent counter looks like the most boring possible exercise in concurrency and turns out to teach the central lessons of the whole field: read-modify-write atomicity, visibility, and the cost of synchronisation. The junior toolkit is:
sync.Mutexwhen in doubt — always correct, sometimes slowsync/atomic'satomic.Int64/atomic.Uint64for fast single-counter increments- Distinguish monotonic counters from gauges; name them accordingly
- Always pair gauge increments with
deferdecrements - Run the race detector
When you reach the middle level you will add CompareAndSwap loops, expvar integration, and start thinking about the multi-counter coordination problem. When you reach the senior level you will discover that even atomic.Int64.Add(1) is too slow under heavy contention, and you will learn about cache lines, false sharing, sharded counters, and sloppy counters. When you reach the professional level you will deal with HDR histograms, percentile-preserving merges, NUMA effects, and the design of an observability subsystem.
But it all starts with count++ being wrong, and atomic.Int64 being a one-line fix.
What You Can Build¶
With only the junior-level material you can confidently build:
- A request counter for an HTTP handler
- A bytes-processed meter for a copy loop
- An in-flight gauge for a server that needs to drain on shutdown
- A "first arrival wins" sentinel using
CompareAndSwap - A periodic "report and reset" worker
- A test assertion that "callback X was invoked exactly N times"
- A basic admission controller: refuse work when gauge exceeds threshold
- A simple per-error-type tally for an aggregating worker
Further Reading¶
- The Go Memory Model: https://go.dev/ref/mem
sync/atomicpackage docs: https://pkg.go.dev/sync/atomicsync.Mutexpackage docs: https://pkg.go.dev/sync#Mutex- "What is a data race?" — the official guide: https://go.dev/doc/articles/race_detector
- "The Cost of Mutexes vs Atomics" — a classic blog series; search for benchmarks
- Russ Cox, "Hardware Memory Models" series on research.swtch.com
Related Topics¶
- Goroutines (start, lifecycle, leaks)
sync.Mutex,sync.RWMutex,sync.WaitGroupsync/atomic(this file's other API surfaces:Bool,Pointer[T],Value)- The Go race detector
- Memory ordering and the Go memory model
- The Prometheus client library (which builds on these primitives)
expvar(middle level)- Sharded counters and
LongAdder(senior level) - HDR histograms (professional level)
Diagrams & Visual Aids¶
The lost-update timeline¶
time →
G1: read(5) -------- +1 -------- write(6)
G2: read(5) -------- +1 -------- write(6)
result: count == 6, not 7
Mutex vs atomic vs raw¶
+--------+----------+-----------------+-------------------+
| Tool | Correct? | Speed (1 core) | Speed (16 cores) |
+--------+----------+-----------------+-------------------+
| ++ | NO | "fast" | "fast" but wrong |
| Mutex | yes | ~50 ns / op | ~500 ns / op |
| atomic | yes | ~5 ns / op | ~150 ns / op |
+--------+----------+-----------------+-------------------+
(Numbers are illustrative; measure on your own hardware.)
Counter, gauge, distribution¶
Counter (monotonic) Gauge (snapshot) Distribution (histogram)
↑ ↑↓ _.--._
| /\ / \
| _/ \_ _/ \_
|________ ________ ______________
Add returns the new value, not the old¶
Swap returns the old value¶
That is everything you need to start. Build the broken example, watch it fail, fix it both ways, and move on to the middle level when you are ready to think about contention.
Deeper Dive: Why count++ is Three Instructions¶
Let us look at what the Go compiler actually produces for count++ where count is an int64. You can ask for the assembly with:
For an unsynchronised count++ on x86-64 you typically see something like:
MOVQ "".count(SB), AX ; read count into register AX
INCQ AX ; AX = AX + 1
MOVQ AX, "".count(SB) ; write AX back to count
Three instructions. Between any two of them another thread on another core could read and write the same memory location. That is the lost update.
Now look at the assembly for atomic.AddInt64(&count, 1) on x86-64:
A single instruction, prefixed with LOCK. The LOCK prefix tells the CPU: "for the duration of this instruction, I have exclusive access to the cache line containing this memory address; no other core may interleave." That is what atomic buys you.
On ARMv8 the equivalent is LDADD (a single instruction since ARMv8.1). On older ARM cores it is a LDREX/STREX (load-exclusive / store-exclusive) loop:
loop:
LDREX r0, [count]
ADD r0, r0, #1
STREX r1, r0, [count]
CBNZ r1, loop ; retry if the exclusive bit was cleared
The CPU monitors whether anyone else touched that cache line between the LDREX and the STREX; if they did, the STREX fails and the whole sequence retries. This is called optimistic concurrency at the hardware level. It is the foundation of all lock-free programming.
When you call atomic.Int64.Add(1), Go's sync/atomic package compiles to whichever of these instruction sequences is best for your CPU. You never see it — but you should know it is there.
Deeper Dive: The Race Detector Walkthrough¶
Let us run the broken program under -race and read the output. Save this as bad_count.go:
package main
import (
"fmt"
"sync"
)
func main() {
var count int
var wg sync.WaitGroup
for i := 0; i < 1000; i++ {
wg.Add(1)
go func() {
defer wg.Done()
count++
}()
}
wg.Wait()
fmt.Println(count)
}
Run it:
$ go run -race bad_count.go
==================
WARNING: DATA RACE
Read at 0x00c00001a0a0 by goroutine 8:
main.main.func1()
/tmp/bad_count.go:13 +0x44
Previous write at 0x00c00001a0a0 by goroutine 7:
main.main.func1()
/tmp/bad_count.go:13 +0x55
Goroutine 8 (running) created at:
main.main()
/tmp/bad_count.go:11 +0xb0
Goroutine 7 (finished) created at:
main.main()
/tmp/bad_count.go:11 +0xb0
==================
997
Found 1 data race(s)
exit status 66
Three things to notice:
- The line numbers point to
count++. That is the racy access. - Both goroutines were created on line 11. The race detector tells you exactly which
gostatement spawned the racers. - The program still finished and printed
997. A race does not necessarily crash; it silently produces wrong answers. That is what makes races so dangerous in production.
The exit status 66 is what -race uses to signal "races found". Your CI should treat that as a failure.
Now fix it and re-run:
Clean exit, correct answer. The race detector is the single most valuable tool in your concurrency toolkit. Use it in every test run, every CI build, every time you touch a shared variable.
Deeper Dive: When the Race Detector Misses Races¶
The race detector observes accesses that actually happened during the run. It is sound — every race it reports is a real race — but it is not complete: it can miss races that happen to not occur during the particular execution it observed.
Concrete example:
If the race detector runs and the Println happens before the goroutine wakes up, no concurrent access has occurred yet and no race is reported. The race is latent.
Mitigations:
- Run tests many times under
-race:go test -race -count=100 - Use stress-testing tools like
golang.org/x/tools/cmd/stress - Write tests with
runtime.Gosched()injections at the suspect points - Trust the principle, not just the detector: if two goroutines touch the same memory, one of them must use synchronisation. The detector confirms; it does not prove.
Deeper Dive: atomic.Int64 vs Older Functions, Side-by-Side¶
// Old API
var x int64
atomic.AddInt64(&x, 1)
v := atomic.LoadInt64(&x)
atomic.StoreInt64(&x, 0)
old, swapped := atomic.CompareAndSwapInt64(&x, 0, 1), true
prev := atomic.SwapInt64(&x, 99)
// New API (Go 1.19+)
var x atomic.Int64
x.Add(1)
v := x.Load()
x.Store(0)
swapped := x.CompareAndSwap(0, 1)
prev := x.Swap(99)
Differences:
| Aspect | Old | New |
|---|---|---|
| Field declaration | int64 | atomic.Int64 |
| Alignment on 32-bit | manual | automatic |
Easy to misuse with x++ | yes | no (no ++ defined on the type) |
Easy to misuse with x directly | yes | impossible — methods only |
go vet copy detection | no | yes |
| Available since | Go 1.0 | Go 1.19 |
| Source readability | function calls | method chains |
For new code, always use the new API. For old code, leave it alone unless you are refactoring nearby.
Deeper Dive: Counters and the Memory Model¶
The Go memory model (see https://go.dev/ref/mem) says, in part:
The APIs in the
sync/atomicpackage are collectively "atomic operations" that can be used to synchronise the execution of different goroutines. If the effect of an atomic operation A is observed by atomic operation B, then A is synchronised before B.
In plain English: every Load is guaranteed to see every Store/Add that finished before it, in a consistent global order. There is no "I saw 42 in this goroutine but 41 in that one at the same nanosecond" — atomics are sequentially consistent in Go.
This is a strong guarantee. C and C++ let you choose weaker orderings (relaxed, acquire, release) for performance. Go has decided: every atomic is fully ordered. That is one fewer footgun for you. The trade-off is a tiny bit of unnecessary work on weakly-ordered architectures (ARM, POWER) — Go emits stronger fences than absolutely necessary so that you do not have to think about it.
Practically, the only thing you need to remember:
- If goroutine A
Adds and goroutine B subsequentlyLoads, B sees the new value. - If goroutine A writes a non-atomic variable and then
Stores to an atomic, goroutine B thatLoads the atomic and then reads the non-atomic variable sees A's write to the non-atomic variable too.
The last point is the "release/acquire" pattern: Store releases prior writes, Load acquires them. It lets you build more complex structures (one of which is the sharded counter at the senior level).
Deeper Dive: When a Counter is the Wrong Tool¶
Sometimes a junior engineer reaches for a counter when something else would serve better. Examples:
"How many goroutines are alive?"¶
Use runtime.NumGoroutine(). It is already a counter inside the runtime.
"Has this work been done?"¶
Use sync.Once — it solves "exactly once" execution. A counter compared against 1 is a worse version of this.
"Notify me when N tasks are done"¶
Use sync.WaitGroup. It is a counter, but with built-in blocking and panic-on-misuse.
"What is the rate of requests per second?"¶
A counter alone is not enough — you also need time. Combine a counter with a periodic sampler, or use a sliding-window structure (middle/senior level).
"How long did each request take?"¶
A counter cannot answer this. You need a distribution: a histogram or a summary. Histograms come at the professional level.
"What was the value at time T?"¶
A bare counter loses history. If you need history, log the value over time or use a time-series database.
"What is the maximum we ever reached?"¶
You need a MaxOf operation. With atomics:
type AtomicMax struct{ v atomic.Int64 }
func (m *AtomicMax) Observe(x int64) {
for {
cur := m.v.Load()
if x <= cur {
return
}
if m.v.CompareAndSwap(cur, x) {
return
}
}
}
That CompareAndSwap loop is your first taste of lock-free programming. We will see many more at the senior level.
Deeper Dive: Counter Naming Conventions Across Frameworks¶
Different ecosystems have different conventions. Junior code often mixes them; senior code picks one and sticks to it.
| Ecosystem | Counter style | Gauge style |
|---|---|---|
| Prometheus / OpenMetrics | http_requests_total{code="200"} | inflight_requests |
| StatsD | app.requests.count | app.connections.active |
| OpenTelemetry | http.server.request.count | http.server.active_requests |
| Datadog | app.requests (rate-derived) | app.queue.depth |
| Honeycomb | event-based, no counters per se | derived from event count |
If your service emits to Prometheus, name your Go variable in a way that matches the metric name: httpRequestsTotal for the metric http_requests_total. The visual match helps grepping production alerts back to code.
Deeper Dive: Two Common Counter Idioms¶
Idiom 1: counter with derived rate¶
type RateCounter struct {
count atomic.Int64
start time.Time
}
func NewRateCounter() *RateCounter {
return &RateCounter{start: time.Now()}
}
func (r *RateCounter) Inc() { r.count.Add(1) }
func (r *RateCounter) PerSecond() float64 {
n := r.count.Load()
elapsed := time.Since(r.start).Seconds()
if elapsed == 0 {
return 0
}
return float64(n) / elapsed
}
Beware: the rate is the average since start, which is rarely what operators want. They want the rate over the last minute. Use a sliding-window structure for that (senior topic).
Idiom 2: counter with periodic flush¶
type FlushCounter struct {
cur atomic.Int64
sink func(int64)
}
func (f *FlushCounter) Inc() { f.cur.Add(1) }
func (f *FlushCounter) RunFlusher(interval time.Duration, stop <-chan struct{}) {
t := time.NewTicker(interval)
defer t.Stop()
for {
select {
case <-stop:
return
case <-t.C:
f.sink(f.cur.Swap(0))
}
}
}
The Swap(0) reads and resets in one atomic step. Reads done between the read and the reset would lose increments; the swap prevents that.
Deeper Dive: Test Your Understanding With Variations¶
Variation 1: Make the broken program more broken¶
for i := 0; i < 1000; i++ {
wg.Add(1)
go func() {
defer wg.Done()
for j := 0; j < 1000; j++ {
count++
}
}()
}
Now you have a million increments split across 1000 goroutines. The error becomes far more visible — typically you will see a final count in the low hundreds of thousands instead of 1,000,000. The losses scale with contention.
Variation 2: Add runtime.Gosched() between read and write¶
Now you have explicitly opened the race window. The final count will be close to the number of distinct schedule points, which on a 16-core machine might be very small indeed.
Variation 3: Add a single mutex around a single increment¶
You should always get 1000, even with millions of iterations. But notice the wall-clock time grows enormously — the lock becomes the bottleneck. Compare to atomics, which scale much better.
Variation 4: One goroutine increments, one reads¶
go func() {
for i := 0; i < 1_000_000; i++ {
count++
}
}()
go func() {
for {
if count >= 1_000_000 {
break
}
}
}()
The reader may never see 1_000_000 because the writer's updates may stay in its cache forever. With -race, this is flagged immediately. Even without -race, on weakly-ordered architectures (ARM) the reader can deadlock. The fix is the same: atomic.
Deeper Dive: When int32 Is Enough¶
If your counter genuinely fits in 31 bits (a counter that resets every second and counts at most a few million events), atomic.Int32 is a bit cheaper on 32-bit ARM and identical on x86-64. It also gives you slightly clearer intent: "this never gets big."
Most counters do not benefit. atomic.Int64 is the default for a reason — int64 math is free on 64-bit architectures, overflow is functionally impossible for normal workloads, and you do not have to think about it.
Deeper Dive: Counter vs Generation Number¶
A related pattern: the "generation number" or "version stamp". Each time some structure changes, bump an atomic counter. Readers compare before/after to detect concurrent modification:
type SeqCounter struct{ g atomic.Uint64 }
func (s *SeqCounter) Bump() uint64 { return s.g.Add(1) }
func (s *SeqCounter) Current() uint64 { return s.g.Load() }
This is the foundation of seqlocks (a senior topic) and of many lock-free read paths. The atomic counter is the primitive; the seqlock is the algorithm built on top.
Deeper Dive: Counters in Standard Library Code¶
If you grep the Go standard library for atomic.Int64, you find dozens of places where the runtime uses exactly the pattern this file teaches. Some examples:
net/http:Server.inFlightis an atomic counter of in-flight requests.runtime/metrics: many internal counters expose stop-the-world durations, GC pause counts, and goroutine counts.database/sql:DB.numClosedand various connection-state counters.expvar: everyexpvar.Intis anatomic.Int64underneath.sync/atomictests: every micro-benchmark for atomic ops is itself a counter.
Reading any of these is a quick way to internalise the idioms.
Deeper Dive: Counters and Context Cancellation¶
A subtle pattern: when a request is cancelled, do you decrement the in-flight gauge?
func (s *Server) Handle(w http.ResponseWriter, r *http.Request) {
s.inflight.Add(1)
defer s.inflight.Add(-1) // always
select {
case <-r.Context().Done():
return
case res := <-s.process(r):
write(w, res)
}
}
defer runs whether the context cancels or not. The gauge stays consistent. If you put the decrement only in the success path, cancellation will leak the gauge upward forever — a very common production bug.
Deeper Dive: Counters in Benchmark Code¶
When you write a Go benchmark, b.N is itself a counter the framework manages. Inside the benchmark, your counters should be reset between iterations:
func BenchmarkCounter(b *testing.B) {
var c atomic.Int64
b.ResetTimer()
b.RunParallel(func(pb *testing.PB) {
for pb.Next() {
c.Add(1)
}
})
if c.Load() != int64(b.N) {
b.Fatalf("expected %d, got %d", b.N, c.Load())
}
}
Run with: go test -bench=BenchmarkCounter -cpu=1,4,16 to see how the same counter scales with parallelism. Spoiler: it does not, very much, and the senior file explains why.
Final Word for Juniors¶
Memorise these three lines and you have 90% of what you need:
Use -race. Always.
That is it. Build something. The middle file will be waiting when contention starts to bite.
Extended Walkthrough: Building a Real Request Counter Step-by-Step¶
Let us walk through building a request counter for a real HTTP server, from naive to safe to fast, watching each version's behaviour in detail.
Step 1: The naive version¶
package main
import (
"fmt"
"net/http"
)
var requests int
func handler(w http.ResponseWriter, r *http.Request) {
requests++
fmt.Fprintf(w, "request %d", requests)
}
func metrics(w http.ResponseWriter, r *http.Request) {
fmt.Fprintf(w, "total = %d", requests)
}
func main() {
http.HandleFunc("/", handler)
http.HandleFunc("/metrics", metrics)
http.ListenAndServe(":8080", nil)
}
Hammer it with wrk -c 64 -t 8 -d 10s http://localhost:8080/. Then call /metrics. The number you see will be lower than the actual number of requests served, because the increments are racing with each other. The Go race detector run (go run -race main.go) will be screaming.
Step 2: Add a mutex¶
var (
mu sync.Mutex
requests int
)
func handler(w http.ResponseWriter, r *http.Request) {
mu.Lock()
requests++
cur := requests
mu.Unlock()
fmt.Fprintf(w, "request %d", cur)
}
func metrics(w http.ResponseWriter, r *http.Request) {
mu.Lock()
cur := requests
mu.Unlock()
fmt.Fprintf(w, "total = %d", cur)
}
Notice the patterns:
- Read into a local under the lock, then release the lock, then format the response. Holding a lock during
fmt.Fprintfwould serialise every request on the lock. - The local
curcannot change after the unlock, so we can use it freely.
-race is silent now. But under heavy load you will see lock contention show up in go tool pprof as time spent in sync.(*Mutex).Lock.
Step 3: Switch to atomic¶
var requests atomic.Int64
func handler(w http.ResponseWriter, r *http.Request) {
cur := requests.Add(1)
fmt.Fprintf(w, "request %d", cur)
}
func metrics(w http.ResponseWriter, r *http.Request) {
fmt.Fprintf(w, "total = %d", requests.Load())
}
Three lines shorter, much faster, still correct. The Add(1) returns the new value, so you do not need a separate Load.
Step 4: What we have NOT yet solved¶
If your server runs on 64 cores and every request increments the same atomic.Int64, the cache line holding requests ping-pongs between all 64 cores. Throughput collapses as you add cores. The senior file solves this with sharded counters. For most workloads — say, anything under a million RPS per machine — the single atomic is fine. Past that, you need to know about cache lines.
Step 5: Adding a gauge for in-flight requests¶
var (
requests atomic.Int64
inflight atomic.Int64
)
func handler(w http.ResponseWriter, r *http.Request) {
requests.Add(1)
inflight.Add(1)
defer inflight.Add(-1)
// ... do work
}
func metrics(w http.ResponseWriter, r *http.Request) {
fmt.Fprintf(w, "total=%d inflight=%d", requests.Load(), inflight.Load())
}
Two counters, two atomic.Int64s, no lock anywhere. This is the standard shape of a modern Go service's metrics block.
Step 6: Per-route counters¶
What if you want a separate counter per route? Tempting:
But map writes are not concurrency-safe. You must protect it:
type Counters struct {
mu sync.RWMutex
routes map[string]*atomic.Int64
}
func (c *Counters) Inc(route string) {
c.mu.RLock()
cnt, ok := c.routes[route]
c.mu.RUnlock()
if ok {
cnt.Add(1)
return
}
c.mu.Lock()
defer c.mu.Unlock()
if cnt, ok := c.routes[route]; ok {
cnt.Add(1)
return
}
cnt = &atomic.Int64{}
c.routes[route] = cnt
cnt.Add(1)
}
The double-checked locking pattern minimises write-lock acquisitions. Or you can use sync.Map:
type Counters struct {
m sync.Map // map[string]*atomic.Int64
}
func (c *Counters) Inc(route string) {
v, ok := c.m.Load(route)
if !ok {
v, _ = c.m.LoadOrStore(route, &atomic.Int64{})
}
v.(*atomic.Int64).Add(1)
}
sync.Map.LoadOrStore returns the existing value if there is one, or stores and returns the new one. It is a textbook atomic compare-and-swap for maps.
For dynamic per-key counters at high cardinality, sync.Map plus atomic.Int64 is the standard idiom.
Step 7: Exposing through expvar¶
The standard library has a built-in metric endpoint at /debug/vars. Counters registered with expvar.NewInt show up there automatically:
import "expvar"
var requests = expvar.NewInt("requests")
var inflight = expvar.NewInt("inflight")
func handler(w http.ResponseWriter, r *http.Request) {
requests.Add(1)
inflight.Add(1)
defer inflight.Add(-1)
}
Now curl http://localhost:8080/debug/vars returns a JSON blob with requests and inflight (and a bunch of runtime metrics for free). This is the cheapest way to add a metrics endpoint to a Go service. The middle file covers expvar in depth.
Extended Walkthrough: Counters in a Worker Pool¶
Picture a worker pool consuming jobs from a channel. You want:
- A counter of jobs processed
- A counter of errors
- A gauge of currently-running jobs
- A counter per error class
Here is the whole thing, idiomatic and correct:
package pool
import (
"context"
"sync"
"sync/atomic"
)
type Stats struct {
Processed atomic.Int64
Errors atomic.Int64
Running atomic.Int64
byClass sync.Map // map[string]*atomic.Int64
}
func (s *Stats) incClass(class string) {
if v, ok := s.byClass.Load(class); ok {
v.(*atomic.Int64).Add(1)
return
}
v, _ := s.byClass.LoadOrStore(class, &atomic.Int64{})
v.(*atomic.Int64).Add(1)
}
func (s *Stats) ClassCount(class string) int64 {
if v, ok := s.byClass.Load(class); ok {
return v.(*atomic.Int64).Load()
}
return 0
}
type Pool struct {
Stats Stats
jobs chan Job
wg sync.WaitGroup
}
type Job interface {
Do(ctx context.Context) error
}
func New(workers int) *Pool {
p := &Pool{jobs: make(chan Job, 1024)}
p.wg.Add(workers)
for i := 0; i < workers; i++ {
go p.worker()
}
return p
}
func (p *Pool) worker() {
defer p.wg.Done()
for j := range p.jobs {
p.Stats.Running.Add(1)
err := j.Do(context.Background())
p.Stats.Running.Add(-1)
p.Stats.Processed.Add(1)
if err != nil {
p.Stats.Errors.Add(1)
p.Stats.incClass(errorClass(err))
}
}
}
func (p *Pool) Submit(j Job) { p.jobs <- j }
func (p *Pool) Close() { close(p.jobs); p.wg.Wait() }
func errorClass(err error) string {
// simplified
return err.Error()
}
Every counter in there is atomic.Int64. No mutex. The only synchronisation is the channel and the sync.Map. This pattern scales linearly to hundreds of workers because nobody is contending on a single lock.
If you needed even more scale, you would shard the counters by worker ID — but the senior file covers that.
Extended Walkthrough: Failing Tests You Can Write¶
Writing a test that demonstrates a race is harder than it sounds, because races are timing-dependent. Here are reliable techniques.
Technique 1: many goroutines, many iterations¶
func TestRaceBait(t *testing.T) {
var count int
var wg sync.WaitGroup
for i := 0; i < 1000; i++ {
wg.Add(1)
go func() {
defer wg.Done()
for j := 0; j < 1000; j++ {
count++
}
}()
}
wg.Wait()
if count != 1_000_000 {
t.Errorf("races dropped %d", 1_000_000-count)
}
}
Run with go test -race. The race detector will fire immediately; the assertion will also fire often, even without -race, because the increments truly are being lost.
Technique 2: a sleep between read and write¶
Now you have a hand-rolled, guaranteed race. Useful for teaching, not for production.
Technique 3: runtime.Gosched() injection¶
runtime.Gosched() is a hint to the scheduler "let other goroutines run". Inserting it between read and write opens the race window without using a sleep.
Technique 4: GODEBUG=gctrace=1¶
Garbage collection pauses can change scheduling and may surface races that otherwise stay hidden. Run with GODEBUG=gctrace=1 to interleave GC events with your code.
Technique 5: the stress tool¶
stress runs your test binary in a loop, in parallel, until it fails. Catches races that hit only one in a thousand runs.
Extended Walkthrough: A Day in the Life of an Increment¶
Let us trace exactly what happens when you call count.Add(1) on a modern multi-core x86-64 machine, starting from the Go source line and ending with the cache coherence protocol updating the L3 cache.
-
Compiler emits an
XADDwith aLOCKprefix. The Go compiler knows(*atomic.Int64).Addand lowers it to one machine instruction. No function call overhead; the call is inlined. -
CPU decode. The instruction enters the CPU pipeline. The decoder sees
LOCK XADDand routes it to the memory subsystem with the lock prefix set. -
Cache line fetch. The address of
countlives in some cache line of L1 or L2. If it is not present, the cache controller issues a request to L2/L3 to bring it in. If another core has the line in Modified state, that core must write back (or transfer) the line first. This is the MESI/MOESI cache coherence protocol in action. -
Cache line lock. The cache controller marks the line as exclusively owned. No other core may read or write it until this instruction completes.
-
Atomic add. The CPU reads the value, adds 1, writes back, and (with
XADD) returns the previous value in the register. All as one indivisible step from any other core's perspective. -
Cache line release. The exclusive lock is released. Other cores can now request the line.
-
Return to your Go code. The result is back in your goroutine's register, ready for the next instruction.
Total latency: 10–50 nanoseconds on a quiet machine, climbing to hundreds of nanoseconds under contention. Throughput per core: tens of millions of ops per second. Across many cores hammering the same address: the cache line ping-pongs and throughput per core falls to a few hundred thousand ops per second.
This is the universe atomic.Int64.Add lives in. It looks like a free function call. It is anything but.
Extended Q&A: Things Juniors Actually Ask¶
Q: Should I prefer atomic.Uint64 for counters because they are never negative?
A: Tempting but no. atomic.Int64 has the same operations plus Add(-1), plus easier-to-read decrement semantics. The negative range is wasted, but in 64 bits you can afford it. Reach for Uint64 only if you genuinely need unsigned arithmetic (e.g., wraparound modular arithmetic for hash mixing).
Q: Why is the expvar.Int type a struct wrapper around int64 instead of just being atomic.Int64?
A: Historical. expvar predates atomic.Int64 by a decade. The wrapper also adds an Add(int64) method that returns nothing (matching the older API style) and a String() method for JSON serialisation. Internally it uses atomic.AddInt64 and atomic.LoadInt64.
Q: Can I use sync.Atomic instead of sync/atomic?
A: No. The package is sync/atomic, accessed as atomic. There is no sync.Atomic symbol.
Q: Does the Go race detector slow my program down?
A: Yes — typically 2–10× slower and 5–10× more memory. That is why you use it in tests and CI, not in production. Modern projects run their entire test suite under -race in CI as a matter of course.
Q: My counter occasionally returns an old value. Bug?
A: If you are using atomic.Int64, no — every Load is sequentially consistent. You may be reading at a slightly different moment than you think. Print timestamps to verify ordering. If you are not using atomic, every read is a coin flip.
Q: I want to count requests per minute. Should I have 60 atomic counters?
A: Not quite. The classic structure is a ring buffer of one counter per second (or per minute), with a current-time index. Increment the slot for "now", report the sum of recent slots. The shard-by-time idea generalises beautifully to sliding windows. Middle/senior topic.
Q: Are atomic ops safe across goroutines on different machines?
A: No. They are CPU-level instructions; they do not cross machine boundaries. For cross-machine counters use a database INCREMENT, Redis INCR, or a dedicated counter service.
Q: Can I atomic.AddInt64(&x, -1) to decrement?
A: Yes — the second argument is an int64 delta, signed. Pass -1. Works the same way on atomic.Int64.Add(-1).
Q: Is atomic.LoadInt64 safer than reading the variable directly because of the memory barrier?
A: Yes — it is the only correct way to read a value that any other goroutine has atomic-written. Direct reads may see stale values (most architectures) or torn values (32-bit ARM).
Q: Should I document my counter as monotonic in the type system?
A: Some teams have a MonotonicCounter type with no Dec or Add(-1) methods to enforce it at compile time. Others rely on convention. Either is fine.
Q: What if I want a counter that wraps at some N?
A: Use CompareAndSwap:
func (c *WrappingCounter) Inc() int64 {
for {
cur := c.v.Load()
next := (cur + 1) % c.modulus
if c.v.CompareAndSwap(cur, next) {
return next
}
}
}
The CAS loop is your first taste of lock-free programming. Middle topic.
Walkthrough: Reading expvar.Int Source¶
Reading well-written standard library source is one of the best ways to absorb idioms. Here is the gist of expvar.Int (paraphrased and shortened):
type Int struct {
i atomic.Int64
}
func (v *Int) Value() int64 { return v.i.Load() }
func (v *Int) String() string {
return strconv.FormatInt(v.i.Load(), 10)
}
func (v *Int) Add(delta int64) { v.i.Add(delta) }
func (v *Int) Set(value int64) { v.i.Store(value) }
That is it. A four-method struct wrapping an atomic.Int64, plus the String() method that makes it serialise as JSON. The whole "expvar counter" abstraction is this.
The other half is the registry: expvar.NewInt(name string) allocates an Int, stores it in a global map keyed by name, and returns the pointer. Then /debug/vars walks the map and serialises everything. Trivial code, enormous operational value. Middle file covers it in depth.
Walkthrough: Reading sync.WaitGroup Source (How Atomic Counters Build Bigger Things)¶
You already know WaitGroup. Look at it from a counter angle:
type WaitGroup struct {
noCopy noCopy
state atomic.Uint64 // high 32 bits: counter; low 32 bits: waiters
sema uint32
}
func (wg *WaitGroup) Add(delta int) {
state := wg.state.Add(uint64(delta) << 32)
v := int32(state >> 32)
w := uint32(state)
if v < 0 {
panic("sync: negative WaitGroup counter")
}
if w != 0 && delta > 0 && v == int32(delta) {
panic("sync: WaitGroup misuse: Add called concurrently with Wait")
}
if v > 0 || w == 0 {
return
}
// last decrement; release all waiters
runtime_Semrelease(&wg.sema, false, 0)
}
Key takeaways:
- A
WaitGroupis, fundamentally, two counters packed into oneatomic.Uint64so they can be updated together atomically. Addis a singleatomic.Uint64.Addplus some inspection of the result.- The whole synchronisation guarantee of
WaitGrouprests on the atomic counter.
When you read standard library code with the question "where are the counters?" in mind, you find them everywhere. They are the load-bearing primitive of concurrent Go.
Closing Thought¶
A counter looks like nothing. A correct counter is a small lesson in CPU architecture, the Go memory model, and operational discipline. A fast counter is a much bigger lesson — it is what the next three files (middle, senior, professional) are about. Master the simple case here and the rest will land easily.
Always:
atomic.Int64for new counters.defer Add(-1)for gauges.-racein every test run.- Pointer receivers.
- Wrapping types.
- Document units.
The middle file picks up with CompareAndSwap, expvar, and the multi-counter coordination problem. See you there.
Appendix A: A Long Side-by-Side Comparison¶
Suppose you have a service that wants to expose requests_total and inflight_requests. Here are five versions of the same code, in increasing sophistication.
Version 1: broken¶
var requests int
var inflight int
func handler(w http.ResponseWriter, r *http.Request) {
requests++
inflight++
defer func() { inflight-- }()
process(r)
}
Three races: requests++, inflight++, inflight--. All silently dropping increments. Never ship.
Version 2: mutex per counter¶
var (
reqMu sync.Mutex
requests int
inMu sync.Mutex
inflight int
)
func handler(w http.ResponseWriter, r *http.Request) {
reqMu.Lock(); requests++; reqMu.Unlock()
inMu.Lock(); inflight++; inMu.Unlock()
defer func() {
inMu.Lock(); inflight--; inMu.Unlock()
}()
process(r)
}
Correct, but verbose. Two locks for two counters. Imagine ten counters: ten locks, twenty Lock/Unlock pairs per request. Tedious and slow.
Version 3: one mutex over both¶
var (
mu sync.Mutex
requests int
inflight int
)
func handler(w http.ResponseWriter, r *http.Request) {
mu.Lock()
requests++
inflight++
mu.Unlock()
defer func() {
mu.Lock(); inflight--; mu.Unlock()
}()
process(r)
}
Correct, fewer locks, but worse contention — every request blocks every other request on the same lock. Acceptable at low RPS, catastrophic at high.
Version 4: atomics¶
var (
requests atomic.Int64
inflight atomic.Int64
)
func handler(w http.ResponseWriter, r *http.Request) {
requests.Add(1)
inflight.Add(1)
defer inflight.Add(-1)
process(r)
}
Correct, fast, zero lock contention, three lines, easy to read.
Version 5: expvar¶
var (
requests = expvar.NewInt("requests")
inflight = expvar.NewInt("inflight")
)
func handler(w http.ResponseWriter, r *http.Request) {
requests.Add(1)
inflight.Add(1)
defer inflight.Add(-1)
process(r)
}
Same speed as version 4, plus you get a free /debug/vars endpoint. This is the production-ready, idiomatic Go shape. Use it.
Appendix B: A Long List of Counter Names That Tell Operators What They Mean¶
Names matter. A counter called errors tells the operator nothing. Here are names that do.
| Field name | Meaning | Unit | Direction |
|---|---|---|---|
requests_total | All requests served | requests | up only |
requests_2xx_total | Successful requests | requests | up only |
requests_4xx_total | Client-error requests | requests | up only |
requests_5xx_total | Server-error requests | requests | up only |
inflight_requests | Currently-handling requests | requests | up & down |
bytes_in_total | Bytes received from clients | bytes | up only |
bytes_out_total | Bytes sent to clients | bytes | up only |
connections_accepted_total | Sockets accepted | connections | up only |
connections_closed_total | Sockets closed cleanly | connections | up only |
connections_open | Currently-open sockets | connections | up & down |
db_queries_total | DB queries issued | queries | up only |
db_query_errors_total | DB queries that errored | queries | up only |
cache_hits_total | Cache reads that hit | reads | up only |
cache_misses_total | Cache reads that missed | reads | up only |
goroutines | runtime.NumGoroutine() snapshot | goroutines | up & down |
heap_bytes | runtime.MemStats.HeapAlloc | bytes | up & down |
gc_pause_ns_total | sum of GC pauses | ns | up only |
panics_total | recovered panics | panics | up only |
A junior who internalises this naming convention writes code that operators thank them for.
Appendix C: Counters as Building Blocks for Higher-Level Patterns¶
Once you have a fast atomic counter, you can build:
- Sequence numbers: each call returns a unique, monotonically increasing ID.
- Token buckets: a counter holding "tokens" plus a refill ticker — the simplest rate limiter.
- CountDownLatch: a counter that starts at N, decrements as workers finish, and signals when it hits zero.
- Reference counting: increment when a reference is acquired, decrement when released, free when zero.
- Reader/writer count for a seqlock: writers bump the counter twice (odd before, even after); readers check parity.
- Heartbeat counter: bumped by a background goroutine, read by a watchdog; if it stops changing, something is stuck.
- Lock-free queue index: head and tail counters, with the queue being
slots[index % len]. - GC tick counter: each minor cycle increments; consumers detect epochs by reading the value.
Every one of these is a senior or professional topic in its own right. They all start with the same atomic.Int64.Add(1).
Appendix D: A Long Bug Story¶
This is a true story, slightly anonymised. A service team had a counter called failed_logins. They used int32, no atomic, no mutex. For two years it appeared to work — the numbers were always plausible. Then they added a feature that ran failed_logins through an alert: if the rate exceeded 100 per minute, page the on-call.
The alerts never fired. Reviewers eventually noticed that failed_logins only ever decreased on Mondays. After hours of investigation, the bug was:
- Friday afternoon, traffic spike, lots of failed logins, counter races up to ~5,000.
- Monday morning, a cron job snapshot of the counter recorded the value... but the cron ran with
GODEBUG=cpuid.SSE2=offfor unrelated reasons. Theint32was decremented twice by the snapshot code because of a separate cpu-bug workaround that read-modified-wrote the variable. - The snapshot also reset the counter, but the reset was a plain
failed_logins = 0which raced with new increments, sometimes losing them.
The team replaced the counter with atomic.Int64, added a Swap(0) for snapshots, and added alerts based on Swap's return value. Alerts started firing within an hour. The bug had been invisible for two years because the value of the counter was wrong, not because it crashed.
Moral: a counter that looks plausible is not the same as a counter that is correct. Always atomic. Always tested with -race. Always snapshot with Swap, not Load + Store.
Appendix E: Things You Will Read That Are Subtly Wrong¶
The internet has a lot of Go code that is almost right about counters. Here are warning signs.
"Just use int64, reads and writes are atomic on 64-bit."¶
Half true. On a 64-bit machine, an aligned int64 load compiled to a single MOV is atomic at the hardware level. But:
- The Go compiler does not promise to emit a single
MOV. It may split or reorder. - On 32-bit ARM, an
int64is two 32-bit accesses. - The race detector will flag it.
- Even if no torn read occurs, there is no memory barrier; readers may see stale values.
Always use atomic.LoadInt64 / atomic.Int64.Load.
"Mutexes are slower than atomics, so always use atomics."¶
A simplification. For a single counter, atomics are faster. For a group of related state, a single mutex is often faster than multiple atomic-fenced operations, because you avoid the multiple atomic instructions. Measure.
"Cache lines do not matter at low concurrency."¶
True up to a point — say, a single counter incremented by 4 cores at moderate rate. False at higher core counts or higher rates. The transition happens earlier than you think; senior file explores it.
"atomic.Value is for atomic counters."¶
No. atomic.Value (and its generic successor atomic.Pointer[T]) is for atomic snapshots of arbitrary types — you replace the whole value, you do not increment it. For counters, atomic.Int64/Uint64.
"I do not need defer on the gauge because my code never panics."¶
Code panics. Always. Even production code. Especially production code on Friday afternoon. Always defer.
"Atomic ops are lock-free, so they never block."¶
The instruction itself does not put the goroutine to sleep. But under heavy contention, the cache coherence traffic causes the CPU pipeline to stall — equivalent to blocking, just not in software terms. Atomics are not blocked but they are not free.
Appendix F: Frequently Re-Visited Idioms¶
Bookmark these. You will write them dozens of times in your career.
Counter increment¶
Counter snapshot¶
Gauge increment/decrement around work¶
Read-and-reset (for periodic flush)¶
First arrival sentinel¶
Maximum tracker¶
Counter per dynamic key¶
Counter exposed via expvar¶
That is the junior counter toolkit, complete. Build, ship, iterate, and graduate to middle when you start asking "but why is my counter slow at 32 cores?"
Appendix G: Walkthrough of Real Production Counter Code¶
Here is a sketch of counter code from a real-world Go service (sanitised). Annotate each line with what it teaches.
package server
import (
"context"
"expvar"
"net/http"
"sync/atomic"
"time"
)
var (
// Monotonic counters — only ever increase. Exposed via expvar.
requestsTotal = expvar.NewInt("server_requests_total")
errors4xxTotal = expvar.NewInt("server_errors_4xx_total")
errors5xxTotal = expvar.NewInt("server_errors_5xx_total")
bytesInTotal = expvar.NewInt("server_bytes_in_total")
bytesOutTotal = expvar.NewInt("server_bytes_out_total")
// Gauges — go up and down. Note: NOT suffixed _total.
inflight = expvar.NewInt("server_inflight")
// Sentinel — set once when shutdown begins.
draining atomic.Bool
)
type Server struct {
h http.Handler
}
func New(h http.Handler) *Server { return &Server{h: h} }
func (s *Server) ServeHTTP(w http.ResponseWriter, r *http.Request) {
if draining.Load() {
// Refuse new work during graceful shutdown.
http.Error(w, "draining", http.StatusServiceUnavailable)
return
}
// Count *every* request, before any conditional logic.
requestsTotal.Add(1)
// Track in-flight with the defer pattern. If anything below
// panics, the deferred decrement still runs.
inflight.Add(1)
defer inflight.Add(-1)
// bytesInTotal: count what the client sent us. Use the
// Content-Length header if available; otherwise wrap the
// body in a counting reader (omitted here for brevity).
if r.ContentLength > 0 {
bytesInTotal.Add(r.ContentLength)
}
// Wrap the writer to count the response status and bytes.
rw := &countingWriter{ResponseWriter: w}
s.h.ServeHTTP(rw, r)
// After the handler returns, sort the status into a counter.
switch {
case rw.status >= 500:
errors5xxTotal.Add(1)
case rw.status >= 400:
errors4xxTotal.Add(1)
}
bytesOutTotal.Add(int64(rw.bytes))
}
type countingWriter struct {
http.ResponseWriter
status int
bytes int
}
func (c *countingWriter) WriteHeader(s int) {
c.status = s
c.ResponseWriter.WriteHeader(s)
}
func (c *countingWriter) Write(b []byte) (int, error) {
if c.status == 0 {
c.status = http.StatusOK
}
n, err := c.ResponseWriter.Write(b)
c.bytes += n
return n, err
}
func (s *Server) Shutdown(ctx context.Context) error {
draining.Store(true)
deadline := time.Now().Add(30 * time.Second)
for {
if inflight.Value() == 0 {
return nil
}
if time.Now().After(deadline) {
return ctx.Err()
}
select {
case <-ctx.Done():
return ctx.Err()
case <-time.After(50 * time.Millisecond):
}
}
}
Things to notice:
- Counter names match the metric names.
requestsTotalis the variable forserver_requests_total. Easy to grep production alerts back to code. - The
_totalsuffix is reserved for monotonic counters. Gauges (inflight) do not get it. - Drain check first. No counter increment for refused requests — they are not "served".
defer Add(-1)for the gauge. Panic-safe.- Counting is done in the wrapper, not in the handler. The handler does not know about metrics; the server does.
- Shutdown loops on
Value() == 0. The atomic counter is the heart of graceful shutdown.
This is what idiomatic, production-quality Go counter usage looks like. Junior engineers who internalise this template will write good metrics code on day one of every job.
Appendix H: Counters and the Question of Sampling¶
At very high RPS — say, 1M+ requests per second — even an atomic counter can become a bottleneck per core. One mitigation: do not increment every time; increment with some probability and scale up at read time.
const sample = 100 // 1 in 100
func handler(...) {
if rand.IntN(sample) == 0 {
c.Add(int64(sample))
}
}
// reading: c.Load() approximates the true count
Two things to keep in mind:
- This introduces sampling error of roughly sqrt(N/sample). For a true count of 1,000,000 with sample=100, expected error ~316.
rand.IntNitself contends on a global RNG (the oldermath/rand) or is goroutine-local (math/rand/v2). Use v2.
For most workloads, sampling is unnecessary — atomic.Int64 keeps up fine. For the few workloads where it does not, the senior file's sharded-counter approach is usually better than sampling because it preserves exact counts.
Appendix I: Counters in Tests vs Production¶
A counter that lives only in your test should still use atomic.Int64. It is the same primitive, and the race detector treats it the same.
func TestEventBus_FanOut(t *testing.T) {
bus := NewBus()
var calls atomic.Int64
bus.Subscribe(func(e Event) { calls.Add(1) })
for i := 0; i < 1000; i++ {
bus.Publish(Event{})
}
bus.Wait()
if got := calls.Load(); got != 1000 {
t.Errorf("expected 1000, got %d", got)
}
}
The principle "if more than one goroutine writes it, atomic" applies just as strongly in tests. Tests that race are flaky tests, and flaky tests destroy CI trust.
Appendix J: When to Read the Standard Library¶
Whenever you find yourself wondering "is this the right way to do X with counters?", grep the standard library for atomic.Int64. The Go authors are excellent practitioners; their patterns are usually the right ones to copy.
You will find counters in:
net/http(server bookkeeping)database/sql(connection pool stats)runtime/metrics(everything)testing(parallel test scheduling)expvar(everyIntis one)time(timer firings and resets)
Read them. They are excellent examples.
Truly Final Word¶
A counter is small. The lessons are large. Go forth, increment safely, and run -race.
When you start measuring increments per second and find them disappointing, the middle and senior files are waiting for you. But you do not need them yet. Build something first.
Appendix K: Building a Counter Library From Scratch (Exercise Walkthrough)¶
To consolidate, build a tiny counters package from scratch. It should expose:
Counter— monotonicGauge— up & downMax— tracks the largest observed valueMin— tracks the smallest (skip if always positive)- A
Registrythat holds named counters and can dump them as JSON
package counters
import (
"encoding/json"
"math"
"sync"
"sync/atomic"
)
type Counter struct{ v atomic.Int64 }
func (c *Counter) Inc() { c.v.Add(1) }
func (c *Counter) Add(n int64) { c.v.Add(n) }
func (c *Counter) Get() int64 { return c.v.Load() }
func (c *Counter) Reset() int64 { return c.v.Swap(0) }
type Gauge struct{ v atomic.Int64 }
func (g *Gauge) Inc() { g.v.Add(1) }
func (g *Gauge) Dec() { g.v.Add(-1) }
func (g *Gauge) Add(n int64) { g.v.Add(n) }
func (g *Gauge) Set(n int64) { g.v.Store(n) }
func (g *Gauge) Get() int64 { return g.v.Load() }
type Max struct{ v atomic.Int64 }
func (m *Max) Observe(x int64) {
for {
cur := m.v.Load()
if x <= cur {
return
}
if m.v.CompareAndSwap(cur, x) {
return
}
}
}
func (m *Max) Get() int64 { return m.v.Load() }
type Min struct{ v atomic.Int64 }
func NewMin() *Min {
var m Min
m.v.Store(math.MaxInt64)
return &m
}
func (m *Min) Observe(x int64) {
for {
cur := m.v.Load()
if x >= cur {
return
}
if m.v.CompareAndSwap(cur, x) {
return
}
}
}
func (m *Min) Get() int64 { return m.v.Load() }
type Registry struct {
mu sync.RWMutex
m map[string]any
}
func New() *Registry { return &Registry{m: map[string]any{}} }
func (r *Registry) MustRegister(name string, v any) {
r.mu.Lock()
defer r.mu.Unlock()
if _, ok := r.m[name]; ok {
panic("counters: duplicate name: " + name)
}
r.m[name] = v
}
func (r *Registry) JSON() ([]byte, error) {
r.mu.RLock()
defer r.mu.RUnlock()
out := make(map[string]int64, len(r.m))
for name, v := range r.m {
switch t := v.(type) {
case *Counter:
out[name] = t.Get()
case *Gauge:
out[name] = t.Get()
case *Max:
out[name] = t.Get()
case *Min:
out[name] = t.Get()
}
}
return json.Marshal(out)
}
Test it:
package counters
import (
"sync"
"testing"
)
func TestCounter(t *testing.T) {
var c Counter
var wg sync.WaitGroup
for i := 0; i < 100; i++ {
wg.Add(1)
go func() { defer wg.Done(); for j := 0; j < 1000; j++ { c.Inc() } }()
}
wg.Wait()
if got := c.Get(); got != 100_000 {
t.Errorf("expected 100000, got %d", got)
}
}
func TestMax(t *testing.T) {
var m Max
var wg sync.WaitGroup
for i := int64(0); i < 1000; i++ {
i := i
wg.Add(1)
go func() { defer wg.Done(); m.Observe(i) }()
}
wg.Wait()
if got := m.Get(); got != 999 {
t.Errorf("expected 999, got %d", got)
}
}
func TestRegistryJSON(t *testing.T) {
r := New()
c := &Counter{}
g := &Gauge{}
r.MustRegister("requests", c)
r.MustRegister("inflight", g)
c.Add(5)
g.Set(3)
b, err := r.JSON()
if err != nil { t.Fatal(err) }
if string(b) != `{"inflight":3,"requests":5}` && string(b) != `{"requests":5,"inflight":3}` {
t.Errorf("unexpected: %s", b)
}
}
Run with -race -count=10. You now have:
- Two simple atomic types (Counter, Gauge)
- Two CAS-loop atomic types (Max, Min)
- A registry pattern (the foundation of
expvar, Prometheus, OpenTelemetry, etc.) - JSON output (the foundation of
/debug/vars)
You wrote your own expvar. Read its source after this exercise — you will recognise almost everything.
Appendix L: A Truly Final Cheat Card¶
+-------------------------------------------------------------+
| ATOMIC COUNTERS - 60 SECOND SUMMARY |
+-------------------------------------------------------------+
| Declare: var c atomic.Int64 |
| Inc: c.Add(1) |
| Dec: c.Add(-1) |
| Read: c.Load() |
| Reset: c.Swap(0) |
| CAS: c.CompareAndSwap(old, new) |
| |
| Wrap: type MyCounter struct{ v atomic.Int64 } |
| Pointer: use *MyCounter receivers |
| Gauge: g.Add(1); defer g.Add(-1) |
| Test: go test -race -count=10 |
| expvar: var c = expvar.NewInt("name") |
+-------------------------------------------------------------+
Print it. Tape it to your monitor. Refer to it for one week. After that you will not need to.
Appendix M: Twenty Tiny Programs to Write¶
Each of these is a 20–50-line program that exercises one counter idea. Doing them all will give you fluency.
- Spawn 1,000 goroutines that each
count++a plainint. Print the result. Run 10 times. - Do (1) again with a
sync.Mutex. Confirm the answer is always 1,000. - Do (1) again with
atomic.Int64. Confirm. Benchmark vs (2). - Do (3) but with
atomic.AddInt64(&x, 1)(old API). Confirm and compare assembly withgo build -gcflags=-S. - Write a server that increments an in-flight gauge on entry and decrements on exit. Crash it under load; verify the gauge stays bounded.
- Write a server with a request counter. Expose it via
expvar.NewInt. Curl/debug/varsto see it. - Write a
MaxObservedthat tracks the largest value ever passed toObserve. UseCompareAndSwap. - Write a counter that resets every second by spawning a goroutine that calls
Swap(0)periodically. Print the per-second rate. - Write a "first arrival" sentinel using
atomic.Bool.CompareAndSwap(false, true). - Write a
WaitGroupfrom scratch usingatomic.Int64and a channel. Compare tosync.WaitGroup. - Build a simple token-bucket rate limiter using one atomic counter and a periodic refill.
- Write a benchmark that increments a counter from
BenchmarkXXX(b *testing.B) { b.RunParallel(...) }and run it with-cpu=1,2,4,8,16. - Take (12) and watch what happens to ops/sec as parallelism grows. Note the scaling.
- Add
atomic.Pointer[T]to your toolkit: store a snapshot struct of multiple counters in one atomic pointer. - Use
sync.Mapplusatomic.Int64to build a per-key counter. - Build a "circular" counter that wraps around modulo N using
CompareAndSwap. - Build a "report and reset" worker: every 5 seconds,
Swap(0)and emit the value. - Write a heartbeat: a goroutine that increments a counter every second; a watchdog that panics if it stops.
- Write a leader-election sentinel: many goroutines call
CompareAndSwap(0, myID); the winner becomes leader. - Write a test that fails without
-raceand passes with the fix. Verify both.
Twenty exercises. You can do them all in a long weekend. After that, you are no longer a counter junior.
Truly Truly Final Word¶
If you remember nothing else from this file, remember:
And run -race. Always.
Final Appendix: A Junior-Level Quiz¶
Quickly test your understanding without re-reading.
- What is the result of
count++from 1000 goroutines on a non-atomicint? - Which is faster for a simple counter:
sync.Mutexoratomic.Int64? - Why is
Swap(0)preferable toLoad()+Store(0)? - When do you use
defer counter.Add(-1)? - What does
atomic.Int64.Add(1)return? - What is the difference between a counter and a gauge?
- Why must you use a pointer receiver for methods on a struct with
atomic.Int64? - What does the
-raceflag do? - What is a "lost update"?
- How would you expose a counter via
/debug/vars?
If you can answer all ten without looking, you have absorbed the junior level. If any feel uncertain, revisit the relevant section.
Final Appendix: Five Common Junior Mistakes¶
Mistakes I see junior engineers make every week:
- Forgetting
-race: tests pass, production has races. - Mixing atomic and non-atomic reads: the writer uses Add, the reader uses direct access. Race.
- Copy-paste mutex around atomic: redundant, slower, sometimes wrong.
- Reset via Load+Store: loses concurrent increments.
- Forgetting
deferon gauge decrement: gauge leaks on panic.
Avoid these and you are ahead of 80% of junior Go engineers.
End of Junior File. Truly.