
sync.Map — Middle Level

Table of Contents

  1. Introduction
  2. The Full API at Middle Level
  3. Go 1.20 Additions in Depth
  4. Decision Matrix: sync.Map vs RWMutex + map
  5. Benchmarks You Should Run Yourself
  6. Atomic-Update Patterns
  7. Range Semantics in Detail
  8. Tracking Size
  9. TTL and Eviction Wrappers
  10. Mixed Workloads — When Neither Fits
  11. Self-Assessment
  12. Summary

Introduction

At junior level you learned the sync.Map API and the rule "read-mostly, stable keys, otherwise default to RWMutex+map." At middle level you start justifying that rule with measurements and choosing the right tool with data. After this file you will:

  • Know every method on sync.Map, including the Go 1.20 additions, in detail.
  • Reason about why sync.Map is slower for write-heavy workloads, with numbers.
  • Pick between sync.Map, RWMutex+map, and sharded variants for real scenarios.
  • Build atomic-update patterns using CompareAndSwap.
  • Wrap sync.Map with TTL and size-tracking.

The Full API at Middle Level

package sync

type Map struct { /* unexported */ }

func (m *Map) Load(key any) (value any, ok bool)
func (m *Map) Store(key, value any)
func (m *Map) LoadOrStore(key, value any) (actual any, loaded bool)
func (m *Map) LoadAndDelete(key any) (value any, loaded bool)
func (m *Map) Delete(key any)
func (m *Map) Range(f func(key, value any) bool)

// Go 1.20+
func (m *Map) Swap(key, value any) (previous any, loaded bool)
func (m *Map) CompareAndSwap(key, old, new any) (swapped bool)
func (m *Map) CompareAndDelete(key, old any) (deleted bool)

Every method is safe for concurrent use. None return errors; their effects are observable via the ok/loaded/swapped/deleted booleans. The key must be comparable; values can be anything.

LoadOrStore revisited

The atomic "set if absent" is more than convenience — it eliminates the classic race:

// Bad: check-then-act
if _, ok := m.Load(k); !ok {
    m.Store(k, v) // another goroutine may have stored between
}

// Good
m.LoadOrStore(k, v)

The loaded return tells you whether the entry pre-existed. If you only care about putting something there, you can ignore it. If you care about which value won, use actual.
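For instance, when several goroutines race to create a per-key resource, actual tells every caller which instance won. A minimal sketch (the getBuffer helper and its use of bytes.Buffer are illustrative):

```go
package main

import (
    "bytes"
    "sync"
)

var m sync.Map

// getBuffer returns the one buffer shared by all callers for key.
// Losers of the race discard their candidate and adopt the winner's.
func getBuffer(key string) *bytes.Buffer {
    if v, ok := m.Load(key); ok {
        return v.(*bytes.Buffer) // fast path: entry already exists
    }
    actual, _ := m.LoadOrStore(key, new(bytes.Buffer))
    return actual.(*bytes.Buffer) // the winning buffer either way
}
```

The Load before LoadOrStore is a fast path: once the entry exists, callers avoid allocating a throwaway candidate value on every call.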

LoadAndDelete revisited

The atomic "take and remove" is the work-handoff primitive. Producers Store; consumers LoadAndDelete. Only one consumer can succeed per entry. Without LoadAndDelete, you would have to lock externally to prevent two consumers from taking the same key.

if v, ok := m.LoadAndDelete(jobID); ok {
    process(v.(*Job)) // we, and only we, took this job
}

Go 1.20 Additions in Depth

Three methods landed in Go 1.20 that close the longest-standing gap in sync.Map: atomic updates.

Swap — set and return the previous value

func (m *Map) Swap(key, value any) (previous any, loaded bool)

Atomically replaces the value for key with value and returns the previous one. Like Store but with the old value as a return. Useful when you want to act on the old value while installing a new one.

previous, loaded := registry.Swap(connID, newConn)
if loaded {
    previous.(*Conn).Close() // old connection bumped out
}

Before 1.20 you would need a Load followed by a Store, with the racy gap in between.
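For contrast, a sketch of that pre-1.20 emulation with the race spelled out (the Conn type and swapConn helper are stand-ins):

```go
package main

import "sync"

// Conn is a stand-in for a real connection type.
type Conn struct{ closed bool }

func (c *Conn) Close() { c.closed = true }

var registry sync.Map

// swapConn emulates Swap with Load + Store. The gap between the two
// calls is the race: a connection stored there by another goroutine
// gets overwritten and is never returned to anyone, so it leaks.
func swapConn(connID string, newConn *Conn) (previous *Conn, loaded bool) {
    v, ok := registry.Load(connID)
    registry.Store(connID, newConn)
    if ok {
        return v.(*Conn), true
    }
    return nil, false
}
```

With Swap, the lookup and the replacement are one atomic step, so no concurrent Store can slip into the gap.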

CompareAndSwap — atomic conditional update

func (m *Map) CompareAndSwap(key, old, new any) (swapped bool)

Replaces the value only if the current value is equal to old. Returns true on success. The values are compared with ==, the standard Go equality, so old must be of a comparable type. Crucially, equality follows the stored type: pointers compare by address, structs field by field, so two values that print identically may still not be equal. Be precise about what you store.

// Increment-if-current pattern (assumes the key already holds an int;
// see Pattern 1 below for the insert-if-absent variant)
for {
    v, ok := m.Load("hits")
    if !ok {
        break // absent: nothing to increment
    }
    if m.CompareAndSwap("hits", v, v.(int)+1) {
        break
    }
}

Failure modes:

  • old does not equal the current value → returns false, no change.
  • The key does not exist → returns false, no change. CompareAndSwap does not insert.

If you need "insert if absent, atomically update if present," use LoadOrStore first, then CompareAndSwap in a retry loop.

CompareAndDelete — atomic conditional remove

func (m *Map) CompareAndDelete(key, old any) (deleted bool)

Removes the entry only if the current value equals old. Useful for "remove only if I am the most recent writer" patterns:

// Remove cache entry if it has not been refreshed
m.CompareAndDelete(key, staleValue)

This eliminates the race where you Load a stale value, decide to delete, and meanwhile another goroutine refreshed it.

Values must be comparable for CAS variants

CompareAndSwap and CompareAndDelete compare values with ==. If your values are interface types holding a slice, map, or function, the comparison panics at runtime:

var m sync.Map
m.Store("k", []int{1, 2})
m.CompareAndSwap("k", []int{1, 2}, []int{3, 4})
// panic: runtime error: comparing uncomparable type []int

Pointer values (*Entry) compare by address, which is usually what you want for cache entries.


Decision Matrix: sync.Map vs RWMutex + map

The most useful skill at middle level is choosing correctly. Here is the decision table I keep on my desk:

Workload property                        | sync.Map      | RWMutex+map | Sharded map | Notes
Read-mostly, stable keys                 | Best          | OK          | OK          | The sweet spot.
Read-mostly, growing keys                | OK            | Best        | OK          | sync.Map pays for dirty rebuilds.
Balanced read/write                      | Slow          | Best        | Best        | sync.Map writes are expensive.
Write-heavy                              | Slow          | Good        | Best        | Contention is the bottleneck; sharding cuts it.
Per-key contention low (writes spread)   | OK            | OK          | Best        | Sharding shines.
Per-key contention high (hot key)        | Slow          | Slow        | Slow        | None help; rethink data model.
Need atomic counter increment            | OK (Go 1.20+) | OK          | OK          | Or use atomic.Int64 outside the map.
Need Len()                               | No            | Yes         | Yes         | sync.Map has no Len.
Need ordered iteration                   | No            | No          | No          | None give order; sort keys into a slice.
Need atomic snapshot                     | No            | Yes         | Hard        | RWMutex+map is easiest: RLock, copy, RUnlock.
Small (< 100 entries)                    | Slow          | Best        | Overkill    | The mutex is rarely contended at this size.
Huge (1M+ entries), churn                | Memory amp    | OK          | Best        | sync.Map retains tombstones.
Generic value type wanted                | No (any)      | Yes         | Yes         | Build a generic wrapper if needed.
Cross-goroutine writes rare              | Best          | OK          | OK          | sync.Map's other sweet spot.

The two patterns the Go authors explicitly designed for:

The Map type is optimized for two common use cases: (1) when the entry for a given key is only ever written once but read many times, as in caches that only grow, or (2) when multiple goroutines read, write, and overwrite entries for disjoint sets of keys.

If your workload is neither, sync.Map is almost always slower than RWMutex+map.


Benchmarks You Should Run Yourself

Here are skeletal benchmarks. Copy them into your project, adjust the workload, and measure.

// bench_test.go
package mapbench_test

import (
    "fmt"
    "sync"
    "testing"
)

const N = 10000

func setupBuiltin() (map[int]int, *sync.RWMutex) {
    m := make(map[int]int)
    for i := 0; i < N; i++ {
        m[i] = i
    }
    return m, &sync.RWMutex{}
}

func setupSyncMap() *sync.Map {
    var m sync.Map
    for i := 0; i < N; i++ {
        m.Store(i, i)
    }
    return &m
}

func BenchmarkReadMostly_RWMutex(b *testing.B) {
    m, mu := setupBuiltin()
    b.RunParallel(func(pb *testing.PB) {
        i := 0
        for pb.Next() {
            mu.RLock()
            _ = m[i%N]
            mu.RUnlock()
            i++
        }
    })
}

func BenchmarkReadMostly_SyncMap(b *testing.B) {
    m := setupSyncMap()
    b.RunParallel(func(pb *testing.PB) {
        i := 0
        for pb.Next() {
            _, _ = m.Load(i % N)
            i++
        }
    })
}

func BenchmarkBalanced_RWMutex(b *testing.B) {
    m, mu := setupBuiltin()
    b.RunParallel(func(pb *testing.PB) {
        i := 0
        for pb.Next() {
            if i%2 == 0 {
                mu.RLock()
                _ = m[i%N]
                mu.RUnlock()
            } else {
                mu.Lock()
                m[i%N] = i
                mu.Unlock()
            }
            i++
        }
    })
}

func BenchmarkBalanced_SyncMap(b *testing.B) {
    m := setupSyncMap()
    b.RunParallel(func(pb *testing.PB) {
        i := 0
        for pb.Next() {
            if i%2 == 0 {
                _, _ = m.Load(i % N)
            } else {
                m.Store(i%N, i)
            }
            i++
        }
    })
}

func BenchmarkWriteHeavy_RWMutex(b *testing.B) {
    m, mu := setupBuiltin()
    b.RunParallel(func(pb *testing.PB) {
        i := 0
        for pb.Next() {
            mu.Lock()
            m[i%N] = i
            mu.Unlock()
            i++
        }
    })
}

func BenchmarkWriteHeavy_SyncMap(b *testing.B) {
    m := setupSyncMap()
    b.RunParallel(func(pb *testing.PB) {
        i := 0
        for pb.Next() {
            m.Store(i%N, i)
            i++
        }
    })
}

var _ = fmt.Sprintf

Run with:

go test -bench=. -benchmem -cpu=1,2,4,8 ./...

Indicative results on Apple M2, Go 1.22, 8 cores

These are rough numbers — your hardware will differ. Always re-measure.

Benchmark           | -cpu=1    | -cpu=4    | -cpu=8
ReadMostly_RWMutex  | 30 ns/op  | 90 ns/op  | 200 ns/op
ReadMostly_SyncMap  | 15 ns/op  | 18 ns/op  | 22 ns/op
Balanced_RWMutex    | 50 ns/op  | 280 ns/op | 700 ns/op
Balanced_SyncMap    | 90 ns/op  | 220 ns/op | 350 ns/op
WriteHeavy_RWMutex  | 80 ns/op  | 350 ns/op | 900 ns/op
WriteHeavy_SyncMap  | 180 ns/op | 600 ns/op | 1500 ns/op

Lessons:

  • Read-mostly with stable keys: sync.Map scales beautifully. At 8 cores it is roughly 10× faster than RWMutex+map, because reads are lock-free.
  • Balanced workloads: sync.Map and RWMutex+map are comparable. sync.Map may pull ahead on many cores; the gap is small.
  • Write-heavy: RWMutex+map wins. sync.Map's write path is more expensive per operation. Plus, both contend on a single lock at high concurrency, so consider sharding.

The read-mostly advantage above erodes if your workload has high churn (many new keys and deletions): sync.Map slows further as the read/dirty rebuild fires repeatedly.


Atomic-Update Patterns

Pattern 1: Increment-if-present with CompareAndSwap

func increment(m *sync.Map, key any) {
    for {
        v, ok := m.Load(key)
        if !ok {
            if _, loaded := m.LoadOrStore(key, 1); !loaded {
                return // we inserted the initial 1
            }
            continue // someone else inserted first; retry as an update
        }
        next := v.(int) + 1
        if m.CompareAndSwap(key, v, next) {
            return
        }
    }
}

The retry loop handles concurrent updates. If CompareAndSwap fails, another goroutine bumped the value first; we retry with the new current value.

Caveat: for hot counters, this can spin under contention. An atomic.Int64 is faster. Use sync.Map for per-key counters where each key is rarely contended.
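A common middle ground for such per-key counters, assuming Go 1.19+ for atomic.Int64: store a *atomic.Int64 per key so the increment itself never spins (the incr helper is illustrative):

```go
package main

import (
    "sync"
    "sync/atomic"
)

var counters sync.Map // key -> *atomic.Int64

// incr bumps the counter for key, creating it on first use. The map
// is only touched to locate the counter; the increment is a single
// wait-free Add, so there is no CAS retry loop at all.
func incr(key string) int64 {
    v, ok := counters.Load(key)
    if !ok {
        v, _ = counters.LoadOrStore(key, new(atomic.Int64))
    }
    return v.(*atomic.Int64).Add(1)
}
```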

Pattern 2: Replace-if-equal for cache invalidation

func refreshIfStale(m *sync.Map, key string, stale *Entry) {
    fresh := load(key)
    m.CompareAndSwap(key, stale, fresh)
}

If another goroutine already refreshed, our CompareAndSwap does nothing. No lost updates, no double-refresh.

Pattern 3: Delete-if-equal for safe eviction

func evict(m *sync.Map, key string, expected *Entry) {
    m.CompareAndDelete(key, expected)
}

Common in TTL eviction: a background goroutine schedules a delete after N seconds, but only if the entry has not been replaced in the meantime.

Pattern 4: Atomic swap with side effect

prev, loaded := m.Swap(connID, newConn)
if loaded {
    prev.(*Conn).Close()
}

The Swap returns the old value; we close it. This pattern is connection-pool gold.


Range Semantics in Detail

The Range callback signature:

func(key, value any) bool

Return true to continue, false to stop.

The spec:

Range does not necessarily correspond to any consistent snapshot of the Map's contents: no key will be visited more than once, but if the value for any key is stored or deleted concurrently (including by f), Range may reflect any mapping for that key from any point during the Range call.

Translation:

  • Each key existing at the start of Range is visited at most once.
  • Keys inserted during Range may or may not be visited.
  • Keys deleted during Range may or may not be visited.
  • The value observed for a key may be the value at any point during the call.
  • Order is unspecified.

Consequences

  1. Don't use Range for "atomic snapshot" of state. Use RWMutex+map if you need that.
  2. Modifying the map from inside Range is safe but the modifications interact with iteration in implementation-defined ways. Avoid.
  3. Range runs in O(n) of the visible entries. For large maps with many tombstones (see professional level), the actual work can exceed the apparent count.

Common Range patterns

// Collect keys (loosely consistent)
var keys []string
m.Range(func(k, _ any) bool {
    keys = append(keys, k.(string))
    return true
})

// Filter and collect
var actives []*Conn
m.Range(func(_, v any) bool {
    c := v.(*Conn)
    if c.Active() {
        actives = append(actives, c)
    }
    return true
})

// Early exit on match
var found *Entry
m.Range(func(_, v any) bool {
    e := v.(*Entry)
    if e.ID == target {
        found = e
        return false
    }
    return true
})

Tracking Size

sync.Map has no Len(). The recommended way to track size, if you need it:

type CountedMap struct {
    m sync.Map
    n atomic.Int64
}

func (c *CountedMap) Store(k, v any) {
    if _, loaded := c.m.LoadOrStore(k, v); loaded {
        c.m.Store(k, v) // overwrite
    } else {
        c.n.Add(1)
    }
}

func (c *CountedMap) Delete(k any) {
    if _, loaded := c.m.LoadAndDelete(k); loaded {
        c.n.Add(-1)
    }
}

func (c *CountedMap) Len() int64 {
    return c.n.Load()
}

Notes:

  • The Store path uses LoadOrStore first to detect "was it already there?" then a second Store for the overwrite. Two operations, slightly slower.
  • The Len is eventually consistent: between a Store and the counter Add, a concurrent Len may be off by one. Usually acceptable.
  • If you need precise size, you need a mutex, which defeats the point of sync.Map. Consider RWMutex+map instead.

TTL and Eviction Wrappers

A common request: "cache values with a TTL." sync.Map does not natively support this; you build it on top:

type ttlEntry struct {
    value   any
    expires time.Time
}

type TTLMap struct {
    m sync.Map
}

func (t *TTLMap) Set(key any, value any, ttl time.Duration) {
    t.m.Store(key, &ttlEntry{value, time.Now().Add(ttl)})
}

func (t *TTLMap) Get(key any) (any, bool) {
    v, ok := t.m.Load(key)
    if !ok {
        return nil, false
    }
    e := v.(*ttlEntry)
    if time.Now().After(e.expires) {
        t.m.CompareAndDelete(key, v) // safe: deletes only if unchanged
        return nil, false
    }
    return e.value, true
}

The CompareAndDelete ensures we do not evict an entry that was refreshed between our Load and the delete. Storing *ttlEntry rather than ttlEntry matters here: the comparison is then by pointer identity, which cannot panic even when value holds a slice, map, or function.

For periodic sweep:

func (t *TTLMap) Sweep() {
    now := time.Now()
    t.m.Range(func(k, v any) bool {
        if now.After(v.(*ttlEntry).expires) {
            t.m.CompareAndDelete(k, v)
        }
        return true
    })
}

Run Sweep from a ticker. Note that high-churn TTL workloads can hurt sync.Map performance due to memory amplification — consider an eviction-aware library like ristretto or freecache for serious caching.


Mixed Workloads — When Neither Fits

Two scenarios where the choice is harder:

Scenario A: Bursty writes, otherwise read-mostly

Example: a config map that is mostly read, but occasionally a bulk update writes 1,000 entries.

If reads vastly outnumber writes overall, sync.Map still wins despite the burst. But during the burst, throughput drops. If the burst causes user-visible latency, consider an atomic.Pointer[map[K]V] that swaps the entire map atomically on update — readers see no contention at all, writers pay the full rebuild cost.

type Config struct {
    m atomic.Pointer[map[string]string]
}

func NewConfig() *Config {
    c := &Config{}
    empty := map[string]string{}
    c.m.Store(&empty) // never leave the pointer nil: Get would panic
    return c
}

func (c *Config) Get(k string) (string, bool) {
    v, ok := (*c.m.Load())[k]
    return v, ok
}

func (c *Config) Swap(newMap map[string]string) {
    c.m.Store(&newMap) // callers must not mutate newMap afterwards
}

This is the "copy-on-write map" pattern. Reads are a single atomic load plus a map index — even faster than sync.Map. The catch: every write rebuilds the whole map.
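If updates are incremental rather than wholesale swaps, writers also need a mutex so concurrent updates do not lose each other's changes. A sketch of that variant, repeated in full so it stands alone (the Update method is an addition for illustration, not part of the pattern above):

```go
package main

import (
    "sync"
    "sync/atomic"
)

// Config is repeated here with an added writer mutex; readers still
// never take the lock.
type Config struct {
    mu sync.Mutex // serializes writers only
    m  atomic.Pointer[map[string]string]
}

func NewConfig() *Config {
    c := &Config{}
    empty := map[string]string{}
    c.m.Store(&empty)
    return c
}

func (c *Config) Get(k string) (string, bool) {
    v, ok := (*c.m.Load())[k]
    return v, ok
}

// Update copies the current map, applies f to the copy, and publishes
// it atomically. Readers observe either the old map or the new one,
// never a partially mutated map.
func (c *Config) Update(f func(map[string]string)) {
    c.mu.Lock()
    defer c.mu.Unlock()
    old := *c.m.Load()
    next := make(map[string]string, len(old)+1)
    for k, v := range old {
        next[k] = v
    }
    f(next)
    c.m.Store(&next)
}
```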

Scenario B: Hot keys

If a small set of keys gets most of the writes, no map structure helps — your data model is concentrated. Solutions:

  1. Per-shard atomic counters ([N]atomic.Int64) indexed by hash.
  2. Move hot state out of the map into a typed struct with atomic.Pointer.
  3. Aggregate updates in a buffered channel processed by a single goroutine.

The map is not the bottleneck; the contention is. Sharding the map without sharding the access pattern does not help.
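Solution 1 can be sketched as a striped counter: increments scatter across stripes and a read sums them. This sketch picks stripes with math/rand/v2 (Go 1.22+), whose per-thread state keeps the stripe choice itself contention-free:

```go
package main

import (
    "math/rand/v2"
    "sync/atomic"
)

const stripes = 64

// StripedCounter spreads increments of one hot counter across many
// slots so no single cache line takes every write.
type StripedCounter struct {
    n [stripes]atomic.Int64
}

func (c *StripedCounter) Add(delta int64) {
    c.n[rand.IntN(stripes)].Add(delta)
}

// Total sums the stripes; under concurrent writers the result is a
// moment-in-time approximation, which is fine for metrics.
func (c *StripedCounter) Total() int64 {
    var sum int64
    for i := range c.n {
        sum += c.n[i].Load()
    }
    return sum
}
```

In production you would also pad each stripe to a cache line to avoid false sharing; the plain array keeps the sketch short.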

Sharded map skeleton

const shardCount = 64

type shard struct {
    sync.RWMutex
    m map[string]any
}

type ShardedMap struct {
    shards [shardCount]shard
}

func NewShardedMap() *ShardedMap {
    s := &ShardedMap{}
    for i := range s.shards {
        s.shards[i].m = make(map[string]any)
    }
    return s
}

// fnv32 is an inline FNV-1a hash: no allocation per lookup.
func fnv32(key string) uint32 {
    h := uint32(2166136261)
    for i := 0; i < len(key); i++ {
        h ^= uint32(key[i])
        h *= 16777619
    }
    return h
}

func (s *ShardedMap) shardFor(key string) *shard {
    return &s.shards[fnv32(key)%shardCount]
}

func (s *ShardedMap) Get(key string) (any, bool) {
    sh := s.shardFor(key)
    sh.RLock()
    v, ok := sh.m[key]
    sh.RUnlock()
    return v, ok
}

func (s *ShardedMap) Set(key string, value any) {
    sh := s.shardFor(key)
    sh.Lock()
    sh.m[key] = value
    sh.Unlock()
}

With 64 shards and a well-distributed hash, two concurrent operations collide on the same lock only about 1/64 of the time. For write-heavy workloads at high concurrency, this typically outperforms both sync.Map and a single RWMutex+map.


Self-Assessment

  • I can list every method on sync.Map from memory.
  • I know what Swap, CompareAndSwap, and CompareAndDelete do and what they return.
  • I can pick between sync.Map, RWMutex+map, and a sharded map for a given workload.
  • I can write a benchmark that compares them on my real access pattern.
  • I know the Range semantics precisely — not a snapshot, may or may not see concurrent stores.
  • I can implement an atomic-increment-per-key pattern using CompareAndSwap.
  • I know how to track size externally with an atomic.Int64.
  • I can build a TTL wrapper using CompareAndDelete for safe eviction.
  • I know when atomic.Pointer[map] beats both sync.Map and RWMutex+map.
  • I know why hot keys are not solvable by sharding alone.

Summary

sync.Map is a specialised concurrent map with a small but powerful API. Since Go 1.20 it supports atomic update via Swap, CompareAndSwap, and CompareAndDelete. Its sweet spot — read-mostly with stable keys, or per-goroutine disjoint writes — wins by an order of magnitude over RWMutex+map at high concurrency. Outside that sweet spot, a plain RWMutex+map is faster, simpler, and typed.

The middle-level skill is measurement-driven choice: run the benchmarks, look at your read/write ratio, count your keys, ask whether you need Len or snapshot semantics, and pick deliberately. For write-heavy concentrated workloads, sharding beats both. For bursty config swaps, atomic.Pointer[map] is even faster than sync.Map on the read path.

At senior level we examine deeper trade-offs: generic wrappers, singleflight for load-once, and the limits of all these structures when contention concentrates.