Concurrent LRU — Find the Bug¶
Each snippet contains a real concurrency or correctness bug in a concurrent LRU. Find it, explain it, fix it.
Bug 1 — Forgotten map delete on eviction¶
func (c *LRU[K, V]) Set(k K, v V) {
c.mu.Lock()
defer c.mu.Unlock()
if c.order.Len() >= c.capacity {
back := c.order.Back()
c.order.Remove(back) // evict from list
// BUG
}
ent := &entry[K, V]{k, v}
c.items[k] = c.order.PushFront(ent)
}
Bug. The list removes the back element, but the map still contains its key. The map grows unbounded; future Gets for the "evicted" key return a stale *list.Element pointing to a freed node.
Fix. Extract the key from the back element before removing, then delete from the map:
Bug 2 — Get without lock¶
func (c *LRU[K, V]) Get(k K) (V, bool) {
if e, ok := c.items[k]; ok {
c.mu.Lock()
c.order.MoveToFront(e)
c.mu.Unlock()
return e.Value.(*entry[K, V]).val, true
}
var zero V
return zero, false
}
Bug. The map read c.items[k] is unsynchronized. Concurrent Set may resize the map while this Get reads it, causing a crash or wrong result. Go maps are not safe for concurrent read-and-write.
Fix. Take the lock before any map access:
func (c *LRU[K, V]) Get(k K) (V, bool) {
c.mu.Lock()
defer c.mu.Unlock()
if e, ok := c.items[k]; ok {
c.order.MoveToFront(e)
return e.Value.(*entry[K, V]).val, true
}
var zero V
return zero, false
}
Bug 3 — Eviction callback re-entering the cache¶
cache, _ := lru.NewWithEvict[string, *Conn](128, func(k string, v *Conn) {
cache.Add(k+":closed", v) // callback writes back to cache
})
Bug. The callback runs under the cache's lock. Calling cache.Add deadlocks because the lock is not reentrant.
Fix. Push work to a separate goroutine:
cache, _ := lru.NewWithEvict[string, *Conn](128, func(k string, v *Conn) {
go func() { cache.Add(k+":closed", v) }()
})
Or, more typically, perform a non-cache action in the callback (close a file, decrement a refcount).
Bug 4 — RWMutex for Get¶
type LRU[K comparable, V any] struct {
mu sync.RWMutex
items map[K]*list.Element
order *list.List
// ...
}
func (c *LRU[K, V]) Get(k K) (V, bool) {
c.mu.RLock()
defer c.mu.RUnlock()
if e, ok := c.items[k]; ok {
c.order.MoveToFront(e) // RACE: list mutation under RLock
return e.Value.(*entry[K, V]).val, true
}
var zero V
return zero, false
}
Bug. MoveToFront mutates the list. Multiple goroutines holding RLock simultaneously will race on the list pointers. Eventually the list corrupts (cycles, nil dereferences).
Fix. Either use sync.Mutex, or split into Get (Lock) and Peek (RLock):
func (c *LRU[K, V]) Get(k K) (V, bool) {
c.mu.Lock()
defer c.mu.Unlock()
// ...
}
func (c *LRU[K, V]) Peek(k K) (V, bool) {
c.mu.RLock()
defer c.mu.RUnlock()
// ... no MoveToFront
}
Bug 5 — Len lock-free under contention¶
Bug. list.List.Len reads internal state. Without synchronization, this races with concurrent Set/Remove operations modifying the list.
Fix.
Bug 6 — Non-power-of-two shards with bitmask¶
type ShardedLRU struct {
shards [10]*lru.Cache[string, int] // 10 shards
}
func (c *ShardedLRU) pick(k string) *lru.Cache[string, int] {
h := hash(k)
return c.shards[h & 9] // mask of (10-1) = 9
}
Bug. & 9 is & 0b1001. Indexes 2, 3, 4, 5, 6, 7 never appear (those bits are unset in 9). Only shards 0, 1, 8, 9 receive any keys.
Fix. Use a power-of-two shard count (16, 32, 64). Then the mask N-1 is all ones.
Bug 7 — Capturing loop variable in goroutine¶
func StartWorkers(items []string) {
cache, _ := lru.New[string, int](1024)
for i, item := range items {
go func() {
cache.Add(item, i) // captures item and i by reference
}()
}
}
Bug. Pre-Go 1.22, all goroutines share the same item and i. They see the final values, not their per-iteration values.
Fix. Pass as parameters:
On Go 1.22+, the for-range scoping fix makes this work, but explicit is still clearer.
Bug 8 — Eviction callback panics¶
cache, _ := lru.NewWithEvict[string, *File](128, func(k string, f *File) {
f.Close() // panics if f is nil
})
Bug. If the value is nil (e.g., the cache was given cache.Add(k, nil)), the callback panics. The panic happens under the cache's lock; the cache may be left in an inconsistent state.
Fix. Nil-check in the callback:
Or panic-recover in the callback to prevent crashing the cache.
Bug 9 — Race between Peek and Add¶
func GetOrAdd[K comparable, V any](c *lru.Cache[K, V], k K, v V) V {
if cur, ok := c.Peek(k); ok {
return cur
}
c.Add(k, v)
return v
}
Bug. Between Peek and Add, another goroutine may insert a different value for k. Two callers may both Peek (miss), both Add. The second Add overwrites the first.
Fix. Use PeekOrAdd, which is atomic:
func GetOrAdd[K comparable, V any](c *lru.Cache[K, V], k K, v V) V {
prev, ok, _ := c.PeekOrAdd(k, v)
if ok {
return prev
}
return v
}
Bug 10 — Cache key includes timestamp¶
func (s *Service) Get(ctx context.Context, userID string) (*User, error) {
key := userID + ":" + time.Now().Format(time.RFC3339)
if u, ok := s.cache.Get(key); ok {
return u, nil
}
// ...
}
Bug. Every call generates a new key (timestamp varies). The cache always misses; every request goes to the loader. The cache is pure overhead.
Fix. Use a stable key:
If you need time-bucketed caching, round the timestamp to the bucket:
Bug 11 — Sharding by first character¶
Bug. Bias toward the most common first character. For user-prefix keys like "user:", all keys hit the same shard.
Fix. Use a hash function:
func (c *Cache) pick(k string) *shard {
var h maphash.Hash
h.SetSeed(c.seed)
h.WriteString(k)
return c.shards[h.Sum64()&c.shardMask]
}
Bug 12 — Writing to the cached value¶
Bug. Another goroutine reading the same u sees the modified value. Worse, if u is being marshalled to JSON elsewhere, the mutation may corrupt the output.
Fix. Treat cached values as immutable. To "update", create a new value and re-cache:
Bug 13 — Cache size pinned at 0¶
cache, err := lru.New[string, int](capacity)
if err != nil {
log.Printf("cache disabled: %v", err)
cache = &lru.Cache[string, int]{} // zero value
}
Bug. The zero-valued Cache is unusable. Subsequent calls panic with nil pointer dereference.
Fix. Either error out or use a no-op cache:
Or define a no-op cache type implementing the interface:
type noopCache struct{}
func (n *noopCache) Get(k string) (int, bool) { return 0, false }
func (n *noopCache) Add(k string, v int) {}
Bug 14 — TTL check after Get promotes¶
func (c *TTLCache) Get(k string) (string, bool) {
e, ok := c.inner.Get(k) // promotes recency
if !ok {
return "", false
}
if time.Now().After(e.expireAt) {
return "", false // returns miss, but recency was already updated
}
return e.val, true
}
Bug. Get promotes the entry to MRU. Then the TTL check fails. The cache now contains an expired entry at the MRU position — it will live longest before evicting.
Fix. Use Peek for the TTL check:
e, ok := c.inner.Peek(k)
if !ok || time.Now().After(e.expireAt) {
c.inner.Remove(k)
return "", false
}
c.inner.Get(k) // now promote
return e.val, true
Or store TTL outside the cache and remove on expiry.
Bug 15 — Long callback under lock¶
cache, _ := lru.NewWithEvict[string, *Result](128, func(k string, r *Result) {
if err := publishToKafka(r); err != nil {
log.Printf("kafka publish failed: %v", err)
}
})
Bug. Kafka publish can take 50-500 ms. The callback runs under the cache's lock, blocking all other operations on the cache for that duration.
Fix. Send to a buffered channel; a worker goroutine publishes:
ch := make(chan *Result, 1000)
go func() {
for r := range ch {
publishToKafka(r)
}
}()
cache, _ := lru.NewWithEvict[string, *Result](128, func(k string, r *Result) {
select {
case ch <- r:
default:
// queue full; drop or log
}
})
Bug 16 — Two caches sharing one mutex¶
type Service struct {
mu sync.Mutex
users *lru.Cache[string, *User]
sessions *lru.Cache[string, *Session]
}
func (s *Service) GetUser(id string) *User {
s.mu.Lock()
defer s.mu.Unlock()
if u, ok := s.users.Get(id); ok { return u }
// ...
}
Bug. Each cache has its own internal mutex. Wrapping them with s.mu adds a useless second lock and unrelated caches serialize against each other.
Fix. Remove s.mu:
Bug 17 — Reading cache.Keys() while iterating¶
func (s *Service) DumpCache() {
keys := s.cache.Keys()
for _, k := range keys {
v, _ := s.cache.Get(k) // promotes, may evict another
log.Printf("%s = %v", k, v)
}
}
Bug. Each Get promotes the key, which may evict another key. By the time you iterate over keys, some keys are no longer in the cache.
Fix. Use Peek to avoid mutating recency:
Bug 18 — Sharing context across loaders¶
func (s *Service) Load(ctx context.Context, k string) (*Value, error) {
s.ctx = ctx // store for use by singleflight
v, err, _ := s.sf.Do(k, s.loadOne)
return v.(*Value), err
}
func (s *Service) loadOne() (interface{}, error) {
return s.fetch(s.ctx, ...) // uses the stored ctx
}
Bug. Multiple goroutines store their ctx into s.ctx. The last writer wins; the loader uses a context that may not match the caller. If one caller's context is cancelled, the loader may abort, breaking other callers.
Fix. Use the appropriate context for the loader; do not store across calls:
v, err, _ := s.sf.Do(k, func() (interface{}, error) {
return s.fetch(context.Background(), ...) // long-lived ctx
})
Or use singleflight.DoChan and select on multiple contexts.
Bug 19 — Forgotten error when loading¶
Bug. If loader failed, v is the zero value (often nil). The cache now holds nil; future Gets return (nil, true). Callers think they got a value when they didn't.
Fix.
Bug 20 — Eviction policy assumption in callback¶
cache, _ := lru.NewWithEvict[string, int](100, func(k string, v int) {
if v > 1000 {
log.Printf("evicted important value: %s = %d", k, v)
}
})
Bug. The callback runs for every eviction, not just specific ones. Under high load, it logs continuously; the log itself becomes a bottleneck.
Fix. Log at a sample rate, or move the value-check to where the value is added:
var evicts atomic.Uint64
cache, _ := lru.NewWithEvict[string, int](100, func(_ string, _ int) {
evicts.Add(1)
})
Bug 21 — Map iteration order assumption¶
func (c *MyCache) FindLRU() (string, int) {
for k, e := range c.items { // iteration order is random
if isOldest(e) {
return k, e.val
}
}
return "", 0
}
Bug. Go map iteration order is random. The loop returns some "oldest" entry but not necessarily the oldest one across calls.
Fix. Use the linked list to find the LRU end:
Bug 22 — singleflight.Forget race¶
Bug. Forget removes the key from the in-flight set. But if another goroutine is currently in Do for the same key and the loader is still running, that goroutine still gets the result. If you then start a new Do for the same key, you create a duplicate concurrent load.
Fix. Only Forget on error:
v, err, _ := sf.Do(key, func() (interface{}, error) {
v, err := loader()
if err != nil {
sf.Forget(key) // allow retry
return nil, err
}
return v, nil
})
Bug 23 — Comparing pointers to entries¶
e1, _ := cache.GetOldest()
// ... some operations ...
e2, _ := cache.GetOldest()
if e1 == e2 { // bug?
// both calls returned the same entry
}
Bug. GetOldest returns the value, not a pointer to the cache's entry. The comparison is on the values; for non-pointer types it's comparing values; for pointer types, two distinct calls may return the same key but different *list.Element pointers (depending on internal reuse).
Fix. Compare keys:
Bug 24 — Sharded cache without seed¶
type ShardedLRU struct {
shards []*lru.Cache[string, int]
}
func (c *ShardedLRU) pick(k string) *lru.Cache[string, int] {
return c.shards[hash(k) % len(c.shards)] // no seed
}
Bug. If hash is deterministic (e.g., FNV), an attacker can craft keys that all hash to the same shard, defeating sharding.
Fix. Use maphash with a random seed:
type ShardedLRU struct {
shards []*lru.Cache[string, int]
seed maphash.Seed
}
func (c *ShardedLRU) pick(k string) *lru.Cache[string, int] {
var h maphash.Hash
h.SetSeed(c.seed)
h.WriteString(k)
return c.shards[h.Sum64() & uint64(len(c.shards)-1)]
}
Bug 25 — Resize during eviction storm¶
Bug. If Len is 1M and newCap is 1K, Resize evicts 999K entries while holding the lock. The cache is unavailable for the duration.
Fix. Resize incrementally:
for cache.Len() > newCap {
cache.RemoveOldest()
if cache.Len() > newCap {
runtime.Gosched() // yield to other goroutines
}
}
cache.Resize(newCap) // final adjustment
Or schedule the resize during low traffic.
Conclusion¶
These are real bugs encountered in production code. Many are subtle — the code looks correct on first read. Concurrency bugs in particular are hard to spot in review and harder to reproduce in tests.
Tools that help:
go test -race— catches data races at runtime.go vet— catches some structural issues.staticcheck— catches common Go mistakes.- Code review by someone who has seen concurrency bugs before.
The cost of a cache bug in production: hours of debugging, possibly customer-visible incidents, sometimes data loss. The cost of careful review: 30 minutes. Always invest in the review.