Memoization & Caching — Professional Level¶

Category: Object & State Patterns — understand the cache below the API: hashing, memory layout, eviction internals, and the cost of every hit and miss.

Prerequisites: Junior · Middle · Senior Focus: Under the hood

Table of Contents¶

Introduction
The Cost of a Hit vs a Miss
Hashing and Key Equality
LRU Internals
Window-TinyLFU: How Caffeine Evicts
lru_cache Internals (CPython)
Memory Footprint
Benchmarks
Diagrams
Related Topics

Introduction¶

A cache trades memory for time, but each layer of that trade has a measurable cost: the hash of the key, the probe into the table, the bookkeeping that eviction requires. At the professional level you should be able to:

Explain why a cache miss costs more than no cache at all (lookup plus compute plus store).
Reason about key hashing and equality as the thing that makes or breaks hit rate.
Describe how LRU and Window-TinyLFU maintain their orderings in O(1).
Predict the per-entry memory cost in Python, Java, and Go.

The Cost of a Hit vs a Miss¶

Every cached call pays a lookup. The break-even depends on hit rate:

cost_no_cache   = compute
cost_with_cache = lookup + (miss_rate × (compute + store))

A cache helps only when lookup + miss_rate × (compute + store) < compute, i.e. when compute ≫ lookup and hit rate is high. Concretely:

A hash-map lookup is ~10–50 ns.
If compute is ~30 ns (a couple of arithmetic ops), caching is a net loss — you added a lookup to save nothing.
If compute is ~30 µs (a DB round-trip, a render), a 90% hit rate is a ~10× win.

The miss is the worst case: lookup → compute → store. A workload that's all misses is strictly slower than no cache. Measure hit rate before claiming a speedup.

Hashing and Key Equality¶

The cache key flows through hashCode/__hash__ and equals/__eq__. Two requirements:

Consistency: equal keys must hash equally, and equality must be stable while the key is in the cache. A key whose hash changes after insertion is lost (it now lands in a different bin) — a leak and a permanent miss.
Cheap and well-distributed: a slow or clustering hash turns O(1) lookups into O(n) bucket walks.

// A record gives you value-based hashCode/equals for free → safe cache key
record Key(String user, String currency) {}

# Tuples of hashable items are the canonical Python cache key
key = (user, currency)          # hashable, value-equal, immutable

// Go map keys must be comparable; structs of comparable fields work,
// slices/maps/functions do NOT (compile error).
type Key struct{ User, Currency string }

Argument normalization is part of the key contract: if f(1, 2) and f(2, 1) should hit the same entry, sort or canonicalize before keying — otherwise you fragment the cache and tank the hit rate.

LRU Internals¶

A correct LRU must answer two questions in O(1): "which entry was used least recently?" and "mark this entry most-recently used." The standard structure is a hash map + doubly-linked list:

hashmap: key → node            (O(1) lookup)
list:    MRU ⇄ … ⇄ LRU         (O(1) move-to-front, O(1) evict-tail)

Get(hit): map lookup → unlink node → push to front. O(1).
Put(new): insert at front; if over capacity, drop the tail node and its map entry. O(1).

from collections import OrderedDict   # move_to_end gives O(1) LRU

class LRU:
    def __init__(self, cap):
        self.cap, self.d = cap, OrderedDict()
    def get(self, k):
        if k not in self.d: return None
        self.d.move_to_end(k)          # mark MRU
        return self.d[k]
    def put(self, k, v):
        self.d[k] = v
        self.d.move_to_end(k)
        if len(self.d) > self.cap:
            self.d.popitem(last=False)  # evict LRU

CPython's lru_cache uses exactly this — a dict plus a circular doubly-linked list of (prev, next, key, result) nodes.

Window-TinyLFU: How Caffeine Evicts¶

Pure LRU has a flaw: a burst of one-shot keys (a scan) flushes the genuinely hot set. Caffeine uses Window-TinyLFU, the current state of the art:

A small admission window (LRU) catches recency.
A frequency sketch (a Count-Min Sketch, ~ few bits per counter) estimates how often each key has been seen, cheaply and approximately.
On eviction contention, the TinyLFU admission filter asks: is the candidate seen more often than the victim it would replace? If not, it's rejected — protecting the hot set from scan pollution.

The frequency counters age (halved periodically) so the cache adapts to shifting workloads. The result: near-optimal hit rates at O(1) per operation with a few bytes of overhead per entry. This is why "use Caffeine" beats "hand-roll LRU" almost every time.

lru_cache Internals (CPython)¶

functools.lru_cache is implemented in C (_functools) for speed. Key facts:

Key construction: positional args become a tuple; keyword args are appended with a separator; the kwargs/typed handling means f(3) and f(3.0) are the same key unless typed=True.
Thread safety: the cache dict updates are guarded so concurrent calls don't corrupt the structure — but the user function may still run twice on a simultaneous cold miss (the lock is not held across the call).
maxsize=None: skips all the linked-list LRU bookkeeping entirely — it's a plain dict, faster per call but unbounded.
cache_info() exposes hits, misses, maxsize, currsize — free instrumentation.

@lru_cache(maxsize=None, typed=True)   # typed: 3 and 3.0 are distinct keys
def f(x): ...

Memory Footprint¶

Per-entry overhead matters when you bound by size and care about heap.

Python¶

@lru_cache(maxsize=N)
def f(k): ...

Each entry stores: the key tuple, the result, and an LRU node (prev, next, key, result). Rough cost: ~200–300 bytes per entry beyond the key/value payloads, dominated by Python object headers and the dict's open-addressing slots.

Java (Caffeine)¶

Each entry: a node object (key ref, value ref, access/write timestamps for expiry, frequency bits), plus the ConcurrentHashMap bucket. Roughly ~60–100 bytes/entry of bookkeeping on top of the key/value objects themselves.

Go (map)¶

A map[K]V entry costs the key + value inline in buckets (8 entries/bucket) plus tophash bytes and overflow pointers — roughly ~50 bytes/entry of overhead for small types, less per entry when the bucket is full. No built-in eviction, so you pay for your own LRU structure on top.

Rule of thumb: budget a few hundred bytes per cached entry beyond the data. A 1M-entry cache is hundreds of MB of overhead alone. Bound deliberately.

Benchmarks¶

Apple M2 Pro, single thread, indicative numbers.

Fibonacci: naive vs memoized¶

fib(35) naive recursion      ~ 50,000,000 ns   (exponential)
fib(35) memoized             ~        2,000 ns  (linear)            ~25,000× faster

Lookup cost (hit path)¶

Python  dict get             ~  40 ns
Python  lru_cache hit        ~  90 ns   (key tuple build + LRU touch)
Java    ConcurrentHashMap    ~  20 ns
Java    Caffeine get (hit)   ~  35 ns
Go      map[K] read (no lock)~  10 ns
Go      map + RWMutex read   ~  25 ns

Break-even¶

compute < lookup            → cache is a NET LOSS (e.g. caching x*2)
compute ≈ µs, hit rate high → cache is a large win
all-miss workload           → strictly slower than no cache

Diagrams¶

LRU = hashmap + doubly-linked list¶

flowchart LR subgraph Map K1[keyA] --> N1 K2[keyB] --> N2 K3[keyC] --> N3 end subgraph List["MRU ⇄ LRU"] N1[A] <--> N2[B] <--> N3[C] end Evict["evict tail (C)"] -.-> N3

Hit/miss cost model¶

flowchart TD Call --> Lookup{hit?} Lookup -- hit --> H["cost = lookup (cheap)"] Lookup -- miss --> M["cost = lookup + compute + store (expensive)"] M --> Note["all-miss workload is slower than no cache"]

CPython source: Lib/functools.py and Modules/_functoolsmodule.c (lru_cache).
Caffeine / TinyLFU: "TinyLFU: A Highly Efficient Cache Admission Policy" (Einziger, Friedman, Manes).
Count-Min Sketch: Cormode & Muthukrishnan, the structure behind frequency estimation.
Distributed end: Redis eviction policies.

← Senior · Object & State · Next: Interview