Paging and Caching Theory — Interview Questions¶
Table of Contents¶
- Conceptual Questions
- The Marquee: LRU is k-competitive & the lower bound
- Belady's Anomaly & Stack Algorithms
- Randomization & Resource Augmentation
- Applied: Real Cache Policies
- Gotcha / Trick Questions
- Rapid-Fire Q&A
- Common Mistakes
- Tips & Summary
Conceptual Questions¶
Q1: Define the paging problem and a page fault. (Easy)¶
Answer: A cache (fast memory) holds at most k pages. A request stream σ = ⟨p₁, p₂, …⟩ arrives one page at a time. On request pᵢ:
- If
pᵢis in the cache → a hit, cost 0, no action. - If
pᵢis not in the cache → a page fault (miss), cost 1. The page must be brought in; if the cache is full, the policy must evict one resident page to make room.
The only decision is which page to evict on a fault — and it must be made online, without seeing future requests. The objective is to minimize the total number of faults. (This is the standard demand paging model: never fetch a page before it is requested.)
Q2: What is Belady's algorithm (LFD/MIN) and why is it optimal? (Medium)¶
Answer: Belady's MIN, also called LFD (Longest Forward Distance): on a fault, evict the resident page whose next use is farthest in the future (evict the one never used again first, if any).
Why optimal: An exchange argument. Take any optimal schedule that disagrees with MIN at the first fault; show you can swap its eviction for MIN's farthest-future page without increasing the total fault count, because the page MIN keeps is needed sooner and the page MIN evicts can be re-fetched no earlier than it was needed anyway. Repeating the swap converts any optimum into MIN's behavior, so MIN is offline-optimal — it is the OPT benchmark for paging.
Why we can't run it online: LFD requires knowing the future request stream (the "forward distance"). An online policy sees only the past, so MIN is purely an analytical yardstick, not a deployable policy.
Q3: Name the standard online paging policies. (Easy)¶
Answer:
- LRU (Least Recently Used) — evict the page whose last use is oldest. Bets on locality: the recent past predicts the near future.
- FIFO (First-In First-Out) — evict the page that entered the cache earliest, ignoring use.
- LFU (Least Frequently Used) — evict the page with the smallest access count.
- FWF (Flush-When-Full) — when a fault occurs and the cache is full, evict everything and start fresh. A theoretical "marking" baseline, not a practical policy.
- LIFO, RAND (evict a uniformly random page), and marking algorithms round out the catalogue.
LRU, FIFO, and FWF are all k-competitive; LFU and LIFO are not competitive (no constant bound).
The Marquee: LRU is k-competitive & the lower bound¶
Q4: Prove LRU is k-competitive. (Hard — the phase/marking argument)¶
Answer: Partition σ into phases. Phase 1 starts at the first request; a new phase begins at the first request that would make the phase contain k + 1 distinct pages. So each phase contains exactly k distinct pages (except possibly the last, which has ≤ k).
LRU's cost per phase: Within one phase only k distinct pages appear, and LRU never evicts a page it has touched in the current phase before evicting all the others first. The key lemma: LRU faults at most k times per phase — each of the k distinct pages in a phase faults at most once, because once loaded it stays resident for the rest of the phase (any of the ≤ k−1 other distinct pages touched meanwhile cannot push it out before it is re-used). So LRU ≤ k faults per phase.
OPT's cost per phase: Consider a phase Pᵢ together with the first request of phase Pᵢ₊₁. These span k + 1 distinct pages. At the start of Pᵢ, OPT has at most k pages cached, one of which is the first page of Pᵢ. Over the k + 1 distinct pages spanning Pᵢ plus the next phase's first request, OPT must fault at least once per phase (it cannot hold all k + 1 distinct pages in k slots). Charging carefully, OPT faults ≥ 1 time per phase.
Therefore over all phases: LRU ≤ k · (faults of OPT) + O(1), i.e. LRU is k-competitive. The same marking argument shows FIFO and FWF are also k-competitive.
Q5: Show no deterministic policy beats k. (Hard — the lower bound)¶
Answer: Use only k + 1 distinct pages {p₀, …, p_k}. The cache holds k, so exactly one of the k + 1 is missing at any moment. The adversary always requests the one page the algorithm does not have.
- Against this stream, the deterministic algorithm faults on every single request — say
Nfaults overNrequests. - OPT on the same stream: OPT sees the future, so on each fault it evicts the page used farthest ahead. Among
k + 1pages, the one farthest in the future won't be requested for at least the nextk − 1steps, so OPT faults at most once everykrequests →OPT ≤ N/k + O(1).
Thus ALG / OPT ≥ N / (N/k) = k. Since this holds for every deterministic algorithm, no deterministic paging policy is better than k-competitive — and LRU/FIFO match it, so the bound k is tight.
Q6: Why is the competitive ratio identical for LRU and FIFO if LRU is clearly better in practice? (Medium)¶
Answer: Both are exactly k-competitive — worst-case competitive analysis cannot distinguish them. The adversary's "always request the missing page" stream defeats both equally. LRU's real-world superiority comes from locality of reference (recently-used pages are likely to be reused), which the worst-case adversary deliberately destroys. To separate the two analytically you need a refined model — e.g. the access-graph model or distributional / Markov request models — where LRU provably beats FIFO. This is the canonical example of competitive analysis being too pessimistic to match practice; see ../01-competitive-analysis/interview.md.
Belady's Anomaly & Stack Algorithms¶
Q7: What is Belady's anomaly? (Medium)¶
Answer: Belady's anomaly: for FIFO, increasing the cache size k can increase the number of faults on some request streams — more memory making things worse. The classic stream 1 2 3 4 1 2 5 1 2 3 4 5 produces 9 faults with k = 3 but 10 faults with k = 4. It is counterintuitive but real, and it is specifically a property of FIFO (and LIFO, RAND).
Q8: Why can't LRU exhibit Belady's anomaly? (Hard)¶
Answer: Because LRU (like MIN and LFU) is a stack algorithm, satisfying the inclusion property: at every point in the request stream, the set of pages a cache of size k would hold is a subset of what a cache of size k + 1 would hold. Formally the contents are nested across cache sizes — they form a "stack" ordered by recency.
The inclusion property directly implies monotonicity: any page that is a hit with cache size k is also a hit with size k + 1 (it's in the larger cache too). So faults can only decrease or stay equal as k grows — no anomaly is possible. FIFO violates inclusion (its contents depend on insertion order, not a size-independent priority), which is exactly why it can misbehave. The deep reason: stack algorithms order pages by a size-independent priority (recency for LRU, frequency for LFU, forward-distance for MIN), and a bigger cache just keeps a longer prefix of that same fixed ordering.
Randomization & Resource Augmentation¶
Q9: How does randomization beat the deterministic barrier of k? (Hard — MARKER)¶
Answer: The RANDOMIZED MARKING algorithm (MARKER). Each page has a mark bit; all start unmarked at the beginning of a phase.
- On a request: mark the page. If it's a fault and the cache is full, evict a uniformly random unmarked page.
- When all
kpages become marked, a new phase begins and all marks are cleared.
Result: MARKER is 2H_k-competitive against an oblivious adversary, where H_k = 1 + 1/2 + ⋯ + 1/k ≈ ln k is the k-th harmonic number. This is an exponential improvement: k → 2 ln k. The intuition: a phase has some number m of "new" pages (not present last phase); randomly choosing among unmarked pages means the k − m "old" pages are each evicted with only probability proportional to 1/i, summing to a harmonic — not linear — expected fault count.
The optimal randomized ratio is exactly H_k (achieved by the EQUITABLE / Partition algorithm), and H_k is also a matching lower bound — no randomized policy beats H_k against an oblivious adversary (provable via Yao's principle). Always name the oblivious adversary model: randomization gives no advantage against an adaptive-offline adversary.
Q10: Explain resource augmentation for paging. (Hard)¶
Answer: Resource augmentation gives the online algorithm a larger cache than OPT, then compares them. Let LRU use cache size k while OPT uses a smaller size h ≤ k. Then:
This is the Sleator–Tarjan bound. The point is the dramatic decay: when k is even modestly larger than h, the ratio collapses toward 1.
h = k→ ratiok(the standard bound — no extra resource).h = k/2→ ratio≈ 2.k = 2h→ ratio2h/(h+1) ≈ 2for largeh.
Practical reading: "a little more cache helps a lot." Doubling the online cache relative to the benchmark drops the worst-case ratio from k to roughly 2. It explains why real systems over-provision cache slightly and behave near-optimally — and it's a far more predictive lens than the bare k-competitive bound.
Applied: Real Cache Policies¶
Q11: Which real policies approximate LRU or Belady? (Medium)¶
Answer:
- CLOCK (Second-Chance) — an
O(1), low-overhead approximation of LRU using a circular buffer and one reference bit per page. Used in OS page replacement (Linux's page reclaim is a CLOCK variant). Avoids LRU's per-access list maintenance. - ARC (Adaptive Replacement Cache) — balances recency vs frequency with two LRU lists and ghost entries, self-tuning the split; scan-resistant. Used in ZFS.
- LRU-K — evicts by the
K-th most recent access (typically LRU-2), distinguishing one-hit-wonders from genuinely hot pages. - TinyLFU / W-TinyLFU — a frequency sketch (Count-Min) admission filter in front of an LRU/SLRU; the basis of Caffeine, a top-tier production cache, with hit rates near Belady on real traces.
- Learned / learning-augmented caching — uses an ML predictor of next-access time to mimic LFD: evict the page predicted to be used farthest in the future. With a good predictor it approaches Belady; theory (e.g. LMARKER, learning-augmented analysis) gives robustness guarantees that degrade gracefully when predictions are wrong — bounded by
O(min(consistency, robustness)).
The throughline: every strong practical policy is trying to approximate Belady's farthest-future eviction using only past signals (recency, frequency, or a learned predictor). See ../../21-advanced-structures/01-lru-cache/interview.md for the LRU data structure itself.
Q12: Why is LFU not competitive, and what are its pitfalls? (Medium)¶
Answer: LFU has no constant competitive ratio. A page accessed heavily early then never again can keep a high count and never get evicted, occupying a slot forever while genuinely hot new pages thrash — the adversary exploits exactly this. Practical pitfalls:
- Cache pollution / stale hot pages: old high-frequency pages never age out.
- Scan vulnerability: a one-time sweep of many pages each gets count 1; under naive LFU they're indistinguishable from each other, but pure LFU can still evict a moderately-used page in favor of nothing useful.
- No recency signal: frequency alone misses temporal locality.
Fixes that make frequency usable in practice: aging / decay of counts (windowed frequency), or combining frequency with recency (ARC, W-TinyLFU). Note LFU is a stack algorithm (no Belady anomaly), but stack-ness doesn't imply competitiveness — those are independent properties.
Gotcha / Trick Questions¶
Q13: "Is LRU always within k× of optimal in practice?" (Medium)¶
Answer: The k× is a worst-case ceiling, not a typical-case prediction. k-competitive means LRU(σ) ≤ k·OPT(σ) + O(1) for every σ, including the adversary's "always request the missing page" stream. On real workloads with locality, LRU is usually within a small constant of OPT, often a few percent of the optimal fault count. The k factor only materializes on adversarial, locality-free streams that real programs essentially never produce. (Same trap as the competitive-ratio-as-average-case gotcha in ../01-competitive-analysis/interview.md.)
Q14: "FIFO and LRU have the same competitive ratio, so just use the simpler FIFO, right?" (Medium)¶
Answer: No. Equal worst-case ratio (k) says nothing about typical performance, where LRU's locality exploitation makes it consistently better, and FIFO can even suffer Belady's anomaly (faults rising with cache size). Worst-case competitive analysis is simply blind to the locality that dominates real traces. Choose by empirical hit rate and the inclusion property, not by competitive ratio.
Q15: "Can an online policy ever match Belady?" (Easy)¶
Answer: No deterministic online policy can match MIN in general — MIN needs the future, and the k-lower-bound (Q5) proves any online policy is forced to k× on some stream. Online policies can only approximate MIN using past signals or predictions (learning-augmented caching). The gap is fundamental: it's the price of not seeing the future, exactly the kind of information handicap competitive analysis quantifies.
Rapid-Fire Q&A¶
| # | Question | Answer |
|---|---|---|
| 1 | Cost model for paging? | Fault = 1, hit = 0; minimize faults |
| 2 | Offline optimum for paging? | Belady's MIN / LFD — evict farthest-future use |
| 3 | Why can't MIN run online? | It needs the future request stream |
| 4 | LRU competitive ratio? | k (tight) |
| 5 | FIFO competitive ratio? | k (tight) |
| 6 | Deterministic lower bound? | k — adversary requests the missing page |
| 7 | LFU competitive ratio? | None — not competitive |
| 8 | What is FWF? | Flush-When-Full; also k-competitive |
| 9 | RANDOMIZED MARKING ratio? | 2H_k ≈ 2 ln k (oblivious) |
| 10 | Optimal randomized ratio? | H_k ≈ ln k (tight) |
| 11 | Which adversary for randomized bounds? | Oblivious |
| 12 | Belady's anomaly affects? | FIFO (not LRU/LFU/MIN) |
| 13 | Why is LRU anomaly-free? | Stack algorithm — inclusion property |
| 14 | Resource-augmentation ratio (LRU k vs OPT h)? | k / (k − h + 1) |
| 15 | CLOCK approximates? | LRU (O(1) with reference bits) |
| 16 | Policy that mimics Belady via prediction? | Learning-augmented caching |
| 17 | Production frequency-based cache? | W-TinyLFU (Caffeine) / ARC |
Common Mistakes¶
- Saying LRU is "always within k× in practice." That's the worst-case ceiling; real locality makes it near-optimal. The
konly appears on adversarial streams. - Claiming Belady can be implemented online. LFD needs the future; it's only an analytical benchmark (
OPT). - Thinking equal competitive ratio means equal performance. LRU and FIFO are both
k-competitive but behave very differently on real traces. - Believing more cache always helps. FIFO can fault more with a bigger cache (Belady's anomaly); only stack algorithms guarantee monotonic improvement.
- Quoting
2H_korH_kwithout the adversary model. They are oblivious-adversary bounds; randomization gives no edge against adaptive-offline. - Calling LFU competitive. It has no constant ratio — stale hot pages and scans break it. Stack-ness ≠ competitiveness.
- Confusing the resource-augmentation ratio. It's
k/(k−h+1)with the online cachek ≥ h; settingh = krecovers the plain ratiok.
Tips & Summary¶
- Lead with the benchmark: "OPT is Belady's MIN — evict the farthest-future page — which is offline-optimal but unrealizable online." Stating this frames every other answer.
- Memorize the canonical numbers: deterministic k (tight, both LRU and FIFO); randomized MARKER
2H_k, optimalH_k ≈ ln k(oblivious); resource augmentationk/(k−h+1). These are the constants interviewers expect on sight. - Narrate the lower bound as a story: with
k+1pages the adversary always requests the one you lack, so you fault every step while OPT faults once perk— ratiok. - For LRU's upper bound, use phases: each phase has
kdistinct pages, LRU faults ≤ k and OPT faults ≥ 1 per phase. - Tie theory to practice: competitive analysis can't separate LRU from FIFO, but locality and the inclusion property (no Belady anomaly) do — and CLOCK/ARC/TinyLFU/learned-caching all chase Belady with past-only signals.
- Always name the adversary model for any randomized bound, and remember Belady's anomaly is a FIFO-specific failure that stack algorithms (LRU/LFU/MIN) provably avoid.
Related: Competitive Analysis — Interview · LRU Cache — Interview · Paging & Caching Theory — Middle
In this topic
- interview
- tasks