Tasks
Practice tasks for base rates and expected value. Several require a worked numeric answer — show the arithmetic. Global constraints: state every probability and payoff explicitly, keep payoffs in one currency per task, label any ruin / irreversible outcome separately (never average it into EV), and when estimating, prefer a reference class over imagination. A correct EV number with an un-flagged ruin risk is a wrong answer.
Task 1 — Compute a basic EV and decide
A nightly batch can run with or without a checksum verification pass. Without the pass: the batch always completes fast, but on 4% of nights it silently corrupts output (cost 250). With the pass: always +5 cost (the extra pass), and the corruption rate drops to 0.2% (cost 250).
Compute EV (as expected cost) for each option and choose. Then answer: does the "corruption" outcome change your reasoning beyond the EV?
Deliverable: two EV-cost numbers, a choice, and one sentence on whether corruption is recoverable (and if not, why that overrides EV).
Worked answer
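A minimal Python sketch of the arithmetic, assuming expected cost is a fixed cost plus each probability times its cost. The helper name `expected_cost` is illustrative and not part of the task.

```python
def expected_cost(fixed_cost, outcomes):
    """Fixed cost plus the probability-weighted cost of each (probability, cost) outcome."""
    return fixed_cost + sum(p * cost for p, cost in outcomes)

# Numbers from the task statement.
ev_without = expected_cost(0.0, [(0.04, 250)])   # no extra pass, 4% corruption
ev_with = expected_cost(5.0, [(0.002, 250)])     # +5 for the pass, 0.2% corruption

choice = "with checksum" if ev_with < ev_without else "without checksum"
print(f"without: {ev_without:.1f}, with: {ev_with:.1f} -> {choice}")
```

This covers only the EV half of the deliverable; whether corruption is recoverable still has to be judged separately.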
EV-cost without the check: `0.04 × 250 = 10.0`. With the check: `5 + 0.002 × 250 = 5.5`. Checksum wins (5.5 < 10.0). If corruption is **irreversible** (no upstream copy to re-derive from), it's a ruin-flavored outcome: you'd run the check even if the EVs were closer, because you can't average over a permanent data loss.
Task 2 — Spot the base-rate neglect
A teammate insists a 2am latency spike "must be a kernel scheduler regression — the graph looks exactly like one." Your incident history: of the last 25 incidents, 17 were config/deploy changes, 5 were a dependency, 2 were capacity, 1 was a kernel issue.
State the base-rate prior, what your first hypothesis should be, and name the bias the teammate is exhibiting.
Worked answer
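The prior can be read straight off the incident counts. A small sketch, with the category labels chosen here for illustration:

```python
from collections import Counter

# Incident counts from the task statement (last 25 incidents).
incidents = Counter({"config/deploy": 17, "dependency": 5, "capacity": 2, "kernel": 1})
total = sum(incidents.values())

# Base-rate prior for each cause, most frequent first.
priors = {cause: count / total for cause, count in incidents.most_common()}
first_hypothesis = next(iter(priors))

for cause, p in priors.items():
    print(f"{cause:14s} {p:.0%}")
print("investigate first:", first_hypothesis)
```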
Prior: P(config/deploy) ≈ 17/25 = 68%; P(kernel) ≈ 4%. First hypothesis: check the most recent deploy/config change. The teammate is judging by **representativeness** ("looks like a kernel bug") and neglecting the base rate. Investigate by frequency, not resemblance.
Task 3 — Reference-class forecast
You must estimate a "migrate service to new message queue" task. Inside-view gut feel: 5 days. Past infra migrations (estimate → actual): 4→9, 6→11, 3→8, 8→14, 5→9.
Compute the median actual/estimate ratio and produce a reference-class estimate with p50 and a conservative commitment number.
Worked answer
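The ratio calculation can be sketched as follows. The `conservative_ratio` of 2.5 is an assumption: a rounded stand-in for the two highest observed ratios, since five samples are too few to estimate a real 80th percentile.

```python
from statistics import median

# (estimate, actual) in days, from the task statement.
history = [(4, 9), (6, 11), (3, 8), (8, 14), (5, 9)]
ratios = sorted(actual / estimate for estimate, actual in history)
median_ratio = median(ratios)

gut_estimate_days = 5
p50_days = gut_estimate_days * median_ratio

# Assumption: 2.5 approximates the upper ratios (2.25 and 2.67).
conservative_ratio = 2.5
commit_days = gut_estimate_days * conservative_ratio

print(f"median ratio {median_ratio:.2f}, p50 {p50_days:.1f}d, commit {commit_days:.1f}d")
```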
Ratios: 2.25, 1.83, 2.67, 1.75, 1.80 → sorted 1.75, 1.80, 1.83, 2.25, 2.67 → **median 1.83**, so p50 is `5 × 1.83 ≈ 9 days`. Use the upper ratios (≈2.5) for a p80 commitment: `5 × 2.5 ≈ 12–13 days`. Commit ~12–13d externally; plan capacity to ~9d. The 5-day gut figure is the planning fallacy.
Task 4 — Posterior from a noisy alert (base rate + EV of action)
An anomaly detector fires "possible data breach." It catches real breaches 90% of the time and false-alarms on 3% of normal days. Real breaches occur ~1 day in 2,000.
Compute P(real breach | alarm). Then: a full incident response costs 8 units; missing a real breach costs 5,000 units. Should you respond on every alarm?
Worked answer
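A sketch of the posterior via Bayes' rule, followed by the EV comparison. It assumes that responding averts the full 5,000-unit loss, which the task implies but does not state. The function name `posterior` is illustrative.

```python
def posterior(prior, hit_rate, false_alarm_rate):
    """P(event | alarm) from the base rate and the detector's two error rates."""
    true_alarms = prior * hit_rate
    false_alarms = (1 - prior) * false_alarm_rate
    return true_alarms / (true_alarms + false_alarms)

p_breach = posterior(prior=1 / 2000, hit_rate=0.90, false_alarm_rate=0.03)

miss_cost = 5000
response_cost = 8
# Assumption: a response fully averts the loss from a real breach.
expected_loss_averted = p_breach * miss_cost
respond = expected_loss_averted > response_cost

print(f"P(breach | alarm) = {p_breach:.2%}, averted = {expected_loss_averted:.1f}, respond = {respond}")
```

With the unrounded posterior the averted loss comes out slightly under 75, still far above the response cost of 8.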
Over 2,000 days: 1 real breach (≈0.9 caught), 1,999 clean days (≈60 false alarms at 3%). So P(real breach | alarm) ≈ `0.9 / (0.9 + 60) ≈ 1.5%`. EV of responding to an alarm: even at 1.5% real, expected averted loss = `0.015 × 5,000 = 75` vs response cost 8. **Respond**: the asymmetry (cheap response, catastrophic miss) makes it +EV despite the low posterior. (And a breach may be a ruin-class event, reinforcing "respond.")
Task 5 — Retry vs fail-fast EV, then break it
A call to a flaky downstream. Fail-fast: always an immediate error (cost 10). Retry once: 75% of the time the retry succeeds (cost 2 for latency); 25% of the time it also fails (cost 13).
Compute EV-cost of each and choose. Then describe the scenario where this EV-based choice becomes dangerous.
Worked answer
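A short sketch of the comparison, plus the success probability at which the two options break even. The break-even figure is derived here from the task's costs and is not part of the original answer; it ignores the extra load that retries put on the downstream.

```python
def retry_cost(p_success, success_cost=2, failure_cost=13):
    """Expected cost of one retry given the chance that the retry succeeds."""
    return p_success * success_cost + (1 - p_success) * failure_cost

fail_fast_cost = 10
ev_retry = retry_cost(0.75)

# Solve p * 2 + (1 - p) * 13 = 10 for p.
break_even_p = (13 - fail_fast_cost) / (13 - 2)

# During a full outage the retry never succeeds.
ev_retry_outage = retry_cost(0.0)

print(f"retry {ev_retry:.2f} vs fail-fast {fail_fast_cost}; break-even p = {break_even_p:.3f}")
```

Retry only pays while the retry succeeds more than about 27% of the time, which is the condition a correlated outage violates.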
Fail-fast costs 10; retry costs `0.75 × 2 + 0.25 × 13 = 4.75`. Retry wins (4.75 < 10). **Dangerous when** failures are *correlated*: if the downstream is fully down, P(success) → 0, and retries just multiply load and cause a retry storm or cascading failure (a fat-tailed, possibly ruin-class outcome). Gate retries behind a circuit breaker; the EV math assumed independent transient failures.
Task 6 — Expected value of information (run the spike?)
You'll choose architecture A or B. Blind: A pays +60 if it turns out to be suitable (65%) but has a 35% chance of being unsuitable (−40); B is safe at +25. A 3-day spike (cost 18) would reveal whether A is suitable.
Compute EV with and without the spike, the EVI, and decide.
Worked answer
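A sketch of the EVI calculation. It assumes the spike is a perfect signal (it always reveals correctly whether A is suitable), which the task wording suggests; a noisy spike would be worth less.

```python
p_suitable = 0.65
payoff_a_good, payoff_a_bad, payoff_b = 60, -40, 25
spike_cost = 18

# Blind: commit to the better option without knowing whether A is suitable.
ev_a_blind = p_suitable * payoff_a_good + (1 - p_suitable) * payoff_a_bad
ev_blind = max(ev_a_blind, payoff_b)

# Informed: learn A's suitability first, then pick the better option in each case.
ev_informed = (p_suitable * max(payoff_a_good, payoff_b)
               + (1 - p_suitable) * max(payoff_a_bad, payoff_b))

evi = ev_informed - ev_blind
net_gain = evi - spike_cost

print(f"blind {ev_blind:.2f}, informed {ev_informed:.2f}, EVI {evi:.2f}, net {net_gain:.2f}")
```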
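Without the spike: EV(A) = `0.65 × 60 + 0.35 × (−40) = 25`, which ties B at 25, so the blind EV is 25. With the spike: take A when it is suitable and B otherwise, `0.65 × 60 + 0.35 × 25 = 47.75`. EVI = `47.75 − 25 = 22.75`. EVI (22.75) > spike cost (18) → **run the spike**. Note: because blind EV(A) only ties B, the spike is what makes A clearly worth pursuing; its value is avoiding A's −40 case.
Task 7 — Spot the ruin risk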
Rank these three +EV proposals and flag any that must be rejected or re-engineered regardless of EV:
- Deploy strategy: +5 EV/deploy, 1.2% chance per deploy of unrecoverable prod-DB corruption.
- Caching change: +30 EV, worst case is a 2-minute stale-read window (auto-heals).
- Cost optimization: +12 EV, worst case is a 1-in-3 chance of a recoverable 10-minute degradation.
Worked answer
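For the deploy strategy, the number that matters is how the per-deploy risk compounds. A sketch, assuming deploys are independent trials: `0.988 ** 60` is the chance of surviving all 60 deploys, and its complement is the chance of at least one catastrophe.

```python
import math

def p_at_least_one(p_per_trial, trials):
    """Chance of at least one occurrence across independent trials."""
    return 1 - (1 - p_per_trial) ** trials

p_ruin_per_deploy = 0.012
p_survive_60 = (1 - p_ruin_per_deploy) ** 60
p_ruin_60 = p_at_least_one(p_ruin_per_deploy, 60)

# Smallest number of deploys at which ruin becomes more likely than not.
deploys_to_even_odds = math.ceil(math.log(0.5) / math.log(1 - p_ruin_per_deploy))

print(f"survive 60: {p_survive_60:.1%}, ruin within 60: {p_ruin_60:.1%}")
print("deploys until ruin is more likely than not:", deploys_to_even_odds)
```

Under these assumptions the strategy is worse than a coin flip after 58 deploys, whatever its per-deploy EV.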
#1 is a **ruin item**: unrecoverable corruption. Reject as framed despite +EV: over 60 deploys the chance of getting through unscathed is only `0.988⁶⁰ ≈ 48%`, so the chance of catastrophe is about 52%. Re-engineer to reversible (verified backups, expand/contract) before it's allowed. #2 and #3 have bounded, recoverable downside → optimize on EV normally (#2 then #3). **Survive first, optimize second.**
Task 8 — Build vs buy EV table
Build the EV table (cost currency, 2-year horizon) and decide.
- Buy: 80% vendor fine (cost 100k); 20% forced migration (cost 240k).
- Build: 50% smooth (cost 160k); 50% overruns at your reference ratio (cost 300k).
Then state one non-EV factor that could legitimately flip the decision.
Worked answer
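The EV table can be built with a few lines of Python. This is a sketch; the check that each option's probabilities sum to 1 is an added safeguard, not something the task requires.

```python
# (probability, cost) pairs per option, from the task statement.
options = {
    "buy": [(0.80, 100_000), (0.20, 240_000)],
    "build": [(0.50, 160_000), (0.50, 300_000)],
}

def expected_cost(outcomes):
    """Probability-weighted cost; rejects outcome lists that do not sum to 1."""
    total_p = sum(p for p, _ in outcomes)
    if abs(total_p - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    return sum(p * cost for p, cost in outcomes)

ev = {name: expected_cost(outcomes) for name, outcomes in options.items()}
best = min(ev, key=ev.get)

for name, cost in ev.items():
    print(f"{name:5s} {cost:>10,.0f}")
print("lowest expected cost:", best)
```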
Buy: `0.8 × 100k + 0.2 × 240k = 128k`. Build: `0.5 × 160k + 0.5 × 300k = 230k`. EV says **buy** (128k < 230k). Legitimate flip: if the capability is a *strategic differentiator* (your moat), the value of owning it is undercounted in pure cost-EV; or if "buy" carries vendor lock-in *tail risk* (price hike / shutdown) that you must weigh outside the EV.
Task 9 — Prioritize a reliability backlog by ROI
Rank by ROI = (expected loss averted) / (fix cost):
| Item | P/quarter | Cost if it happens | Fix cost |
|---|---|---|---|
| Idempotent payment webhook | 0.25 | 400k | 12k |
| p99 latency cut | 0.90 | 25k | 35k |
| Flaky deploy step | 0.40 | 80k | 10k |
| Log lib migration | 0.10 | 15k | 30k |
Worked answer
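The ranking follows directly from the table. A sketch that computes ROI per item and sorts, using the formula given in the task:

```python
# (item, probability per quarter, cost if it happens, fix cost), from the table.
backlog = [
    ("Idempotent payment webhook", 0.25, 400_000, 12_000),
    ("p99 latency cut", 0.90, 25_000, 35_000),
    ("Flaky deploy step", 0.40, 80_000, 10_000),
    ("Log lib migration", 0.10, 15_000, 30_000),
]

def roi(p, cost_if_happens, fix_cost):
    """Expected loss averted per unit of fix cost."""
    return p * cost_if_happens / fix_cost

ranked = sorted(
    ((name, roi(p, cost, fix)) for name, p, cost, fix in backlog),
    key=lambda pair: pair[1],
    reverse=True,
)

for name, value in ranked:
    print(f"{name:28s} ROI {value:.2f}")
```

An ROI below 1 means the fix costs more than the quarterly loss it is expected to avert, which applies to both the latency cut and the log migration.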
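ROI: webhook `0.25 × 400k / 12k ≈ 8.3`; deploy `0.40 × 80k / 10k = 3.2`; latency `0.90 × 25k / 35k ≈ 0.64`; logs `0.10 × 15k / 30k = 0.05`. Order: **webhook → deploy → latency → logs.** The log migration is near-zero ROI now, so defer it even though it is tempting hygiene.
Task 10 — St. Petersburg / unbounded-EV trap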
A vendor pitches a feature where "the upside is basically unlimited." Their headline EV is enormous, driven almost entirely by a <1% scenario with a giant payoff.
Explain, referencing the St. Petersburg paradox, why a huge headline EV doesn't justify the bet, and what you'd compute instead.
Worked answer
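The game behind the paradox can be sketched numerically. This assumes the common textbook variant: a fair coin is flipped until the first heads, and the payoff is `2**k` if that happens on flip `k`. The cap on the number of flips and the choice of log base 2 as the utility function are illustration devices, not part of the task.

```python
import math

def truncated_ev(max_flips):
    """EV of the game if it is cut off after max_flips flips."""
    return sum((0.5 ** k) * (2 ** k) for k in range(1, max_flips + 1))

def truncated_log_utility(max_flips):
    """Expected log2 utility of the payoff under the same cutoff."""
    return sum((0.5 ** k) * math.log2(2 ** k) for k in range(1, max_flips + 1))

for cap in (10, 20, 40):
    print(f"cap {cap:2d}: EV {truncated_ev(cap):.1f}, E[log2 payoff] {truncated_log_utility(cap):.4f}")

# Certainty equivalent under log2 utility: the sure payoff with the same utility.
certainty_equivalent = 2 ** truncated_log_utility(40)
```

Each extra allowed flip adds 1 to the EV, so the EV grows without limit as the cap is raised, while expected log utility settles near 2 (a certainty equivalent of about 4). That gap is why a headline EV dominated by a tiny-probability tail is a weak basis for a decision.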
St. Petersburg (Bernoulli, 1738): a game with *infinite* EV that no rational person pays much for, because we maximize **utility** (concave, diminishing) not raw EV, and the EV is dominated by negligible-probability tail events. Same here: an EV propped up by a <1% giant payoff is fragile and unactionable. Compute instead the EV **excluding the fat tail** (does it still pay?), the **downside/variance**, and the **utility** to the org, and demand a capped-downside, reversible pilot before committing.
Task 11 — SRE: error budget as EV
Your SLO is 99.95% monthly availability, an error budget of about 21.6 min/month. You've spent 14 min this month. A risky migration has a 20% chance of a 30-min outage and an 80% chance of no outage.
Compute expected budget spend and decide whether to ship now or wait for next month's budget.
Worked answer
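A sketch of the budget arithmetic, assuming a 30-day month (which is what yields the 21.6-minute figure). It reports both the expected spend and the worst-case overrun, since the decision depends on both.

```python
slo = 0.9995
minutes_per_month = 30 * 24 * 60          # assumption: 30-day month
budget = minutes_per_month * (1 - slo)    # total error budget in minutes

spent = 14
remaining = budget - spent

p_outage, outage_minutes = 0.20, 30
expected_spend = p_outage * outage_minutes
worst_case_overrun = outage_minutes - remaining

fits_on_average = expected_spend < remaining
fits_in_worst_case = outage_minutes <= remaining

print(f"budget {budget:.1f}, remaining {remaining:.1f}, expected spend {expected_spend:.1f}")
print(f"worst-case overrun {worst_case_overrun:.1f} min; fits on average: {fits_on_average}; fits worst case: {fits_in_worst_case}")
```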
Expected spend is `0.20 × 30 = 6` min against `21.6 − 14 = 7.6` min remaining, so on *average* it fits. **But** the *worst case* (30 min) overshoots the remaining budget by about 22 min. If a single full SLO miss has outsized consequences (customer SLA penalties = a tail/ruin term), wait or canary to shrink the 30-min blast radius. EV says marginal-go; tail-awareness says de-risk first.
Task 12 — Design a base-rate-driven runbook step
Write the first three steps of an incident runbook so that the team's measured base rates (≈70% config/deploy, ≈15% dependency, ≈10% capacity) are enforced by process, not left to whoever is calmest. Explain which bias each step counters.
Worked answer
1. **"What changed in the last N hours?"** List recent deploys, config changes, and flag flips, and consider rollback. Counters base-rate neglect by forcing the 70% prior first.
2. **"Check dependency health dashboards."** The next-most-likely class (15%). Counters tunnel vision on the first theory.
3. **"Check capacity/saturation signals."** The 10% class.

Only after steps 1–3 do you entertain exotic hypotheses (the residual ~5%). The steps are *ordered by base rate*, so attention is allocated by frequency, not by the most vivid story, which institutionalizes the correction for representativeness bias.
Where to go next
- Update priors with evidence: reasoning under uncertainty.
- Tail and irreversible risk in depth: risk and failure probabilities.
- Turning reference classes into ranges: estimation under uncertainty.
- The biases behind these traps: cognitive biases in code decisions · evaluating tradeoffs objectively.
- Section root: probabilistic thinking · engineering thinking.