Senior
What? At the senior level, reasoning under uncertainty becomes a quantitative discipline: you distinguish risk (known distribution) from uncertainty (unknown distribution) from ignorance (unknown unknowns), you calibrate your own confidence against reality, and you treat every estimate as a distribution rather than a point.
How? You build measured priors, run sequential Bayesian updates as evidence streams in, track your own hit rate to stay calibrated, surface variance instead of hiding it behind means, and choose actions by expected outcome under explicit assumptions — communicating all of it honestly to the people who depend on you.
1. Three things people lump together: risk, uncertainty, ignorance
Frank Knight (Risk, Uncertainty, and Profit, 1921) drew a line that most engineers blur. Add a third category that the software world forces on us, and you get a crucial taxonomy:
| Type | You know the outcomes? | You know the probabilities? | Engineering example |
|---|---|---|---|
| Risk | yes | yes | "This disk has a 0.5%/yr AFR" — measured, distribution known |
| Uncertainty (Knightian) | yes | no | "How likely is this brand-new dependency to have a critical CVE this year?" — outcomes known, probability genuinely unknown |
| Ignorance | no | no | The failure mode you've never imagined — the unknown unknown |
Why this matters: the right tool differs per category.
- For risk, compute. Expected value, error budgets, redundancy math all apply cleanly.
- For uncertainty, you can't compute a clean EV because you don't trust the probabilities — so you buy optionality and robustness instead: reversible decisions, feature flags, canaries, kill switches. You hedge against a distribution you can't pin down.
- For ignorance, no probability helps at all — you invest in detection and recovery (observability, blast-radius limits, fast rollback) because you literally cannot enumerate what you're defending against.
Senior failure mode: treating Knightian uncertainty as if it were risk — slapping a confident-looking probability on something you fundamentally cannot estimate, then optimizing against that fake number. A made-up "5% chance" can be more dangerous than an honest "we don't know," because it looks rigorous.
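For the risk row, where the distribution is measured, the computation really is direct. The sketch below takes the 0.5%/yr AFR from the table and applies it to a fleet; the fleet size of 200 and the function name `fleet_failure_stats` are hypothetical, and it assumes disks fail independently with the same rate.

```python
def fleet_failure_stats(afr: float, disks: int) -> tuple[float, float]:
    """Return (expected failures per year, P(at least one failure in a year)).

    Assumes failures are independent and every disk shares the same AFR.
    """
    expected_failures = afr * disks
    p_at_least_one = 1.0 - (1.0 - afr) ** disks
    return expected_failures, p_at_least_one


# 0.5%/yr AFR is the figure from the table; the fleet of 200 disks is hypothetical.
expected, p_any = fleet_failure_stats(afr=0.005, disks=200)
print(f"expected failures per year: {expected:.1f}")
print(f"P(at least one failure this year): {p_any:.0%}")
```

The same arithmetic is meaningless for the Knightian row: with no trustworthy `afr` to pass in, the output only looks rigorous.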
2. Calibration: is your 70% actually 70%?
Saying "I'm 70% confident" is only useful if claims you label 70% come true ~70% of the time. That property is calibration, and it is trainable — the central finding of Philip Tetlock's Superforecasting and his decades of forecasting research: ordinary people who track their predictions and update honestly beat credentialed experts who don't.
A calibration curve plots your stated confidence against your actual hit rate:
Figure (calibration curve): the x-axis is stated confidence and the y-axis is actual hit rate, both from 0% to 100%. The diagonal is perfect calibration. The marked point sits below the diagonal: claims you call 90% come true only 75% of the time, which is overconfidence.
Most engineers are overconfident: their 90% claims hit 70%, their "definitely done by Friday" lands Tuesday-next-week. A few are underconfident. You only find out which by logging predictions and scoring them.
A clean way to score: the Brier score (mean squared error of probabilistic predictions), lower is better.
Brier = mean( (forecast − outcome)² )
forecast = your stated probability (0–1)
outcome = 1 if it happened, 0 if not
Predict 0.9, it happens → (0.9 − 1)² = 0.01 (good)
Predict 0.9, it doesn't → (0.9 − 0)² = 0.81 (brutal — confident & wrong)
The squared term punishes confident wrong answers hard, which is exactly the discipline overconfident engineers need.
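As a minimal sketch, the formula above translates into a few lines of Python. The function name `brier_score` and the sample forecasts are illustrative, not part of any particular library.

```python
def brier_score(forecasts, outcomes):
    """Mean squared error between stated probabilities (0-1) and 0/1 outcomes."""
    if len(forecasts) != len(outcomes) or not forecasts:
        raise ValueError("need equally sized, non-empty sequences")
    return sum((f - o) ** 2 for f, o in zip(forecasts, outcomes)) / len(forecasts)


confident_right = brier_score([0.9], [1])      # the "good" case from the text
confident_wrong = brier_score([0.9], [0])      # the "brutal" case from the text
always_fifty = brier_score([0.5, 0.5], [1, 0])  # never committing scores 0.25

print(f"{confident_right:.2f} {confident_wrong:.2f} {always_fifty:.2f}")
```

A forecaster who always says 50% scores 0.25, so a long-run score above that suggests your stated confidence is doing worse than a coin flip.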
How to actually calibrate yourself:
- Write predictions with probabilities before outcomes are known — incident postmortems, estimate confidence, "this PR will need a follow-up fix: 30%."
- Revisit and score them. A simple spreadsheet beats nothing.
- Bucket by confidence (50s, 70s, 90s) and check each bucket's real hit rate.
- Adjust: if your 90% bucket hits 70%, start saying 70% when you feel 90%.
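The bucketing step can be sketched as follows. The prediction log is hypothetical data made up for illustration, and `hit_rate_by_bucket` is an assumed helper name; a spreadsheet pivot does the same job.

```python
from collections import defaultdict


def hit_rate_by_bucket(log):
    """Group (stated_probability, happened) pairs into 10-point confidence buckets.

    Returns {bucket: (hit_rate, sample_count)}, e.g. bucket 90 covers 90-99%.
    """
    buckets = defaultdict(list)
    for stated, happened in log:
        bucket = (round(stated * 100) // 10) * 10  # 0.93 -> 90
        buckets[bucket].append(1 if happened else 0)
    return {b: (sum(hits) / len(hits), len(hits)) for b, hits in sorted(buckets.items())}


# Hypothetical log: (stated confidence, did it come true?)
prediction_log = [
    (0.9, True), (0.9, True), (0.9, False), (0.9, True), (0.9, False),
    (0.7, True), (0.7, False), (0.7, True),
    (0.5, False), (0.5, True),
]

calibration = hit_rate_by_bucket(prediction_log)
for bucket, (rate, count) in calibration.items():
    print(f"stated {bucket}s: actual {rate:.0%} over {count} predictions")
```

The sample count is returned on purpose: a bucket with five entries says very little, so treat small buckets as a prompt to keep logging rather than as a verdict.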
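3. Sequential updating: belief as a running quantity
The middle level ran one Bayesian update. In real incident response and debugging, evidence arrives in a stream, and your posterior after one piece becomes the prior for the next. The odds form makes this trivial: posterior odds = prior odds × LR₁ × LR₂ × …, so you just keep multiplying likelihood ratios. This treats each piece of evidence as roughly independent of the others given the hypothesis; strongly correlated evidence should be counted once, not twice.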
Worked debugging trace — "Is the latency spike caused by the new caching layer we shipped today?"
Prior: We ship ~10 changes/day; this one touches the hot path.
Reasonable prior the cache is the cause: 30% → odds 0.30/0.70 = 0.43
Evidence 1: Spike started 4 min after the cache deploy.
Much more likely if cache is the cause. LR ≈ 5.
odds: 0.43 × 5 = 2.15 → P ≈ 68%
Evidence 2: Cache hit-rate dashboard is flat/normal.
If the cache were broken we'd expect hit-rate anomalies. LR ≈ 0.4 (evidence AGAINST).
odds: 2.15 × 0.4 = 0.86 → P ≈ 46%
Evidence 3: Same spike visible in a service that doesn't use the cache.
Strongly against. LR ≈ 0.1.
odds: 0.86 × 0.1 = 0.086 → P ≈ 8%
Conclusion: Probably NOT the cache. Look for a shared dependency (DB, network).
Notice that evidence pointing away (LR < 1) is as informative as evidence pointing toward. Engineers who only seek confirming evidence (classic confirmation bias — see cognitive biases) never get to update down, so they chase the wrong root cause for an hour.
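The debugging trace above can be reproduced with a short sketch. The helper names are illustrative; the prior and likelihood ratios are the ones from the trace, and the multiplication assumes the three pieces of evidence are independent given the hypothesis.

```python
def prob_to_odds(p: float) -> float:
    return p / (1.0 - p)


def odds_to_prob(odds: float) -> float:
    return odds / (1.0 + odds)


def sequential_update(prior: float, likelihood_ratios):
    """Return the posterior probability after each likelihood ratio is applied in turn."""
    odds = prob_to_odds(prior)
    trail = []
    for lr in likelihood_ratios:
        odds *= lr  # yesterday's posterior odds are today's prior odds
        trail.append(odds_to_prob(odds))
    return trail


# Prior 30%; LRs for: deploy timing, flat hit-rate dashboard, spike in an unrelated service.
posteriors = sequential_update(0.30, [5, 0.4, 0.1])
for step, p in enumerate(posteriors, start=1):
    print(f"after evidence {step}: P(cache is the cause) = {p:.0%}")
```

Because multiplication is commutative, the order in which evidence arrives does not change the final posterior, only the path you take to get there.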
4. Point estimates that lie: the variance you hid
A senior engineer never reports a single number for anything that has a distribution. The classic offenders:
Latency
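A "mean response time of 80ms" can hide a p99 of 900ms. Users live in the tail. Always report percentiles (p50/p95/p99), and remember that you cannot average percentiles or add them naively across services: a request hitting two services, each with p99=100ms, does not have p99=200ms. The percentile of a sum depends on the full distributions and how they correlate, and with fan-out the chance that at least one call lands in its tail grows with every call (for n independent calls, 1 − 0.99^n of requests see at least one call above its p99).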
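A toy calculation makes the "percentiles don't add" point concrete. The two-point latency distribution below is hypothetical (95% of calls take 20ms, 5% take 100ms), the two services are assumed independent and called one after the other, and the helper names are illustrative.

```python
from collections import defaultdict
from itertools import product


def percentile(dist, q):
    """Smallest latency whose cumulative probability reaches q (dist maps ms -> probability)."""
    cumulative = 0.0
    for latency in sorted(dist):
        cumulative += dist[latency]
        if cumulative >= q - 1e-12:  # tolerance for floating-point sums
            return latency
    return max(dist)


def chain(a, b):
    """Latency distribution of calling a then b, assuming the two are independent."""
    total = defaultdict(float)
    for (la, pa), (lb, pb) in product(a.items(), b.items()):
        total[la + lb] += pa * pb
    return dict(total)


service = {20: 0.95, 100: 0.05}  # hypothetical per-service latency distribution
mean_single = sum(ms * p for ms, p in service.items())
p99_single = percentile(service, 0.99)
p99_chain = percentile(chain(service, service), 0.99)

print(f"mean={mean_single:.0f}ms  p99 per service={p99_single}ms  p99 end-to-end={p99_chain}ms")
```

In this toy case each service has a p99 of 100ms but the chained request has a p99 of 120ms, not 200ms, because both calls being slow at once is rare. With correlated slowness (a shared database, for example) the answer moves again, which is why end-to-end percentiles should be measured rather than derived.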
Estimates / deadlines
"It'll take 3 days" is a point estimate of a right-skewed distribution. The honest form is a range with a confidence:
Bad: "3 days."
Good: "Most likely 3 days; 80% confident it lands within 2–6 days;
if the auth refactor is worse than expected, it could be 2 weeks."
The tail (the 2 weeks) is the entire reason the estimate matters to a planner. Hiding it behind "3 days" is the most expensive point estimate in software. This is the heart of estimation under uncertainty.
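One possible way to get from a point estimate to a range with a confidence is to scale it by how far past estimates missed. This is a sketch under stated assumptions, not a prescribed method: the history of actual-to-estimated ratios is hypothetical, `estimate_range` is an illustrative name, and it assumes the new task resembles the past ones.

```python
import statistics


def estimate_range(point_estimate_days, past_ratios):
    """Scale a point estimate by the 10th, 50th, and 90th percentiles of past actual/estimate ratios."""
    deciles = statistics.quantiles(past_ratios, n=10)  # 9 cut points: 10%, 20%, ..., 90%
    low_ratio, high_ratio = deciles[0], deciles[-1]
    median_ratio = statistics.median(past_ratios)
    return (
        point_estimate_days * low_ratio,
        point_estimate_days * median_ratio,
        point_estimate_days * high_ratio,
    )


# Hypothetical history: actual duration divided by the original estimate, one entry per past task.
past_ratios = [0.8, 0.9, 1.0, 1.0, 1.1, 1.2, 1.3, 1.5, 1.8, 2.0, 2.5, 4.0]

low, likely, high = estimate_range(3, past_ratios)
print(f"most likely about {likely:.1f} days; 80% range {low:.1f} to {high:.1f} days")
```

The right skew shows up on its own: the upper end of the range sits much further from the median than the lower end, which is the tail a planner needs to hear about.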
Capacity / load
"We can handle 1,000 RPS" — at what percentile of request cost? With what variance in payload size? A mean-based capacity plan falls over the first time real traffic's variance spikes, even if the mean is within budget.
The discipline: whenever a single number leaves your mouth, append the spread. "X, with a tail out to Y in the Z% case." If you don't know the spread, that's the finding — say so.
5. Decision-making under uncertainty: act on expected outcomes
You will never have certainty when the decision matters, so optimize expected outcome, not guaranteed outcome — and weight by cost, not just probability.
Two decisions, same probability, opposite calls:
| Decision | P(bad) | Cost if bad | Cost if you wait | Call |
|---|---|---|---|---|
| Merge a small refactor | 5% | minor revert | blocks 3 people | Merge |
| Run irreversible data migration | 5% | unrecoverable data loss | one day delay | Wait / make reversible |
Same 5%, but the asymmetry of consequences flips the decision. Seniors think in terms of:
- Reversibility: Bezos's "one-way vs two-way doors." Spend your analysis and caution on irreversible decisions; move fast on reversible ones.
- Blast radius — bound the cost of being wrong (canary 1% of traffic) so that a wrong bet is cheap to discover and undo.
- Expected value of information — sometimes the best action is to reduce uncertainty cheaply first (run the migration on a staging copy, add a metric) before committing.
This connects directly to two neighboring topics: risk and failure probabilities, and base rates and expected value.
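The two rows of the decision table can be turned into a rough expected-cost comparison. The numeric costs below are hypothetical stand-ins (think engineer-hours) for the qualitative entries in the table, and the `Decision` class is an illustrative sketch rather than a recommended framework.

```python
from dataclasses import dataclass


@dataclass
class Decision:
    name: str
    p_bad: float
    cost_if_bad: float      # hypothetical cost units, e.g. engineer-hours
    cost_of_waiting: float  # same units

    def expected_cost_of_acting(self) -> float:
        return self.p_bad * self.cost_if_bad

    def call(self) -> str:
        """Act now if the expected cost of being wrong is below the cost of waiting."""
        return "act" if self.expected_cost_of_acting() < self.cost_of_waiting else "wait"


decisions = [
    Decision("merge small refactor", p_bad=0.05, cost_if_bad=2, cost_of_waiting=6),
    Decision("irreversible data migration", p_bad=0.05, cost_if_bad=10_000, cost_of_waiting=8),
]

for d in decisions:
    print(f"{d.name}: expected cost {d.expected_cost_of_acting():g} vs waiting {d.cost_of_waiting:g} -> {d.call()}")
```

The same 5% produces opposite calls because the cost term dominates. For truly unrecoverable outcomes a finite cost number understates the problem, so the better move is often to change the decision (make it reversible) rather than to refine the arithmetic.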
6. Communicating uncertainty without losing trust
Honesty about uncertainty increases credibility when done well and destroys it when done badly. The difference is structure.
| Don't | Do |
|---|---|
| "It might work, might not, who knows." | "70% confident this fixes it; the risk is the retry logic. If it doesn't, we'll know within an hour from this metric." |
| Bury the range; report only the happy path. | Lead with the most-likely outcome, then state the tail and its trigger. |
| Hide a changed estimate. | "New info moved my estimate from 3 days to a week — here's what changed." Updating is a strength, not a flip-flop. |
| Give false precision: "94.3% sure." | Use honest granularity: "roughly 9 in 10." |
Principle: Quantify the confidence, name the dominant risk, and state what evidence would change your mind. People can plan around "70% with this risk." They cannot plan around "should be fine."
7. Nate Silver's distinction: signal vs noise
Nate Silver's The Signal and the Noise makes a point seniors feel daily: more data does not automatically mean more knowledge. It often means more noise, more spurious correlations, and more chances to fool yourself. A flood of dashboards can lower the signal-to-noise ratio. The Bayesian frame is the antidote: start from a real prior, and only let diagnostic evidence (a likelihood ratio well above or well below 1) move you. Evidence with LR ≈ 1 is noise dressed as insight; ignore it no matter how much of it you have.
8. Senior checklist
- I distinguish risk / uncertainty / ignorance and apply the matching tool (compute / hedge / detect-and-recover).
- I log predictions with probabilities and score my calibration (Brier, hit-rate buckets).
- I update sequentially and let disconfirming evidence move me down.
- I never ship a bare point estimate where a distribution lives; I report percentiles and ranges-with-confidence.
- I choose actions by expected outcome weighted by the cost of being wrong, favoring reversibility.
- I communicate uncertainty as "most-likely + tail + trigger," and I treat updating my estimate as a strength.
| Continue | Topic |
|---|---|
| Org-scale: probabilistic SLOs, error budgets, risk to leadership | professional.md |
| Interview practice | interview.md |
| Hands-on posterior/calibration exercises | tasks.md |
| Section overview | Probabilistic Thinking |