Performance Roadmap

"Make it work, make it right, make it fast — in that order, but only if 'fast' is on the requirements list."

This roadmap is about measuring, profiling, and optimising the runtime cost of code — latency, throughput, memory, cache behaviour, contention — and protecting hot paths against regression over the lifetime of a system.

Looking for language-internals substrate (memory model, scheduler, GC algorithms)? See Language Internals.

Looking for system-design level capacity planning (back-of-envelope QPS, sharding, load balancing)? See System Design and the system-design-estimation skill.

Looking for production diagnostics (when slow becomes an incident)? See Diagnostics.

Why a Dedicated Roadmap

Most performance content is either introductory ("use a hash map for lookup") or deeply specialised ("here's how SSE intrinsics work for image filtering"). The senior-level middle ground is rarely consolidated: how to measure honestly, what flame graphs are actually telling you, and when not to optimise.
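As a rough illustration of what "measuring honestly" can look like in practice, the sketch below warms up before timing, takes repeated samples, and reports the median and spread instead of a single number. The `measure`, `summarise`, and `workload` names are hypothetical, and the warm-up and repeat counts are arbitrary assumptions rather than recommended values.

```python
import statistics
import time


def measure(fn, *, warmup=5, repeats=30):
    """Time fn() repeatedly and return the per-run durations in seconds."""
    for _ in range(warmup):
        fn()  # untimed runs: warm caches and lazy initialisation first
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    return samples


def summarise(samples):
    """Report median and spread rather than one run or a bare mean."""
    ordered = sorted(samples)
    return {
        "min": ordered[0],
        "median": statistics.median(ordered),
        "max": ordered[-1],
        "stdev": statistics.stdev(ordered) if len(ordered) > 1 else 0.0,
    }


def workload():
    # Hypothetical stand-in for the code under test.
    return sum(i * i for i in range(10_000))


samples = measure(workload)
summary = summarise(samples)
print(summary)
```

A wide gap between `min` and `max` is itself a finding: it usually means the environment is noisy and the comparison needs more samples or a quieter machine before any conclusion is drawn.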

Roadmap	Question it answers
Testing	Does it work?
Build Systems	Can I build it reproducibly?
Performance (this)	Is it fast enough — and how do I keep it that way?

Sections

Each topic ships the full five-tier set — junior / middle / senior / professional / interview.

#	Topic	Focus	Status
01	Profiling	CPU / memory / allocation profiles, flame graphs, pprof / perf / Instruments / async-profiler	✅
02	Benchmarking & Microbenchmarks	Microbenchmarks done right (defeating dead-code elimination, accounting for JIT warm-up and branch-prediction noise); macro-benchmarks; statistical stability	✅
03	Latency & Throughput	Little's Law, the p99 trap, tail-at-scale, coordinated omission, queueing, budgets	✅
04	CPU-Bound Optimization	Profile-first, the memory hierarchy, branch prediction, SIMD, data layout, PGO	✅
05	Memory & Allocation Optimization	Allocation rate vs residency, escape analysis, GC pressure, allocators, GOMEMLIMIT/OOMKills	✅
06	Concurrency & Contention	Amdahl & USL, lock contention, false sharing, cache coherence, scheduler effects, scaling curves	✅
07	Performance Budgets & Regression Testing	Budgets as SLOs, benchstat/Mann-Whitney, change-point detection, CI gates, trend dashboards	✅

The 01-profiling section is further split into four sub-topics — CPU, Memory, Allocation, and Flame Graphs — each with the full five-tier set.
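Several of the topics above rest on short formulas that are easy to try out. The sketch below works through three of them: Little's Law (section 03), Amdahl's Law (section 06), and the fan-out effect behind tail-at-scale (section 03). The function names and input numbers are illustrative assumptions, and the fan-out calculation assumes backend latencies are independent, which real systems often violate.

```python
def littles_law_concurrency(arrival_rate_per_s, mean_latency_s):
    """Little's Law: mean requests in flight L = lambda * W."""
    return arrival_rate_per_s * mean_latency_s


def amdahl_speedup(parallel_fraction, workers):
    """Amdahl's Law: speedup = 1 / ((1 - p) + p / n)."""
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / workers)


def chance_of_slow_fanout(backends, slow_probability=0.01):
    """Probability that at least one of N independent backend calls is slow.

    slow_probability=0.01 means "slower than that backend's own p99".
    """
    return 1.0 - (1.0 - slow_probability) ** backends


# Assumed inputs: 2000 requests/s at 50 ms mean latency,
# 95% parallelisable work on 16 workers, and a fan-out to 100 backends.
in_flight = littles_law_concurrency(2000, 0.050)
speedup = amdahl_speedup(0.95, 16)
slow = chance_of_slow_fanout(100)

print(f"in flight: {in_flight:.0f}")
print(f"speedup:   {speedup:.2f}x")
print(f"slow:      {slow:.0%}")
```

With these assumed inputs the results come out at roughly 100 requests in flight, a speedup of about 9.1x on 16 workers, and about a 63% chance that a request fanning out to 100 backends waits on at least one p99-slow call.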

Languages

Examples in Go (pprof, benchstat, runtime/trace), Java (JFR, async-profiler, JMH, GC logs), Python (cProfile, py-spy, tracemalloc, pytest-benchmark), Rust (cargo bench, criterion, perf, flamegraph), and C/C++ (perf, Intel VTune, valgrind callgrind).
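To give a flavour of the tooling, here is a minimal Python sketch that uses only the standard-library `cProfile`, `pstats`, and `tracemalloc` modules from the list above. The `build_index` workload is a hypothetical placeholder; the other languages' tools follow the same pattern of "record, then inspect the top entries".

```python
import cProfile
import io
import pstats
import tracemalloc


def build_index(n):
    # Hypothetical workload: allocate a dict of short strings.
    return {i: str(i) * 4 for i in range(n)}


# CPU profile: record one call, then list the top entries by cumulative time.
profiler = cProfile.Profile()
profiler.enable()
build_index(50_000)
profiler.disable()

report = io.StringIO()
pstats.Stats(profiler, stream=report).sort_stats("cumulative").print_stats(5)
print(report.getvalue())

# Allocation tracking: how much memory is traced during and after the call?
tracemalloc.start()
index = build_index(50_000)
current_bytes, peak_bytes = tracemalloc.get_traced_memory()
tracemalloc.stop()
print(f"current={current_bytes} B, peak={peak_bytes} B")
```

The two measurements are taken in separate runs on purpose: profilers add overhead of their own, so stacking them tends to distort both results.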

Status

✅ Content-complete. All 7 sections — including the 4 profiling sub-topics — are written across the full five-tier set (junior / middle / senior / professional / interview): 50 topic files in total.

References

Systems Performance, by Brendan Gregg (the canonical text; the USE method)
Java Performance: The Definitive Guide, by Scott Oaks
Designing Data-Intensive Applications, by Martin Kleppmann (the section on describing performance: response times and percentiles)
Methodology talks, by Aleksey Shipilëv (JMH and "the magic of -XX:+PrintCompilation")
The Tail at Scale, by Jeffrey Dean and Luiz André Barroso (the classic paper on tail latency, p99 and beyond)

Project Context

Part of the Senior Project — a personal effort to consolidate the essential knowledge of software engineering in one place.