Performance Roadmap

"Make it work, make it right, make it fast — in that order, but only if 'fast' is on the requirements list."

This roadmap is about measuring, profiling, and optimising the runtime cost of code — latency, throughput, memory, cache behaviour, contention — and protecting hot paths against regression over the lifetime of a system.

Looking for language-internals substrate (memory model, scheduler, GC algorithms)? See Language Internals.

Looking for system-design level capacity planning (back-of-envelope QPS, sharding, load balancing)? See System Design and the system-design-estimation skill.

Looking for production diagnostics (when slow becomes an incident)? See Diagnostics.


Why a Dedicated Roadmap

Most performance content is either intro-level ("use a hash map for lookup") or deep specialisation ("here's how SSE intrinsics work for image filtering"). The senior middle ground — how to measure honestly, what flame graphs are actually telling you, when not to optimise — is rarely consolidated.

| Roadmap | Question it answers |
|---|---|
| Testing | Does it work? |
| Build Systems | Can I build it reproducibly? |
| Performance (this) | Is it fast enough, and how do I keep it that way? |

Sections

Each topic ships the full five-tier set — junior / middle / senior / professional / interview.

| # | Topic | Focus | Status |
|---|---|---|---|
| 01 | Profiling | CPU / memory / allocation profiles, flame graphs, pprof / perf / Instruments / async-profiler | Complete |
| 02 | Benchmarking & Microbenchmarks | Micro-benchmarks done right (avoiding DCE, JIT warm-up, branch-prediction noise); macro-benchmarks; statistical stability | Complete |
| 03 | Latency & Throughput | Little's Law, the p99 trap, tail-at-scale, coordinated omission, queueing, budgets | Complete |
| 04 | CPU-Bound Optimization | Profile-first, the memory hierarchy, branch prediction, SIMD, data layout, PGO | Complete |
| 05 | Memory & Allocation Optimization | Allocation rate vs residency, escape analysis, GC pressure, allocators, GOMEMLIMIT/OOMKills | Complete |
| 06 | Concurrency & Contention | Amdahl & USL, lock contention, false sharing, cache coherence, scheduler effects, scaling curves | Complete |
| 07 | Performance Budgets & Regression Testing | Budgets as SLOs, benchstat/Mann-Whitney, change-point detection, CI gates, trend dashboards | Complete |
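
Two of the laws named in the table, Little's Law (Latency & Throughput) and Amdahl's Law (Concurrency & Contention), reduce to one-line formulas. The sketch below is only an illustration of those formulas; the function names and the sample numbers are assumptions made for this example and are not taken from the topic files.

```python
def littles_law_in_flight(throughput_per_s, mean_latency_s):
    """Little's Law: L = lambda * W.

    Mean number of requests in the system equals the arrival rate
    multiplied by the mean time each request spends in the system.
    """
    return throughput_per_s * mean_latency_s


def amdahl_speedup(parallel_fraction, workers):
    """Amdahl's Law: S(n) = 1 / ((1 - p) + p / n).

    p is the fraction of the work that can run in parallel and
    n is the number of workers; the serial part (1 - p) caps the speedup.
    """
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / workers)


if __name__ == "__main__":
    # Hypothetical service: 2,000 requests/s at 50 ms mean latency.
    print("in flight:", littles_law_in_flight(2000, 0.050))

    # Hypothetical workload: 95% parallelisable, run on 8 workers.
    print("speedup on 8 workers:", round(amdahl_speedup(0.95, 8), 2))
```

With these sample numbers, Little's Law gives about 100 requests in flight, and Amdahl's Law gives a speedup just under 6 on eight workers. Because 5% of the work stays serial, no worker count can push the speedup past 20.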

The 01-profiling section is further split into four sub-topics — CPU, Memory, Allocation, and Flame Graphs — each with the full five-tier set.


Languages

Examples are given in Go (pprof, benchstat, runtime/trace), Java (JFR, async-profiler, JMH, GC logs), Python (cProfile, py-spy, tracemalloc, pytest-benchmark), Rust (cargo bench, criterion, perf, flamegraph), and C/C++ (perf, Intel VTune, Valgrind's Callgrind).
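
Of the tools listed above, cProfile and tracemalloc ship with the Python standard library, so they make a convenient first taste of CPU and allocation profiling. The following is a minimal sketch, not the roadmap's own example: `build_squares` is a hypothetical workload, and the helper names are assumptions chosen for illustration.

```python
import cProfile
import io
import pstats
import tracemalloc


def build_squares(n):
    """Hypothetical workload: allocate a list of n squared integers."""
    return [i * i for i in range(n)]


def cpu_profile_report(n):
    """Run the workload under cProfile and return a text report.

    The report lists up to five entries sorted by cumulative time.
    """
    profiler = cProfile.Profile()
    profiler.enable()
    build_squares(n)
    profiler.disable()

    out = io.StringIO()
    stats = pstats.Stats(profiler, stream=out)
    stats.sort_stats("cumulative").print_stats(5)
    return out.getvalue()


def peak_traced_bytes(n):
    """Run the workload under tracemalloc and return peak traced bytes."""
    tracemalloc.start()
    try:
        build_squares(n)
        _current, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return peak


if __name__ == "__main__":
    print(cpu_profile_report(200_000))
    print("peak traced memory:", peak_traced_bytes(200_000), "bytes")
```

Both tools add overhead of their own, so the absolute numbers are best read as relative indicators of where time and memory go rather than as production measurements.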


Status

Content-complete. All 7 sections — including the 4 profiling sub-topics — are written across the full five-tier set (junior / middle / senior / professional / interview): 50 topic files in total.


References

  • Systems Performance — Brendan Gregg (the canonical text; the USE method)
  • Java Performance: The Definitive Guide — Scott Oaks
  • Designing Data-Intensive Applications — Martin Kleppmann (response-time chapters)
  • Methodology talks — Aleksey Shipilëv (JMH and "the magic of -XX:+PrintCompilation")
  • The Tail at Scale — Dean & Barroso (the classic on p99 latency)

Project Context

Part of the Senior Project — a personal effort to consolidate the essential knowledge of software engineering in one place.