Distributed Systems

The body of theory and practice for building systems out of independent failing components. Once a system spans more than one machine, the rules change — partial failure becomes the norm, time becomes ambiguous, and consistency stops being free.

Content in this section is still being filled in. The outline below shows the planned coverage; pages marked coming soon are self-linking placeholders until written.


Planned sections

  • CAP & PACELC Theorems — the foundational impossibility results; what partition tolerance actually costs and why "AP vs CP" is a simplification.
  • Consensus — Paxos, Raft, Multi-Paxos, ZAB; leader election, log replication, and the cost of agreement.
  • Replication — single-leader, multi-leader, leaderless; synchronous vs asynchronous; replication lag and read-your-writes.
  • Sharding — partitioning strategies (range, hash, geographic); rebalancing; cross-shard queries and joins.
  • Distributed Transactions — 2PC and its descendants, Sagas, TCC, and why most "transactions" across services aren't really transactions.
  • Event-Driven — event sourcing, CQRS, log-as-source-of-truth; choreography vs orchestration.
  • Vector Clocks & CRDTs — capturing causality without a global clock; convergent and commutative data types for offline-tolerant systems.
  • Service Mesh — Istio, Linkerd; the data-plane vs control-plane split; mTLS, retries, circuit breaking moved out of application code.
  • Resilience Patterns — circuit breakers, bulkheads, timeouts, retries with jitter, hedged requests, backpressure.
  • Distributed Tracing — OpenTelemetry, span propagation, sampling strategies; making cross-service latency observable.
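As a taste of the resilience-patterns entry above, here is a minimal sketch of retries with exponential backoff and full jitter. The function name and parameters are illustrative, not from any particular library; the point is that randomising the sleep interval decorrelates retries from many clients and avoids a synchronised thundering herd.

```python
import random
import time

def retry_with_jitter(op, max_attempts=5, base=0.1, cap=2.0):
    """Call op(), retrying on failure with exponential backoff and full jitter."""
    for attempt in range(max_attempts):
        try:
            return op()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            # full jitter: sleep a uniform random time in [0, min(cap, base * 2^attempt)]
            time.sleep(random.uniform(0, min(cap, base * 2 ** attempt)))

# Illustrative usage: an operation that fails twice, then succeeds.
calls = {"n": 0}
def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("transient failure")
    return "ok"

result = retry_with_jitter(flaky)  # succeeds on the third attempt
```

Capping the backoff window (`cap`) matters in practice: without it, a long outage pushes clients into multi-minute sleeps, and with fixed (unjittered) backoff all clients retry in lockstep.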

Why this matters

Most failure modes in modern systems are distributed-systems failures wearing application clothing: a timeout that looked like a bug, a cache that lost coherence, a service that retried into a thundering herd, a "consistent" read that wasn't. The patterns in this roadmap give those failures names and standard cures.


Related sections

  • System Design — distributed-systems primitives assembled into recognisable architectures.
  • Architecture Anti-Patterns — Distributed Monolith, The Knot, Database-as-IPC — the failure modes distributed-systems discipline prevents.
  • Backend → API Design — boundary contracts between services.
  • Backend → Redis — the most common building block for caching, queueing, and lightweight coordination.

References

  • Designing Data-Intensive Applications — Martin Kleppmann (2017) — the modern canonical reference; replication, consensus, stream processing in one book.
  • Database Internals — Alex Petrov (2019) — storage engines and distributed-storage internals.
  • Distributed Systems — Maarten van Steen & Andrew Tanenbaum (4th ed., 2023) — academic foundation.
  • Designing Distributed Systems — Brendan Burns (2018) — patterns for containerised distributed systems.
  • The Tail at Scale — Dean & Barroso (2013) — why latency variance dominates large fan-out systems.

Project Context

Part of the Senior Project — a personal effort to consolidate the essential knowledge of software engineering in one place.