Distributed Systems
The body of theory and practice for building reliable systems out of components that fail independently. Once a system spans more than one machine, the rules change: partial failure becomes the norm, time becomes ambiguous, and consistency stops being free.
Content under this section is being filled in. The structure below shows the planned coverage; pages marked coming soon link to themselves until written.
Planned sections
- CAP & PACELC Theorems — the CAP impossibility result and its PACELC extension to the latency-versus-consistency trade-off when there is no partition; what partition tolerance actually costs and why "AP vs CP" is a simplification.
- Consensus — Paxos, Raft, Multi-Paxos, ZAB; leader election, log replication, and the cost of agreement.
- Replication — single-leader, multi-leader, leaderless; synchronous vs asynchronous; replication lag and read-your-writes.
- Sharding — partitioning strategies (range, hash, geographic); rebalancing; cross-shard queries and joins.
- Distributed Transactions — 2PC and its descendants, Sagas, TCC, and why most "transactions" across services aren't really transactions.
- Event-Driven — event sourcing, CQRS, log-as-source-of-truth; choreography vs orchestration.
- Vector Clocks & CRDTs — capturing causality without a global clock; state-based (convergent) and operation-based (commutative) replicated data types for offline-tolerant systems.
- Service Mesh — Istio, Linkerd; the data-plane vs control-plane split; mTLS, retries, circuit breaking moved out of application code.
- Resilience Patterns — circuit breakers, bulkheads, timeouts, retries with jitter, hedged requests, backpressure.
- Distributed Tracing — OpenTelemetry, span propagation, sampling strategies; making cross-service latency observable.
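As a small taste of the material planned above, the idea of "capturing causality without a global clock" can be sketched in a few lines. The following is a minimal illustration, not a production implementation: the function names (`tick`, `merge`, `happened_before`, `concurrent`) and the dictionary representation are assumptions made for this example.

```python
from typing import Dict

# A vector clock maps a node id to the number of events seen from that node.
VectorClock = Dict[str, int]


def tick(clock: VectorClock, node: str) -> VectorClock:
    """Return a copy of `clock` with `node`'s counter advanced by one local event."""
    updated = dict(clock)
    updated[node] = updated.get(node, 0) + 1
    return updated


def merge(local: VectorClock, remote: VectorClock, node: str) -> VectorClock:
    """On message receipt: take the element-wise maximum, then count the receive event."""
    nodes = set(local) | set(remote)
    merged = {n: max(local.get(n, 0), remote.get(n, 0)) for n in nodes}
    return tick(merged, node)


def happened_before(a: VectorClock, b: VectorClock) -> bool:
    """True if `a` causally precedes `b`: a <= b everywhere and a < b somewhere."""
    nodes = set(a) | set(b)
    no_greater = all(a.get(n, 0) <= b.get(n, 0) for n in nodes)
    strictly_less = any(a.get(n, 0) < b.get(n, 0) for n in nodes)
    return no_greater and strictly_less


def concurrent(a: VectorClock, b: VectorClock) -> bool:
    """True if neither clock precedes the other, i.e. each is ahead somewhere."""
    nodes = set(a) | set(b)
    a_ahead = any(a.get(n, 0) > b.get(n, 0) for n in nodes)
    b_ahead = any(b.get(n, 0) > a.get(n, 0) for n in nodes)
    return a_ahead and b_ahead
```

Two nodes that each record a local event without communicating produce concurrent clocks; once one node merges the other's clock, the earlier event is ordered before the merge.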
Why this matters
Most failure modes in modern systems are distributed-systems failures wearing application clothing: a timeout that looked like a bug, a cache that lost coherence, a service that retried into a thundering herd, a "consistent" read that wasn't. The patterns in this roadmap give those failures names and standard cures.
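To make one of those cures concrete, here is a hedged sketch of retries with exponential backoff and full jitter, a commonly cited remedy for synchronized retry storms. The helper names (`backoff_delay`, `retry_with_jitter`) and the default values for `base`, `cap`, and `max_attempts` are illustrative assumptions, not recommendations from this roadmap.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def backoff_delay(
    attempt: int,
    base: float = 0.1,
    cap: float = 5.0,
    rng: Callable[[], float] = random.random,
) -> float:
    """Full jitter: pick a random delay in [0, min(cap, base * 2**attempt))."""
    ceiling = min(cap, base * (2 ** attempt))
    return rng() * ceiling


def retry_with_jitter(
    op: Callable[[], T],
    max_attempts: int = 5,
    base: float = 0.1,
    cap: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
    rng: Callable[[], float] = random.random,
) -> T:
    """Call `op` until it succeeds or `max_attempts` is exhausted.

    Randomizing each delay spreads retries from many clients over time,
    so they do not all hit a recovering service at the same instant.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return op()
        except Exception:
            # Out of attempts: surface the last error to the caller.
            if attempt == max_attempts - 1:
                raise
            sleep(backoff_delay(attempt, base, cap, rng))
    raise AssertionError("unreachable")
```

The `sleep` and `rng` parameters exist only so the behaviour can be tested deterministically. A real caller would usually also restrict which exceptions are retried and make sure the operation is idempotent.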
Related
- System Design — distributed-systems primitives assembled into recognisable architectures.
- Architecture Anti-Patterns — the failure modes that distributed-systems discipline prevents: Distributed Monolith, The Knot, Database-as-IPC.
- Backend → API Design — boundary contracts between services.
- Backend → Redis — a common building block for caching, queueing, and lightweight coordination.
References
- Designing Data-Intensive Applications — Martin Kleppmann (2017) — the modern canonical reference; replication, consensus, stream processing in one book.
- Database Internals — Alex Petrov (2019) — storage engines and distributed-storage internals.
- Distributed Systems — Maarten van Steen & Andrew Tanenbaum (4th ed., 2023) — academic foundation.
- Designing Distributed Systems — Brendan Burns (2018) — patterns for containerised distributed systems.
- The Tail at Scale — Dean & Barroso (2013) — why latency variance dominates large fan-out systems.
Project Context
Part of the Senior Project — a personal effort to consolidate the essential knowledge of software engineering in one place.