Skip to content

Databases Roadmap

The vendor-agnostic concept layer for databases — the one canonical home for the ideas that apply regardless of engine: the relational model, transactions, consistency, indexing, replication, sharding, and the SQL-vs-NoSQL decision.

How this fits with the rest of the repo

Databases live at several altitudes. To avoid duplication, each altitude has one home and links to the others:

Layer Home What it covers
Concepts (this roadmap) Backend/databases/ Vendor-agnostic theory: ACID, normalization, CAP, indexing, replication, sharding
Specific engines postgresql-dba · mongodb · redis · elasticsearch How a concrete engine implements & operates these concepts
Engine internals computer-science/20-database-internals How storage works inside: LSM vs B-tree, MVCC implementation, WAL, query planner
At system-design altitude system-design/12-databases Choosing a datastore for a system; sharding/replication as scaling & availability strategies; CAP tradeoffs in distributed design
On-disk data structures DSA · trees · advanced B-tree, B+ tree, LSM-tree, Merkle tree as data structures

Rule of thumb: "What is ACID / how does an index work" → here. "Should I shard this service / which DB for this system" → system-design.


Topics

Data modeling

  1. Relational Model — relations, keys, constraints, relational algebra
  2. Normalization & Denormalization — 1NF→BCNF, when to denormalize
  3. ER Modeling — entities, relationships, cardinality, ER → schema
  4. SQL: DDL / DML / DQL / DCL — the four sublanguages
  5. Views — virtual & materialized views
  6. Stored Procedures & Triggers — server-side logic, pros/cons

Transactions & consistency

  1. Transactions & ACID — atomicity, consistency, isolation, durability
  2. Isolation Levels — read phenomena, the four SQL levels, snapshot isolation
  3. Locking & Concurrency Control — pessimistic vs optimistic, deadlocks
  4. MVCC — multi-version concurrency (concept; impl in engine-internals)
  5. BASE & Eventual Consistency — the NoSQL relaxation
  6. CAP Theorem — consistency / availability / partition tolerance
  7. PACELC — the latency-vs-consistency extension of CAP

Performance

  1. Indexing — B-tree/hash/GIN/GiST concepts, composite indexes, when not to index
  2. Query Optimization — EXPLAIN, join strategies, statistics

Scaling & distribution

  1. Replication — leader-follower, multi-leader, sync vs async, lag
  2. Partitioning & Sharding — horizontal/vertical, shard keys, rebalancing
  3. Database Federation — splitting by function
  4. SQL vs NoSQL — the decision and its tradeoffs
  5. NoSQL Data Models — key-value, document, wide-column, graph

Operations & analytics

  1. OLTP vs OLAP & Warehousing — transactional vs analytical, star schema
  2. Connection Pooling — sizing, PgBouncer/HikariCP
  3. Caching at the DB Layer — buffer pool, query cache, read-through
  4. Backup & Recovery — WAL archiving, PITR, RTO/RPO

roadmap.sh coverage (Computer Science → Databases)

Every subtopic of the roadmap.sh CS Databases node maps here:

roadmap.sh Here
SQL vs NoSQL 19
Normalization / Denormalization 02
Entity-Relationship Model 03
DDL, DML, DQL, DCL 04
Locking 09
ACID Model 07
BASE 11
CAP Theorem 12
PACELC 13
Indexes 14
Views 05
Transactions 07
Stored Procedures 06
Database Federation 18
Replication 16
Sharding 17

Status: skeleton — folders created, content to be written.