Database Engineering
"The database is never just a detail. It is the center of gravity of every production system."
What You'll Master
This guide covers the full spectrum of database engineering, from choosing the right data model to tuning queries under production load. You will understand why PostgreSQL uses MVCC, where sharding decisions break down, and what Instagram actually changed to serve 1 billion photos a day.
Companion reading for System Design Ch09 — Databases: SQL and Ch10 — Databases: NoSQL, which cover the selection layer. This guide goes deeper: internals, operations, and architecture.
The Learning Path
Work through the parts sequentially — each builds on concepts from the previous.
Part 1 — Foundations
Storage engines, data models, indexes, and transactions. The non-negotiable bedrock every database engineer must own.
| # | Chapter | Difficulty | ~Time |
|---|---|---|---|
| 01 | The Database Landscape | Beginner | 30 min |
| 02 | Data Modeling for Scale | Intermediate | 35 min |
| 03 | Indexing Strategies | Intermediate | 40 min |
| 04 | Transactions & Concurrency Control | Advanced | 45 min |
Part 2 — Engine Deep Dives
Under the hood of PostgreSQL, MySQL, the major NoSQL families, and specialized engines for time-series, search, and vectors.
| # | Chapter | Difficulty | ~Time |
|---|---|---|---|
| 05 | PostgreSQL in Production | Intermediate | 40 min |
| 06 | MySQL & Distributed SQL | Intermediate | 35 min |
| 07 | NoSQL at Scale | Intermediate | 45 min |
| 08 | Specialized Databases | Intermediate | 50 min |
Part 3 — Scaling & Operations
How to keep databases alive, fast, and consistent when they grow beyond a single machine.
| # | Chapter | Difficulty | ~Time |
|---|---|---|---|
| 09 | Replication & High Availability | Advanced | 50 min |
| 10 | Sharding & Partitioning | Advanced | 45 min |
| 11 | Query Optimization & Performance | Advanced | 50 min |
| 12 | Backup, Migration & Disaster Recovery | Intermediate | 55 min |
Part 4 — Real-World Design
Full case studies from companies that solved hard database problems at scale. Specific numbers, specific decisions, specific regrets.
| # | Chapter | Difficulty | ~Time |
|---|---|---|---|
| 13 | Instagram: PostgreSQL at Scale | Advanced | 30 min |
| 14 | Discord: Data Layer Evolution | Advanced | 30 min |
| 15 | Uber: Geospatial Database Design | Advanced | 30 min |
| 16 | Database Selection Framework | Intermediate | 35 min |
Prerequisites
Before starting Part 1, you should be comfortable with:
- [ ] Basic SQL (SELECT, JOIN, GROUP BY)
- [ ] What primary keys and foreign keys are
- [ ] General understanding of how a web application uses a database
You do not need to be a DBA. This guide teaches database internals from first principles.
If you want the big-picture view of database selection first, read System Design Ch09 and Ch10 before starting here.
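As a rough self-check for the checklist above, the sketch below touches each prerequisite once: a primary key, a foreign key, and a SELECT with JOIN and GROUP BY. It is not taken from the guide. The `users` and `photos` tables are hypothetical, and an in-memory SQLite database (via Python's standard `sqlite3` module) stands in for a local PostgreSQL instance so that it runs without any setup.

```python
import sqlite3

# In-memory SQLite is a stand-in for a real server; table and column
# names are hypothetical and chosen only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when asked

conn.executescript("""
    CREATE TABLE users (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL
    );
    CREATE TABLE photos (
        id      INTEGER PRIMARY KEY,
        user_id INTEGER NOT NULL REFERENCES users(id),
        caption TEXT
    );
    INSERT INTO users (id, name) VALUES (1, 'ada'), (2, 'lin');
    INSERT INTO photos (id, user_id, caption) VALUES
        (10, 1, 'sunrise'), (11, 1, 'harbor'), (12, 2, 'market');
""")

# SELECT + JOIN + GROUP BY: count photos per user.
rows = conn.execute("""
    SELECT u.name, COUNT(p.id) AS photo_count
    FROM users AS u
    JOIN photos AS p ON p.user_id = u.id
    GROUP BY u.name
    ORDER BY u.name
""").fetchall()
print(rows)

# The foreign key rejects a photo that points at a user who does not exist.
try:
    conn.execute("INSERT INTO photos (id, user_id) VALUES (13, 99)")
    fk_rejected = False
except sqlite3.IntegrityError:
    fk_rejected = True
print(fk_rejected)
```

If you can predict what both `print` calls show before running the script, the SQL prerequisites are covered. The same statements, minus the `PRAGMA` line, should also run on PostgreSQL.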
How to Use This Guide
- Read Part 1 completely — indexes and transactions are referenced in every subsequent chapter
- Run the SQL examples — theory without execution is incomplete; use a local PostgreSQL instance
- Draw the diagrams — B-tree traversal, MVCC tuple chains, and replication topologies are learned by sketching
- Attempt the practice questions — each chapter has three difficulty tiers
- Return to case studies — Part 4 is most valuable after completing Parts 1–3
Total estimated time: ~11 hours across 16 chapters
$ cat handbook --sections
database/
  part-1-foundations/   4 chapters (~2.5 hrs)
  part-2-engines/       4 chapters (~2.8 hrs)
  part-3-operations/    4 chapters (~3.3 hrs)
  part-4-real-world/    4 chapters (~2.1 hrs)