
Chapter 1: The Database Landscape

"Choosing a database without understanding the landscape is like choosing a route without a map."


Overview

Every database in production today was designed around a specific set of trade-offs made decades ago. PostgreSQL was designed in 1986 to maximize correctness. Cassandra was designed in 2007 to maximize write throughput across data centers. Redis was designed in 2009 to maximize speed by keeping everything in memory. Understanding why each system made its original trade-offs is the prerequisite for choosing the right one for your workload.

This chapter maps the landscape: nine categories of databases, the two dominant storage engine architectures, how the Write-Ahead Log makes durability possible, and why PostgreSQL is quietly becoming the platform of choice for workloads it was never originally designed for.

A Note on PostgreSQL Focus

This guide uses PostgreSQL as the primary reference implementation for relational database concepts. This is an intentional design choice — PostgreSQL's open architecture makes internals observable and educational. However, every concept (WAL, MVCC, indexing, replication) has equivalents in MySQL, SQL Server, and Oracle. If your shop runs MySQL or DynamoDB, the principles apply — translate the specifics. Part 2 covers MySQL, NoSQL, and specialized engines in dedicated chapters.


Database Categories

Nine categories cover almost every workload you will encounter in production.

| Category | Model | Query Language | Consistency | Best For |
|---|---|---|---|---|
| RDBMS | Tables + rows | SQL | Strong (ACID) | General-purpose structured data |
| Document | JSON documents | MongoDB Query / SQL | Eventual or strong | Hierarchical data, variable schema |
| Key-Value | Hash map | GET/SET/DEL | Eventual | Caching, sessions, counters |
| Wide-Column | Column families | CQL / HQL | Tunable | Time-series, write-heavy, sparse data |
| Graph | Nodes + edges | Cypher / Gremlin | Strong | Relationships, recommendations, fraud |
| Time-Series | Tagged measurements | InfluxQL / SQL | Strong | Metrics, IoT, events |
| Search | Inverted index | Lucene / DSL | Near real-time | Full-text search, faceted navigation |
| NewSQL | Tables + rows | SQL | Distributed ACID | Global transactions at scale |
| Vector | Embeddings | ANN queries | Eventual | Semantic search, ML retrieval |

RDBMS: Still the Default

Relational databases remain the default for most applications because they handle the widest variety of queries correctly. An RDBMS stores data in tables (relations), enforces referential integrity with foreign keys, and executes any ad-hoc SQL query the application sends — including JOINs across multiple tables that were not anticipated when the schema was designed.

The cost is scalability. An RDBMS scales vertically with ease, but horizontal scaling (sharding, distributed transactions) requires significant engineering.

Write QPS Varies Enormously by Workload

A common rule of thumb is "10–50K write QPS on a single primary," but this is highly workload-dependent. Simple INSERTs with no secondary indexes on NVMe SSDs can exceed 100K/sec; complex transactions with multiple indexes and foreign keys may plateau at 5K/sec. The ceiling depends on row size, index count, transaction complexity, fsync behavior, and hardware. Use this range as a conversation starter, not a hard limit — always benchmark your specific workload before deciding to shard.

NoSQL: Designed for Specific Access Patterns

NoSQL databases traded SQL expressiveness for specific performance guarantees. Key-value stores sacrifice queryability for O(1) reads. Wide-column stores sacrifice joins for write throughput. Document stores sacrifice normalization for schema flexibility. These are not inferior choices — they are better choices for the specific workloads they were designed for.
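The key-value trade-off is easy to see in miniature. Here is a sketch with plain Python dictionaries standing in for a store like Redis (the data and function names are made up for illustration):

```python
# Toy illustration (not a real database): a key-value store gives O(1)
# access by key, but any other access pattern degrades to a full scan.
users = {
    "user:1": {"name": "Ada", "city": "London"},
    "user:2": {"name": "Grace", "city": "New York"},
    "user:3": {"name": "Alan", "city": "London"},
}

# O(1): the access pattern the store was designed for.
def get(key):
    return users.get(key)

# O(n): the "ad-hoc query" an RDBMS would answer with an index.
def find_by_city(city):
    return [v["name"] for v in users.values() if v["city"] == city]

print(get("user:2")["name"])           # Grace
print(sorted(find_by_city("London")))  # ['Ada', 'Alan']
```

An RDBMS would answer the second query with a secondary index; a pure key-value store either scans every value or forces you to maintain that index yourself in application code.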

NewSQL: Distributed SQL

NewSQL databases attempt to combine the SQL interface and ACID guarantees of traditional RDBMSes with the horizontal scalability of NoSQL systems. CockroachDB and Google Spanner achieve this through distributed consensus (Raft/Paxos) and multi-version concurrency control. The cost: higher latency on single-row operations compared to a local PostgreSQL query, and substantially higher operational complexity.

Use NewSQL when you need: SQL, ACID transactions, and geographic distribution. Do not use it as a drop-in PostgreSQL replacement — the latency characteristics are different enough to surprise you.


Evolution Timeline

The NoSQL revolution was not a rejection of relational theory — it was a pragmatic response to scaling limits that SQL databases had not yet solved. Today, PostgreSQL has absorbed many lessons from that era (native JSON columns, LISTEN/NOTIFY for streaming, logical replication for CDC) while maintaining its correctness guarantees.


Storage Engine Fundamentals

Every database sits atop one of two fundamental storage architectures. Understanding these two architectures shapes everything else: read/write performance characteristics, crash recovery behavior, compaction overhead, and index design.

Page-Oriented Storage (B-Tree Based)

Page-oriented engines divide disk into fixed-size pages (typically 8KB in PostgreSQL, 16KB in MySQL). Data is organized as a B-tree where each node is one page. An update modifies the page in-place after writing to the WAL.

Strengths: Excellent read performance, predictable O(log n) lookups, in-place updates minimize write amplification for single-row updates.

Weaknesses: Page splits under write load cause fragmentation; random writes are slower than sequential; vacuum/bloat management required (PostgreSQL-specific).
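A minimal sketch of the lookup path, with Python lists standing in for pages (illustrative only; a real engine uses multi-level trees and fixed-size on-disk pages):

```python
import bisect

# Sketch of page-oriented lookup (not PostgreSQL's actual layout): leaf
# "pages" hold sorted (key, value) pairs, and an internal node holds the
# first key of each leaf. A lookup is two binary searches, giving
# O(log n) behavior with a small constant.
leaves = [
    [(1, "a"), (3, "b"), (5, "c")],    # page 0
    [(7, "d"), (9, "e"), (11, "f")],   # page 1
    [(13, "g"), (15, "h")],            # page 2
]
separators = [leaf[0][0] for leaf in leaves]   # [1, 7, 13]

def lookup(key):
    if key < separators[0]:
        return None
    page = leaves[bisect.bisect_right(separators, key) - 1]  # choose a page
    keys = [k for k, _ in page]
    i = bisect.bisect_left(keys, key)                        # search within it
    return page[i][1] if i < len(keys) and keys[i] == key else None
```

An in-place update rewrites one leaf page; an insert into a full page triggers the page split mentioned above.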

Log-Structured Merge-Tree (LSM)

LSM engines never update data in-place. Every write appends to an in-memory buffer (memtable), which is periodically flushed to immutable sorted files on disk (SSTables). A background compaction process merges SSTables to remove deleted entries and maintain read performance.

Strengths: Sequential writes make LSM extremely fast for write-heavy workloads; used by Cassandra, RocksDB, LevelDB; excellent for time-series and append-only data.

Weaknesses: Read amplification (must check multiple levels), compaction I/O overhead (write amplification factor of 10–30×), higher space amplification during compaction.
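The write and read paths can be sketched in a toy model (assumptions: single-threaded, strings only, no WAL or bloom filters, and a tiny memtable limit chosen to force flushes):

```python
# Toy LSM sketch: writes go to an in-memory memtable; flushes produce
# immutable sorted runs ("SSTables"); reads check the memtable first and
# then runs newest-first; compaction merges runs, keeping only the
# newest version of each key.
MEMTABLE_LIMIT = 2

memtable, sstables = {}, []   # sstables: list of sorted [(key, value)] runs

def put(key, value):
    memtable[key] = value
    if len(memtable) >= MEMTABLE_LIMIT:          # flush: one sequential write
        sstables.append(sorted(memtable.items()))
        memtable.clear()

def get(key):                                    # read amplification:
    if key in memtable:                          # may touch every level
        return memtable[key]
    for run in reversed(sstables):               # newest run wins
        for k, v in run:
            if k == key:
                return v
    return None

def compact():                                   # merge runs, drop old versions
    merged = {}
    for run in sstables:                         # apply oldest-to-newest
        merged.update(dict(run))
    sstables[:] = [sorted(merged.items())]

for k, v in [("a", "1"), ("b", "2"), ("a", "9"), ("c", "3")]:
    put(k, v)
print(get("a"), get("b"))   # 9 2  (newest version of "a" shadows the old one)
```

Note where the costs named above appear: `get` may scan several runs (read amplification) and `compact` rewrites data that was already written (write amplification).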

Choosing Between Them

| Workload | Recommended Engine | Reason |
|---|---|---|
| OLTP (many small reads and writes) | B-Tree (PostgreSQL, MySQL) | Low read latency, in-place updates |
| Write-heavy, high-cardinality | LSM (Cassandra, RocksDB) | Sequential writes, no locking |
| Analytics (large scans) | Column-oriented (Parquet, ClickHouse) | Compression, vectorized execution |
| Mixed OLTP + analytics | HTAP (TiDB, SingleStore) | Separate row + column stores |

WAL Mechanics

The Write-Ahead Log is the mechanism that makes database durability possible without sacrificing performance. The rule is simple but profound: before any data page is modified, the change must first be written to the WAL and fsync'd to disk.

The Write Path

In PostgreSQL, a committed write follows five steps:

  1. The transaction modifies the target page in shared buffers (memory).
  2. A WAL record describing the change is appended to the WAL buffer.
  3. At COMMIT, the WAL buffer is flushed and fsync'd to the on-disk WAL.
  4. The commit is acknowledged to the client.
  5. Later, a checkpoint writes the dirty data page to the data file.

Why WAL Makes Durability Possible

The key insight: sequential writes to the WAL are orders of magnitude faster than random writes to data files. PostgreSQL commits a transaction the moment the WAL entry is fsynced — the actual data file update can happen later (during checkpoint) because the WAL contains enough information to replay the change.

If the server crashes between step 3 and step 5, PostgreSQL replays the WAL on restart to bring data files back to a consistent state. This is crash recovery.
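The guarantee can be modeled in a few lines of Python (a toy sketch; dictionaries stand in for data files and the fsync'd log, and "crash" just means the data files were never updated):

```python
# Toy model of the WAL guarantee (illustrative, not PostgreSQL's record
# format): every change is appended to the log before any data page is
# touched, so replaying the log reproduces all committed changes.
wal = []          # stands in for the fsync'd on-disk log
data_file = {}    # stands in for the data files on disk

def commit(key, value):
    wal.append((key, value))   # log + fsync: the commit is now durable
    # the data_file update happens later, during checkpoint

def checkpoint():
    for key, value in wal:
        data_file[key] = value

def crash_recovery():
    recovered = {}             # replay the log from the beginning
    for key, value in wal:
        recovered[key] = value
    return recovered

commit("balance:alice", 90)
commit("balance:bob", 110)
assert data_file == {}         # crash here: data files never updated...
assert crash_recovery() == {"balance:alice": 90, "balance:bob": 110}
```

Sequential appends to `wal` are the cheap operation; the expensive random writes to `data_file` are deferred and batched, which is the entire performance argument for write-ahead logging.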

WAL and Replication

The WAL is also the mechanism for streaming replication. PostgreSQL primary servers stream WAL records to standbys, which replay them to stay in sync. This is the same WAL used for crash recovery — replication is just crash recovery applied to a different machine in real-time.

WAL Level Controls What Gets Logged

  • wal_level = minimal — crash recovery only.
  • wal_level = replica — crash recovery + streaming replication.
  • wal_level = logical — full change data capture (CDC), required for logical replication and tools like Debezium.


The PostgreSQL-as-Platform Trend

PostgreSQL's extension system has matured to the point where a single PostgreSQL cluster can replace multiple specialized databases in your stack. This is not theory — companies such as Instagram and Notion run PostgreSQL as the core of their data platform.

| Extension | Adds to PostgreSQL | Replaces |
|---|---|---|
| pgvector | Vector similarity search, ANN indexes | Pinecone, Weaviate (at moderate scale†) |
| TimescaleDB | Time-series hypertables, compression, continuous aggregates | InfluxDB (at moderate ingest rates†) |
| PostGIS | Geospatial types, spatial indexes, geographic queries | Specialized geo DBs |
| pg_partman | Automated table partitioning management | Manual partition scripts |
| Citus | Horizontal sharding, distributed queries | Standalone sharding middleware |
| pg_cron | Database-side job scheduling | External cron + application logic |
| pglogical | Logical replication, selective table sync | External CDC tools |

When to Use PostgreSQL-as-Platform

The strategy works well when:

  • Your team already operates PostgreSQL (one less system to learn)
  • Workload volumes are moderate (vector search under 10M embeddings, time-series under 50GB/day)
  • You want ACID consistency across multiple data types in one transaction

Performance caveat: pgvector at >5M vectors shows measurably worse recall and p99 latency compared to purpose-built HNSW implementations (Qdrant, Weaviate) based on ANN Benchmarks data. TimescaleDB compression ratios are worse than InfluxDB IOx at high cardinality per TimescaleDB vs InfluxDB benchmarks. The thresholds above (10M embeddings, 50GB/day) are rough rules of thumb — always benchmark your specific workload and query patterns. See Ch08 — Specialized Databases for detailed benchmarks.

The strategy breaks down when:

  • You need sub-millisecond vector ANN at >10M embeddings (dedicated vector DB wins)
  • You need petabyte-scale time-series ingest at >1M points/sec (dedicated TSDB with LSM wins)
  • You need deep Elasticsearch-style relevance scoring and faceting (dedicated search wins)

Do Not Over-Consolidate

PostgreSQL-as-Platform reduces operational overhead but can create resource contention. A heavy vector search workload on the same instance as your OLTP transactions degrades both. Use PgBouncer for connection pooling, read replicas for analytics, and per-workload connection limits.


Database Categories Deep Dive

Wide-Column Stores (Cassandra, HBase)

Wide-column stores are often misunderstood. The name refers to the data model, not column-oriented storage (a different technique, used in analytics). A Cassandra table has:

  • A partition key that determines which node stores the row
  • A clustering key that determines the sort order within a partition
  • Columns that can vary per row (hence "wide")

The critical design rule: all queries must filter by partition key. Cassandra does not have JOINs, secondary indexes are expensive, and ALLOW FILTERING is a full table scan. You design your Cassandra schema entirely around the queries you need to run — one table per query pattern.
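The routing rule can be sketched in Python (assuming a simple hash-to-node placement; real Cassandra uses Murmur3 tokens and virtual nodes):

```python
import hashlib

# Why every query must name the partition key (sketch): the partition key
# alone decides which node owns a row.
NODES = ["node-a", "node-b", "node-c", "node-d"]

def node_for(partition_key):
    digest = hashlib.md5(partition_key.encode()).hexdigest()
    return NODES[int(digest, 16) % len(NODES)]

# Every row sharing a partition key lands on the same node, so a query
# that filters by partition key touches exactly one node; a query that
# does not would have to fan out to all of them (the ALLOW FILTERING trap).
assert node_for("sensor-42") == node_for("sensor-42")
```

This is why schema design runs backwards from the query: the partition key must appear in every query, so the queries determine the key, and the key determines the table.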

Graph Databases (Neo4j, TigerGraph)

Graph databases store nodes and relationships as first-class citizens. A MATCH (u:User)-[:FOLLOWS]->(v:User) WHERE u.id = 1 RETURN v query in Cypher traverses the graph by pointer chasing — no JOINs, no index lookups. This is dramatically faster than the equivalent SQL self-join for deep traversals (6+ hops).

Graph databases are the right choice when your data is inherently a network: social graphs, knowledge graphs, fraud detection (linked accounts), recommendation engines (collaborative filtering).
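A breadth-first sketch over an adjacency map shows the pointer-chasing model (illustrative Python; a real graph store keeps adjacency lists on disk and indexes node IDs):

```python
from collections import deque

# Each hop along a FOLLOWS edge is one dictionary lookup, with no join
# or index probe per hop. Deep traversals cost O(edges visited), not
# O(rows joined).
follows = {
    1: [2, 3],
    2: [4],
    3: [4, 5],
    4: [6],
    5: [],
    6: [],
}

def within_hops(start, max_hops):
    """Return all users reachable from start in at most max_hops edges."""
    seen, frontier = {start}, deque([(start, 0)])
    while frontier:
        user, depth = frontier.popleft()
        if depth == max_hops:
            continue
        for v in follows[user]:
            if v not in seen:
                seen.add(v)
                frontier.append((v, depth + 1))
    seen.discard(start)
    return seen

print(within_hops(1, 2))   # {2, 3, 4, 5}
```

The equivalent SQL needs one self-join per hop (or a recursive CTE), and each join level re-probes an index, which is why cost diverges sharply at 6+ hops.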

Vector Databases (pgvector, Pinecone)

Vector databases store dense floating-point embeddings (typically 768–1536 dimensions for language models) and answer approximate nearest neighbor (ANN) queries: "find the 10 documents whose embedding is closest to this query embedding." This powers semantic search, retrieval-augmented generation (RAG), and image similarity.

```sql
-- pgvector example: find 10 most similar product descriptions
SELECT id, description, embedding <-> '[0.1, 0.3, ...]'::vector AS distance
FROM products
ORDER BY distance
LIMIT 10;
```

The <-> operator computes L2 (Euclidean) distance, <#> computes negative inner product (equivalent to ranking by inner product, useful for normalized vectors), and <=> computes cosine distance.
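Computed by hand, the three metrics look like this (plain Python for illustration, not pgvector internals):

```python
import math

# The three pgvector operators, spelled out:
def l2(a, b):                    # <->  Euclidean (L2) distance
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def neg_inner_product(a, b):     # <#>  negative inner product
    return -sum(x * y for x, y in zip(a, b))

def cosine_distance(a, b):       # <=>  1 - cosine similarity
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1 - dot / norms

a, b = [1.0, 0.0], [1.0, 1.0]
print(l2(a, b))                         # 1.0
print(neg_inner_product(a, b))          # -1.0
print(round(cosine_distance(a, b), 4))  # 0.2929
```

The operators return "smaller is closer" values (hence the negated inner product), which is what lets ORDER BY ... LIMIT k use an ANN index.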


Case Study: Shopify's Database Strategy

Shopify processes $75+ billion in annual merchant sales and serves millions of concurrent storefronts. Their database architecture illustrates the trade-offs of the current landscape.

Core transactional layer: MySQL (InnoDB) sharded via Vitess. Shopify chose MySQL in 2004 when PostgreSQL's replication story was immature. They stayed on MySQL because Vitess — originally built by YouTube — provides transparent horizontal sharding over MySQL with minimal application changes. As of 2023, Shopify runs hundreds of MySQL shards behind Vitess.

Analytics layer: ClickHouse for real-time merchant analytics dashboards. ClickHouse is a column-oriented OLAP database that can scan billions of rows per second on modest hardware. Shopify routes all analytical queries — "show me my sales by day over the last 90 days" — to ClickHouse rather than MySQL to avoid analytics traffic competing with transactional traffic.

Caching layer: Redis for sessions, rate limiting, inventory locks, and hot product data. Redis handles ~1M operations/second per Shopify shard at sub-millisecond latency.

Search: Elasticsearch for merchant product catalog search. Full-text search with fuzzy matching, faceting, and relevance ranking.

Key Decisions:

  • Vitess enables horizontal MySQL scaling without rewriting the application
  • ClickHouse handles analytic queries at a cost 100× lower than running analytics on MySQL
  • Redis absorbs the hot read path — most product page views never reach MySQL
  • Separate search engine avoids full-text indexing overhead on the transactional database

The lesson: At Shopify's scale, polyglot persistence is not optional — it is the only architecture that works. Each system does exactly one thing at maximum efficiency.


Related Chapters

| Chapter | Relevance |
|---|---|
| Ch02 — Data Modeling for Scale | How to design schemas for your chosen database category |
| Ch03 — Indexing Strategies | Index types for B-tree and specialized storage engines |
| Ch05 — PostgreSQL in Production | Deep dive into PostgreSQL internals and tuning |
| System Design Ch09 | SQL database selection from a system design perspective |
| System Design Ch10 | NoSQL selection and CAP theorem application |

Common Mistakes

| Mistake | Why It Happens | Impact | Fix |
|---|---|---|---|
| Choosing DB by popularity instead of access pattern | "Everyone uses X" mentality | Mismatched performance characteristics from day one | Map your access patterns first, then evaluate databases |
| Treating NoSQL as "better SQL" | Marketing and hype | Surprised by lack of JOINs, transactions, secondary indexes | Understand what each NoSQL type trades away before adopting |
| Using graph DB for simple hierarchical data | "Graphs are good for relationships" | Unnecessary operational complexity for simple parent-child trees | Use PostgreSQL recursive CTEs or ltree for hierarchy depth < 10 |
| Ignoring WAL level until replication is needed | Set-it-and-forget-it config | Switching from minimal to replica requires a restart | Set wal_level = replica from the start; the overhead is minimal |
| Consolidating onto PostgreSQL-as-Platform without isolation | Simplicity is appealing | OLTP and analytical/vector workloads compete for CPU and I/O | Use dedicated read replicas or separate instances for workload isolation |

Practice Questions

Beginner

  1. Storage Engine Choice: A startup is building a task management application where each task has a flexible set of custom fields (different for each organization). They expect 100K reads/day and 10K writes/day. Should they use PostgreSQL with JSONB columns, MongoDB, or a key-value store? Justify your answer with reference to the access patterns.

    Model Answer: Consider: do they need ACID transactions across tasks? Do they need to query by custom field values? What is the write/read ratio? JSONB in PostgreSQL handles schema flexibility while keeping SQL expressiveness — a good default unless you have a specific reason for a document store.
  2. WAL Recovery: A PostgreSQL server crashes immediately after a transaction commits. When the server restarts, how does PostgreSQL ensure the committed data is not lost? What role does the WAL play, and why is it sufficient even if the data page was not yet written to the main data file?

    Model Answer: The WAL is fsync'd before the commit returns. On restart, PostgreSQL replays all WAL records since the last checkpoint. A committed transaction's WAL record contains all the information needed to reconstruct the data page change.

Intermediate

  1. Database Selection: An e-commerce company wants to add three features: (a) "customers who bought this also bought" recommendations, (b) product inventory tracking with sub-millisecond reads, (c) full-text product search with faceting. For each feature, choose a database category and justify. Can any two features share a database?

    Model Answer: (a) Graph DB or collaborative filtering in Redis. (b) Redis for the hot path, backed by PostgreSQL for durability. (c) Elasticsearch or PostgreSQL with tsvector for moderate scale. Redis + PostgreSQL could share infrastructure for (a) and (b).
  2. LSM vs B-Tree: A metrics collection service ingests 500K events per second from IoT sensors, retains data for 30 days, and answers range queries like "average temperature for sensor X over the last 24 hours." Would you use a B-tree based PostgreSQL or an LSM-based time-series database? What are the specific technical reasons?

    Model Answer: LSM wins: 500K writes/sec would cause severe B-tree page split contention and WAL pressure. LSM's sequential append model handles write-heavy workloads much better. TimescaleDB (PostgreSQL extension) uses chunk-based partitioning to give PostgreSQL LSM-like write characteristics for time-series.

Advanced

  1. PostgreSQL-as-Platform Analysis: A team is considering consolidating their current stack (PostgreSQL + Redis + Elasticsearch + Pinecone) onto a single PostgreSQL instance with pgvector, full-text search, and pg_cron. They process 50K transactions/day, 5M vector searches/day, and 2M full-text searches/day. Analyze the trade-offs of consolidation vs. separation. What would you recommend and why?

    Model Answer: 5M vector searches/day is ~58/sec, and 2M full-text searches/day is ~23/sec. At this scale, consolidation is feasible with proper connection pooling and read replicas. The risk is resource contention between OLTP (CPU + I/O for row fetches) and vector search (CPU-intensive ANN computation). A read replica dedicated to search + vector workloads solves this. Pinecone is only necessary above ~50M embeddings at sub-100ms p99 — below that, pgvector with an HNSW index is comparable.
