
Chapter 16: Database Selection Framework

"There is no best database. There is only the right database for your access pattern, consistency requirement, and operational capacity."

Mind Map

Prerequisites

This chapter synthesizes concepts from the entire guide. You should be familiar with the full database taxonomy from Ch01 and consistency models from Ch04. The real-world case studies (Ch13–Ch15) illustrate the consequences of selection decisions made under this framework.

The Decision Framework

Database selection is an architectural decision that is expensive to reverse. A startup that chooses MongoDB in year 1 and needs to migrate to PostgreSQL in year 3 faces weeks of engineering time, significant risk, and possible downtime — exactly the pattern Discord experienced in Chapter 14.

The framework below applies five sequential filters. Stop when you have a clear answer.

Decision Factors: Priority Order

These five factors determine database selection, in priority order. Do not skip a factor that has a clear answer just to reach a preferred conclusion.

Factor 1: Consistency Requirement

| Consistency Need | Implication | Example |
|---|---|---|
| Strong (ACID) | All writes must be immediately visible; no lost updates | Bank balance, inventory count, order state |
| Read-your-writes | A user sees their own writes immediately | Social media profile update |
| Eventual | Writes propagate eventually; brief inconsistency acceptable | Like counts, view counts, follower counts |
| Causal | Operations with causal dependency are ordered correctly | Comment threads, version history |

Consistency is Non-Negotiable for Financial Data

Never choose eventual consistency for data where inconsistency has monetary consequences. A user's account balance must use ACID transactions. "We'll reconcile later" has ended companies. Start with the consistency requirement and let it constrain your options — do not start with a preferred database and rationalize the consistency story.
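As a minimal sketch of why the ACID requirement matters for balances, the transfer below leans on SQLite's transaction semantics: either both updates apply, or neither does. The `accounts` table and the amounts are illustrative, not from the chapter.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id TEXT PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO accounts VALUES ('alice', 100), ('bob', 0)")
conn.commit()

def transfer(conn, src: str, dst: str, amount: int) -> bool:
    """Move funds atomically; an insufficient balance rolls everything back."""
    try:
        with conn:  # opens a transaction; commits on success, rolls back on error
            cur = conn.execute(
                "UPDATE accounts SET balance = balance - ? "
                "WHERE id = ? AND balance >= ?",
                (amount, src, amount),
            )
            if cur.rowcount == 0:
                raise ValueError("insufficient funds")
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE id = ?",
                (amount, dst),
            )
        return True
    except ValueError:
        return False

transfer(conn, "alice", "bob", 60)  # succeeds: alice 40, bob 60
transfer(conn, "alice", "bob", 60)  # fails: only 40 left; nothing changes
```

The same guard expressed with eventual consistency could briefly show money in both accounts — exactly the failure mode the box above warns against.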

Factor 2: Read/Write Ratio

| Pattern | Typical Ratio | Recommended Strategy |
|---|---|---|
| Read-heavy (content) | 100:1 reads | Primary + many read replicas; aggressive caching |
| Write-heavy (logging) | 1:100 reads | LSM store (Cassandra, ScyllaDB); avoid B-tree |
| Balanced (OLTP) | 10:1 reads | PostgreSQL or MySQL; PgBouncer pooling |
| Mixed OLTP + OLAP | Complex | Separate OLTP and OLAP stores; CDC pipeline |

Instagram's feed (100:1 read ratio) drove their aggressive denormalization and replica strategy, covered in Chapter 13. Discord's messages (heavily write-biased, append-only) drove the move to an LSM-based store in Chapter 14.

Factor 3: Data Model

| Data Model | Best Fit | Do Not Use When |
|---|---|---|
| Relational | Structured, normalized, JOIN-heavy | Data has no natural foreign-key relationships |
| Document | Nested, variable schema, entity-centric | You need to query across entity boundaries |
| Wide-column | Append-heavy, high-cardinality partition key | You need random updates or row-level locking |
| Key-Value | O(1) lookup by known key, no range queries | You need to query by value or range scan |
| Time-Series | Monotonically increasing timestamp, same schema | Schema varies across measurements |
| Graph | Relationship traversal 3+ hops deep | Most queries access by node ID, not edge traversal |
| Vector | Semantic similarity, embedding-based retrieval | You need exact match, not approximate |

Factor 4: Scale Projection

Scale projections should be 12-month forecasts, not 10-year fantasies. Over-engineering for scale that never arrives wastes engineering effort and adds operational complexity that slows development.

The most common mistake: choosing Tier 3 or Tier 4 infrastructure for a Tier 1 workload. A startup processing 1,000 writes/day does not need Cassandra. A single PostgreSQL instance on a $100/month server handles this trivially and offers better ACID guarantees.

Factor 5: Operational Burden

| Factor | Managed Cloud | Self-Hosted |
|---|---|---|
| Setup time | Minutes | Days–weeks |
| Backup management | Automatic | Manual or scripted |
| Failover | Automatic | Manual + runbooks |
| Version upgrades | Managed | Manual, disruptive |
| Monthly cost | Higher (2–4×) | Lower |
| Control | Limited | Full |
| Team expertise needed | Low | High |

For teams with fewer than 5 engineers, managed databases (RDS, Cloud SQL, PlanetScale, Neon) almost always win on total engineering cost, even though the monthly bill is higher than self-hosted.

Cost Comparison

Monthly cost estimates for a production workload: 1TB of data, 1,000 writes/sec, 10,000 reads/sec, single region.

| Option | Monthly Cost | Hidden Costs | TCO Notes |
|---|---|---|---|
| RDS PostgreSQL (db.r6g.2xlarge) | ~$700/mo | Read replica +$700, backups +$100, data transfer | Predictable; managed failover included |
| Aurora PostgreSQL (Serverless v2) | ~$400–1,200/mo | Scales with ACU usage; I/O costs add up | Good for variable workloads; Aurora Global ~2× |
| DynamoDB (on-demand) | ~$500–1,500/mo | Read/write capacity units add up with variable load; GSI doubles cost | Ops-free; cost unpredictable at scale |
| CockroachDB Cloud (Dedicated) | ~$1,500/mo | Egress, cross-region replication | Distributed SQL; 3-node minimum |
| Cassandra self-hosted (3×c5.2xlarge) | ~$600/mo (EC2 only) | Ops time: 8–16h/month; expertise; monitoring | Cheapest at scale; high operational tax |
| PlanetScale (Business) | ~$400/mo | Branches free; connection limits on lower tiers | MySQL + Vitess; zero-downtime migrations |
| Neon (Pro) | ~$200/mo | Compute scales with usage; only PostgreSQL-compatible | Serverless PG; great for dev/test, growing for prod |

The Hidden Cost of Self-Hosted

Self-hosted Cassandra at 3 nodes costs ~$600/month in EC2. But a senior engineer costs ~$200/hour. If Cassandra consumes 10 hours/month of SRE time (upgrades, tuning, incidents), the real cost is $600 + $2,000 = $2,600/month — more than CockroachDB Cloud's managed offering. Calculate Total Cost of Ownership, not just infrastructure cost.
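The arithmetic above generalizes into a one-line TCO helper. The $200/hour rate and the ops-hour figures are the chapter's own assumptions:

```python
def monthly_tco(infra_cost: float, ops_hours: float, hourly_rate: float = 200.0) -> float:
    """Total cost of ownership: infrastructure plus engineering time."""
    return infra_cost + ops_hours * hourly_rate

# The chapter's example: self-hosted Cassandra vs. a managed offering.
# (Ops hours for the managed option are an assumed near-zero figure.)
self_hosted = monthly_tco(600, ops_hours=10)   # 600 + 2000 = 2600
managed = monthly_tco(1500, ops_hours=0.5)     # 1500 + 100 = 1600
```

Running the comparison makes the conclusion concrete: the managed option's higher sticker price is cheaper once engineering time is priced in.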

Polyglot Persistence Patterns

When to Use Multiple Databases

Polyglot persistence — using different databases for different workloads — is the right architecture when:

  1. A workload has fundamentally different access patterns (OLTP vs. OLAP)
  2. A workload requires capabilities absent from the primary database (full-text search, vector search)
  3. A workload has different consistency requirements (ACID for transactions, eventual for caches)

Common Production Combinations

| Use Case | Primary DB | Add-On DB | Reason |
|---|---|---|---|
| E-commerce | PostgreSQL | Redis + Elasticsearch | Cache product pages; search catalog |
| Messaging | Cassandra or ScyllaDB | Redis + PostgreSQL | LSM for messages; PG for user accounts |
| Ride-sharing | MySQL/Docstore | Redis + ClickHouse | Location cache; trip analytics |
| SaaS | PostgreSQL | Redis + pgvector | Cache sessions; AI semantic search |
| Gaming | Redis | PostgreSQL | Leaderboards in Redis; player data in PG |

Anti-Pattern: Different DB Per Microservice

One dangerous pattern is assigning a different database technology to each microservice, justified by "each service should own its data store."

This pattern creates operational overhead proportional to the number of database technologies. Each requires separate expertise, monitoring, backup procedures, and incident runbooks. The result: a 6-person engineering team operating 6 different database systems, spending more time on database operations than on product features.

Rule of thumb: Until you have dedicated DBAs or SREs, use at most 2–3 database technologies: one OLTP store (PostgreSQL), one cache (Redis), and optionally one search engine (Elasticsearch or Typesense) or analytics store (ClickHouse).

Migration Strategy Matrix

When to Migrate

| Symptom | Likely Cause | Migration Target |
|---|---|---|
| P99 read latency > 500ms on simple queries | Index not in RAM; connection saturation | Add read replicas; add PgBouncer |
| Write throughput ceiling (~10K writes/sec) | Single primary bottleneck | Sharding or LSM-based store |
| Schema changes take hours | Large table ALTER TABLE | PlanetScale; pt-online-schema-change |
| ACID not required; writes dominate | Wrong data model | Cassandra / ScyllaDB |
| Need full-text search on large text | PostgreSQL tsvector not sufficient | Elasticsearch sidecar |
| Joins across millions of rows are slow | Denormalization needed or wrong DB | Denormalize schema; consider document store |

Never Migrate for the Wrong Reason

Common wrong reasons: "NoSQL is more scalable" (not always true; PostgreSQL scales to 100M+ rows), "MongoDB is more flexible" (JSONB in PostgreSQL offers equal flexibility), "everyone is using Cassandra" (without matching write volume). Always diagnose the specific bottleneck before choosing a migration target.

Migration Patterns

Three patterns cover most database migrations:

| Pattern | Best For | Risk | Downtime |
|---|---|---|---|
| CDC | Live databases; any size | Replication lag during cut-over | Near-zero |
| Dual-Write | Small-to-medium; when CDC unavailable | Write amplification; consistency window | Near-zero |
| Blue-Green | Large databases; full-fidelity copy | Long preparation time | Near-zero |
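The dual-write pattern can be sketched as a thin wrapper over two stores: every write goes to both, and reads prefer the new store, falling back to the old one until the backfill completes. The `get`/`put` interface below is illustrative, not a specific driver API.

```python
class DictStore:
    """In-memory stand-in for a real database client (hypothetical)."""

    def __init__(self):
        self.data = {}

    def put(self, key, value):
        self.data[key] = value

    def get(self, key):
        return self.data.get(key)


class DualWriteStore:
    """Dual-write migration sketch: writes fan out to both stores,
    reads try the new store first and fall back to the old one for
    rows the backfill has not copied yet."""

    def __init__(self, old, new):
        self.old, self.new = old, new

    def put(self, key, value):
        self.old.put(key, value)  # old store stays authoritative until cut-over
        self.new.put(key, value)  # new store receives every write too

    def get(self, key):
        value = self.new.get(key)
        if value is None:         # not yet backfilled: fall back to the old store
            value = self.old.get(key)
        return value
```

Cut-over then amounts to dropping the fallback and the write to the old store once validation passes.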

From → To Decision Table

| From | To | Why | Pattern |
|---|---|---|---|
| MongoDB | PostgreSQL | ACID needed; joins needed | CDC (Debezium) |
| MongoDB | Cassandra | Write volume exceeds MongoDB capacity | Dual-write + backfill |
| PostgreSQL (single) | PostgreSQL (sharded) | Write ceiling hit | Dual-write; shard by domain key |
| MySQL | PostgreSQL | JSON, JSONB, arrays; PostGIS needed | CDC (AWS DMS or pglogical) |
| Cassandra 3.x | ScyllaDB | JVM GC pauses; reduce node count | CQL-compatible; dual-write |
| Self-hosted | Cloud managed | Team too small for ops burden | Backup restore into managed service |
| Separate OLTP + OLAP | PostgreSQL + ClickHouse | Analytics degrading OLTP performance | CDC from OLTP → ClickHouse |

Future Trends

These four trends have concrete engineering implications for database selection decisions you are making today. Each trend includes specific adoption thresholds so you can assess whether it applies to your workload.

Trend 1: PostgreSQL Extensions Replace 80% of Specialized DBs

By 2025, PostgreSQL with extensions handles use cases that previously required separate databases:

| Extension | Replaces | Limitation |
|---|---|---|
| pgvector + HNSW index | Pinecone, Weaviate (< 50M vectors) | ANN recall degrades above ~5–10M vectors (as of pgvector 0.7, 2024); purpose-built indexes (Pinecone, Qdrant) maintain recall at 50M+ — see ANN Benchmarks |
| TimescaleDB | InfluxDB (< 50GB/day ingest) | InfluxDB 3.0 (Apache Arrow engine, 2024) benchmarks higher at sustained >1M points/sec ingest |
| PostGIS | Dedicated geo DBs | Uber-scale real-time location still needs H3 + custom |
| pg_search (ParadeDB) | Elasticsearch (< 100M documents) | Elasticsearch still wins for complex relevance scoring at scale (100M+ docs, as of 2024); ParadeDB is pre-1.0 |
| Citus | Custom sharding middleware | CockroachDB has better distributed transaction story |

Benchmarks Last Verified: Early 2025

The thresholds above (pgvector recall at >5M vectors, pg_search at >100M documents, TimescaleDB ingest rates) are approximate crossover points observed in benchmarks as of early 2025. These numbers shift with every release — pgvector 0.8 added parallel index builds, ParadeDB is rapidly improving. Always benchmark your specific workload before making a database selection based on published thresholds.

The implication: start with PostgreSQL + extensions. Only migrate to a specialized system when you hit a measurable limit.

Trend 2: Serverless Databases

Serverless databases scale to zero when idle and scale instantly under load. Relevant for development environments, variable-traffic applications, and startups:

| Product | Based On | Key Feature |
|---|---|---|
| Neon | PostgreSQL | Branching (copy-on-write database branches for dev/test) |
| PlanetScale | MySQL + Vitess | Schema branching; zero-downtime migrations |
| Aurora Serverless v2 | MySQL/PostgreSQL | 0.5 ACU minimum; per-second billing |
| Turso | libSQL (SQLite fork) | Edge-distributed; 10ms reads from 35+ edge locations |
| Cloudflare D1 | SQLite | Edge SQLite in Cloudflare Workers; zero ops |

Serverless is Not for All Workloads

Serverless databases that scale to zero have cold-start latency (100–500ms for the first query after idle). For production APIs with consistent traffic, a reserved-instance database is cheaper and faster. Serverless shines for: staging environments, low-traffic microservices, development databases, and variable-traffic multi-tenant SaaS.

Trend 3: AI-Native Vector Storage

Every major database is adding vector support in 2024–2025:

| Database | Vector Support | Index Type | Notes |
|---|---|---|---|
| PostgreSQL + pgvector | Native extension | HNSW, IVFFlat | Best for < 5–10M vectors with ACID (as of 0.7/0.8, 2024) |
| MongoDB Atlas | Atlas Vector Search | HNSW | Combines document + vector in one system |
| Cassandra | DataStax Astra | SAI (Storage-Attached Index) | Cassandra-scale vector search |
| Redis | Redis Vector Similarity | HNSW, FLAT | In-memory; low latency |
| DynamoDB | Not native | N/A | Requires separate vector layer |

The trend: vector search is becoming a primitive in every database, not a specialized system. For new applications, start with PostgreSQL + pgvector unless you project > 5–10 million embeddings with < 100ms p99 ANN latency.

Trend 4: Edge Databases

Applications running at the network edge (CDN PoPs, Cloudflare Workers, Fastly Compute) need databases that can respond locally.

Edge databases accept eventual consistency (reads from local replica may lag primary by 100–500ms) in exchange for sub-10ms read latency globally.

Case Study: Database Selection from Startup to Scale

This case study applies every framework principle to a single application evolving from MVP to 10M users.

Application: "EventFlow" — an event ticketing platform where users discover events, purchase tickets, and receive real-time venue updates.

MVP Phase (0–10K users)

Requirement: Manage events, tickets, users, and payments. Consistency is critical (ticket oversell = product failure). Team: 3 engineers.

Decision: PostgreSQL on RDS (db.t4g.medium, $50/month).

```sql
-- EventFlow MVP schema (single PostgreSQL database)
CREATE TABLE events (
    id           BIGSERIAL PRIMARY KEY,
    name         TEXT NOT NULL,
    venue_id     BIGINT REFERENCES venues(id),
    start_time   TIMESTAMPTZ NOT NULL,
    capacity     INT NOT NULL,
    tickets_sold INT NOT NULL DEFAULT 0
);

-- Prevent oversell with optimistic locking
UPDATE events
SET tickets_sold = tickets_sold + 1
WHERE id = ? AND tickets_sold < capacity
RETURNING id;
-- If 0 rows updated: sold out
```

No Redis, no Elasticsearch, no message queue. YAGNI.
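The oversell guard can be exercised end to end. The sketch below uses SQLite in place of PostgreSQL (checking `cursor.rowcount` instead of `RETURNING`), with an illustrative capacity of 2.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE events ("
    "id INTEGER PRIMARY KEY, capacity INTEGER, tickets_sold INTEGER DEFAULT 0)"
)
conn.execute("INSERT INTO events (id, capacity) VALUES (1, 2)")

def buy_ticket(conn, event_id: int) -> bool:
    # Same guard as the UPDATE above: succeed only while seats remain.
    cur = conn.execute(
        "UPDATE events SET tickets_sold = tickets_sold + 1 "
        "WHERE id = ? AND tickets_sold < capacity",
        (event_id,),
    )
    conn.commit()
    return cur.rowcount == 1  # 0 rows updated means sold out

results = [buy_ticket(conn, 1) for _ in range(3)]  # third attempt is sold out
```

Because the capacity check and the increment happen in one statement, concurrent buyers can never push `tickets_sold` past `capacity`.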

Growth Phase (10K–500K users)

Symptom: Event pages load slowly during high-traffic launches. Database CPU spikes to 90% on event release day.

Decision: Add Redis for hot event data and a read replica for search.

```python
# Cache hot event data with a 60-second TTL
def get_event(event_id: int) -> Event:
    cached = redis.get(f"event:{event_id}")
    if cached:
        return Event.from_json(cached)

    # Cache miss: read from PostgreSQL, then populate the cache
    event = db.query("SELECT * FROM events WHERE id = %s", [event_id])
    redis.setex(f"event:{event_id}", 60, event.to_json())
    return event
```

Also add Elasticsearch for event discovery (search by name, location, category). Sync via Debezium CDC from PostgreSQL.
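A sketch of that CDC consumer's core transform: Debezium wraps each change in an envelope carrying an `op` code and `before`/`after` row images, which a consumer translates into search-index operations. The simplified event shape and the `to_search_action` helper below are illustrative, not a real connector payload.

```python
def to_search_action(change: dict) -> dict:
    """Translate a (simplified) Debezium change event into a search-index
    action. Real events carry more metadata (source, ts_ms, schema)."""
    op = change["op"]             # "c"=create, "u"=update, "d"=delete
    if op in ("c", "u", "r"):     # "r" = snapshot read during initial sync
        doc = change["after"]
        return {"action": "index", "id": doc["id"], "doc": doc}
    if op == "d":
        return {"action": "delete", "id": change["before"]["id"]}
    raise ValueError(f"unknown op: {op}")

event = {"op": "c", "after": {"id": 7, "name": "Jazz Night", "city": "Austin"}}
action = to_search_action(event)  # an "index" action for document 7
```

Because the search index trails the CDC stream, event discovery is eventually consistent — acceptable here, unlike ticket counts.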

Scale Phase (500K–10M users)

Symptom: Ticket purchase writes during major events (10K tickets/second during Taylor Swift release) overwhelm the single PostgreSQL primary.

Decision: Shard tickets table by event_id. Events table remains on unsharded PostgreSQL (it is read-mostly and fits on one machine).
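Routing by event_id can use a stable hash so the mapping does not depend on Python's per-process `hash()` randomization. The shard count below is illustrative.

```python
import hashlib

NUM_SHARDS = 8  # illustrative shard count

def shard_for_event(event_id: int, num_shards: int = NUM_SHARDS) -> int:
    """Stable shard routing: hash the key with SHA-256 so every process
    maps the same event to the same shard."""
    digest = hashlib.sha256(str(event_id).encode()).digest()
    return int.from_bytes(digest[:8], "big") % num_shards

# All tickets for one event land on one shard, so the oversell guard
# still runs against a single primary for that event.
```

Keeping the hot path (one event's purchases) on a single shard preserves the single-statement oversell check from the MVP phase.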

Final architecture at 10M users:

| Component | Database | Purpose |
|---|---|---|
| User accounts, events | PostgreSQL (single) | ACID, normalized |
| Ticket purchases | PostgreSQL (sharded × 8 by event_id) | Write throughput |
| Session cache | Redis | Sub-ms session lookup |
| Event discovery | Elasticsearch | Full-text + geo search |
| Analytics | ClickHouse | Business intelligence |

Key insight: The architecture grew incrementally to match actual load. Starting with the Scale Phase architecture at MVP would have taken 3× longer to build and introduced operational complexity that slowed the team down.

Summary Checklist

Before finalizing any database selection decision:

  • [ ] What is the exact access pattern? (key lookup, range scan, full-text, geospatial, graph traversal)
  • [ ] What consistency level is required? (ACID, read-your-writes, eventual)
  • [ ] What is the 12-month write volume projection?
  • [ ] What is the read/write ratio?
  • [ ] Does the team have operational expertise for this database?
  • [ ] Is a managed service available that reduces operational burden?
  • [ ] Have you tried to solve the problem with PostgreSQL + extensions first?
  • [ ] If choosing a specialized database, what is the migration path back if requirements change?

Common Mistakes

| Mistake | Why It Happens | Impact | Fix |
|---|---|---|---|
| Choosing technology before defining access patterns | Engineers have database preferences; requirements feel vague early on | Schema, indexes, and sharding strategy all become wrong for the actual query shape | Write out the top 5 queries and their expected QPS before evaluating any database |
| Adding databases to the stack without consolidation analysis | Specialized DBs feel exciting; each team picks their own | Polyglot sprawl — operational burden grows faster than the benefit, cross-DB consistency impossible | Apply the consolidation checklist: can PostgreSQL + an extension solve this? Only add a DB if the answer is clearly no |
| Treating the framework as a one-time decision | Selection happens at project start and is never revisited | Database is kept past its useful life even after requirements shift | Re-evaluate the top decision factor every 6 months or after a major scale change |
| Underestimating operational burden of self-hosted distributed databases | Benchmarks and demos show only the happy path | Cassandra or CockroachDB self-hosted requires dedicated expertise; teams without it face silent data loss and unavailability | Use managed services unless the team has already operated the database in production |
Related Chapters

| Chapter | Relevance |
|---|---|
| Ch01 — Database Landscape | The full taxonomy of database categories this framework navigates |
| Ch04 — Transactions & Concurrency | Consistency models that drive the first decision factor |
| Ch09 — Replication & High Availability | Replication strategies for the growth phase |
| Ch10 — Sharding & Partitioning | Sharding options for the scale phase |
| Ch13 — Instagram: PostgreSQL at Scale | PostgreSQL sharding in production |
| Ch14 — Discord: Data Layer Evolution | Case study of a migration driven by wrong initial choice |
| Ch15 — Uber: Geospatial Design | Custom database engineering for geospatial requirements |

Quick Reference Card

A one-page summary of database selection criteria across all categories covered in this guide.

Database Categories at a Glance

| Category | Examples | Consistency | Max Write QPS (single node) | Query Flexibility | Operational Complexity | Best For |
|---|---|---|---|---|---|---|
| RDBMS | PostgreSQL, MySQL | Strong (ACID) | 10K–100K† | Any SQL query, JOINs | Medium | General-purpose, OLTP |
| Document | MongoDB, CouchDB | Configurable | 20K–50K | Flexible JSON queries | Low–Medium | Variable schema, rapid dev |
| Key-Value | Redis, DynamoDB | Eventual/Strong | 100K+ (Redis in-memory) | GET/SET by key only | Low | Caching, sessions, counters |
| Wide-Column | Cassandra, ScyllaDB | Tunable | 50K–500K | Partition key required | High | Write-heavy, time-series |
| Graph | Neo4j, TigerGraph | Strong | 5K–20K | Traversal queries | Medium | Social graphs, fraud, recommendations |
| Time-Series | ClickHouse, TimescaleDB | Strong/Eventual | 1M–4M (ClickHouse) | SQL + time functions | Medium–High | Metrics, IoT, analytics |
| Search | Elasticsearch, Meilisearch | Near real-time | 10K–50K (indexing) | Full-text, faceted | High (Elasticsearch) | Full-text search, logs |
| NewSQL | CockroachDB, TiDB, Spanner | Distributed ACID | 10K–50K per node | Full SQL | High | Global ACID at scale |
| Vector | pgvector, Pinecone, Qdrant | Eventual | Varies‡ | ANN similarity | Low (managed) | Semantic search, RAG, ML |
†Highly workload-dependent. See Ch01 for nuance. ‡Vector DB QPS depends on index type, dimensions, and recall target. pgvector HNSW at 768 dimensions: ~500–2,000 QPS per core (as of 0.7, 2024).

Decision Cheat Sheet

| Question | If Yes → | If No → |
|---|---|---|
| Need ACID transactions? | RDBMS or NewSQL | Consider NoSQL |
| Need JOINs across entities? | RDBMS | Document or Key-Value |
| Write-heavy (>50K writes/sec)? | Wide-Column or Time-Series | RDBMS handles it |
| Need global distribution? | NewSQL (CockroachDB, Spanner) | Single-region RDBMS |
| Schema changes frequently? | Document DB or RDBMS + JSONB | Strict RDBMS schema |
| Need full-text search? | Elasticsearch (large scale) or PostgreSQL GIN (moderate) | Skip search engine |
| Need vector similarity? | pgvector (<5M) or dedicated vector DB (>5M) — thresholds shift with releases | Not applicable |
| Team < 5 engineers? | Managed service (RDS, Atlas, DynamoDB) | Self-hosted is viable |
| Budget-constrained? | PostgreSQL (free, extensible) | Evaluate managed options |

Cost Quick Reference (1TB, 1K writes/sec, single region)

| Option | Monthly Cost | Ops Burden |
|---|---|---|
| RDS PostgreSQL (db.r6g.xlarge) | ~$300–500 | Low |
| Aurora PostgreSQL Serverless v2 | ~$400–1,200 | Very Low |
| DynamoDB on-demand | ~$500–1,500 | None |
| CockroachDB Cloud | ~$800–1,500 | Very Low |
| Self-hosted Cassandra (3 nodes) | ~$600 + ops time | High |

Practice Questions

Beginner

  1. Framework Application: A startup is building a recipe sharing app. Users can search recipes by ingredient, cuisine, and dietary restriction. Recipes have a flexible set of tags. The team has 2 engineers and expects 50,000 active users in year 1. Walk through the decision framework and choose a database (or combination). Justify each step.

    Model Answer: Step 1 (Data model): Recipes are structured but have flexible tags — JSONB in PostgreSQL handles this. Step 2 (Consistency): Saving a recipe must be consistent, but like counts can be eventual. Step 3 (Scale): 50K users is Tier 1 — a single PostgreSQL instance easily handles this. Step 4 (Ops): 2 engineers should use managed RDS. Result: PostgreSQL on RDS, with a tsvector index or pg_search for ingredient search. No separate Elasticsearch — add it only if search performance becomes a measurable problem.
  2. Cost Trade-off: A 3-engineer startup uses self-hosted Cassandra (3 nodes on EC2) at $600/month. The team spends 12 hours/month on Cassandra operations (upgrades, incidents, tuning). Their senior engineers bill at $200/hour. What is the true monthly TCO? How does this compare to CockroachDB Cloud at $1,500/month?

    Model Answer: TCO = $600 (infrastructure) + 12 × $200 (engineering time) = $600 + $2,400 = $3,000/month. CockroachDB Cloud at $1,500/month is 50% cheaper in true TCO and frees 12 engineering hours for product development. The infrastructure line item is misleading; always include operations time.

Intermediate

  1. Polyglot Decision: An e-commerce team runs PostgreSQL for orders and user data. They want to add: (a) product search with faceting and typo-tolerance, (b) session management for 1M concurrent users, (c) real-time inventory counts visible across all users within 1 second. Design a polyglot architecture. For each addition, choose a database and justify. What consistency model does each use?

    Model Answer: (a) Elasticsearch or Typesense for product search — synced via CDC from PostgreSQL. Eventual consistency (search index lags writes by 100–500ms). (b) Redis for sessions — in-memory key-value at sub-millisecond latency. Eventual (Redis replication is asynchronous). (c) Inventory is trickier — it requires strong consistency to prevent oversell. Keep inventory in PostgreSQL with row-level locking or optimistic concurrency (`WHERE inventory > 0`). Cache the display count in Redis with a short TTL (5–10 seconds) — users see "available" while the authoritative count lives in PostgreSQL.
  2. Migration Planning: A team runs a MongoDB-backed application with 500M documents in a messages collection. They need to migrate to Cassandra for write throughput. Design the dual-write migration: what does the application code change look like during migration? How do you handle documents that exist in MongoDB but haven't been migrated to Cassandra yet? How do you validate completeness?

    Model Answer: Phase 1 (dual-write): Application writes to both MongoDB (old) and Cassandra (new). Read from Cassandra first; fall back to MongoDB if missing. Phase 2 (backfill): A background job reads all MongoDB documents in chunks and writes them to Cassandra, tracking progress with a cursor on `_id`. Phase 3 (validate): Sample 1% of documents and compare checksums in both stores. Alert if mismatch > 0.01%. Phase 4 (cut-over): Stop writing to MongoDB. Phase 5 (decommission): After 2 weeks of successful operation, delete the MongoDB collection. Edge case: updates that arrive during the backfill are safe — dual-write keeps both stores current, so the backfill can overwrite Cassandra with MongoDB's copy without losing the latest version.

Advanced

  1. Comprehensive Architecture: Design the full database architecture for a global ride-sharing startup planning to launch in 5 cities, targeting 500K rides in year 1 and 10M rides in year 3. Include: user accounts, driver accounts, trip records, real-time driver locations, analytics, and event notifications. For each data type: choose a database, justify the choice, identify the partition/shard key, and describe the consistency model. Show how data flows between the databases.

    Model Answer: Year 1 (MVP): Single PostgreSQL for user/driver/trip data (ACID required). Redis for driver locations (key = driver_id, TTL = 8s). No analytics DB yet — query PostgreSQL with a read replica. Year 3 (Scale): Add H3-based location lookup in Redis (partition by H3 cell). Shard the trips table by city_id (5 shards). Add ClickHouse for analytics (CDC from PostgreSQL shards via Debezium + Kafka). Add PostgreSQL read replicas per region for driver/user profile reads. Data flow: App → PostgreSQL (OLTP) → Kafka (CDC) → ClickHouse (OLAP) + Elasticsearch (search). Driver pings → Kafka (location.updates) → Redis (H3 cells, TTL 8s). Key insight: the year 1 and year 3 architectures differ significantly — don't build the year 3 architecture in year 1. The transition points are measurable (specific latency thresholds, QPS ceilings), not arbitrary timelines.
