
Chapter 9: Databases — SQL


Relational databases have powered the internet for 50 years. They remain the default choice — until you have a deliberate reason to leave.




RDBMS Fundamentals

A Relational Database Management System (RDBMS) organizes data into tables (relations) with rows and columns. Tables reference each other through foreign keys. The relational model enforces schema at write time, guarantees referential integrity, and exposes a declarative query language (SQL).

Core concepts:

| Concept | Description |
| --- | --- |
| Table | Named collection of rows sharing the same schema |
| Primary Key | Unique row identifier; lookups via its index are fast (O(log n) in a B-Tree) |
| Foreign Key | Column referencing a primary key in another table |
| Index | Auxiliary data structure accelerating queries |
| Transaction | Group of operations that succeed or fail together |
| View | Virtual table defined by a query |

ACID Properties

ACID (covered in depth in Ch03 — Core Trade-offs) is the contract every RDBMS transaction upholds:

  • Atomicity — A transaction either commits fully or rolls back entirely. No partial writes survive a crash.
  • Consistency — Every transaction takes the database from one valid state to another. Constraints (NOT NULL, UNIQUE, FK) are always enforced.
  • Isolation — Concurrent transactions execute as if they were sequential. The isolation level (READ COMMITTED, REPEATABLE READ, SERIALIZABLE) controls the trade-off between correctness and throughput.
  • Durability — Once a commit acknowledgment is sent, data survives crashes. Achieved via write-ahead log (WAL) and fsync.
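The all-or-nothing behavior of atomicity can be sketched with Python's built-in sqlite3; the accounts table and amounts are invented for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [(1, 100), (2, 50)])
conn.commit()

# A transfer that fails mid-way rolls back entirely: no partial write survives.
try:
    with conn:  # context manager commits on success, rolls back on exception
        conn.execute("UPDATE accounts SET balance = balance - 200 WHERE id = 1")
        balance = conn.execute("SELECT balance FROM accounts WHERE id = 1").fetchone()[0]
        if balance < 0:
            raise ValueError("insufficient funds")
        conn.execute("UPDATE accounts SET balance = balance + 200 WHERE id = 2")
except ValueError:
    pass

# Both balances are untouched: the debit from account 1 did not survive.
print(conn.execute("SELECT balance FROM accounts ORDER BY id").fetchall())  # [(100,), (50,)]
```

A real RDBMS enforces the same contract with a write-ahead log rather than exception handling, but the observable behavior is the same.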

Normalization

Normalization removes redundancy to improve data integrity and reduce write anomalies:

| Normal Form | Rule | Benefit |
| --- | --- | --- |
| 1NF | All column values are atomic; no repeating groups | Enables relational operations |
| 2NF | No partial dependency on a composite key | Eliminates redundant data per partial key |
| 3NF | No transitive dependency (non-key → non-key) | Each fact stored in exactly one place |

When to denormalize: When read performance outweighs write simplicity. See the Denormalization section.


Indexing

An index is a separate data structure maintained by the database that speeds up row lookups at the cost of additional storage and slower writes.

B-Tree Index

The default index type in PostgreSQL and MySQL. A balanced tree where every leaf node is at the same depth.

```
                    [30 | 70]
                   /    |    \
           [10|20]   [40|60]   [80|90]
           /  |  \   /  | \   /  |  \
         [10][15][20][40][55][60][80][85][90]
              (leaf nodes with row pointers)
```
  • Reads: O(log n) for equality, range, prefix queries
  • Writes: O(log n) with page splits on insert
  • Best for: WHERE id = ?, WHERE created_at BETWEEN ?, ORDER BY columns

Hash Index

A hash map mapping column values to row locations.

  • Reads: O(1) average for equality queries
  • Writes: O(1) average
  • Limitation: Cannot support range queries or ordering — only = comparisons
  • Best for: WHERE session_id = ? (exact match only)

Index Strategies

| Index Type | Use Case | Trade-off |
| --- | --- | --- |
| Single column | Simple equality/range on one column | Minimal overhead |
| Composite | Multi-column WHERE + ORDER BY | Column order matters — leftmost prefix rule |
| Covering | All query columns in index (no heap lookup) | Large index, huge read speedup |
| Partial | Index only rows matching a condition | Smaller index for sparse predicates |
| Expression | Index on LOWER(email) or computed value | Useful for case-insensitive lookups |

The leftmost prefix rule: A composite index on (a, b, c) can accelerate queries on (a), (a, b), or (a, b, c) — but not (b) or (c) alone.
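The rule can be checked locally with sqlite3's EXPLAIN QUERY PLAN (table and index names here are made up; exact plan text varies by SQLite version, but the SEARCH-vs-SCAN distinction holds):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (a INTEGER, b INTEGER, c INTEGER)")
conn.execute("CREATE INDEX idx_abc ON t (a, b, c)")

def plan(sql):
    # The last column of each EXPLAIN QUERY PLAN row describes the access path.
    return " ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

# Leftmost columns present: the planner can SEARCH (seek) via idx_abc.
print(plan("SELECT * FROM t WHERE a = 1 AND b = 2"))
# Leftmost column missing: b alone cannot seek the index, so this is a SCAN.
print(plan("SELECT * FROM t WHERE b = 2"))
```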


Replication

Replication copies data across multiple database servers to achieve:

  • Read scaling — distribute read queries across replicas
  • High availability — failover if primary fails
  • Geographic distribution — read from nearby replica

Master-Slave Replication

One primary (master) accepts all writes. One or more replicas (slaves) receive those writes asynchronously and serve read queries.

Replication lag is the key weakness. If a replica is behind by 500ms and a user reads immediately after writing, they may see stale data — a form of eventual consistency.

Synchronous replication eliminates lag at the cost of write latency: master waits for at least one replica to confirm before acknowledging the client. PostgreSQL supports this via synchronous_standby_names.

Master-Master Replication

Both nodes accept reads and writes. Each node replicates to the other.

Write conflicts arise when two masters modify the same row simultaneously. Resolution strategies:

  1. Last-write-wins (LWW) — timestamp determines winner; simple but lossy
  2. Application-level merge — business logic merges conflicting versions
  3. Optimistic locking — version counter; retry on conflict

Replication Comparison

| Property | Master-Slave | Master-Master |
| --- | --- | --- |
| Write throughput | Single writer (bottleneck) | Distributed writes |
| Read throughput | All replicas serve reads | Both nodes serve reads |
| Consistency | Strong on master, eventual on replicas | Eventual (conflict risk) |
| Failover | Manual or automatic promotion | Automatic (other master takes over) |
| Complexity | Low | High (conflict resolution required) |
| Best for | Read-heavy, single-region | Multi-region active-active writes |

Sharding (Horizontal Partitioning)

Sharding splits one large table across multiple independent database servers (shards). Each shard holds a subset of rows and can be on different hardware. Unlike replication (copies), sharding divides data.

Sharding Strategies

| Strategy | How It Works | Pros | Cons |
| --- | --- | --- | --- |
| Range-based | Shard by value range (id 0–999K → Shard 1) | Range queries on shard key are fast | Hotspots if values concentrate (e.g., new users) |
| Hash-based | shard = hash(key) % num_shards | Uniform distribution | Range queries require scatter-gather across all shards |
| Directory-based | Lookup table maps key → shard | Flexible; supports arbitrary layouts | Lookup service becomes bottleneck / SPOF |

Consistent Hashing for Resharding

Hash-based sharding has a critical flaw: when you add or remove a shard, hash(key) % N changes for almost every key, causing a massive data migration.

Consistent hashing (covered in detail in Ch06 — Load Balancing) places both keys and shards on a ring. Adding/removing a shard only remaps keys adjacent to that shard — typically 1/N of all keys.
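A minimal ring can be sketched in Python; the MD5 hash, 100 virtual nodes per shard, and the shard and key names are all arbitrary choices for this illustration:

```python
import bisect
import hashlib

def h(s):
    # Hash a string to a position on the ring.
    return int(hashlib.md5(s.encode()).hexdigest(), 16)

class Ring:
    def __init__(self, shards, vnodes=100):
        # Each shard owns many points ("virtual nodes") for smoother balance.
        self.points = sorted((h(f"{s}#{v}"), s) for s in shards for v in range(vnodes))
        self.keys = [p for p, _ in self.points]

    def shard_for(self, key):
        # A key belongs to the first shard point clockwise from its hash.
        i = bisect.bisect(self.keys, h(key)) % len(self.points)
        return self.points[i][1]

old = Ring([f"shard-{i}" for i in range(4)])
new = Ring([f"shard-{i}" for i in range(5)])  # one shard added

keys = [f"user:{i}" for i in range(10_000)]
moved = sum(old.shard_for(k) != new.shard_for(k) for k in keys)
print(f"{moved / len(keys):.0%} of keys remapped")  # roughly 1/5, vs ~80% for hash % N
```

With `hash(key) % N`, going from 4 to 5 shards would remap about 80% of keys; the ring remaps only the keys adjacent to the new shard's points.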

Sharding Challenges

| Challenge | Description | Mitigation |
| --- | --- | --- |
| Cross-shard joins | SQL JOIN across shards requires application-side merge | Denormalize, or use federation to avoid cross-shard queries |
| Cross-shard transactions | ACID across shards requires 2-phase commit (2PC) | Avoid distributed transactions; design shard key to contain transactions |
| Hotspots | One shard receives disproportionate traffic | Hash key or add shard prefix to spread load |
| Resharding | Adding shards requires data migration | Consistent hashing, virtual nodes, or range splitting |
| Schema changes | DDL must run on all shards | Automated migration tooling (gh-ost, pt-online-schema-change) |

Federation (Functional Partitioning)

Federation splits the database by function rather than by row count. Instead of one monolithic database handling everything, you have separate databases per domain: for example, a Users DB, an Orders DB, and a Products DB.

Benefits:

  • Each database is smaller → fits in memory → faster queries
  • Independent scaling: the Orders DB can be sharded separately from Users DB
  • Independent schema evolution: Products team owns their schema
  • Fault isolation: a broken Analytics DB does not affect Order processing

Trade-offs:

  • Cross-domain joins require application-level merges (e.g., "get orders with user names")
  • More database connections to manage
  • Referential integrity across databases must be handled at application level
  • Increases operational complexity (backups, migrations, monitoring × N databases)

Federation vs Sharding: Federation splits by domain (vertical cut); sharding splits by row range (horizontal cut). They are complementary — a federated Orders DB can itself be sharded across multiple servers.


Denormalization

Normalization stores each fact exactly once, which is great for writes but can require expensive multi-table JOINs for reads. Denormalization deliberately introduces redundancy to optimize read performance.

Common Denormalization Patterns

| Pattern | Description | Example |
| --- | --- | --- |
| Duplicate columns | Copy a column from a referenced table | Store user_name in orders to avoid JOIN |
| Pre-joined tables | Flatten a frequently joined view | order_details merges orders + products |
| Materialized views | Precomputed query result stored as a table | Daily sales totals refreshed every hour |
| Precomputed aggregates | Store comment_count on posts table | Avoid SELECT COUNT(*) on every page load |
| Derived columns | Store computed value alongside source | Store full_name = first_name + last_name |

When to Denormalize

Denormalize when:

  • Read throughput is critical and JOINs are the bottleneck
  • Data is read much more often than written (read:write ratio > 10:1)
  • The JOIN tables are large and the query is in the hot path
  • The denormalized data changes infrequently (low sync overhead)

Do not denormalize when:

  • Write volume is high (maintaining redundant copies is expensive)
  • Data changes frequently (stale denormalized data is hard to avoid)
  • You have not first confirmed the JOIN is the actual bottleneck (profile first)

Write Complexity Trade-off

Every denormalized copy must be updated on every write. If user_name is stored in 5 tables, a user rename requires 5 UPDATE statements — ideally in a single transaction.
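One way to keep such a copy in sync is a database trigger, so the denormalized aggregate updates in the same transaction as the write. A sketch with sqlite3, using invented table names for the precomputed comment_count pattern:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE posts (id INTEGER PRIMARY KEY, comment_count INTEGER DEFAULT 0);
CREATE TABLE comments (id INTEGER PRIMARY KEY, post_id INTEGER REFERENCES posts(id));
-- The trigger maintains the denormalized counter atomically with each insert.
CREATE TRIGGER bump AFTER INSERT ON comments BEGIN
    UPDATE posts SET comment_count = comment_count + 1 WHERE id = NEW.post_id;
END;
""")
conn.execute("INSERT INTO posts (id) VALUES (1)")
conn.executemany("INSERT INTO comments (post_id) VALUES (?)", [(1,), (1,), (1,)])
print(conn.execute("SELECT comment_count FROM posts WHERE id = 1").fetchone()[0])  # 3
```

Triggers trade write latency for read speed; at very high write volume, teams often move this maintenance to application code or async jobs instead.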


SQL Tuning

EXPLAIN ANALYZE

Before optimizing, measure. Use EXPLAIN ANALYZE (PostgreSQL) or EXPLAIN (MySQL) to show the query execution plan:

```sql
EXPLAIN ANALYZE
SELECT o.id, u.name, p.title
FROM orders o
JOIN users u ON o.user_id = u.id
JOIN products p ON o.product_id = p.id
WHERE o.created_at > NOW() - INTERVAL '7 days';
```

The output reveals: sequential scans vs index scans, estimated vs actual row counts, and which JOIN algorithm was chosen (nested loop, hash join, merge join).

Tuning Checklist

| Technique | Description |
| --- | --- |
| Add indexes on filter columns | WHERE, JOIN ON, ORDER BY columns are index candidates |
| Avoid SELECT * | Fetch only needed columns; enables covering indexes |
| Avoid functions on indexed columns | WHERE LOWER(email) = ? prevents index use — use expression index instead |
| Limit result sets | Paginate with LIMIT; prefer keyset pagination (WHERE id > ?) over large OFFSETs, which still scan the skipped rows |
| Use connection pooling | PgBouncer (PostgreSQL) or ProxySQL (MySQL) — reuse DB connections |
| Vacuum / ANALYZE | PostgreSQL requires VACUUM to reclaim dead tuples and update statistics |
| Partition large tables | PostgreSQL table partitioning for time-series data (monthly partitions) |
| Read replicas for analytics | Route SELECT aggregations to replica, not primary |

When to Use SQL

| Criterion | Use SQL When... |
| --- | --- |
| Data relationships | Complex relationships between entities (e.g., users → orders → products) |
| Transaction requirements | Multi-row, multi-table atomicity is required |
| Query patterns | Ad-hoc queries; analytics; JOIN-heavy reporting |
| Schema | Schema is known upfront and relatively stable |
| Compliance | Financial, healthcare, or audit requirements demand ACID guarantees |
| Team familiarity | SQL expertise is more common than NoSQL expertise on your team |

Real-World Examples

Instagram: PostgreSQL → Sharding at Scale

Instagram started with a single PostgreSQL server. As it grew to 100M+ users, they implemented horizontal sharding using PostgreSQL:

  • Split media, users, follows, and likes into separate databases (federation)
  • Each federated database further sharded: 512 logical shards per database cluster
  • Logical shards are remappable to physical servers without application changes
  • Used consistent hashing to distribute user data across shards
  • Result: 100+ physical PostgreSQL servers, all hidden behind a routing layer

Key insight: Start monolithic. Federate first. Shard within each federated domain. Never shard prematurely.

MySQL at Facebook

Facebook runs MySQL at massive scale for their social graph storage:

  • Deployed thousands of MySQL shards partitioned by user ID
  • Built custom tools such as MHA (Master High Availability) for automatic failover
  • Used semi-synchronous replication to reduce the data-loss window
  • Routed reads to replicas and writes to the master; cross-shard queries are minimized by design
  • Separated read and write paths at the application layer via TAO (their graph layer)

Key Takeaway

SQL databases are the workhorse of the industry. Master replication before sharding, shard by a key that distributes writes evenly, federate by domain to keep each database small, and denormalize only after profiling proves the JOIN is the bottleneck. ACID guarantees are worth the complexity for data where correctness is non-negotiable.


Transaction Isolation Levels

ACID's "I" (Isolation) is not binary — there is a configurable spectrum. Higher isolation prevents more anomalies but reduces concurrency. Every major RDBMS defaults to something in the middle.

Isolation Anomalies

| Anomaly | Description | Example |
| --- | --- | --- |
| Dirty Read | Transaction reads uncommitted data from another transaction | Read a balance update that was later rolled back |
| Non-repeatable Read | Reading the same row twice in one transaction yields different values | Row updated by concurrent transaction between your two reads |
| Phantom Read | A range query returns different rows on re-execution | New row inserted by concurrent transaction matches your WHERE clause |

Isolation Levels Matrix

| Isolation Level | Dirty Read | Non-repeatable Read | Phantom Read | Performance | Default In |
| --- | --- | --- | --- | --- | --- |
| Read Uncommitted | Possible | Possible | Possible | Highest | Rarely used |
| Read Committed | Prevented | Possible | Possible | High | PostgreSQL, Oracle |
| Repeatable Read | Prevented | Prevented | Possible* | Medium | MySQL InnoDB |
| Serializable | Prevented | Prevented | Prevented | Lowest | CockroachDB, Spanner |

*InnoDB prevents phantoms at Repeatable Read via gap locks; PostgreSQL uses MVCC snapshot isolation instead.

Choosing an Isolation Level

When isolation levels matter in practice:

  • Read Committed (default in PostgreSQL): Safe for most Online Transaction Processing (OLTP) workloads. Allows non-repeatable reads — acceptable if you do not re-read rows within a transaction.
  • Repeatable Read: Use when a transaction reads the same data multiple times and must see consistent values — e.g., generating a report that spans multiple queries.
  • Serializable: Use for financial transfers, inventory deductions, or any operation where phantom reads could cause correctness failures.

Multi-Version Concurrency Control (MVCC)

Traditional locking blocks readers when writers are active. MVCC eliminates this by keeping multiple versions of each row, allowing readers and writers to proceed concurrently without blocking.

How PostgreSQL Implements MVCC

Every row in PostgreSQL has two hidden system columns:

  • xmin — the transaction ID that created this row version
  • xmax — the transaction ID that deleted/updated this row (0 if still live)

When a transaction updates a row, it does not overwrite the old version. Instead it:

  1. Marks the old row with xmax = current_txid
  2. Inserts a new row version with xmin = current_txid
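The visibility rule can be modeled in a few lines of Python (a deliberate simplification: real PostgreSQL also tracks which transactions are still in progress, commit status, and hint bits):

```python
from dataclasses import dataclass

@dataclass
class Version:
    xmin: int   # txid that created this row version
    xmax: int   # txid that deleted/updated it (0 = still live)
    value: str

def visible(v, snapshot_txid):
    # Visible if created at or before our snapshot, and not yet deleted
    # (or deleted only by a later transaction than our snapshot).
    return v.xmin <= snapshot_txid and (v.xmax == 0 or v.xmax > snapshot_txid)

# txid 10 inserted the row; txid 20 updated it (new version, old gets xmax=20)
versions = [Version(xmin=10, xmax=20, value="old"),
            Version(xmin=20, xmax=0, value="new")]

def read(snapshot_txid):
    return [v.value for v in versions if visible(v, snapshot_txid)]

print(read(15))  # ['old']  -- snapshot taken before the update committed
print(read(25))  # ['new']  -- snapshot taken after
```

This is why readers never block writers: each transaction simply filters row versions against its own snapshot.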

MVCC vs Traditional Locking

| Property | MVCC | Traditional Locking |
| --- | --- | --- |
| Reader blocks writer | Never | Yes (shared lock) |
| Writer blocks reader | Never | Yes (exclusive lock) |
| Writer blocks writer | Yes (conflict on same row) | Yes |
| Storage overhead | Dead tuples accumulate — need VACUUM | No dead tuples |
| Snapshot isolation | Natural — each transaction sees a snapshot | Requires range locks |
| Used by | PostgreSQL, Oracle, MySQL InnoDB | SQL Server (page locking mode) |

VACUUM: PostgreSQL periodically runs VACUUM to reclaim storage from dead row versions. Without it, tables grow unboundedly. autovacuum handles this automatically but must be tuned for write-heavy tables.

Cross-reference: MVCC is the mechanism enabling PostgreSQL's Read Committed and Repeatable Read isolation levels. For how this interacts with distributed systems, see Ch03 — PACELC.


B-Tree vs LSM-Tree Storage Engines

The storage engine determines how data is physically laid out on disk — which directly governs read vs. write performance characteristics.

B-Tree (Read-Optimized)

B-Trees organize data in a balanced tree where every update is an in-place write to a specific page on disk.

```
Write path: locate page → load page from disk → modify in memory → write back
Read path:  traverse tree (O(log n)) → load leaf page → return row
```

Used by: PostgreSQL heap storage, MySQL InnoDB, SQLite, most traditional RDBMS.

LSM-Tree (Write-Optimized)

Log-Structured Merge Trees append all writes to an in-memory buffer (MemTable), flush to disk as immutable sorted files (SSTables), and merge (compact) those files in the background.

Used by: RocksDB, LevelDB, Cassandra, HBase, ScyllaDB, InfluxDB.
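A toy version of the LSM write and read paths, with compaction omitted for brevity (the class, limit, and key names are invented):

```python
class TinyLSM:
    def __init__(self, memtable_limit=2):
        self.memtable = {}      # in-memory write buffer
        self.sstables = []      # immutable sorted runs; newest run is last
        self.limit = memtable_limit

    def put(self, key, value):
        self.memtable[key] = value  # O(1) buffered write
        if len(self.memtable) >= self.limit:
            # Flush: one sequential write of a sorted, immutable run
            self.sstables.append(sorted(self.memtable.items()))
            self.memtable = {}

    def get(self, key):
        # Reads check the memtable first, then runs from newest to oldest,
        # so the most recent version of a key always wins.
        if key in self.memtable:
            return self.memtable[key]
        for run in reversed(self.sstables):
            for k, v in run:
                if k == key:
                    return v
        return None

db = TinyLSM()
db.put("a", 1); db.put("b", 2)   # second put triggers a flush
db.put("a", 99)                   # newer version shadows the flushed one
print(db.get("a"), db.get("b"))  # 99 2
```

A real engine would binary-search each SSTable, consult bloom filters to skip runs, and compact runs in the background to bound read amplification.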

B-Tree vs LSM-Tree Comparison

| Property | B-Tree | LSM-Tree |
| --- | --- | --- |
| Write performance | Medium — in-place page writes, random I/O | High — sequential append, batch flush |
| Read performance | High — direct tree traversal | Medium — may check multiple SSTables + bloom filters |
| Write amplification | Low (~1×) | High (compaction rewrites data multiple times, 10–30×) |
| Read amplification | Low (O(log n) pages) | Medium (check MemTable + multiple SSTable levels) |
| Space amplification | Low | Medium (tombstones + not-yet-compacted duplicates) |
| Update semantics | In-place overwrite | Append new version; old version removed at compaction |
| Compression | Per-page | Per-SSTable (better compression ratios) |
| Best use case | Read-heavy OLTP (dashboards, reporting) | Write-heavy workloads (IoT, logging, time-series) |
| Example databases | PostgreSQL, MySQL, SQLite | RocksDB, Cassandra, HBase |

Rule of thumb: If writes dominate (>70% of operations), LSM-Tree is likely faster. If reads dominate or you need predictable read latency, B-Tree wins.


Database Locking Strategies

When multiple transactions access the same data, the database must coordinate to prevent anomalies. Two fundamental approaches:

Pessimistic Locking

Assume conflict will happen — acquire a lock before reading or writing.

```sql
-- Pessimistic: lock the row for the entire transaction
BEGIN;
SELECT balance FROM accounts WHERE id = 1 FOR UPDATE;  -- acquires exclusive lock
UPDATE accounts SET balance = balance - 100 WHERE id = 1;
COMMIT;  -- lock released
```

Optimistic Locking

Assume conflict is rare — proceed without locks, but check for conflicts at commit time using a version counter.
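A sketch of the version-counter pattern with sqlite3: the conditional UPDATE's row count tells the caller whether it won or must re-read and retry (table and amounts are invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER, version INTEGER)")
conn.execute("INSERT INTO accounts VALUES (1, 100, 1)")

def withdraw(amount):
    # Read the current state, including the version we expect to still hold.
    bal, ver = conn.execute("SELECT balance, version FROM accounts WHERE id = 1").fetchone()
    cur = conn.execute(
        "UPDATE accounts SET balance = ?, version = version + 1 "
        "WHERE id = 1 AND version = ?", (bal - amount, ver))
    return cur.rowcount == 1   # False means someone else won; caller retries

ok1 = withdraw(30)  # succeeds: version 1 -> 2, balance 100 -> 70

# A second client that read version 1 earlier now conflicts: its conditional
# UPDATE matches zero rows, so it must re-read and retry.
cur = conn.execute("UPDATE accounts SET balance = 0, version = 2 "
                   "WHERE id = 1 AND version = 1")
print(ok1, cur.rowcount)  # True 0
```

No lock is ever held between the read and the write, which is exactly why this scales under low contention and degrades into retry loops under high contention.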

Locking Strategy Comparison

| Property | Pessimistic | Optimistic |
| --- | --- | --- |
| Conflict assumption | High contention expected | Low contention expected |
| Lock duration | Entire transaction | No lock held (check at commit) |
| Throughput under low contention | Lower — unnecessary locking overhead | Higher — no blocking |
| Throughput under high contention | Better — avoids retry storms | Lower — many retries |
| Deadlock risk | Yes — two transactions can deadlock | No — no locks held |
| Implementation | SELECT FOR UPDATE, LOCK TABLE | Version column + conditional UPDATE |
| Best for | Inventory deductions, seat booking, money transfer | User profile updates, shopping cart changes |

Deadlock example with pessimistic locking:

```
T1 locks Row A, waits for Row B
T2 locks Row B, waits for Row A
→ Deadlock detected by DB → one transaction rolled back → retry
```

Prevention: always acquire locks in the same order across transactions.


Advanced Index Types

Beyond standard B-Tree and Hash indexes, modern databases offer specialized index types that can dramatically improve performance for specific access patterns.

| Index Type | How It Works | Best For | Overhead | Example |
| --- | --- | --- | --- | --- |
| Covering Index | Includes all columns needed by a query — no heap lookup | High-frequency SELECT with known column set | Larger index size | `CREATE INDEX ON orders (user_id) INCLUDE (status, total)` |
| Partial Index | Indexes only rows matching a WHERE condition | Sparse predicates (e.g., only active users) | Smaller — indexes a subset | `CREATE INDEX ON users (email) WHERE active = true` |
| Expression Index | Indexes a computed expression, not a raw column | Case-insensitive lookups, derived values | Recomputed on every write | `CREATE INDEX ON users (LOWER(email))` |
| GIN (Generalized Inverted) | Inverted index over composite values (arrays, JSONB, tsvector) | Full-text search, JSONB key queries, array @> | Slower writes, large index | `CREATE INDEX ON posts USING GIN (search_vector)` |
| GiST (Generalized Search Tree) | Pluggable tree for non-scalar types | Geometric data, range types, nearest-neighbor | Complex — depends on operator class | `CREATE INDEX ON locations USING GIST (coordinates)` |
| BRIN (Block Range Index) | Stores min/max per block range, not per row | Huge append-only tables with natural ordering | Tiny — 1 entry per block range | `CREATE INDEX ON events USING BRIN (created_at)` |

When to Use Each

```
Query: WHERE LOWER(email) = ?        → Expression index on LOWER(email)
Query: WHERE tags @> ARRAY['golang'] → GIN index on tags
Query: WHERE location <-> point ...  → GiST index on geometry column
Query: WHERE created_at > last_month → BRIN on append-only time-series table
Query: SELECT id, status FROM orders WHERE user_id = ?  → Covering index (user_id) INCLUDE (status)
```

Index bloat: Unused indexes slow down writes without helping reads. Periodically query pg_stat_user_indexes to identify zero-scan indexes and drop them.
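Partial and expression indexes are not Postgres-only; sqlite3 supports both, so the resulting plans can be sketched locally (names are illustrative, and exact plan text varies by SQLite version):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, active INTEGER)")
# Partial index: only rows with active = 1 are indexed.
conn.execute("CREATE INDEX idx_active_email ON users (email) WHERE active = 1")
# Expression index: makes WHERE LOWER(email) = ... able to seek an index.
conn.execute("CREATE INDEX idx_lower_email ON users (LOWER(email))")

def plan(sql):
    return " ".join(r[3] for r in conn.execute("EXPLAIN QUERY PLAN " + sql))

# Usable because the query's predicate implies the index's WHERE clause:
p1 = plan("SELECT id FROM users WHERE email = 'x' AND active = 1")
# Usable because the indexed expression matches the query's expression:
p2 = plan("SELECT id FROM users WHERE LOWER(email) = 'x'")
print(p1)
print(p2)
```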

Cross-references: Index storage engines (B-Tree internals) are covered in the section above. For how NoSQL databases handle indexing differently, see Ch10 — NoSQL. For replication and consistency trade-offs, see Ch03 — PACELC and Ch15 — Replication.


Case Study: Figma's PostgreSQL Scaling Journey

Figma is a browser-based collaborative design tool. By 2020 their user base was doubling annually — and their single PostgreSQL instance was reaching connection saturation, replication lag spikes, and slow query times on analytics workloads.

Context and Challenges

| Challenge | Root Cause | Symptoms |
| --- | --- | --- |
| Connection exhaustion | Each web server opens its own PG connection; PG process-per-connection model limits ~500 concurrent | FATAL: remaining connection slots reserved errors under load |
| Replication lag | Heavy write workload on a single primary; replicas couldn't keep up | Analytics queries reading stale data; delayed dashboards |
| Partition growth | A single documents table grew to hundreds of millions of rows | Query planner choosing sequential scans despite indexes |
| Cross-team query coupling | All teams queried the same DB; a bad analytics query starved OLTP | Latency spikes on the design canvas during batch reports |

Solution Architecture

Figma addressed each challenge in layers rather than a single "big bang" migration:

Step 1 — PgBouncer for connection pooling

Instead of each app server holding a dedicated PostgreSQL connection, all connections first go through PgBouncer, which maintains a smaller pool of actual backend connections and multiplexes thousands of client connections onto them.

Step 2 — Read replicas for analytics isolation

Analytics and reporting queries were routed to dedicated read replicas, removing them from the primary's query queue entirely. This reduced replication lag on the primary because it was no longer competing with heavy read workloads.

Step 3 — Horizontal sharding by organization ID

For the largest tables (documents, files, comments), Figma sharded by organization_id — a natural boundary because most queries are scoped to a single organization. This keeps cross-shard queries rare.

Trade-offs and Decisions

| Decision | Benefit | Trade-off Accepted |
| --- | --- | --- |
| PgBouncer transaction-mode pooling | Handles thousands of clients with small backend pool | Prepared statements and session-level variables not preserved across transactions |
| Shard key = organization_id | Cross-shard queries are rare; most work stays in one shard | Large enterprise orgs land on one shard — requires shard-level capacity planning |
| Read replicas for analytics | Primary fully dedicated to OLTP writes and reads | Replication lag means analytics may be seconds behind; acceptable for dashboards |
| Stay on PostgreSQL (not migrate to NoSQL) | ACID guarantees; complex queries via SQL; team familiarity | More operational complexity than managed NoSQL; requires sharding discipline |

Key Takeaways

  1. Connection pooling is not optional at scale. PostgreSQL's process-per-connection model caps at ~500-1000 connections; PgBouncer is the standard solution.
  2. Separate read workloads early. Analytics queries on the primary cause replication lag that affects OLTP responsiveness. Route to replicas before you feel the pain.
  3. Shard by your natural access boundary. organization_id was Figma's natural isolation unit — most queries never needed to cross it. Sharding along this boundary eliminated 95%+ of cross-shard joins.
  4. Relational DBs CAN scale horizontally. With sharding, connection pooling, and replica routing, PostgreSQL handled Figma's scale without migrating to NoSQL.

Cross-references: Connection pooling is mentioned in the SQL Tuning section above. For how shard routing interacts with consistency, see Ch03 — PACELC. For replication mechanics, see Ch15 — Replication.


Zero-Downtime Schema Migrations

The Problem

  • ALTER TABLE on a large table can lock it for minutes or hours
  • Dropping a column still referenced by running code causes errors
  • Adding a NOT NULL column without a default fails on existing rows

Expand-Contract Pattern

Three phases ensure no single deployment breaks running code or locks the table:

  1. Expand — Add the new column as nullable. Deploy application code that writes to both the old and new columns simultaneously. Running instances still use only the old column for reads.
  2. Migrate — Backfill all existing rows in batches (avoid one giant UPDATE). Verify row counts and spot-check values before proceeding.
  3. Contract — Once all rows are populated and all application instances have been updated to read from the new column, drop the old column.
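The Migrate phase can be sketched as a batched backfill keyed by primary key, with each batch committed independently so no single statement touches the whole table (column names, row counts, and batch size are invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Expand phase already happened: full_name exists and is nullable.
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, full_name TEXT)")
conn.executemany("INSERT INTO users (id, name) VALUES (?, ?)",
                 [(i, f"user{i}") for i in range(1, 1001)])
conn.commit()

BATCH = 100
for last_id in range(0, 1000, BATCH):
    with conn:  # each batch is its own short transaction
        conn.execute(
            "UPDATE users SET full_name = name || ' (migrated)' "
            "WHERE full_name IS NULL AND id > ? AND id <= ?",
            (last_id, last_id + BATCH))

# Verify before the Contract phase: no rows left unpopulated.
remaining = conn.execute("SELECT COUNT(*) FROM users WHERE full_name IS NULL").fetchone()[0]
print(remaining)  # 0
```

The `full_name IS NULL` guard makes the backfill idempotent, so a crashed run can simply be restarted.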

Common Migration Patterns

| Operation | Safe Approach | Unsafe Approach |
| --- | --- | --- |
| Add column | Add nullable, backfill, then add constraint | ADD COLUMN NOT NULL |
| Remove column | Stop reading in code, deploy, then drop | DROP COLUMN immediately |
| Rename column | Add new, copy data, drop old (expand-contract) | RENAME COLUMN |
| Change type | Add new column, dual-write, migrate, drop old | ALTER TYPE directly |
| Add index | CREATE INDEX CONCURRENTLY (Postgres) | CREATE INDEX (locks table) |

Tools

  • gh-ost (GitHub) — online schema migration for MySQL; shadows the table via triggers and binary log
  • pt-online-schema-change (Percona) — MySQL online DDL using trigger-based copying
  • pgroll — PostgreSQL zero-downtime migrations using the expand-contract pattern natively
  • Flyway / Liquibase — version-controlled migration runners; provide ordering and checksums but do not themselves solve the locking problem

Database Security Essentials

Security is not a separate concern from database design — it must be built into the data model, query layer, and access patterns from the start. These are the critical security patterns every system design should address.

SQL Injection Prevention

SQL injection remains a top OWASP vulnerability. The defense is simple but non-negotiable: always use parameterized queries.

```sql
-- VULNERABLE: query built by string concatenation in application code:
--   query = "SELECT * FROM users WHERE username = '" + input + "'"
-- Attack: input = "admin' OR '1'='1" → the WHERE clause matches every row

-- SAFE: parameterized query
SELECT * FROM users WHERE username = $1;
-- Input is bound as data, never parsed as SQL code
```

Every major ORM (SQLAlchemy, Hibernate, ActiveRecord, Prisma) uses parameterized queries by default. Raw SQL escapes should only appear in explicitly reviewed code paths.
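The attack can be reproduced end to end with sqlite3; the parameterized version binds the identical attack string as a plain value (table contents are invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (username TEXT, secret TEXT)")
conn.execute("INSERT INTO users VALUES ('admin', 's3cret'), ('bob', 'hunter2')")

attack = "nobody' OR '1'='1"

# Vulnerable: the OR clause becomes part of the SQL and matches every row.
vulnerable = conn.execute(
    "SELECT username FROM users WHERE username = '" + attack + "'").fetchall()

# Safe: the placeholder binds the whole string as a single literal value.
safe = conn.execute(
    "SELECT username FROM users WHERE username = ?", (attack,)).fetchall()

print(len(vulnerable), len(safe))  # 2 0
```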

Encryption

| Layer | What to Encrypt | How |
| --- | --- | --- |
| In transit | Client ↔ DB connections | TLS/SSL (enforce sslmode=require or verify-full) |
| At rest | Data files on disk | Transparent Data Encryption (TDE) or filesystem encryption (LUKS, EBS encryption) |
| Column-level | PII (SSN, credit cards, health records) | Application-level encryption (AES-256-GCM) before INSERT; DB stores ciphertext |

Column-level encryption trade-off

Encrypting individual columns prevents the database from indexing or querying the plaintext. If you encrypt an email column, you cannot do WHERE email = ? without decrypting every row. Use deterministic encryption or blind indexing for fields that must be searchable.
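A blind index can be sketched with the stdlib hmac module: store HMAC(key, normalized value) in a separate indexed column next to the ciphertext, and query on that column. The key, normalization, and helper name are assumptions; real systems keep the key in a KMS, not in code:

```python
import hashlib
import hmac

INDEX_KEY = b"server-side-secret-key"  # illustrative only; never hardcode

def blind_index(email):
    # Normalize first so equivalent inputs produce the same index value.
    normalized = email.strip().lower()
    return hmac.new(INDEX_KEY, normalized.encode(), hashlib.sha256).hexdigest()

# Write path: store (encrypted_email, email_bidx = blind_index(email)).
# Query path: SELECT ... WHERE email_bidx = blind_index(user_input)
print(blind_index("Alice@Example.com") == blind_index("alice@example.com"))  # True
print(blind_index("alice@example.com") == blind_index("bob@example.com"))    # False
```

Because the index is deterministic per value, it supports only equality lookups and leaks value-equality between rows; range and prefix queries still require plaintext or specialized schemes.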

Access Control

  • Principle of least privilege: Application database users should have only the permissions they need. A read API service gets SELECT only; a write service gets SELECT, INSERT, UPDATE but not DROP or ALTER.
  • Separate credentials per service: Never share a single database user across microservices. If one service is compromised, the blast radius is limited.
  • Row-Level Security (RLS): PostgreSQL and SQL Server support RLS policies that filter rows based on the current user/role — useful for multi-tenant SaaS where tenants share tables.

Audit Logging

For regulated industries (healthcare, finance), log all data access:

  • Who accessed what data, when, and from where
  • PostgreSQL: pgaudit extension for statement-level and object-level audit
  • MySQL: Enterprise Audit plugin or general query log (with performance impact)

Cross-reference: See Chapter 16 — Security & Reliability for authentication (OAuth 2.0, JWT), rate limiting, and reliability patterns that complement database security.


Related Chapters

| Chapter | Relevance |
| --- | --- |
| Ch03 — Core Trade-offs | PACELC model underpins SQL consistency choices |
| Ch10 — NoSQL Databases | Compare SQL vs NoSQL trade-offs for data model decisions |
| Ch15 — Replication & Consistency | Replication mechanics for read replicas and failover |
| Ch14 — Event-Driven Architecture | Distributed transactions (2PC, Saga) across SQL databases |
| Ch16 — Security & Reliability | Authentication, authorization, encryption, and rate limiting patterns |

Practice Questions

Beginner

  1. Replication Lag: A user updates their profile picture, then immediately loads their profile page and sees the old picture. Explain the root cause (read replica lag) and describe two architectural fixes at different layers of the stack.

    Hint: The read was routed to a replica that hasn't received the write yet — fix either by reading from the primary for the user's own profile, or by using read-your-writes consistency via session-sticky routing.
  2. Denormalization Decision: An e-commerce site stores product_price in the order_line_items table (a denormalized copy from products). A junior engineer says this is wrong because prices are duplicated. Is it wrong? Defend your answer with reference to what the data represents.

    Hint: Order line items record the price *at time of purchase* — denormalization is intentional here because the historical order must not change if the product price is later updated.

Intermediate

  1. Sharding Key Selection: You are sharding a messages table across 8 shards. Your options are: shard by sender_id, receiver_id, or conversation_id. Which do you choose, what queries does your choice enable, and what queries become expensive or impossible?

    Hint: Shard by `conversation_id` to keep all messages for a conversation on one shard, enabling efficient history queries; sharding by sender or receiver scatters a conversation across shards.
  2. Federation Trade-off: An order service needs to display "orders with customer names." The orders DB and users DB are federated into separate databases. How do you handle the JOIN efficiently without cross-database queries, and what consistency guarantees does your approach make?

    Hint: Denormalize `customer_name` into the orders table at write time, or perform the join at the application layer with a batch lookup — both avoid cross-database JOINs at the cost of some consistency.

Advanced

  1. Resharding Migration: Your hash-based sharding uses hash(id) % 4 across 4 shards. You need to add a 5th shard without downtime. Describe the full migration challenge (what percentage of rows must move), how consistent hashing reduces that percentage, and what double-write strategy you would use during the transition window.

    Hint: Modulo resharding moves ~80% of rows; consistent hashing moves ~20% — use a shadow write to the new shard topology while reads still go to the old, then cut over after backfill validation.

References & Further Reading

  • "Designing Data-Intensive Applications" by Martin Kleppmann — Chapters 2–3 cover data models and storage engines in depth; the primary reference for B-Tree, LSM-Tree, and ACID internals.
  • "Use The Index, Luke" (https://use-the-index-luke.com/) — Free online book dedicated to SQL indexing; covers the leftmost prefix rule, execution plans, and index-only scans with practical examples.
  • PostgreSQL documentation: EXPLAIN / Indexes (https://www.postgresql.org/docs/current/using-explain.html) — Official reference for reading query plans and understanding index selection.
  • "Zero-Downtime Migrations" — Stripe Engineering Blog — Describes the expand-contract pattern as applied at Stripe's scale, including dual-write strategies and backfill tooling.
  • "Online Schema Change at GitHub" — gh-ost documentation (https://github.com/github/gh-ost) — Explains how gh-ost avoids table locks by using a ghost table and the MySQL binary log to replay writes during migration.

