Chapter 7: NoSQL at Scale

"NoSQL is not a rejection of SQL. It is a deliberate choice to trade relational expressiveness for specific performance guarantees."

Mind Map

Prerequisites

This chapter assumes familiarity with the NoSQL category overview from Ch01 and access-pattern-driven design from Ch02. Review those first if needed.

DynamoDB: Single-Table Design

DynamoDB is a serverless key-value and document store from AWS that scales to millions of requests per second with consistent single-digit millisecond latency. The key to using it correctly is understanding that its data model is radically different from SQL.

The Single-Table Principle

In SQL, you normalize data into multiple tables and JOIN at query time. In DynamoDB, you precompute the joins at write time by storing multiple entity types in a single table, using overloaded attribute names.

Every DynamoDB table has:

Partition Key (PK): Determines which partition (physical node) stores the item
Sort Key (SK): Sorts items within a partition; enables range queries and hierarchical relationships

E-Commerce Single-Table Schema

All entity types (users, orders, order items, products) live in one DynamoDB table:

PK	SK	Attributes	Entity Type
`USER#u001`	`PROFILE#u001`	name, email, created_at	User profile
`USER#u001`	`ORDER#2024-01-15#ord001`	total, status	Order header
`USER#u001`	`ORDER#2024-01-15#ord002`	total, status	Order header
`ORDER#ord001`	`ITEM#sku123`	quantity, price	Order item
`ORDER#ord001`	`ITEM#sku456`	quantity, price	Order item
`PRODUCT#sku123`	`METADATA`	name, description, price	Product

javascript

// Access patterns enabled by this schema:

// 1. Get user profile
{ PK: "USER#u001", SK: "PROFILE#u001" }

// 2. Get all orders for a user (sorted by date)
{
  KeyConditionExpression: "PK = :pk AND begins_with(SK, :prefix)",
  ExpressionAttributeValues: { ":pk": "USER#u001", ":prefix": "ORDER#" }
}

// 3. Get all items in an order
{
  KeyConditionExpression: "PK = :pk AND begins_with(SK, :prefix)",
  ExpressionAttributeValues: { ":pk": "ORDER#ord001", ":prefix": "ITEM#" }
}

// 4. Get product metadata
{ PK: "PRODUCT#sku123", SK: "METADATA" }

GSI Overloading

Global Secondary Indexes (GSIs) add alternative access patterns. By using generic attribute names (GSI1PK, GSI1SK), you can answer multiple query patterns with one GSI:

PK	SK	GSI1PK	GSI1SK	Entity
`USER#u001`	`ORDER#2024-01-15#ord001`	`STATUS#pending`	`2024-01-15T10:00:00`	Order
`USER#u002`	`ORDER#2024-01-16#ord003`	`STATUS#shipped`	`2024-01-16T09:00:00`	Order

javascript

// GSI: find all pending orders across all users, sorted by time
{
  IndexName: "GSI1",
  KeyConditionExpression: "GSI1PK = :status",
  ExpressionAttributeValues: { ":status": "STATUS#pending" },
  ScanIndexForward: true  // ascending by GSI1SK (time)
}

Access Pattern Planning

Design your DynamoDB schema by listing access patterns first:

Access Pattern	PK	SK	GSI?
Get user by ID	`USER#{id}`	`PROFILE#{id}`	No
Get orders for user	`USER#{id}`	`ORDER#*`	No
Get items for order	`ORDER#{id}`	`ITEM#*`	No
Get orders by status	—	—	GSI1: `STATUS#{status}`
Get product by SKU	`PRODUCT#{sku}`	`METADATA`	No

Never Scan a DynamoDB Table

Scan reads every item in the table, consuming all provisioned capacity. Design every access pattern to use Query (with PK) or GetItem (with PK + SK). If you cannot avoid a scan, consider whether DynamoDB is the right tool or whether you need a GSI for that access pattern.

Cassandra: Partition and Clustering Keys

Cassandra is a wide-column store designed for massive write throughput across multiple data centers. Its data model is fundamentally different from both SQL and DynamoDB.

Partition Routing

Schema Design: Primary Key Anatomy

sql

-- Cassandra table: sensor readings
CREATE TABLE sensor_readings (
  sensor_id   UUID,       -- Partition key: determines node
  bucket      TEXT,       -- Partition key component: time bucketing (2024-01)
  timestamp   TIMESTAMP,  -- Clustering key: sort order within partition
  value       DOUBLE,
  unit        TEXT,
  PRIMARY KEY ((sensor_id, bucket), timestamp)  -- Composite partition + clustering
) WITH CLUSTERING ORDER BY (timestamp DESC);

-- Query: last 100 readings for sensor X in January 2024
SELECT * FROM sensor_readings
WHERE sensor_id = ? AND bucket = '2024-01'
ORDER BY timestamp DESC
LIMIT 100;

Wide Partition Anti-Pattern

A partition in Cassandra can store up to 2 billion cells (columns × rows). A partition that grows without bound becomes a hot partition — all reads and writes for that partition land on one node, creating a bottleneck.

sql

-- ANTI-PATTERN: unbounded partition
-- All posts for a popular user (millions of posts) → one partition → hot node
CREATE TABLE user_posts (
  user_id UUID,
  post_id TIMEUUID,
  content TEXT,
  PRIMARY KEY (user_id, post_id)
);

-- CORRECT: time-bucketed partition
-- Each partition holds at most one month of posts per user
CREATE TABLE user_posts (
  user_id   UUID,
  month     TEXT,   -- '2024-01', '2024-02', etc.
  post_id   TIMEUUID,
  content   TEXT,
  PRIMARY KEY ((user_id, month), post_id)
) WITH CLUSTERING ORDER BY (post_id DESC);

The time-bucketing strategy limits partition size while keeping related data co-located. The trade-off: queries spanning multiple months must query multiple partitions and merge results.

Tombstones: The Hidden Performance Killer

In Cassandra, DELETE does not remove data — it writes a tombstone marker. During reads, Cassandra must scan all tombstones in a partition before returning live rows. Heavy delete workloads cause tombstone accumulation that severely degrades read performance.

Concrete thresholds: Cassandra's default tombstone_warn_threshold is 1,000 and tombstone_failure_threshold is 100,000 tombstones per query. In practice, read latency degrades noticeably at ~10K tombstones per partition (p99 jumps 5–10×). At 100K+, queries time out or fail. Discord's migration post documented partitions with millions of tombstones causing cascading GC pauses — this was a primary driver for their move to ScyllaDB.

sql

-- Check tombstone count per partition (Cassandra nodetool)
-- nodetool tablehistograms keyspace.table

-- Design to avoid tombstones:
-- Option 1: Use TTL instead of DELETE
INSERT INTO events (id, data) VALUES (?, ?)
USING TTL 2592000;  -- Auto-expires after 30 days (no tombstone at read time)

-- Option 2: Partition by time bucket, DROP old partitions
-- Instead of deleting rows, let old month-buckets fall off naturally

Tombstone GC Grace Period

By default, tombstones are only removed by compaction 10 days after creation (gc_grace_seconds = 864000). If you delete heavily and read the same partition, you will read through all accumulated tombstones every time. Tune gc_grace_seconds to match your compaction frequency, or redesign to use TTL.

Compaction Strategies

Strategy	How It Works	Best For
STCS (SizeTieredCompaction)	Merges SSTables of similar size	Write-heavy workloads, minimal reads
LCS (LeveledCompaction)	Organized levels, like RocksDB	Read-heavy, needs low read amplification
TWCS (TimeWindowCompaction)	Compacts SSTables within time windows	Time-series data with TTL

MongoDB: Schema Patterns

MongoDB 8.x and SSPL Licensing

MongoDB 8.x is the current major release (as of 2026). MongoDB uses the Server Side Public License (SSPL) — not an OSI-approved open-source license — which requires cloud providers to open-source their entire service stack if they offer MongoDB as a managed service. This is why MongoDB is absent from Debian, Fedora, and RHEL default repositories. If SSPL licensing is a concern for your organization, evaluate PostgreSQL with JSONB + pgvector as a flexible alternative.

MongoDB stores documents (JSON/BSON) in collections. Unlike DynamoDB, it supports flexible queries without predefined access patterns — but schema design still matters enormously for performance.

Embedding vs Referencing

The fundamental MongoDB schema decision: store related data in one document (embedding) or separate documents with references (referencing):

javascript

// EMBEDDING: order with line items (bounded, always fetched together)
{
  _id: ObjectId("..."),
  user_id: "u001",
  status: "shipped",
  created_at: ISODate("2024-01-15"),
  items: [
    { sku: "SKU123", qty: 2, price: 19.99 },
    { sku: "SKU456", qty: 1, price: 49.99 }
  ],
  total: 89.97
}

// REFERENCING: user with separate orders collection
// users collection:
{ _id: "u001", name: "Alice", email: "a@x.com" }

// orders collection:
{ _id: ObjectId("..."), user_id: "u001", status: "shipped", total: 89.97 }

Advanced Schema Patterns

Bucket Pattern — group time-series measurements into one document per time window to reduce document count and improve compression:

javascript

// Instead of one document per reading (millions of docs):
{ sensor_id: "s001", timestamp: ISODate("..."), value: 23.4 }

// Bucket by hour (24 readings per document for minute-level data):
{
  sensor_id: "s001",
  bucket_hour: ISODate("2024-01-15T10:00:00Z"),
  count: 60,
  measurements: {
    "00": 23.4, "01": 23.5, "02": 23.3,  // minute offsets
    // ... 57 more
  },
  min: 23.1, max: 23.8, sum: 1405.2
}

Outlier Pattern — handle documents that grow beyond normal size (e.g., a viral post with 100K comments):

javascript

// Normal blog post (comments embedded)
{
  _id: "post001",
  title: "Normal Post",
  comments: [ /* up to 100 comments */ ],
  has_overflow: false
}

// Viral post (overflow to separate collection)
{
  _id: "post999",
  title: "Viral Post",
  comments: [ /* first 50 comments */ ],
  has_overflow: true  // signal to app: fetch /comment_overflow/post999 too
}
// comment_overflow collection:
{ post_id: "post999", comments: [ /* comments 51-100K */ ] }

Schema Versioning Pattern — handle rolling schema migrations without downtime:

javascript

// Add schema_version field to track document format
{
  _id: "u001",
  schema_version: 2,
  name: "Alice Smith",        // v2: split full_name into name
  // full_name: "Alice Smith" // v1 format — application handles both
}

// Application code:
function readUser(doc) {
  if (doc.schema_version === 1) {
    return { name: doc.full_name, ...doc };
  }
  return doc;  // v2 already correct
}

MongoDB Aggregation Pipeline

javascript

// Complex aggregation: daily revenue by product category, last 30 days
db.orders.aggregate([
  // Stage 1: filter to last 30 days
  { $match: { created_at: { $gte: new Date(Date.now() - 30 * 86400000) } } },
  // Stage 2: unwind line items array
  { $unwind: "$items" },
  // Stage 3: lookup product category
  { $lookup: {
    from: "products",
    localField: "items.sku",
    foreignField: "_id",
    as: "product"
  }},
  { $unwind: "$product" },
  // Stage 4: group by date + category
  { $group: {
    _id: {
      date: { $dateToString: { format: "%Y-%m-%d", date: "$created_at" } },
      category: "$product.category"
    },
    revenue: { $sum: { $multiply: ["$items.qty", "$items.price"] } },
    order_count: { $sum: 1 }
  }},
  // Stage 5: sort by date descending
  { $sort: { "_id.date": -1, revenue: -1 } }
]);

Redis and Valkey: Beyond Caching

License Shift (2024–2026)

Redis relicensed from BSD to BSLv1/RSALv2 in March 2024. In response, the Linux Foundation chartered Valkey — a BSD-3 fork — which is now the default redis package in Fedora 42, Ubuntu 26.04 LTS, and Debian 13. Redis 8 (2025) added an AGPL option, but major cloud providers (AWS, GCP, Oracle) offer Valkey as a drop-in managed service. For new projects: prefer Valkey. For existing deployments: evaluate migration cost; the APIs are wire-compatible.

Redis (and Valkey) are most commonly used as caches, but their data structures support production patterns that would require complex SQL or separate services.

Sorted Sets: Leaderboards and Rate Limiting

python

import redis
r = redis.Redis()

# Global leaderboard: O(log N) add, O(log N + K) range query
r.zadd("leaderboard:global", {"user:alice": 9850, "user:bob": 7420})

# Top 10 players
top_10 = r.zrevrange("leaderboard:global", 0, 9, withscores=True)

# Sliding window rate limiting: count requests in last 60 seconds
import time
now = time.time()
window_key = f"rate:user:alice:{int(now // 60)}"
count = r.incr(window_key)
r.expire(window_key, 120)  # Keep for 2 windows
if count > 100:
    raise RateLimitExceeded()

Redis Streams: Durable Event Log

Redis Streams (added in 5.0) provide a persistent, ordered log — similar to Kafka topics but within Redis:

python

# Producer: append event to stream
r.xadd("events:orders", {
    "event_type": "order_created",
    "order_id": "ord001",
    "user_id": "u001",
    "total": "89.97"
})

# Consumer group: competing consumers, each message processed once
r.xgroup_create("events:orders", "shipping-service", id="0", mkstream=True)

# Read new messages
messages = r.xreadgroup(
    groupname="shipping-service",
    consumername="worker-1",
    streams={"events:orders": ">"},  # ">" = only new messages
    count=10,
    block=1000  # Block up to 1 second for new messages
)

# Acknowledge after successful processing
for stream, entries in messages:
    for entry_id, data in entries:
        process_order(data)
        r.xack("events:orders", "shipping-service", entry_id)

Redis Modules and Valkey Compatibility

Module	Capability	Redis	Valkey	Use Case
RedisJSON	Native JSON storage with path queries	Yes	Community port	Config storage, user profiles, semi-structured data
RedisSearch	Full-text search, secondary indexes	Yes	Community port	Autocomplete, faceted search, filtering JSON docs
RedisTimeSeries	Time-series storage with aggregation	Yes	Partial	Metrics, IoT, monitoring dashboards
RedisBloom	Probabilistic structures (Bloom, Cuckoo, HLL)	Yes	Compatible	Unique visitor counting, duplicate detection
RedisGraph	Graph database using Cypher	Deprecated (2023)	N/A	Use Neo4j instead

Valkey Module Ecosystem

Valkey inherits Redis's module API. Community ports of RedisJSON and RedisSearch exist and are actively maintained. For managed cloud (AWS ElastiCache/MemoryDB, GCP Memorystore), check if the managed Valkey offering includes the modules you need — coverage varies by provider as of mid-2026.

Memory Management

bash

# redis.conf — production memory management
maxmemory 48gb
maxmemory-policy allkeys-lru    # Evict least recently used keys when full

# Key sizing: estimate memory before deploying
# String: 72 bytes overhead + value length
# Hash: 64 bytes overhead + (field: 64 bytes + value) × N fields
# Sorted Set: 128 bytes overhead + (64 bytes × N members)

# Monitor memory fragmentation
redis-cli INFO memory | grep -E "used_memory|fragmentation"

Persistence Options for Redis

RDB snapshots: periodic full-dataset dump. Fastest restart, data loss up to snapshot interval (5 minutes default). AOF (Append-Only File): logs every write command. Configurable fsync (always, everysec, no). everysec is the production default — at most 1 second of data loss on crash. RDB + AOF: best durability, slowest restart (replays AOF from last RDB).

NoSQL Consistency Patterns

NoSQL databases offer tunable consistency — a spectrum between eventual consistency (high availability, some data may be stale) and strong consistency (always current, higher latency).

Consistency Levels Compared

System	Level	Quorum Required	Behavior
Cassandra	`ONE`	1 replica	Fastest; may read stale data
Cassandra	`QUORUM`	(RF/2 + 1) replicas	Consistent if RF=3 and 2 nodes up
Cassandra	`ALL`	All replicas	Strongest; any replica down = failure
DynamoDB	Eventual Read	—	Default; up to ~1s stale
DynamoDB	Strongly Consistent Read	—	2× cost; always latest
MongoDB	`primary`	Primary only	Default; always latest
MongoDB	`primaryPreferred`	Primary or secondary	Falls back to secondary if primary down
MongoDB	`majority`	Majority of nodes	Consistent; slower

python

# Cassandra: per-query consistency level
from cassandra import ConsistencyLevel
from cassandra.query import SimpleStatement

# Read with QUORUM (balanced: consistent + available)
stmt = SimpleStatement(
    "SELECT * FROM orders WHERE user_id = %s",
    consistency_level=ConsistencyLevel.QUORUM
)
session.execute(stmt, [user_id])

# Write with QUORUM (most production use cases)
stmt = SimpleStatement(
    "INSERT INTO orders (id, user_id, total) VALUES (%s, %s, %s)",
    consistency_level=ConsistencyLevel.QUORUM
)

Read Repair and Hinted Handoff

Read Repair: When Cassandra reads with QUORUM, it compares versions across replicas. If replicas disagree, the latest version wins and is written back to stale replicas in the background. This is how eventual consistency converges.

Hinted Handoff: If a replica is temporarily down during a write, the coordinator stores a "hint" and replays the write when the replica comes back up. This provides durability without requiring all replicas to be available for every write.

Monitoring & Observability

Each NoSQL engine requires different monitoring approaches. Here are the essential metrics and tools per engine.

Engine	Key Metrics	Primary Tool	Critical Alert
DynamoDB	ConsumedReadCapacity, ConsumedWriteCapacity, ThrottledRequests, SuccessfulRequestLatency	AWS CloudWatch	ThrottledRequests > 0 (capacity exceeded)
Cassandra	Read/Write latency (p99), Tombstone scans/query, Pending compactions, Dropped mutations	`nodetool tablestats`, `nodetool tpstats`	Pending compactions > 50 or dropped mutations > 0
MongoDB	OpCounters, Replication lag, WiredTiger cache hit ratio, Connections current	`mongosh db.serverStatus()`, MongoDB Atlas metrics	Replication lag > 10s or cache hit ratio < 90%
Redis / Valkey	Used memory vs maxmemory, Connected clients, Keyspace hits/misses, Evicted keys	`redis-cli INFO` (works with both), Redis/Valkey Sentinel	Evicted keys > 0 (memory pressure) or hit ratio < 80%

bash

# Cassandra: check compaction and tombstone health
nodetool compactionstats
nodetool tablestats keyspace.table | grep -E "Tombstones|Partition"

# Redis: quick health check
redis-cli INFO stats | grep -E "keyspace_hits|keyspace_misses|evicted_keys"

# MongoDB: replication lag
mongosh --eval "rs.printSecondaryReplicationInfo()"

Monitoring Rule of Thumb

For any NoSQL engine: track p99 latency (not average), error/rejection rate, and capacity headroom (disk, memory, connections). Alert on thresholds, not anomalies — NoSQL workloads are inherently spiky.

Case Study: Netflix's Cassandra Deployment

Netflix is one of the world's largest Cassandra operators, running thousands of Cassandra nodes across multiple AWS regions.

Scale (2023): ~7,000 Cassandra nodes, ~2.5 petabytes of data, multiple keyspaces for different services, 99.9999% availability target.

The Core Use Case: Viewing History

Netflix's viewing history service tracks every play, pause, and resume event for every user across every device. Requirements:

Write: ~1M events/second globally
Read: user opens Netflix → load viewing history in <50ms
Retention: 2 years of history per user
Availability: must work even during AWS regional outages

Schema Design:

sql

-- Viewing history table
CREATE TABLE viewing_history (
  user_id   UUID,
  month     TEXT,      -- time bucket: '2024-01'
  show_id   UUID,
  watched_at TIMESTAMP,
  position  INT,       -- seconds watched
  completed BOOLEAN,
  PRIMARY KEY ((user_id, month), watched_at, show_id)
) WITH CLUSTERING ORDER BY (watched_at DESC)
  AND compaction = {'class': 'TWCS',  -- TimeWindowCompactionStrategy
                    'compaction_window_unit': 'DAYS',
                    'compaction_window_size': '1'};

Multi-DC Replication:

Netflix runs Cassandra with NetworkTopologyStrategy, replicating to 3 replicas per data center, across 3 AWS regions (us-east-1, us-west-2, eu-west-1). Writes use LOCAL_QUORUM (2 of 3 replicas in the local region) for low-latency writes. Cross-DC replication happens asynchronously.

The "Chaos Monkey" Validation:

Netflix's Chaos Engineering (randomly killing production instances) regularly validates that Cassandra's hinted handoff and read repair mechanisms maintain consistency after replica failures. Their experience: with proper schema design and QUORUM reads on critical paths, Cassandra maintains consistency despite frequent node failures.

Key Lessons:

TWCS compaction is essential for time-series data — it co-locates data by time window, making old-data expiration (DROP TABLE or TTL) extremely efficient
Time-bucketing by month limits partition size to manageable levels even for heavy users
LOCAL_QUORUM is the sweet spot: consistent within a region, low latency, cross-DC replication happens asynchronously
Cassandra's throughput advantage disappears for read-heavy workloads with many secondary indexes — Netflix routes complex analytics to Spark, not Cassandra

Chapter	Relevance
Ch01 — Database Landscape	NoSQL category overview and CAP theorem
Ch02 — Data Modeling for Scale	Access-pattern-driven design (applies to NoSQL)
Ch08 — Specialized Databases	When NoSQL is still not specialized enough
System Design Ch10	NoSQL selection in system design interviews

Common Mistakes

Mistake	Why It Happens	Impact	Fix
Using DynamoDB `Scan` instead of `Query`	"I need all records matching X" without setting up a GSI	Consumes all provisioned capacity; 100× more expensive than a Query	Design every access pattern as a `Query` on PK or a GSI before going live
Using Cassandra `ALLOW FILTERING` in production	"It works in dev"	Full partition scan on every query; degrades to O(N) regardless of cardinality	Redesign the table with the filter column in the partition key or clustering key
MongoDB unbounded array growth in embedded documents	"Embed for simplicity"	Document hits 16MB limit; index on the array grows without bound	Use the Outlier pattern — cap embedded array at N items, overflow to separate collection
Redis without `maxmemory` policy	Default Redis config has no eviction	Memory exhausts; Redis crashes or becomes unresponsive	Set `maxmemory` + `maxmemory-policy allkeys-lru` in every production Redis config
Cassandra tombstone accumulation from frequent deletes	Using DELETE for TTL-like cleanup	Read latency degrades 10-100× as tombstone scans grow	Replace explicit DELETEs with TTL (`USING TTL`) and TWCS compaction strategy

Practice Questions

Beginner

DynamoDB Access Pattern: You are designing a DynamoDB table for a messaging app. Users send messages to conversations. Access patterns: (1) get all messages in a conversation, sorted by time; (2) get all conversations for a user. Design the PK/SK structure and explain what entity types you would store in a single table.
Model Answer
PK: `CONV#{conversation_id}`, SK: `MSG#{timestamp}#{message_id}` for messages. For the user's conversations list: PK: `USER#{user_id}`, SK: `CONV#{last_message_timestamp}#{conversation_id}`. Both entity types in one table. For access pattern (2), a GSI where GSI1PK = `USER#{user_id}` and GSI1SK = `CONV#{last_message_timestamp}` gives conversations sorted by most recent activity.
Cassandra Partition Design: A Cassandra table stores IoT temperature readings with schema PRIMARY KEY (device_id, timestamp). After 6 months, queries for popular devices take 5+ seconds. What is the problem and how would you fix it?
Model Answer
Wide partition problem: all readings for a popular device accumulate in one partition (potentially billions of rows). Fix: add a time-bucket component to the partition key — `PRIMARY KEY ((device_id, month), timestamp)`. Now each partition holds at most one month of readings. The trade-off: queries spanning multiple months must hit multiple partitions and merge results in the application.

Intermediate

MongoDB Embedding Decision: A blog platform stores posts and comments. Posts average 5–20 comments, but viral posts can have 50,000+ comments. Should you embed comments in the post document or reference them? Design a schema that handles both cases.
Model Answer
Use the Outlier Pattern: embed the first N comments (e.g., 50) in the post document for normal posts. Add `has_overflow: true` flag when comments exceed 50 and store additional comments in a separate `comment_overflow` collection keyed by `post_id`. Application fetches overflow conditionally. This keeps the common case (< 50 comments) fast with no extra queries, while handling viral posts without hitting MongoDB's 16MB document size limit.
Redis vs Cassandra for Session Storage: Your authentication service needs to store user sessions. Requirements: 10M active sessions, each ~2KB, 100K reads/second, 20K writes/second, sessions expire after 24 hours, must survive a Redis restart. Compare Redis vs Cassandra for this use case.
Model Answer
Redis: 10M × 2KB = 20GB — fits comfortably in memory. With AOF `everysec` persistence, survives restart with at most 1 second of session loss (acceptable for session tokens that can be re-authenticated). Read at 100K/s at sub-millisecond latency. TTL handles expiration automatically — no DELETE needed. Cassandra: overkill for this use case — better for petabyte scale. The 24-hour TTL and sub-millisecond latency requirement favor Redis. Use Redis with RDB + AOF persistence and a standby replica for failover.

Advanced

DynamoDB Hot Partition: Your DynamoDB table stores product inventory. During a flash sale, product SKU-IPHONE-15 receives 50,000 reads/second. A single DynamoDB partition supports ~3,000 RCU/second. DynamoDB auto-scaling cannot respond fast enough. Design a solution without changing the application's API.
Model Answer
Write sharding + caching: (1) For the hot item, create N copies in DynamoDB with keys `PRODUCT#SKU-IPHONE-15#shard-{0..9}`. Reads randomly pick a shard, distributing load across 10 partitions (effective 30,000 RCU). Writes update all N shards (fan-out). (2) More practical at flash sale scale: cache the inventory in ElastiCache (Redis) with a short TTL (1–5 seconds). The application checks Redis first; cache miss triggers a DynamoDB read and repopulates cache. Redis can handle 100K+ reads/second at sub-millisecond latency. (3) Use DynamoDB DAX (in-memory cache layer) which is transparent to the application.

References

AWS DynamoDB Developer Guide — Best Practices
Alex DeBrie — The DynamoDB Book
Cassandra Data Modeling Documentation
MongoDB Schema Design Patterns
Redis Streams Introduction
Netflix Tech Blog — Cassandra at Netflix
Netflix Tech Blog — Chaos Engineering
"Designing Data-Intensive Applications" — Kleppmann, Ch. 2 (Data Models), Ch. 5 (Replication)
Cassandra: The Definitive Guide — Carpenter & Hewitt

Part 1: Foundations

Part 2: Engine Deep Dives

Part 3: Scaling & Operations

Part 4: Real-World Design

Chapter 7: NoSQL at Scale

Mind Map

DynamoDB: Single-Table Design

The Single-Table Principle

E-Commerce Single-Table Schema

GSI Overloading

Access Pattern Planning

Cassandra: Partition and Clustering Keys

Partition Routing

Schema Design: Primary Key Anatomy

Wide Partition Anti-Pattern

Tombstones: The Hidden Performance Killer

Compaction Strategies

MongoDB: Schema Patterns

Embedding vs Referencing

Advanced Schema Patterns

MongoDB Aggregation Pipeline

Redis and Valkey: Beyond Caching

Sorted Sets: Leaderboards and Rate Limiting

Redis Streams: Durable Event Log

Redis Modules and Valkey Compatibility

Memory Management

NoSQL Consistency Patterns

Consistency Levels Compared

Read Repair and Hinted Handoff

Monitoring & Observability

Case Study: Netflix's Cassandra Deployment

Common Mistakes

Practice Questions

Beginner

Intermediate

Advanced

References

Chapter 7: NoSQL at Scale ​

Mind Map ​

DynamoDB: Single-Table Design ​

The Single-Table Principle ​

E-Commerce Single-Table Schema ​

GSI Overloading ​

Access Pattern Planning ​

Cassandra: Partition and Clustering Keys ​

Partition Routing ​

Schema Design: Primary Key Anatomy ​

Wide Partition Anti-Pattern ​

Tombstones: The Hidden Performance Killer ​

Compaction Strategies ​

MongoDB: Schema Patterns ​

Embedding vs Referencing ​

Advanced Schema Patterns ​

MongoDB Aggregation Pipeline ​

Redis and Valkey: Beyond Caching ​

Sorted Sets: Leaderboards and Rate Limiting ​

Redis Streams: Durable Event Log ​

Redis Modules and Valkey Compatibility ​

Memory Management ​

NoSQL Consistency Patterns ​

Consistency Levels Compared ​

Read Repair and Hinted Handoff ​

Monitoring & Observability ​

Case Study: Netflix's Cassandra Deployment ​

Related Chapters ​

Common Mistakes ​

Practice Questions ​

Beginner ​

Intermediate ​

Advanced ​

References ​

Chapter 7: NoSQL at Scale

Mind Map

DynamoDB: Single-Table Design

The Single-Table Principle

E-Commerce Single-Table Schema

GSI Overloading

Access Pattern Planning

Cassandra: Partition and Clustering Keys

Partition Routing

Schema Design: Primary Key Anatomy

Wide Partition Anti-Pattern

Tombstones: The Hidden Performance Killer

Compaction Strategies

MongoDB: Schema Patterns

Embedding vs Referencing

Advanced Schema Patterns

MongoDB Aggregation Pipeline

Redis and Valkey: Beyond Caching

Sorted Sets: Leaderboards and Rate Limiting

Redis Streams: Durable Event Log

Redis Modules and Valkey Compatibility

Memory Management

NoSQL Consistency Patterns

Consistency Levels Compared

Read Repair and Hinted Handoff

Monitoring & Observability

Case Study: Netflix's Cassandra Deployment

Related Chapters

Common Mistakes

Practice Questions

Beginner

Intermediate

Advanced

References