Chapter 4: Transactions & Concurrency Control

"A transaction is a promise that either everything happens or nothing happens. Keeping that promise under concurrent load is the hardest problem in database engineering."

Mind Map

Prerequisites

This chapter assumes you understand WAL mechanics (Ch01) and the schema design patterns that drive concurrency requirements (Ch02). Review those first if needed.

Overview

Most application developers interact with transactions through a simple mental model: BEGIN, do some work, COMMIT. If something goes wrong, ROLLBACK. This model is correct for simple cases, but it breaks down when multiple transactions run concurrently — which is the normal state of any production database.

The core tension in concurrency control: higher isolation guarantees fewer anomalies but costs throughput. Every isolation level represents a deliberate trade-off. This chapter explains precisely what each level guarantees, what PostgreSQL's MVCC implementation does under the hood, and how to implement correct concurrency patterns in application code.

ACID Properties

ACID is a set of properties that define what a "correct" transaction means.

Property	Guarantee	Mechanism
Atomicity	All operations in a transaction succeed, or none do	WAL + rollback log
Consistency	Transaction brings database from one valid state to another	Constraints, triggers, application logic
Isolation	Concurrent transactions appear to execute serially	MVCC + locking
Durability	Committed data survives crashes	WAL fsynced before COMMIT returns

Consistency Is Not Guaranteed by the Database Alone

Atomicity, isolation, and durability are enforced by PostgreSQL. Consistency depends on correct application logic + proper constraint definitions. A transaction that atomically transfers $100 from an account with $50 balance violates consistency — but only if the application or database constraint catches it.

Isolation Levels Deep Dive

SQL defines four isolation levels. Each prevents a different set of concurrency anomalies.

The Anomalies

Before covering the levels, understand what anomalies you are protecting against:

Dirty Read: Transaction A reads data written by Transaction B that has not yet committed. If B rolls back, A has read data that never existed.

Non-Repeatable Read: Transaction A reads a row. Transaction B updates and commits that row. Transaction A reads the row again — different value. The same SELECT returns different results within one transaction.

Phantom Read: Transaction A queries "all orders > $100." Transaction B inserts a new order > $100 and commits. Transaction A re-runs the same query — sees an extra row. New rows appeared mid-transaction.

Write Skew: Two transactions both read the same set of rows, both decide to write based on what they read, and their combined writes produce an invalid state that neither transaction alone would have caused.

Lost Update: Two transactions both read a row, both modify it, and both commit. The second commit overwrites the first — one update is silently lost.

Isolation Level Guarantees

Isolation Level	Dirty Read	Non-Repeatable Read	Phantom Read	Write Skew
READ UNCOMMITTED	Possible	Possible	Possible	Possible
READ COMMITTED	Prevented	Possible	Possible	Possible
REPEATABLE READ	Prevented	Prevented	Possible*	Possible
SERIALIZABLE	Prevented	Prevented	Prevented	Prevented

*PostgreSQL's REPEATABLE READ also prevents phantom reads due to MVCC snapshot semantics. Standard SQL allows phantoms at REPEATABLE READ.

READ COMMITTED (PostgreSQL default)

Each statement in a transaction sees data committed immediately before the statement starts. A long-running transaction sees different snapshots per statement.

sql

-- Transaction A (READ COMMITTED - default)
BEGIN;
SELECT balance FROM accounts WHERE id = 1;  -- sees $100
-- Transaction B commits: UPDATE accounts SET balance = 50 WHERE id = 1;
SELECT balance FROM accounts WHERE id = 1;  -- now sees $50 (non-repeatable read!)
COMMIT;

READ COMMITTED is the PostgreSQL default because it provides the best performance (minimal blocking) while preventing the most dangerous anomaly (dirty reads). Most web applications using simple request-scoped transactions work correctly at this level.

REPEATABLE READ

Each transaction sees a snapshot of the database as of the first statement in the transaction. Subsequent reads of the same row always return the same value.

sql

-- Transaction A (REPEATABLE READ)
BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;
SELECT balance FROM accounts WHERE id = 1;  -- sees $100
-- Transaction B commits: UPDATE accounts SET balance = 50 WHERE id = 1;
SELECT balance FROM accounts WHERE id = 1;  -- still sees $100 (snapshot from txn start)
COMMIT;

Use REPEATABLE READ for: generating consistent reports, analytics queries that span multiple statements, any operation that needs a stable view of the database.

SERIALIZABLE with SSI

SERIALIZABLE guarantees that the outcome of concurrent transactions is equivalent to some serial (one-after-another) execution. PostgreSQL implements this via Serializable Snapshot Isolation (SSI), which tracks read/write dependencies between transactions and aborts transactions that would produce anomalous results.

sql

-- Classic write skew example: on-call scheduling
-- Rule: at least one doctor must always be on call
-- doctor_1 is on call, doctor_2 is on call

-- Transaction A: doctor_1 removes herself (sees doctor_2 is on call)
BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;
SELECT COUNT(*) FROM on_call WHERE date = '2024-01-15';  -- returns 2
UPDATE on_call SET active = false WHERE doctor_id = 1 AND date = '2024-01-15';
COMMIT;  -- might be aborted by SSI

-- Transaction B (concurrent): doctor_2 removes herself (sees doctor_1 is on call)
BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;
SELECT COUNT(*) FROM on_call WHERE date = '2024-01-15';  -- also returns 2
UPDATE on_call SET active = false WHERE doctor_id = 2 AND date = '2024-01-15';
COMMIT;  -- SSI detects the conflict, aborts one transaction

-- Result: one doctor remains on call

SSI detects the conflict by tracking that both transactions read the on-call count (a "predicate read") and both wrote to on_call — a read-write conflict cycle. One is aborted with ERROR: could not serialize access due to read/write dependencies among transactions.

Handle Serialization Failures

Applications using SERIALIZABLE must handle SQLSTATE 40001 (serialization failure) by retrying the transaction. This is not optional — SSI will abort transactions with detected conflicts. Libraries like psycopg2 expose this as TransactionRollbackError.

Version Note

PostgreSQL examples verified against PostgreSQL 17/18. The current major release is PostgreSQL 18 (18.4, May 2026); v19 is in beta (GA expected September 2026). Check the release notes for your version.

MVCC Deep Dive

PostgreSQL implements isolation levels through Multi-Version Concurrency Control (MVCC). The key principle: writers never block readers, and readers never block writers. Each transaction sees its own consistent snapshot of the database.

Tuple Versioning: xmin and xmax

Every row (tuple) in PostgreSQL has two hidden system columns:

xmin: the transaction ID that inserted this tuple
xmax: the transaction ID that deleted or updated this tuple (0 if still current)

sql

-- See the hidden system columns
SELECT xmin, xmax, id, username, email
FROM users
WHERE id = 1;

-- xmin=245 xmax=0: row was inserted by txn 245, still live
-- xmin=245 xmax=301: row was inserted by txn 245, deleted/updated by txn 301

Visibility Rules

When Transaction T reads a row, it applies this visibility check:

If xmin transaction committed before T's snapshot → the row was inserted and visible
If xmax = 0 → the row has not been deleted → visible
If xmax transaction committed before T's snapshot → the row was deleted before T started → not visible
If xmax transaction is still in progress → T cannot see the deletion yet → visible

This means PostgreSQL stores multiple versions of the same logical row:

users table (physical):
xmin=100, xmax=0,   id=1, username='alice',    email='old@example.com'  ← current
xmin=200, xmax=250, id=1, username='alice',    email='alice@example.com' ← deleted by txn 250
xmin=250, xmax=0,   id=1, username='alice',    email='new@example.com'   ← newest version

Transaction 260 would see the row with email='new@example.com' (xmin=250, xmax=0, xmin committed before 260's snapshot).

Vacuum: Reclaiming Dead Tuples

Old row versions (where xmax is set and the transaction committed) accumulate as "dead tuples." They waste disk space and slow down sequential scans. VACUUM removes them.

sql

-- See dead tuples accumulating
SELECT relname, n_live_tup, n_dead_tup, last_autovacuum
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC;

-- Manual vacuum (autovacuum usually handles this)
VACUUM ANALYZE users;

-- VACUUM FULL: reclaims space fully but rewrites the table (use sparingly)
VACUUM FULL users;

Autovacuum Tuning

Autovacuum runs automatically when n_dead_tup > autovacuum_vacuum_threshold + autovacuum_vacuum_scale_factor * n_live_tup.

For high-churn tables (like sessions, events, rate_limit_counters):

sql

-- Per-table autovacuum tuning (overrides global settings)
ALTER TABLE sessions SET (
    autovacuum_vacuum_scale_factor = 0.01,   -- trigger at 1% dead tuples (default 20%)
    autovacuum_vacuum_threshold = 100,        -- trigger after 100 dead tuples (default 50)
    autovacuum_analyze_scale_factor = 0.005   -- analyze at 0.5% changes
);

Transaction ID Wraparound: The Hidden Danger

PostgreSQL's transaction IDs are 32-bit integers, so they wrap around after ~2 billion transactions. If a table's xmin values are not vacuumed before wraparound, PostgreSQL can lose data. VACUUM prevents this by "freezing" old tuples (setting xmin to a special value). Monitor pg_database.datfrozenxid — if it approaches autovacuum_freeze_max_age (default 200M), autovacuum will aggressively vacuum all tables. This can cause I/O spikes on large databases.

Locking in Practice

MVCC handles read/write conflicts. Locking handles write/write conflicts.

Row-Level Locks

PostgreSQL acquires row-level locks automatically for UPDATE, DELETE, and SELECT FOR UPDATE.

sql

-- SELECT FOR UPDATE: lock the row for subsequent update
-- Other transactions trying to UPDATE this row will wait
BEGIN;
SELECT * FROM inventory WHERE product_id = 42 FOR UPDATE;
-- Now safely update knowing no other transaction can modify this row
UPDATE inventory SET quantity = quantity - 1 WHERE product_id = 42;
COMMIT;

-- SELECT FOR UPDATE SKIP LOCKED: skip rows that are already locked
-- Useful for job queues: each worker picks a different job
SELECT * FROM jobs
WHERE status = 'pending'
ORDER BY priority DESC
LIMIT 1
FOR UPDATE SKIP LOCKED;

Advisory Locks

Application-level locks that are not tied to any database object. PostgreSQL stores them in shared memory, not in tables.

sql

-- Acquire an advisory lock by integer key
-- Useful for: preventing concurrent runs of the same cron job
SELECT pg_try_advisory_lock(12345);  -- returns true if acquired, false if already held

-- Release the lock
SELECT pg_advisory_unlock(12345);

-- Session-scoped advisory lock (auto-released on disconnect)
SELECT pg_advisory_lock(hashtext('daily_report_job'));
-- ... run daily report ...
SELECT pg_advisory_unlock(hashtext('daily_report_job'));

Deadlock Detection

A deadlock occurs when two transactions each hold a lock the other needs. PostgreSQL detects deadlock cycles and aborts one transaction automatically.

Prevention: Always acquire locks in the same order. If your code updates both accounts.balance and transactions.amount, always update the account first, transaction second — never the reverse.

sql

-- Deadlock-prone: different lock ordering in concurrent transactions
-- Transaction A: UPDATE accounts WHERE id=1, then UPDATE accounts WHERE id=2
-- Transaction B: UPDATE accounts WHERE id=2, then UPDATE accounts WHERE id=1

-- Safe: consistent ordering (always lower id first)
UPDATE accounts SET balance = balance - 100 WHERE id = LEAST($1, $2);
UPDATE accounts SET balance = balance + 100 WHERE id = GREATEST($1, $2);

Optimistic Concurrency Control

Pessimistic locking (SELECT FOR UPDATE) works well when conflicts are frequent. When conflicts are rare, optimistic concurrency control (OCC) performs better by avoiding locks entirely and detecting conflicts only at commit time.

Version Column Pattern

sql

-- Add a version column to the table
ALTER TABLE products ADD COLUMN version INT NOT NULL DEFAULT 1;

-- Application code: read, then update with version check
BEGIN;
SELECT id, price, version FROM products WHERE id = $1;
-- ... application modifies price ...
UPDATE products
SET price = $new_price, version = version + 1
WHERE id = $1 AND version = $expected_version;

-- Check if any row was updated
-- If rowcount = 0: another transaction updated first → retry or report conflict
GET DIAGNOSTICS updated = ROW_COUNT;
IF updated = 0 THEN
    ROLLBACK;
    -- return conflict error to application
END IF;
COMMIT;

The WHERE version = $expected_version clause is the atomic compare-and-swap. If another transaction has incremented version since the read, the UPDATE matches zero rows, signaling the conflict.

When to Use OCC vs PCC

Approach	Best When	Downside
Pessimistic (SELECT FOR UPDATE)	Conflicts are frequent; can't afford retries	Locks held for transaction duration; deadlock risk
Optimistic (version column)	Conflicts are rare; retries are cheap	Retry logic required; wasted work on conflict
SERIALIZABLE	Complex business rules with write skew risk	Highest abort rate; requires retry logic

Distributed Transactions

When a transaction spans multiple databases (or services), the local transaction mechanisms above are insufficient. Two approaches dominate production systems.

Two-Phase Commit (2PC)

2PC is a protocol that ensures all participants in a distributed transaction either all commit or all abort.

2PC problems:

Blocking: If the coordinator crashes after PREPARE but before COMMIT/ABORT, all participants are blocked (holding locks) until coordinator recovers
Performance: Two round-trips + fsync at each participant before commit
Availability: Cannot commit if any participant is unavailable

PostgreSQL supports 2PC via PREPARE TRANSACTION / COMMIT PREPARED / ROLLBACK PREPARED, but it is rarely used directly in applications — distributed databases (CockroachDB, Spanner) implement it internally.

The Saga Pattern

The Saga pattern replaces a distributed ACID transaction with a sequence of local transactions, each with a corresponding compensating transaction for rollback.

Saga trade-offs:

Eventually consistent: Between steps, the system may be in a partial state visible to other transactions
No atomicity guarantee: If a compensating transaction fails, manual intervention may be required
Simpler failure model: No coordinator crash problem; each service recovers independently

Sagas are the dominant pattern for microservices transactions. Two implementation styles:

Choreography: Each service publishes events that trigger the next step (no central coordinator)
Orchestration: A dedicated saga orchestrator service tracks state and calls each step

Case Study: Stripe's Payment Idempotency

Stripe processes millions of payments daily. Network failures and client timeouts mean any operation can be retried — but a payment must not be charged twice. Stripe's idempotency implementation is a canonical example of applying concurrency control to real-world distributed systems.

The Problem: Client sends POST /v1/charges to create a $100 charge. Network times out after 30 seconds. Client doesn't know if the charge succeeded. Client retries. Without idempotency, the customer gets charged $200.

Stripe's Solution: Idempotency Keys

sql

CREATE TABLE idempotency_keys (
    idempotency_key VARCHAR(255)    NOT NULL,
    user_id         BIGINT          NOT NULL,
    request_method  VARCHAR(10)     NOT NULL,
    request_path    VARCHAR(255)    NOT NULL,
    request_params  JSONB           NOT NULL,
    response_code   INT,
    response_body   JSONB,
    -- Lock state during in-flight processing
    locked_at       TIMESTAMPTZ,
    -- Stored result for replay
    recovery_point  VARCHAR(50),
    created_at      TIMESTAMPTZ     NOT NULL DEFAULT NOW(),
    PRIMARY KEY (user_id, idempotency_key)
);

The Flow:

Key details:

The locked_at column prevents two concurrent identical requests from both processing
The entire operation (lookup + insert + lock) is wrapped in a transaction with SELECT FOR UPDATE on the idempotency key
Stripe retains idempotency keys for 24 hours — after that, the same key creates a new charge

MVCC interaction: The SELECT FOR UPDATE on the idempotency key row ensures that even under READ COMMITTED isolation, concurrent requests with the same key serialize. One request wins the lock, completes, releases. The second request then reads the completed response and replays it.

The lesson: Production payment systems must be designed for failure from the start. Idempotency is not an optimization — it is a correctness requirement. The database mechanisms (transactions, row locks, MVCC) are the tools; the design pattern is the architecture.

Chapter	Relevance
Ch01 — The Database Landscape	WAL mechanics that make durability possible
Ch02 — Data Modeling for Scale	Schema patterns for idempotency and event sourcing
Ch03 — Indexing Strategies	How MVCC dead tuples interact with index maintenance
Ch09 — Replication & High Availability	How transaction isolation interacts with replication lag
System Design Ch15 — Data Replication & Consistency	CAP theorem and consistency models at the distributed level

Common Mistakes

Mistake	Why It Happens	Impact	Fix
Using SERIALIZABLE for all transactions	"Maximum safety = always correct"	Unnecessary abort rate; every transaction incurs SSI tracking overhead	Use READ COMMITTED (default) for most; upgrade to SERIALIZABLE only for complex business invariants with write skew risk
Holding locks during external API calls	Transaction wraps an HTTP call to payment provider	Lock held for seconds; timeouts cascade to deadlocks	Acquire data, close transaction, make external call, open new transaction to record result
Ignoring deadlock retry logic	"Deadlocks are rare"	Unhandled `SQLSTATE 40P01` crashes the request	Retry on `serialization_failure` (40001) and `deadlock_detected` (40P01) with exponential backoff
Using SELECT FOR UPDATE on high-contention rows without SKIP LOCKED	Unfamiliar with SKIP LOCKED	Job queue serializes — only one worker processes at a time	Use `FOR UPDATE SKIP LOCKED` for queue-style workloads

Practice Questions

Beginner

Isolation Level Selection: A bank transfer system deducts $500 from account A and adds $500 to account B. Which isolation level should you use, and what anomaly are you specifically protecting against? Show the SQL with the correct isolation level.
Model Answer
For a simple two-row update like a bank transfer, READ COMMITTED with `SELECT FOR UPDATE` on both rows is sufficient (prevents lost updates). SERIALIZABLE protects against more complex write skew scenarios. The SQL: `BEGIN; SELECT * FROM accounts WHERE id IN (A, B) FOR UPDATE; UPDATE accounts SET balance = balance - 500 WHERE id = A; UPDATE accounts SET balance = balance + 500 WHERE id = B; COMMIT;`
MVCC Basics: A row has xmin=100, xmax=0. Transaction 200 runs a SELECT. What does it see? Now transaction 300 updates the row (xmax=300, a new tuple has xmin=300, xmax=0). Transaction 200 runs SELECT again. What does it see under REPEATABLE READ vs READ COMMITTED? Why?
Model Answer
Under REPEATABLE READ: transaction 200 took a snapshot at its start (before txn 300 committed). It still sees the old version (xmin=100, xmax=0) — snapshot isolation prevents the non-repeatable read. Under READ COMMITTED: each SELECT takes a new snapshot; the second SELECT sees the new version (xmin=300, xmax=0) because txn 300 committed before the second SELECT ran.

Intermediate

Write Skew: A hospital application tracks which doctors are on call. The rule is: at least one doctor must be on call at all times. Two doctors both query "how many doctors are on call" (answer: 1) and both decide to go off call. Design a solution that prevents this using each of: (a) application-level locking, (b) optimistic concurrency control, (c) SERIALIZABLE isolation.
Model Answer
(a) Advisory lock: `SELECT pg_advisory_lock(hashtext('on_call_update'))` before any read-modify-write on the on_call table. (b) OCC: add a version column on a `on_call_schedule` row; both doctors read version=5, first commit sets version=6, second commit sees version mismatch and retries with fresh count. (c) SERIALIZABLE: use `BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE`; SSI detects the read-write dependency and aborts one transaction.
Deadlock Prevention: An order processing system updates two tables in every transaction: orders and line_items. Occasionally, deadlocks occur. The code currently updates line_items first, then orders. What is causing the deadlocks and what is the fix?
Model Answer
Deadlocks occur when two transactions acquire locks in different orders. If Transaction A updates order_id=1 line items then order_id=1 header, and Transaction B updates order_id=1 header then order_id=1 line items, they deadlock. Fix: always acquire locks in the same order — always update `orders` first, then `line_items`. Or: use `SELECT ... FOR UPDATE` on the `orders` row first to serialize all access at the beginning of the transaction.

Advanced

Distributed Transaction Design: An e-commerce checkout flow spans three services: (a) InventoryService decrements stock, (b) PaymentService charges the credit card, (c) OrderService creates the order record. Design a Saga implementation (choreography or orchestration) that handles: successful checkout, payment failure (must release inventory), order creation failure (must refund payment + release inventory). Include the compensating transactions and how you ensure each step is idempotent.
Model Answer
Use orchestration (simpler failure tracking). Saga states: STARTED → INVENTORY_RESERVED → PAYMENT_CHARGED → ORDER_CREATED → COMPLETED. Compensating transactions: ORDER_FAILED → refund_payment → PAYMENT_REFUNDED → release_inventory → SAGA_FAILED. Idempotency: each step takes a saga_id as idempotency key; InventoryService checks if `saga_id` already has a reservation before decrementing. PaymentService checks if `saga_id` already has a charge. This ensures retries are safe.

References & Further Reading

PostgreSQL Documentation — Transaction Isolation
PostgreSQL Documentation — MVCC
"Designing Data-Intensive Applications" — Chapter 7: Transactions
Serializable Snapshot Isolation in PostgreSQL — Dan R. K. Ports & Kevin Grittner (VLDB 2012)
Stripe: Designing Robust and Predictable APIs with Idempotency
Saga Pattern for Microservices — Chris Richardson
Two-Phase Commit Protocol — PostgreSQL Documentation
A Critique of ANSI SQL Isolation Levels — Berenson et al. (1995) — explains write skew

Part 1: Foundations

Part 2: Engine Deep Dives

Part 3: Scaling & Operations

Part 4: Real-World Design

Chapter 4: Transactions & Concurrency Control

Mind Map

Overview

ACID Properties

Isolation Levels Deep Dive

The Anomalies

Isolation Level Guarantees

READ COMMITTED (PostgreSQL default)

REPEATABLE READ

SERIALIZABLE with SSI

MVCC Deep Dive

Tuple Versioning: xmin and xmax

Visibility Rules

Vacuum: Reclaiming Dead Tuples

Autovacuum Tuning

Locking in Practice

Row-Level Locks

Advisory Locks

Deadlock Detection

Optimistic Concurrency Control

Version Column Pattern

When to Use OCC vs PCC

Distributed Transactions

Two-Phase Commit (2PC)

The Saga Pattern

Case Study: Stripe's Payment Idempotency

Common Mistakes

Practice Questions

Beginner

Intermediate

Advanced

References & Further Reading

Chapter 4: Transactions & Concurrency Control ​

Mind Map ​

Overview ​

ACID Properties ​

Isolation Levels Deep Dive ​

The Anomalies ​

Isolation Level Guarantees ​

READ COMMITTED (PostgreSQL default) ​

REPEATABLE READ ​

SERIALIZABLE with SSI ​

MVCC Deep Dive ​

Tuple Versioning: xmin and xmax ​

Visibility Rules ​

Vacuum: Reclaiming Dead Tuples ​

Autovacuum Tuning ​

Locking in Practice ​

Row-Level Locks ​

Advisory Locks ​

Deadlock Detection ​

Optimistic Concurrency Control ​

Version Column Pattern ​

When to Use OCC vs PCC ​

Distributed Transactions ​

Two-Phase Commit (2PC) ​

The Saga Pattern ​

Case Study: Stripe's Payment Idempotency ​

Related Chapters ​

Common Mistakes ​

Practice Questions ​

Beginner ​

Intermediate ​

Advanced ​

References & Further Reading ​

Chapter 4: Transactions & Concurrency Control

Mind Map

Overview

ACID Properties

Isolation Levels Deep Dive

The Anomalies

Isolation Level Guarantees

READ COMMITTED (PostgreSQL default)

REPEATABLE READ

SERIALIZABLE with SSI

MVCC Deep Dive

Tuple Versioning: xmin and xmax

Visibility Rules

Vacuum: Reclaiming Dead Tuples

Autovacuum Tuning

Locking in Practice

Row-Level Locks

Advisory Locks

Deadlock Detection

Optimistic Concurrency Control

Version Column Pattern

When to Use OCC vs PCC

Distributed Transactions

Two-Phase Commit (2PC)

The Saga Pattern

Case Study: Stripe's Payment Idempotency

Related Chapters

Common Mistakes

Practice Questions

Beginner

Intermediate

Advanced

References & Further Reading