Chapter 13: Microservices Architecture

The goal of microservices is not to make systems more complex — it is to make complexity manageable by assigning clear ownership. A well-designed microservices system is harder to build than a monolith, but far easier to evolve.
Monolith vs Microservices
Before choosing microservices, you must understand what you are trading away — and what you gain. Neither architecture is universally correct; the right choice depends on team size, deployment velocity requirements, and domain complexity.
Comparison Table
| Dimension | Monolith | Microservices |
|---|---|---|
| Deployment | Single unit — all-or-nothing releases | Independent — each service deploys separately |
| Scaling | Scale entire app even for one bottleneck | Scale individual services to demand |
| Complexity | Simple locally, complex as codebase grows | High operational complexity from day one |
| Team Autonomy | Shared codebase creates coordination overhead | Teams own their services end-to-end |
| Data Ownership | Shared database — easy joins, tight coupling | Each service owns its data — loose coupling |
| Testing | Straightforward integration testing | Service mocking, contract testing required |
| Latency | In-process calls (nanoseconds) | Network calls (milliseconds) per hop |
| Fault Isolation | One bug can crash the entire application | Failures are contained to individual services |
| Time to Market | Fast for small teams; slows with scale | Slower initial setup; faster long-term velocity |
| Observability | Single log stream, simpler tracing | Distributed tracing required (see Chapter 17) |
Rule of thumb: Start with a well-structured monolith. Migrate to microservices when team size, deployment frequency, or scaling requirements genuinely justify the operational overhead. Amazon's two-pizza rule applies: if the team that owns a service can't be fed with two pizzas, the team is too large.
Service Discovery
In a microservices system, service instances start and stop dynamically — containers are rescheduled, pods restart, and IP addresses change constantly. Service discovery solves the problem of how services find each other without hardcoded addresses.
Client-Side Discovery
The client is responsible for querying the service registry and selecting an instance using a load-balancing strategy (round-robin, least-connections, etc.).
Tools: Netflix Eureka, HashiCorp Consul, Apache Zookeeper
Tradeoff: Client gains control over load-balancing logic but must embed discovery logic in every service. Client libraries become a coupling point.
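A minimal sketch of the client side in Go, assuming a hypothetical Registry interface; real clients (Eureka + Ribbon, Consul SDKs) add caching, health filtering, and retries on stale reads:

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// Registry is a hypothetical stand-in for a Eureka/Consul client.
type Registry interface {
	Instances(service string) []string // currently healthy addresses
}

// staticRegistry fakes a registry for the example.
type staticRegistry map[string][]string

func (r staticRegistry) Instances(service string) []string { return r[service] }

// RoundRobinClient embeds discovery and load balancing in the caller:
// the defining trait of client-side discovery.
type RoundRobinClient struct {
	registry Registry
	counter  atomic.Uint64
}

// Pick queries the registry and selects an instance round-robin.
func (c *RoundRobinClient) Pick(service string) (string, error) {
	instances := c.registry.Instances(service)
	if len(instances) == 0 {
		return "", fmt.Errorf("no healthy instances for %s", service)
	}
	n := c.counter.Add(1) - 1
	return instances[n%uint64(len(instances))], nil
}

func main() {
	client := &RoundRobinClient{registry: staticRegistry{
		"order-service": {"10.0.0.1:8080", "10.0.0.2:8080"},
	}}
	for i := 0; i < 4; i++ {
		addr, _ := client.Pick("order-service")
		fmt.Println(addr) // alternates between the two instances
	}
}
```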
Server-Side Discovery
The client sends requests to a load balancer or router. The router queries the service registry and forwards the request to the appropriate instance. The client knows nothing about discovery.
Tools: AWS ALB with ECS service discovery, Kubernetes DNS + kube-proxy, NGINX Plus
Tradeoff: Simpler clients, but the load balancer becomes a critical path component. Kubernetes uses this model natively — a Service object gets a stable DNS name (order-service.default.svc.cluster.local) that resolves to the active pods.
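From the client's perspective, server-side discovery is just an HTTP call to a stable name. A sketch, assuming it runs inside a cluster that exposes order-service (the request path is illustrative):

```go
package main

import (
	"fmt"
	"io"
	"log"
	"net/http"
)

func main() {
	// The caller dials the stable Service DNS name; kube-proxy forwards
	// the connection to a healthy pod. No discovery logic in the client.
	resp, err := http.Get("http://order-service.default.svc.cluster.local/orders/42")
	if err != nil {
		log.Fatal(err) // only resolvable from inside the cluster
	}
	defer resp.Body.Close()
	body, _ := io.ReadAll(resp.Body)
	fmt.Println(string(body))
}
```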
Service Registry Tools
| Tool | Model | Consensus | Best For |
|---|---|---|---|
| Consul | CP (strong consistency) | Raft | Multi-datacenter, health checking, KV store |
| etcd | CP | Raft | Kubernetes backing store, configuration |
| Zookeeper | CP | ZAB | Legacy Hadoop/Kafka ecosystems |
| Kubernetes DNS | DNS-based | N/A (backed by etcd) | Native Kubernetes workloads |
| Eureka | AP (availability-first) | None | Netflix OSS stack, eventual consistency acceptable |
API Gateway
An API Gateway is the single entry point for all client traffic. Instead of exposing dozens of service URLs to clients, the gateway presents a unified facade and handles cross-cutting concerns so individual services don't have to.
Responsibilities
- Routing — maps public paths to internal service endpoints (see the sketch after this list)
- Authentication & Authorization — validates JWTs, API keys, OAuth tokens before traffic reaches services
- Rate Limiting — protects services from traffic spikes (see Chapter 16)
- Request Aggregation — fan-out to multiple services and compose a single response
- Protocol Translation — converts external REST calls to internal gRPC calls
- SSL Termination — decrypts HTTPS at the gateway; internal traffic uses plain HTTP or mutual TLS (mTLS) via service mesh
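To make routing and authentication concrete, here is a minimal gateway sketch in Go using the standard library's reverse proxy. The upstream addresses are assumptions, and the auth check is a placeholder for real JWT/OAuth validation:

```go
package main

import (
	"log"
	"net/http"
	"net/http/httputil"
	"net/url"
)

// route proxies a public path prefix to an internal service.
// Upstream addresses are illustrative.
func route(prefix, target string) http.Handler {
	u, err := url.Parse(target)
	if err != nil {
		log.Fatal(err)
	}
	// Strip the public prefix so /orders/42 reaches the service as /42.
	return http.StripPrefix(prefix, httputil.NewSingleHostReverseProxy(u))
}

// authenticate is a stand-in for real JWT/OAuth token validation.
func authenticate(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if r.Header.Get("Authorization") == "" {
			http.Error(w, "unauthorized", http.StatusUnauthorized)
			return // rejected before any service sees the request
		}
		next.ServeHTTP(w, r)
	})
}

func main() {
	mux := http.NewServeMux()
	mux.Handle("/orders/", route("/orders", "http://order-service:8080"))
	mux.Handle("/users/", route("/users", "http://user-service:8080"))
	// TLS would terminate here via ListenAndServeTLS; HTTP for brevity.
	log.Fatal(http.ListenAndServe(":8080", authenticate(mux)))
}
```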
Tools: AWS API Gateway, Kong, NGINX, Envoy, Traefik, Spring Cloud Gateway
Backend for Frontend (BFF) Pattern
A single API Gateway serving both mobile and web clients creates tension — mobile needs compact payloads, web needs richer data. The BFF pattern creates a dedicated gateway per client type, allowing each to evolve independently.
Real-world: Netflix maintains separate BFFs for TV, mobile, and web — each optimized for its device's screen size, network conditions, and interaction model.
Inter-Service Communication
Services must communicate. The choice between synchronous and asynchronous communication affects latency, coupling, and fault tolerance.
Synchronous Communication
The caller blocks and waits for a response. Simple to reason about but creates temporal coupling — if the downstream service is slow or unavailable, the caller is affected.
- REST over HTTP/1.1 or HTTP/2 — widely understood, good tooling, cache-friendly. Best for CRUD-style interactions with external or public-facing services. See Chapter 12 for HTTP deep-dive.
- gRPC — Protocol Buffers + HTTP/2. Strongly typed, efficient binary serialization, supports streaming. Best for internal service-to-service calls where performance matters. See Chapter 12 for gRPC deep-dive.
Asynchronous Communication
The caller publishes an event or message and continues execution. The downstream service processes the message independently. Decouples services in time and increases resilience.
- Message queues (point-to-point) — one producer, one consumer. Durable delivery. Best for task delegation (e.g., send email, process image). See Chapter 11 for message queue deep-dive.
- Event bus (pub/sub) — one producer, many consumers. Best for broadcasting domain events to multiple subscribers. See Chapter 14 for event-driven patterns.
Communication Comparison Table
| Dimension | REST | gRPC | Message Queue | Event Bus |
|---|---|---|---|---|
| Coupling | Temporal + interface | Temporal + strict types | Decoupled | Fully decoupled |
| Latency | Low (ms) | Very low (ms, binary) | Higher (async) | Higher (async) |
| Payload | JSON (human-readable) | Protobuf (binary, compact) | Any format | Any format |
| Type Safety | Optional (OpenAPI) | Strong (.proto schema) | Schema optional | Schema optional |
| Streaming | Limited (chunked transfer) | Native (4 modes) | Pull-based | Push-based |
| Error Handling | HTTP status codes | gRPC status codes | Dead-letter queues | Event replay |
| Use Case | Public APIs, CRUD | Internal high-perf calls | Task queues | Domain events |
Circuit Breaker Pattern
When Service A calls Service B synchronously, a slow or failing Service B will cause Service A's threads to pile up waiting for responses. Under load, this cascades — Service A's connection pool exhausts, its own latency rises, and the failure propagates upstream. The circuit breaker prevents this cascade.
States
Closed: Normal operation. Requests pass through. Failures are counted in a rolling window.
Open: Failure threshold exceeded. Requests are rejected immediately (fail fast) without calling the downstream service. A fallback response (cached data, default value, error) is returned. The downstream service gets time to recover.
Half-Open: After a configurable timeout, a small number of probe requests are allowed through. If they succeed, the circuit closes. If they fail, the circuit reopens.
Implementation
```
// Pseudocode: circuit breaker logic
function callWithCircuitBreaker(serviceCall, circuitBreaker):
    if circuitBreaker.state == OPEN:
        if circuitBreaker.timeoutElapsed():
            circuitBreaker.state = HALF_OPEN    // allow probe requests through
        else:
            return fallbackResponse()           // fail fast
    try:
        response = serviceCall()
        circuitBreaker.recordSuccess()
        if circuitBreaker.state == HALF_OPEN:
            circuitBreaker.state = CLOSED       // probes succeeded; resume normal traffic
        return response
    catch Exception:
        circuitBreaker.recordFailure()
        if circuitBreaker.failureRateExceeds(threshold):
            circuitBreaker.state = OPEN         // trip the breaker
        throw
```

Libraries: Netflix Hystrix (now in maintenance), Resilience4j (JVM), Polly (.NET), circuitbreaker package (Go), pybreaker (Python).
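In Go, the same state machine is available off the shelf. A sketch using the sony/gobreaker library; the threshold, sample size, and endpoint are illustrative:

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"time"

	"github.com/sony/gobreaker"
)

func main() {
	cb := gobreaker.NewCircuitBreaker(gobreaker.Settings{
		Name:    "payment-service",
		Timeout: 30 * time.Second, // how long the circuit stays open before half-open
		ReadyToTrip: func(c gobreaker.Counts) bool {
			// Trip at >=50% failures once there is a meaningful sample.
			return c.Requests >= 10 &&
				float64(c.TotalFailures)/float64(c.Requests) >= 0.5
		},
	})

	body, err := cb.Execute(func() (interface{}, error) {
		resp, err := http.Get("http://payment-service:8080/charge") // endpoint assumed
		if err != nil {
			return nil, err
		}
		defer resp.Body.Close()
		return io.ReadAll(resp.Body)
	})
	if err != nil {
		// Returns gobreaker.ErrOpenState immediately while the circuit is open.
		fmt.Println("fallback:", err)
		return
	}
	fmt.Println(string(body.([]byte)))
}
```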
Real-world: Netflix's Hystrix was born from a real incident — a single downstream service's degradation caused their entire API to become unresponsive. By wrapping every external call in a circuit breaker, Netflix ensured that no single service failure could take down the API tier.
Saga Pattern for Distributed Transactions
In a monolith with a shared database, a multi-step operation (place order → reserve inventory → charge payment → send confirmation) can be wrapped in a single ACID transaction. In microservices, each service owns its own database — there is no distributed ACID transaction.
The Saga pattern breaks a distributed transaction into a sequence of local transactions. Each step publishes an event or sends a command to trigger the next step. If a step fails, compensating transactions are executed in reverse to undo completed steps.
Choreography-Based Saga
No central coordinator. Each service listens for events and decides what to do next. Services are fully decoupled.
Advantages: No single point of failure, fully decoupled, easy to add new participants. Disadvantages: Hard to track overall transaction state; debugging requires tracing events across multiple services.
Orchestration-Based Saga
A central Saga Orchestrator drives the process by sending commands to each service and waiting for responses.
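To make the orchestration flow concrete, here is a minimal sketch in Go; the step names and the failure are illustrative, and a production orchestrator would persist saga state and retry compensations until they succeed:

```go
package main

import (
	"errors"
	"fmt"
)

// step pairs a local transaction with its compensating transaction.
type step struct {
	name       string
	action     func() error
	compensate func() error
}

// runSaga executes steps in order; on failure it compensates the
// completed steps in reverse — the core of an orchestration-based saga.
func runSaga(steps []step) error {
	for i, s := range steps {
		if err := s.action(); err != nil {
			for j := i - 1; j >= 0; j-- {
				// In production, compensations must be retried until
				// they succeed; they cannot simply be dropped.
				_ = steps[j].compensate()
			}
			return fmt.Errorf("saga aborted at %s: %w", s.name, err)
		}
	}
	return nil
}

func main() {
	err := runSaga([]step{
		{"reserve-inventory", func() error { return nil },
			func() error { fmt.Println("release inventory"); return nil }},
		{"charge-payment", func() error { return errors.New("card declined") },
			func() error { fmt.Println("refund payment"); return nil }},
		{"create-shipment", func() error { return nil },
			func() error { return nil }},
	})
	fmt.Println(err) // saga aborted at charge-payment: card declined
}
```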
Choreography vs Orchestration
| Dimension | Choreography | Orchestration |
|---|---|---|
| Coordination | Implicit via events | Explicit — central orchestrator |
| Coupling | Low — services only know events | Medium — services know orchestrator protocol |
| Visibility | Low — state distributed across services | High — orchestrator holds full state |
| Debugging | Hard — must trace events across logs | Easier — orchestrator logs full saga state |
| Single Point of Failure | None | Orchestrator (mitigate with high availability) |
| Best For | Simple, stable flows with few participants | Complex flows, many compensations, regulatory audit requirements |
Cross-reference: Choreography-based sagas rely on event-driven patterns covered in Chapter 14.
Service Mesh
As the number of microservices grows, cross-cutting networking concerns — mutual TLS, retries, timeouts, circuit breaking, observability — are duplicated in every service. A service mesh externalizes these concerns from application code into a dedicated infrastructure layer.
Sidecar Proxy Pattern
Every service instance gets a co-located sidecar proxy (e.g., Envoy). All inbound and outbound network traffic flows through the proxy, not directly to the service. The application code is unaware of the mesh.
What a Service Mesh Handles
| Concern | Without Mesh | With Mesh |
|---|---|---|
| mTLS encryption | Each service implements TLS | Sidecar handles cert rotation automatically |
| Retries & timeouts | Embedded in each service's HTTP client | Configured in mesh policy, applied uniformly |
| Circuit breaking | Each service integrates Resilience4j | Configured once in mesh, applied to all |
| Load balancing | Client-side or DNS round-robin | Sidecar uses L7 policies (least-request, zone-aware) |
| Distributed tracing | Manual instrumentation per service | Automatic trace propagation via headers |
| Traffic splitting | Custom code or feature flags | Mesh policy (e.g., 10% canary, 90% stable) |
| Observability | Each service emits metrics manually | Golden signals (latency, traffic, errors, saturation) auto-generated |
Tools: Istio (most feature-rich, higher complexity), Linkerd (lightweight, simpler operational model), Consul Connect, AWS App Mesh.
When to use a service mesh: Justified at 10+ services when operational burden of per-service networking logic exceeds the complexity of running the mesh itself. Avoid premature adoption — the control plane adds real operational overhead.
Decomposition Strategies
Deciding where to draw service boundaries is the hardest part of microservices design. Poor boundaries create chatty, tightly-coupled services worse than the monolith they replaced.
Decompose by Business Capability
Organize services around what the business does, not how the technology is structured.
A typical e-commerce system decomposed by business capability:

```
├── Catalog Service (product listing, search, pricing)
├── Order Service (order placement, order history)
├── Inventory Service (stock levels, reservations)
├── Payment Service (charging, refunds, fraud detection)
├── Shipping Service (fulfillment, tracking)
├── Notification Service (email, SMS, push)
└── Identity Service (auth, user profiles)
```

Each service maps to a business function owned by a single team. The team controls the full stack: schema, code, deployment, on-call rotation.
Decompose by Subdomain (Domain-Driven Design)
DDD introduces Bounded Contexts — explicit boundaries around a domain model. Within a bounded context, terms have precise meanings. Across boundaries, models are translated via anti-corruption layers.
A "Customer" in the Order subdomain (name, shipping address, loyalty tier) is different from a "Customer" in the Billing subdomain (billing address, payment methods, credit limit). DDD makes these differences explicit and prevents one team's model from leaking into another's.
Bounded context = natural service boundary.
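In code, the boundary shows up as separate types plus an explicit translation function (the anti-corruption layer). A sketch with illustrative fields:

```go
package main

import "fmt"

// The Billing context's model of a customer (fields illustrative).
type BillingCustomer struct {
	ID             string
	BillingAddress string
	CreditLimit    int64
}

// The Order context's model: same real-world entity, different shape.
type OrderCustomer struct {
	ID              string
	ShippingAddress string
	LoyaltyTier     string
}

// toOrderCustomer is the anti-corruption layer: an explicit translation
// so Billing's fields never leak into Order's domain logic. Fields the
// Order context owns come from its own store, not from Billing.
func toOrderCustomer(b BillingCustomer, shippingAddr, tier string) OrderCustomer {
	return OrderCustomer{ID: b.ID, ShippingAddress: shippingAddr, LoyaltyTier: tier}
}

func main() {
	b := BillingCustomer{ID: "c-1", BillingAddress: "1 Main St", CreditLimit: 5000}
	fmt.Println(toOrderCustomer(b, "2 Side St", "gold"))
}
```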
Strangler Fig Pattern
Migrating a large monolith to microservices all at once is high-risk. The Strangler Fig pattern migrates incrementally:
- Place a routing layer (API gateway or reverse proxy) in front of the monolith (see the sketch after this list)
- Identify a bounded context to extract first (choose low-risk, high-change-frequency)
- Implement the new microservice in parallel
- Route traffic for that domain to the new service
- Delete the corresponding code from the monolith
- Repeat until the monolith is empty — or small enough to keep
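The routing layer from step 1 can start as a few lines of reverse proxy. A Go sketch, with assumed addresses and /orders as the first extracted context:

```go
package main

import (
	"log"
	"net/http"
	"net/http/httputil"
	"net/url"
	"strings"
)

func proxyTo(target string) *httputil.ReverseProxy {
	u, err := url.Parse(target)
	if err != nil {
		log.Fatal(err)
	}
	return httputil.NewSingleHostReverseProxy(u)
}

func main() {
	monolith := proxyTo("http://monolith:8080")    // address assumed
	orders := proxyTo("http://order-service:8080") // first extracted context

	log.Fatal(http.ListenAndServe(":8080", http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		// Only the extracted bounded context goes to the new service;
		// all other traffic still reaches the monolith unchanged.
		if strings.HasPrefix(r.URL.Path, "/orders") {
			orders.ServeHTTP(w, r)
			return
		}
		monolith.ServeHTTP(w, r)
	})))
}
```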
The strangler fig — a tropical plant that grows around a host tree, eventually replacing it — is the namesake of this pattern. Amazon, LinkedIn, and Airbnb all used variants of this approach.
Real-World Examples
Netflix: 600+ Microservices
Netflix operates one of the most cited microservices deployments in the industry. Starting as a DVD rental monolith, Netflix began its cloud migration in 2008 and completed it by 2016. Today:
- 600+ microservices handle video streaming, recommendations, billing, and device management
- Each team owns 2–5 services end-to-end (deploy, operate, on-call)
- Eureka handles service discovery across multiple AWS regions
- Hystrix circuit breakers (now largely replaced by Resilience4j) isolate failures
- Zuul and later Spring Cloud Gateway serve as the API gateway layer
- Custom chaos engineering tooling (Chaos Monkey) randomly terminates production services to prove resilience
Key lesson: Netflix's microservices investment paid off in deployment velocity — teams deploy hundreds of times per day with independent release cycles — but required massive investment in tooling, observability, and operational culture.
Amazon: The Two-Pizza Team Rule
Amazon's CEO Jeff Bezos mandated in the early 2000s that every team must be small enough to be fed by two pizzas (~6–8 people). This rule drove the decomposition of Amazon's monolith into services that later became AWS primitives.
Every team owns its service like a product: API contract, data store, deployment pipeline, SLA. Teams communicate only through APIs — no shared databases, no shared libraries beyond approved utilities.
The result: Amazon reported (back in 2011) deploying to production every 11.6 seconds on average, with thousands of teams operating independently.
Key Takeaway
Microservices trade operational complexity for organizational scalability. A monolith is simpler to build and run at small scale; microservices become necessary when independent deployability, team autonomy, and fault isolation justify the overhead of distributed systems. Get service boundaries right by following business capabilities and DDD bounded contexts — wrong boundaries create a distributed monolith that has all the costs and none of the benefits. Invest in the platform (service discovery, API gateway, observability, circuit breakers) before you invest in splitting services.
Service Discovery Deep-Dive
The discovery section earlier in this chapter covers the two models at a high level. This deep-dive covers implementation details and failure modes.
Multi-Region Service Discovery
In multi-region deployments, service discovery must balance routing traffic to the nearest healthy instance against cross-region failover when an entire region goes down.
| Strategy | How It Works | Latency | Failover |
|---|---|---|---|
| Local-first (zone-aware) | Registry returns instances in same AZ/region first | Lowest | Automatic fallback to remote zone if local unhealthy |
| Global load balancer | DNS-based (Route 53 latency routing, Cloudflare) routes to nearest region | Low | Automatic via DNS health checks |
| Federated registries | Each region has its own Consul/etcd cluster; cross-region lookup over WAN | Medium | Manual or scripted failover |
| Service mesh multi-cluster | Istio or Linkerd cross-cluster service discovery via trust bundles | Low | Policy-driven; transparent to services |
Kubernetes multi-cluster: ServiceImport / ServiceExport (MCS API) allows a service in cluster A to be discovered by services in cluster B without shared DNS, using a multi-cluster gateway to route traffic.
Client-Side vs Server-Side Discovery
| Pattern | Pros | Cons | Example |
|---|---|---|---|
| Client-Side | Fine-grained load-balancing control, no extra hop | Discovery logic in every client; language-specific SDKs required | Netflix Eureka + Ribbon |
| Server-Side | Clients are simple; centralized routing policy | Router is a critical path dependency; extra network hop | Kubernetes DNS + kube-proxy |
| DNS-based | Universal client support; no SDK needed | Low TTL required for fast updates; no health-aware routing | AWS Route 53, Consul DNS |
| Service Mesh | Transparent to app code; policy-driven | Operational overhead of control plane | Istio, Linkerd |
Service Registry Comparison
Choosing a service registry depends on your consistency requirements, existing ecosystem, and operational complexity budget.
| Tool | Protocol | Consistency | Health Check | Watches / Notifications | Language Support | Best For |
|---|---|---|---|---|---|---|
| Consul | HTTP + DNS | CP (Raft) | HTTP, TCP, gRPC, script | Yes (blocking queries) | All via HTTP | Multi-DC, health-aware routing, KV store |
| etcd | gRPC (HTTP/2) | CP (Raft) | Lease TTL (client heartbeat) | Yes (watch API) | All via gRPC | Kubernetes backing store, config management |
| Eureka | REST (HTTP) | AP (no consensus) | HTTP heartbeat every 30s | Polling only | JVM-first; REST for others | Netflix OSS stack; availability over consistency |
| ZooKeeper | Custom binary | CP (ZAB) | Session timeout (ephemeral znodes) | Yes (watches on znodes) | Java-first | Legacy Kafka/Hadoop; strong consistency required |
Consistency trade-off in practice:
- Consul/etcd (CP): During a network partition, the minority side stops serving writes. Services in that partition cannot register or deregister, and reads may return stale data or fail outright.
- Eureka (AP): During a partition, Eureka nodes continue serving cached data. A service may appear registered even after it has crashed (30–90s stale window). Netflix accepted this: a stale entry causing a failed request is better than a complete outage.
Service Registration Flow
Self-registration vs third-party registration:
- Self-registration: The service itself calls the registry on startup and deregisters on graceful shutdown. Simple, but requires registry client logic in each service (sketched below).
- Third-party registration: An external observer (Kubernetes controller, Consul agent, ECS service scheduler) registers and deregisters on behalf of the service. The service code stays registry-agnostic.
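A sketch of self-registration using the Consul Go client (github.com/hashicorp/consul/api); the service details are illustrative. The health check lets Consul evict the instance if it stops responding, and graceful shutdown deregisters explicitly:

```go
package main

import (
	"log"
	"os"
	"os/signal"

	consul "github.com/hashicorp/consul/api"
)

func main() {
	client, err := consul.NewClient(consul.DefaultConfig())
	if err != nil {
		log.Fatal(err)
	}

	// Register on startup, with an HTTP health check Consul will poll.
	reg := &consul.AgentServiceRegistration{
		ID:      "order-service-1",
		Name:    "order-service",
		Address: "10.0.0.7", // instance address (illustrative)
		Port:    8080,
		Check: &consul.AgentServiceCheck{
			HTTP:                           "http://10.0.0.7:8080/healthz/live",
			Interval:                       "10s",
			DeregisterCriticalServiceAfter: "1m",
		},
	}
	if err := client.Agent().ServiceRegister(reg); err != nil {
		log.Fatal(err)
	}

	// Deregister on graceful shutdown so the registry stays accurate.
	stop := make(chan os.Signal, 1)
	signal.Notify(stop, os.Interrupt)
	<-stop
	client.Agent().ServiceDeregister("order-service-1")
}
```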
Cross-reference: Chapter 6 — Load Balancing covers the load-balancing algorithms (round-robin, least-connections, consistent hashing) that client-side discovery uses to pick instances. Chapter 23 covers Kubernetes DNS, which implements server-side discovery natively.
Health Check Patterns
Service registries only provide value if they reflect the true health of each instance. Three probe types cover different failure modes:
| Probe Type | Question Answered | Failure Action | Typical Check |
|---|---|---|---|
| Startup | Has the app finished initializing? | Kill and restart | HTTP 200 on /healthz/startup |
| Liveness | Is the app alive (not deadlocked/OOM)? | Kill and restart | HTTP 200 on /healthz/live |
| Readiness | Is the app ready to serve traffic? | Remove from load balancer (do NOT restart) | Check DB connection, cache, downstream deps |
Key distinction — readiness vs liveness: A service can be alive but not ready. If the database connection pool is exhausted, the service is liveness-healthy (process is running, no deadlock) but readiness-unhealthy (cannot serve requests). The right action is to pull it from the load balancer, not kill it. Confusing the two probes causes unnecessary restarts under load.
Deep health checks: Expose a /health/detailed endpoint that reports sub-component status (database, cache, downstream services). Use it for dashboards and alerting. Liveness probes should be shallow (process-level only) to avoid cascading restarts when a dependency flaps.
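The liveness/readiness distinction maps directly onto two handlers. A Go sketch; the paths follow the table above, and the readiness check assumes the database is the only critical dependency:

```go
package main

import (
	"database/sql"
	"log"
	"net/http"
)

var db *sql.DB // assume opened during startup (driver and DSN omitted)

func main() {
	// Liveness: shallow. If this handler responds, the process is alive;
	// dependencies are deliberately not checked here.
	http.HandleFunc("/healthz/live", func(w http.ResponseWriter, r *http.Request) {
		w.WriteHeader(http.StatusOK)
	})

	// Readiness: dependency-aware. Failing this pulls the instance from
	// the load balancer without restarting the process.
	http.HandleFunc("/healthz/ready", func(w http.ResponseWriter, r *http.Request) {
		if db == nil || db.PingContext(r.Context()) != nil {
			http.Error(w, "db unavailable", http.StatusServiceUnavailable)
			return
		}
		w.WriteHeader(http.StatusOK)
	})

	log.Fatal(http.ListenAndServe(":8080", nil))
}
```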
Health Check Anti-Patterns
| Anti-Pattern | Problem | Fix |
|---|---|---|
| Readiness checks all dependencies | One flapping downstream dep marks entire service unready; cascading removal from load balancer | Check only critical dependencies; use circuit breaker for non-critical ones |
| Liveness checks include DB | DB timeout triggers container restart loop; connection pool exhaustion self-heals, restarts do not | Liveness = process-only; readiness = dependency-aware |
| No startup probe (slow init) | Liveness fires before app finishes initializing; container killed in infinite restart loop | Add startup probe with failureThreshold * periodSeconds > max_init_time |
| Too-short TTL + slow check | Health check RTT > TTL → instance flaps between healthy/unhealthy under load | Set TTL ≥ 3× check interval; use deregister-critical-service-after for graceful cleanup |
API Versioning Strategies
API versioning is how you evolve your service contracts without breaking existing consumers. There is no universally correct strategy — the right choice depends on client types and change frequency.
| Strategy | Format | Caching | Client Complexity | Breaking Changes | Example |
|---|---|---|---|---|---|
| URL Path | /v1/users, /v2/users | Excellent (URL is cache key) | Low — explicit, visible | Clean — old and new coexist | Most REST APIs |
| Query Parameter | /users?version=2 | Good | Low | Same as URL path | Some Google APIs |
| Request Header | API-Version: 2024-01-01 | Poor (cache key doesn't include headers by default) | Medium — must set header | Requires header-aware proxies | Stripe, GitHub |
| Content Negotiation | Accept: application/vnd.myapp.v2+json | Poor | High — non-standard | Semantically correct per HTTP spec | Rare in practice |
| Consumer-Driven Contracts | Pact-style contract tests | N/A | Low | Detected before deployment | Internal microservices |
Backward compatibility best practices:
- Never remove a field — mark it deprecated, keep it populated for ≥6 months
- Never change field semantics — `total` changing from gross to net is a breaking change even if the field name stays the same
- Add fields as optional — new fields must have safe defaults; consumers must tolerate unknown fields
- Use sunset headers — `Sunset: Sat, 01 Jan 2026 00:00:00 GMT` in responses to deprecated versions gives consumers a machine-readable deadline (see the sketch after this list)
- Run versions in parallel — use the strangler fig approach at the API level: route `/v1` to old handlers, `/v2` to new handlers, retire `/v1` once all consumers migrate
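The sunset-header practice is easy to automate as middleware. A Go sketch with an illustrative date and hypothetical handlers:

```go
package main

import (
	"log"
	"net/http"
)

// withSunset stamps every response from a deprecated API version with
// a machine-readable retirement deadline (date illustrative).
func withSunset(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set("Deprecation", "true")
		w.Header().Set("Sunset", "Sat, 01 Jan 2026 00:00:00 GMT")
		next.ServeHTTP(w, r)
	})
}

func usersV1(w http.ResponseWriter, r *http.Request) { w.Write([]byte(`{"users":[]}`)) }
func usersV2(w http.ResponseWriter, r *http.Request) { w.Write([]byte(`{"data":{"users":[]}}`)) }

func main() {
	mux := http.NewServeMux()
	// Old and new versions run in parallel during the migration window.
	mux.Handle("/v1/users", withSunset(http.HandlerFunc(usersV1)))
	mux.Handle("/v2/users", http.HandlerFunc(usersV2))
	log.Fatal(http.ListenAndServe(":8080", mux))
}
```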
Versioning in event contracts: The same principles apply to event schemas — see Chapter 14 for schema evolution and schema registry patterns.
Internal vs External Versioning
Internal microservices (service-to-service) can use a lighter versioning strategy than public APIs:
| Context | Recommended Strategy | Reason |
|---|---|---|
| Public API (third-party, mobile apps) | URL path versioning (/v1) | Explicit, cacheable, discoverable; clients may not update quickly |
| Internal services (known consumers) | Consumer-driven contracts (Pact) + additive-only changes | Detect breaking changes at CI time; avoid URL proliferation |
| Event schemas | Schema registry with compatibility rules | Enforce backward compatibility before publishing to topic |
| gRPC internal APIs | Protobuf field numbers (never change); add new fields, deprecate old | Protobuf is naturally backward/forward compatible if field numbers are preserved |
Code Example: gRPC Service Definition
```proto
// user_service.proto
syntax = "proto3";

package user;

service UserService {
  rpc GetUser(GetUserRequest) returns (User);
  rpc ListUsers(ListUsersRequest) returns (stream User); // server streaming
  rpc CreateUser(CreateUserRequest) returns (User);
}

message GetUserRequest {
  string user_id = 1;
}

// Request messages referenced above (fields illustrative).
message ListUsersRequest {
  int32 page_size = 1;
}

message CreateUserRequest {
  string name = 1;
  string email = 2;
}

message User {
  string id = 1;
  string name = 2;
  string email = 3;
  int64 created_at = 4;
}
```

```go
// gRPC server implementation (Go)
package main

import (
	"context"
	"database/sql"

	"google.golang.org/grpc/codes"
	"google.golang.org/grpc/status"

	pb "example.com/gen/user" // generated from user_service.proto; import path assumed
)

type userServer struct {
	pb.UnimplementedUserServiceServer
	db *sql.DB
}

func (s *userServer) GetUser(ctx context.Context, req *pb.GetUserRequest) (*pb.User, error) {
	var u pb.User
	err := s.db.QueryRowContext(ctx,
		"SELECT id, name, email FROM users WHERE id = $1", req.UserId,
	).Scan(&u.Id, &u.Name, &u.Email)
	if err != nil {
		return nil, status.Errorf(codes.NotFound, "user not found: %s", req.UserId)
	}
	return &u, nil
}
```

Related Chapters
| Chapter | Relevance |
|---|---|
| Ch06 — Load Balancing | Gateways and sidecar proxies apply the load-balancing algorithms covered there |
| Ch11 — Message Queues | Async inter-service communication via Kafka/SQS |
| Ch14 — Event-Driven Architecture | Saga pattern for distributed transactions across services |
| Ch15 — Replication & Consistency | Per-service database consistency in a microservices system |
| Ch23 — Cloud-Native | Kubernetes orchestration for microservice deployments |
Practice Questions
Beginner
Strangler Fig Migration: An e-commerce startup wants to decompose their monolith. They have a single `users` table joined in 15 places. Describe the strangler fig migration strategy step by step. Which service would you extract first and why? What is the risk if you try to extract the `users` service first?

Hint: Extract services with the fewest inbound dependencies first (leaf services like email notifications); extracting a heavily-joined entity like `users` first creates a distributed monolith with synchronous coupling everywhere.
Intermediate
Circuit Breaker Tuning: Your payment service has a circuit breaker with a 50% failure threshold over 10 seconds and a 30-second open timeout. During a flash sale, the payment processor becomes slow (P99 = 8s) but returns 200 OK. Explain why the circuit breaker does not trip, and what additional resilience mechanism (timeout + bulkhead) you would add.
Hint: The circuit breaker counts failures, not slowness — add a request timeout (e.g., 2s) so slow responses are classified as failures, allowing the circuit breaker to trip and shed load during latency spikes.

Saga Compensation: You are building a travel booking system that must atomically book a flight, hotel, and car rental across three external APIs. Compare choreography vs orchestration sagas for this use case. Which do you choose, and how do you handle the car rental being unavailable after flight and hotel are already confirmed?
Hint: Orchestration is clearer for complex compensations with explicit rollback sequences; when car rental fails, the orchestrator explicitly calls the hotel and flight cancellation APIs in reverse order.

Service Mesh Decision: Your team runs 8 microservices and security requires mTLS between all services. A senior engineer proposes Istio; a skeptic says it's too complex. What criteria (team size, service count, compliance requirements, existing tooling) would you use to decide? At what scale does a service mesh clearly pay for itself?
Hint: Evaluate: does your team have the Kubernetes expertise to operate Istio? Are you implementing mTLS, circuit breaking, or traffic splitting in every service's code already? If yes to both, a mesh reduces per-service complexity.
Advanced
API Gateway vs BFF: A fintech company has three client types: a web dashboard (data-heavy, many charts), a mobile app (bandwidth-constrained), and a partner API (versioned, metered). Design the gateway layer. Would you use a single API gateway or Backend-for-Frontend (BFF)? Describe what each BFF would do differently from the others.
Hint: BFF is justified when clients have meaningfully different data-shaping needs (web = large aggregated responses, mobile = minimal payloads, partner = versioned contracts with rate limiting) — a single gateway forces all clients to over-fetch or under-fetch.
References & Further Reading
- "Building Microservices" — Sam Newman
- "Microservices Patterns" — Chris Richardson
- gRPC documentation
- Netflix Microservices blog series
- "The Twelve-Factor App"
