Load Balancing and Proxies
A single server can only do so much. The moment your traffic grows past what one machine can serve, you need a way to spread requests across many machines, route them to the right place, and keep working when one of those machines dies at 3 in the morning. That is the job of load balancing and the proxies that sit in front of your systems. When Black Friday traffic hits and one of your servers catches fire, the difference between a quiet night and a public outage is whether your load balancer notices in two seconds and stops sending it customers.
This category covers the full path a request takes before it ever touches your application code. It starts with DNS turning a name into an address, walks through how packets actually move across networks (unicast, broadcast, multicast, anycast), and then gets into the machinery that decides which server handles each request: load balancer algorithms, the many flavors of routing, session handling, and the proxies and gateways that tie it all together. By the end you will understand not just what each piece does, but when to reach for it and what it costs you.
What Load Balancing and Proxies Actually Are
A load balancer is a traffic cop. It sits in front of a pool of servers and decides which one gets each incoming request. Done well, no single server gets overwhelmed while others sit idle, and a server that fails quietly stops receiving traffic instead of returning errors to real users. That is the core promise: more capacity than one machine, and survival when machines go down.
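The "stops receiving traffic" part usually comes from health checks. Here is a minimal sketch of that bookkeeping, assuming a server is ejected after two consecutive failed checks. The class name, the threshold, and the server names are illustrative assumptions, and the probe itself is left out.

```python
class HealthTracker:
    """Marks a server unhealthy after `threshold` consecutive failed checks."""

    def __init__(self, servers, threshold=2):
        self.threshold = threshold  # assumed value, tune per system
        self.failures = {name: 0 for name in servers}

    def record(self, server, ok):
        # One success resets the counter; failures accumulate.
        self.failures[server] = 0 if ok else self.failures[server] + 1

    def healthy(self):
        """Servers that should still receive traffic."""
        return [s for s, n in self.failures.items() if n < self.threshold]


tracker = HealthTracker(["web-1", "web-2", "web-3"])

tracker.record("web-2", ok=False)
after_one_failure = tracker.healthy()    # web-2 is still in rotation

tracker.record("web-2", ok=False)
after_two_failures = tracker.healthy()   # web-2 has been ejected

tracker.record("web-2", ok=True)
after_recovery = tracker.healthy()       # web-2 is back
```

Requiring more than one failure before ejecting a server trades a little detection speed for fewer false alarms from a single slow check.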
A proxy is a middleman that requests pass through. There are two main kinds, and the difference is which side it works for. A forward proxy sits in front of clients and speaks to the wider internet on their behalf, which is why companies use it to filter and log outbound traffic. A reverse proxy sits in front of your servers and answers the internet on their behalf, which is why it is the natural home for load balancing, TLS termination, caching, and request routing.
Before any of that runs, DNS has to turn a hostname like systemdesign.academy into an IP address. DNS is the first decision point in every request, and because it can hand back different answers to different users, it is also a load balancing tool in its own right. Understanding DNS, and how delivery schemes like unicast, broadcast, multicast, and anycast move traffic underneath it, is the foundation everything else builds on.
The Key Concepts You Need to Know
At the center is the question of how a load balancer picks a server. The load balancer algorithms lesson covers the classic options: round robin sends requests one after another to each server in turn, least connections sends the next request to whichever server currently has the fewest open connections, and weighted variants let a beefier server take a larger share. Each is simple to describe, yet they behave very differently under uneven load.
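A minimal sketch of the three selection strategies follows. The `Server` class and the pool contents are assumptions made for illustration, and the weighted variant is the naive "repeat by weight" form. Production balancers typically use smoother interleaving.

```python
import itertools
from dataclasses import dataclass


@dataclass
class Server:
    name: str
    weight: int = 1            # relative capacity, used by the weighted variant
    open_connections: int = 0  # would be updated by the balancer at runtime


def round_robin(servers):
    """Hand out servers one after another, wrapping around forever."""
    return itertools.cycle(servers)


def weighted_round_robin(servers):
    """Naive weighted variant: each server appears `weight` times per cycle."""
    expanded = [s for s in servers for _ in range(s.weight)]
    return itertools.cycle(expanded)


def least_connections(servers):
    """Pick the server with the fewest open connections right now."""
    return min(servers, key=lambda s: s.open_connections)


pool = [Server("a"), Server("b"), Server("c", weight=2)]

rr = round_robin(pool)
rr_order = [next(rr).name for _ in range(4)]    # a, b, c, a

wrr = weighted_round_robin(pool)
wrr_order = [next(wrr).name for _ in range(4)]  # a, b, c, c

pool[0].open_connections = 5
pool[1].open_connections = 1
pool[2].open_connections = 3
least_busy = least_connections(pool).name       # b
```

Round robin ignores how long each request takes, so a few slow requests can pile up on one server. Least connections corrects for that, but the balancer then has to track connection counts.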
Routing is the other big idea, and it shows up in many forms because you can route on almost anything. Path-based routing sends /api to one service and /images to another. Host-based routing splits traffic by domain name. Header-based routing reads the request headers to decide. Then there are routing strategies that live closer to DNS and the network: weighted routing for gradual shifts, geolocation and latency-based routing to send users to the nearest or fastest region, failover routing for disaster recovery, and multi-value routing for simple redundancy.
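The request-level rules can be sketched as a first-match-wins function. The hostname, the `X-Beta` header, and the pool names below are hypothetical and stand in for whatever a real proxy configuration would declare.

```python
from dataclasses import dataclass, field


@dataclass
class Request:
    host: str
    path: str
    headers: dict = field(default_factory=dict)


def route(request):
    """Return the backend pool name for a request. First matching rule wins."""
    # Header-based: opted-in users go to a separate pool (hypothetical header).
    if request.headers.get("X-Beta") == "1":
        return "beta-pool"
    # Host-based: split by domain name (hypothetical hostname).
    if request.host == "admin.example.com":
        return "admin-pool"
    # Path-based: split by URL prefix.
    if request.path == "/api" or request.path.startswith("/api/"):
        return "api-pool"
    if request.path.startswith("/images/"):
        return "image-pool"
    return "default-pool"


api_target = route(Request("www.example.com", "/api/users"))
image_target = route(Request("www.example.com", "/images/logo.png"))
admin_target = route(Request("admin.example.com", "/"))
beta_target = route(Request("www.example.com", "/api/users", {"X-Beta": "1"}))
```

Rule order matters: in this sketch the header check runs first, so a beta user never reaches the path rules. That ordering is a design choice made for the example, not a standard.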
State is where things get tricky. Many applications expect the same user to keep hitting the same server, which is what sticky sessions, session affinity, and affinity routing solve. They are convenient but they fight against even load distribution and graceful failure, so this category treats them honestly as a trade-off rather than a default. Connection draining rounds this out by letting a server finish its in-flight work before it is removed, so deploys and scale-downs do not drop live requests.
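Connection draining reduces to two rules: refuse new work, and wait for in-flight work to finish. This is a minimal sketch under that assumption, with a hypothetical `Backend` class. Real balancers also apply a drain timeout so that one stuck request cannot block removal forever.

```python
class Backend:
    def __init__(self, name):
        self.name = name
        self.draining = False
        self.in_flight = 0

    def try_start(self):
        """Accept a new request unless the backend is draining."""
        if self.draining:
            return False
        self.in_flight += 1
        return True

    def finish(self):
        """Mark one in-flight request as complete."""
        self.in_flight -= 1

    def can_remove(self):
        """Safe to remove only once draining has started and nothing is in flight."""
        return self.draining and self.in_flight == 0


backend = Backend("web-1")
backend.try_start()
backend.try_start()                            # two requests in flight

backend.draining = True                        # deploy or scale-down begins
accepted_while_draining = backend.try_start()  # new work is refused
removable_early = backend.can_remove()         # still busy

backend.finish()
backend.finish()
removable_after = backend.can_remove()         # now safe to remove
```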
Choosing the Right Option and the Trade-Offs
There is rarely one correct choice. DNS load balancing is cheap and works at global scale, but resolvers and clients cache answers until the record's TTL expires, so a change can take minutes or longer to reach everyone, which makes it poor for fast failover. A reverse proxy load balancer reacts in seconds and sees every request, but it is one more hop to operate and scale. Most large systems use both: DNS or global server load balancing to pick a region, then a reverse proxy inside that region to pick a server.
The routing trade-offs follow the same pattern. Latency-based routing gives users the fastest experience but makes traffic patterns harder to predict. Geolocation routing is great for data residency rules but can send a user far away if the nearest region is down. Weighted routing and traffic splitting let you roll out a new version to 5 percent of users first, and traffic mirroring lets you copy real production traffic to a new system to test it without affecting the responses users see. Both add operational complexity.
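One common way to implement a percentage rollout is to hash a stable user identifier into a bucket, so the same user always lands on the same version. This is a sketch under that assumption. The function names and the user ID format are illustrative.

```python
import hashlib


def bucket_for(user_id, buckets=100):
    """Map a user ID to a stable bucket in the range [0, buckets)."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % buckets


def choose_version(user_id, canary_percent=5):
    """Send roughly `canary_percent` percent of users to the new version."""
    return "canary" if bucket_for(user_id) < canary_percent else "stable"


users = [f"user-{i}" for i in range(10_000)]
canary_share = sum(choose_version(u) == "canary" for u in users) / len(users)
```

Hashing rather than picking at random per request keeps each user's experience consistent during the rollout. Raising `canary_percent` only moves users from stable to canary, never back.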
The sticky session question is the one that catches teams out. Affinity keeps per-user state simple in the short term, but it concentrates risk: when that one server dies, those users lose their sessions, and the load balancer can no longer balance freely. The usual senior move is to push session state into a shared store so any server can serve any user, and reserve stickiness for cases where it genuinely cannot be avoided.
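The shared-store approach can be sketched as follows. `SessionStore` here is an in-memory stand-in for an external store such as a cache or database, and both class names are assumptions made for illustration.

```python
class SessionStore:
    """In-memory stand-in for a shared session store outside the app servers."""

    def __init__(self):
        self._data = {}

    def get(self, session_id):
        return self._data.get(session_id, {})

    def put(self, session_id, session):
        self._data[session_id] = session


class AppServer:
    def __init__(self, name, store):
        self.name = name
        self.store = store  # shared by every server, not held per server

    def add_to_cart(self, session_id, item):
        session = self.store.get(session_id)
        cart = session.get("cart", []) + [item]
        self.store.put(session_id, {**session, "cart": cart})
        return cart


store = SessionStore()
server_a = AppServer("a", store)
server_b = AppServer("b", store)

server_a.add_to_cart("sess-42", "book")
cart = server_b.add_to_cart("sess-42", "pen")  # different server, same session
```

Because neither server keeps the cart in its own memory, the load balancer is free to send each request anywhere, and losing one server does not lose the session.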
How Real Companies Use This
At internet scale, this stack is layered. Cloudflare and Google use anycast routing so that a single IP address is announced from data centers all over the world, and the network's own routing sends each user to the one that is closest in network terms. That is how a DNS resolver or a CDN edge can feel local no matter where you are. Global server load balancing then sits above regional load balancers to steer whole regions of traffic based on health and proximity.
A content delivery network is load balancing and reverse proxying turned into a product. A CDN caches your static content at hundreds of edge locations so users download from a server near them instead of from your origin, which cuts latency and shields your servers from load. Netflix, Amazon, and almost every large site put a CDN in front of their assets for exactly this reason.
The API gateway is the modern front door for microservices. It is a specialized reverse proxy that does path- and host-based routing to dozens of backend services, plus authentication, rate limiting, and request shaping in one place. Combined with connection draining for zero-downtime deploys and traffic splitting for safe rollouts, this is the everyday toolkit that lets large engineering teams ship continuously without taking the site down.
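Rate limiting at a gateway is often built on a token bucket, though it is only one of several possible algorithms. The sketch below assumes one bucket per client and takes the current time as an argument so that the behavior is easy to follow. The class name and parameters are illustrative.

```python
class TokenBucket:
    def __init__(self, rate, capacity, now=0.0):
        self.rate = rate                # tokens added per second
        self.capacity = capacity        # maximum burst size
        self.tokens = float(capacity)   # start full
        self.last = now

    def allow(self, now):
        """Return True if a request arriving at time `now` may proceed."""
        elapsed = max(0.0, now - self.last)
        # Refill for the time that has passed, never beyond capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


limiter = TokenBucket(rate=1, capacity=2)
results = [
    limiter.allow(0.0),  # allowed, burst token 1
    limiter.allow(0.0),  # allowed, burst token 2
    limiter.allow(0.0),  # rejected, bucket is empty
    limiter.allow(1.0),  # allowed, one token refilled after a second
]
```

The capacity sets how large a burst a client may send, and the rate sets the sustained throughput, so the two can be tuned independently.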