Is this a video course?

No. This is an interactive, slide-based learning platform. Each lesson has rich text, animated diagrams, live code editors, and quizzes. You learn by reading, interacting, and doing, not by watching videos passively.

How long do I have access?

Forever. Both pricing tiers are one-time payments with lifetime access. This includes all current 766 lessons and any future content we add.

What level of experience do I need?

None. We start from absolute basics like 'What is latency?' and build up to distributed consensus protocols. The Foundation level assumes zero prior knowledge of system design.

How much does the system design course cost?

7.99 US dollars for lifetime access globally, or 499 Indian rupees for lifetime access in India. One-time payment, no subscription, no hidden fees. 11 lessons are free with no signup required.

What technologies are covered?

Everything from DNS and load balancers to Kubernetes, Kafka, distributed databases, consensus protocols, stream processing, security architecture, and observability. We cover principles and real-world implementations used at Netflix, Google, Amazon, Uber, Stripe, and more.

Is this useful for system design interview preparation?

Yes. The lessons are structured around the exact topics asked in system design interviews at FAANG and top-tier companies. Interactive diagrams help you practice whiteboard-style explanations. The lessons cover everything from URL shortener design to distributed payment systems.

How is this different from ByteByteGo or Educative?

It offers 766 interactive lessons (4x more than most competitors), 16 different diagram types that build step by step, and real production examples from Netflix, Google, Amazon, Uber, and Stripe. Lifetime access is a one-time payment of $7.99 instead of an annual subscription costing 100 to 200 dollars per year.

Capacity Planning

Estimating future resource needs based on current usage trends and expected growth. Answers questions like 'how many servers do we need for 10x traffic in 6 months?'

What is Capacity Planning?

In short

Capacity planning is the practice of measuring how much load your system handles today, forecasting how that load will grow, and deciding how much compute, memory, storage, and network you need to provision so the system stays fast and available as demand rises. It answers questions like "how many servers do we need to survive 10x traffic in six months without breaking our latency target?"

What capacity planning actually is

Every service has a ceiling. A database can take only so many writes per second, a web server can hold only so many open connections, and a disk eventually fills up. Capacity planning is the work of finding those ceilings, watching how fast you are approaching them, and adding headroom before you hit them. It turns a vague worry like "are we going to fall over during the sale?" into a number you can act on.

The output of capacity planning is concrete: an instance count, a database tier, a storage allocation, an autoscaling minimum and maximum. You start from a measured baseline (requests per second, CPU utilization, memory, IOPS, queue depth), you attach a growth assumption (10 percent month over month, or a Black Friday spike of 8x), and you size resources so that even at peak you stay under a chosen utilization target with room to spare.

It is different from scaling. Scaling is the mechanism that adds or removes resources. Capacity planning is the decision about how much you need and when, so that scaling has the right limits set and the budget exists to back it.

How it works under the hood

You begin with a service level objective, because capacity only means something relative to a promise. A common one is "p99 latency under 200ms" or "availability of 99.95 percent." You then load test or analyze production data to find the throughput at which you start violating that objective. That breaking point is your real capacity, and it is almost always lower than the theoretical maximum of the hardware.
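As a rough sketch of how that breaking point can be read off a stepped load test, the snippet below takes (requests per second, observed p99 in ms) pairs and returns the highest step that still met the objective. The function name and the sample results are hypothetical, chosen only for illustration.

```python
from typing import Optional, Sequence, Tuple


def max_throughput_within_slo(
    steps: Sequence[Tuple[float, float]],
    p99_slo_ms: float,
) -> Optional[float]:
    """Return the highest tested load (rps) reached before the first SLO violation.

    `steps` holds (requests_per_second, observed_p99_ms) pairs from a stepped
    load test. Returns None if even the lowest step violates the SLO.
    """
    best = None
    for rps, p99_ms in sorted(steps):
        if p99_ms > p99_slo_ms:
            break  # stop at the first violation; higher steps are not trusted
        best = rps
    return best


# Hypothetical load-test results for one instance (illustrative numbers only).
results = [(100, 40.0), (300, 55.0), (500, 120.0), (600, 260.0), (700, 900.0)]
capacity = max_throughput_within_slo(results, p99_slo_ms=200.0)
print(capacity)  # 500
```

Stopping at the first violation is a deliberately conservative choice: a later step that happens to pass is more likely noise than real capacity.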

A useful rule of thumb is the utilization target. Most teams plan to run steady state CPU around 50 to 70 percent, not 95 percent, because queuing theory says latency climbs sharply as a resource approaches saturation. Run a single server at 90 percent CPU and a small traffic bump pushes response times off a cliff. The gap between your target and 100 percent is your buffer for spikes, deploys, and one failed node.
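One way to see the shape of that curve is the textbook M/M/1 queue, where mean response time equals service time divided by one minus utilization. This is a simplifying assumption (a single server with random arrivals), and the 10 ms service time is made up; real services behave less neatly, but the curve bends the same way.

```python
def mm1_response_time(service_time_ms: float, utilization: float) -> float:
    """Mean response time in an M/M/1 queue: R = S / (1 - rho)."""
    if not 0 <= utilization < 1:
        raise ValueError("utilization must be in [0, 1)")
    return service_time_ms / (1.0 - utilization)


# Assumed 10 ms service time; show how response time grows with utilization.
for rho in (0.50, 0.70, 0.90, 0.95):
    print(f"{rho:.0%} busy -> {mm1_response_time(10.0, rho):.0f} ms")
```

Under this model, moving from 50 to 90 percent utilization multiplies mean response time by five, and the last few points before saturation cost far more than the first fifty.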

Forecasting takes the baseline and projects it forward. Simple linear projection works for steady growth, while growth quoted as a percentage per month compounds and should be projected that way. For seasonal or bursty traffic you model the peaks directly: a retailer sizes for Black Friday, not for an average Tuesday. You also account for the N plus 1 or N plus 2 rule, provisioning enough that you can lose one or two nodes (or an entire availability zone) and still serve peak load. Headroom plus redundancy plus growth gives you the final number.
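A minimal sketch of the three forecasting styles mentioned above follows. The function names and the 1,000 requests per second baseline are assumptions for illustration; the 10 percent monthly growth and the 8x spike echo the figures used earlier on this page.

```python
def project_linear(baseline: float, monthly_increase: float, months: int) -> float:
    """Steady growth: the same absolute increase every month."""
    return baseline + monthly_increase * months


def project_compound(baseline: float, monthly_rate: float, months: int) -> float:
    """Percentage growth: each month builds on the previous one."""
    return baseline * (1.0 + monthly_rate) ** months


def peak_from_baseline(baseline: float, peak_multiplier: float) -> float:
    """Bursty traffic: size for the modelled peak, not the average."""
    return baseline * peak_multiplier


baseline_rps = 1000  # hypothetical measured baseline
print(project_linear(baseline_rps, 100, 6))             # 1600
print(round(project_compound(baseline_rps, 0.10, 6)))   # 1772
print(peak_from_baseline(baseline_rps, 8))              # 8000
```

The gap between the linear and compound results widens every month, which is why the choice of model matters more the further out you plan.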

When to use it and the trade-offs

You need real capacity planning when running out of capacity is expensive: a checkout flow during a sale, an API with contractual SLAs, a stateful database that cannot be scaled in seconds. Cloud autoscaling does not remove the need. Autoscaling reacts to load, but it has limits you must set, it takes time to boot new instances, and some resources (database connections, a primary database, a third party rate limit) do not scale elastically at all.

The core trade-off is cost versus risk. Over-provision and you pay for idle servers every hour. Under-provision and you risk an outage, dropped revenue, and a paged engineer at 3am. Good planning keeps utilization high enough to be efficient but low enough to absorb the spikes you can predict and the failures you cannot.

The hard parts are forecasting growth you cannot see (a feature goes viral) and planning for resources that scale slowly. A stateless web tier can autoscale in a minute, but adding a read replica or resharding a database can take hours or days, so those need the longest lead time and the most conservative buffer.
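Lead time can be turned into a simple check: estimate how many months remain before load reaches a resource's ceiling, then compare that runway with how long the change takes to deliver. The sketch below assumes compounding growth; the function names, the write rates, and the three-month lead time are all hypothetical.

```python
import math


def months_until_ceiling(current: float, ceiling: float, monthly_growth: float) -> float:
    """Months until compounding load reaches the ceiling."""
    if current >= ceiling:
        return 0.0
    if monthly_growth <= 0:
        return math.inf  # flat or shrinking load never reaches the ceiling
    return math.log(ceiling / current) / math.log(1.0 + monthly_growth)


def must_start_now(
    current: float, ceiling: float, monthly_growth: float, lead_time_months: float
) -> bool:
    """True when the remaining runway is no longer than the lead time."""
    return months_until_ceiling(current, ceiling, monthly_growth) <= lead_time_months


# Hypothetical database: 6,000 writes/s today, ceiling of 10,000, 8% monthly growth.
runway = months_until_ceiling(6000, 10000, 0.08)
print(f"runway: {runway:.1f} months")
print(must_start_now(6000, 10000, 0.08, lead_time_months=3))
```

In practice you would subtract a safety margin from the runway as well, since the growth rate itself is a forecast.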

A concrete example

Say your API serves 2,000 requests per second at peak today, growing 8 percent per month, and each instance comfortably handles 500 requests per second while staying under your 200ms p99 target. Today you need 4 instances at the limit, so for headroom you run 6, keeping each near 67 percent. Compounding 8 percent over six months gives a peak of roughly 3,200 requests per second, which needs about 7 instances at the limit, so you plan for around 10 to keep a similar buffer.
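The arithmetic can be checked with a short sketch. It assumes a utilization cap of 70 percent, the top of the 50 to 70 percent band discussed earlier; that cap and the function names are illustrative choices, picked because they reproduce the counts of 6 and 10 used in this example.

```python
import math


def instances_at_limit(peak_rps: float, per_instance_rps: float) -> int:
    """Fewest instances that can serve the peak with no headroom at all."""
    return math.ceil(peak_rps / per_instance_rps)


def instances_with_headroom(
    peak_rps: float, per_instance_rps: float, target_utilization: float
) -> int:
    """Instances needed so each one stays at or below the utilization cap."""
    return math.ceil(peak_rps / (per_instance_rps * target_utilization))


PER_INSTANCE_RPS = 500.0
TARGET_UTILIZATION = 0.70  # assumed cap, not a universal rule

today_peak = 2000.0
future_peak = today_peak * 1.08 ** 6  # 8% per month, compounded over 6 months

for label, peak in (("today", today_peak), ("in 6 months", future_peak)):
    at_limit = instances_at_limit(peak, PER_INSTANCE_RPS)
    planned = instances_with_headroom(peak, PER_INSTANCE_RPS, TARGET_UTILIZATION)
    utilization = peak / (planned * PER_INSTANCE_RPS)
    print(f"{label}: {peak:.0f} rps, {at_limit} at limit, "
          f"{planned} planned, {utilization:.0%} each")
```

Rounding up with `math.ceil` matters here: a fractional instance always means one more whole instance, never one fewer.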

Now add failure tolerance. If you run across three availability zones and want to survive losing one, the two surviving zones must carry the full peak between them, so you size each zone to carry half the load. That means provisioning about 1.5 times peak demand in total, pushing the fleet higher still. You set the autoscaling minimum at the steady state count and the maximum above your forecasted peak, and you revisit the numbers every month against actual traffic so the plan does not drift from reality.
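The zone sizing can be sketched the same way. The snippet below assumes the forecast peak of 3,200 requests per second and 500 requests per second per instance from this example; the function name is hypothetical, and whether to keep a utilization buffer during a zone outage is a policy choice, so both variants are shown.

```python
import math


def instances_per_zone(
    peak_rps: float,
    per_instance_rps: float,
    zones: int,
    zones_lost: int = 1,
    target_utilization: float = 1.0,
) -> int:
    """Instances each zone needs so the surviving zones can carry peak load."""
    surviving = zones - zones_lost
    if surviving < 1:
        raise ValueError("at least one zone must survive")
    rps_per_surviving_zone = peak_rps / surviving
    return math.ceil(rps_per_surviving_zone / (per_instance_rps * target_utilization))


ZONES = 3
# Sized to the limit: survivors run flat out while a zone is down.
per_zone = instances_per_zone(3200, 500, zones=ZONES)
# Sized with an assumed 70 percent cap even during the outage.
per_zone_buffered = instances_per_zone(3200, 500, zones=ZONES, target_utilization=0.70)

print(per_zone, per_zone * ZONES)                    # 4 12
print(per_zone_buffered, per_zone_buffered * ZONES)  # 5 15
```

Either way the fleet ends up larger than the 10 instances planned without zone tolerance, which is the cost of surviving the outage.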

Where it is used in production

Netflix

Pre-scales its AWS fleet ahead of predictable evening viewing peaks and big launches, sizing well above average load so that losing an entire availability zone does not degrade streaming.

Amazon retail

Plans capacity months ahead for Prime Day and Black Friday, provisioning for traffic many times the daily baseline rather than relying on reactive scaling alone.

Google SRE

Treats capacity planning as a core SRE discipline, combining demand forecasting with N plus 2 provisioning so the loss of two clusters still leaves enough capacity to serve peak.

Kubernetes Cluster Autoscaler

Adds nodes when pods cannot be scheduled and removes nodes that are underutilized, but operators still set node group minimums, maximums, and resource requests, which is capacity planning applied to a cluster.

Frequently asked questions

What is the difference between capacity planning and autoscaling?

Capacity planning decides how much you need and sets the limits and budget. Autoscaling is the mechanism that adds or removes resources within those limits. Autoscaling still depends on good planning because it has minimums, maximums, and boot delays, and because some resources, like a primary database, do not scale elastically.

Why not just run servers at 100 percent utilization to save money?

Because latency climbs sharply as a resource nears saturation. Past roughly 70 to 80 percent utilization, queuing means small traffic bumps cause large response time spikes. The gap below 100 percent is your buffer for spikes, deploys, and a failed node, so most teams target 50 to 70 percent steady state.

What is the N plus 1 rule in capacity planning?

N plus 1 means you provision enough capacity to handle peak load even after losing one node or zone. N plus 2 tolerates losing two. You size each remaining unit to absorb the failed one's share of traffic, which is why redundant systems need more total capacity than raw demand suggests.

How far ahead should you plan capacity?

It depends on lead time. Stateless web tiers autoscale in minutes, so a short horizon is fine. Resources that scale slowly, like database replicas, shards, or reserved cloud capacity, need weeks or months of lead time, so plan those conservatively and well in advance.

What inputs do you need to plan capacity?

A service level objective such as a latency or availability target, a measured baseline of current load (requests per second, CPU, memory, IOPS), the throughput at which you start violating that objective, and a growth or peak forecast. From those you derive instance counts, tiers, and autoscaling limits with headroom built in.

Learn Capacity Planning hands-on

This page explains the idea. The full lesson lets you step through animated diagrams, read the implementation, and check yourself with a quiz. It is one of 760+ lessons in the System Design Masterclass, from your first API call to distributed consensus. Eleven Foundation lessons are free, no signup. Lifetime access is ₹499 in India or $7.99 worldwide, one payment, no subscription.
