Auto Scaling
Automatically adding or removing compute instances based on current demand. Scales out during traffic spikes and scales in during quiet periods to save cost.
What is Auto Scaling?
Auto Scaling is an intermediate-level concept in the Cloud Infrastructure area of system design. Engineers reach for it when they need to balance capacity against cost under changing load. This is not only a matter of textbook correctness: production systems at companies like Netflix, Amazon, and Google make these decisions every day.
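To make the scale-out and scale-in behaviour concrete, here is a minimal sketch of a proportional scaling rule, similar in spirit to the target-tracking policies that cloud providers and Kubernetes offer. The function name, the CPU metric, the 60% target, and the size bounds are all hypothetical values chosen for illustration, not any provider's actual API.

```python
import math


def desired_instances(current: int, avg_cpu: float, target_cpu: float = 60.0,
                      min_size: int = 2, max_size: int = 10) -> int:
    """Return the instance count that would bring average CPU near the target.

    All names and default values are assumptions made for this example.
    """
    if current < 1:
        raise ValueError("current must be at least 1")
    # Proportional rule: if load per instance is 1.5x the target,
    # run roughly 1.5x as many instances (rounded up).
    wanted = math.ceil(current * avg_cpu / target_cpu)
    # Clamp to the group's configured minimum and maximum size.
    return max(min_size, min(max_size, wanted))


print(desired_instances(current=4, avg_cpu=90.0))  # traffic spike: scales out to 6
print(desired_instances(current=4, avg_cpu=15.0))  # quiet period: scales in to 2
```

Real autoscalers typically layer cooldown periods or stabilization windows on top of a rule like this so that a briefly noisy metric does not cause instances to be added and removed in rapid succession.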
If you want to go deeper than this definition — with diagrams, code, and a quiz to lock it in — work through the "Auto Scaling" lesson linked below. It walks through the why, the mechanism, the trade-offs, and how the giants actually use it in production.
Learn Auto Scaling in depth
Full interactive lesson with diagrams, code examples, real-world references, and a quiz.
Open the Auto Scaling lesson
Related lessons
Lessons that touch on Auto Scaling as part of a larger topic.
See also
Related glossary terms you might want to look up next.
Horizontal Scaling
Adding more machines to handle increased load (scaling out). Like opening more checkout lanes instead of making one cashier faster.
Load Balancer
Distributes incoming traffic across multiple servers so no single server gets overwhelmed. Like a traffic cop directing cars to different lanes.
Kubernetes
An orchestration platform that automates deploying, scaling, and managing containerized applications. Often abbreviated K8s and likened to an operating system for your cloud.