Scalability
A system's ability to handle growing amounts of work by adding resources. A scalable system maintains performance as load increases.
What is Scalability?
Scalability is a foundational concept in the Core Fundamentals area of system design. Engineers use it to reason about how a system behaves as load grows and what it costs to add resources to keep up. This is a practical concern rather than a textbook one: production systems at companies like Netflix, Amazon, and Google face these trade-offs routinely.
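As a rough illustration of the definition, the sketch below estimates how many identical nodes are needed to cover a given load. It assumes capacity grows linearly with node count, which real systems only approximate. The names `nodes_needed`, `load_rps`, and `per_node_rps` are hypothetical and chosen only for this example.

```python
import math


def nodes_needed(load_rps: float, per_node_rps: float) -> int:
    """Return the smallest node count whose combined capacity covers the load.

    Assumption: every node handles the same request rate and capacity adds up
    linearly. Coordination overhead and shared bottlenecks are ignored.
    """
    if per_node_rps <= 0:
        raise ValueError("per_node_rps must be positive")
    if load_rps < 0:
        raise ValueError("load_rps must not be negative")
    # Keep at least one node running even when there is no load.
    return max(1, math.ceil(load_rps / per_node_rps))


if __name__ == "__main__":
    for load in (0, 900, 1000, 5000):
        print(load, "req/s ->", nodes_needed(load, per_node_rps=300), "node(s)")
```

In practice the curve flattens as nodes are added, so a figure like this is best read as a lower bound on the resources required.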
Related lessons
Lessons that touch on Scalability as part of a larger topic.
Throughput
How much work a system can handle in a given time period
foundation · core fundamentals
Stateless vs Stateful Systems
Two fundamental architecture patterns that shape how systems handle data, scale, and recover from failure
foundation · core fundamentals
Vertical Scaling
Adding resources to a single node to handle more load
foundation · core fundamentals
Horizontal Scaling
Adding more nodes to a system to distribute load
foundation · core fundamentals
Elasticity
Dynamic resource adjustment based on real-time demand
foundation · core fundamentals
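The Elasticity and Horizontal Scaling entries above can be tied together with a small sketch: a proportional rule that resizes a pool of nodes so that average utilization moves toward a target. This is one possible policy under stated assumptions, not a description of any particular autoscaler. The function `desired_nodes` and its parameters are hypothetical names.

```python
import math


def desired_nodes(
    current_nodes: int,
    current_utilization: float,
    target_utilization: float,
    min_nodes: int = 1,
    max_nodes: int = 100,
) -> int:
    """Suggest a node count that would bring utilization near the target.

    Assumption: load spreads evenly, so total work is roughly
    current_nodes * current_utilization and stays constant while resizing.
    """
    if current_nodes < 1:
        raise ValueError("current_nodes must be at least 1")
    if target_utilization <= 0:
        raise ValueError("target_utilization must be positive")
    wanted = math.ceil(current_nodes * current_utilization / target_utilization)
    # Clamp so the pool never drops below the floor or exceeds the budget.
    return max(min_nodes, min(max_nodes, wanted))


if __name__ == "__main__":
    print(desired_nodes(4, 0.75, 0.5))   # overloaded pool: scale out
    print(desired_nodes(10, 0.25, 0.5))  # idle pool: scale in
```

The clamp reflects a common design choice: a minimum keeps the service available during quiet periods, and a maximum caps cost if the load signal misbehaves.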
See also
Related glossary terms you might want to look up next.
Horizontal Scaling
Adding more machines to handle increased load (scaling out). Like opening more checkout lanes instead of making one cashier faster.
Vertical Scaling
Making a single machine more powerful (more CPU, RAM, storage). Simpler but has physical limits. Also called 'scaling up.'
Load Balancer
Distributes incoming traffic across multiple servers so no single server gets overwhelmed. Like a traffic cop directing cars to different lanes.
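To make the Load Balancer entry concrete, here is a minimal sketch of round-robin selection, one of the simplest ways to spread requests across servers. It assumes all servers are healthy and equally capable. The class `RoundRobinBalancer` and the server names are hypothetical.

```python
from itertools import cycle
from typing import Iterable


class RoundRobinBalancer:
    """Hand out servers in a fixed rotating order."""

    def __init__(self, servers: Iterable[str]) -> None:
        pool = list(servers)
        if not pool:
            raise ValueError("at least one server is required")
        # cycle() repeats the pool endlessly in its original order.
        self._rotation = cycle(pool)

    def pick(self) -> str:
        """Return the server that should receive the next request."""
        return next(self._rotation)


if __name__ == "__main__":
    balancer = RoundRobinBalancer(["app-1", "app-2", "app-3"])
    for request_id in range(5):
        print(f"request {request_id} -> {balancer.pick()}")
```

Round-robin ignores how busy each server is, so production load balancers often add health checks or weight servers by capacity. Those refinements are left out here to keep the idea visible.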