Availability
The percentage of time a system is operational and accessible. Measured in 'nines' — 99.99% availability means about 52 minutes of downtime per year.
What is Availability?
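Availability is the fraction of time a system is operational and able to serve requests, usually expressed as a percentage over a measurement window such as a year. It is commonly quoted in 'nines': 99.99% ("four nines") allows about 52 minutes of downtime per year, and each additional nine cuts that budget by a factor of ten.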
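The 'nines' arithmetic is easy to check by hand or in a few lines of code. The following is a minimal sketch, assuming a 365-day year; the function name `downtime_minutes_per_year` is illustrative and not part of any library.

```python
# Downtime budget implied by an availability target.
# Assumption: a non-leap year of 365 days (525,600 minutes).
MINUTES_PER_YEAR = 365 * 24 * 60


def downtime_minutes_per_year(availability_percent: float) -> float:
    """Return the yearly downtime (in minutes) allowed by an availability percentage."""
    if not 0.0 <= availability_percent <= 100.0:
        raise ValueError("availability_percent must be between 0 and 100")
    unavailable_fraction = 1.0 - availability_percent / 100.0
    return unavailable_fraction * MINUTES_PER_YEAR


if __name__ == "__main__":
    # Print the budget for two through five nines.
    for target in (99.0, 99.9, 99.99, 99.999):
        minutes = downtime_minutes_per_year(target)
        print(f"{target}% -> {minutes:.2f} minutes/year")
```

For 99.99% this works out to roughly 52.56 minutes, which matches the "about 52 minutes" figure above.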
Availability is a foundational concept in the Core Fundamentals area of system design. Engineers use it to set uptime targets and to weigh those targets against cost, complexity, and consistency. These are not textbook exercises: production systems at companies like Netflix, Amazon, and Google make these trade-offs every day.
If you want to go deeper than this definition, with diagrams, code, and a quiz to lock it in, work through the "Availability" lesson linked below. It walks through the why, the mechanism, the trade-offs, and how the giants actually use it in production.
Learn Availability in depth
Full interactive lesson with diagrams, code examples, real-world references, and a quiz.
Open the Availability lesson
Related lessons
Lessons that touch on Availability as part of a larger topic.
Availability Metrics
Measuring how reliably your system serves requests, and the nines that define your reputation
intermediate · observability monitoring
CAP Theorem
Consistency, Availability, Partition Tolerance: you can only pick two, and you must pick partition tolerance
advanced · distributed systems core
Hinted Handoff
Store writes temporarily on behalf of unavailable nodes. Dynamo's approach to write availability
advanced · distributed systems core
Sloppy Quorum
Relaxed quorum rules that prioritize availability over strict replica placement
advanced · distributed systems core
Zero-Downtime Deployment
Deploying new versions without any period of unavailability for users
intermediate · devops cicd
See also
Related glossary terms you might want to look up next.
Fault Tolerance
A system's ability to keep operating correctly even when some of its components fail. Achieved through redundancy, replication, and graceful degradation.
CAP Theorem
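In a distributed system, you can guarantee at most two of three: Consistency, Availability, and Partition tolerance. Because network partitions cannot be ruled out, the practical choice during a partition is between consistency and availability.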
Redundancy
Duplicating critical components or functions so that if one fails, a backup takes over. The reason planes have two engines and databases have replicas.
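
Redundancy raises availability because the group is down only when every copy is down at once. The following is a rough sketch of that calculation, under two simplifying assumptions that real systems rarely meet exactly: replicas fail independently of each other, and failover to a healthy replica is instant and always succeeds. The function name `combined_availability` is hypothetical.

```python
def combined_availability(replica_availability: float, replicas: int) -> float:
    """Availability of a group that is up whenever at least one replica is up.

    Assumes independent failures and perfect failover (illustrative only).
    """
    if not 0.0 <= replica_availability <= 1.0:
        raise ValueError("replica_availability must be between 0 and 1")
    if replicas < 1:
        raise ValueError("replicas must be at least 1")
    # Probability that every replica is down at the same time.
    all_down = (1.0 - replica_availability) ** replicas
    return 1.0 - all_down


if __name__ == "__main__":
    # Two replicas at 99% each give about 99.99% under these assumptions.
    for n in (1, 2, 3):
        print(f"{n} replica(s): {combined_availability(0.99, n) * 100:.4f}%")
```

Correlated failures, such as a shared power supply or a bad deployment pushed to every replica, make the real figure lower than this estimate.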