Tail Latency
What is Tail Latency?
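The high-percentile response times (such as p99 and p99.9) experienced by the slowest requests. A system with a 10ms median but a 2s p99 latency still makes roughly 1 in every 100 requests wait 2s or longer, so users regularly hit slow responses even though the typical request is fast.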
Tail latency is an advanced concept in the Reliability & Resilience area of system design. Engineers reason about it when weighing real-world trade-offs in that space, not merely for textbook correctness: production systems at companies like Netflix, Amazon, and Google make these decisions every day.
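To make the gap between the median and the tail concrete, here is a minimal Python sketch that computes percentiles with the nearest-rank method. The `percentile` helper and the synthetic workload (985 requests at 10ms, 15 at 2,000ms) are hypothetical; real monitoring tools may interpolate or estimate percentiles from histograms, so their numbers can differ slightly.

```python
import math
from fractions import Fraction


def percentile(samples_ms, p):
    """Nearest-rank percentile: the smallest sample with at least p% of samples <= it."""
    if not samples_ms:
        raise ValueError("need at least one sample")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(samples_ms)
    # Fraction(str(p)) keeps values like 99.9 exact, avoiding float rounding in the rank.
    rank = math.ceil(Fraction(str(p)) * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


# Hypothetical workload: 985 fast requests (10 ms) and 15 slow ones (2,000 ms).
latencies = [10.0] * 985 + [2000.0] * 15

p50 = percentile(latencies, 50)
p99 = percentile(latencies, 99)
p999 = percentile(latencies, 99.9)
mean = sum(latencies) / len(latencies)

print(f"p50={p50} ms, p99={p99} ms, p99.9={p999} ms, mean={mean:.2f} ms")
# p50=10.0 ms, p99=2000.0 ms, p99.9=2000.0 ms, mean=39.85 ms
```

The median reports 10ms while the p99 reports 2s, matching the example in the definition. The mean (39.85ms) also sits far below the p99, so an average alone would hide the slow 1.5% of requests.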
See also
Related glossary terms you might want to look up next.
Latency
The time delay between sending a request and getting a response. Amazon found every 100ms of extra latency costs 1% in sales.
SLI
Service Level Indicator: a quantitative measure of service behavior, like the proportion of requests faster than 300ms. The raw metric that feeds SLOs.
Metrics
Numerical measurements collected over time that describe system behavior: request rate, error rate, latency percentiles, CPU utilization. Prometheus is a widely used collector.
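The SLI entry above describes a simple ratio. A minimal Python sketch follows, assuming a hypothetical `latency_sli` helper, a made-up measurement window, and an illustrative 99% target (none of these come from the glossary). It shows how a threshold-based SLI counts slow tail requests directly instead of summarizing them away.

```python
def latency_sli(samples_ms, threshold_ms=300.0):
    """Proportion of requests that completed faster than threshold_ms."""
    if not samples_ms:
        raise ValueError("need at least one sample")
    fast = sum(1 for s in samples_ms if s < threshold_ms)
    return fast / len(samples_ms)


# Hypothetical measurement window of request latencies, in milliseconds.
window = [120.0, 85.0, 310.0, 95.0, 40.0, 2000.0, 150.0, 280.0, 60.0, 30.0]

sli = latency_sli(window)
slo_target = 0.99  # hypothetical SLO: 99% of requests faster than 300 ms
print(f"SLI={sli:.0%}, meets SLO: {sli >= slo_target}")
# SLI=80%, meets SLO: False
```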