Thundering Herd
When many clients simultaneously retry or reconnect after a failure, overwhelming the recovering system. Solved by jittered backoff, request coalescing, and admission control.
What is Thundering Herd?
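A thundering herd occurs when many clients retry or reconnect at the same moment after a failure, overwhelming the system just as it is trying to recover. Common mitigations are jittered backoff (spreading retries out over time), request coalescing (collapsing duplicate in-flight requests into one), and admission control (limiting how much work the recovering system accepts).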
Thundering Herd is an advanced concept in the Reliability & Resilience area of system design. Engineers reach for it when reasoning about real-world trade-offs in that space: not just for textbook correctness, but because production systems at companies like Netflix, Amazon, and Google face these decisions every day.
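To make the jittered-backoff idea concrete, here is a minimal Python sketch of exponential backoff with "full jitter", where each client sleeps for a random time between zero and an exponentially growing ceiling. The function names (`backoff_delay`, `retry_with_jitter`) and the default values for `base`, `cap`, and `max_attempts` are illustrative assumptions, not part of any particular library.

```python
import random
import time


def backoff_delay(attempt, base=1.0, cap=60.0):
    """Return a random delay in [0, min(cap, base * 2**attempt)] seconds.

    The ceiling doubles with each attempt; the uniform draw spreads
    clients across the whole window so they do not retry in lockstep.
    """
    ceiling = min(cap, base * (2 ** attempt))
    return random.uniform(0.0, ceiling)


def retry_with_jitter(operation, max_attempts=5, base=1.0, cap=60.0,
                      sleep=time.sleep):
    """Call operation() until it succeeds or max_attempts is exhausted.

    `sleep` is injectable so the waiting can be replaced in tests.
    Catching Exception is a simplification for this sketch; real code
    would usually retry only on errors known to be transient.
    """
    for attempt in range(max_attempts):
        try:
            return operation()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(backoff_delay(attempt, base, cap))
```

Without the random draw, every client that failed at the same instant would also retry at the same instant (after 1s, then 2s, then 4s), recreating the herd on each round. The jitter is what spreads them out.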
If you want to go deeper than this definition, with diagrams, code, and a quiz to lock it in, work through the "Thundering Herd" lesson linked below. It walks through the why, the mechanism, the trade-offs, and how the giants actually use it in production.
Learn Thundering Herd in depth
Full interactive lesson with diagrams, code examples, real-world references, and a quiz.
Open the Thundering Herd lesson
Related lessons
Lessons that touch on Thundering Herd as part of a larger topic.
Cache Stampede Prevention
When a popular cache key expires, thousands of requests hit the database at once. Here's how to prevent the thundering herd.
foundation · caching strategies
Jitter
Add randomness to retry timing to prevent the thundering herd: the missing piece of exponential backoff.
intermediate · microservices architecture
See also
Related glossary terms you might want to look up next.
Cache Stampede
When many requests hit the database simultaneously because a popular cache entry expired. Solved with locking, probabilistic early expiration, or request coalescing.
Exponential Backoff
A retry strategy that doubles the wait time between attempts (1s, 2s, 4s, 8s...), usually with random jitter added. Prevents thundering herd problems when many clients retry simultaneously.
Load Shedding
Deliberately dropping low-priority requests during overload to protect the system's ability to serve high-priority traffic. Better to serve some requests than crash serving none.
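Request coalescing, named above as a fix for both thundering herds and cache stampedes, can be sketched in a few lines of Python. In this illustrative version, the first caller for a key (the "leader") runs the expensive load, and any caller that arrives for the same key while it is in flight waits for that one result instead of hitting the backend again. The `SingleFlight` class and its `do` method are hypothetical names chosen for this sketch, not an existing library API.

```python
import threading


class _Call:
    """Bookkeeping for one in-flight load."""

    def __init__(self):
        self.done = threading.Event()
        self.result = None
        self.error = None


class SingleFlight:
    """Coalesce concurrent loads of the same key into a single call."""

    def __init__(self):
        self._lock = threading.Lock()
        self._calls = {}  # key -> _Call currently in flight

    def do(self, key, loader):
        # Decide under the lock whether this caller leads or follows.
        with self._lock:
            call = self._calls.get(key)
            leader = call is None
            if leader:
                call = _Call()
                self._calls[key] = call

        if leader:
            try:
                call.result = loader()
            except Exception as exc:
                call.error = exc  # followers see the same failure
            finally:
                with self._lock:
                    del self._calls[key]
                call.done.set()  # wake every waiting follower
        else:
            call.done.wait()

        if call.error is not None:
            raise call.error
        return call.result
```

This sketch only deduplicates requests that overlap in time; it does not cache the result. A caller that arrives after the leader has finished starts a fresh load, so in practice it would typically sit in front of a cache rather than replace one.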