Why hold a seat with a TTL instead of just booking it immediately?

Because payment sits between selection and confirmation, and payment is slow (UPI/card/netbanking can take 30-120s) and can fail. You reserve the seat the instant the user picks it so nobody else can grab it, then give them a bounded window (~10 minutes) to pay. The TTL auto-expires the hold if they abandon or a server crashes, so inventory self-heals: no manual cleanup, no seat locked forever. A booking is only written durably after payment succeeds.

What stops the system from crashing when 10 million people hit it at on-sale?

A virtual waiting room at the edge (BookMyShow has used Queue-it and Cloudflare Waiting Room). Every user gets a signed token with a queue position, and the room admits people to the live booking service at a fixed rate the backend can absorb, a few thousand per second, no matter how many millions are waiting. This sheds the herd before it touches your database or Redis. The insight is that inventory is fixed and small, so you match admission to what's sellable rather than trying to scale compute to meet unbounded demand.

Why use Redis for locks instead of a database row lock (SELECT ... FOR UPDATE)?

A FOR UPDATE lock held across the user's think-time and a 60-second payment keeps a DB transaction open the whole time, serializes contention on the database, and exhausts the connection pool during a surge, so the DB becomes the bottleneck. Redis gives sub-millisecond atomic claims, keeps DB transactions short (only the final confirm write), and provides free TTL-based auto-expiry, which a FOR UPDATE lock has no equivalent for. The DB UNIQUE constraint still serves as the final correctness guarantee.

How do you keep payment and booking atomic when payment can fail mid-flow?

Model it as a saga: reserve inventory (Redis hold) -> initiate payment -> on success, confirm and persist the booking; on failure, timeout, or abandonment, run a compensating ReleaseInventory that drops the hold. The orchestrator is durable so a crash resumes. Payment results arrive via webhook, which is made idempotent with a key per payment/event id so a retried callback can't create a second booking or a second charge. A reconciliation job periodically compares PSP records against bookings to catch and refund any 'charged but not booked' drift.

What happens if a user pays successfully but their seat hold already expired?

This is the classic edge case. You guard against it three ways: keep the hold window comfortably longer than the payment-provider timeout so it rarely happens; at confirm time, atomically re-validate that the user still owns the seat and treat 'hold lost' as a failure that auto-refunds the late payment instead of double-booking; and rely on the DB UNIQUE constraint as the final catch: if someone else already booked the seat, the second confirm insert fails and that payment is refunded. The user never ends up paying for a seat someone else holds.

How does the read-heavy browse traffic scale without hurting booking correctness?

Reads outnumber writes roughly 450:1, so the seat map and catalog are served from CDN and cache (Redis) with short TTLs and can be a second or two stale. That's fine because the authoritative check is the atomic hold at claim time: if a user clicks a seat that was just taken, they get a clean 'taken, pick another'. Live updates are pushed via WebSocket/SSE as seats flip status, rather than having millions of clients poll. The strongly-consistent path stays tiny (only actual claims and confirms), so the consistency budget is spent exactly where double-booking risk lives.

System design interview guide

BookMyShow System Design: Seat Booking Concurrency

In the 2024 Coldplay India on-sale, roughly 1.3 crore (13 million) people fought for ~174,000 tickets that sold out in about 90 minutes, peaking near 5,000 booking attempts per second against a seat map where each seat can be sold exactly once.

A seat-level ticketing system where the hard part isn't scale, it's correctness under contention: 50,000 people clicking the same seat at the same millisecond, and exactly one of them must win. The design centers on a temporary seat-hold lock (Redis, ~10 min TTL, acquired with an atomic Lua compare-and-set), a virtual waiting room that gates the herd before it ever touches the booking service, and a payment saga that keeps the seat held until money clears, then either confirms the booking or releases the seat. Reads (seat maps, show listings) are massively cached and scale horizontally; writes (the actual booking) are funneled through a narrow, strongly-consistent path.

Where it shows up

A staple at Indian product companies and FAANG-India loops (Flipkart, Razorpay, Swiggy, Atlassian, Microsoft IDC, Amazon India, PhonePe) because it forces the candidate to reason about strict inventory concurrency, temporary locks with expiry, and payment-booking atomicity rather than just CRUD + caching. It's the canonical "no double-booking under a thundering herd" problem.

Why this question is asked

Most "design X" questions reward breadth: caching, sharding, CDNs. BookMyShow rewards depth on one nasty property: a single seat is a unit of inventory that must be sold exactly once, even when tens of thousands of users select it concurrently and a payment step (which can take 30-120s and fail) sits in the middle of the transaction. It probes whether you understand distributed locks, lock TTLs and their failure modes, optimistic vs pessimistic concurrency, idempotency, saga/compensation, and how to shed load with a waiting room before correctness even comes into play. The read path is easy; the interviewer is watching how you protect the write path.

Requirements

Always clarify these in the first 5 minutes of the interview. Do not start drawing boxes until both lists are agreed.

Functional requirements

Browse movies/events, cities, cinemas, and showtimes; view a live seat map with per-seat status (available / held / booked) and price tier
Select one or more seats and place a temporary hold so other users cannot grab them while the current user pays
Complete payment within the hold window; on success the seats become permanently booked and a ticket/QR is issued
Automatically release held seats if the user abandons or the hold expires, returning them to the available pool
Guarantee a seat is never sold to two users (no double-booking) even under extreme concurrency
Handle on-sale / blockbuster surges (IPL, Coldplay, big releases) without the system collapsing: graceful queueing instead of errors
Support cancellations/refunds per the event's policy and return inventory if allowed
Send booking confirmation (email/SMS/in-app) with ticket details

Non-functional requirements

Strong consistency on the seat-inventory write path: correctness is non-negotiable; a double-booking is a real-money, real-reputation failure
High availability and graceful degradation on the read path: if booking is overloaded, browsing and seat maps should still work
Low latency for seat-map reads (target < 100-200ms) so the UI feels live; seat-status reads dominate traffic ~450:1 over writes
Elastic capacity: handle 50-100x normal load during a hot on-sale (normal a few thousand RPS, peak 100k-300k RPS of mixed traffic)
Bounded, fair admission during surges via a virtual waiting room rather than first-come server crashes
Idempotency end-to-end: a retried request or a duplicated payment webhook must never create a second booking or a second charge
Durability: a confirmed booking + successful charge must survive crashes; partial failures must be reconcilable

Back-of-envelope scale estimates

Show your math. Pulling numbers from thin air signals you have not thought about the load.

Peak booking attempts/sec (single hot on-sale)

~5,000 writes/sec attempted

Grounded in the Coldplay India on-sale: ~1.3 crore users, ~174,000 tickets over a ~90-minute window, reported around 4,800-4,900 booking attempts/sec at peak. This is the number your write path must survive; most attempts will lose the race for a seat.

Read:write ratio

~450:1

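Estimated at ~3 reads (seat map refresh, availability checks, price lookups) per booking attempt, or roughly 14,400 reads/sec at peak, against ~32 successful seat writes/sec (174,000 tickets over ~90 minutes). The ratio is therefore reads to successful writes, not reads to attempts. Seat-status reads are the dominant load and are what you cache aggressively; confirmed bookings are a thin trickle by comparison.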
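The arithmetic behind the ratio can be checked in a few lines. This is only a sketch using the figures quoted in this section; the variable names are illustrative, and the write rate is an average over the whole sellout window rather than a measured peak.

```python
# Figures quoted in this guide; variable names are illustrative.
tickets = 174_000          # total inventory for the on-sale
window_s = 90 * 60         # ~90-minute sellout, in seconds
attempts_per_s = 4_800     # reported peak booking attempts/sec
reads_per_attempt = 3      # estimated reads per booking attempt

# Average successful seat writes per second over the sellout window.
successful_writes_per_s = tickets / window_s

# Read load implied by the per-attempt estimate.
reads_per_s = attempts_per_s * reads_per_attempt

# Reads per successful write, which is what the ~450:1 figure describes.
read_write_ratio = reads_per_s / successful_writes_per_s

print(f"successful writes/sec ~ {successful_writes_per_s:.1f}")
print(f"reads/sec             ~ {reads_per_s:,}")
print(f"read:write            ~ {read_write_ratio:.0f}:1")
```

The result lands a little under 450:1, which is why the guide rounds to "~450:1".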

Concurrent users in the waiting room

10+ million queued, dispatched ~2,000-2,500/sec

With 1.3 crore users and a fixed inventory, the waiting room holds the herd and admits a steady, controlled trickle to the booking service. Dispatch rate is tuned to what the booking + payment backend can actually absorb, not to demand.
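A quick sanity check on the dispatch rate: how long would it take to admit the whole queue at the quoted rates? The helper name below is hypothetical, and the calculation assumes a constant admission rate with nobody leaving the queue.

```python
def drain_minutes(queued_users: int, dispatch_per_s: int) -> float:
    """Minutes needed to admit every queued user at a constant rate."""
    return queued_users / dispatch_per_s / 60


queued_users = 13_000_000  # ~1.3 crore
for rate in (2_000, 2_500):
    minutes = drain_minutes(queued_users, rate)
    print(f"{rate}/sec -> about {minutes:.0f} minutes to admit everyone")
```

At these rates the queue drains on the same order as the ~90-minute sellout, which is consistent with tuning admission to what is sellable rather than to demand.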

Seat-hold TTL

~5-10 minutes

Long enough to complete a real payment (UPI/card/netbanking can take 30-120s plus user think-time), short enough that abandoned holds don't sterilize inventory during a fast on-sale. Industry write-ups commonly cite a 10-minute hold.

Inventory footprint

Tiny per show, huge in aggregate

One multiplex screen is ~150-300 seats; a single show's seat state is a few KB and fits comfortably in memory/Redis. The challenge is not data volume but the number of concurrent shows (tens of thousands live) and write contention on the hot ones.

Peak bandwidth

hundreds of MB/sec

Coldplay-scale events report ~700+ MB/sec and multiple TB transferred over the event. Most of this is seat-map and asset reads served from CDN/cache, not the booking path.

High-level architecture

The system splits hard into a read plane and a write plane, because they have opposite requirements.

The read plane (movie/event catalog, city/cinema/show listings, and the seat map) carries the overwhelming majority of the traffic (roughly 450 reads per write) and tolerates being slightly stale. It's served from a CDN for static assets and a cache (Redis/ElastiCache) for seat maps and show metadata, fronted by stateless API servers behind a load balancer. This plane scales horizontally with no coordination.

The write plane is the entire interview. It's narrow on purpose. Before a user can even attempt a booking on a hot show, they pass through a virtual waiting room (BookMyShow has used Queue-it and Cloudflare Waiting Room): the edge issues a signed token granting a queue position, and the room admits users to the live booking service at a controlled rate of a few thousand per second, regardless of how many millions are waiting. This is load-shedding: it converts an uncontrollable thundering herd into a bounded, predictable stream the backend can actually handle. Without it, the 2015 IPL-final collapse (500k+ simultaneous users) repeats.

Once admitted, seat selection acquires a temporary hold in Redis. The hold is a key like seat:{showId}:{seatId} set with an atomic compare-and-set (a Lua script: "set this key to my userId only if it doesn't exist") and a TTL of ~10 minutes. Redis is the source of truth for "who currently holds this seat" precisely because that operation is atomic and sub-millisecond. If the SET succeeds, the user owns the seat; if it fails, someone else got it and the UI immediately reflects that. Because it's a single atomic op, two users clicking the same seat in the same millisecond can never both win.

Holding a seat is not booking it. Payment sits in the middle, and it's slow and failure-prone, so the booking is modeled as a saga (an orchestrated state machine): create order → reserve inventory (the Redis hold) → initiate payment → on payment success, persist the booking durably to Postgres/Aurora and mark the seat permanently sold; on payment failure, timeout, or abandonment, run the compensating action, which releases the Redis hold so the seat returns to the pool. The hold's TTL is the safety net: even if a server crashes mid-saga, the lock auto-expires and inventory is never permanently lost. Payment confirmation arrives via webhook, which is deduplicated with an idempotency key so a retried or duplicated webhook can't create a second booking or release something it shouldn't. The confirmed booking is the only thing written to the durable relational store, where a unique constraint on (show_id, seat_id) is the final, absolute backstop against double-booking.

In a real interview, sketch this on the whiteboard before diving into any single box.

Core components

Walk through each service. The interviewer wants to hear what each one owns, not just the names.

Virtual Waiting Room (edge admission control)

Sits in front of the booking service for hot on-sales. Issues signed JWT-style tokens with a queue position and expiry, and admits users to the live system at a fixed, tunable rate (e.g., ~2,000-2,500/sec) no matter how many millions are queued. This is pure load-shedding: it protects every downstream component by converting an unbounded herd into a bounded stream. BookMyShow has used Queue-it and Cloudflare Waiting Room for this.
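The two ideas in this component, a signed token carrying a queue position and a fixed admission rate, can be sketched with the standard library alone. This is not how Queue-it or Cloudflare Waiting Room work internally; the token layout, the SECRET value, and the function names are all assumptions made for illustration.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional

SECRET = b"demo-only-secret"  # hypothetical; a real system would use a managed key


def _sign(payload: bytes) -> bytes:
    return hmac.new(SECRET, payload, hashlib.sha256).digest()


def issue_token(user_id: str, position: int, ttl_s: float = 3600.0,
                now: Optional[float] = None) -> str:
    """Return 'payload.signature', both URL-safe base64 encoded."""
    now = time.time() if now is None else now
    payload = json.dumps({"uid": user_id, "pos": position, "exp": now + ttl_s}).encode()
    return (base64.urlsafe_b64encode(payload).decode()
            + "." + base64.urlsafe_b64encode(_sign(payload)).decode())


def verify_token(token: str, now: Optional[float] = None) -> Optional[dict]:
    """Return the claims if the signature is valid and the token is unexpired."""
    now = time.time() if now is None else now
    try:
        payload_b64, sig_b64 = token.split(".")
        payload = base64.urlsafe_b64decode(payload_b64)
        sig = base64.urlsafe_b64decode(sig_b64)
    except ValueError:  # wrong number of parts or invalid base64
        return None
    if not hmac.compare_digest(sig, _sign(payload)):
        return None  # tampered payload or wrong key
    claims = json.loads(payload)
    return claims if claims["exp"] > now else None


def is_admitted(position: int, opened_at: float, rate_per_s: int, now: float) -> bool:
    """Positions unlock in order at a fixed rate, independent of queue length."""
    admitted_up_to = int((now - opened_at) * rate_per_s)
    return position < admitted_up_to
```

Because the token is self-verifying, the edge can check it without a database lookup, and the admission check depends only on elapsed time and the configured rate.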

Catalog & Search service (read plane)

Serves movies, events, cities, cinemas, and showtimes. Read-heavy, cache-friendly, eventually-consistent. Backed by a search index for discovery and a cache for hot listings. Scales horizontally with stateless replicas; has no role in correctness.

Seat-Map service

Returns the live seat layout and status for a given show. Reads the authoritative hold state from Redis and merges it with the booked state from the durable store, then caches the rendered map briefly. This is the highest-QPS component during an on-sale; it must be fast (<100-200ms) and is allowed to be a second or two stale for browsing, because the real check happens at hold time.

Seat-Hold / Lock service (Redis)

The heart of concurrency control. Acquires per-seat holds via an atomic Lua compare-and-set with a ~10-minute TTL (key: seat:{showId}:{seatId} -> userId). Uses a quorum/Redlock setup across multiple Redis nodes for hot events to survive a node failure without split-brain. Auto-expiry means abandoned holds self-heal, so no orphaned locks freeze inventory.

Booking Orchestrator (saga state machine)

Drives create-order → reserve → pay → confirm/compensate. On success, writes the booking durably and marks seats sold; on any failure or timeout, emits ReleaseInventory to drop the Redis holds and cancel the order. Decouples the slow payment step from the fast lock step and makes partial failures recoverable instead of corrupting inventory.

Payment service + webhook handler

Integrates UPI/cards/netbanking via a PSP (Razorpay/PayU-style). Payment is asynchronous: the user is redirected, and the result arrives via webhook. The handler is strictly idempotent: each webhook is keyed by event/payment id (stored in Redis/DB for a few hours) so duplicate or retried callbacks are no-ops. This prevents double-charges and double-bookings from PSP retries.
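A minimal sketch of the dedup idea, with an in-memory set standing in for the Redis/DB record of seen event ids. The class, method, and return values are hypothetical; a production handler would also expire old ids and commit the dedup record together with the booking change so a crash between the two cannot lose an event.

```python
class WebhookHandler:
    """Processes each PSP event id at most once (in-memory illustration)."""

    def __init__(self) -> None:
        self.seen_event_ids = set()  # stand-in for a Redis/DB dedup store
        self.outcomes = {}           # payment_id -> "confirmed" | "released"

    def handle(self, event_id: str, payment_id: str, status: str) -> str:
        if event_id in self.seen_event_ids:
            return "duplicate-ignored"  # retried callback becomes a no-op
        self.seen_event_ids.add(event_id)

        outcome = "confirmed" if status == "success" else "released"
        # setdefault keeps the first recorded outcome for a payment, so a
        # second, different event for the same payment cannot overwrite it.
        return self.outcomes.setdefault(payment_id, outcome)


handler = WebhookHandler()
print(handler.handle("evt-1", "pay-1", "success"))
print(handler.handle("evt-1", "pay-1", "success"))  # PSP retry of the same event
```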

Durable booking store (Postgres/Aurora)

The system of record for confirmed bookings, payments, and seat ownership. A UNIQUE constraint on (show_id, seat_id) among active bookings is the absolute, last-line guarantee against double-booking: even if every layer above it has a bug, the database rejects the second insert. Sharded by show/event for the largest catalogs; reads can use replicas.

Notification service

Sends booking confirmations and tickets/QR codes over email/SMS/in-app after a booking is confirmed. Off the critical path, fired from a queue so a slow SMS provider can never delay or block the booking commit.

Message bus (RabbitMQ/Kafka)

Carries saga commands/events (ReserveInventory, InitiatePayment, ReleaseInventory, ConfirmOrder) and notification jobs with at-least-once delivery and retries. At-least-once is the reason every consumer must be idempotent.

Data model

Pick the right store per table. Justify each choice with the access pattern, not by reflex.

shows

show_id (PK)
event_id / movie_id
cinema_id / venue_id
screen_id
start_time
city
status

One row per screening/event instance. The unit everything else hangs off. Hot rows during an on-sale are a tiny subset of all live shows; route those to dedicated capacity.

seats

seat_id (PK)
screen_id
row
number
seat_type / price_tier

Physical seat definitions per screen, largely static. The live booked/held status is NOT primarily stored here: booked status lives in the bookings table (durable) and held status lives in Redis (ephemeral). Keeping volatile status out of this table avoids hammering it with writes.

seat_holds (Redis, not a SQL table)

key: seat:{show_id}:{seat_id}
value: user_id / session_id
TTL: ~10 minutes

Ephemeral source of truth for 'currently held'. Set/checked atomically via Lua. Auto-expires so abandoned holds free themselves. Never the system of record for a confirmed sale, only for the temporary reservation window.

bookings

booking_id (PK, UUID)
user_id
show_id
seat_ids[]
status (reserved/confirmed/cancelled)
amount
version
created_at

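Durable system of record. The hard guarantee: a UNIQUE / exclusion constraint ensuring no two CONFIRMED bookings share the same (show_id, seat_id). Because seat_ids[] is an array, a plain UNIQUE index cannot see individual seats, so in practice the guarantee needs one row per booked seat (for example a child table keyed by booking_id, show_id, seat_id) or an exclusion constraint. A version column supports optimistic concurrency when transitioning reserved -> confirmed.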
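To make the backstop concrete, here is a sketch using SQLite from the Python standard library as a stand-in for Postgres/Aurora. It assumes a hypothetical booking_seats table with one row per seat (the bookings table above stores seat_ids[] as an array, which a plain unique index cannot check per seat) and a partial unique index limited to confirmed rows. Postgres supports a similar partial unique index, but treat the schema as illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE booking_seats (
    booking_id TEXT NOT NULL,
    show_id    TEXT NOT NULL,
    seat_id    TEXT NOT NULL,
    status     TEXT NOT NULL
        CHECK (status IN ('reserved', 'confirmed', 'cancelled'))
);
-- At most one CONFIRMED row per (show_id, seat_id); cancelled rows are ignored.
CREATE UNIQUE INDEX one_confirmed_booking_per_seat
    ON booking_seats (show_id, seat_id)
    WHERE status = 'confirmed';
""")


def try_confirm(booking_id: str, show_id: str, seat_id: str) -> bool:
    """Insert a confirmed seat row; False means the database rejected a double-booking."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO booking_seats VALUES (?, ?, ?, 'confirmed')",
                (booking_id, show_id, seat_id),
            )
        return True
    except sqlite3.IntegrityError:
        return False


def cancel(booking_id: str) -> None:
    with conn:
        conn.execute(
            "UPDATE booking_seats SET status = 'cancelled' WHERE booking_id = ?",
            (booking_id,),
        )
```

The second insert for the same seat fails inside the database itself, regardless of what the layers above did, and a cancellation frees the seat for a later booking.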

payments

payment_id (PK)
booking_id (FK)
psp_reference
status (initiated/success/failed)
idempotency_key
amount
webhook_event_ids[]

Tracks the money. The idempotency_key and recorded webhook event ids let the handler safely ignore duplicate/retried PSP callbacks. Payment status drives the saga's confirm-vs-compensate decision.

users

user_id (PK)
phone
email
city

Standard. Mostly read; not on the contention-critical path. Phone is the primary identity in the Indian market (OTP login).

Deep dives

These are the conversations the interviewer is steering you toward. Practice each one until you can talk through it without notes.

Preventing double-booking: the atomic compare-and-set, not a read-then-write

The naive design (read 'is seat A1 free?', then write 'book A1') has a race: two requests both read 'free', both write, both succeed. The fix is to make the claim a single atomic operation. In Redis: SET seat:{show}:{A1} userId NX PX 600000, or equivalently a Lua script that does GET-then-SET-if-absent in one indivisible step. Exactly one of N concurrent claimants gets the OK; everyone else instantly sees 'taken'. This is why Redis (single-threaded command execution, atomic Lua) is used as the live hold authority instead of a row read in the DB. The durable database then adds a second, independent guarantee: a UNIQUE constraint on (show_id, seat_id) among active bookings. Belt and suspenders: the DB makes a double-booking physically impossible to persist even if the Redis layer were bypassed or buggy. State this two-layer model explicitly; it's what separates a strong answer from a hand-wave.
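The claim semantics can be demonstrated without a Redis server. The sketch below is an in-memory model of SET key value NX PX ttl, with a threading.Lock standing in for Redis executing one command at a time; HoldStore and race are hypothetical names. With the redis-py client, the real call would look like r.set(key, owner, nx=True, px=600_000).

```python
import threading
import time


class HoldStore:
    """In-memory model of Redis `SET key value NX PX ttl` (illustration only)."""

    def __init__(self, clock=time.monotonic):
        self._lock = threading.Lock()  # models single-threaded command execution
        self._holds = {}               # key -> (owner, expires_at)
        self._clock = clock

    def claim(self, key: str, owner: str, ttl_s: float = 600.0) -> bool:
        """Set the hold only if no live hold exists; check and set are one step."""
        with self._lock:
            now = self._clock()
            current = self._holds.get(key)
            if current is not None and current[1] > now:
                return False  # someone holds it and the TTL has not expired
            self._holds[key] = (owner, now + ttl_s)
            return True


def race(store: HoldStore, key: str, n_users: int) -> list:
    """Have n_users claim the same seat concurrently; return the winners."""
    winners = []

    def attempt(user_id: str) -> None:
        if store.claim(key, user_id):
            winners.append(user_id)

    threads = [threading.Thread(target=attempt, args=(f"user-{i}",))
               for i in range(n_users)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return winners


print(len(race(HoldStore(), "seat:show1:A1", 50)), "winner(s) out of 50 claimants")
```

Because the existence check and the write happen under one lock, there is no window in which two claimants can both observe 'free'.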

Pessimistic DB locks (SELECT ... FOR UPDATE) vs. the Redis-hold approach

You could lock at the database: SELECT * FROM seats WHERE seat_id=? FOR UPDATE inside a transaction. It's correct, but it serializes contention on the database connection and holds a DB transaction open for the entire user think-time + payment (tens of seconds or more). During an on-sale that turns the database into the bottleneck and exhausts the connection pool. The Redis-hold approach moves the contention to an in-memory store built for it, keeps DB transactions short (only the final confirm write), and gives you free auto-expiry via TTL, whereas a FOR UPDATE lock has no natural timeout tied to user behavior. The trade is that Redis is now a critical dependency and you must handle its failure modes (covered next). Optimistic concurrency (a version column checked on the reserved->confirmed transition) is the lightweight DB-side complement, since by confirm time the contention is already resolved by the hold.
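The optimistic-concurrency complement mentioned at the end of this deep dive can be sketched as a conditional UPDATE. SQLite is used here as a stand-in for the relational store, and the table is a trimmed-down, hypothetical version of bookings with only the columns the check needs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE bookings ("
    " booking_id TEXT PRIMARY KEY,"
    " status TEXT NOT NULL,"
    " version INTEGER NOT NULL)"
)
conn.execute("INSERT INTO bookings VALUES ('b1', 'reserved', 1)")
conn.commit()


def confirm(booking_id: str, expected_version: int) -> bool:
    """Move reserved -> confirmed only if nobody changed the row since we read it."""
    with conn:  # short transaction: one statement, no lock held across payment
        cur = conn.execute(
            "UPDATE bookings SET status = 'confirmed', version = version + 1 "
            "WHERE booking_id = ? AND status = 'reserved' AND version = ?",
            (booking_id, expected_version),
        )
    return cur.rowcount == 1  # zero rows means a stale version or wrong state


print(confirm("b1", 1))  # first confirm matches version 1
print(confirm("b1", 1))  # replay with the stale version changes nothing
```

No row lock is held while the user pays; the version check happens only at the moment of the write.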

Lock TTL: the abandoned-cart problem and the failure modes of expiry

A hold needs a TTL because users abandon and servers crash; without expiry, one rage-quit could sterilize a seat forever. ~10 minutes is the usual choice: long enough for a real UPI/card flow plus think-time, short enough to recycle inventory fast in a 90-minute sellout. But TTL introduces its own danger: what if the payment succeeds at minute 10:30, after the hold expired and someone else grabbed the seat? You must not confirm a booking whose hold is gone. Defenses: (1) make the hold window comfortably longer than the PSP timeout; (2) at confirm time, re-validate ownership atomically and treat 'hold lost' as a failure path that refunds the late payment rather than double-booking; (3) the DB UNIQUE constraint catches it regardless: the second confirm insert fails, and that booking is auto-refunded. This 'late payment after lock expiry' edge case is exactly what strong interviewers push on.
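Defense (2) can be sketched as a single decision function. Plain dictionaries stand in for Redis and the bookings table, and the function name and return values are hypothetical; against real Redis the ownership check would need to run atomically (for example in a Lua script), which this single-threaded illustration does not model.

```python
def confirm_after_payment(holds: dict, booked: set, seat_key: str,
                          user_id: str, now: float) -> str:
    """Decide what to do when a payment-success event arrives.

    holds maps seat_key -> (owner, expires_at); booked is the set of sold seats.
    Returns "confirmed" or "refund".
    """
    hold = holds.get(seat_key)
    still_owner = hold is not None and hold[0] == user_id and hold[1] > now
    if not still_owner or seat_key in booked:
        return "refund"  # hold lost or seat already sold: never double-book
    booked.add(seat_key)  # in the real system: the durable insert
    del holds[seat_key]   # the temporary hold is no longer needed
    return "confirmed"


# On-time payment: alice still owns the hold at t=120s.
holds = {"seat:show1:A1": ("alice", 600.0)}
booked = set()
print(confirm_after_payment(holds, booked, "seat:show1:A1", "alice", now=120.0))

# Late payment: alice's hold expired and bob claimed the seat before t=630s.
holds = {"seat:show1:A2": ("bob", 1200.0)}
print(confirm_after_payment(holds, booked, "seat:show1:A2", "alice", now=630.0))
```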

The virtual waiting room: solving the herd before correctness even matters

With 1.3 crore users hitting at t=0 and 174k tickets, no amount of clever locking saves you if the herd reaches your servers: connection pools, Redis, and load balancers all melt (the 2015 IPL-final lesson). The waiting room is admission control at the edge: every user gets a signed token with a queue position; the room releases users into the live booking path at a rate the backend can absorb (a few thousand/sec), independent of total demand. Critically, it's stateless at the edge and runs before any business logic, so 99% of the herd never touches your database. It also improves fairness (FIFO-ish) and UX (a clear 'you're 40,000th in line, ~6 min' beats a spinner then a 503). The key interview insight: scaling the booking service to absorb the full herd is the wrong goal; shedding the herd to match a fixed, sellable inventory is the right one.

Payment + booking atomicity via saga and compensation

Booking spans a fast local step (acquire hold) and a slow external step (payment) that can fail, time out, or return asynchronously via webhook, so you can't wrap it in one ACID transaction. Model it as a saga: ReserveInventory -> InitiatePayment -> (on success) ConfirmBooking, with a compensating ReleaseInventory if payment fails, times out, or the user abandons. The orchestrator owns this state machine and is durable, so a crash mid-flow resumes. Two correctness pillars: the hold's TTL means even a lost orchestrator can't strand inventory, and idempotency means retried steps don't duplicate work. The webhook handler dedupes on payment/event id (cached a few hours) so a PSP that retries its callback three times still produces exactly one confirmed booking and zero extra charges. Mention reconciliation: a periodic job compares PSP records vs. bookings to catch any 'charged but not booked' (refund) or 'booked but not charged' (alert) drift.
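The saga's state machine can be written down as a small transition table. This is a sketch, not the orchestrator itself: the state names and the event names PaymentSucceeded, PaymentFailed, PaymentTimedOut, and UserAbandoned are assumptions added for illustration, and durability, messaging, and the refund for a success event that arrives after release are left out.

```python
from enum import Enum


class State(Enum):
    CREATED = "created"
    RESERVED = "reserved"                # Redis hold acquired
    PAYMENT_PENDING = "payment_pending"  # waiting on the PSP webhook
    CONFIRMED = "confirmed"              # booking persisted durably
    RELEASED = "released"                # compensation ran: hold dropped


TERMINAL = {State.CONFIRMED, State.RELEASED}

TRANSITIONS = {
    (State.CREATED, "ReserveInventory"): State.RESERVED,
    (State.RESERVED, "InitiatePayment"): State.PAYMENT_PENDING,
    (State.RESERVED, "UserAbandoned"): State.RELEASED,
    (State.PAYMENT_PENDING, "PaymentSucceeded"): State.CONFIRMED,
    (State.PAYMENT_PENDING, "PaymentFailed"): State.RELEASED,
    (State.PAYMENT_PENDING, "PaymentTimedOut"): State.RELEASED,
    (State.PAYMENT_PENDING, "UserAbandoned"): State.RELEASED,
}


def step(state: State, event: str) -> State:
    """Apply one event. Terminal states absorb replays, so duplicates are harmless."""
    if state in TERMINAL:
        return state
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"event {event!r} is not valid in state {state.name}")


def run(events: list) -> State:
    state = State.CREATED
    for event in events:
        state = step(state, event)
    return state


happy = ["ReserveInventory", "InitiatePayment", "PaymentSucceeded", "PaymentSucceeded"]
print(run(happy).name)  # the duplicated success event does not change the outcome
```

Every path that does not end in CONFIRMED ends in RELEASED, which is the compensation guarantee the saga is meant to provide.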

Read-path scaling and live seat-map updates

Seat-status reads are ~450x the writes, so the read path must scale independently and cheaply. Serve catalog/static assets from CDN; cache show metadata and rendered seat maps in Redis with a short TTL. The seat map shown for browsing can be a second or two stale: the authoritative check happens atomically at hold time, so a user who clicks an already-taken seat just gets a clean 'taken, pick another'. For the live feel during a hot show, push deltas to clients (WebSocket/SSE) as seats flip held/booked, rather than having millions poll. This keeps the expensive, strongly-consistent path tiny (only actual claims) while the cheap, eventually-consistent path carries the bulk of traffic. Don't try to make the browse view perfectly consistent; that's where naive designs waste their consistency budget.
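The short-TTL caching of rendered seat maps can be illustrated with a tiny cache-aside helper. It is an in-process stand-in for the Redis cache described here; TTLCache and the loader callback are hypothetical names, and the 2-second TTL simply mirrors the "second or two stale" tolerance.

```python
import time


class TTLCache:
    """Cache-aside with a short TTL: serve cached values, reload after expiry."""

    def __init__(self, ttl_s: float, loader, clock=time.monotonic):
        self._ttl_s = ttl_s
        self._loader = loader  # called on a miss, e.g. to render the seat map
        self._clock = clock
        self._entries = {}     # key -> (value, expires_at)

    def get(self, key):
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[1] > now:
            return entry[0]    # hit: possibly up to ttl_s stale
        value = self._loader(key)
        self._entries[key] = (value, now + self._ttl_s)
        return value


render_calls = []


def render_seat_map(show_id: str) -> str:
    render_calls.append(show_id)  # count how often the expensive path runs
    return f"seat-map-for-{show_id}"


cache = TTLCache(ttl_s=2.0, loader=render_seat_map)
for _ in range(1000):
    cache.get("show1")
print(len(render_calls), "render(s) for 1000 reads within the TTL")
```

Many reads inside one TTL window cost a single render, which is the property that lets the browse path absorb the bulk of the traffic.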

Trade-offs to discuss

Every senior interviewer expects you to surface at least 3 of these. Pick the decisions, state the alternatives, and justify your choice.

Redis hold as live source of truth + DB UNIQUE constraint as backstop (vs. DB-only locking)

Redis gives sub-millisecond atomic claims and free TTL-based auto-expiry, keeping DB transactions short and the database off the contention hot path. The cost is a hard dependency on Redis and the need to handle its failure modes (quorum/Redlock for hot events). The DB UNIQUE constraint makes a double-booking impossible to persist regardless, so Redis being the fast path doesn't mean it's the only guarantee.

Virtual waiting room (shed the herd) instead of autoscaling the booking service to meet demand

Inventory is fixed and small; demand is unbounded. Scaling compute to absorb 13M concurrent users is wasteful and still risks melting stateful components (Redis, DB, pools). Admission control at the edge matches throughput to what's sellable and protects everything downstream. Trade-off: added edge dependency and a queue UX, plus tuning the dispatch rate.

~10-minute hold TTL (vs. shorter or no expiry)

Long enough for real Indian payment flows (UPI redirect, OTP, netbanking) plus think-time; short enough to recycle seats fast during a sellout. Too short frustrates legitimate payers; too long sterilizes inventory and lets abandoners lock seats. The cost is the 'payment-after-expiry' edge case, which you handle with re-validation + auto-refund.

Saga with compensation (vs. trying for a single distributed ACID transaction across booking + payment)

Payment is external, slow, and asynchronous (webhooks), so it cannot sit inside one ACID transaction. A saga gives recoverability and clear compensation (release the hold) at the price of more moving parts and the need for end-to-end idempotency and a reconciliation job.
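At its core, the reconciliation job is a set comparison between what the PSP says was charged and what the booking store says was sold. A minimal sketch, assuming both sides can be reduced to sets of payment ids (the function and key names are hypothetical):

```python
def reconcile(psp_captured: set, booked_payment_ids: set) -> dict:
    """Compare PSP captures with confirmed bookings and classify the drift."""
    return {
        # Money was taken but no booking exists: refund the customer.
        "refund": sorted(psp_captured - booked_payment_ids),
        # A booking exists with no matching capture: raise an alert.
        "alert": sorted(booked_payment_ids - psp_captured),
    }


drift = reconcile({"pay-1", "pay-2", "pay-3"}, {"pay-2", "pay-3", "pay-4"})
print(drift)
```

A real job would page through PSP settlement reports and compare amounts as well as ids, but the two drift categories are the same ones named in this guide.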

Eventually-consistent browse seat map (vs. strongly-consistent everywhere)

Spending the consistency budget only where a seat is actually claimed lets the 450:1 read traffic be cached and cheap. The downside is a brief window where the displayed map is stale; this is acceptable because the atomic hold at claim time is the real arbiter, and a clean 'seat just taken' retry is fine UX.

Idempotency keys everywhere on the write path (extra complexity)

At-least-once message delivery and PSP webhook retries make duplicates inevitable. Dedicating an idempotency key per booking attempt and per payment event prevents double-charges and double-bookings. The cost is extra storage and discipline, but it's non-negotiable when real money is involved.

How BookMyShow actually does it

BookMyShow's current design is, in large part, a reaction to a public failure: during the 2015 IPL final on-sale, roughly 500,000 users hit the system simultaneously and it collapsed. The lessons from that failure (distributed Redis-based seat locking and a virtual queue for high-demand events) became the template. The 2024 Coldplay India on-sale is the modern stress test that gets quoted in interviews: about 1.3 crore (13 million) users competing for ~174,000 tickets across three days, sold out in roughly 90 minutes, peaking near 4,800 booking attempts/sec, with the herd gated by a virtual waiting room (Queue-it / Cloudflare Waiting Room).

Public engineering write-ups consistently describe the same shape: Redis as the live hold store with an atomic Lua compare-and-set and a ~10-minute TTL, a saga/compensation flow around an asynchronous payment step with idempotent webhook handling, eventually-consistent cached seat maps for the read-heavy browse path, and a durable relational store (Postgres/Aurora) as the final system of record with a uniqueness guarantee. Treat the specific instance counts and per-second cost figures in third-party blogs as informed estimates, not official numbers. The architectural pattern (waiting room + Redis hold + saga + DB backstop), however, is well-corroborated and is what interviewers expect you to converge on.

Lessons to study before this interview

If any of these topics are fuzzy, the interviewer will catch it. Each lesson is 15 to 60 minutes with diagrams, code, and a quiz.

Distributed Locks

advanced / distributed systems core

Idempotency

foundation / core fundamentals

Saga Pattern

advanced / distributed systems core

Cache-Aside Pattern

foundation / caching strategies

Rate Limiting for Resilience

advanced / reliability resilience

Design a Payment System

capstone / capstone

Redis Cache

foundation / caching strategies

BookMyShow System Design: Seat Booking Concurrency

Where it shows up

Why this question is asked

Requirements

Always clarify these in the first 5 minutes of the interview. Do not start drawing boxes until both lists are agreed.

Functional requirements

Browse movies/events, cities, cinemas, and showtimes; view a live seat map with per-seat status (available / held / booked) and price tier
Select one or more seats and place a temporary hold so other users cannot grab them while the current user pays
Complete payment within the hold window; on success the seats become permanently booked and a ticket/QR is issued
Automatically release held seats if the user abandons or the hold expires, returning them to the available pool
Guarantee a seat is never sold to two users (no double-booking) even under extreme concurrency
Handle on-sale / blockbuster surges (IPL, Coldplay, big releases) without the system collapsing, graceful queueing instead of errors
Support cancellations/refunds per the event's policy and return inventory if allowed
Send booking confirmation (email/SMS/in-app) with ticket details

Non-functional requirements

Strong consistency on the seat-inventory write path, correctness is non-negotiable; a double-booking is a real-money, real-reputation failure
High availability and graceful degradation on the read path: if booking is overloaded, browsing and seat maps should still work
Low latency for seat-map reads (target < 100-200ms) so the UI feels live; seat-status reads dominate traffic ~450:1 over writes
Elastic capacity: handle 50-100x normal load during a hot on-sale (normal a few thousand RPS, peak 100k-300k RPS of mixed traffic)
Bounded, fair admission during surges via a virtual waiting room rather than first-come server crashes
Idempotency end-to-end: a retried request or a duplicated payment webhook must never create a second booking or a second charge
Durability: a confirmed booking + successful charge must survive crashes; partial failures must be reconcilable
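
The idempotency and durability requirements can be illustrated with a small sketch using SQLite from the Python standard library. The table layout, column names, and the record_booking function are hypothetical, picked only for this example. The idea shown is that a unique idempotency key absorbs retried requests and duplicated webhooks, while a unique (show, seat) constraint acts as a last line of defense against double-booking.

```python
import sqlite3


def open_store() -> sqlite3.Connection:
    """Create an in-memory bookings table (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE bookings (
            id              INTEGER PRIMARY KEY,
            idempotency_key TEXT NOT NULL UNIQUE,
            show_id         TEXT NOT NULL,
            seat_id         TEXT NOT NULL,
            user_id         TEXT NOT NULL,
            UNIQUE (show_id, seat_id)
        )
        """
    )
    return conn


def record_booking(conn: sqlite3.Connection, idempotency_key: str,
                   show_id: str, seat_id: str, user_id: str) -> str:
    """Return 'created', 'duplicate' (same key seen before), or 'seat_taken'."""
    try:
        with conn:  # commits on success, rolls back on exception
            conn.execute(
                "INSERT INTO bookings (idempotency_key, show_id, seat_id, user_id) "
                "VALUES (?, ?, ?, ?)",
                (idempotency_key, show_id, seat_id, user_id),
            )
        return "created"
    except sqlite3.IntegrityError:
        # One of the two UNIQUE constraints fired; find out which.
        seen = conn.execute(
            "SELECT 1 FROM bookings WHERE idempotency_key = ?",
            (idempotency_key,),
        ).fetchone()
        return "duplicate" if seen else "seat_taken"
```

A caller would treat "duplicate" as success and return the original result, and treat "seat_taken" as a signal to refund or compensate rather than to retry.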

Back-of-envelope scale estimates

Show your math. Pulling numbers from thin air signals you have not thought about the load.

Peak booking attempts/sec (single hot on-sale): ~5,000 writes/sec attempted

Read:write ratio: ~450:1

Concurrent users in the waiting room: 10+ million queued, dispatched ~2,000-2,500/sec

Seat-hold TTL: ~5-10 minutes

Inventory footprint: tiny per show, huge in aggregate

Peak bandwidth: hundreds of MB/sec. Coldplay-scale events report ~700+ MB/sec and multiple TB transferred over the event. Most of this is seat-map and asset reads served from CDN/cache, not the booking path.
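
As one example of showing the math, the waiting-room figures above imply how long the last person in the queue waits. The sketch below is plain arithmetic on the stated numbers; the function name drain_minutes is made up for illustration, and it assumes a constant dispatch rate with nobody leaving the queue.

```python
def drain_minutes(queued_users: int, admitted_per_sec: float) -> float:
    """Minutes needed to admit everyone at a constant dispatch rate."""
    return queued_users / admitted_per_sec / 60.0


# 10 million queued, dispatched at 2,000-2,500 users per second.
slow = drain_minutes(10_000_000, 2_000)
fast = drain_minutes(10_000_000, 2_500)

print(f"{slow:.1f} min at 2,000/sec")  # 83.3 min at 2,000/sec
print(f"{fast:.1f} min at 2,500/sec")  # 66.7 min at 2,500/sec
```

A wait of over an hour at the back of the queue is worth stating out loud in the interview, since it motivates showing users their position and an estimated wait.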

How BookMyShow actually does it

Frequently asked questions

BookMyShow System Design: Seat Booking Concurrency

Why this question is asked

Requirements

Functional requirements

Non-functional requirements

Back-of-envelope scale estimates

High-level architecture

Core components

Virtual Waiting Room (edge admission control)

Catalog & Search service (read plane)

Seat-Map service

Seat-Hold / Lock service (Redis)

Booking Orchestrator (saga state machine)

Payment service + webhook handler

Durable booking store (Postgres/Aurora)

Notification service

Message bus (RabbitMQ/Kafka)

Data model

Deep dives

Preventing double-booking: the atomic compare-and-set, not a read-then-write

Pessimistic DB locks (SELECT ... FOR UPDATE) vs. the Redis-hold approach

Lock TTL: the abandoned-cart problem and the failure modes of expiry

The virtual waiting room: solving the herd before correctness even matters

Payment + booking atomicity via saga and compensation

Read-path scaling and live seat-map updates

Trade-offs to discuss

Redis hold as live source of truth + DB UNIQUE constraint as backstop (vs. DB-only locking)

Virtual waiting room (shed the herd) instead of autoscaling the booking service to meet demand

~10-minute hold TTL (vs. shorter or no expiry)

Saga with compensation (vs. trying for a single distributed ACID transaction across booking + payment)

Eventually-consistent browse seat map (vs. strongly-consistent everywhere)

Idempotency keys everywhere on the write path (extra complexity)

How BookMyShow actually does it

Lessons to study before this interview

Related system design interview questions

Frequently asked questions

How does BookMyShow guarantee a seat is never sold twice?

Why hold a seat with a TTL instead of just booking it immediately?

What stops the system from crashing when 10 million people hit it at on-sale?

Why use Redis for locks instead of a database row lock (SELECT ... FOR UPDATE)?

How do you keep payment and booking atomic when payment can fail mid-flow?

What happens if a user pays successfully but their seat hold already expired?

How does the read-heavy browse traffic scale without hurting booking correctness?
