Design Hotstar (JioHotstar): System Design Interview Guide
JioHotstar streamed the 2026 T20 World Cup semi-final to 65.2 million concurrent viewers — the highest concurrency ever recorded for any live event on any digital platform.
A deep, interview-grade walkthrough of designing a live video streaming platform at India scale: adaptive bitrate (HLS/DASH), multi-CDN with origin shielding, the brutal traffic dynamics of a cricket wicket, and why you pre-warm and predictively scale instead of trusting reactive autoscaling. Grounded in how Hotstar actually handled its record-breaking IPL and World Cup peaks.
Asked at: This is a canonical India-market interview question. It is asked at JioHotstar/Disney directly, and the "design a live streaming service for the IPL" variant shows up at Amazon, Google, Flipkart, Swiggy, PhonePe, and most India product-company SDE2/SDE3 loops because the 50M+ concurrent cricket spike is the most famous scaling story in Indian engineering.
Why this question is asked
Live streaming forces the candidate to reason about a fundamentally different shape of load than a CRUD app. The interviewer is checking whether you understand that (1) video is delivered by CDNs, not your origin, so the real design question is about cache hit ratio and origin protection, not request handling; (2) live events produce correlated, near-instant demand spikes (a wicket falls, millions open the app in the same 60 seconds) that reactive autoscaling cannot absorb because servers take ~90s to boot; and (3) graceful degradation — shedding non-essential features to protect the play button — is a deliberate design choice, not a failure. It separates engineers who memorized "use a load balancer and a cache" from those who can reason about capacity ahead of demand.
Requirements
Always clarify these in the first 5 minutes of the interview. Do not start drawing boxes until both lists are agreed.
Functional requirements
- Stream live events (cricket, shows) and video-on-demand to web, Android, iOS, and TV apps
- Adaptive bitrate playback: the player picks a quality (4K/1080p/720p/480p/360p/low) based on live measured bandwidth and switches mid-stream without re-buffering
- Low glass-to-glass latency for live: the streamed moment should land close to the real event (under 30s is typical for HLS/DASH, lower with LL-HLS), with start-up time under 3s
- Authentication, subscription/entitlement checks, and content/geo licensing enforcement before a stream is served
- Live engagement features: scorecard, watch-along, reactions, and concurrency counter
- Personalized home page, search, and continue-watching for VOD
- Handle predictable mega-events (IPL, World Cup finals) with scheduled, pre-announced peaks
Non-functional requirements
- Survive 50-65M+ concurrent live viewers with no full outage; partial degradation is acceptable, a black screen on the match is not
- Absorb a surge of ~500K new concurrent viewers per minute when a key moment (wicket, last over) hits
- Keep origin load flat regardless of audience size — target 90%+ CDN cache hit ratio so origin sees a tiny fraction of edge requests
- High availability for the playback path (the play button must work even when recommendations, social, and analytics are down)
- Multi-CDN redundancy so a single CDN outage during a final does not take down playback
- Cost efficiency at peak: peak is ~10-50x the daily baseline, so the architecture must scale up for a 3-hour window and scale back down
- Global edge delivery but India-first network reality: serve a 5G phone in Bengaluru and a 2G connection in a village from the same pipeline
Back-of-envelope scale estimates
Show your math. Pulling numbers from thin air signals you have not thought about the load.
Peak concurrent live viewers
65.2M
JioHotstar's verified record at the 2026 T20 World Cup semi-final; the 2023 ODI World Cup final hit 59M and the platform repeatedly cleared 50M+ for marquee India matches.
Surge rate at a key moment
~500K new concurrent / minute
When a wicket falls or the last over starts, millions open the app in a correlated burst. Hotstar publicly cited concurrency growth of ~500K per minute, which is the spike reactive autoscaling cannot keep up with.
Peak egress bandwidth
~5.7 Tbps measured at ~10M concurrent; tens of Tbps at 65M (estimate)
At ~10M concurrent, Hotstar measured ~5.7 Tbps peak. At 65M with adaptive bitrate, sustained multi-CDN egress is on the order of tens of terabits per second — the reason multi-CDN is non-negotiable.
Origin requests after caching
~10% of edge requests
At a 90% cache hit ratio, only ~10% of edge requests reach the origin or shield, so the request load of ~65M viewers arrives there as roughly 6.5M viewers' worth of traffic. That reduction is what makes the origin survivable. The other 90% is served from the CDN edge.
Peak compute footprint
~8,000 vCPU / 16 TB RAM, 800+ microservices on EKS
Hotstar's published peak control/API plane: thousands of cores and 16 TB RAM across 800+ microservices on Amazon EKS. This is the API/control plane — video bytes themselves never touch it; they flow CDN→client.
Server warm-up time
~75-90s
A newly provisioned instance takes ~90s to boot and ~75s for the container/app to become healthy. This single number is why pre-warming and predictive scaling exist — you cannot react to a 60-second spike with 90-second servers.
Adaptive bitrate renditions
6 ladder rungs
4K, 1080p, 720p, 480p, 360p, and a low-bandwidth rung — encoded once per live stream so a 5G phone and a 2G connection are served from identical infrastructure.
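The estimates above can be cross-checked with a few lines of arithmetic. This is a rough sketch, not Hotstar's own model: it takes the ~5.7 Tbps and ~10.3M figures quoted in this guide, derives an average bitrate per viewer, and scales it linearly to 65.2M. Linear scaling is an assumption (the bitrate mix shifts with audience and network), and the function names are illustrative.

```python
# Back-of-envelope arithmetic from the figures quoted in this guide.
# Assumption: average per-viewer bitrate stays the same as the audience grows.

def avg_mbps_per_viewer(egress_tbps: float, viewers: float) -> float:
    """Average delivered bitrate per viewer in Mbps (1 Tbps = 1,000,000 Mbps)."""
    return egress_tbps * 1_000_000 / viewers

def projected_egress_tbps(viewers: float, mbps_per_viewer: float) -> float:
    """Total egress in Tbps for a given audience and average bitrate."""
    return viewers * mbps_per_viewer / 1_000_000

def arrivals_during_boot(surge_per_min: float, boot_s: float) -> float:
    """New viewers that arrive while a fresh server is still booting."""
    return surge_per_min * boot_s / 60

def origin_share(cache_hit_ratio: float) -> float:
    """Fraction of edge requests that miss the edge cache."""
    return 1.0 - cache_hit_ratio

per_viewer = avg_mbps_per_viewer(5.7, 10_300_000)
print(f"average bitrate per viewer: {per_viewer:.2f} Mbps")
print(f"projected egress at 65.2M: {projected_egress_tbps(65_200_000, per_viewer):.0f} Tbps")
print(f"viewers arriving during a 90s boot: {arrivals_during_boot(500_000, 90):,.0f}")
print(f"share of edge requests reaching shield/origin: {origin_share(0.90):.0%}")
```

Under these assumptions the average works out to about 0.55 Mbps per viewer and roughly 36 Tbps at 65.2M, which is consistent with "tens of terabits per second". About 750,000 new viewers arrive during a single 90-second boot, which is the number that makes reactive scaling too slow.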
High-level architecture
Start by separating the two completely different planes, because conflating them is the most common interview mistake. The **data plane** moves video bytes; the **control plane** moves everything else (auth, entitlements, home page, scorecard, social). Video bytes never flow through your application servers. A live camera feed goes into an ingest + transcoder that produces a 6-rung adaptive bitrate ladder, chops each rung into short segments (2-6s for standard HLS/DASH, sub-second with Low-Latency HLS), writes a manifest (the playlist of segment URLs), and pushes segments to origin. Players fetch the manifest, then pull segments, and crucially they pull them through a CDN, not through you. With a 90%+ edge cache hit ratio, 65M concurrent players hammering for the same live segment collapse into a handful of origin pulls per segment.
The control plane is a fleet of 800+ microservices on Kubernetes (Amazon EKS) fronted by a centralized gateway. The playback-start request flow is: client → gateway → auth/entitlement check (is this user subscribed, is this content licensed in their region) → playback service returns a signed manifest URL pointing at the multi-CDN. From that point on, the client talks only to CDNs. So the control plane handles a burst of *session starts* (heavy, correlated, and where the surge actually lands), while the data plane handles a continuous river of segment fetches (absorbed almost entirely at the edge).
The defining challenge is the shape of demand. A cricket match is not steady traffic; it is a step function. A wicket falls, an out-of-app push notification fires, and ~500K people per minute open the app and hit the playback-start path in the same window. Reactive autoscaling cannot save you here because a server takes ~90s to become healthy and the spike is over in 60.
So the real architecture is **predictive**: forecast the peak from historical match data, **pre-warm** capacity and load balancers ahead of the toss, scale up in a *ladder* keyed to projected concurrency (Hotstar provisions per pre-defined "million-concurrent" rung and keeps a ~2M-concurrent buffer ready), and treat reactive autoscaling only as a fast top-up. When even that is not enough, the system enters **panic mode**: the server tells clients it is overloaded, clients exponentially back off with jitter, and non-essential features (recommendations, social, fancy home page) are shed to protect the one thing that must never break — the play button.
In a real interview, sketch this on the whiteboard before diving into any single box.
Core components
Walk through each service. The interviewer wants to hear what each one owns, not just the names.
Ingest + Live Transcoder (ABR ladder)
Takes the raw camera/contribution feed and produces 6 renditions (4K→low-bandwidth) simultaneously, segments each rung into short chunks, and emits HLS/DASH manifests. This is the only component that produces the actual video. Encoding once per stream — not per viewer — is what lets one pipeline serve 65M people across 5G and 2G.
Origin + Origin Shield
Stores live segments and manifests. The origin shield is a mid-tier cache layer between the many CDN edges and the true origin: when 10 CDN POPs all miss on a fresh segment, they fetch from the shield, and only the shield fetches from origin — collapsing N edge misses into 1 origin pull. This is the single most important origin-protection trick for live (a brand-new segment is, by definition, a cache miss everywhere at once).
Multi-CDN + CDN Load Optimizer
Traffic is spread across Akamai, CloudFront, Cloudflare and others. An in-house load optimizer routes each client to the best CDN by real-time health, cost, and per-region performance, and fails over automatically if one CDN degrades mid-final. No single CDN can carry tens of Tbps reliably, and no single CDN should be a single point of failure during the one event that matters most.
Playback / Entitlement Service
The control-plane gate. On stream start it verifies subscription status, applies content licensing and geo-restrictions, and returns a short-lived signed manifest URL. This is where the session-start surge concentrates, so it is the service you pre-warm most aggressively and the first place you apply admission control.
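A short-lived signed manifest URL can be sketched with an HMAC over the path, user, and expiry. This is a minimal illustration only: real CDNs each define their own token formats, and the secret, parameter names (`uid`, `exp`, `sig`), and example URL here are hypothetical.

```python
import hashlib
import hmac
import time
from typing import Optional
from urllib.parse import parse_qs, urlencode, urlparse

SECRET = b"demo-secret"  # hypothetical; a real key would come from a secret manager

def _signature(path: str, user_id: str, expires: int) -> str:
    msg = f"{path}|{user_id}|{expires}".encode()
    return hmac.new(SECRET, msg, hashlib.sha256).hexdigest()

def sign_manifest_url(base_url: str, user_id: str, ttl_s: int = 60,
                      now: Optional[float] = None) -> str:
    """Return base_url with uid, exp, and sig query parameters appended."""
    current = time.time() if now is None else now
    expires = int(current + ttl_s)
    sig = _signature(urlparse(base_url).path, user_id, expires)
    return f"{base_url}?{urlencode({'uid': user_id, 'exp': expires, 'sig': sig})}"

def verify_manifest_url(url: str, now: Optional[float] = None) -> bool:
    """Check expiry and signature; any malformed URL is rejected."""
    parsed = urlparse(url)
    query = parse_qs(parsed.query)
    try:
        uid, exp, sig = query["uid"][0], int(query["exp"][0]), query["sig"][0]
    except (KeyError, ValueError):
        return False
    current = time.time() if now is None else now
    if current > exp:
        return False
    # compare_digest avoids leaking information through timing differences.
    return hmac.compare_digest(sig, _signature(parsed.path, uid, exp))

url = sign_manifest_url("https://cdn.example.com/live/match-1/master.m3u8", "u1")
print(verify_manifest_url(url))
```

Because verification needs only the shared secret, the edge can validate the URL without calling back into the control plane, which keeps the entitlement check off the segment-fetch path.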
API Gateway (centralized Envoy)
A single Envoy-based gateway replaced hundreds of per-service load balancers. It also splits cacheable APIs (scorecard, match summary, highlights — pushed to a CDN domain) from non-cacheable APIs (sessions, personalization), so heavy, repetitive read traffic is served from the edge and never touches the gateway compute budget.
Predictive Autoscaler + Pre-warmer
A custom autoscaler that scales on actual concurrency metrics rather than lagging CPU/memory signals, spinning up pods in ~30s instead of waiting out the 60-90s CPU-signal delay. It is paired with ladder-based pre-provisioning (capacity per projected million concurrent) and a standing ~2M buffer, plus pre-warmed load balancers before the toss.
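The ladder idea can be expressed as a small function: add the standing buffer to the forecast and round up to the next million-concurrent rung. This is a sketch under stated assumptions, not Hotstar's autoscaler; `viewers_per_pod` in particular is a hypothetical capacity figure that would come from load testing.

```python
import math

def ladder_target(projected_concurrency: int, buffer: int = 2_000_000,
                  rung: int = 1_000_000) -> int:
    """Concurrency to provision for: forecast plus buffer, rounded up to a rung."""
    return math.ceil((projected_concurrency + buffer) / rung) * rung

def pods_needed(target_concurrency: int, viewers_per_pod: int) -> int:
    """Pods required for a service, given a load-tested per-pod capacity."""
    return math.ceil(target_concurrency / viewers_per_pod)

target = ladder_target(24_300_000)
print(target)                      # 27000000
print(pods_needed(target, 5_000))  # 5400
```

Provisioning by rung rather than by exact forecast means capacity changes happen in a small number of pre-tested steps, each of which can be validated ahead of the match.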
Panic Mode + Client Backoff Controller
When backend saturation is detected, the server signals 'panic' to clients, which then increase the interval between polling/heartbeat requests using exponential backoff with jitter. This flattens the thundering-herd retry storm and is a deliberate load-shedding lever, not an error state.
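Client-side backoff with jitter might look like the sketch below. It uses the "equal jitter" variant (half the interval is fixed, half is random) so that the polling interval always grows under panic while clients still spread out. The base interval and cap are illustrative values, not Hotstar's.

```python
import random

def next_poll_interval(base_s: float, attempt: int, cap_s: float = 60.0,
                       rng=random) -> float:
    """Equal-jitter exponential backoff.

    The ceiling doubles with each consecutive panic signal, up to cap_s.
    The returned interval lies between half the ceiling and the ceiling.
    """
    ceiling = min(cap_s, base_s * (2 ** attempt))
    half = ceiling / 2
    return half + rng.uniform(0, half)

# Example: a 5s heartbeat stretching out over successive panic signals.
for attempt in range(5):
    print(attempt, round(next_poll_interval(5.0, attempt), 1))
```

Without the random component, every client that received the panic signal at the same moment would retry at the same moment, recreating the spike the backoff was meant to flatten.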
Concurrency / Live Stats Pipeline
A streaming aggregation pipeline that counts live concurrent viewers in near-real-time and feeds the autoscaler and ops dashboards (Hotstar's 'Infradashboard'). It is also what powers the on-screen concurrency counter. It must be approximate-but-fast, not exact — counting 65M sessions precisely in real time is wasteful.
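One common way to get an approximate-but-fast count is hash-based sampling: count only a deterministic 1-in-N slice of sessions and multiply back up. The sketch below illustrates that idea; it is not a description of Hotstar's pipeline, and the sample rate and heartbeat TTL are assumed values.

```python
import hashlib
from typing import Iterable, Tuple

SAMPLE_RATE = 100  # keep roughly 1 in 100 sessions (assumed value)

def in_sample(session_id: str, rate: int = SAMPLE_RATE) -> bool:
    """Deterministically select about 1/rate of sessions by hashing the id."""
    digest = hashlib.sha256(session_id.encode()).digest()
    return int.from_bytes(digest[:8], "big") % rate == 0

def estimate_concurrency(heartbeats: Iterable[Tuple[str, float]], now: float,
                         ttl_s: float = 30.0, rate: int = SAMPLE_RATE) -> int:
    """Estimate live sessions from (session_id, heartbeat_ts) pairs.

    A session counts as live if its latest heartbeat is within ttl_s.
    Only sampled sessions are tracked, so memory is about 1/rate of exact counting.
    """
    live = {sid for sid, ts in heartbeats
            if now - ts <= ttl_s and in_sample(sid, rate)}
    return len(live) * rate

beats = [(f"session-{i}", 1000.0) for i in range(100_000)]
print(estimate_concurrency(beats, now=1010.0))
```

The estimate carries sampling error, which is acceptable here because the autoscaler and the on-screen counter need a fresh, directionally correct number rather than an exact one.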
Load Testing Harness (Project Hulk-style)
An in-house framework that simulates full user journeys and entire traffic patterns (including ML-modeled spike shapes), runs tsunami/chaos tests, and validates that pre-warming and panic mode actually hold. You cannot discover your real ceiling for the first time during the World Cup final.
Data model
Pick the right store per table. Justify each choice with the access pattern, not by reflex.
content
Fields: content_id, type (live|vod), title, language, license_regions[], drm_policy, status (scheduled|live|ended)
One row per asset. license_regions and drm_policy drive the geo/entitlement gate; live vs vod changes the caching TTL story entirely (live segments get very short TTLs, vod gets long ones).
live_stream
Fields: stream_id, content_id, ingest_endpoint, abr_ladder (jsonb: rungs + bitrates), manifest_url, origin_url, cdn_pool[]
The live-specific config: which bitrate rungs are encoded and which CDNs are in the active pool for this event. cdn_pool is mutated live by the CDN optimizer during failover.
playback_session
Fields: session_id, user_id, content_id, cdn_assigned, start_ts, heartbeat_ts, bitrate_current, device_type
Created at stream start, updated by client heartbeats. The source of truth for concurrency counting and the surge signal. Stored in a fast store (not a relational primary) and rolled up; you never JOIN against 65M live rows transactionally, you aggregate a stream of them.
entitlement
Fields: user_id, plan (free|super|premium), valid_until, allowed_max_streams, region
Heavily cached (Redis) and checked on every playback start. Region here is enforced against content.license_regions. allowed_max_streams enforces concurrent-device limits, itself a load and licensing control.
concurrency_rollup
Fields: content_id, window_ts, concurrent_count, cdn_split (jsonb), region_split (jsonb)
Pre-aggregated, time-bucketed concurrency derived from the heartbeat stream. Feeds the autoscaler ladder, the Infradashboard, and the on-screen counter. Approximate and append-only by design.
segment_cache_meta
Fields: stream_id, segment_seq, rendition, ttl, cache_key
Conceptually how the CDN/origin-shield key live segments. Short, sequential segment names + short TTLs make a fresh live segment a coordinated miss across all edges, which is exactly why origin shield exists.
Deep dives
These are the conversations the interviewer is steering you toward. Practice each one until you can talk through it without notes.
Why a cricket wicket is the hardest load-test in the world
Normal apps see traffic that rises and falls smoothly; you can autoscale into it. A wicket is a correlated step function: the event happens, a push notification fires to tens of millions of devices, and within about a minute new viewers are slamming the playback-start path at a rate of ~500K *new* concurrent viewers per minute. The killer is the timing mismatch: a new server needs ~90s to be healthy, but the spike resolves in ~60s. By the time reactive autoscaling provisions capacity, the surge is already over (or has already caused errors). This is why the entire design is built around having capacity *before* demand, not provisioning *in response* to it: forecast the peak from historical match data, pre-warm to the projected number, keep a ~2M-concurrent buffer hot, and use a custom autoscaler that reacts to concurrency metrics (which lead) instead of CPU (which lags by 60-90s).
Origin shielding: collapsing the thundering herd on a fresh live segment
For VOD, popular segments are cached everywhere and origin is sleepy. Live is the opposite: every 2-6 seconds a brand-new segment is published, and at that instant it is a cache MISS at every single CDN edge simultaneously. Without protection, every CDN POP independently pulls the new segment from origin — for every rendition — the moment it appears. That is a synchronized stampede on origin, repeating every few seconds for hours. The fix is a two-tier cache: an origin shield (a designated mid-tier cache) sits between the edges and true origin. All edge misses for a given segment are funneled to the shield; the shield does request collapsing so only ONE request per segment reaches the actual origin, and fans the response back out to all edges. This is what turns 65M viewers into a single-digit number of origin reads per segment, and it is the detail strong candidates volunteer that weak ones miss.
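Request collapsing can be sketched as a "single-flight" cache: the first caller for a key fetches from origin, and every concurrent caller for the same key waits for that one result. The class below is an in-process illustration of the technique with hypothetical names; a real origin shield does this inside the CDN or a caching proxy.

```python
import threading
from typing import Callable, Dict

class CollapsingCache:
    """Serve many concurrent misses for one key with a single origin fetch."""

    def __init__(self, fetch_from_origin: Callable[[str], bytes]):
        self._fetch = fetch_from_origin
        self._lock = threading.Lock()
        self._cache: Dict[str, bytes] = {}
        self._inflight: Dict[str, threading.Event] = {}

    def get(self, key: str) -> bytes:
        with self._lock:
            if key in self._cache:
                return self._cache[key]
            event = self._inflight.get(key)
            leader = event is None
            if leader:
                # First miss for this key: this caller performs the fetch.
                event = threading.Event()
                self._inflight[key] = event
        if not leader:
            # Another caller is already fetching; wait, then re-check the cache.
            event.wait()
            return self.get(key)
        try:
            value = self._fetch(key)
            with self._lock:
                self._cache[key] = value
            return value
        finally:
            with self._lock:
                self._inflight.pop(key, None)
            event.set()  # wake followers whether the fetch succeeded or failed
```

If the leader's fetch fails, waiting callers retry and one of them becomes the new leader, so a single origin error does not strand the rest. A production shield would also expire entries according to the segment TTL, which this sketch omits.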
Adaptive bitrate: encode once, serve 5G and 2G from the same pipeline
India's network reality spans 5G and fiber broadband down to 2G rural links, often in the same match. You cannot personalize a stream per viewer; that would mean 65M encodes. Instead you encode the live feed once into a ladder of ~6 renditions (4K/1080p/720p/480p/360p/low), segment each, and list them in the manifest. The *client* player measures its own throughput each segment and decides which rung to request next: if the download was fast and the buffer is full, it steps up a rung; if the download stalled, it steps down before the buffer empties and the user sees a spinner. All the adaptation logic lives on the device; the server just publishes every rung. Discuss the trade-off between segment length (shorter = lower latency and faster quality adaptation, but more requests and overhead) and the start-up vs latency target (sub-3s start-up, with Low-Latency HLS pushing live latency down toward a few seconds at the cost of more CDN/tuning complexity).
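The client-side decision described above can be sketched as a function of measured throughput and buffer level. The ladder bitrates, safety factor, and buffer threshold are hypothetical values chosen for illustration; production players use more elaborate heuristics.

```python
from typing import Sequence

# Hypothetical ladder in kbps, lowest rung first (low, 360p, 480p, 720p, 1080p, 4K).
LADDER_KBPS = (200, 500, 1000, 2500, 5000, 15000)

def next_rung(current: int, throughput_kbps: float, buffer_s: float,
              ladder: Sequence[int] = LADDER_KBPS, safety: float = 0.8,
              high_buffer_s: float = 15.0) -> int:
    """Pick the ladder index for the next segment request.

    Step down immediately to the best rung the measured throughput can sustain.
    Step up one rung at a time, and only when the buffer is comfortably full.
    """
    budget = throughput_kbps * safety  # leave headroom for throughput variance
    sustainable = max((i for i, kbps in enumerate(ladder) if kbps <= budget),
                      default=0)
    if sustainable < current:
        return sustainable
    if sustainable > current and buffer_s >= high_buffer_s:
        return current + 1
    return current

print(next_rung(current=2, throughput_kbps=8000, buffer_s=20))  # 3
print(next_rung(current=4, throughput_kbps=900, buffer_s=3))    # 1
```

Stepping down aggressively and stepping up cautiously is a deliberate asymmetry: a stall is far more visible to the viewer than a few seconds at lower quality.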
Graceful degradation and panic mode: protecting the play button
At the ragged edge of capacity you do not let the system fail uniformly — you fail it *selectively*. The playback path is sacred; recommendations, the rich personalized home page, watch-along social, and analytics are not. When the backend detects saturation it enters panic mode: it signals overload to clients, which respond by backing off exponentially (with jitter to avoid synchronized retries), and the platform sheds non-essential services to reclaim their CPU and DB capacity for the critical path. A user in panic mode might get a plainer home screen and no live reactions — but the match keeps playing. Framing this as a *designed* behavior with explicit tiers of importance (must-never-break vs nice-to-have) is exactly the senior signal interviewers look for. Pair it with circuit breakers on inter-service calls so a slow recommendation service can't drag down playback.
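Tiering features up front can be as simple as a lookup table plus a saturation threshold per tier. The feature names, tiers, and thresholds below are hypothetical; the point of the sketch is that shedding is a table-driven decision made before the event, not an improvisation during it.

```python
from enum import IntEnum
from typing import Set

class Tier(IntEnum):
    CRITICAL = 0      # must never break
    IMPORTANT = 1     # shed only under heavy saturation
    NICE_TO_HAVE = 2  # first to go

# Hypothetical feature-to-tier assignment.
FEATURES = {
    "playback": Tier.CRITICAL,
    "entitlement": Tier.CRITICAL,
    "scorecard": Tier.IMPORTANT,
    "recommendations": Tier.NICE_TO_HAVE,
    "reactions": Tier.NICE_TO_HAVE,
    "analytics": Tier.NICE_TO_HAVE,
}

def enabled_features(saturation: float) -> Set[str]:
    """Return the features to keep serving at a given backend saturation (0..1)."""
    if saturation >= 0.90:
        max_tier = Tier.CRITICAL
    elif saturation >= 0.75:
        max_tier = Tier.IMPORTANT
    else:
        max_tier = Tier.NICE_TO_HAVE
    return {name for name, tier in FEATURES.items() if tier <= max_tier}

print(sorted(enabled_features(0.95)))  # ['entitlement', 'playback']
```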
Multi-CDN: redundancy and economics, not just performance
No single CDN reliably carries the tens of terabits per second a 65M-concurrent final demands, and depending on one CDN means a single vendor incident can black out the biggest event of the year. So traffic is split across Akamai, CloudFront, Cloudflare, etc., with an in-house optimizer assigning each client a CDN by live health, regional performance, and cost. If one CDN's error rate or latency spikes mid-match, the optimizer reroutes new sessions (and can migrate existing ones) away from it. There's also a cost lever: CDN pricing varies by region and commit, so the optimizer can prefer cheaper capacity when performance is equal. The trade-off to name: multi-CDN adds real operational complexity — you now manage cache consistency, token/signed-URL auth, and warming across multiple vendors — but for a flagship live event the redundancy is non-negotiable.
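The optimizer's decision can be sketched as "filter by health, then prefer cheaper capacity". The thresholds, field names, and placeholder CDN names below are assumptions for illustration; the real optimizer's inputs and weighting are not public.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class CdnStats:
    name: str
    error_rate: float      # fraction of failed requests in the last window
    p95_latency_ms: float  # regional p95 segment fetch latency
    cost_per_gb: float

def pick_cdn(stats: List[CdnStats], max_error_rate: float = 0.02,
             max_latency_ms: float = 400.0) -> CdnStats:
    """Choose a CDN for a new session.

    Healthy CDNs compete on cost, with latency as the tie-breaker.
    If none is healthy, fall back to the one with the lowest error rate.
    """
    healthy = [s for s in stats
               if s.error_rate <= max_error_rate
               and s.p95_latency_ms <= max_latency_ms]
    if healthy:
        return min(healthy, key=lambda s: (s.cost_per_gb, s.p95_latency_ms))
    return min(stats, key=lambda s: s.error_rate)

pool = [
    CdnStats("cdn-a", error_rate=0.001, p95_latency_ms=120, cost_per_gb=0.020),
    CdnStats("cdn-b", error_rate=0.001, p95_latency_ms=150, cost_per_gb=0.015),
    CdnStats("cdn-c", error_rate=0.100, p95_latency_ms=90, cost_per_gb=0.010),
]
print(pick_cdn(pool).name)  # cdn-b
```

Here `cdn-c` is the cheapest and fastest but is excluded for its error rate, so the choice falls to the cheaper of the two healthy options.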
Splitting cacheable from non-cacheable to free up the control plane
A huge amount of in-match API traffic is the same for everyone: the live scorecard, match summary, highlights list. If those hit your gateway and microservices, you're spending precious control-plane CPU re-serving identical data to millions. Hotstar's move was to route cacheable APIs to a dedicated CDN domain with lighter security checks (no per-user validation needed for a public scorecard) so the edge serves them, while non-cacheable, per-user APIs (sessions, personalization, entitlement) go through the full gateway. This is the same edge-offload principle as video, applied to JSON: identify what's identical-across-users and push it to the CDN so your origin compute is reserved for genuinely user-specific work. It directly buys back capacity for the session-start surge.
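The split can be illustrated as a routing rule keyed on path. The paths, upstream labels, and the 2-second max-age are hypothetical; the `Cache-Control` directives themselves are standard HTTP.

```python
from typing import Dict

# Hypothetical API paths whose responses are identical for every viewer.
CACHEABLE_PREFIXES = ("/v1/scorecard", "/v1/match-summary", "/v1/highlights")

def route(path: str) -> Dict[str, str]:
    """Decide which upstream serves a request and how it may be cached."""
    if path.startswith(CACHEABLE_PREFIXES):
        # Same response for everyone: let the CDN edge serve it briefly.
        return {"upstream": "cdn", "cache_control": "public, max-age=2"}
    # Per-user data: must go through the gateway and must not be shared.
    return {"upstream": "gateway", "cache_control": "private, no-store"}

print(route("/v1/scorecard/match-1"))
print(route("/v1/session/start"))
```

Even a very short TTL is enough: with millions of viewers polling the same scorecard, a 2-second edge cache reduces origin traffic for that endpoint to about one request per edge location every 2 seconds.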
Trade-offs to discuss
Every senior interviewer expects you to surface at least 3 of these. Pick the decisions, state the alternatives, and justify your choice.
Predictive pre-warming + ladder scaling instead of pure reactive autoscaling
Reactive autoscaling lags by the ~90s server-boot delay, but the wicket surge lands in ~60s, so reactive scaling alone guarantees errors at the worst moment. Pre-warming to a forecasted peak and provisioning per projected-million-concurrent ladder costs money (you pay for idle headroom and a ~2M buffer) but it's the only thing that's actually ahead of the spike. The cost is justified because the events are pre-scheduled and predictable; you know the toss time.
Multi-CDN over a single best-in-class CDN
A single CDN is simpler (one set of cache keys, tokens, warming) and may even be cheaper per GB at commit. But it caps your peak bandwidth and makes one vendor a single point of failure for your marquee event. For a 65M final, redundancy and aggregate capacity outweigh the operational simplicity of one vendor.
Approximate near-real-time concurrency instead of exact counts
Exactly counting 65M live sessions in real time is wasteful and slow. The autoscaler and the on-screen counter need a number that's fresh and directionally correct, not perfectly precise. Trading exactness for latency lets the concurrency pipeline drive scaling decisions fast enough to matter.
Graceful degradation (shed features) instead of uniform failure or hard rejection
When saturated, you can (a) fail everything, (b) reject new users at the door, or (c) keep everyone watching but strip non-essential features. Option (c) protects the core product experience — people came for the match, not the recommendations — at the cost of a degraded secondary experience and the engineering effort to tier every feature by importance up front.
Standard HLS/DASH segment latency vs Low-Latency HLS
Standard 2-6s segments give rock-solid CDN cacheability and simple operations but put live ~15-30s behind real time. LL-HLS (sub-second chunks) cuts that toward a few seconds — better for a live wicket reaction and to beat the neighbor's TV — but multiplies request volume and demands tighter CDN/player tuning. You pick per event how much latency you'll trade for cache efficiency and stability.
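The latency and request-volume trade can be made concrete with a simple model. It assumes the player holds about three segments of buffer behind the live edge (a commonly used setting, assumed here) and lumps encoding, packaging, and CDN propagation into one hypothetical `pipeline_s` constant.

```python
def live_latency_s(segment_s: float, buffered_segments: int = 3,
                   pipeline_s: float = 4.0) -> float:
    """Rough glass-to-glass latency: pipeline delay plus player buffer depth."""
    return pipeline_s + segment_s * buffered_segments

def edge_requests_per_s(viewers: float, fetch_interval_s: float) -> float:
    """Segment requests per second at the edge if each viewer fetches once per interval."""
    return viewers / fetch_interval_s

for seg in (6.0, 4.0, 2.0):
    print(f"{seg:.0f}s segments: ~{live_latency_s(seg):.0f}s latency, "
          f"~{edge_requests_per_s(65_200_000, seg) / 1e6:.1f}M req/s at the edge")
```

Under these assumptions, 6s segments give about 22s of latency and about 10.9M edge requests per second at 65.2M viewers, while 2s segments give about 10s of latency and triple the request rate. The model ignores manifest refreshes and LL-HLS partial segments, which add further requests.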
Encode-once ABR ladder vs per-user/just-in-time encoding
One fixed ladder serves everyone and keeps the encode count constant regardless of audience size — essential at 65M. The cost is you serve a fixed set of quality rungs rather than perfectly optimal per-device renditions, and you pay to encode rungs even if few viewers use the top one. At this concurrency, constant encode cost beats per-viewer optimization every time.
How Hotstar (JioHotstar) actually does it
The numbers here are from Hotstar/JioHotstar's own public engineering talks (Rootconf), the platform's published figures, and reputable engineering writeups — not invented. Verified anchors: 59M concurrent at the 2023 ODI World Cup final and 65.2M at the 2026 T20 World Cup semi-final (the current world record for live-event concurrency); ~10.3M concurrent at the 2019 VIVO IPL final with ~5.7 Tbps peak egress and ~1M requests/sec; ~500K concurrency growth per minute during surges; ~90s server warm-up and ~75s container/app startup (the root cause of the pre-warming strategy); 800+ microservices on Amazon EKS at peak with ~8,000 vCPU and 16 TB RAM; ~90% CDN cache hit ratio; multi-CDN across Akamai/CloudFront/Cloudflare with an in-house optimizer; panic mode with client-side exponential backoff; and Project Hulk, their in-house load/chaos testing framework that simulates full traffic patterns with ML. Where exact internal figures aren't public (e.g., the precise origin-shield request-collapse ratio, current LL-HLS latency targets), the page reasons from the published cache-hit ratio and standard HLS/DASH behavior and flags those as estimates. The compute/RAM figures describe the API/control plane only — the video bytes themselves are served by CDNs and never touch those servers, which is the central architectural point.
Sources
- ByteByteGo — How Disney+ Hotstar (now JioHotstar) Scaled Its Infra for 60M+ Concurrent Users
- System Design Newsletter — How Disney+ Hotstar Scaled to 25M Concurrent Users (pre-warming, panic mode, backoff)
- ScaleYourApp — Hotstar at 10.3M Concurrent: ladder scaling, 2M buffer, Gatling/Flood load testing
- JioHotstar IPL/World Cup 2026: 65.2M concurrent on AWS EKS, custom autoscaler, multi-CDN optimizer, 90% cache hit
- TechCrunch — India vs Australia World Cup final shatters streaming records (59M) on Hotstar
- Hotstar Engineering — Scaling hotstar.com for 25M concurrent viewers (Rootconf)