Is this a video course?

No. This is an interactive, slide-based learning platform. Each lesson has rich text, animated diagrams, live code editors, and quizzes. You learn by reading, interacting, and doing, not by watching videos passively.

How long do I have access?

Forever. Both pricing tiers are one-time payments with lifetime access. This includes all current 766 lessons and any future content we add.

What level of experience do I need?

None. We start from absolute basics like 'What is latency?' and build up to distributed consensus protocols. The Foundation level assumes zero prior knowledge of system design.

How much does the system design course cost?

7.99 US dollars for lifetime access globally, or 499 Indian rupees for lifetime access in India. One-time payment, no subscription, no hidden fees. 11 lessons are free with no signup required.

What technologies are covered?

Everything from DNS and load balancers to Kubernetes, Kafka, distributed databases, consensus protocols, stream processing, security architecture, and observability. We cover principles and real-world implementations used at Netflix, Google, Amazon, Uber, Stripe, and more.

Is this useful for system design interview preparation?

Yes. The lessons are structured around the exact topics asked in system design interviews at FAANG and top-tier companies. Interactive diagrams help you practice whiteboard-style explanations, and the material covers everything from URL shortener design to distributed payment systems.

How is this different from ByteByteGo or Educative?

766 interactive lessons (4x more than most competitors), 16 different diagram types that build step by step, real production examples from Netflix, Google, Amazon, Uber, and Stripe, and lifetime access for a one-time payment of $7.99 instead of annual subscriptions costing 100 to 200 dollars per year.

Does the live video actually flow through Hotstar's servers?

No, and saying it does is the fastest way to fail this interview. The video bytes flow camera → transcoder → origin → CDN edge → player. Hotstar's microservices (the ~8,000 vCPU / 16 TB RAM control plane) handle auth, entitlement, the home page, scorecard, and session start, but once the player has a signed manifest URL it pulls every video segment from a CDN. That's why 65M viewers don't require 65M servers: at a ~90% cache hit ratio the edge absorbs the bytes and origin sees only a sliver.

Why can't they just use autoscaling to handle the cricket spike?

Because a freshly provisioned server takes ~90s to become healthy, but a wicket surge, roughly 500K new concurrent viewers per minute, lands and resolves in about 60 seconds. Reactive autoscaling is always a step behind a step-function spike. The answer is to have capacity before demand: forecast the peak from historical match data, pre-warm load balancers and servers ahead of the toss, scale up in pre-defined ladders per projected million concurrent, keep a ~2M-concurrent hot buffer, and use a concurrency-driven custom autoscaler as a fast top-up rather than the primary defense.

What is origin shielding and why does live streaming need it more than VOD?

For live, a brand-new segment is published every few seconds and is simultaneously a cache miss at every CDN edge, so without protection, every edge pulls it from origin at once, every few seconds, for every rendition. Origin shield is a mid-tier cache between the edges and true origin that does request collapsing: all edge misses for a segment funnel to the shield, only one request per segment reaches real origin, and the response fans back out. It turns a synchronized origin stampede into a handful of reads. VOD rarely needs this because its popular segments are already warm everywhere.

How does one stream serve both a 5G phone and a 2G connection?

Adaptive bitrate. The live feed is encoded once into ~6 quality rungs (4K down to a low-bandwidth rung) and listed in the manifest. The player itself measures throughput on each segment and picks the next rung, stepping down before the buffer empties on a weak link and stepping up when bandwidth allows. The server just publishes every rung; all the adaptation logic runs on the device. Encoding once (not per viewer) is what keeps it economical at 65M concurrent.

What is 'panic mode' and is it a failure?

It's a deliberate load-shedding design, not a failure. When the backend detects saturation it signals overload to clients, which then back off exponentially with jitter to break the retry stampede, and the platform sheds non-essential features (recommendations, social, the rich home page) to reclaim capacity for the playback path. The contract is explicit: the play button must never break, everything else is negotiable. Describing graceful degradation as a designed tiering of features by importance is exactly the senior signal interviewers want.

Why multi-CDN instead of just using the best single CDN?

Two reasons: capacity and redundancy. A 65M-concurrent final pushes tens of terabits per second, more than any single CDN reliably carries, and depending on one vendor makes that vendor a single point of failure for your biggest event of the year. Hotstar splits traffic across Akamai, CloudFront, Cloudflare and others with an in-house optimizer that routes each client by live health, regional performance, and cost, and reroutes automatically if a CDN degrades mid-match. The price is operational complexity (managing cache keys, signed-URL auth, and warming across vendors), which is worth it for the flagship event.

How do they count 65M concurrent viewers in real time without melting a database?

They don't count exactly; they approximate. Client heartbeats feed a streaming aggregation pipeline that produces time-bucketed, near-real-time concurrency rollups. That approximate number is good enough to drive the autoscaler ladder, the ops dashboard, and the on-screen counter, and it's far cheaper than transactionally counting 65M live sessions. Trading exactness for freshness is the right call: the autoscaler needs a fast, directionally correct signal, not a perfect tally.

System design interview guide

JioHotstar (Hotstar) System Design Interview: 65M Viewers

JioHotstar streamed the 2026 T20 World Cup semi-final to 65.2 million concurrent viewers, the highest concurrency ever recorded for any live event on any digital platform.

A deep, interview-grade walkthrough of designing a live video streaming platform at India scale: adaptive bitrate (HLS/DASH), multi-CDN with origin shielding, the extreme traffic dynamics of a cricket wicket, and why you pre-warm and predictively scale instead of trusting reactive autoscaling. Grounded in how Hotstar actually handled its record-breaking IPL and World Cup peaks.

Where it shows up

This is a canonical India-market interview question. It is asked at JioHotstar/Disney directly, and the "design a live streaming service for the IPL" variant shows up at Amazon, Google, Flipkart, Swiggy, PhonePe, and most India product-company SDE2/SDE3 loops because the 50M+ concurrent cricket spike is the most famous scaling story in Indian engineering.

Why this question is asked

Live streaming forces the candidate to reason about a fundamentally different shape of load than a CRUD app. The interviewer is checking whether you understand that (1) video is delivered by CDNs, not your origin, so the real design question is about cache hit ratio and origin protection, not request handling; (2) live events produce correlated, near-instant demand spikes (a wicket falls, millions open the app in the same 60 seconds) that reactive autoscaling cannot absorb because servers take ~90s to boot; and (3) graceful degradation, shedding non-essential features to protect the play button, is a deliberate design choice, not a failure. It separates engineers who memorized "use a load balancer and a cache" from those who can reason about capacity ahead of demand.

Requirements

Always clarify these in the first 5 minutes of the interview. Do not start drawing boxes until both lists are agreed.

Functional requirements

Stream live events (cricket, shows) and video-on-demand to web, Android, iOS, and TV apps
Adaptive bitrate playback: the player picks a quality (4K/1080p/720p/480p/360p/low) based on live measured bandwidth and switches mid-stream without re-buffering
Low glass-to-glass latency for live, so the streamed moment lands close to the real event (typically under 30s for HLS/DASH, lower with LL-HLS), plus fast start-up in under 3s
Authentication, subscription/entitlement checks, and content/geo licensing enforcement before a stream is served
Live engagement features: scorecard, watch-along, reactions, and concurrency counter
Personalized home page, search, and continue-watching for VOD
Handle predictable mega-events (IPL, World Cup finals) with scheduled, pre-announced peaks

Non-functional requirements

Survive 50-65M+ concurrent live viewers with no full outage; partial degradation is acceptable, but a black screen on the match is not
Absorb a surge of ~500K new concurrent viewers per minute when a key moment (wicket, last over) hits
Keep origin load flat regardless of audience size: target a 90%+ CDN cache hit ratio so origin sees only a tiny fraction of edge requests
High availability for the playback path (the play button must work even when recommendations, social, and analytics are down)
Multi-CDN redundancy so a single CDN outage during a final does not take down playback
Cost efficiency at peak: peak is ~10-50x the daily baseline, so the architecture must scale up for a 3-hour window and scale back down
Global edge delivery but India-first network reality: serve a 5G phone in Bengaluru and a 2G connection in a village from the same pipeline

Back-of-envelope scale estimates

Show your math. Pulling numbers from thin air signals you have not thought about the load.

Peak concurrent live viewers

65.2M

JioHotstar's verified record at the 2026 T20 World Cup semi-final; the 2023 ODI World Cup final hit 59M and the platform repeatedly cleared 50M+ for marquee India matches.

Surge rate at a key moment

~500K new concurrent / minute

When a wicket falls or the last over starts, millions open the app in a correlated burst. Hotstar publicly cited concurrency growth of ~500K per minute, which is the spike reactive autoscaling cannot keep up with.

Peak egress bandwidth

~5.7 Tbps at 10M; tens of Tbps at 65M

At ~10M concurrent, Hotstar measured ~5.7 Tbps peak. At 65M with adaptive bitrate, sustained multi-CDN egress is on the order of tens of terabits per second, which is why multi-CDN is non-negotiable.
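The scaling from 5.7 Tbps to "tens of Tbps" can be checked with two divisions. This is a minimal sketch that assumes the average delivered bitrate per viewer at 65M is about the same as the 2019 mix implies; if more viewers sit on higher rungs (TVs, 4K), the real figure would be higher.

```python
# Back-of-envelope egress model: scale the measured 2019 peak by audience size.
# Assumption: the per-viewer average bitrate (across the whole ABR mix) stays similar.
MEASURED_TBPS = 5.7            # peak egress reported at the 2019 IPL final
MEASURED_CONCURRENT = 10.3e6   # concurrent viewers at that peak


def avg_kbps_per_viewer(tbps: float, viewers: float) -> float:
    """Average delivered bitrate per viewer in kbps (1 Tbps = 1e9 kbps)."""
    return tbps * 1e9 / viewers


def projected_tbps(viewers: float, kbps_per_viewer: float) -> float:
    """Aggregate egress in Tbps for a given audience and average bitrate."""
    return viewers * kbps_per_viewer / 1e9


per_viewer = avg_kbps_per_viewer(MEASURED_TBPS, MEASURED_CONCURRENT)
peak = projected_tbps(65.2e6, per_viewer)
print(f"~{per_viewer:.0f} kbps per viewer -> ~{peak:.0f} Tbps at 65.2M concurrent")
```

The implied average is only about 550 kbps per viewer, an average across every rung in use, and it still projects to roughly 36 Tbps at 65.2M.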

Origin requests after caching

~10% of edge requests

At a 90% cache hit ratio, only ~10% of edge requests miss and travel on to the origin/shield tier (at 65M viewers, roughly the request volume that ~6.5M viewers would generate), which is what makes the origin survivable. The other 90% are served from the CDN edge.
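Viewers and requests are different units, so it helps to turn the audience into a request rate first. The sketch below assumes each viewer fetches one segment per segment duration (6s here, an assumption within the 2-6s range) and ignores manifest refreshes, retries, and API calls, so real numbers would be somewhat higher.

```python
def request_rates(viewers: int, segment_seconds: float, hit_ratio: float):
    """Return (edge_rps, shield_rps) for segment fetches only.

    Assumes one segment request per viewer per segment duration; manifest
    refreshes and retries are deliberately left out of this estimate.
    """
    edge_rps = viewers / segment_seconds          # every viewer pulls each new segment
    shield_rps = edge_rps * (1 - hit_ratio)       # only misses leave the edge
    return edge_rps, shield_rps


edge, shield = request_rates(65_200_000, segment_seconds=6, hit_ratio=0.90)
print(f"edge: {edge / 1e6:.1f}M req/s, shield/origin tier: {shield / 1e6:.2f}M req/s")
```

Roughly 10.9M segment requests per second hit the edges and about 1.1M per second fall through to the shield tier; the origin shield's request collapsing then shrinks what actually reaches origin.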

Peak compute footprint

~8,000 vCPU / 16 TB RAM, 800+ microservices on EKS

Hotstar's published peak control/API plane: thousands of cores and 16 TB RAM across 800+ microservices on Amazon EKS. This is the API/control plane only; video bytes never touch it and flow CDN→client.

Server warm-up time

~75-90s

A newly provisioned instance takes ~90s to boot and ~75s for the container/app to become healthy. This single number is why pre-warming and predictive scaling exist: you cannot react to a 60-second spike with 90-second servers.

Adaptive bitrate renditions

6 ladder rungs

4K, 1080p, 720p, 480p, 360p, and a low-bandwidth rung, encoded once per live stream so a 5G phone and a 2G connection are served from identical infrastructure.

High-level architecture

Start by separating the two completely different planes, because conflating them is the most common interview mistake. The **data plane** moves video bytes; the **control plane** moves everything else (auth, entitlements, home page, scorecard, social). Video bytes never flow through your application servers.

A live camera feed goes into an ingest + transcoder that produces a 6-rung adaptive bitrate ladder, chops each rung into short segments (2-6s for standard HLS/DASH, sub-second with Low-Latency HLS), writes a manifest (the playlist of segment URLs), and pushes segments to origin. Players fetch the manifest, then pull segments, and crucially they pull them through a CDN, not through you. With a 90%+ edge cache hit ratio and an origin shield in front of origin, 65M concurrent players requesting the same live segment collapse into a handful of origin pulls per segment.

The control plane is a fleet of 800+ microservices on Kubernetes (Amazon EKS) fronted by a centralized gateway. The playback-start request flow is: client → gateway → auth/entitlement check (is this user subscribed, is this content licensed in their region) → playback service returns a signed manifest URL pointing at the multi-CDN. From that point on, the client talks only to CDNs. So the control plane handles a burst of *session starts* (heavy, correlated, and where the surge actually lands), while the data plane handles a continuous river of segment fetches (absorbed almost entirely at the edge).

The defining challenge is the shape of demand. A cricket match is not steady traffic; it is a step function. A wicket falls, an out-of-app push notification fires, and ~500K people per minute open the app and hit the playback-start path in the same window. Reactive autoscaling cannot save you here because a server takes ~90s to become healthy and the spike is over in 60.

So the real architecture is **predictive**: forecast the peak from historical match data, **pre-warm** capacity and load balancers ahead of the toss, scale up in a *ladder* keyed to projected concurrency (Hotstar provisions per pre-defined "million-concurrent" rung and keeps a ~2M-concurrent buffer ready), and treat reactive autoscaling only as a fast top-up. When even that is not enough, the system enters **panic mode**: the server tells clients it is overloaded, clients exponentially back off with jitter, and non-essential features (recommendations, social, fancy home page) are shed to protect the one thing that must never break, the play button.

In a real interview, sketch this on the whiteboard before diving into any single box.

Core components

Walk through each service. The interviewer wants to hear what each one owns, not just the names.

Ingest + Live Transcoder (ABR ladder)

Takes the raw camera/contribution feed and produces 6 renditions (4K→low-bandwidth) simultaneously, segments each rung into short chunks, and emits HLS/DASH manifests. This is the only component that produces the actual video. Encoding once per stream, not per viewer, is what lets one pipeline serve 65M people across 5G and 2G.

Origin + Origin Shield

Stores live segments and manifests. The origin shield is a mid-tier cache layer between the many CDN edges and the true origin: when 10 CDN POPs all miss on a fresh segment, they fetch from the shield, and only the shield fetches from origin, collapsing N edge misses into 1 origin pull. This is the single most important origin-protection trick for live (a brand-new segment is, by definition, a cache miss everywhere at once).

Multi-CDN + CDN Load Optimizer

Traffic is spread across Akamai, CloudFront, Cloudflare and others. An in-house load optimizer routes each client to the best CDN by real-time health, cost, and per-region performance, and fails over automatically if one CDN degrades mid-final. No single CDN can carry tens of Tbps reliably, and no single CDN should be a single point of failure during the one event that matters most.

Playback / Entitlement Service

The control-plane gate. On stream start it verifies subscription status, applies content licensing and geo-restrictions, and returns a short-lived signed manifest URL. This is where the session-start surge concentrates, so it is the service you pre-warm most aggressively and the first place you apply admission control.
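The "short-lived signed manifest URL" can be illustrated with a generic HMAC-plus-expiry scheme. This is a minimal sketch, not any CDN's actual token format: CloudFront signed URLs, Akamai token auth, and Cloudflare each define their own, and the host name, query parameter names (`uid`, `exp`, `sig`), and shared key here are hypothetical. It assumes the entitlement check has already passed.

```python
import hashlib
import hmac
import time
from typing import Optional
from urllib.parse import parse_qs, urlencode, urlsplit

# Hypothetical shared key; a real setup keeps one per CDN and rotates it.
SIGNING_KEY = b"replace-with-a-real-secret"
CDN_HOST = "https://live-cdn.example.com"  # hypothetical CDN hostname


def _signature(path: str, user_id: str, expires: int) -> str:
    payload = f"{path}|{user_id}|{expires}".encode()
    return hmac.new(SIGNING_KEY, payload, hashlib.sha256).hexdigest()


def sign_manifest_url(path: str, user_id: str, ttl_s: int = 300,
                      now: Optional[float] = None) -> str:
    """Called by the playback service only after entitlement checks pass."""
    issued = time.time() if now is None else now
    expires = int(issued + ttl_s)
    query = urlencode({"uid": user_id, "exp": expires,
                       "sig": _signature(path, user_id, expires)})
    return f"{CDN_HOST}{path}?{query}"


def verify_manifest_url(url: str, now: Optional[float] = None) -> bool:
    """What an edge would check before serving the manifest."""
    parts = urlsplit(url)
    params = parse_qs(parts.query)
    try:
        user_id = params["uid"][0]
        expires = int(params["exp"][0])
        sig = params["sig"][0]
    except (KeyError, ValueError):
        return False
    current = time.time() if now is None else now
    if current > expires:
        return False  # expired: the client must go back through entitlement
    return hmac.compare_digest(sig, _signature(parts.path, user_id, expires))
```

The CDN can validate the token without calling back into the control plane, which is what lets the edge absorb playback traffic while the entitlement service only sees session starts.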

API Gateway (centralized Envoy)

A single Envoy-based gateway replaced hundreds of per-service load balancers. It also splits cacheable APIs (scorecard, match summary, highlights), which are pushed to a CDN domain, from non-cacheable APIs (sessions, personalization), so heavy, repetitive read traffic is served from the edge and never touches the gateway's compute budget.

Predictive Autoscaler + Pre-warmer

A custom autoscaler that scales on actual concurrency metrics rather than lagging CPU/memory signals, spinning up pods in ~30s instead of waiting out the 60-90s delay before CPU reflects the load. It is paired with ladder-based pre-provisioning (capacity per projected million concurrent) and a standing ~2M buffer, plus pre-warmed load balancers before the toss.
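Ladder provisioning can be expressed as a small target function. This sketch assumes the fleet target is the larger of the pre-warmed forecast and live concurrency plus the ~2M hot buffer, rounded up to the next million-concurrent rung; `PODS_PER_MILLION` is a hypothetical per-service sizing that would come from load tests.

```python
import math

RUNG = 1_000_000            # provision in steps of one million concurrent
HOT_BUFFER = 2_000_000      # headroom kept above live concurrency
PODS_PER_MILLION = 40       # hypothetical sizing for one service, from load tests


def target_capacity(prewarm_forecast: int, live_concurrent: int) -> int:
    """Concurrency the fleet should be able to serve right now.

    Never drop below the pre-warmed forecast during the event, and always keep
    HOT_BUFFER of headroom above the concurrency that is actually live.
    """
    needed = max(prewarm_forecast, live_concurrent + HOT_BUFFER)
    return math.ceil(needed / RUNG) * RUNG  # round up to the next ladder rung


def pods_for(capacity: int) -> int:
    """Pods for one service at a given ladder rung."""
    return capacity // RUNG * PODS_PER_MILLION
```

With a 50M forecast, the fleet sits at the 50M rung early in the match; once live concurrency passes 48M, the buffer rule takes over and pushes the ladder up (49.3M live gives a 52M target). The concurrency-driven autoscaler then only has to fill the gap between rungs.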

Panic Mode + Client Backoff Controller

When backend saturation is detected, the server signals 'panic' to clients, which then increase the interval between polling/heartbeat requests using exponential backoff with jitter. This flattens the thundering-herd retry storm and is a deliberate load-shedding lever, not an error state.
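The client side of panic mode can be sketched as a polling-interval function. This example uses the "equal jitter" variant of exponential backoff (half of the delay is fixed, half is random); the 5s base interval, 300s cap, and the `panic_level` counter of consecutive overload signals are assumptions, not Hotstar's published values.

```python
import random
from typing import Optional


def backoff_interval(panic_level: int, base_s: float = 5.0, cap_s: float = 300.0,
                     rng: Optional[random.Random] = None) -> float:
    """Seconds to wait before the next poll/heartbeat.

    panic_level 0 means normal operation; each consecutive overload signal from
    the server increments it. Equal jitter spreads out clients that received the
    panic signal at the same instant, so they do not retry in lockstep.
    """
    if panic_level <= 0:
        return base_s  # normal operation: regular polling interval
    rng = rng or random.Random()
    # Exponential growth, capped; the exponent is clamped so the float never overflows.
    ceiling = min(cap_s, base_s * 2 ** min(panic_level, 30))
    return ceiling / 2 + rng.uniform(0, ceiling / 2)
```

At panic level 3, each client waits somewhere between 20s and 40s, so a synchronized herd of retries becomes a spread-out trickle.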

Concurrency / Live Stats Pipeline

A streaming aggregation pipeline that counts live concurrent viewers in near-real-time and feeds the autoscaler and ops dashboards (Hotstar's 'Infradashboard'). It is also what powers the on-screen concurrency counter. It must be approximate but fast rather than exact; counting 65M sessions precisely in real time is wasteful.
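The source does not say which approximation Hotstar uses. One standard technique for "approximate but fast" distinct counting is HyperLogLog, which the sketch below applies to the session IDs seen in one heartbeat window. It uses 4,096 small registers instead of a set of 65M IDs, and sketches from separate shards merge with an element-wise max. Redis exposes the same idea as `PFADD` / `PFCOUNT` / `PFMERGE`.

```python
import hashlib
import math


class HyperLogLog:
    """Approximate distinct counter: 2**p small registers instead of a set of IDs."""

    def __init__(self, p: int = 12):
        self.p = p
        self.m = 1 << p                               # 4096 registers for p=12
        self.registers = [0] * self.m
        self.alpha = 0.7213 / (1 + 1.079 / self.m)    # bias constant for m >= 128

    def add(self, session_id: str) -> None:
        digest = hashlib.sha1(session_id.encode()).digest()
        h = int.from_bytes(digest[:8], "big")         # 64-bit hash
        idx = h >> (64 - self.p)                      # top p bits pick a register
        rest = h & ((1 << (64 - self.p)) - 1)         # remaining 64 - p bits
        rank = (64 - self.p) - rest.bit_length() + 1  # position of the first 1-bit
        if rank > self.registers[idx]:
            self.registers[idx] = rank

    def merge(self, other: "HyperLogLog") -> None:
        """Fold another shard's sketch into this one (element-wise max)."""
        if self.p != other.p:
            raise ValueError("cannot merge sketches with different precision")
        self.registers = [max(a, b) for a, b in zip(self.registers, other.registers)]

    def count(self) -> int:
        z = sum(2.0 ** -r for r in self.registers)
        estimate = self.alpha * self.m * self.m / z
        zeros = self.registers.count(0)
        if estimate <= 2.5 * self.m and zeros:
            estimate = self.m * math.log(self.m / zeros)  # small-range correction
        return round(estimate)
```

Repeated heartbeats from the same session do not change the estimate, and the typical error at p=12 is around 1.6%, which is plenty for a scaling signal or an on-screen counter.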

Load Testing Harness (Project Hulk-style)

An in-house framework that simulates full user journeys and entire traffic patterns (including ML-modeled spike shapes), runs tsunami/chaos tests, and validates that pre-warming and panic mode actually hold. You cannot discover your real ceiling for the first time during the World Cup final.

Data model

Pick the right store per table. Justify each choice with the access pattern, not by reflex.

content

content_id, type (live|vod), title, language, license_regions[], drm_policy, status (scheduled|live|ended)

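One row per asset. license_regions and drm_policy drive the geo/entitlement gate; live vs vod changes the caching TTL story entirely (live manifests are rewritten every few seconds and need very short TTLs, and live segments only stay useful for the length of the live window, while VOD assets can be cached for a long time).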

live_stream

stream_id, content_id, ingest_endpoint, abr_ladder (jsonb: rungs + bitrates), manifest_url, origin_url, cdn_pool[]

The live-specific config: which bitrate rungs are encoded and which CDNs are in the active pool for this event. cdn_pool is mutated live by the CDN optimizer during failover.

playback_session

session_id, user_id, content_id, cdn_assigned, start_ts, heartbeat_ts, bitrate_current, device_type

Created at stream start, updated by client heartbeats. The source of truth for concurrency counting and the surge signal. Stored in a fast store (not a relational primary) and rolled up; you never JOIN against 65M live rows transactionally, you aggregate a stream of them.

entitlement

user_id, plan (free|super|premium), valid_until, allowed_max_streams, region

Heavily cached (Redis) and checked on every playback start. Region here is enforced against content.license_regions. allowed_max_streams enforces concurrent-device limits, itself a load and licensing control.

concurrency_rollup

content_id, window_ts, concurrent_count, cdn_split (jsonb), region_split (jsonb)

Pre-aggregated, time-bucketed concurrency derived from the heartbeat stream. Feeds the autoscaler ladder, the Infradashboard, and the on-screen counter. Approximate and append-only by design.

segment_cache_meta

stream_id, segment_seq, rendition, ttl, cache_key

Conceptually, this is how the CDN and origin shield key live segments. Every new segment gets a new sequential name, so at the moment it is published it is a coordinated miss across all edges at once, which is exactly why origin shield exists.
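A possible key/TTL policy per object type is sketched below. The key layout, the `.ts` extension, the 120s live window, and the TTL values are illustrative assumptions. One detail worth stating in an interview: signed-URL query parameters must stay out of the cache key, or every viewer would miss on their own private copy.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CachePolicy:
    cache_key: str
    ttl_s: int


def cache_policy(kind: str, stream_id: str, rendition: str, segment_seq: int = 0,
                 segment_s: int = 6, live_window_s: int = 120) -> CachePolicy:
    """Cache key and TTL for one object; auth tokens are never part of the key."""
    if kind == "live_manifest":
        # Rewritten every segment: must expire fast or players see a stale playlist.
        return CachePolicy(f"{stream_id}/{rendition}/index.m3u8", max(1, segment_s // 2))
    segment_key = f"{stream_id}/{rendition}/seg_{segment_seq:08d}.ts"
    if kind == "live_segment":
        # Immutable once written; only useful while it is inside the live window.
        return CachePolicy(segment_key, live_window_s)
    if kind == "vod_segment":
        return CachePolicy(segment_key, 86_400)
    raise ValueError(f"unknown object kind: {kind!r}")
```

Because `segment_seq` is part of the key, segment 43 can never be served from segment 42's cache entry, which is exactly what makes each fresh live segment a simultaneous miss at every edge.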

Deep dives

These are the conversations the interviewer is steering you toward. Practice each one until you can talk through it without notes.

Why a cricket wicket is the hardest load-test in the world

Normal apps see traffic that rises and falls smoothly; you can autoscale into it. A wicket is a correlated step function: the event happens, a push notification fires to tens of millions of devices, and *new* viewers slam the playback-start path at ~500K per minute in a burst that is largely over within ~60 seconds. The killer is the timing mismatch: a new server needs ~90s to be healthy, but the spike resolves in ~60s. By the time reactive autoscaling provisions capacity, the surge is already over (or has already caused errors). This is why the entire design is built around having capacity *before* demand, not provisioning *in response* to it: forecast the peak from historical match data, pre-warm to the projected number, keep a ~2M-concurrent buffer hot, and use a custom autoscaler that reacts to concurrency metrics (which lead) instead of CPU (which lags by 60-90s).
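The timing mismatch is easy to quantify. This sketch treats the surge as a constant 500K new viewers per minute for 60 seconds and compares three strategies; the zero detection delay for the concurrency-driven case is an idealizing assumption.

```python
SURGE_PER_MIN = 500_000   # new concurrent viewers per minute at a key moment
SPIKE_S = 60              # the burst is roughly over within a minute


def reactive_gap(detect_s: float, ready_s: float,
                 spike_s: float = SPIKE_S, surge_per_min: float = SURGE_PER_MIN):
    """Return (seconds until new capacity serves traffic, arrivals with no new capacity)."""
    time_to_capacity = detect_s + ready_s
    uncovered_s = min(spike_s, time_to_capacity)  # arrivals stop when the spike ends
    return time_to_capacity, surge_per_min * uncovered_s / 60


for label, detect, ready in [("CPU-triggered, new VM", 60, 90),
                             ("concurrency-triggered, pods", 0, 30),
                             ("pre-warmed before the toss", 0, 0)]:
    t, uncovered = reactive_gap(detect, ready)
    print(f"{label:30s} capacity at t+{t:>3.0f}s, {uncovered:>9,.0f} arrivals uncovered")
```

CPU-triggered scaling delivers capacity 150s after the surge starts, 90s after it is over, so the entire 500K burst lands on the existing fleet; only pre-warmed capacity is in place when the wicket falls.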

Origin shielding: collapsing the thundering herd on a fresh live segment

For VOD, popular segments are cached everywhere and origin is sleepy. Live is the opposite: every 2-6 seconds a brand-new segment is published, and at that instant it is a cache MISS at every single CDN edge simultaneously. Without protection, every CDN POP independently pulls the new segment from origin, for every rendition, the moment it appears. That is a synchronized stampede on origin, repeating every few seconds for hours. The fix is a two-tier cache: an origin shield (a designated mid-tier cache) sits between the edges and true origin. All edge misses for a given segment are funneled to the shield; the shield does request collapsing so only ONE request per segment reaches the actual origin, and fans the response back out to all edges. This is what turns 65M viewers into a single-digit number of origin reads per segment, and it is the detail strong candidates volunteer that weak ones miss.
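Request collapsing is a "single-flight" pattern: the first miss for a key fetches from origin, and concurrent misses for the same key wait for that one result. This is a minimal in-process Python sketch with no TTL or eviction; production caches implement it natively (for example nginx's `proxy_cache_lock`), and the origin function and key below are hypothetical.

```python
import threading
import time
from typing import Callable, Dict


class OriginShield:
    """Mid-tier cache that collapses concurrent misses for one key into one origin fetch."""

    def __init__(self, fetch_from_origin: Callable[[str], bytes]):
        self._fetch = fetch_from_origin
        self._cache: Dict[str, bytes] = {}
        self._inflight: Dict[str, threading.Event] = {}
        self._lock = threading.Lock()

    def get(self, key: str) -> bytes:
        with self._lock:
            if key in self._cache:
                return self._cache[key]          # shield hit
            event = self._inflight.get(key)
            is_leader = event is None
            if is_leader:                        # first miss: this caller fetches
                event = threading.Event()
                self._inflight[key] = event
        if is_leader:
            try:
                data = self._fetch(key)          # the only origin request for this key
                with self._lock:
                    self._cache[key] = data
                return data
            finally:
                with self._lock:
                    del self._inflight[key]
                event.set()                      # wake every waiting edge request
        event.wait()                             # followers park until the leader finishes
        with self._lock:
            if key not in self._cache:
                raise RuntimeError(f"origin fetch failed for {key}")
            return self._cache[key]


def simulate_edge_misses(n_edges: int) -> int:
    """n_edges POPs miss on the same fresh segment at once; return the origin fetch count."""
    origin_calls = []

    def slow_origin(key: str) -> bytes:
        origin_calls.append(key)
        time.sleep(0.05)  # simulated origin latency keeps the other edges waiting
        return b"segment-bytes"

    shield = OriginShield(slow_origin)
    key = "ipl-final/720p/seg_00000042.ts"
    edges = [threading.Thread(target=shield.get, args=(key,)) for _ in range(n_edges)]
    for t in edges:
        t.start()
    for t in edges:
        t.join()
    return len(origin_calls)
```

Fifty simultaneous edge misses produce a single origin fetch: every request either becomes the leader, waits on the leader's event, or arrives after the result is cached.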

Adaptive bitrate: encode once, serve 5G and 2G from the same pipeline

India's network reality spans 5G and fiber connections as well as 2G rural links, often watching the same match. You cannot personalize a stream per viewer; that would mean 65M encodes. Instead you encode the live feed once into a ladder of ~6 renditions (4K/1080p/720p/480p/360p/low), segment each, and list them in the manifest. The *client* player measures its own throughput on each segment and decides which rung to request next: download was fast and buffer is full → step up a rung; download stalled → step down before the buffer empties and the user sees a spinner. All the adaptation logic lives on the device; the server just publishes every rung. Discuss the trade between segment length (shorter = lower latency and faster quality adaptation, but more requests and overhead) and the start-up vs latency target (sub-3s start-up, with Low-Latency HLS pushing live latency down toward a few seconds at the cost of more CDN/tuning complexity).
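The client-side decision described above can be sketched as a simple throughput-plus-buffer rule. The per-rung bitrates, the 0.8 safety factor, and the 6s low-buffer threshold are assumptions (the source names the rungs but not their bitrates). Production players such as hls.js, ExoPlayer, and dash.js use more elaborate bandwidth estimators and buffer-based algorithms.

```python
# Hypothetical per-rung bitrates in kbps; the source names the rungs but not their bitrates.
LADDER_KBPS = {"low": 200, "360p": 600, "480p": 1_000,
               "720p": 2_500, "1080p": 5_000, "4K": 15_000}
RUNGS = sorted(LADDER_KBPS, key=lambda r: LADDER_KBPS[r])  # lowest to highest


def next_rung(measured_kbps: float, buffer_s: float, current: str,
              safety: float = 0.8, low_buffer_s: float = 6.0) -> str:
    """Pick the rung for the next segment from measured throughput and buffer level."""
    budget = measured_kbps * safety  # leave headroom for throughput noise
    affordable = [r for r in RUNGS if LADDER_KBPS[r] <= budget] or [RUNGS[0]]
    target_i = RUNGS.index(affordable[-1])
    current_i = RUNGS.index(current)
    if target_i < current_i:
        return RUNGS[target_i]          # throughput fell: step down before the buffer drains
    if target_i > current_i and buffer_s >= low_buffer_s:
        return RUNGS[current_i + 1]     # healthy buffer: climb one rung at a time
    return current                      # hold, including a low buffer with spare bandwidth
```

Stepping down immediately but stepping up only one rung at a time is a common way to avoid oscillating between qualities on a noisy mobile link.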

Graceful degradation and panic mode: protecting the play button

At the ragged edge of capacity you do not let the system fail uniformly; you fail it *selectively*. The playback path is sacred; recommendations, the rich personalized home page, watch-along social, and analytics are not. When the backend detects saturation it enters panic mode: it signals overload to clients, which respond by backing off exponentially (with jitter to avoid synchronized retries), and the platform sheds non-essential services to reclaim their CPU and DB capacity for the critical path. A user in panic mode might get a plainer home screen and no live reactions, but the match keeps playing. Framing this as a *designed* behavior with explicit tiers of importance (must-never-break vs nice-to-have) is exactly the senior signal interviewers look for. Pair it with circuit breakers on inter-service calls so a slow recommendation service can't drag down playback.
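The "explicit tiers of importance" can be made concrete as a feature-to-tier table plus a shedding threshold. The middle tier, its contents, and the 75% / 90% utilization thresholds are hypothetical; the source only fixes that playback must never be shed and that recommendations, the rich home page, social, and analytics go first.

```python
from enum import IntEnum
from typing import Set


class Tier(IntEnum):
    CRITICAL = 0      # must never break
    IMPORTANT = 1     # hypothetical middle tier
    NICE_TO_HAVE = 2  # first to be shed


FEATURES = {
    "playback": Tier.CRITICAL,
    "entitlement": Tier.CRITICAL,
    "scorecard": Tier.IMPORTANT,
    "continue_watching": Tier.IMPORTANT,
    "recommendations": Tier.NICE_TO_HAVE,
    "rich_home_page": Tier.NICE_TO_HAVE,
    "watch_along": Tier.NICE_TO_HAVE,
    "reactions": Tier.NICE_TO_HAVE,
}


def enabled_features(utilization: float) -> Set[str]:
    """Features that stay on at a given backend utilization (0.0 to 1.0)."""
    if utilization < 0.75:
        max_tier = Tier.NICE_TO_HAVE   # normal: everything on
    elif utilization < 0.90:
        max_tier = Tier.IMPORTANT      # shed the nice-to-haves first
    else:
        max_tier = Tier.CRITICAL       # panic: only the playback path survives
    return {name for name, tier in FEATURES.items() if tier <= max_tier}
```

Writing the tiers down ahead of time is the point: during a final nobody should be deciding which feature to turn off.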

Multi-CDN: redundancy and economics, not just performance

No single CDN reliably carries the tens of terabits per second a 65M-concurrent final demands, and depending on one CDN means a single vendor incident can black out the biggest event of the year. So traffic is split across Akamai, CloudFront, Cloudflare, etc., with an in-house optimizer assigning each client a CDN by live health, regional performance, and cost. If one CDN's error rate or latency spikes mid-match, the optimizer reroutes new sessions (and can migrate existing ones) away from it. There's also a cost lever: CDN pricing varies by region and commit, so the optimizer can prefer cheaper capacity when performance is equal. The trade-off to name: multi-CDN adds real operational complexity (you now manage cache consistency, token/signed-URL auth, and warming across multiple vendors), but for a flagship live event the redundancy is non-negotiable.
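A per-region CDN choice along those lines can be sketched as "filter unhealthy, then score". The stats fields, the 2% error threshold, and the weight that converts cost per GB into milliseconds are hypothetical, and the CDN names are placeholders rather than real vendors' numbers.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CdnStats:
    name: str
    error_rate: float     # fraction of failed requests in the last window
    p95_ttfb_ms: float    # regional time-to-first-byte
    cost_per_gb: float    # effective regional price
    headroom_tbps: float  # remaining committed capacity in the region


def pick_cdn(candidates: List[CdnStats], max_error_rate: float = 0.02,
             latency_weight: float = 1.0, cost_weight: float = 200.0) -> str:
    """Choose a CDN for a new session in one region."""
    healthy = [c for c in candidates
               if c.error_rate <= max_error_rate and c.headroom_tbps > 0]
    if not healthy:
        # Everything looks bad: route to the least-bad CDN rather than refuse playback.
        return min(candidates, key=lambda c: c.error_rate).name
    # Lower score wins; cost only decides between CDNs with similar latency.
    return min(healthy, key=lambda c: latency_weight * c.p95_ttfb_ms
               + cost_weight * c.cost_per_gb).name
```

Health filtering runs before scoring, so a cheap but failing CDN is never chosen; among healthy ones, equal latency lets price break the tie.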

Splitting cacheable from non-cacheable to free up the control plane

A huge amount of in-match API traffic is the same for everyone: the live scorecard, match summary, highlights list. If those hit your gateway and microservices, you're spending precious control-plane CPU re-serving identical data to millions. Hotstar's move was to route cacheable APIs to a dedicated CDN domain with lighter security checks (no per-user validation needed for a public scorecard) so the edge serves them, while non-cacheable, per-user APIs (sessions, personalization, entitlement) go through the full gateway. This is the same edge-offload principle as video, applied to JSON: identify what's identical-across-users and push it to the CDN so your origin compute is reserved for user-specific work. It directly buys back capacity for the session-start surge.
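The split can be expressed as a routing decision plus the matching `Cache-Control` header. The path prefixes, host names, and 5-second shared TTL are hypothetical; the point is that identical-for-everyone responses are marked `public` and sent to a CDN domain, while per-user responses are `private, no-store` and go through the full gateway.

```python
from typing import NamedTuple

# Hypothetical paths for responses that are identical for every viewer.
CACHEABLE_PREFIXES = ("/v1/scorecard/", "/v1/match-summary/", "/v1/highlights/")


class Route(NamedTuple):
    host: str
    cache_control: str


def route_api(path: str) -> Route:
    """Decide where an API call goes and how caches may store the response."""
    if path.startswith(CACHEABLE_PREFIXES):
        # Same response for everyone: short shared TTL, served from the CDN edge.
        return Route("api-cdn.example.com", "public, max-age=5")
    # Per-user data (sessions, personalization, entitlement): full gateway, never shared.
    return Route("api.example.com", "private, no-store")
```

Even a few seconds of shared TTL means each edge forwards at most one scorecard request per interval instead of one per viewer.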

Trade-offs to discuss

Every senior interviewer expects you to surface at least 3 of these. Pick the decisions, state the alternatives, and justify your choice.

Predictive pre-warming + ladder scaling instead of pure reactive autoscaling

Reactive autoscaling is bounded by a ~90s server-boot delay, but the wicket surge lands in ~60s, so relying on reactive scaling alone guarantees errors at the worst moment. Pre-warming to a forecasted peak and provisioning per projected-million-concurrent ladder costs money (you pay for idle headroom and a ~2M buffer), but it's the only approach that is actually ahead of the spike. The cost is justified because the events are pre-scheduled and predictable; you know the toss time.

Multi-CDN over a single best-in-class CDN

A single CDN is simpler (one set of cache keys, tokens, warming) and may even be cheaper per GB at commit. But it caps your peak bandwidth and makes one vendor a single point of failure for your marquee event. For a 65M final, redundancy and aggregate capacity outweigh the operational simplicity of one vendor.

Approximate near-real-time concurrency instead of exact counts

Exactly counting 65M live sessions in real time is wasteful and slow. The autoscaler and the on-screen counter need a number that's fresh and directionally correct, not perfectly precise. Trading exactness for latency lets the concurrency pipeline drive scaling decisions fast enough to matter.

Graceful degradation (shed features) instead of uniform failure or hard rejection

When saturated, you can (a) fail everything, (b) reject new users at the door, or (c) keep everyone watching but strip non-essential features. Option (c) protects the core product experience (people came for the match, not the recommendations), at the cost of a degraded secondary experience and the engineering effort to tier every feature by importance up front.

Standard HLS/DASH segment latency vs Low-Latency HLS

Standard 2-6s segments give rock-solid CDN cacheability and simple operations but put live ~15-30s behind real time. LL-HLS (sub-second partial segments) cuts that toward a few seconds, which matters when viewers want to see a wicket before the neighbor's TV does, but it multiplies request volume and demands tighter CDN and player tuning. You decide per event how much latency you will trade for cache efficiency and stability.

Encode-once ABR ladder vs per-user/just-in-time encoding

One fixed ladder serves everyone and keeps the encode count constant regardless of audience size, essential at 65M. The cost is you serve a fixed set of quality rungs rather than perfectly optimal per-device renditions, and you pay to encode rungs even if few viewers use the top one. At this concurrency, constant encode cost beats per-viewer optimization every time.

How JioHotstar actually does it

The numbers here are from Hotstar/JioHotstar's own public engineering talks (Rootconf), the platform's published figures, and reputable engineering writeups, not invented. Verified anchors: 59M concurrent at the 2023 ODI World Cup final and 65.2M at the 2026 T20 World Cup semi-final (the current world record for live-event concurrency); ~10.3M concurrent at the 2019 VIVO IPL final with ~5.7 Tbps peak egress and ~1M requests/sec; ~500K concurrency growth per minute during surges; ~90s server warm-up and ~75s container/app startup (the root cause of the pre-warming strategy); 800+ microservices on Amazon EKS at peak with ~8,000 vCPU and 16 TB RAM; ~90% CDN cache hit ratio; multi-CDN across Akamai/CloudFront/Cloudflare with an in-house optimizer; panic mode with client-side exponential backoff; and Project Hulk, their in-house load/chaos testing framework that simulates full traffic patterns with ML. Where exact internal figures aren't public (e.g., the precise origin-shield request-collapse ratio, current LL-HLS latency targets), the page reasons from the published cache-hit ratio and standard HLS/DASH behavior and flags those as estimates. The compute/RAM figures describe the API/control plane only; the video bytes themselves are served by CDNs and never touch those servers, which is the central architectural point.

Sources

Lessons to study before this interview

If any of these topics are fuzzy, the interviewer will catch it. Each lesson is 15 to 60 minutes with diagrams, code, and a quiz.

Design a Content Delivery Network

capstone / capstone

Content Delivery Network (CDN)

foundation / load balancing proxies

Load Balancing

foundation / core fundamentals

Cache Warming

foundation / caching strategies

Circuit Breaker for Resilience

advanced / reliability resilience

Backpressure for Resilience

advanced / reliability resilience

Global Server Load Balancing (GSLB)

foundation / load balancing proxies

Frequently asked questions

Practice with 766 system design lessons

Lifetime access for INR 499 or $7.99. Interactive diagrams, runnable code, quizzes, and 20 capstone projects including Design JioHotstar.

JioHotstar (Hotstar) System Design Interview: 65M Viewers

JioHotstar streamed the 2026 T20 World Cup semi-final to 65.2 million concurrent viewers, the highest concurrency ever recorded for any live event on any digital platform.

Where it shows up

Why this question is asked

Requirements

Always clarify these in the first 5 minutes of the interview. Do not start drawing boxes until both lists are agreed.

Functional requirements

Stream live events (cricket, shows) and video-on-demand to web, Android, iOS, and TV apps
Adaptive bitrate playback: the player picks a quality (4K/1080p/720p/480p/360p/low) based on live measured bandwidth and switches mid-stream without re-buffering
Low glass-to-glass latency for live, the goal is the streamed moment landing close to the real event (sub-30s typical for HLS/DASH; lower with LL-HLS), and fast sub-3s start-up time
Authentication, subscription/entitlement checks, and content/geo licensing enforcement before a stream is served
Live engagement features: scorecard, watch-along, reactions, and concurrency counter
Personalized home page, search, and continue-watching for VOD
Handle predictable mega-events (IPL, World Cup finals) with scheduled, pre-announced peaks

Non-functional requirements

Survive 50-65M+ concurrent live viewers with no full outage; partial degradation is acceptable, a black screen on the match is not
Absorb a surge of ~500K new concurrent viewers per minute when a key moment (wicket, last over) hits
Keep origin load flat regardless of audience size, target 90%+ CDN cache hit ratio so origin sees a tiny fraction of edge requests
High availability for the playback path (the play button must work even when recommendations, social, and analytics are down)
Multi-CDN redundancy so a single CDN outage during a final does not take down playback
Cost efficiency at peak: peak is ~10-50x the daily baseline, so the architecture must scale up for a 3-hour window and scale back down
Global edge delivery built for India's network reality: serve a 5G phone in Bengaluru and a 2G connection in a village from the same pipeline
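The availability requirement says the play button must keep working while recommendations, social, and analytics are down. One way to model that is a priority-ordered degradation ladder. This is a minimal sketch: the feature names, priorities, and load thresholds are hypothetical illustrations, not Hotstar's actual configuration.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Feature:
    name: str
    priority: int  # 0 = critical path, never shed; higher = shed earlier


# Hypothetical feature set, most critical first.
FEATURES = [
    Feature("playback", 0),
    Feature("entitlement", 0),
    Feature("scorecard", 1),
    Feature("home_page", 2),
    Feature("reactions", 3),
    Feature("recommendations", 3),
    Feature("analytics", 4),
]

# Hypothetical thresholds: load factor (demand / provisioned capacity) at or
# above which each priority level is switched off. Priority 0 has no entry,
# so playback and entitlement are never shed.
SHED_AT = {4: 0.70, 3: 0.80, 2: 0.90, 1: 0.97}


def enabled_features(load_factor: float) -> List[str]:
    """Features that stay on at the given load factor, in priority order."""
    return [
        f.name
        for f in FEATURES
        if f.priority not in SHED_AT or load_factor < SHED_AT[f.priority]
    ]
```

The point to defend in an interview is that degradation is explicit and ordered. Under overload, analytics goes first and playback never goes, instead of every service failing uniformly.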

Back-of-envelope scale estimates

Show your math. Pulling numbers from thin air signals you have not thought about the load.

Peak concurrent live viewers

65.2M

JioHotstar's verified record at the 2026 T20 World Cup semi-final; the 2023 ODI World Cup final hit 59M and the platform repeatedly cleared 50M+ for marquee India matches.

Surge rate at a key moment

~500K new concurrent / minute

Peak egress bandwidth

~5.7 Tbps at ~10M concurrent; tens of Tbps at 65M

At ~10M concurrent, Hotstar measured ~5.7 Tbps peak. At 65M with adaptive bitrate, sustained multi-CDN egress is on the order of tens of terabits per second, which is why multi-CDN is non-negotiable.
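The 65M figure can be extrapolated from the measured data point. This sketch assumes the average delivered bitrate per viewer stays the same as it was at 10M concurrent. That is a simplification, since a larger audience may skew toward lower ABR rungs.

```python
TBPS = 1e12


def average_bitrate_bps(measured_egress_bps: float, measured_viewers: int) -> float:
    """Average delivered bitrate per viewer implied by a measurement."""
    return measured_egress_bps / measured_viewers


def projected_egress_bps(avg_bitrate_bps: float, viewers: int) -> float:
    """Total egress if every viewer pulls the same average bitrate."""
    return avg_bitrate_bps * viewers


# Data point from the text: ~5.7 Tbps at ~10M concurrent.
avg = average_bitrate_bps(5.7 * TBPS, 10_000_000)  # ~570 kbps per viewer
peak = projected_egress_bps(avg, 65_200_000)        # ~37 Tbps
print(f"avg per viewer: {avg / 1e3:.0f} kbps, projected peak: {peak / TBPS:.1f} Tbps")
```

Roughly 37 Tbps lines up with the "tens of terabits per second" estimate and shows why the egress has to be spread across several CDNs.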

Origin requests after caching

~10% of edge requests

At a 90% cache hit ratio, the requests from ~65M viewers translate into only ~6.5M viewers' worth of requests reaching origin/shield, which is what makes the origin survivable. The other 90% are served from the CDN edge.
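The origin-load arithmetic is simple, but writing it down shows how sensitive origin is to the hit ratio. Every point of hit ratio lost adds about 1% of edge load back onto origin. For illustration, the sketch assumes each viewer makes one request per segment interval.

```python
def origin_requests(edge_requests: int, hit_ratio: float) -> int:
    """Requests that miss the edge cache and fall through to origin/shield."""
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit_ratio must be between 0 and 1")
    return round(edge_requests * (1.0 - hit_ratio))


# Assumption: each of ~65M viewers fetches one segment per segment interval.
viewers = 65_000_000
for hit in (0.90, 0.95, 0.99):
    print(f"hit ratio {hit:.0%}: {origin_requests(viewers, hit):,} origin requests")
```

Moving from 90% to 99% cuts origin traffic tenfold. That is why origin shielding, which collapses many edge misses for the same fresh live segment into one upstream fetch, matters so much for live events.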

Peak compute footprint

~8,000 vCPU / 16 TB RAM, 800+ microservices on EKS

Server warm-up time

~75-90s
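Comparing the surge rate with the warm-up time shows why reactive autoscaling alone falls behind. Capacity requested when a wicket falls only comes online after the new servers warm up, and viewers keep arriving in the meantime. This sketch assumes a constant surge rate and ignores any metric-detection delay before scaling is triggered, so the real gap is larger.

```python
def viewers_during_lag(surge_per_min: float, lag_s: float) -> float:
    """Viewers that arrive while newly requested capacity is still warming up."""
    return surge_per_min * lag_s / 60.0


SURGE_PER_MIN = 500_000  # surge estimate from the table above
for warm_up_s in (75, 90):
    backlog = viewers_during_lag(SURGE_PER_MIN, warm_up_s)
    print(f"warm-up {warm_up_s}s -> {backlog:,.0f} viewers must land on existing headroom")
```

Those 625K-750K viewers must be served by capacity that already exists. This is the argument for predictive pre-warming ahead of scheduled matches instead of pure reactive autoscaling.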

Adaptive bitrate renditions

6 ladder rungs

4K, 1080p, 720p, 480p, 360p, and a low-bandwidth rung, encoded once per live stream so a 5G phone and a 2G connection are served from identical infrastructure.
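On the player side, the ladder only helps if the client picks a rung it can sustain. A common heuristic is to choose the highest rung whose bitrate fits under measured throughput with a safety margin. The bitrates below and the 0.8 safety factor are illustrative assumptions, not Hotstar's published ladder. Real players often also factor in buffer level.

```python
from typing import List, Tuple

# Hypothetical ladder: (rung name, bitrate in kbps), highest first.
LADDER: List[Tuple[str, int]] = [
    ("4K", 15_000),
    ("1080p", 5_000),
    ("720p", 3_000),
    ("480p", 1_200),
    ("360p", 700),
    ("low", 200),
]


def pick_rung(measured_kbps: float, safety: float = 0.8) -> str:
    """Highest rung whose bitrate fits within a margin of measured throughput.

    Falls back to the lowest rung so playback continues on very poor links.
    """
    budget = measured_kbps * safety
    for name, kbps in LADDER:
        if kbps <= budget:
            return name
    return LADDER[-1][0]
```

Every rung is encoded once per stream and cached at the edge. Switching rungs mid-stream therefore just means the player requests its next segment from a different rendition playlist, and the server side does no per-user work.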

How JioHotstar actually does it


JioHotstar (Hotstar) System Design Interview: 65M Viewers

Why this question is asked

Requirements

Functional requirements

Non-functional requirements

Back-of-envelope scale estimates

High-level architecture

Core components

Ingest + Live Transcoder (ABR ladder)

Origin + Origin Shield

Multi-CDN + CDN Load Optimizer

Playback / Entitlement Service

API Gateway (centralized Envoy)

Predictive Autoscaler + Pre-warmer

Panic Mode + Client Backoff Controller

Concurrency / Live Stats Pipeline

Load Testing Harness (Project Hulk-style)

Data model

Deep dives

Why a cricket wicket is the hardest load-test in the world

Origin shielding: collapsing the thundering herd on a fresh live segment

Adaptive bitrate: encode once, serve 5G and 2G from the same pipeline

Graceful degradation and panic mode: protecting the play button

Multi-CDN: redundancy and economics, not just performance

Splitting cacheable from non-cacheable to free up the control plane

Trade-offs to discuss

Predictive pre-warming + ladder scaling instead of pure reactive autoscaling

Multi-CDN over a single best-in-class CDN

Approximate near-real-time concurrency instead of exact counts

Graceful degradation (shed features) instead of uniform failure or hard rejection

Standard HLS/DASH segment latency vs Low-Latency HLS

Encode-once ABR ladder vs per-user/just-in-time encoding

How JioHotstar actually does it

Lessons to study before this interview

Related system design interview questions

Frequently asked questions

Does the live video actually flow through Hotstar's servers?

Why can't they just use autoscaling to handle the cricket spike?

What is origin shielding and why does live streaming need it more than VOD?

How does one stream serve both a 5G phone and a 2G connection?

What is 'panic mode' and is it a failure?

Why multi-CDN instead of just using the best single CDN?

How do they count 65M concurrent viewers in real time without melting a database?
