Design Netflix: System Design Interview Guide
Netflix streams 250+ million hours of video per day to 270+ million subscribers, and at peak it accounts for roughly 15% of global downstream internet traffic.
Designing Netflix forces you to think about video encoding pipelines, multi-CDN delivery with adaptive bitrate, a recommendation system that drives 80% of watched content, and a microservices architecture that has to stay up while half a million users press play in the same minute.
Asked at: Meta, Google, Amazon, Netflix, Disney+, Hulu, and YouTube. It is the canonical video streaming system design problem.
Why this question is asked
Design Netflix tests whether you understand video pipelines (encoding, packaging, DRM), CDN economics, HTTP adaptive streaming protocols (HLS, DASH), and personalization at scale. It is also one of the few problems where you should bring up cost as an explicit constraint, since bandwidth dominates.
Requirements
Always clarify these in the first 5 minutes of the interview. Do not start drawing boxes until both lists are agreed.
Functional requirements
- Users browse a catalog with personalized rows
- Users play any title and resume from where they left off
- Video adapts bitrate to the user's bandwidth in real time
- Users get personalized recommendations on the home screen
- Subtitles, audio tracks, and multiple languages are supported
- Users can download titles for offline viewing on mobile
- The system enforces account sharing limits and detects households
Non-functional requirements
- Start-up latency under 2 seconds at the 95th percentile
- Buffering ratio below 0.5% of watch time
- 99.99% availability for the playback service
- Global delivery from the nearest edge with no manual region selection
- Cost per stream optimized through CDN tiering and codec selection
- DRM enforced so content is not trivially extractable
Back-of-envelope scale estimates
Show your math. Pulling numbers from thin air signals you have not thought about the load.
Subscribers: 270M
Public Q3 2024 reporting. Assume 1.5 profiles per account on average.
Concurrent streams (peak): 70M
Roughly 25% of subscribers stream at the same time during global peak events.
Average bitrate per stream: 4 Mbps
Weighted across SD, HD, and 4K. 4K is 15 Mbps, HD is 5 Mbps, SD is 1 Mbps, with most viewing in HD.
Peak egress bandwidth: 280 Tbps
70M concurrent streams times 4 Mbps. This is why CDN placement matters. Netflix runs Open Connect inside ISPs to avoid paying transit on this volume.
Catalog storage per encoded title: 1 to 6 TB
Each title is encoded into 100+ variants (codec, resolution, bitrate, HDR, audio language). A two-hour movie is 1 TB after encoding; a series season is 3 to 6 TB.
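The arithmetic behind these estimates can be written out as a short script. This is a sketch, not measured data: the subscriber count, concurrent streams, and per-tier bitrates come from the figures above, while the viewing mix (`MIX_PERCENT`) is an assumption picked to show how a 4 Mbps weighted average could arise with most viewing in HD.

```python
# Back-of-envelope numbers from the estimates above.
SUBSCRIBERS = 270_000_000
PEAK_CONCURRENT_STREAMS = 70_000_000

# Per-tier bitrates in kbps, taken from the text.
BITRATE_KBPS = {"4K": 15_000, "HD": 5_000, "SD": 1_000}

# ASSUMPTION: share of streams per tier, in percent. Illustrative only,
# not a published figure.
MIX_PERCENT = {"4K": 4, "HD": 61, "SD": 35}


def weighted_bitrate_kbps(bitrates, mix_percent):
    """Average bitrate weighted by each tier's share of streams."""
    assert sum(mix_percent.values()) == 100
    return sum(bitrates[tier] * mix_percent[tier] for tier in bitrates) // 100


def peak_egress_tbps(streams, avg_kbps):
    """Convert streams x kbps into terabits per second (1 Tbps = 1e9 kbps)."""
    return streams * avg_kbps / 1e9


avg_kbps = weighted_bitrate_kbps(BITRATE_KBPS, MIX_PERCENT)
concurrency = PEAK_CONCURRENT_STREAMS / SUBSCRIBERS
egress = peak_egress_tbps(PEAK_CONCURRENT_STREAMS, avg_kbps)

print(f"average bitrate: {avg_kbps / 1000:.1f} Mbps")
print(f"peak concurrency: {concurrency:.0%} of subscribers")
print(f"peak egress: {egress:.0f} Tbps")
```

Changing the mix is a quick way to see how sensitive the egress number is to 4K adoption, which is a useful point to raise when the interviewer pushes on cost.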
High-level architecture
The control plane (catalog, recommendations, user profile, billing) runs as microservices on AWS, fronted by Zuul, Netflix's API gateway. The data plane (actual video bytes) is served from Open Connect, Netflix's own CDN, with appliances embedded inside ISP networks. When a user presses play, the client asks a Playback API for a manifest. The manifest lists available bitrates and the URLs of the nearest Open Connect appliances. The client uses adaptive bitrate streaming (HLS or DASH) to switch quality based on observed throughput. Personalization (rows on the home page, ranking within rows) is precomputed offline by ML jobs in Spark and written to a Cassandra cluster, with EVCache in front for low-latency reads. Real-time signals (last watched, bookmark position) are written to a Cassandra table with relaxed consistency requirements.
In a real interview, sketch this on the whiteboard before diving into any single box.
Core components
Walk through each service. The interviewer wants to hear what each one owns, not just the names.
Encoding Pipeline
Ingests source masters, then encodes each title into 100+ variants (codecs: H.264, HEVC, AV1; resolutions: 240p to 4K; HDR profiles; audio tracks per language). Runs as a fleet of GPU and CPU jobs on AWS. The output is packaged into media segments with HLS and DASH manifests.
Open Connect (CDN)
Netflix's own CDN, with appliances deployed inside ISP networks. Each appliance caches the most-watched titles for its region. When a user requests a stream, the playback service returns URLs of the nearest appliances. This avoids paying transit on hundreds of terabits per second.
Playback API
Issues a signed manifest to the client at start. The manifest lists bitrate variants and CDN URLs. Also enforces DRM by issuing a license to authorized clients.
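One way to picture a signed manifest is an HMAC over a canonical JSON body. The sketch below is an illustration under assumptions, not Netflix's actual scheme: the manifest fields, the shared `SIGNING_KEY`, and the function names are all hypothetical.

```python
import hashlib
import hmac
import json

# ASSUMPTION: a hard-coded demo key. A real service would fetch a managed key.
SIGNING_KEY = b"demo-secret-not-for-production"


def build_manifest(title_id, variants, cdn_urls):
    """Assemble a manifest listing bitrate variants (lowest first) and CDN URLs."""
    return {
        "title_id": title_id,
        "variants": sorted(variants, key=lambda v: v["bitrate_kbps"]),
        "cdn_urls": list(cdn_urls),
    }


def sign_manifest(manifest, key=SIGNING_KEY):
    """Serialize to canonical JSON and attach an HMAC-SHA256 signature."""
    body = json.dumps(manifest, sort_keys=True, separators=(",", ":"))
    signature = hmac.new(key, body.encode("utf-8"), hashlib.sha256).hexdigest()
    return {"body": body, "signature": signature}


def verify_manifest(signed, key=SIGNING_KEY):
    """Recompute the signature and compare in constant time."""
    expected = hmac.new(key, signed["body"].encode("utf-8"), hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, signed["signature"])


manifest = build_manifest(
    "title-123",
    [{"resolution": "1080p", "bitrate_kbps": 5000},
     {"resolution": "480p", "bitrate_kbps": 1200}],
    ["https://oca1.example.net", "https://oca2.example.net"],
)
signed = sign_manifest(manifest)
print(verify_manifest(signed))
```

Signing covers integrity of the manifest only. Content protection still depends on the DRM license issued separately.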
Catalog and Metadata Service
Owns title metadata: titles, descriptions, cast, genres, artwork URLs, parental ratings. Backed by Cassandra. Reads are heavily cached in EVCache because catalog churn is slow.
Recommendation Service
Precomputes ranked rows per user using collaborative filtering and a deep learning model. The top N rows per user are stored in Cassandra. Real-time refinements (just-watched signal, time-of-day re-ranking) happen in a thin online service.
Bookmark and Watch History Service
Writes the current playback position every 5 seconds during a stream and reads it back on resume. Backed by Cassandra with relaxed consistency, since a slightly stale bookmark is harmless.
DRM and License Service
Issues Widevine, PlayReady, or FairPlay licenses to authenticated clients. Keys are per-title and per-device, with rotation policies.
Data model
Pick the right store per table. Justify each choice with the access pattern, not by reflex.
titles
Columns: title_id (PK), name, description, release_year, duration_seconds, rating, genres, artwork_url
Cassandra. Heavily cached. Updated by editorial workflows, not user actions.
title_variants
Columns: variant_id (PK), title_id (FK), codec, resolution, bitrate_kbps, hdr_profile, manifest_url
One row per encoded variant. The Playback API picks the right rows based on client capabilities and constructs the manifest.
viewing_history
Columns: user_id (PK partition), title_id (clustering), last_position_seconds, completed, updated_at
Cassandra. Partitioned by user_id so the resume query is one partition read. Eventually consistent.
user_recommendations
Columns: user_id (PK), row_index, title_ids[], row_label, generated_at
Precomputed by Spark jobs nightly, then patched online by recent activity. Read by the home screen on every load.
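To make the viewing_history access pattern concrete, here is a minimal in-memory stand-in. It is a sketch under assumptions: the class and method names are hypothetical, a nested dict plays the role of partition key and clustering key, and the timestamp comparison only imitates the last-write-wins idea that makes a slightly stale bookmark tolerable.

```python
from dataclasses import dataclass


@dataclass
class Bookmark:
    title_id: str
    last_position_seconds: int
    completed: bool
    updated_at: float  # seconds since epoch


class ViewingHistoryStore:
    """Outer dict key = partition key (user_id); inner key = clustering key (title_id)."""

    def __init__(self):
        self._partitions = {}

    def upsert(self, user_id, bookmark):
        """Last-write-wins on updated_at, so a late-arriving stale write is ignored."""
        rows = self._partitions.setdefault(user_id, {})
        current = rows.get(bookmark.title_id)
        if current is None or bookmark.updated_at >= current.updated_at:
            rows[bookmark.title_id] = bookmark

    def resume_position(self, user_id, title_id):
        """Single-partition lookup; 0 means start from the beginning."""
        row = self._partitions.get(user_id, {}).get(title_id)
        return row.last_position_seconds if row else 0


store = ViewingHistoryStore()
store.upsert("u1", Bookmark("t1", 120, False, updated_at=1000.0))
store.upsert("u1", Bookmark("t1", 125, False, updated_at=1005.0))
store.upsert("u1", Bookmark("t1", 60, False, updated_at=900.0))  # stale, dropped
print(store.resume_position("u1", "t1"))
```

Because every read and write names a single user_id, no query ever needs to touch more than one partition, which is the property the table design is optimizing for.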
Deep dives
These are the conversations the interviewer is steering you toward. Practice each one until you can talk through it without notes.
Adaptive bitrate streaming and the manifest
When a user presses play, the client downloads a manifest (HLS .m3u8 or DASH .mpd) that lists every bitrate variant available for that title. The client estimates its current bandwidth and picks a variant. It downloads short segments (typically 4 to 10 seconds each) and re-evaluates after every segment. If bandwidth drops, it steps down. If buffer health is good and bandwidth is high, it steps up. The encoding pipeline produces a ladder of variants (240p at 300 kbps, 360p at 600 kbps, 480p at 1.2 Mbps, 720p at 2.5 Mbps, 1080p at 5 Mbps, 4K at 15 Mbps) so the client always has a sane choice. The clever part is the manifest: it embeds CDN URLs of the nearest Open Connect appliances, so the client never has to do a separate CDN lookup.
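The step-up and step-down logic described above can be sketched as a small selection function. The ladder values come from the text; the safety factor and the buffer threshold are assumed tuning knobs, and real players use more elaborate heuristics than this.

```python
# Bitrate ladder from the text: (label, bitrate in kbps), lowest first.
LADDER = [
    ("240p", 300), ("360p", 600), ("480p", 1200),
    ("720p", 2500), ("1080p", 5000), ("4K", 15000),
]

# ASSUMPTIONS: illustrative tuning values, not published ones.
SAFETY_FACTOR = 0.8            # plan to use only 80% of measured throughput
MIN_BUFFER_TO_STEP_UP = 10.0   # seconds of buffered video required to step up


def pick_variant(throughput_kbps, buffer_seconds, current_index):
    """Return the ladder index to request for the next segment."""
    budget = throughput_kbps * SAFETY_FACTOR
    # Highest rung whose bitrate fits the budget; fall back to the lowest rung.
    affordable = 0
    for i, (_, kbps) in enumerate(LADDER):
        if kbps <= budget:
            affordable = i
    if affordable < current_index:
        return affordable  # bandwidth dropped: step down immediately
    if affordable > current_index and buffer_seconds >= MIN_BUFFER_TO_STEP_UP:
        return current_index + 1  # healthy buffer: step up one rung at a time
    return current_index


# Re-evaluated after every segment: 1080p is affordable and the buffer is healthy.
index = pick_variant(throughput_kbps=8000, buffer_seconds=20, current_index=3)
print(LADDER[index][0])
```

Stepping down in one jump but stepping up one rung at a time is a deliberate asymmetry in this sketch: a rebuffer is far more visible to the viewer than a few seconds at a lower resolution.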
Why Netflix runs its own CDN (Open Connect)
At 280 Tbps of peak egress, paying a third-party CDN like Akamai or Cloudflare would cost hundreds of millions per year. Netflix built Open Connect: rack-mounted appliances loaded with the most-watched 95% of the catalog, deployed for free inside ISP networks. The ISP saves on transit because the traffic no longer crosses its upstream links. Netflix saves on CDN fees. The user gets lower latency because the bytes are now in their ISP's local POP. The remaining 5% of long-tail catalog is served from Netflix's central S3-backed origin.
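The appliance-first, origin-as-fallback behavior can be sketched as a ranking function. Everything here is hypothetical: the `Appliance` fields, the round-trip-time ranking, and the placeholder URLs are assumptions for illustration, not how Open Connect steering is actually implemented.

```python
from dataclasses import dataclass, field

# ASSUMPTION: placeholder origin URL for the long-tail fallback.
ORIGIN_URL = "https://origin.example.com"


@dataclass
class Appliance:
    url: str
    rtt_ms: float                                  # estimated latency to the client
    cached_titles: set = field(default_factory=set)
    healthy: bool = True


def steer(title_id, appliances, max_urls=3):
    """Rank URLs: closest healthy appliances holding the title, then the origin."""
    candidates = [a for a in appliances if a.healthy and title_id in a.cached_titles]
    candidates.sort(key=lambda a: a.rtt_ms)
    urls = [a.url for a in candidates[:max_urls]]
    urls.append(ORIGIN_URL)  # origin is always the last resort
    return urls


fleet = [
    Appliance("https://oca-far.example.net", 40.0, {"popular", "niche"}),
    Appliance("https://oca-near.example.net", 5.0, {"popular"}),
    Appliance("https://oca-down.example.net", 2.0, {"popular"}, healthy=False),
]
print(steer("popular", fleet))
print(steer("long-tail", fleet))
```

A title that no appliance holds resolves to the origin alone, which matches the long-tail path described above.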
Precomputed vs real-time recommendations
Most recommendation rows on the home screen are precomputed once a day by Spark jobs running on AWS. The output (top N titles per row per user) is written to Cassandra. This works because user preferences shift slowly. The exception is the Continue Watching row and the Because You Watched X row: these need to respond to events from the last few minutes. A small online service reads the precomputed base rows and patches in fresh signals from the bookmark service before returning to the client. The split keeps the heavy ML offline (cheap, GPU-batched) and the online layer thin (latency-bounded).
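The online patching step might look like the following sketch. The row shape (`row_label`, `title_ids`) mirrors the user_recommendations table; the function name and the two patching rules (prepend Continue Watching, drop just-finished titles) are assumptions chosen to illustrate the idea.

```python
def patch_rows(precomputed_rows, in_progress_title_ids, just_finished_title_ids):
    """Merge nightly rows with fresh signals without mutating the input."""
    finished = set(just_finished_title_ids)
    patched = []
    if in_progress_title_ids:
        # Continue Watching goes first and comes entirely from fresh bookmarks.
        patched.append({"row_label": "Continue Watching",
                        "title_ids": list(in_progress_title_ids)})
    for row in precomputed_rows:
        # Drop titles the user finished after the batch job ran.
        remaining = [t for t in row["title_ids"] if t not in finished]
        if remaining:
            patched.append({"row_label": row["row_label"], "title_ids": remaining})
    return patched


nightly = [
    {"row_label": "Trending Now", "title_ids": ["t1", "t2", "t3"]},
    {"row_label": "Top Picks", "title_ids": ["t2"]},
]
home = patch_rows(nightly, in_progress_title_ids=["t7"], just_finished_title_ids=["t2"])
for row in home:
    print(row["row_label"], row["title_ids"])
```

The online layer does only list filtering and concatenation, so its latency stays bounded regardless of how heavy the offline model is.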
Handling a Thursday Stranger Things drop
When a new season drops at midnight Pacific, you get a coordinated traffic spike: 10x the normal Thursday load within 30 minutes. Once encoding finishes, the CDN is preheated by replicating all variants to every Open Connect appliance hours before the drop. The Playback API auto-scales based on RPS. The recommendation service is cache-warmed with the new title pre-injected into Trending Now rows. The catch is preventing thundering-herd login traffic at midnight: this is handled by JWTs issued earlier in the day that are still valid, so most clients reach the API gateway with a cached session instead of logging in again.
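The cached-session point can be illustrated from the client side: a client only calls the login endpoint when its stored token is missing or close to expiry. This is a sketch under assumptions. The refresh margin and function names are hypothetical, and the code reads the standard `exp` claim without verifying the signature, which is acceptable for a client-side hint only because the server still validates the token on every request.

```python
import base64
import json
import time

# ASSUMPTION: refresh slightly early so a token does not expire mid-request.
REFRESH_MARGIN_SECONDS = 60


def jwt_expiry(token):
    """Read the `exp` claim from a JWT payload WITHOUT verifying the signature."""
    payload_b64 = token.split(".")[1]
    # JWTs use unpadded base64url, so restore the padding before decoding.
    padded = payload_b64 + "=" * (-len(payload_b64) % 4)
    payload = json.loads(base64.urlsafe_b64decode(padded))
    return payload["exp"]


def needs_login(token, now=None):
    """True if the cached token is missing or about to expire."""
    if not token:
        return True
    now = time.time() if now is None else now
    return jwt_expiry(token) - now < REFRESH_MARGIN_SECONDS
```

If token lifetimes are staggered across the day, expiries do not line up with the midnight drop, so the login service sees its normal load while the Playback API absorbs the spike.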
Trade-offs to discuss
Every senior interviewer expects you to surface at least 3 of these. Pick the decisions, state the alternatives, and justify your choice.
Own CDN vs third-party CDN
Owning is cheaper at extreme scale and gives you tighter ISP partnerships. The cost is the engineering investment. Below ~50 Tbps, just use Cloudflare or Akamai. At 280 Tbps, Netflix is more than 5x past that threshold.
Cassandra vs SQL for watch history
Cassandra wins on write throughput and easy horizontal scaling. The cost is no joins and weak consistency. For watch history, both costs are acceptable: a slightly stale bookmark is harmless, and the only query is by user_id.
Precomputed recommendations vs real-time
Precomputed is cheap, batch-friendly, and lets you run heavy ML models offline. The cost is staleness. Real-time is the opposite: fresh, but expensive and latency-bound. Netflix splits the problem: precompute the rows, patch in real-time signals just before serving.
HLS vs DASH
HLS is mandatory on iOS. DASH has better codec flexibility (AV1, HEVC HDR) and a cleaner manifest format. Netflix delivers both. The client picks based on platform.
Encode every codec or pick one and migrate
Encoding into AV1, HEVC, and H.264 means 3x the storage and 3x the encoding cost. But AV1 saves ~30% bandwidth versus H.264, which dwarfs the storage cost at Netflix's egress volume. The math forces you to encode everything.
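A rough break-even calculation shows why bandwidth savings tend to dominate. Only the ~30% AV1 saving and the 5 Mbps HD bitrate come from the text. The storage size and both unit prices below are hypothetical placeholders, and the one-time encoding cost is left out for simplicity.

```python
def breakeven_plays(extra_storage_gb, storage_cost_per_gb_month,
                    gb_per_play_h264, av1_saving, egress_cost_per_gb):
    """Monthly plays of a title above which the AV1 ladder pays for its storage."""
    monthly_storage_cost = extra_storage_gb * storage_cost_per_gb_month
    saving_per_play = gb_per_play_h264 * av1_saving * egress_cost_per_gb
    return monthly_storage_cost / saving_per_play


# A two-hour play at 5 Mbps: 5 Mbit/s * 7200 s / 8 bits per byte / 1000 MB per GB.
GB_PER_PLAY = 5 * 7200 / 8 / 1000

plays = breakeven_plays(
    extra_storage_gb=300,            # ASSUMPTION: size of the extra AV1 ladder
    storage_cost_per_gb_month=0.02,  # ASSUMPTION: placeholder price
    gb_per_play_h264=GB_PER_PLAY,
    av1_saving=0.30,                 # ~30% from the text
    egress_cost_per_gb=0.01,         # ASSUMPTION: placeholder price
)
print(f"break-even at about {plays:.0f} plays per month")
```

Under these placeholder prices the break-even sits in the hundreds of plays per month, far below what a popular title receives. The long tail is where the extra encode is hardest to justify.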
How Netflix actually does it
Netflix runs Open Connect, its own CDN, with appliances inside thousands of ISPs. Encoding uses a homegrown pipeline that runs on AWS and produces hundreds of variants per title. Personalization is built on a combination of matrix factorization, deep learning, and contextual bandits. The microservices run on Spring Boot in Java with Hystrix for fault tolerance (Hystrix is now in maintenance mode, with Resilience4j as the recommended replacement). Service discovery uses Eureka. The main data store for user state is Cassandra, fronted by EVCache (Netflix's distributed cache built on Memcached).
Lessons to study before this interview
If any of these topics are fuzzy, the interviewer will catch it. Each lesson is 15 to 60 minutes with diagrams, code, and a quiz.