
How does YouTube transcode so many uploads?

Distributed chunked encoding. The master is split into 30-second segments at I-frame boundaries, segments are encoded in parallel across a worker fleet (CPU and GPU), and the results are concatenated. This turns a multi-hour serial encode into a minutes-long parallel job.

Why use a two-stage recommendation system?

Scoring every video against every user is computationally infeasible. The candidate generator narrows the catalog to a few hundred videos per user. The ranker scores only those candidates with a deeper model. This pattern is used by nearly every large recommendation system.

How does YouTube serve 1 billion hours per day?

Google's global edge network plus predictive cache warming. Popular videos are pushed to POPs near where they will be watched. Clients adapt bitrate based on bandwidth, so the same video plays smoothly on 4G or fiber.

What is the data model for comments?

Comments are sharded by video_id so an entire thread lives on one shard. Parent and reply comments are linked via parent_comment_id. Counters (likes, replies) are maintained by a stream processor for eventual consistency.

How does YouTube handle DMCA takedowns?

Setting status = removed on the videos row hides the video from playback and search. The encoded variants stay in storage in case of an appeal, but cached copies are evicted from the origin and the CDN. Content ID, a fingerprinting system, also blocks future re-uploads of the same content.

System design interview guide

Design YouTube: System Design Interview Guide

YouTube serves 1 billion hours of video per day across 2.5 billion users, with 500 hours of new content uploaded every minute.

Designing YouTube combines a giant video upload pipeline, multi-resolution encoding, CDN delivery, a massive recommendation system, and a comments and engagement layer. The hardest piece is the upload-to-playback pipeline: how a 4K video uploaded in Mumbai is playable in São Paulo within minutes.

Where it shows up

Commonly asked at Google (YouTube), Meta, Amazon Prime Video, Netflix, Disney+, and TikTok. Often paired with Design Netflix to compare user-generated vs licensed content.

Why this question is asked

Design YouTube tests whether you understand asynchronous pipelines (upload, encode, package), CDN scaling for billions of streams, recommendation systems trained on watch behavior, and a comments system that has to handle moderation. The volume (500 hours uploaded per minute) forces you to make every step batchable or async.

Requirements

Always clarify these in the first 5 minutes of the interview. Do not start drawing boxes until both lists are agreed.

Functional requirements

Users upload videos of any length and resolution
Videos are transcoded to multiple resolutions for adaptive bitrate streaming
Users browse, search, and watch videos
Personalized recommendations on home and watch-next
Users like, dislike, comment, and subscribe
Live streaming as a separate but related feature
Monetization (ads inserted dynamically)

Non-functional requirements

Upload-to-playback under 10 minutes for HD, under 30 minutes for 4K
Playback start latency under 2 seconds
99.99% availability for playback
Global delivery with sub-second buffer fill from the nearest edge
Cost-efficient storage tiering (cold videos move to cheaper tiers)
DMCA takedown within hours of report

Back-of-envelope scale estimates

Show your math. Pulling numbers from thin air signals you have not thought about the load.

Total users

2.5B

Public Google reporting. Assume an average of 1.5 logged-in profiles per person, plus many anonymous viewers.

Hours watched per day

1B

Public reporting. At ~40 minutes per active user per day, that implies roughly 1.5B daily active users.

Hours uploaded per minute

500

Public reporting. That is 720,000 hours of new content per day, every day.

Concurrent streams at peak

100M

1B hours per day divided by 24 hours gives an average concurrency of about 40M streams. A peak factor of 2.5x at global evening windows brings that to roughly 100M.

Storage growth per year

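About 1.3 EB

500 hours per minute times 1,440 minutes per day times 1 GB per hour HD baseline times 5x for multi-resolution encoding times 365 days is roughly 1.3 EB per year of encoded variants. Retained raw masters and storage replication are not counted here and push the total higher.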
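The estimates above can be checked with a few lines of arithmetic. This is a sketch under the stated assumptions only (1 GB per encoded HD hour, a 5x multiplier for the bitrate ladder, decimal units). With these inputs the encoded variants alone come to roughly 1.3 EB per year; raw masters, replication, and higher-bitrate uploads are not modeled and would raise the total.

```python
# Back-of-envelope estimates. Every constant is an assumption taken from the text.
HOURS_UPLOADED_PER_MINUTE = 500
HOURS_WATCHED_PER_DAY = 1_000_000_000
PEAK_FACTOR = 2.5
GB_PER_HOUR_HD = 1        # assumed size of one encoded HD hour
VARIANT_MULTIPLIER = 5    # assumed overhead of the full multi-resolution ladder

MINUTES_PER_DAY = 60 * 24

# Upload volume: hours of new video arriving per day.
hours_uploaded_per_day = HOURS_UPLOADED_PER_MINUTE * MINUTES_PER_DAY

# Concurrency: watch-hours per day spread over 24 hours, then scaled to peak.
avg_concurrent_streams = HOURS_WATCHED_PER_DAY / 24
peak_concurrent_streams = avg_concurrent_streams * PEAK_FACTOR

# Storage: encoded variants only, 1 EB = 1e9 GB.
encoded_gb_per_day = hours_uploaded_per_day * GB_PER_HOUR_HD * VARIANT_MULTIPLIER
encoded_eb_per_year = encoded_gb_per_day * 365 / 1e9

print(f"hours uploaded per day:   {hours_uploaded_per_day:,}")
print(f"average concurrency:      {avg_concurrent_streams / 1e6:.1f}M")
print(f"peak concurrency:         {peak_concurrent_streams / 1e6:.1f}M")
print(f"encoded storage per year: {encoded_eb_per_year:.2f} EB")
```

In an interview, stating the per-hour size and the ladder multiplier out loud matters more than the final figure, because the interviewer can then challenge a specific input.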

High-level architecture

Upload flow: The client uploads to an Upload Service over resumable HTTP (tus or YouTube's own protocol). The Upload Service writes the raw file to a regional object store and emits an UploadComplete event. An Encoding Pipeline picks up the event and runs a fleet of jobs that transcode the master into multiple resolutions and bitrates (240p to 4K), package them into HLS and DASH manifests, and write each variant to the object store. Once at least the 360p variant is ready, the video is marked playable and indexed by Search. The CDN (Google's own edge network) pre-warms popular videos to edge POPs.

Playback: The client requests a manifest from the Playback API, which returns CDN URLs. The client then adapts its bitrate to the bandwidth it observes.

Recommendations: A separate offline pipeline (TensorFlow on TPUs) trains and serves personalized rankings, fronted by a low-latency online layer.

In a real interview, sketch this on the whiteboard before diving into any single box.

Core components

Walk through each service. The interviewer wants to hear what each one owns, not just the names.

Upload Service

Resumable upload endpoint. Handles flaky mobile connections with byte-range resume. Writes raw master to regional Google Cloud Storage. Emits an UploadComplete event on Pub/Sub.

Encoding Pipeline

Fleet of jobs that consume UploadComplete and produce 10 to 30 encoded variants per video. Uses MapReduce-style chunked encoding: split the master into 30-second segments, encode in parallel, concatenate. Modern variants use AV1 for bandwidth savings.

Video Metadata Service

Stores video metadata (title, description, channel, tags, upload time). Backed by Spanner or a sharded SQL system. Read-heavy; aggressively cached.

Playback API

Issues signed manifests on play. Includes DRM tokens for monetized content. Selects the nearest CDN POP based on client IP.
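One common way to issue a signed manifest is an expiring URL with an HMAC signature. The sketch below illustrates the idea only; the URL layout, the `expires` and `sig` parameter names, and the `SECRET_KEY` value are hypothetical and are not YouTube's actual scheme.

```python
import hashlib
import hmac

# Hypothetical demo key. A real service would load this from a key store.
SECRET_KEY = b"demo-secret"


def sign_manifest_url(path: str, expires_at: int, key: bytes = SECRET_KEY) -> str:
    """Append an expiry timestamp and an HMAC-SHA256 signature to a manifest path."""
    payload = f"{path}?expires={expires_at}"
    signature = hmac.new(key, payload.encode(), hashlib.sha256).hexdigest()
    return f"{payload}&sig={signature}"


def verify_manifest_url(url: str, now: int, key: bytes = SECRET_KEY) -> bool:
    """Accept the URL only if the signature matches and it has not expired."""
    base, _, signature = url.rpartition("&sig=")
    if not base:
        return False  # no signature present
    expected = hmac.new(key, base.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, signature):
        return False  # path or expiry was tampered with
    expires_at = int(base.rpartition("expires=")[2])
    return now < expires_at


url = sign_manifest_url("/v/abc123/master.m3u8", expires_at=1_000)
print(verify_manifest_url(url, now=999))
```

Because the edge can verify the signature with the shared key alone, it does not need to call back to the Playback API on every segment request.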

Search Service

Indexes video metadata and transcripts (auto-generated by speech recognition). Backed by a custom search system. Returns ranked results based on relevance, freshness, and engagement.

Recommendation Service

Two-stage system. Candidate generation: a neural network produces a few hundred candidate videos per user. Ranking: a second model scores each candidate based on watch-time predictions, freshness, and diversity. An online layer adapts the rankings to real-time signals (such as the video just watched).

Comments and Engagement Service

Stores comments, likes, dislikes, subscriptions. Comments are eventually consistent; counters are maintained by a stream processor. Toxicity moderation runs on ingest.
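The counter maintenance described above can be pictured as a fold over batches of engagement events. This is a minimal in-memory sketch; the `EngagementEvent` type, the event kinds, and the batching are assumptions for illustration, and a production system would run this in a stream processor with durable state.

```python
from collections import Counter
from typing import Iterable, NamedTuple


class EngagementEvent(NamedTuple):
    video_id: str
    kind: str  # hypothetical kinds: "like", "unlike", "reply"


# Which counter each event kind touches, and by how much.
DELTAS = {"like": ("likes", 1), "unlike": ("likes", -1), "reply": ("replies", 1)}


def fold_batch(counters: Counter, batch: Iterable[EngagementEvent]) -> Counter:
    """Apply one micro-batch of events to the running counters."""
    for event in batch:
        field, delta = DELTAS[event.kind]
        counters[(event.video_id, field)] += delta
    return counters


counters: Counter = Counter()
fold_batch(counters, [EngagementEvent("v1", "like"), EngagementEvent("v1", "like")])
fold_batch(counters, [EngagementEvent("v1", "unlike"), EngagementEvent("v1", "reply")])
print(counters[("v1", "likes")], counters[("v1", "replies")])
```

Readers see a count that lags the event stream by at most one batch, which is the eventual consistency the text refers to.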

Data model

Pick the right store per table. Justify each choice with the access pattern, not by reflex.

videos

video_id (PK)
channel_id
title
description
duration_seconds
uploaded_at
status (uploading, encoding, ready, removed)

Sharded by video_id hash. Status drives the playback gate: only ready videos are surfaced.

video_variants

variant_id (PK)
video_id (FK)
codec (h264, hevc, av1)
resolution
bitrate_kbps
manifest_url
byte_size

One row per encoded variant. The Playback API picks the right rows based on client capabilities.

watch_events

user_id (PK partition)
video_id
watched_at
watch_seconds
completed

Append-only event log. Backed by Bigtable. Used as input to recommendation training.

comments

comment_id (PK)
video_id (clustering)
user_id
text
parent_comment_id
created_at

Sharded by video_id so that comment threads for a video are co-located.
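To make the sharding and threading concrete, here is a small sketch. The shard count, the SHA-256 routing, and the dictionary shape of a comment are assumptions for illustration; the source only states that comments are sharded by video_id and linked through parent_comment_id.

```python
import hashlib
from collections import defaultdict

NUM_SHARDS = 64  # hypothetical shard count


def shard_for(video_id: str, num_shards: int = NUM_SHARDS) -> int:
    """Route by a stable hash of video_id so one video's comments share a shard."""
    digest = hashlib.sha256(video_id.encode()).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def build_threads(comments: list[dict]) -> dict[str, list[str]]:
    """Group reply ids under their top-level comment (parent_comment_id is None)."""
    replies = defaultdict(list)
    for comment in comments:
        replies[comment["parent_comment_id"]].append(comment["comment_id"])
    return {top: replies.get(top, []) for top in replies[None]}


comments = [
    {"comment_id": "c1", "video_id": "v1", "parent_comment_id": None},
    {"comment_id": "c2", "video_id": "v1", "parent_comment_id": "c1"},
    {"comment_id": "c3", "video_id": "v1", "parent_comment_id": None},
]
print(shard_for("v1"), build_threads(comments))
```

Because every comment for a video lands on the same shard, assembling a thread is a single-shard read with no cross-shard join.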

Deep dives

These are the conversations the interviewer is steering you toward. Practice each one until you can talk through it without notes.

Upload pipeline and resumable transfers

Mobile uploads on flaky networks fail mid-transfer. Resumable uploads use byte-range PUT requests: the client uploads chunks, the server acknowledges each one, and on disconnect the client resumes from the last acked byte. Once the full file is uploaded, an UploadComplete event triggers encoding. The raw master is stored in Google Cloud Storage with regional redundancy. After encoding completes and the video is indexed, the master is moved to cold storage (Nearline or Coldline) because it is rarely needed again.

Distributed encoding with chunked parallelism

A 2-hour video at 4K is ~30 GB. Encoding it serially takes hours. The fix is chunked parallel encoding: split the master into 30-second segments at I-frame boundaries (so each chunk is decodable independently). Distribute the chunks across an encoding fleet (CPU and GPU workers). Each worker encodes its chunk into the target codec and bitrate. Concatenate the encoded chunks. Per-title encoding optimization (popularized by Netflix but also used at YouTube) further tunes the bitrate ladder based on content complexity: a static talking-head video can use lower bitrates than an action scene.
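The split, parallel encode, and concatenate steps can be sketched as follows. This is a toy model, not a real encoder: `encode_segment` is a placeholder that returns a label, keyframe timestamps are assumed to be sorted and already known, and a thread pool stands in for the worker fleet.

```python
from concurrent.futures import ThreadPoolExecutor


def plan_segments(keyframe_times: list[float], duration: float,
                  target_seconds: float = 30.0) -> list[tuple[float, float]]:
    """Cut at the first keyframe at or after each target-length boundary."""
    cuts = [0.0]
    for t in keyframe_times:
        if t - cuts[-1] >= target_seconds and t < duration:
            cuts.append(t)
    return list(zip(cuts, cuts[1:] + [duration]))


def encode_segment(segment: tuple[float, float]) -> str:
    """Placeholder for one worker's encode job; a real worker would run an encoder."""
    start, end = segment
    return f"encoded[{start:.0f}-{end:.0f}]"


def encode_video(keyframe_times: list[float], duration: float,
                 workers: int = 4) -> list[str]:
    """Encode all segments in parallel and return them in playback order."""
    segments = plan_segments(keyframe_times, duration)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves input order, so concatenation is just list order.
        return list(pool.map(encode_segment, segments))


keyframes = [float(t) for t in range(0, 70, 2)]  # a keyframe every 2 seconds
print(encode_video(keyframes, duration=70.0))
```

Cutting only at keyframes means segments can run slightly longer than 30 seconds when keyframes are sparse, which is the price of keeping every chunk independently decodable.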

Two-stage recommendation: candidate generation and ranking

Naively scoring every video against every user is impossible at YouTube scale (billions of videos times billions of users). The standard two-stage approach: first a candidate generator (a fast model, often a two-tower neural network) produces a few hundred candidates per user from the catalog. Then a ranker (a slower, deeper model) scores each candidate based on watch-time predictions and other features. The ranker output drives the order on the home page. Real-time signals (just-watched, search query) feed an online layer that re-ranks before serving.
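A toy version of the two stages is sketched below. The embeddings, the feature names, and the scoring formula are invented for illustration. A production candidate generator would query an approximate nearest-neighbor index rather than scan the catalog, and the ranker would be a trained model rather than a hand-written formula.

```python
import heapq


def dot(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def generate_candidates(user_vec: list[float],
                        video_vecs: dict[str, list[float]], k: int) -> list[str]:
    """Stage 1: cheap similarity between a user embedding and video embeddings."""
    return heapq.nlargest(k, video_vecs, key=lambda vid: dot(user_vec, video_vecs[vid]))


def rank(candidates: list[str], features: dict[str, dict[str, float]]) -> list[str]:
    """Stage 2: a costlier score, computed only for the surviving candidates."""
    def score(vid: str) -> float:
        f = features[vid]
        # Hypothetical formula: predicted watch time with a small freshness boost.
        return f["predicted_watch_minutes"] * (1.0 + 0.1 * f["freshness"])
    return sorted(candidates, key=score, reverse=True)


video_vecs = {"a": [0.9, 0.1], "b": [0.8, 0.5], "c": [0.1, 0.9], "d": [0.7, 0.0]}
features = {  # "c" has no features: it never reaches the ranker
    "a": {"predicted_watch_minutes": 2.0, "freshness": 0.0},
    "b": {"predicted_watch_minutes": 5.0, "freshness": 0.0},
    "d": {"predicted_watch_minutes": 4.5, "freshness": 2.0},
}
candidates = generate_candidates([1.0, 0.0], video_vecs, k=3)
print(rank(candidates, features))
```

The expensive features are looked up for three videos instead of four here; at catalog scale the same structure turns billions of lookups into a few hundred.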

Serving billions of streams with global CDN

YouTube uses Google's global edge network. Popular videos are pushed to edge POPs proactively based on regional popularity predictions. The Playback API selects the nearest POP using IP geolocation. The client uses adaptive bitrate (HLS or DASH) to switch resolutions based on observed bandwidth. For long-tail videos that are not at the edge, the request falls back to a regional cache or the origin. Edge caches are sized to hold the top 1 to 5% of videos, which serves over 90% of requests by view count.
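The client-side bitrate switch can be sketched as picking the highest rendition that fits within a safety margin of measured throughput. The ladder values and the 0.8 safety factor below are hypothetical, and real players also weigh buffer level and switching stability.

```python
# Hypothetical bitrate ladder in kbps; real ladders vary per title and codec.
LADDER_KBPS = {"240p": 400, "360p": 800, "720p": 2500, "1080p": 5000, "2160p": 16000}


def pick_rendition(throughput_kbps: float,
                   ladder: dict[str, int] = LADDER_KBPS,
                   safety: float = 0.8) -> str:
    """Choose the highest rendition whose bitrate fits within a share of throughput."""
    budget = throughput_kbps * safety
    affordable = [(kbps, name) for name, kbps in ladder.items() if kbps <= budget]
    if not affordable:
        # Nothing fits: fall back to the lowest rendition rather than stall.
        return min(ladder, key=ladder.get)
    return max(affordable)[1]


for measured in (100, 4_000, 25_000):
    print(measured, pick_rendition(measured))
```

The player re-runs this choice after each segment download, which is how the same video stays watchable as a phone moves between a weak cellular signal and Wi-Fi.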

Trade-offs to discuss

Every senior interviewer expects you to surface at least 3 of these. Pick the decisions, state the alternatives, and justify your choice.

Encode every video to AV1 vs only popular videos

AV1 cuts bandwidth ~30% vs H.264 but encoding is 5 to 10x slower. Encoding everything to AV1 is too expensive. The compromise: encode all uploads to H.264 immediately for fast availability, then promote videos that cross a popularity threshold to also have AV1 variants.
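The popularity threshold can be reasoned about as a break-even point: AV1 pays off once the bandwidth it saves exceeds its extra encoding cost. The sketch below uses made-up cost figures purely to show the shape of the calculation; only the roughly 30% savings figure comes from the text.

```python
def break_even_views(extra_encode_cost: float, gb_per_view_h264: float,
                     savings_fraction: float, egress_cost_per_gb: float) -> float:
    """Views needed before AV1's bandwidth savings repay its extra encode cost."""
    saved_per_view = gb_per_view_h264 * savings_fraction * egress_cost_per_gb
    return extra_encode_cost / saved_per_view


def codecs_for(view_count: int, promote_at_views: float) -> list[str]:
    """Every upload gets H.264; AV1 is added once a video crosses the threshold."""
    codecs = ["h264"]
    if view_count >= promote_at_views:
        codecs.append("av1")
    return codecs


# All four inputs are hypothetical placeholders, not measured costs.
threshold = break_even_views(
    extra_encode_cost=3.0,      # dollars of extra compute for the AV1 ladder
    gb_per_view_h264=0.5,       # average data delivered per view
    savings_fraction=0.3,       # ~30% bandwidth reduction, from the text
    egress_cost_per_gb=0.02,    # dollars per GB delivered
)
print(round(threshold), codecs_for(50, threshold), codecs_for(5_000, threshold))
```

Most uploads never reach the threshold, so the expensive encode is spent only where it is repaid.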

Spanner vs sharded MySQL for video metadata

Spanner gives you global consistency without manual sharding, at higher per-row cost. Sharded MySQL is cheaper per row but requires shard management. Google chose Spanner. A startup would not.

Streaming chunked upload vs single-shot

Single-shot fails on flaky networks and wastes bandwidth when retried. Chunked uploads add complexity but recover gracefully and let the server start encoding before the upload finishes (pipelined). Chunked wins for any file over a few MB.

Two-stage recommendations vs one-shot ranking

One-shot ranking has to score every candidate, which is computationally infeasible at YouTube scale. Two-stage (candidate gen plus ranking) lets you spend most compute on a few hundred candidates per user. Almost every large recommendation system uses this pattern.

Eager vs lazy CDN warming

Eager warming preloads popular videos to all POPs, which is wasteful for niche regional content. Lazy warming fills a cache only on the first miss, so the first viewers in each region pay the latency of a trip to the origin. YouTube uses predictive warming: based on past viewing patterns, it predicts which videos a region will want and warms those. Long-tail videos are cached on first miss.
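A very simple form of predictive warming is to treat recent regional view counts as the prediction and warm each POP with its own top videos. This is a naive sketch; the `(region, video_id)` log format and the per-POP budget are assumptions, and a real predictor would use richer signals than raw counts.

```python
from collections import Counter, defaultdict


def plan_warming(view_log: list[tuple[str, str]],
                 per_pop_budget: int) -> dict[str, list[str]]:
    """Pick the videos to pre-warm in each region from past (region, video_id) views."""
    by_region: dict[str, Counter] = defaultdict(Counter)
    for region, video_id in view_log:
        by_region[region][video_id] += 1
    return {
        region: [video_id for video_id, _ in counts.most_common(per_pop_budget)]
        for region, counts in by_region.items()
    }


view_log = (
    [("in", "cricket")] * 3 + [("in", "news")] * 2 + [("in", "cat")]
    + [("br", "futebol")] * 2 + [("br", "cat")]
)
print(plan_warming(view_log, per_pop_budget=2))
```

Anything outside a region's plan is left to cache-on-first-miss, so the warming budget is spent only where regional demand is expected.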

How YouTube actually does it

YouTube runs on Google's infrastructure: Spanner for metadata, Bigtable for watch logs, Borg and Kubernetes for compute, Google Cloud Storage for raw video. The encoding pipeline uses a chunked MapReduce-style architecture. Recommendation models train on TPUs using TensorFlow. The Playback API integrates Google's DRM (Widevine). Search uses a custom inverted-index system tightly integrated with Google's broader search infrastructure.


Lessons to study before this interview

If any of these topics are fuzzy, the interviewer will catch it. Each lesson is 15 to 60 minutes with diagrams, code, and a quiz.

Content Delivery Network

foundation / load balancing proxies

Video Encoding

intermediate / web content delivery

Adaptive Bitrate Streaming

intermediate / web content delivery

Database Sharding

foundation / database fundamentals

Capstone: Design YouTube and Netflix

capstone / capstone

