Is this a video course?

No. This is an interactive, slide-based learning platform. Each lesson has rich text, animated diagrams, live code editors, and quizzes. You learn by reading, interacting, and doing, not by watching videos passively.

How long do I have access?

Forever. Both pricing tiers are one-time payments with lifetime access. This includes all current 766 lessons and any future content we add.

What level of experience do I need?

None. We start from absolute basics like 'What is latency?' and build up to distributed consensus protocols. The Foundation level assumes zero prior knowledge of system design.

How much does the system design course cost?

7.99 US dollars for lifetime access globally, or 499 Indian rupees for lifetime access in India. One-time payment, no subscription, no hidden fees. 11 lessons are free with no signup required.

What technologies are covered?

Everything from DNS and load balancers to Kubernetes, Kafka, distributed databases, consensus protocols, stream processing, security architecture, and observability. We cover principles and real-world implementations used at Netflix, Google, Amazon, Uber, Stripe, and more.

Is this useful for system design interview preparation?

Yes. The lessons are structured around the exact topics asked in system design interviews at FAANG and top-tier companies. Interactive diagrams help you practice whiteboard-style explanations. Covers everything from URL shortener design to distributed payment systems.

How is this different from ByteByteGo or Educative?

766 interactive lessons (4x more than most competitors), 16 different diagram types that build step by step, real production examples from Netflix, Google, Amazon, Uber, and Stripe, and lifetime access for a one-time payment of $7.99 instead of annual subscriptions costing 100 to 200 dollars per year.

Should I use ETL or ELT?

Use ELT for most new builds on a cloud warehouse: load raw data first, transform it in place, and keep the raw copy so you can reprocess when logic changes. Use ETL when you need to clean or reduce data before storing it, often for cost, compliance, or when the destination cannot transform efficiently. ELT trades higher storage cost for flexibility; ETL trades flexibility for tighter control over what gets stored.

What is event time versus processing time, and why does it matter?

Event time is when something actually happened; processing time is when your system received it. They drift apart because of network delays and buffering, so events arrive late and out of order. Computing on processing time is fast but can be wrong; computing on event time gives correct results but requires watermarks and late-data handling to know when a window is safe to close.

What are windows in stream processing?

Windows bound an infinite stream into finite chunks you can compute over. Tumbling windows are fixed and non-overlapping, hopping windows overlap by advancing on a smaller step, sliding windows recompute on every event over a rolling period, and session windows group activity separated by gaps of inactivity. You also choose between time-based windows (bound by the clock) and count-based windows (bound by number of events).

What does exactly-once semantics actually guarantee?

It guarantees that each event affects the final result exactly one time, even when machines crash and events are replayed. It does not mean an event is physically delivered only once; under the hood the system may retry, but checkpointing and idempotent or transactional writes ensure the effect is counted once. This matters most for money and inventory, where a double-count is a real loss rather than a metric error.

Lambda or Kappa architecture, which should I choose?

Prefer Kappa unless you have a concrete reason not to. Kappa treats everything as a stream and recomputes history by replaying the event log, so you maintain one codebase. Lambda runs separate batch and speed layers, which gives proven correctness from the batch side but forces you to write and keep the same logic in sync twice. Lambda makes sense when your batch and streaming computations genuinely must differ or when you already have a mature batch layer you trust.

advanced

Stream and Batch Processing

Every fraud charge a bank blocks, every "trending now" list, every dashboard a finance team trusts at quarter close comes down to one question: do you process data after it lands, or as it arrives? Get that choice wrong and you either pay for fraud you could have caught in 200 milliseconds, or you build a real-time pipeline so complex it falls over the first time events arrive out of order. Stream and batch processing is the set of techniques for moving, shaping, and computing over large volumes of data, and knowing when each one fits is what separates a pipeline that survives production from one that pages you at 3 AM.

This category covers both ends of that spectrum and the architectures that blend them. You will learn the building blocks of batch (batch processing, batch analytics, MapReduce, ETL, ELT) and the harder discipline of streaming, where data never stops and you have to reason about time itself: event time versus processing time, windows, watermarks, late and out-of-order data, and how to keep state correct across failures with checkpointing and exactly-once semantics. It closes with the system-level patterns, Lambda and Kappa architecture, plus the governance and organizational layers (data catalog, lineage, data mesh, data fabric) that keep all of this usable at scale.

Stream and Batch Processing: the landscape

What Stream and Batch Processing Actually Means

Batch processing collects data into a bounded set and computes over all of it at once, usually on a schedule. You run a job at 2 AM that reads yesterday's orders, joins them against the product table, and writes a sales report. The defining trait is that the input is finite and known before the job starts. MapReduce was the model that made this idea scale across thousands of machines, and the modern descendants (Spark, BigQuery, Snowflake) still follow the same shape: read a lot, transform, write a lot. Batch analytics is what most companies run for reporting, finance, and machine learning training, because correctness and cost matter more than freshness.

Stream processing computes over an unbounded sequence of events as they arrive. There is no end to the input. A payment event shows up, you check it against the last hour of activity, and you decide in milliseconds. Because the data never stops, you cannot wait for all of it, so you carve the stream into windows and compute partial answers continuously. Stateless stream processing handles each event independently (filter, map, enrich), while stateful stream processing remembers things across events (running counts, sessions, joins), which is where the real engineering lives.

Micro-batching sits in the middle. Instead of acting on every single event, you collect events for a short interval, a few hundred milliseconds to a few seconds, and process that tiny batch. Spark Structured Streaming popularized this. You trade a little latency for much simpler fault recovery and higher throughput, which is often the right deal.

Moving Data: ETL, ELT, and the Pipeline

Before you can compute, you have to move and shape data, and the two dominant patterns are ETL and ELT. ETL (extract, transform, load) cleans and reshapes data in a dedicated processing layer, then loads the finished result into the warehouse. It was the standard when storage and compute were expensive and tightly coupled, because you only stored what you needed. An ETL pipeline is a chain of these steps, often orchestrated by a scheduler with retries, dependencies, and alerting.

ELT (extract, load, transform) flips the last two steps. You dump raw data into a cheap, scalable warehouse first, then transform it in place using the warehouse's own compute. Cloud warehouses made this the default for most new builds, because separating storage from compute means you can keep the raw data forever, reprocess it when business logic changes, and let SQL-fluent analysts own transformations instead of a dedicated engineering team. The trade-off is that you store and pay for raw data you may never use, and governance becomes harder because anyone can transform anything.

This is also where the governance lessons earn their place. A data catalog tells people what data exists and what it means. Metadata management keeps the descriptions, schemas, and ownership current. Data lineage traces a number on a dashboard back through every transformation to its source, which is the difference between fixing a bad metric in an hour versus a week. None of these compute anything, but without them a large pipeline becomes a system nobody trusts.

The Hard Part of Streaming: Time, Windows, and State

In batch, time is simple because the data is already complete. In streaming, time is the central problem. Event time is when something actually happened (the timestamp on the click). Processing time is when your system saw it. These drift apart constantly: a phone goes through a tunnel, buffers events, and delivers them ten minutes late. If you compute on processing time you get fast but wrong answers; if you compute on event time you get correct answers but have to handle data that arrives out of order.

Windows are how you bound an infinite stream into computable chunks. Tumbling windows are fixed and non-overlapping (every 5 minutes, no gaps, no overlap). Hopping windows overlap by advancing on a smaller step than their size. Sliding windows recompute on every event for a rolling period. Session windows group bursts of activity separated by gaps of inactivity, which is how you measure a user's session without a fixed length. These split further into time-based windows and count-based windows depending on whether you bound by clock or by number of events.

Watermarking is the mechanism that makes event-time windows finishable. A watermark is the system's assertion that it has probably seen all events up to a given time, so a window can close and emit its result. Late data handling and out-of-order processing decide what happens to stragglers that arrive after the watermark: drop them, update the result, or route them to a side output. Keeping all of this correct across machine failures requires state management in streams, checkpointing strategies to snapshot that state durably, and exactly-once semantics so a crash and replay does not double-count a payment. Stream joins, stream-to-stream joins, stream-to-table joins, and change streams are how you combine multiple live feeds, and they are the most state-heavy operations of all.

Choosing an Architecture and the Trade-offs

Once you have both batch and streaming, you have to decide how they coexist, and there are two well-known answers. Lambda architecture runs a batch layer for accurate, complete results and a speed layer for fast, approximate ones, then merges them at query time. It gives you correctness and freshness, but you pay by writing and maintaining the same business logic twice, once in batch code and once in streaming code, and keeping them in agreement forever.

Kappa architecture removes the batch layer entirely. Everything is a stream, and when you need to recompute history you replay the event log through the same streaming code. This eliminates the duplicate-logic problem and is cleaner to operate, but it demands a durable, replayable log (Kafka and similar) and a streaming engine strong enough to handle both live and historical loads. The practical rule: reach for streaming when the decision must happen now (fraud, alerting, live dashboards, personalization), reach for batch when correctness and cost dominate (billing, reporting, model training), and choose Kappa over Lambda unless you have a genuine reason your batch and stream logic must differ.

At the largest scale, the question stops being about jobs and becomes about ownership. Data mesh treats data as a product owned by the teams that produce it, decentralizing pipelines instead of funneling everything through one central data team that becomes a bottleneck. Data fabric is the complementary idea of a unified access and integration layer over data that lives in many places. These are organizational and platform patterns, not processing engines, but at enterprise scale they decide whether your stream and batch investments actually get used.

How Real Companies Use This

Netflix runs a Kappa-style backbone on Kafka and Flink, processing trillions of events a day to power recommendations, A/B test results, and operational alerting, while keeping heavy batch jobs for model training and long-range analytics. The event log is the source of truth, and reprocessing means replaying it.

Uber's surge pricing and ETA models depend on stream processing with tight event-time windows and watermarks, because a price computed on stale or out-of-order GPS pings is worse than useless. They pair this with massive batch pipelines for billing reconciliation and analytics, a textbook split of streaming for the live decision and batch for the audited, correct number.

Financial systems lean on exactly-once semantics and checkpointing the hardest, because a duplicated or dropped transaction is not a metric error, it is money. They typically run a streaming path for fraud detection and instant alerts, and a batch path for end-of-day settlement, with strong data lineage so every figure is traceable for auditors. The common thread across all three is that they do not pick stream or batch, they pick both, and the skill is knowing which problem belongs to which.

Frequently asked questions

Learn Stream and Batch Processing the interactive way

All 36 lessons with step by step diagrams, runnable code, and quizzes. One payment of ₹499 in India or $7.99 worldwide. Lifetime access, no subscription.

Stream and Batch Processing

What Stream and Batch Processing Actually Means

Moving Data: ETL, ELT, and the Pipeline

The Hard Part of Streaming: Time, Windows, and State

Choosing an Architecture and the Trade-offs

How Real Companies Use This

Stream and Batch Processing

What Stream and Batch Processing Actually Means

Moving Data: ETL, ELT, and the Pipeline

The Hard Part of Streaming: Time, Windows, and State

Choosing an Architecture and the Trade-offs

How Real Companies Use This

All 36 lessons in Stream and Batch Processing

Frequently asked questions

Learn Stream and Batch Processing the interactive way

Stream and Batch Processing

What Stream and Batch Processing Actually Means

Moving Data: ETL, ELT, and the Pipeline

The Hard Part of Streaming: Time, Windows, and State

Choosing an Architecture and the Trade-offs

How Real Companies Use This

All 36 lessons in Stream and Batch Processing

Frequently asked questions

Learn Stream and Batch Processing the interactive way

What Stream and Batch Processing Actually Means

Moving Data: ETL, ELT, and the Pipeline

The Hard Part of Streaming: Time, Windows, and State

Choosing an Architecture and the Trade-offs

How Real Companies Use This

All 36 lessons in Stream and Batch Processing

Frequently asked questions

What is the difference between batch and stream processing?

Should I use ETL or ELT?

What is event time versus processing time, and why does it matter?

What are windows in stream processing?

What does exactly-once semantics actually guarantee?

Lambda or Kappa architecture, which should I choose?

Learn Stream and Batch Processing the interactive way

What Stream and Batch Processing Actually Means

Moving Data: ETL, ELT, and the Pipeline

The Hard Part of Streaming: Time, Windows, and State

Choosing an Architecture and the Trade-offs

How Real Companies Use This

All 36 lessons in Stream and Batch Processing

Frequently asked questions

What is the difference between batch and stream processing?

Should I use ETL or ELT?

What is event time versus processing time, and why does it matter?

What are windows in stream processing?

What does exactly-once semantics actually guarantee?

Lambda or Kappa architecture, which should I choose?

Learn Stream and Batch Processing the interactive way