MapReduce
A programming model for processing massive datasets in parallel across a cluster. Map splits data into key-value pairs; Reduce aggregates them. Pioneered by Google, implemented by Hadoop.
What is MapReduce?
A programming model for processing massive datasets in parallel across a cluster. Map splits data into key-value pairs; Reduce aggregates them. Pioneered by Google, implemented by Hadoop.
MapReduce is a advanced concept that sits in the Stream & Batch Processing area of system design. Engineers reach for it whenever they need to reason about real-world trade-offs in that space — not just for textbook correctness, but because real production systems at companies like Netflix, Amazon, and Google make these decisions every day.
If you want to go deeper than this definition — with diagrams, code, and a quiz to lock it in — work through the "MapReduce" lesson linked below. It walks through the why, the mechanism, the trade-offs, and how the giants actually use it in production.
Learn MapReduce in depth
Full interactive lesson with diagrams, code examples, real-world references, and a quiz.
Open the MapReduce lessonRelated lessons
Lessons that touch on MapReduce as part of a larger topic.
See also
Related glossary terms you might want to look up next.
Apache Spark
A unified analytics engine for large-scale data processing with built-in modules for SQL, streaming, ML, and graph computation. Processes data in-memory for speed.
Batch Processing
Processing large volumes of data in scheduled chunks rather than in real time. Think nightly reports, ETL jobs, and data warehouse loads.
Data Lake
A centralized repository that stores raw data at any scale in its native format. Unlike a data warehouse, data doesn't need to be structured or cleaned before loading.