ETL
Extract, Transform, Load: a pipeline that extracts data from sources, transforms it into the desired format, and loads it into a destination like a data warehouse.
What is ETL?
ETL is an intermediate-level concept in the Data Replication & Distribution area of system design. Engineers reach for it when they need to reason about real-world trade-offs in that space, not only for textbook correctness: production systems at companies like Netflix, Amazon, and Google make these decisions every day.
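To make the three stages concrete, here is a minimal sketch in Python using only the standard library. Everything in it is an illustrative assumption rather than a reference implementation: the `RAW_CSV` sample data, the `orders` table, and the function names `extract`, `transform`, `load`, and `run_pipeline` are hypothetical, and an in-memory SQLite database stands in for a real data warehouse.

```python
import csv
import io
import sqlite3

# Hypothetical source data; a real pipeline might read a file, an API, or a database.
RAW_CSV = """order_id,customer,amount
1, alice ,19.99
2,BOB,5.00
3,carol,not-a-number
"""


def extract(source: str) -> list[dict]:
    """Extract: read raw rows from the source as dictionaries."""
    return list(csv.DictReader(io.StringIO(source)))


def transform(rows: list[dict]) -> list[tuple[int, str, int]]:
    """Transform: normalize names, convert amounts to integer cents, skip bad rows."""
    clean = []
    for row in rows:
        try:
            cents = round(float(row["amount"]) * 100)
        except ValueError:
            continue  # Drop rows whose amount cannot be parsed.
        clean.append((int(row["order_id"]), row["customer"].strip().lower(), cents))
    return clean


def load(conn: sqlite3.Connection, records: list[tuple[int, str, int]]) -> None:
    """Load: write the cleaned records into the destination table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders ("
        "order_id INTEGER PRIMARY KEY, customer TEXT, amount_cents INTEGER)"
    )
    # INSERT OR REPLACE keyed on order_id means re-running the load does not duplicate rows.
    conn.executemany("INSERT OR REPLACE INTO orders VALUES (?, ?, ?)", records)
    conn.commit()


def run_pipeline(source: str, conn: sqlite3.Connection) -> int:
    """Run extract, transform, and load in order; return the number of rows loaded."""
    records = transform(extract(source))
    load(conn, records)
    return len(records)


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    print(run_pipeline(RAW_CSV, connection), "rows loaded")
    print(connection.execute("SELECT * FROM orders ORDER BY order_id").fetchall())
```

Keeping each stage as a separate function is one possible design choice: it lets the transform logic be tested without a source or destination, and keying the load on `order_id` makes a repeated run safe.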
If you want to go deeper than this definition, with diagrams, code, and a quiz to lock it in, work through the "ETL" lesson linked below. It walks through the why, the mechanism, the trade-offs, and how large companies use it in production.
Learn ETL in depth
Full interactive lesson with diagrams, code examples, real-world references, and a quiz.
Open the ETL lesson
Related lessons
Lessons that touch on ETL as part of a larger topic.
ETL
Extract, Transform, Load, moving and reshaping data between systems
advanced · stream batch processing
Data Transformation
Convert data from one format, structure, or representation to another, the glue between incompatible systems
intermediate · data governance compliance
Data Cleansing
Fix, standardize, and repair dirty data, turning messy real-world inputs into reliable records
intermediate · data governance compliance
Data Migration
Moving data between systems, formats, or schemas safely and completely, with validation, rollback, and zero data loss
intermediate · devops cicd
ETL Pipeline
Orchestrating multi-step data transformations. Airflow, dbt, and pipeline design patterns
advanced · stream batch processing
See also
Related glossary terms you might want to look up next.
Data Lake
A centralized repository that stores raw data at any scale in its native format. Unlike a data warehouse, data doesn't need to be structured or cleaned before loading.
Batch Processing
Processing large volumes of data in scheduled chunks rather than in real time. Think nightly reports, ETL jobs, and data warehouse loads.
Data Lineage
Tracking data from its origin through every transformation and system it passes through. Answers 'where did this number come from?' for audits and debugging.