Is this a video course?

No. This is an interactive, slide-based learning platform. Each lesson has rich text, animated diagrams, live code editors, and quizzes. You learn by reading, interacting, and doing, not by watching videos passively.

How long do I have access?

Forever. Both pricing tiers are one-time payments with lifetime access. This includes all 766 current lessons and any future content we add.

What level of experience do I need?

None. We start from absolute basics like 'What is latency?' and build up to distributed consensus protocols. The Foundation level assumes zero prior knowledge of system design.

How much does the system design course cost?

It costs 7.99 US dollars for lifetime access globally, or 499 Indian rupees for lifetime access in India. It is a one-time payment with no subscription and no hidden fees. Eleven lessons are free with no signup required.

What technologies are covered?

Everything from DNS and load balancers to Kubernetes, Kafka, distributed databases, consensus protocols, stream processing, security architecture, and observability. We cover principles and real-world implementations used at Netflix, Google, Amazon, Uber, Stripe, and more.

Is this useful for system design interview preparation?

Yes. The lessons are structured around the exact topics asked in system design interviews at FAANG and top-tier companies. Interactive diagrams help you practice whiteboard-style explanations. The lessons cover everything from URL shortener design to distributed payment systems.

How is this different from ByteByteGo or Educative?

It offers 766 interactive lessons (4x more than most competitors), 16 different diagram types that build step by step, and real production examples from Netflix, Google, Amazon, Uber, and Stripe. Lifetime access is a one-time payment of $7.99, instead of an annual subscription costing $100 to $200 per year.

What is the difference between SQL and NoSQL databases, and when should I use each?

SQL databases use a fixed relational schema with tables and joins and give strong transactional guarantees, which suits data with clear relationships and correctness requirements like orders and payments. NoSQL is an umbrella for non-relational families: key-value stores for fast single-key lookups, document stores for flexible nested records, column-family stores for high write throughput at scale, and graph databases for connected data. Use SQL when you need transactions, complex queries, and a stable schema. Reach for a NoSQL family when one specific access pattern (huge writes, deep relationship traversal, flexible documents) dominates and a relational engine would fight you. Most large systems use both.

When should I use object storage instead of a database?

Use object storage for large, immutable files you read by key and rarely change: images, video, backups, logs, model artifacts, and data lake files. It scales to effectively unlimited capacity at very low cost but has higher latency and treats objects as write-once. Use a database when you need fast lookups by many fields, frequent updates, transactions, or rich queries. A common pattern is to store the file itself in object storage and keep its metadata and pointer in a database, so you get cheap bulk storage and fast, queryable references.
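The "file in object storage, metadata in a database" pattern can be sketched in a few lines. This is a minimal illustration, not a production design: the in-memory `object_store` dict is a hypothetical stand-in for a real object store, the `uploads` table and the helper names are assumptions made for the example, and SQLite plays the role of the database.

```python
import hashlib
import sqlite3

# Hypothetical stand-in for an object store: immutable blobs addressed by key.
object_store = {}

def put_object(data: bytes) -> str:
    """Store a blob write-once under a content-derived key and return the key."""
    key = hashlib.sha256(data).hexdigest()
    object_store.setdefault(key, data)  # never overwrite an existing object
    return key

# The database holds only small, queryable metadata plus the pointer (object key).
db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE uploads (id INTEGER PRIMARY KEY, owner TEXT, filename TEXT, "
    "size_bytes INTEGER, object_key TEXT)"
)

def save_upload(owner: str, filename: str, data: bytes) -> int:
    """Write the bytes to the object store and the metadata row to the database."""
    key = put_object(data)
    cur = db.execute(
        "INSERT INTO uploads (owner, filename, size_bytes, object_key) "
        "VALUES (?, ?, ?, ?)",
        (owner, filename, len(data), key),
    )
    db.commit()
    return cur.lastrowid

def load_upload(upload_id: int) -> bytes:
    """Look up the pointer in the database, then fetch the blob by key."""
    row = db.execute(
        "SELECT object_key FROM uploads WHERE id = ?", (upload_id,)
    ).fetchone()
    return object_store[row[0]]

upload_id = save_upload("alice", "cat.png", b"\x89PNG fake image bytes")
```

Queries such as "all files owned by alice" run against the small metadata table, while the bulky bytes never enter the database.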

What is the difference between row-oriented and columnar storage?

Row-oriented storage keeps an entire record physically together, so reading or writing one whole row is fast. That fits transactional workloads where you touch complete records, like fetching or updating a single user. Columnar storage keeps each column together, so analytical queries that scan one or two columns over billions of rows read far less data and compress much better. That fits data warehouses and reporting. The rule of thumb: row stores for operational systems that read and write whole records, column stores for analytics that aggregate a few columns across many rows.

What is a vector database and why do AI applications need one?

A vector database stores high-dimensional embeddings, which are numeric representations of text, images, or audio that capture meaning. Instead of matching exact keywords, it answers "find the items most similar to this one" using similarity search. AI applications need this for semantic search and retrieval-augmented generation, where you fetch the most relevant context for a model. Because exact nearest-neighbor search over millions of vectors is too slow, these systems use approximate nearest neighbor algorithms that trade a small amount of accuracy for a large speedup, which is what makes them practical at scale.

What is polyglot persistence and is using many databases worth the complexity?

Polyglot persistence means deliberately using different storage systems for different parts of one application: a relational database for transactions, a key-value store for sessions, a search engine for the catalog, a time-series database for metrics, and a vector database for recommendations. It is worth it when one general-purpose database would force every workload into a poor fit and cap your scale or speed. The cost is operational: more systems to run, monitor, and keep consistent. Start with one database, and add a specialized store only when a clear access pattern justifies it, rather than collecting databases for their own sake.

Why do so many databases use LSM trees, and what is the trade-off?

LSM trees (log-structured merge trees) buffer incoming writes in memory and periodically flush them to disk as sorted files, then merge those files in the background through compaction. This makes writes very fast because they are sequential, which is why write-heavy stores like Cassandra and RocksDB use them. The trade-off is that reads may have to check several files, so engines add bloom filters to skip files that cannot contain a key, and compaction consumes background CPU and disk. The alternative, a B-tree, gives faster and more predictable reads but slower random writes, so the choice depends on whether your workload is write-heavy or read-heavy.
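The write path described above can be shown with a toy model. This is a sketch under loud assumptions: `TinyLSM` is a hypothetical class, the "disk" is a Python list of sorted runs, and there is no write-ahead log, no bloom filter, and no deletion support. It only illustrates buffering, flushing, multi-run reads, and compaction.

```python
import bisect

class TinyLSM:
    """Toy LSM tree: an in-memory memtable plus a list of sorted runs."""

    def __init__(self, memtable_limit=2):
        self.memtable = {}
        self.runs = []  # oldest run first; each run is a sorted list of (key, value)
        self.memtable_limit = memtable_limit

    def put(self, key, value):
        # Writes only touch memory until the memtable fills up.
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self):
        # Flush the memtable as one sorted run (a sequential write on a real disk).
        if self.memtable:
            self.runs.append(sorted(self.memtable.items()))
            self.memtable = {}

    def get(self, key):
        # Reads check the memtable first, then every run from newest to oldest.
        if key in self.memtable:
            return self.memtable[key]
        for run in reversed(self.runs):
            i = bisect.bisect_left(run, (key,))
            if i < len(run) and run[i][0] == key:
                return run[i][1]
        return None

    def compact(self):
        # Merge all runs into one; newer values overwrite older ones.
        merged = {}
        for run in self.runs:
            merged.update(run)
        self.runs = [sorted(merged.items())] if merged else []

store = TinyLSM(memtable_limit=2)
for key, value in [("a", 1), ("b", 2), ("a", 3), ("c", 4), ("d", 5)]:
    store.put(key, value)
```

After these writes the store holds two runs plus one buffered key, so a read for `"a"` has to look past the memtable and into the newest run. Calling `store.compact()` collapses the runs into one, which is the background work the trade-off refers to.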

intermediate

Database Types and Storage

Picture the day your single Postgres instance stops being enough. Search queries crawl, your analytics dashboards time out, the recommendation feature your team promised needs vector similarity you do not have, and your storage bill keeps climbing because every byte you have ever written still sits on fast disk. None of those problems is solved by tuning one database harder. They are solved by picking the right kind of storage for each job. That decision, repeated across a system, is what separates an architecture that scales from one that quietly falls over at the worst possible moment.

This category covers the full landscape of how data is stored and retrieved: the physical storage layers (file, block, object, blob), the database families built on top (key-value, document, columnar, column-family, time-series, graph, in-memory, vector), the indexes that make reads fast (inverted, forward, bitmap, geospatial), the data structures storage engines run on (LSM trees, skip lists, bloom filters, merkle trees), the analytics platforms (data warehousing, data lakes), the lifecycle tiers that control cost (hot-warm-cold, tiered, cold, archive, hybrid), and the patterns for combining many stores in one system (polyglot persistence, federation, multi-model, NewSQL, distributed SQL, global tables). The goal is to know what each one is good at, what it is bad at, and which one to reach for under real pressure.

Database Types and Storage: the landscape

From Raw Storage to Database Engines

Every database sits on top of a storage primitive, and the primitive shapes what the database can do. File storage gives you a hierarchy of named files and directories, which is why shared file systems back content pipelines and home directories. Block storage hands you raw fixed-size blocks with no notion of files at all, which is what database engines actually want when they manage their own pages and write patterns. Object storage drops the hierarchy entirely and stores immutable blobs addressed by a key, with metadata attached, which is why it underpins almost every modern data lake and backup system. Blob storage is the same idea under the name some cloud vendors give it, Azure Blob Storage being one example.

The trade-off line runs along latency, mutability, and scale. Block storage offers the lowest latency and the most flexibility, but it is the hardest to scale and the most expensive per byte. Object storage scales to effectively unlimited capacity at low cost but has higher latency and treats objects as write-once. File storage sits in the middle and is the most familiar to applications but the least elastic. A system design answer that says "store the user uploads in object storage and put the hot operational data on block storage" is already showing it understands the layers underneath the database.

On top of these primitives, the storage engine decides how data is laid out on disk. Row-oriented storage keeps a whole record together, which is ideal when you read and write entire rows in transactional workloads. Columnar storage keeps each column together, which is what makes analytical scans over billions of rows fast because you only read the columns you need and they compress beautifully. Knowing whether a workload is row-shaped or column-shaped is the first fork in almost every storage decision.
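A small sketch makes the row-shaped versus column-shaped distinction concrete. The sample data and function names are made up for illustration, and plain Python lists stand in for on-disk layouts. The run-length encoder is just one simple way to show why a column of similar values compresses well.

```python
# Row-oriented layout: each record is stored together.
rows = [
    {"user_id": 1, "country": "IN", "amount": 120},
    {"user_id": 2, "country": "IN", "amount": 50},
    {"user_id": 3, "country": "US", "amount": 80},
]

# Columnar layout: the same data, with each column stored together.
columns = {name: [row[name] for row in rows] for name in rows[0]}

def fetch_user(user_id):
    """Row store strength: one lookup returns the complete record."""
    return next(row for row in rows if row["user_id"] == user_id)

def total_amount():
    """Column store strength: the aggregate reads only the 'amount' column."""
    return sum(columns["amount"])

def run_length_encode(values):
    """Compress runs of repeated values into (value, count) pairs."""
    encoded = []
    for value in values:
        if encoded and encoded[-1][0] == value:
            encoded[-1] = (value, encoded[-1][1] + 1)
        else:
            encoded.append((value, 1))
    return encoded

country_compressed = run_length_encode(columns["country"])
```

`fetch_user(3)` touches one record and ignores the rest, while `total_amount()` never looks at `user_id` or `country` at all. That difference in what gets read is the whole argument.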

The Database Families and What Each Is Built For

There is no single best database, only databases that are good at specific access patterns. Key-value stores give you the fastest possible lookup by a single key and almost nothing else, which is exactly right for sessions, feature flags, and caches. Document stores let each record carry a flexible nested structure, which suits product catalogs and user profiles where the shape varies. Column-family stores spread wide, sparse rows across many machines and are built for very high write throughput at massive scale. Time-series databases optimize for append-heavy, timestamp-ordered data like metrics and sensor readings, where you almost always query recent windows.

Graph databases store relationships as first-class objects, so traversing connections like "friends of friends who liked this" stays fast even many hops deep, where a relational join would explode. In-memory databases keep the working set in RAM to push latency into the microsecond range, trading durability guarantees and cost for raw speed. Vector databases store high-dimensional embeddings and answer similarity-search queries, which is the storage layer that makes semantic search and retrieval-augmented generation possible. Embedding storage, similarity search, and approximate nearest neighbor are the mechanics that make a vector database useful at scale, since exact nearest-neighbor search over millions of vectors is too slow and ANN trades a little accuracy for a large speedup.
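To see what a similarity query actually computes, here is an exact (brute-force) nearest-neighbor search using cosine similarity. The three-dimensional vectors and document names are invented for the example, since real embeddings have hundreds or thousands of dimensions. The linear scan in `top_k` is the slow step that an approximate nearest neighbor index would replace.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical toy embeddings keyed by document name.
embeddings = {
    "doc_cats": [0.9, 0.1, 0.0],
    "doc_dogs": [0.8, 0.3, 0.1],
    "doc_tax": [0.0, 0.2, 0.9],
}

def top_k(query, k=2):
    """Exact search: score every stored vector, then keep the k best."""
    scored = sorted(
        embeddings.items(),
        key=lambda item: cosine_similarity(query, item[1]),
        reverse=True,
    )
    return [name for name, _ in scored[:k]]

nearest = top_k([1.0, 0.0, 0.0])
```

Scoring every vector costs time proportional to the collection size on every query, which is fine for three documents and impractical for millions.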

For search itself, full-text search and search engines like Elasticsearch and Solr exist because no general-purpose database ranks and matches free text well. They are powered by an inverted index, which maps each term to the documents that contain it, the inverse of a forward index that maps each document to its terms. Bitmap indexes accelerate filtering on low-cardinality columns, and geospatial indexing makes "what is near me" queries fast by partitioning two-dimensional space.
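The forward and inverted indexes described above fit in a few lines. This is a minimal sketch with made-up documents: tokenization is a plain whitespace split and the query uses AND semantics, whereas a real search engine adds stemming, ranking, and much more.

```python
from collections import defaultdict

docs = {
    1: "fast key value store",
    2: "document store for flexible records",
    3: "fast full text search",
}

# Forward index: document id -> the terms it contains.
forward_index = {doc_id: text.split() for doc_id, text in docs.items()}

# Inverted index: term -> the set of document ids that contain it.
inverted_index = defaultdict(set)
for doc_id, terms in forward_index.items():
    for term in terms:
        inverted_index[term].add(doc_id)

def search(query):
    """Return sorted ids of documents containing every query term."""
    term_sets = [inverted_index.get(term, set()) for term in query.split()]
    return sorted(set.intersection(*term_sets)) if term_sets else []
```

A query like `search("fast store")` never scans document text. It intersects two small sets of ids, which is why the structure stays fast as the corpus grows.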

Choosing, Combining, and the Trade-offs That Matter

The honest answer to "which database should I use" is usually "more than one." Polyglot persistence is the deliberate practice of using a different store for each part of the system: a relational database for orders, a key-value store for sessions, a search engine for the catalog, a time-series database for metrics. Database federation puts a query layer in front of several stores so they look like one, and multi-model databases try to support several data models in a single engine to reduce operational sprawl. Each of these is a trade between operational simplicity and using the best tool for each job.

The SQL world has been catching up to the scale that NoSQL once owned. NewSQL databases keep the relational model and ACID transactions while scaling horizontally, and distributed SQL extends that across many nodes and regions. Global tables replicate a single logical table across regions so users everywhere read locally, accepting the consistency trade-offs that come with geo-replication. When you need transactions and scale together, these are the families to study, because the old assumption that you must give up SQL to scale is no longer true.

Underneath all of this sit the data structures that make storage engines work, and understanding them explains why each database behaves the way it does. LSM trees buffer writes in memory and flush sorted runs to disk, which is why write-heavy stores like Cassandra are fast on writes but pay a read and compaction cost. Bloom filters let an engine skip reading a file that definitely does not contain a key, cutting wasted disk reads. Skip lists give in-memory stores ordered data with simple, fast inserts, and merkle trees let replicas detect and repair differences efficiently. These are also the topics that show up most in senior interviews.
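A bloom filter is small enough to sketch directly. This version is an illustration under stated assumptions: the `BloomFilter` class, its default sizes, and the trick of deriving several hash positions from seeded SHA-256 digests are choices made for the example, not how any particular engine implements it. It also spends a whole byte per bit for readability.

```python
import hashlib

class BloomFilter:
    """Probabilistic set: no false negatives, occasional false positives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for clarity

    def _positions(self, key):
        # Derive num_hashes bit positions by hashing the key with different seeds.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos] = 1

    def might_contain(self, key):
        # False means "definitely absent"; True means "possibly present".
        return all(self.bits[pos] for pos in self._positions(key))

sstable_filter = BloomFilter()
for key in ["user:1", "user:2", "user:3"]:
    sstable_filter.add(key)
```

An engine would keep one such filter per on-disk file and read the file only when `might_contain` returns `True`. A `False` answer is always safe to trust, which is what lets the read skip the disk entirely.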

How Real Companies Put This Together

Look inside any large system and you find a fleet of storage systems, not one database. Netflix stores user-facing data in Cassandra (a column-family store on LSM trees), keeps its viewing history and metrics in time-series and search systems, and pushes its enormous video catalog into object storage on S3, with a data lake on top for analytics. The point is not the brand names. It is that each workload landed on the storage type that fits its access pattern.

Cost discipline is the other half of the story, and it is handled by storage lifecycle. Hot-warm-cold architecture and tiered storage move data to cheaper, slower media as it ages, so the metrics you queried today live on fast storage while last year's logs drift down to cold storage and eventually to archive storage that costs almost nothing but takes hours to retrieve. Hybrid storage blends on-premises and cloud to balance control and elasticity. Companies that get this right pay for performance only where they need it.

Spotify, Uber, and most data-heavy platforms run exactly this kind of tiered, polyglot setup: a graph or document store for the social and catalog data, a search engine over an inverted index for discovery, a vector database for recommendations, a data warehouse and data lake for analytics, and aggressive tiering underneath to keep the bill sane. Learning these as one connected map, rather than as isolated buzzwords, is what lets you design storage that holds up in production.

Frequently asked questions

Learn Database Types and Storage the interactive way

All 39 lessons with step-by-step diagrams, runnable code, and quizzes. One payment of ₹499 in India or $7.99 worldwide. Lifetime access, no subscription.
