Is this a video course?

No. This is an interactive, slide-based learning platform. Each lesson has rich text, animated diagrams, live code editors, and quizzes. You learn by reading, interacting, and doing, not by watching videos passively.

How long do I have access?

Forever. Both pricing tiers are one-time payments with lifetime access. This includes all current 766 lessons and any future content we add.

What level of experience do I need?

None. We start from absolute basics like 'What is latency?' and build up to distributed consensus protocols. The Foundation level assumes zero prior knowledge of system design.

How much does the system design course cost?

7.99 US dollars for lifetime access globally, or 499 Indian rupees for lifetime access in India. One-time payment, no subscription, no hidden fees. 11 lessons are free with no signup required.

What technologies are covered?

Everything from DNS and load balancers to Kubernetes, Kafka, distributed databases, consensus protocols, stream processing, security architecture, and observability. We cover principles and real-world implementations used at Netflix, Google, Amazon, Uber, Stripe, and more.

Is this useful for system design interview preparation?

Yes. The lessons are structured around the exact topics asked in system design interviews at FAANG and top-tier companies. Interactive diagrams help you practice whiteboard-style explanations. The lessons cover everything from URL shortener design to distributed payment systems.

How is this different from ByteByteGo or Educative?

It offers 766 interactive lessons (4x more than most competitors), 16 different diagram types that build step by step, and real production examples from Netflix, Google, Amazon, Uber, and Stripe. Lifetime access is a one-time payment of $7.99 instead of an annual subscription costing 100 to 200 dollars per year.

What is the difference between OpenTelemetry and Prometheus?

Prometheus is a metrics database and scraping system; it only handles metrics. OpenTelemetry is a broader standard that covers traces, metrics, and logs and is vendor-neutral about storage. They work together: OTel can produce metrics and the Collector can expose or forward them to Prometheus, which stores and queries them.

Is OpenTelemetry free?

Yes. OpenTelemetry is open source under the Apache 2.0 license and hosted by the CNCF. The SDKs and Collector cost nothing. What you pay for is the backend that stores and visualizes the data, whether that is a self-hosted tool like Jaeger or Grafana or a commercial vendor like Datadog or Honeycomb.

What is OTLP?

OTLP is the OpenTelemetry Protocol, the native wire format for sending telemetry. It encodes traces, metrics, and logs as Protobuf and transports them over gRPC or HTTP. Almost every modern observability backend can receive OTLP directly, which is what makes switching vendors a config change instead of a rewrite.

Do I need the OpenTelemetry Collector?

No, you can export directly from your app's SDK to a backend. But the Collector is recommended in production because it centralizes batching, sampling, attribute filtering, and PII redaction, and it lets you change backends without redeploying your services.

What are traces, metrics, and logs in OpenTelemetry?

Traces follow a single request across services and are made of spans. Metrics are numeric measurements over time, like request rate or memory usage. Logs are timestamped text or structured records of discrete events. OpenTelemetry can correlate all three using shared identifiers like the trace ID, so a log line can link back to the exact trace it belongs to.
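To make the log-to-trace link concrete, here is a minimal sketch that uses only Python's standard logging module, not the OpenTelemetry SDK. The TraceContextFilter class, the current_trace_id variable, and the "checkout" logger name are all hypothetical stand-ins; a real SDK tracks the active trace for you.

```python
import io
import logging


class TraceContextFilter(logging.Filter):
    """Attach the active trace ID to every log record (hypothetical helper)."""

    def __init__(self, get_trace_id):
        super().__init__()
        self._get_trace_id = get_trace_id

    def filter(self, record):
        record.trace_id = self._get_trace_id()
        return True  # never drop the record, only annotate it


# Stand-in for "the trace currently in progress" (assumed value).
current_trace_id = "4bf92f3577b34da6a3ce929d0e0e4736"

stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(
    logging.Formatter("%(levelname)s trace_id=%(trace_id)s %(message)s")
)

logger = logging.getLogger("checkout")
logger.setLevel(logging.INFO)
logger.propagate = False
logger.addHandler(handler)
logger.addFilter(TraceContextFilter(lambda: current_trace_id))

logger.info("payment authorized")
log_line = stream.getvalue().strip()
print(log_line)
```

Because every line carries the trace ID, a backend that stores both logs and traces can join them on that one field.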

Intermediate · Observability & Monitoring

OpenTelemetry

A vendor-neutral open standard for collecting metrics, logs, and traces from applications. Provides SDKs and a collector that ships telemetry to any observability backend.

What is OpenTelemetry?

In short

OpenTelemetry (often shortened to OTel) is a vendor-neutral open standard, hosted by the CNCF, for generating and collecting telemetry data from software: traces, metrics, and logs. It gives you one set of SDKs, one wire protocol (OTLP), and one collector so your application code stays the same whether you send that data to Datadog, Grafana, Jaeger, Honeycomb, or any other backend.

What OpenTelemetry actually is

OpenTelemetry is two things at once: a specification and a set of implementations. The specification defines what a trace, a span, a metric, and a log record look like, and how they are encoded on the wire. The implementations are language SDKs (Go, Java, Python, JavaScript, .NET, Rust, and more) plus a standalone service called the Collector.

Before OpenTelemetry, every observability vendor shipped its own agent and its own SDK. If you instrumented your code with the Datadog tracer and later wanted to move to Grafana, you ripped out and rewrote the instrumentation. OpenTelemetry breaks that lock-in. You instrument once against the OTel API, and you choose where the data goes by swapping an exporter, not by rewriting your app.
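The "swap an exporter, not your app" idea can be sketched in a few lines of plain Python. This is not the OpenTelemetry API: the Span, Exporter, ConsoleExporter, InMemoryExporter, and Tracer names below are hypothetical and exist only to show the shape of the design.

```python
from dataclasses import dataclass, field
from typing import List, Protocol


@dataclass
class Span:
    name: str
    duration_ms: float


class Exporter(Protocol):
    """Anything that accepts finished spans; the app never looks past this."""

    def export(self, spans: List[Span]) -> None: ...


class ConsoleExporter:
    """Hypothetical backend 1: print spans to stdout."""

    def export(self, spans: List[Span]) -> None:
        for span in spans:
            print(f"{span.name} took {span.duration_ms} ms")


@dataclass
class InMemoryExporter:
    """Hypothetical backend 2: keep spans in a list (stand-in for a vendor)."""

    received: List[Span] = field(default_factory=list)

    def export(self, spans: List[Span]) -> None:
        self.received.extend(spans)


class Tracer:
    def __init__(self, exporter: Exporter) -> None:
        self._exporter = exporter

    def record(self, name: str, duration_ms: float) -> None:
        self._exporter.export([Span(name, duration_ms)])


def handle_checkout(tracer: Tracer) -> None:
    # Application code: identical no matter which exporter is configured.
    tracer.record("checkout", 12.5)


handle_checkout(Tracer(ConsoleExporter()))

memory_backend = InMemoryExporter()
handle_checkout(Tracer(memory_backend))
```

The function handle_checkout never changes; only the object passed to Tracer does, which is the property the paragraph above describes.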

It is the result of merging two earlier projects, OpenTracing and OpenCensus, in 2019. Today it is the second most active CNCF project after Kubernetes, and nearly every major observability vendor accepts its native protocol, OTLP.

How it works under the hood

The three signals are traces, metrics, and logs. A trace is the story of one request as it moves through your services. It is made of spans, where each span records a unit of work with a start time, a duration, and key-value attributes like http.response.status_code or db.query.text (named http.status_code and db.statement in older versions of the semantic conventions). Each span carries the trace ID, its own span ID, and its parent's span ID, so the backend can stitch them into a tree even when they came from ten different services.
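The stitching step can be illustrated with a short sketch. It assumes a simplified Span record with only the fields needed here; the service names and the one-letter IDs are made up for the example, and real backends handle missing parents and multiple roots, which this does not.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Span:
    trace_id: str
    span_id: str
    parent_span_id: Optional[str]  # None marks the root span
    name: str


def build_tree(spans: List[Span]) -> Tuple[Span, Dict[str, List[Span]]]:
    """Index spans by parent so the tree can be walked from the root."""
    children: Dict[str, List[Span]] = defaultdict(list)
    root = None
    for span in spans:
        if span.parent_span_id is None:
            root = span
        else:
            children[span.parent_span_id].append(span)
    if root is None:
        raise ValueError("no root span in this trace")
    return root, children


def render(span: Span, children: Dict[str, List[Span]], depth: int = 0) -> List[str]:
    """Return one indented line per span, parents before children."""
    lines = ["  " * depth + span.name]
    for child in children.get(span.span_id, []):
        lines.extend(render(child, children, depth + 1))
    return lines


# Spans arrive in arbitrary order from different services.
received = [
    Span("t1", "d", "c", "postgres query"),
    Span("t1", "a", None, "gateway"),
    Span("t1", "c", "b", "payments"),
    Span("t1", "b", "a", "cart"),
]

root, children = build_tree(received)
tree_lines = render(root, children)
print("\n".join(tree_lines))
```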

Context propagation is the glue. When service A calls service B, the OTel SDK injects the trace ID and span ID into the outgoing HTTP headers (the W3C traceparent header). Service B reads those headers and continues the same trace. This is what lets you follow a single user request across a whole distributed system.
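As a rough illustration of what the SDK does on each hop, the sketch below builds and parses a version 00 traceparent header by hand (version, 32 hex character trace ID, 16 hex character parent span ID, 2 hex character flags, joined by hyphens). The SpanContext type and the inject, extract, and start_child helpers are hypothetical names for this example, not SDK functions, and the sketch ignores the companion tracestate header.

```python
import re
import secrets
from typing import Dict, NamedTuple, Optional

# Only version "00" of the header is handled in this sketch.
TRACEPARENT_RE = re.compile(r"^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


class SpanContext(NamedTuple):
    trace_id: str
    span_id: str
    sampled: bool


def inject(ctx: SpanContext, headers: Dict[str, str]) -> None:
    """Service A: write the current context into outgoing headers."""
    flags = "01" if ctx.sampled else "00"
    headers["traceparent"] = f"00-{ctx.trace_id}-{ctx.span_id}-{flags}"


def extract(headers: Dict[str, str]) -> Optional[SpanContext]:
    """Service B: read the caller's context, or None if absent or malformed."""
    match = TRACEPARENT_RE.match(headers.get("traceparent", ""))
    if match is None:
        return None
    trace_id, span_id, flags = match.groups()
    if trace_id == "0" * 32 or span_id == "0" * 16:
        return None  # all-zero IDs are invalid
    return SpanContext(trace_id, span_id, bool(int(flags, 16) & 0x01))


def start_child(parent: SpanContext) -> SpanContext:
    """Continue the same trace with a fresh span ID."""
    return SpanContext(parent.trace_id, secrets.token_hex(8), parent.sampled)


# Service A makes a call.
a_ctx = SpanContext("4bf92f3577b34da6a3ce929d0e0e4736", "00f067aa0ba902b7", True)
outgoing: Dict[str, str] = {}
inject(a_ctx, outgoing)

# Service B receives it and continues the trace.
b_parent = extract(outgoing)
b_span = start_child(b_parent)
```

Service B's span shares the trace ID with service A and records A's span ID as its parent, which is what lets the backend join the two.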

In your code, the API layer creates spans and records measurements. The SDK batches them and hands them to an exporter, which serializes to OTLP (a Protobuf format over gRPC or HTTP) and ships them out. Most teams do not export straight to a vendor. They send to the OpenTelemetry Collector, a separate process that receives telemetry, runs it through processors (batching, sampling, dropping noisy attributes, redacting PII), and then fans it out to one or more backends.
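The Collector's receive, process, fan-out flow can be mimicked with plain functions. This is a toy model under stated assumptions: spans are plain dicts, the processor helpers (drop_attributes, redact) and the attribute keys are invented for the example, and batching is done once at the end, whereas in the real Collector batching is itself a configurable processor.

```python
from typing import Callable, Iterator, List

Span = dict  # a span is modelled as a flat dict of attributes in this sketch
Processor = Callable[[List[Span]], List[Span]]
ExportFn = Callable[[List[Span]], None]


def drop_attributes(*keys: str) -> Processor:
    """Remove noisy attributes from every span."""
    def process(spans: List[Span]) -> List[Span]:
        return [{k: v for k, v in s.items() if k not in keys} for s in spans]
    return process


def redact(key: str, replacement: str = "[REDACTED]") -> Processor:
    """Overwrite a sensitive attribute wherever it appears."""
    def process(spans: List[Span]) -> List[Span]:
        return [{**s, key: replacement} if key in s else s for s in spans]
    return process


def batches(spans: List[Span], size: int) -> Iterator[List[Span]]:
    for start in range(0, len(spans), size):
        yield spans[start:start + size]


def run_pipeline(spans: List[Span], processors: List[Processor],
                 exporters: List[ExportFn], batch_size: int = 2) -> None:
    for process in processors:
        spans = process(spans)
    for batch in batches(spans, batch_size):
        for export in exporters:  # fan out: every backend gets every batch
            export(batch)


incoming = [
    {"name": "GET /cart", "user.email": "a@example.com", "http.user_agent": "x"},
    {"name": "POST /pay", "user.email": "b@example.com", "http.user_agent": "y"},
    {"name": "db.query"},
]

backend_a: List[List[Span]] = []
backend_b: List[List[Span]] = []

run_pipeline(
    incoming,
    processors=[drop_attributes("http.user_agent"), redact("user.email")],
    exporters=[backend_a.append, backend_b.append],
)
```

Both backends end up with the same cleaned batches, and the services that produced the spans did not need to know about either one.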

When to use it and the trade-offs

Reach for OpenTelemetry when you run more than a couple of services, when you want to avoid being locked to one observability vendor, or when you need a single consistent way to instrument apps written in different languages. Auto-instrumentation libraries can wire up popular frameworks like Express, Flask, Spring, and gRPC with almost no code changes, which makes the on-ramp cheap.

The main cost is operational. Running and tuning the Collector is real work, and naive setups generate huge volumes of span data that get expensive to store. Most teams turn on sampling, often tail-based sampling in the Collector, so they keep the interesting traces (errors, slow requests) and drop the boring ones. Cardinality on metric attributes is another trap; a label like user_id can explode your time series count.
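Tail-based sampling means deciding after the whole trace is in, so the decision can look at errors and latency. The sketch below shows one possible policy. The 1000 ms slow threshold and the 10% baseline for ordinary traces are assumptions chosen for illustration, and keep_trace and sample are hypothetical helpers, not Collector APIs.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass
class Span:
    trace_id: str  # 32 hex characters
    duration_ms: float
    is_error: bool = False


def keep_trace(spans: List[Span], slow_ms: float = 1000.0,
               baseline_rate: float = 0.1) -> bool:
    """Decide for a complete trace: keep errors and slow requests,
    plus a deterministic fraction of everything else (assumed policy)."""
    if any(s.is_error for s in spans):
        return True
    if any(s.duration_ms >= slow_ms for s in spans):
        return True
    # Map the last 8 hex characters of the trace ID onto [0, 1].
    bucket = int(spans[0].trace_id[-8:], 16) / 0xFFFFFFFF
    return bucket < baseline_rate


def sample(spans: List[Span]) -> Set[str]:
    """Group spans by trace, then return the trace IDs worth storing."""
    by_trace: Dict[str, List[Span]] = defaultdict(list)
    for span in spans:
        by_trace[span.trace_id].append(span)
    return {tid for tid, group in by_trace.items() if keep_trace(group)}


boring = "1" * 24 + "ffffffff"
errored = "2" * 24 + "ffffffff"
slow = "3" * 24 + "ffffffff"

kept = sample([
    Span(boring, 35.0),
    Span(boring, 20.0),
    Span(errored, 15.0, is_error=True),
    Span(slow, 40.0),
    Span(slow, 2100.0),
])
```

Hashing the trace ID rather than calling a random number generator keeps the baseline decision consistent if the same trace is evaluated more than once.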

Maturity also varies by signal and language. Tracing is stable across most SDKs. Metrics are stable in the major languages. Logs reached stability later and the bridge from existing logging libraries is still uneven in some ecosystems. Check the status page for your specific language before betting on a signal.

A concrete example

Picture a checkout flow: a browser hits an API gateway, which calls a cart service, which calls a payments service, which writes to Postgres. With OTel auto-instrumentation on each service, the gateway starts a trace and a traceparent header rides along on every internal call.

When a customer reports that checkout was slow, you open the trace in your backend and see a waterfall: the gateway took 40ms, the cart service 30ms, but the payments service span shows a 2.1 second database query against Postgres. The slow span carries the exact SQL statement as an attribute, so you know which query to fix without guessing or adding more logging. That single, end-to-end view across four services is the payoff OpenTelemetry is built to deliver.
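The waterfall reading above boils down to "find the span that dominates the timeline". A small sketch using the numbers from this example follows; the span names are hypothetical, the SQL text is a placeholder rather than a real query, and each duration is treated as that span's own time.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Span:
    service: str
    name: str
    duration_ms: float
    attributes: Dict[str, str] = field(default_factory=dict)


trace = [
    Span("gateway", "POST /checkout", 40),
    Span("cart", "GET /cart", 30),
    Span("payments", "postgres query", 2100,
         {"db.statement": "<SQL text captured by the instrumentation>"}),
]


def slowest(spans: List[Span]) -> Span:
    return max(spans, key=lambda s: s.duration_ms)


def waterfall(spans: List[Span], width: int = 40) -> List[str]:
    """Render one text bar per span, scaled to the longest span."""
    longest = max(s.duration_ms for s in spans)
    rows = []
    for s in spans:
        bar = "#" * max(1, round(s.duration_ms / longest * width))
        rows.append(f"{s.service:<9}{s.duration_ms:>6.0f} ms {bar}")
    return rows


culprit = slowest(trace)
print("\n".join(waterfall(trace)))
print("slowest:", culprit.service, culprit.attributes.get("db.statement"))
```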

Where it is used in production

Shopify

Standardized telemetry across thousands of services on OpenTelemetry to unify tracing during Black Friday-scale traffic.

Grafana (Tempo, Mimir, Loki)

Ingests OTLP natively, so OTel traces, metrics, and logs flow straight into the Grafana stack without proprietary agents.

Datadog

Accepts OTLP and ships an OTel-compatible exporter, letting customers instrument with OTel and still use Datadog's UI.

Microsoft Azure Monitor

Built its Application Insights distro on OpenTelemetry, making OTel the recommended way to instrument apps on Azure.

Learn OpenTelemetry hands-on

This page explains the idea. The full lesson lets you step through animated diagrams, read the implementation, and check yourself with a quiz. It is one of 760+ lessons in the System Design Masterclass, from your first API call to distributed consensus. Eleven Foundation lessons are free, no signup. Lifetime access is ₹499 in India or $7.99 worldwide, one payment, no subscription.
