Metrics
Numerical measurements collected over time that describe system behavior: request rate, error rate, latency percentiles, CPU utilization. Prometheus is a widely used open-source collector.
What is Metrics?
Metrics is an intermediate-level concept in the Observability & Monitoring area of system design. Engineers use it to reason about real-world trade-offs in that space, not only for textbook correctness: production systems at companies like Netflix, Amazon, and Google make these decisions every day.
To go deeper than this definition, with diagrams, code, and a quiz to lock it in, work through the "Metrics" lesson linked below. It covers the why, the mechanism, the trade-offs, and how large companies use metrics in production.
Learn Metrics in depth
Full interactive lesson with diagrams, code examples, real-world references, and a quiz.
Open the Metrics lesson
Related lessons
Lessons that touch on Metrics as part of a larger topic.
Metrics Aggregation
Combining metrics across instances, time windows, and dimensions, from raw data to actionable dashboards
intermediate · observability monitoring
Metrics Collection
Gathering numerical measurements from your systems, the raw data behind every dashboard
intermediate · observability monitoring
Counter Metrics
Monotonically increasing numbers that count events, requests served, errors thrown, bytes transferred
intermediate · observability monitoring
Gauge Metrics
Values that go up and down, temperature readings for your systems
intermediate · observability monitoring
Time-Series Metrics
Data points indexed by time, the heartbeat of every monitoring system
intermediate · observability monitoring
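The counter and gauge descriptions above can be made concrete with a small sketch. It uses plain Python rather than any real metrics library, and the names `Counter`, `Gauge`, and `rate_per_second` are hypothetical, chosen only for illustration. It shows why a counter is usually read as a rate (the change between two samples divided by the elapsed time), while a gauge is read as its current value.

```python
from dataclasses import dataclass


@dataclass
class Counter:
    """Hypothetical counter: a value that only ever increases."""
    value: float = 0.0

    def inc(self, amount: float = 1.0) -> None:
        # A counter counts events, so a negative increment is rejected.
        if amount < 0:
            raise ValueError("counters can only increase")
        self.value += amount


@dataclass
class Gauge:
    """Hypothetical gauge: a value that can go up or down."""
    value: float = 0.0

    def set(self, value: float) -> None:
        self.value = value


def rate_per_second(samples):
    """Average per-second rate from time-ordered (timestamp_s, counter_value) pairs.

    Assumes the counter was not reset between the first and last sample.
    """
    (t0, v0), (t1, v1) = samples[0], samples[-1]
    if t1 <= t0:
        raise ValueError("samples must span a positive time interval")
    return (v1 - v0) / (t1 - t0)


requests_served = Counter()
for _ in range(3):
    requests_served.inc()

cpu_utilization = Gauge()
cpu_utilization.set(0.42)

# Two samples of a request counter taken 60 seconds apart.
request_rate = rate_per_second([(0, 100), (60, 400)])
```

Real collectors also have to handle counter resets after a process restart, which this sketch deliberately ignores.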
See also
Related glossary terms you might want to look up next.
Observability
The ability to understand a system's internal state from its external outputs. Built on three pillars: metrics, logs, and traces.
SLI
Service Level Indicator: a quantitative measure of service behavior, like the proportion of requests faster than 300ms. The raw metric that feeds SLOs.
Alerting
Automatically notifying engineers when metrics cross predefined thresholds. Good alerts are actionable, not noisy. PagerDuty and Opsgenie route alerts to the right on-call person.
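As an illustration of how the SLI and Alerting entries above fit together, here is a minimal sketch in plain Python. The 300 ms threshold comes from the SLI definition above. The function names `latency_sli` and `should_alert` and the 0.99 objective are assumptions made for this example, not part of any particular tool.

```python
def latency_sli(latencies_ms, threshold_ms=300.0):
    """Proportion of requests that completed faster than threshold_ms."""
    if not latencies_ms:
        raise ValueError("need at least one request to compute an SLI")
    good = sum(1 for latency in latencies_ms if latency < threshold_ms)
    return good / len(latencies_ms)


def should_alert(sli, objective=0.99):
    """True when the SLI falls below the objective (a hypothetical 99% target)."""
    return sli < objective


# Four observed request latencies in milliseconds; one is slower than 300 ms.
observed = [100.0, 200.0, 250.0, 400.0]
sli = latency_sli(observed)
alert = should_alert(sli)
```

In practice an alert would typically be evaluated over a time window and held for some duration before paging anyone, which helps keep alerts actionable rather than noisy.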