A few years back, an engineering team at Amazon ran an experiment. They artificially added 100 milliseconds of delay to their page loads. That's a tenth of a second. About as long as a quick blink.
The result? A 1% drop in revenue. For a company doing hundreds of billions a year, 1% works out to billions of dollars in lost sales. People didn't complain. They didn't write angry emails. They just... left. Silently. Without buying.
Google ran a similar test. They slowed search results by 500 milliseconds, half a second. Traffic dropped 20%. One in five people just bounced.
Here's what makes this wild: nobody consciously thought "this feels slow, I'm leaving." It's subconscious. Your brain registers the delay before you even realize it, and your thumb is already hitting the back button.
That invisible delay? That's latency. And honestly, most developers don't think about it until production is on fire and the Slack channel is blowing up at 2 AM.
This is the single most important performance metric you'll ever learn. Everything else in system design, from CDNs to load balancers to database optimization, exists because someone, somewhere, was fighting latency.
Strip away the jargon: latency is how long you wait.
You tap a button. Stuff happens. A response shows up. The time between your tap and that response appearing on screen? That's latency. Measured in milliseconds, thousandths of a second.
Think of it like ordering an Uber on a Friday night in Manhattan. You open the app, request a ride, and then you stand there staring at your phone. The driver has to see your request, accept it, navigate through gridlock, and pull up to your pin. That entire wait, from tapping "Request" to the car arriving, is the latency of your Uber request.
Now here's how users actually feel about different wait times. This isn't theoretical. It comes from decades of human-computer interaction (HCI) research:
| Response Time | What It Feels Like |
|---|---|
| Under 100ms | Your brain treats this as instant. No perception of delay whatsoever. |
| 100 - 300ms | There's a tiny lag, but it still feels snappy. Most users won't care. |
| 300ms - 1 second | Now you notice. Your brain shifts from "this is responsive" to "I'm waiting for something." |
| 1 - 3 seconds | Frustration sets in. You're consciously aware you're waiting. You might glance at another tab. |
| Over 3 seconds | About 40% of people just leave. Gone. They'll try a competitor or come back later (spoiler: they won't come back later). |
The thing most people miss is that latency is a round trip. Your request has to travel to a server, maybe on the other side of the planet, get processed, and then the response has to travel all the way back. Every single hop along that journey adds time. And it all adds up fast.
One more thing worth hammering home: latency is not about how much data you're sending. That's bandwidth (we'll get there). Latency is purely about time. You could be sending a single byte, and if the server is in Sydney and you're in New York, you're still eating 200ms of physics.
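To make the latency-versus-bandwidth split concrete, here's a rough back-of-the-envelope sketch. The function name `transfer_time_ms` and the link speeds are made up for illustration, and the model is deliberately crude: one round trip plus the time to push the bytes, ignoring handshakes, congestion control, and everything else a real network does.

```python
def transfer_time_ms(payload_bytes: int, rtt_ms: float, bandwidth_mbps: float) -> float:
    """Crude estimate: one round trip, plus time to push the bytes through the pipe."""
    bits = payload_bytes * 8
    transmit_ms = bits / (bandwidth_mbps * 1_000_000) * 1000
    return rtt_ms + transmit_ms


# A single byte over a 200 ms round trip: a 100x faster link changes nothing.
print(transfer_time_ms(1, rtt_ms=200, bandwidth_mbps=10))    # roughly 200 ms
print(transfer_time_ms(1, rtt_ms=200, bandwidth_mbps=1000))  # still roughly 200 ms

# A 5 MB image on the same two links: now bandwidth is what you feel.
print(transfer_time_ms(5_000_000, rtt_ms=200, bandwidth_mbps=10))    # about 4200 ms
print(transfer_time_ms(5_000_000, rtt_ms=200, bandwidth_mbps=1000))  # about 240 ms
```

For tiny payloads the round trip is the whole cost. For big ones, the pipe size takes over.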
So what actually happens in those 100-500 milliseconds between you clicking a link and the page showing up? It's not one thing. It's a whole chain of events, and each one adds its own delay.
Walk through it step by step and you'll never look at a page load the same way again.
When a senior engineer says "we have a latency problem," the next question is always: which kind? Because latency isn't one thing, it's five different beasts wearing the same name. And the fix for each one is completely different.
Network latency is pure physics. Data travels through fiber optic cables at about two-thirds the speed of light, roughly 200,000 km per second. If your server is in Virginia and your user is in Mumbai, that's roughly 13,000 km. A signal in fiber takes about 65ms to cover that distance one way, so you're looking at 130ms minimum just for the round trip, and real cable routes are longer than a straight line. You literally cannot make this faster without moving your server closer to the user or bending the laws of physics. This is why CDNs exist.
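If you want to sanity-check the physics for your own server locations, the arithmetic fits in a few lines. This is a sketch that assumes a perfectly straight cable and the two-thirds-of-light-speed figure from above. The helper name `min_rtt_ms` is just for illustration.

```python
SPEED_OF_LIGHT_KM_S = 299_792  # speed of light in a vacuum
FIBER_FACTOR = 2 / 3           # assumption: light in fiber moves at about 2/3 of that


def min_rtt_ms(distance_km: float, fiber_factor: float = FIBER_FACTOR) -> float:
    """Lower bound on round-trip time over a straight fiber run, in milliseconds."""
    one_way_s = distance_km / (SPEED_OF_LIGHT_KM_S * fiber_factor)
    return 2 * one_way_s * 1000


# Virginia to Mumbai, roughly 13,000 km
print(f"{min_rtt_ms(13_000):.0f} ms")  # prints "130 ms"
```

Treat the result as a floor, not a prediction. Routers, detours, and congestion only ever add to it.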
Processing latency is how long your server spends actually thinking. Running your code, applying business logic, serializing a response. A well-optimized endpoint might take 5-10ms. A gnarly resolver doing N+1 queries? I've seen those hit 2-3 seconds. This is usually where bad code hides.
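Here's a small sketch of what an N+1 pattern looks like next to the single-query version. It uses an in-memory SQLite database so it runs anywhere. The `authors` and `posts` tables and both function names are invented for this example.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """Build a throwaway in-memory database with 50 authors and 50 posts."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE posts (id INTEGER PRIMARY KEY, author_id INTEGER, title TEXT);
    """)
    conn.executemany("INSERT INTO authors VALUES (?, ?)",
                     [(i, f"author-{i}") for i in range(1, 51)])
    conn.executemany("INSERT INTO posts VALUES (?, ?, ?)",
                     [(i, i, f"post-{i}") for i in range(1, 51)])
    return conn


def posts_n_plus_one(conn):
    """One query for the list, then one more query per row. Returns (rows, query_count)."""
    queries = 1
    posts = conn.execute("SELECT author_id, title FROM posts ORDER BY id").fetchall()
    result = []
    for author_id, title in posts:
        name = conn.execute("SELECT name FROM authors WHERE id = ?",
                            (author_id,)).fetchone()[0]
        queries += 1
        result.append((title, name))
    return result, queries


def posts_single_join(conn):
    """Same data, fetched in one round trip to the database."""
    rows = conn.execute("""
        SELECT p.title, a.name
        FROM posts p JOIN authors a ON a.id = p.author_id
        ORDER BY p.id
    """).fetchall()
    return rows, 1


conn = make_db()
slow_rows, slow_queries = posts_n_plus_one(conn)
fast_rows, fast_queries = posts_single_join(conn)
print(slow_queries, fast_queries)  # 51 vs 1
```

With an in-memory database the difference is invisible. Put a real network hop between your app and the database and every one of those extra queries pays a full round trip.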
Queue latency is the sneaky one. Your request arrives at the server, but the server is already busy handling 500 other requests. So yours sits in a queue, waiting for a free thread. Under normal load this is near zero. During a traffic spike? I've seen queue times hit 10+ seconds. Your server isn't slow, it's just overwhelmed. This is zero milliseconds at 2 AM and infinite milliseconds during a product launch.
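A toy simulation shows how fast this gets ugly. It assumes a single worker, a fixed 10ms of work per request, and requests arriving at perfectly even intervals. Real traffic is burstier, which only makes things worse. The function name `queue_waits_ms` is made up for this sketch.

```python
def queue_waits_ms(arrival_interval_ms: float, service_ms: float, n_requests: int) -> list:
    """Single FIFO worker: how long each request sits in the queue before work starts."""
    waits = []
    worker_free_at = 0.0
    for i in range(n_requests):
        arrived = i * arrival_interval_ms
        start = max(arrived, worker_free_at)  # wait if the worker is still busy
        waits.append(start - arrived)
        worker_free_at = start + service_ms
    return waits


# Normal load: a request every 20 ms, each taking 10 ms. Nobody waits.
normal = queue_waits_ms(arrival_interval_ms=20, service_ms=10, n_requests=1000)
print(max(normal))  # 0.0

# Traffic spike: a request every 5 ms, same 10 ms of work. The queue never drains.
spike = queue_waits_ms(arrival_interval_ms=5, service_ms=10, n_requests=1000)
print(spike[-1])  # 4995.0, so the 1000th request waits about 5 seconds
```

Notice the code per request didn't get any slower. The only thing that changed is that requests arrive faster than they can be served, so the wait grows with every new arrival.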
Database latency is the time your database takes to find and return data. A simple primary key lookup on Postgres? 1-5ms. A complex JOIN across three tables with no index? 500ms+. I once traced a production outage to a single missing database index. Adding it dropped query time from 1.2 seconds to 3 milliseconds. Three. Milliseconds.
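You can watch an index change the query plan without setting up a database server. The sketch below swaps in SQLite (not Postgres) so it runs from a plain Python install. The `orders` table, the index name, and the `query_plan` helper are all invented for illustration. On Postgres you'd reach for `EXPLAIN` or `EXPLAIN ANALYZE` to ask the same question.

```python
import sqlite3


def query_plan(conn: sqlite3.Connection, sql: str, params=()) -> str:
    """Return SQLite's plan for a query as one string (the 4th column is the detail text)."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[3] for row in rows)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)")
conn.executemany("INSERT INTO orders (customer_id, total) VALUES (?, ?)",
                 [(i % 1000, i * 1.5) for i in range(10_000)])

sql = "SELECT id, total FROM orders WHERE customer_id = ?"

before = query_plan(conn, sql, (42,))  # no index yet: full table scan
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
after = query_plan(conn, sql, (42,))   # now the planner can use the index

print("before:", before)
print("after: ", after)
```

The "before" plan is a scan of the whole table. The "after" plan is a search using `idx_orders_customer`. Same query, same data, completely different amount of work.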
Serialization latency is usually the smallest, converting objects to JSON, protobuf encoding, that sort of thing. Typically under 1ms. You can mostly ignore this one unless you're serializing enormous payloads.
Here's the thing that trips up junior engineers: total latency is the sum of all five, but the slowest one dominates. If your database is taking 800ms, shaving 10ms off your network latency accomplishes nothing. You could get network latency to literally zero and your page still takes 800ms. Always find the bottleneck first. Measure, don't guess.
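A tiny sketch of that reasoning, using made-up numbers for a single request. The breakdown values and the `find_bottleneck` helper are hypothetical. In practice these numbers come from your tracing or logging.

```python
def find_bottleneck(breakdown_ms: dict) -> tuple:
    """Return the slowest stage and the share of total latency it accounts for."""
    total = sum(breakdown_ms.values())
    slowest = max(breakdown_ms, key=breakdown_ms.get)
    return slowest, breakdown_ms[slowest] / total


# Hypothetical measurements for one slow request, in milliseconds
breakdown = {
    "network": 40,
    "queue": 5,
    "processing": 15,
    "database": 800,
    "serialization": 1,
}

stage, share = find_bottleneck(breakdown)
print(f"{stage}: {share:.0%} of {sum(breakdown.values())} ms")  # database: 93% of 861 ms

# Even with network latency magically at zero, the request still takes 821 ms.
print(sum(v for k, v in breakdown.items() if k != "network"))  # 821
```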
The biggest companies in tech have entire teams whose job is shaving milliseconds. Here's how three of them do it, and why their approaches are so different.
Amazon's secret isn't some magical algorithm. It's geography. They have over 400 CloudFront edge locations scattered around the globe. When you load a product page, the images and static files don't come from some central data center, they come from a server that might be in your city. Maybe even your neighborhood.
But the really clever bit is pre-computation. Your "Recommended for You" section? Amazon doesn't calculate that when you visit the page. That would take hundreds of milliseconds of ML inference. Instead, they compute recommendations in batch jobs overnight and just serve the cached result. By the time you see it, the hard work happened hours ago. You get a 5ms response for what would have been a 500ms computation.
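The pattern itself is simple enough to sketch in a few lines. To be clear, this is not Amazon's code. The function names are invented, `expensive_recommendations` is a stand-in that just sleeps, and a plain dict plays the role of the cache.

```python
import time

precomputed = {}  # stand-in for a cache or key-value store


def expensive_recommendations(user_id: int) -> list:
    """Hypothetical stand-in for slow ML inference."""
    time.sleep(0.05)  # pretend this is the heavy computation
    return [f"item-{(user_id * 7 + k) % 100}" for k in range(3)]


def nightly_batch(user_ids) -> None:
    """Runs offline, long before anyone loads the page."""
    for user_id in user_ids:
        precomputed[user_id] = expensive_recommendations(user_id)


def handle_request(user_id: int) -> list:
    """Request path: a lookup, with a slow fallback for users the batch missed."""
    recs = precomputed.get(user_id)
    if recs is None:
        recs = expensive_recommendations(user_id)
        precomputed[user_id] = recs
    return recs


nightly_batch([1, 2, 3])

start = time.perf_counter()
handle_request(2)
print(f"served in {(time.perf_counter() - start) * 1000:.3f} ms")
```

The trade-off is freshness. Whatever the user did since the last batch run isn't reflected until the next one, which is fine for recommendations and a terrible idea for something like a shopping cart.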
Google built their own private fiber optic network that spans the entire planet. Their data doesn't touch the public internet, it takes private, optimized routes between their data centers. That alone cuts network latency significantly.
But my favorite Google trick is what they call "hedged requests." Here's the idea: you send the same request to two different servers simultaneously. Whichever one responds first, you use that response and throw away the other one. Sounds wasteful? It is. But it eliminates "tail latency", those occasional requests that take 10x longer than normal because a server was doing garbage collection or hit a slow disk.
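Here's a minimal sketch of the idea with `asyncio`. One difference from the description above: this version waits a short moment before firing the backup request, which is a common way to keep the extra load down. Push `hedge_after_s` toward zero and you get close to the send-both-at-once version. The names `hedged` and `make_flaky_backend` are hypothetical, and the "backend" is just a sleep.

```python
import asyncio


async def hedged(make_request, hedge_after_s: float):
    """Start one request; if it's still running after hedge_after_s, race a second copy."""
    first = asyncio.create_task(make_request())
    done, _ = await asyncio.wait({first}, timeout=hedge_after_s)
    if done:
        return first.result()  # the common case: no hedge needed

    second = asyncio.create_task(make_request())
    done, pending = await asyncio.wait({first, second},
                                       return_when=asyncio.FIRST_COMPLETED)
    for task in pending:
        task.cancel()  # throw away the loser
    return done.pop().result()


def make_flaky_backend():
    """Fake backend: the first call stalls (think GC pause), later calls are fast."""
    attempts = 0

    async def backend():
        nonlocal attempts
        attempts += 1
        attempt = attempts
        await asyncio.sleep(0.5 if attempt == 1 else 0.01)
        return attempt

    return backend


winner = asyncio.run(hedged(make_flaky_backend(), hedge_after_s=0.05))
print(f"answered by attempt {winner}")  # attempt 2 wins, after about 60 ms instead of 500
```

This sketch skips error handling. If the winning request raises, the exception propagates to the caller. Hedging is also only safe for requests that can be repeated without side effects, like reads.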
Oh, and Google Search serves results from RAM. Not from SSDs. Not from hard drives. From memory. Reading from RAM is about 100,000x faster than reading from a spinning disk. When you're serving billions of queries a day, that difference between 0.0001ms and 10ms per read is the difference between Google and a search engine nobody uses.
This one blows my mind every time. Netflix has a program called Open Connect where they build custom servers and physically install them inside ISP data centers. So when you stream Stranger Things, the video data doesn't cross the internet. It might travel one or two network hops within your ISP's own infrastructure.
They also do predictive prefetching. Netflix knows that if you're watching episode 4, there's an 80%+ chance you'll watch episode 5. So while you're still watching, they're quietly buffering the next episode on your device. When you hit "Next Episode," it plays instantly. Not fast, instant. The data was already there.
You don't need Google's private fiber network or Netflix's ISP deals. But the principles scale down perfectly:
Put data closer to your users (a CDN costs like $20/month). Cache things that don't change often (you'd be shocked how many apps recompute the same data on every request). And when something feels slow, measure each piece of the chain individually. Don't guess where the bottleneck is. It's almost never where you think.
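Measuring each piece doesn't need fancy tooling to get started. Here's a minimal sketch using a context manager and `time.perf_counter`. The `timed` helper, the stage names, and the sleeps standing in for real work are all made up for illustration.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(stage: str, timings: dict):
    """Record how long the wrapped block took, in milliseconds, under the given stage name."""
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed_ms = (time.perf_counter() - start) * 1000
        timings[stage] = timings.get(stage, 0.0) + elapsed_ms


def handle_request() -> dict:
    """Hypothetical request handler with each stage timed separately."""
    timings = {}
    with timed("database", timings):
        time.sleep(0.05)    # stand-in for a slow query
    with timed("processing", timings):
        time.sleep(0.005)   # stand-in for business logic
    with timed("serialization", timings):
        time.sleep(0.001)   # stand-in for building the response
    return timings


for stage, ms in sorted(handle_request().items(), key=lambda kv: -kv[1]):
    print(f"{stage:>14}: {ms:6.1f} ms")
```

Once you have a breakdown like this for a real request, the bottleneck usually stops being a matter of opinion.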