Here's something that trips up a lot of developers early in their careers: they think caching is an infrastructure thing. Something the DevOps team handles. Spin up a Redis instance, point your app at it, done.
But the most impactful caching decisions happen inside your application code. Not in infrastructure. Not in configuration files. In the actual logic where you decide what to store, when to store it, and when to throw it away.
Think about it this way. Your database is fast, maybe 5-10ms for a simple query. But your API endpoint doesn't make one query. It makes five. It joins data from three tables, applies business logic, serializes the result to JSON. Suddenly that "fast" database turns into a 50ms response. Multiply that by a thousand concurrent users and your server is sweating.
An application cache sits right inside your app and says: "Hey, I already computed this result 30 seconds ago. Here it is. Skip all that work." The database never gets hit. The business logic never runs. The user gets their response in 2ms instead of 50ms. And your server goes from sweating to yawning.
An application cache is any mechanism managed by your application code. It could be a local in-memory dictionary, a connection to Redis, or a Memcached instance. The defining characteristic isn't where the data lives, it's that your application explicitly controls what gets cached and when.
This is different from, say, browser caching (the browser decides) or database query caching (the database decides). With an application cache, you're in the driver's seat.
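To make "your application explicitly controls what gets cached and when" concrete, here is a minimal sketch of an in-process cache with a 30-second expiry. It assumes a single-process Python app; `TTLCache`, `load_user_profile`, and `get_user_profile` are hypothetical names invented for illustration, and `load_user_profile` stands in for the real queries and business logic.

```python
import time
from typing import Any, Callable, Dict, Hashable, List, Tuple


class TTLCache:
    """Tiny in-process cache: each entry expires ttl_seconds after it is stored."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be tested without sleeping
        self._entries: Dict[Hashable, Tuple[float, Any]] = {}

    def get_or_compute(self, key: Hashable, compute: Callable[[], Any]) -> Any:
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None:
            expires_at, value = entry
            if now < expires_at:
                return value  # hit: skip the expensive work entirely
        value = compute()  # miss or expired: do the work once
        self._entries[key] = (now + self._ttl, value)
        return value


# Hypothetical stand-in for the database queries plus business logic.
db_calls: List[int] = []


def load_user_profile(user_id: int) -> dict:
    db_calls.append(user_id)  # record every time the "database" is hit
    return {"id": user_id, "name": f"user-{user_id}"}


profile_cache = TTLCache(ttl_seconds=30)


def get_user_profile(user_id: int) -> dict:
    # The application decides the key, the TTL, and what counts as cacheable.
    return profile_cache.get_or_compute(
        ("user_profile", user_id), lambda: load_user_profile(user_id)
    )
```

Calling `get_user_profile(1)` twice within 30 seconds runs `load_user_profile` only once. This sketch is not thread-safe and never evicts expired entries, so treat it as an illustration of the idea rather than something to ship.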
There are two broad flavors:
In-process cache (local): Lives in the same memory as your app. A HashMap, a ConcurrentHashMap, a Node.js Map, whatever your language offers. Blazing fast. Dies when the process dies. Not shared across instances. We covered this in the previous lesson.
Out-of-process cache (shared/distributed): A separate service. Redis, Memcached, Hazelcast. Your app connects to it over the network. Slightly slower (1-3ms network hop), but shared across all your application instances. Survives app restarts.
| Aspect | In-Process | Out-of-Process |
|---|---|---|
| Speed | Microseconds | 1-5ms |
| Shared across instances | No | Yes |
| Survives restarts | No | Yes (usually) |
| Memory limit | Process memory | Dedicated server memory |
| Complexity | Low | Medium |
Most production systems use both. Check the local cache first (microseconds). If it misses, check Redis (milliseconds). If that misses too, hit the database (tens of milliseconds). This multi-layer approach is sometimes called a tiered cache or L1/L2 cache, borrowing terminology from CPU architecture.
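The lookup order described above can be sketched roughly as follows. To keep the example self-contained, a plain dictionary stands in for the shared cache (in a real deployment that would be a network call to something like Redis), and `TieredCache`, `load_from_db`, and the `trace` list are hypothetical names used only for illustration. Expiry is left out for brevity.

```python
from typing import Callable, Dict, List


class TieredCache:
    """Look up L1 (local), then L2 (shared), then the source of truth."""

    def __init__(self, l2_store: Dict[str, str], load_from_db: Callable[[str], str]):
        self.l1: Dict[str, str] = {}    # in-process: private to this app instance
        self.l2 = l2_store              # stand-in for a shared cache such as Redis
        self.load_from_db = load_from_db
        self.trace: List[str] = []      # which layer answered each read (illustration only)

    def get(self, key: str) -> str:
        if key in self.l1:
            self.trace.append("l1")
            return self.l1[key]
        if key in self.l2:
            self.trace.append("l2")
            value = self.l2[key]
            self.l1[key] = value        # promote so the next read stays local
            return value
        self.trace.append("db")
        value = self.load_from_db(key)
        self.l2[key] = value            # populate both layers on the way back
        self.l1[key] = value
        return value


# Two app instances share one L2 store but keep separate L1 caches.
shared_store: Dict[str, str] = {}
instance_a = TieredCache(shared_store, lambda key: f"row-for-{key}")
instance_b = TieredCache(shared_store, lambda key: f"row-for-{key}")

instance_a.get("product:42")  # misses both layers, loads from the "database"
instance_a.get("product:42")  # answered by instance_a's L1
instance_b.get("product:42")  # misses its own L1, answered by the shared L2
```

The promotion step is the design choice worth noticing: once one instance has paid for the database read, every other instance pays at most one shared-cache lookup before the value is local to it too.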
Every team caches differently, but a few patterns come up again and again.
Cache frequently read, rarely changed data. User profiles, product catalogs, configuration settings. If 90% of your traffic reads the same data, caching it once saves the database from repeating the same work thousands of times.
Cache expensive computations. Leaderboards, analytics aggregations, recommendation results. If a query takes 500ms, caching the result for 60 seconds means roughly one request per minute pays that cost. Every other request gets a free ride.
Cache external API responses. Calling a third-party API? It might rate-limit you, charge per request, or just be slow. Cache the response. If the exchange rate for USD-to-INR hasn't changed in the last 5 minutes, don't ask the API again.
Don't cache everything. This is the mistake beginners make. Caching data that changes constantly (real-time stock prices, chat messages) creates a nightmare of stale data and cache invalidation. If the data changes every second and your TTL (time to live) is 60 seconds, users see data that is up to a minute old. That's not a cache, that's a bug.
The rule of thumb: if the read-to-write ratio is 10:1 or higher, caching probably helps. If it's closer to 1:1, think carefully before adding a cache.
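The "one request per minute pays the cost" estimate from the expensive-computation pattern is easy to check with a little arithmetic. The helper below is a back-of-the-envelope sketch for a single cached key. It assumes steady, non-zero traffic and exactly one recomputation per expiry, so it ignores bursts of concurrent misses. The function name and the 50 requests/second figure are made up for illustration.

```python
def backend_calls_per_minute(requests_per_second: float, ttl_seconds: float) -> dict:
    """Estimate how often the expensive work runs for ONE cached key.

    Assumes steady traffic (requests_per_second > 0) and one recomputation
    per TTL expiry; concurrent misses are ignored.
    """
    requests_per_minute = requests_per_second * 60
    refreshes_per_minute = 60 / ttl_seconds
    # The cache can never cause more backend calls than there are requests.
    with_cache = min(refreshes_per_minute, requests_per_minute)
    return {
        "without_cache": requests_per_minute,
        "with_cache": with_cache,
        "hit_ratio": 1 - with_cache / requests_per_minute,
    }


# Hypothetical endpoint: 50 requests/second, result cached for 60 seconds.
estimate = backend_calls_per_minute(requests_per_second=50, ttl_seconds=60)
```

Under these assumptions the uncached endpoint does the expensive work 3000 times a minute and the cached one does it once. Shrinking the TTL to 1 second raises that to 60 times a minute, which is the trade-off between freshness and load that the rule of thumb is pointing at.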
Twitter's timeline cache. When you open Twitter, you don't want to wait while the system queries followers, fetches tweets, ranks them, and assembles your timeline. Twitter pre-computes timelines and caches them. When you pull to refresh, you're mostly reading from cache, with only the newest tweets fetched in real-time.
Shopify's storefront. Every product page on Shopify is backed by aggressive application-level caching. Product data, pricing, inventory counts, all cached at the application layer with smart invalidation. When a merchant updates a product, only that specific cache entry gets busted, not the entire cache.
GitHub's repository pages. Ever notice how a GitHub repo page loads almost instantly, even though it needs file listings, README rendering, contributor counts, and commit history? Application caching. The rendered README alone is cached. Markdown-to-HTML rendering is expensive, and most READMEs change once a week at most.
The common thread: all three cache aggressively at the application layer, but they're very deliberate about what they cache and when they invalidate. Caching everything blindly would give users stale data. Caching nothing would melt their servers. The art is in the middle.
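Being deliberate about invalidation often comes down to busting one key at a time. The sketch below illustrates that general idea only; it is not how any of the companies above implement it. `ProductStore` and its dictionaries are hypothetical stand-ins for a real database and cache.

```python
from typing import Dict


class ProductStore:
    """Hypothetical product service with one cache entry per product."""

    def __init__(self) -> None:
        self._db: Dict[int, dict] = {}      # stand-in for the real database
        self._cache: Dict[str, dict] = {}   # stand-in for the application cache
        self.db_reads = 0                   # counts cache misses, for illustration

    @staticmethod
    def _key(product_id: int) -> str:
        return f"product:{product_id}"

    def get(self, product_id: int) -> dict:
        key = self._key(product_id)
        if key not in self._cache:
            self.db_reads += 1
            # Copy so the cached value is independent of the stored row.
            self._cache[key] = dict(self._db[product_id])
        return self._cache[key]

    def update(self, product_id: int, **fields) -> None:
        self._db.setdefault(product_id, {"id": product_id}).update(fields)
        # Invalidate only the entry that changed; other products stay cached.
        self._cache.pop(self._key(product_id), None)


store = ProductStore()
store.update(1, price=10)
store.update(2, price=20)
store.get(1)
store.get(2)                # both products are now cached
store.update(1, price=12)   # busts product 1 only
```

After the last update, the next read of product 1 goes back to the database and sees the new price, while product 2 is still served from cache. Clearing the whole cache would also be correct, but every product would then pay a miss on its next read.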
The table above gives you the numbers, but seeing the two approaches side by side makes the trade-off concrete. One lives in your process memory. The other is a separate service on the network. The right answer is usually both.
Most production systems don't rely on just one cache. They stack them: check the local in-process cache first, then a shared cache like Redis, then finally the database. This diagram walks you through that multi-layer lookup, showing you exactly how a request flows through an L1/L2 cache setup and why each layer exists.
What's the key difference between an in-process cache and an out-of-process cache?
When should you think twice before caching data at the application layer?
Your API endpoint makes 5 database queries and takes 50ms. You add an application cache with a 60-second TTL. During a traffic spike of 1000 requests/second, how many database queries happen per minute for this endpoint?