Scaling Microservices — Understanding and Implementing Cache


A guide to implementing and understanding cache in scaling microservices.


Caching has been around for decades, as accessing data quickly and efficiently is critical when building application services. Caching is a mandatory requirement for building scalable microservice applications. Therefore, we will be reviewing three approaches to caching in modern cloud-native applications.

In many use cases, a cache is a relatively small data store built on fast, expensive technology. Its primary benefit is reducing the time it takes to access frequently queried data that lives on slower, cheaper, larger data stores. Modern applications can cache data in many ways, but let’s briefly cover the two most popular approaches:

  1. Storing small amounts of frequently accessed data in local or shared memory (RAM)
  2. Storing larger amounts of data on fast or local disks (SSD)

Caching is considered an optimization along the z-axis of the AKF scaling cube, and while not exclusive, it is one of the most popular approaches to scaling on that axis.

Cache Storage

Let’s consider the first method, caching in memory. This approach was once fairly straightforward, as systems were large and many aspects of a computer program executed on a single machine. The many threads and processes could easily share RAM and cache frequently accessed data locally. As companies have begun transitioning to the cloud, microservices, and serverless, caching has become more of a challenge because service and functional replicas may not be running on the same host, let alone in the same data center.

Systems engineers have adapted to this. We have technologies such as Hazelcast that offer a shared caching model that transcends local restrictions through a secure networking model. There are also other technologies such as Redis, which offer a hash-based lookup system that can be run on RAM as well as fast disks (SSDs) for the tiered cache.

The second storage medium to consider when caching is a local or shared SSD system, which is typically much faster than older magnetic disk or tape media. These systems are usually deployed when the content is much larger than RAM can hold. Typically, large images or video media are cached this way.

Cache Warming

The entire premise of caching is that disk lookups are slow, especially for large databases that hold orders of magnitude more data than can economically be stored in fast memory (RAM). For example, a stock trading company may keep its most recent transactions in RAM. The engineers at this company know that their customers will access this data frequently, so they push the latest transactions into the cache as they occur, a process called “cache warming”. This is more proactive than waiting for a user to access data before storing it in the cache, which is the most popular method of caching and the one we’ll discuss next.
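To make the stock trading example concrete, here is a minimal sketch of cache warming on the write path. The class and function names (`WarmTransactionCache`, `record_transaction`) and the dictionary standing in for the database are assumptions made for illustration, not part of any particular product.

```python
from collections import OrderedDict


class WarmTransactionCache:
    """Hypothetical in-memory cache holding the most recent transactions."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self._items = OrderedDict()  # oldest entry first, newest last

    def warm(self, tx_id: str, tx: dict) -> None:
        # Called when the transaction is written, before anyone reads it.
        self._items[tx_id] = tx
        self._items.move_to_end(tx_id)
        while len(self._items) > self.capacity:
            self._items.popitem(last=False)  # drop the oldest transaction

    def get(self, tx_id: str):
        return self._items.get(tx_id)


def record_transaction(database: dict, cache: WarmTransactionCache,
                       tx_id: str, tx: dict) -> None:
    database[tx_id] = tx   # stand-in for the slow, durable store
    cache.warm(tx_id, tx)  # proactively push the new record into the cache


database = {}
cache = WarmTransactionCache(capacity=2)
for i in range(3):
    record_transaction(database, cache, f"tx-{i}", {"symbol": "ACME", "qty": i})

print(cache.get("tx-2"))  # already in memory on its first read
print(cache.get("tx-0"))  # None: evicted, so a reader would go to the database
```

The first reader of `tx-2` never pays for a slow lookup, which is the whole point of warming. The trade-off is that the write path does a little extra work for data that may never be read.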

Cache Hit or Miss?

Most cache implementations are a variation of the approach where data is accessed through normal methods, whether that be a database, storage bucket, or another implementation such as an API. Caching systems are an intermediary where responses are intercepted and stored in memory where they can be accessed again with much lower latency than their slower counterparts. When a request is made, the caching system checks to see if it has the appropriate response. If it does not then it is called a cache miss. In this case, the request will be passed along to the slower system to be fulfilled. The response will be stored in memory. Once this data has been stored in the cache and is accessed again, it is called a cache hit.
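The hit and miss flow described above can be sketched in a few lines. This is an illustrative example only: `ReadThroughCache` and `slow_lookup` are made-up names, and the short sleep stands in for a database, storage bucket, or API call.

```python
import time


class ReadThroughCache:
    """Hypothetical cache that sits in front of a slower lookup function."""

    def __init__(self, fetch) -> None:
        self._fetch = fetch  # the slower system behind the cache
        self._store = {}
        self.hits = 0
        self.misses = 0

    def get(self, key):
        if key in self._store:
            self.hits += 1            # cache hit: answer from memory
            return self._store[key]
        self.misses += 1              # cache miss: ask the slower system
        value = self._fetch(key)
        self._store[key] = value      # remember the response for next time
        return value


def slow_lookup(key):
    time.sleep(0.01)  # simulated latency of the backing store
    return f"value-for-{key}"


cache = ReadThroughCache(slow_lookup)
cache.get("user:42")  # first access is a miss
cache.get("user:42")  # second access is a hit
print(cache.hits, cache.misses)  # 1 1
```

Counting hits and misses from the start is cheap, and it gives you the hit ratio you need to judge whether the cache is earning its keep.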

When deciding what to cache, and when, it’s important to take into consideration the latency of various operations on computer hardware and networks. There are some great resources available online to help inform these decisions, but as a general rule of thumb you’ll want to cache anything that is accessed frequently, does not change often, and lives on media that is slower than RAM.
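For data that “does not change often” but does change eventually, one common option is to give each entry a time to live (TTL). The sketch below is one possible approach, not something this post prescribes. `TTLCache` is a hypothetical name, and the injectable `clock` parameter exists only so the example can simulate time passing.

```python
import time


class TTLCache:
    """Hypothetical cache whose entries expire after ttl_seconds."""

    def __init__(self, ttl_seconds: float, clock=time.monotonic) -> None:
        self.ttl = ttl_seconds
        self._clock = clock
        self._store = {}  # key -> (expires_at, value)

    def set(self, key, value) -> None:
        self._store[key] = (self._clock() + self.ttl, value)

    def get(self, key, default=None):
        entry = self._store.get(key)
        if entry is None:
            return default
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._store[key]  # entry is stale, treat it as a miss
            return default
        return value


now = [0.0]  # fake clock so the example does not need to sleep
cache = TTLCache(ttl_seconds=60, clock=lambda: now[0])
cache.set("exchange-rates", {"EUR": 1.1})
fresh = cache.get("exchange-rates")
now[0] = 61.0  # pretend 61 seconds have passed
stale = cache.get("exchange-rates")
print(fresh, stale)  # {'EUR': 1.1} None
```

The slower the data changes, the longer the TTL can be and the higher the hit ratio you can expect.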

Measuring Cache Effectiveness and Costs

Caching is a good idea when hit rates are high and the overhead of running the cache costs less than the revenue lost to unhappy customers. Paying close attention to these costs used to be incredibly important because RAM was very expensive a decade or two ago, and the industry had complex formulas to determine budgets.

Today, RAM is cheap. Use a cache whenever you’ve determined that cache look-ups and updates are fast (ideally near zero cost); the performance benefit is then a function of the cache hit ratio. Let’s assume an average lookup time of 10 seconds without a cache and a hit ratio of 50%. With instant cache look-ups, the average would fall by 50 percent to 5 seconds. Even in less than ideal scenarios (i.e., the real world), we’re likely to see a large performance increase by implementing a cache.
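The arithmetic above is a weighted average of the hit path and the miss path. Here is a small sketch of that calculation. The function name `average_lookup_time` is made up for this example, and the model is deliberately simple: a miss is charged only the origin time, so the cost of checking the cache first is ignored.

```python
def average_lookup_time(hit_ratio: float, cache_time: float,
                        origin_time: float) -> float:
    """Expected lookup time for a given hit ratio (simplified model)."""
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit_ratio must be between 0 and 1")
    # Hits are answered by the cache, misses by the slower origin.
    return hit_ratio * cache_time + (1.0 - hit_ratio) * origin_time


# The example from the text: 10 second lookups, 50% hits, instant cache.
print(average_lookup_time(0.5, 0.0, 10.0))  # 5.0

# A less ideal cache (50 ms per hit) with a higher hit ratio.
print(average_lookup_time(0.9, 0.05, 10.0))
```

Plugging in your own measured hit ratio and latencies is a quick way to estimate whether a cache is worth adding before you build it.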

Where Art Thou Cache?

Before we wrap up this post, we’re going to cover cache placement. There are several other important caching topics that could each take up an entire blog post, such as persistence, replacement algorithms, invalidation, and sizing. If those are of interest, I highly recommend picking up the book The Practice of Cloud System Administration: DevOps and SRE Practices to learn more.

[Figure: cache locations]

When implementing caching systems, you’ll want to consider where to place the caching components. Each approach has its benefits along with its downsides. Let’s analyze each approach and its use case, as well as its pros and cons.

Client-Side Caching

This approach can be observed with most web browsers and has the benefit of reducing network latency and remote storage I/O.

Pros:

  • Reduces load time for end users, which makes users happy
  • Uses local end-user storage for the cache, reducing costs for operators

Cons:

  • Cache size is limited; browsers and other clients store a finite amount of data
  • Cache invalidation is hard; browsers and clients may not honor the server's request for invalidation. This can lead to outdated images and scripts being served, which in turn leads to a poor user experience
  • Once the cache is invalidated, request volume may surge, and the content provider must be able to handle it
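Browsers decide what to reuse based on standard HTTP headers such as `Cache-Control` and `ETag`. As a rough sketch of how a server might cooperate with a client-side cache, the example below returns `304 Not Modified` when the client already holds the current version. The helper names `make_etag` and `respond` are invented for this example, and the `If-None-Match` handling is simplified to a single exact match.

```python
import hashlib


def make_etag(body: bytes) -> str:
    # Derive a validator from the content; any stable fingerprint would do.
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


def respond(body: bytes, if_none_match=None, max_age=3600):
    """Return (status, headers, payload) for a cacheable resource."""
    etag = make_etag(body)
    headers = {"ETag": etag, "Cache-Control": f"max-age={max_age}"}
    if if_none_match == etag:
        return 304, headers, b""  # client copy is current, send no body
    return 200, headers, body


script_v1 = b"console.log('v1');"
script_v2 = b"console.log('v2');"

status, headers, _ = respond(script_v1)
revalidated, _, payload = respond(script_v1, if_none_match=headers["ETag"])
changed, _, _ = respond(script_v2, if_none_match=headers["ETag"])
print(status, revalidated, changed)  # 200 304 200
```

Until `max-age` expires the browser will not ask at all, which is why invalidation is listed as a drawback: the server only gets a say when the client chooses to revalidate.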

CDN — “Cache-in-the-middle”

This approach is used by web service providers who have larger payloads or need better control over content and its geographic distribution. Many of the benefits of local caching apply here without some of the drawbacks.

Pros:

  • CDNs deploy large distributed fleets that are better suited to handling surges in demand and are resilient against denial-of-service attacks
  • Reliable cache invalidation mechanisms
  • Handles large documents, video, and audio files
  • Chances are they’re more reliable than you are

Cons:

  • Requires engineering expertise to deploy and manage
  • Slower than a local cache (network I/O)
  • Costs money

Server-Side Caching

We’ve covered this extensively, but this method is preferred for keeping transactional lookups fast and secure.

Pros:

  • This is by far the most reliable and fastest method of caching available
  • Useful for high volume transactions that can be kept secure
  • The highest degree of control over invalidation

Cons:

  • Caches become ineffective across data centers (use a CDN)
  • Heavy reliance may cause major downstream issues if cache systems suddenly become unavailable
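One way to soften the last drawback is to treat the cache as optional: if it is unreachable, fall back to the source of truth instead of failing the request. The sketch below illustrates the idea with made-up names (`FlakyCache`, `CacheUnavailable`, `lookup`). A real cache client would raise its own connection errors.

```python
class CacheUnavailable(Exception):
    """Hypothetical error raised when the cache cannot be reached."""


class FlakyCache:
    """Stand-in for a remote cache that can be switched off."""

    def __init__(self) -> None:
        self.available = True
        self._store = {}

    def get(self, key):
        if not self.available:
            raise CacheUnavailable("cache is down")
        return self._store.get(key)

    def set(self, key, value) -> None:
        if not self.available:
            raise CacheUnavailable("cache is down")
        self._store[key] = value


def lookup(key, cache, origin):
    """Return (value, source), surviving a cache outage."""
    try:
        cached = cache.get(key)
        if cached is not None:
            return cached, "cache"
    except CacheUnavailable:
        return origin(key), "origin (cache down)"
    value = origin(key)
    try:
        cache.set(key, value)
    except CacheUnavailable:
        pass  # failing to populate the cache must not fail the request
    return value, "origin"


def origin(key):
    return f"row-for-{key}"  # stand-in for the database query


cache = FlakyCache()
first = lookup("order:7", cache, origin)
second = lookup("order:7", cache, origin)
cache.available = False
third = lookup("order:7", cache, origin)
print(first[1], "|", second[1], "|", third[1])
# origin | cache | origin (cache down)
```

Falling back keeps requests working, but the backing store then takes the full read load, so it still needs enough headroom (or rate limiting in front of it) to survive a cache outage.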


As we wrap this post up, it’s important to mention that not all cache systems need to be RAM-based. Large files may be cached on local disk rather than read from cold storage in another data center. Even our CPUs use high-speed caches called L1 and L2; because they are built into the CPU die, they reduce the number of trips to RAM.

It’s important to understand that without a cache, many of the other scaling approaches will not be nearly as successful for services that look up and read data, because of the overhead generated by I/O operations. Before you even begin to consider scaling out your services, you should make sure that you have a solid caching solution in place.

In the next post, we’ll talk about how to monitor your microservice applications and identify performance bottlenecks and opportunities for scaling and caching. The goal of observability, in general, is to get real feedback on your scaling efforts: without monitoring and visibility you won’t know what to scale, nor will you have a benchmark to determine whether your efforts were successful.


Published at DZone with permission of Kevin Crawley, DZone MVB.
