Caching With Apache HTTP Client and Spring RestTemplate

We take a look at server and client caching, and how Apache and Spring can help developers implement them in their applications.

Server Caching Is a Hard Job

In a typical SOA web application, we have a web server that gathers data from many backend services and generates HTML output for the user's browser.

Chances are that some services are performing too slowly and you are thinking about adding a caching mechanism to them. The Spring Framework provides a cache abstraction out of the box, and Hibernate has a second-level cache; both can help improve the performance of your services. But hold on! Caching at the service layer is not as easy as you might think. In our experience, two burdens can make you regret the decision.

1. You always have to think about cache eviction whenever you change the service or add new APIs to it. Let's look at some example situations. Suppose you have a very simple product service whose only APIs are the CRUD operations. At this stage, caching is not yet too difficult to manage: you just have to cache the READ API and clear the cache when the UPDATE and DELETE APIs are called.

Now imagine you have to add a new API that searches for a list of products matching some input criteria. Instead of thinking only about the application logic, you now have to define another cache store for the output of this API and remember to clear it in the CREATE, UPDATE, and DELETE APIs.

It can get harder still. Say you have to create a new API to update product stock when users update their carts or confirm orders. You have to write some not-so-simple code to clear the cache entries of all products in the given cart. Chances are that you will get called on your happy Saturday night because you forgot to clear the cached product lists of the search API, which is causing customers to see the wrong stock levels on the search result page of your website.

2. It turns the horizontal scaling of your service into a nightmare. Without a cache, horizontal scaling is simple: you deploy more instances of your service and a load balancer distributes the work among them. No code changes are needed, and you can develop the service without worrying that it will run as multiple instances. Once you add a cache, you have to find a way to keep it consistent across all instances. Mature cache engines such as Ehcache support this, but it is still not an easy task.
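
To make the first burden concrete, here is a minimal sketch of a hand-managed service-layer cache. It uses plain Java maps instead of Spring's cache abstraction, and the class and method names (CachedProductStore, searchInStock, updateStock) are hypothetical and do not come from the article's project. Notice that every write path has to know about every cache.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.stream.Collectors;

// Hypothetical product store with two hand-managed caches (illustration only).
final class CachedProductStore {
  // Stands in for the database table: product id -> stock.
  private final Map<Long, Integer> stockTable = new ConcurrentHashMap<>();
  // Cache for the READ API.
  private final Map<Long, Integer> readCache = new ConcurrentHashMap<>();
  // Cache for the search API: minimum stock -> matching product ids.
  private final Map<Integer, List<Long>> searchCache = new ConcurrentHashMap<>();

  void create(long id, int stock) {
    stockTable.put(id, stock);
    searchCache.clear(); // a new product may match any cached search
  }

  // Returns the stock of a product, or null if the product does not exist.
  Integer findById(long id) {
    return readCache.computeIfAbsent(id, stockTable::get);
  }

  // Returns the ids of all products with at least minStock items, sorted by id.
  List<Long> searchInStock(int minStock) {
    return searchCache.computeIfAbsent(minStock, min -> stockTable.entrySet().stream()
        .filter(e -> e.getValue() >= min)
        .map(Map.Entry::getKey)
        .sorted()
        .collect(Collectors.toList()));
  }

  void updateStock(long id, int stock) {
    stockTable.put(id, stock);
    readCache.remove(id); // evict the single-product entry
    searchCache.clear();  // forget this line and searches return stale results
  }
}
```

Even in this toy version, a new write API means revisiting both caches, and nothing here addresses keeping the maps consistent across several service instances.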

HTTP Caching to the Rescue!

The HTTP standards already define a mechanism to handle caching efficiently: the client manages the cache storage and the server checks the validity of the cached resources. All HTTP clients are supposed to support this standard, and the web browsers we use every day are one example. In the simplest form, the cache storage is in the web browser, which works when the web server returns an HTTP response with the appropriate headers.

When the web browser sends a request to http://mywebsite.com/popular-products for the first time, the web server sends back the response with these headers:

Cache-Control: public
Last-Modified: Fri, 27 Jul 2018 12:45:26 GMT

The web browser then understands that this response is "cacheable" and that it was last modified on Fri, 27 Jul 2018 12:45:26 GMT, so it stores the response in its cache. When the user refreshes the page, the browser sends the same request, but this time with the additional header If-Modified-Since: Fri, 27 Jul 2018 12:45:26 GMT. The web server, finding the header, uses the date in it to check whether any changes have been made to the resource since that time. If so, it returns the new version of the resource as if this were a first-time request. Otherwise, it responds with only the HTTP status 304: Not Modified and an empty body, telling the browser to use the resource from its cache.
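
The server-side decision described above can be sketched in plain Java. This is an illustration only, not how any particular server implements it; the class name ConditionalGet and its methods are hypothetical. HTTP dates have one-second precision, so the resource's timestamp is truncated before comparing.

```java
import java.time.ZoneOffset;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.time.temporal.ChronoUnit;

// Hypothetical helper illustrating Last-Modified / If-Modified-Since handling.
final class ConditionalGet {
  // HTTP dates such as "Fri, 27 Jul 2018 12:45:26 GMT" follow the RFC 1123 format.
  static final DateTimeFormatter HTTP_DATE = DateTimeFormatter.RFC_1123_DATE_TIME;

  // Formats the value for the Last-Modified response header.
  static String lastModifiedHeader(ZonedDateTime lastModified) {
    return HTTP_DATE.format(lastModified.withZoneSameInstant(ZoneOffset.UTC));
  }

  // Returns 304 if the resource is unchanged since the If-Modified-Since date, else 200.
  static int statusFor(String ifModifiedSince, ZonedDateTime lastModified) {
    if (ifModifiedSince == null) {
      return 200; // first-time request: send the full response
    }
    try {
      ZonedDateTime since = ZonedDateTime.parse(ifModifiedSince, HTTP_DATE);
      ZonedDateTime modified = lastModified.truncatedTo(ChronoUnit.SECONDS);
      return modified.isAfter(since) ? 200 : 304;
    } catch (DateTimeParseException e) {
      return 200; // unreadable date: behave as if the header were absent
    }
  }
}
```

For example, a resource last modified at 12:45:26 GMT yields 304 for a request carrying If-Modified-Since: Fri, 27 Jul 2018 12:45:26 GMT, and 200 once the resource changes a second later.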

Explicit Modified Date Re-Validation Style

The standard supports some variant flows for HTTP caching, for example for when you cannot use the modified date to determine whether the resource has changed. You can hash the entire response and put the hash value in an ETag header, or you can tell the browser to keep using the resource from its cache without asking the server at all by specifying the resource's max-age in the Cache-Control header.
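
As a rough illustration of the ETag variant, the sketch below hashes a response body with SHA-256 and compares the result with the value a client would send back in the If-None-Match request header. The class and method names are hypothetical, and the choice of SHA-256 is an assumption; the standard does not prescribe how an ETag is computed.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper: derives an ETag value from the response body.
final class ETags {
  // Returns a quoted hex digest of the body, suitable for the ETag header.
  static String strongETag(byte[] body) {
    try {
      MessageDigest digest = MessageDigest.getInstance("SHA-256");
      StringBuilder hex = new StringBuilder();
      for (byte b : digest.digest(body)) {
        hex.append(String.format("%02x", b));
      }
      return "\"" + hex + "\""; // entity tags are sent as quoted strings
    } catch (NoSuchAlgorithmException e) {
      throw new IllegalStateException("SHA-256 should be available on every Java platform", e);
    }
  }

  // True if the client's If-None-Match value still matches the current body.
  static boolean matches(String ifNoneMatch, byte[] body) {
    return ifNoneMatch != null && ifNoneMatch.equals(strongETag(body));
  }
}
```

Note that the server still has to produce the full response before it can hash it, which is one reason the date-based style can be cheaper for the origin.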

In this article, we will focus only on using Last-Modified together with If-Modified-Since, which we will refer to as the explicit modified date re-validation style. It has some advantages over the other variants. The only constraint is that you always have to track the last modified date of the resource, which you should do by design anyway, even if not for caching.

With this style, the resource is validated by the origin server every time, which gives a safe control flow in which the origin server has the authority to refresh the resource. The cost of transferring data over the network is not an issue, as the parties mostly exchange responses with an empty body. The origin server itself gets the chance to skip the heavy business logic when it finds that the resource has not been modified.

Browser Caching in SOA

In the first section, we talked about the SOA application where our product service is too slow. Let's see what it looks like when we add browser caching support to our application.

The browser sends a first-time request to http://mywebsite.com/popular-products. The web server then requests the popular products from the service using a predefined query, http://product-service/products?orderBy=purchaseCount&page=0&size=10. The product service adds the following headers to its response, and the web server renders the HTML and forwards the same headers to the browser:

Cache-Control: public
Last-Modified: Fri, 27 Jul 2018 12:45:26 GMT

When the browser sends the second request with If-Modified-Since in the headers, the web server forwards this header to the product service. The service then uses the value in the header to check if the products table in the database has had any modifications since the specified time. When not modified, it responds with the HTTP status 304. The web server, in turn, forwards the 304 response to the browser.

Browser Caching Is Not Enough!

Caching in the web browser can help us a lot in the case where a single user accesses the same resource many times. But what about when many users access the resource? Our poor new users still have to deal with the slow responses caused by the product service.

Fortunately, in our SOA application, we have a lightweight web server in the middle tier between the sluggish product service and the users. Our web server is itself an HTTP client of the product service, so it should support the HTTP caching standard too. We can handle caching between the web server and the product service in the same way we did between the browser and the web server.

With this architecture, new users get a fast result from the web server's cache. Later requests can still, optionally, use the browser cache to further reduce the data transfer load.
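
The idea of a shared, revalidating cache in the middle tier can be sketched in a few lines of plain Java. This is only a simplified model of what a caching HTTP client does for us, under the assumption that the origin always answers either 200 with a Last-Modified value or 304; the RevalidatingCache class and its Origin interface are hypothetical.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical middle-tier cache that revalidates every entry with the origin.
final class RevalidatingCache {
  // Minimal origin response: status, Last-Modified value, and body (null for 304).
  static final class Response {
    final int status;
    final String lastModified;
    final String body;

    Response(int status, String lastModified, String body) {
      this.status = status;
      this.lastModified = lastModified;
      this.body = body;
    }
  }

  // Stand-in for the HTTP call to the origin; ifModifiedSince may be null.
  interface Origin {
    Response get(String url, String ifModifiedSince);
  }

  private final Map<String, Response> entries = new ConcurrentHashMap<>();
  private final Origin origin;

  RevalidatingCache(Origin origin) {
    this.origin = origin;
  }

  // Returns the body for the URL, reusing the stored copy when the origin answers 304.
  String fetch(String url) {
    Response cached = entries.get(url);
    String validator = cached == null ? null : cached.lastModified;
    Response fresh = origin.get(url, validator);
    if (fresh.status == 304 && cached != null) {
      return cached.body; // origin confirmed that the stored copy is still valid
    }
    if (fresh.status == 200 && fresh.lastModified != null) {
      entries.put(url, fresh); // store only responses we can revalidate later
    }
    return fresh.body;
  }
}
```

Because the entries are keyed by URL rather than by user, the second user asking for the same URL benefits from the work done for the first one.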

In the subsequent sections, we will show how to handle cache control headers in the product service, a typical Spring REST API, and how to use Spring's RestTemplate together with Apache HTTP Client at the web server to handle caching and forward the cache control headers to the web browser.

Getting Started

Initialize two Maven projects for the product service and the web server.

Product Service Dependencies

<parent>
  <groupId>org.springframework.boot</groupId>
  <artifactId>spring-boot-starter-parent</artifactId>
  <version>2.0.0.RELEASE</version>
  <relativePath/>
</parent>
<dependencies>
  <dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-web</artifactId>
  </dependency>
</dependencies>

Web Server Dependencies

<parent>
  <groupId>org.springframework.boot</groupId>
  <artifactId>spring-boot-starter-parent</artifactId>
  <version>2.0.0.RELEASE</version>
  <relativePath/>
</parent>
<dependencies>
  <dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-web</artifactId>
  </dependency>
  <dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-thymeleaf</artifactId>
  </dependency>
  <dependency>
    <groupId>org.apache.httpcomponents</groupId>
    <artifactId>httpclient</artifactId>
  </dependency>
  <dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-cache</artifactId>
  </dependency>
  <dependency>
    <groupId>org.apache.httpcomponents</groupId>
    <artifactId>httpclient-cache</artifactId>
  </dependency>
</dependencies>

Product Service

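In the product service project, we have a simple REST controller with a single API that searches for products using criteria taken from the request parameters.

import java.time.ZonedDateTime;
import java.util.List;
import java.util.stream.Collectors;

import org.springframework.http.CacheControl;
import org.springframework.http.HttpHeaders;
import org.springframework.http.ResponseEntity;
import org.springframework.util.MultiValueMap;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RequestParam;
import org.springframework.web.bind.annotation.RestController;
import org.springframework.web.context.request.WebRequest;

@RestController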
@RequestMapping("/products")
public class ProductApiController {
  private ProductService productService;

  public ProductApiController(ProductService productService) {
    this.productService = productService;
  }

  @GetMapping
  public ResponseEntity<List<ProductDTO>> searchProducts(
    @RequestParam MultiValueMap<String, String> params,
    WebRequest webRequest) {

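    ZonedDateTime productTableLastModifiedDate = productService.getProductTableLastModifiedDate();
    // checkNotModified(long) expects milliseconds since the epoch, not seconds.
    if (webRequest.checkNotModified(productTableLastModifiedDate.toInstant().toEpochMilli())) {
      // Spring has already set status 304 and the headers; no body is needed.
      return null;
    }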

    List<ProductDTO> productList = productService.searchProducts(params)
      .stream()
      .map(ProductDTO::new)
      .collect(Collectors.toList());

    return ResponseEntity.ok()
      .header(HttpHeaders.CACHE_CONTROL, CacheControl.empty().cachePublic().getHeaderValue())
      .body(productList);
  }
}

There are three points worth noting in this API.

  1. We have a service method to get the last modified date of the product table. Many RDBMSs can report this out of the box, although how reliably depends on the database and storage engine.

  2. We use Spring's WebRequest#checkNotModified() to handle some of the boring work. We give it the last modified date of our resource; it then compares that date with the If-Modified-Since request header and sets the response headers and the 304 status appropriately. We only have to return a null body if the method returns true.

  3. If the resource has been modified, or the request has no If-Modified-Since header, we process the normal case, calling the sluggish method to find the products in the database. We then set Cache-Control: public in the response headers to tell the client that this resource is cacheable.

Web Server

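At the web server, we have an MVC controller that uses RestTemplate to call the product search API and render the result in popular-products.html.

import java.util.List;

import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

import org.springframework.core.ParameterizedTypeReference;
import org.springframework.http.HttpEntity;
import org.springframework.http.HttpHeaders;
import org.springframework.http.HttpMethod;
import org.springframework.http.HttpStatus;
import org.springframework.http.ResponseEntity;
import org.springframework.stereotype.Controller;
import org.springframework.ui.Model;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.client.RestTemplate;
import org.springframework.web.util.UriComponents;
import org.springframework.web.util.UriComponentsBuilder;

@Controller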
@RequestMapping("/")
public class ProductWebController {

  private RestTemplate restTemplate;

  public ProductWebController(RestTemplate restTemplate) {
    this.restTemplate = restTemplate;
  }

  @GetMapping("popular-products")
  public String renderPopularProductPage(
    Model model,
    HttpServletRequest request,
    HttpServletResponse response) {

    UriComponents uriComponents = UriComponentsBuilder
      .fromHttpUrl("http://localhost:8081/products")
      .queryParam("page", 0)
      .queryParam("size", 10)
      .queryParam("orderBy", "purchaseCount")
      .build();

    String ifModifiedSince = request.getHeader(HttpHeaders.IF_MODIFIED_SINCE);
    HttpHeaders headers = new HttpHeaders();
    if (ifModifiedSince != null) {
      headers.set(HttpHeaders.IF_MODIFIED_SINCE, ifModifiedSince);
    }

    HttpEntity<Object> httpEntity = new HttpEntity<>(headers);
    ResponseEntity<List<ProductDTO>> apiResponse = restTemplate.exchange(
      uriComponents.toUri(),
      HttpMethod.GET,
      httpEntity,
      new ParameterizedTypeReference<List<ProductDTO>>() {});

    if (apiResponse.getStatusCode().equals(HttpStatus.OK)) {
      List<ProductDTO> productList = apiResponse.getBody();
      model.addAttribute("productList", productList);
      String lastModified = apiResponse.getHeaders().getFirst(HttpHeaders.LAST_MODIFIED);
      if (lastModified != null) {
        response.setHeader(HttpHeaders.LAST_MODIFIED, lastModified);
      }
      String cacheControl = apiResponse.getHeaders().getFirst(HttpHeaders.CACHE_CONTROL);
      if (cacheControl != null) {
        response.setHeader(HttpHeaders.CACHE_CONTROL, cacheControl);
      }
      return "popular-products";
    } else if (apiResponse.getStatusCode().equals(HttpStatus.NOT_MODIFIED)) {
      response.setStatus(HttpStatus.NOT_MODIFIED.value());
      return null; 
    } else {
      throw new RuntimeException("Got unexpected response from product service");
    }
  }
}

In the code, there is work to do both before and after calling the API. First, we check whether the request from the browser has an If-Modified-Since header; if so, we forward it to the API. After the call, if the response is 200: OK, we copy the Cache-Control and Last-Modified headers to the browser's response before rendering the HTML. But if the response is 304: Not Modified, we forward the response status to the browser with an empty response body.

For the injected RestTemplate, let's define the simple version for now. We will come back to add caching configuration in this class later.

@Configuration
class RestTemplateConfiguration {

  @Bean
  public RestTemplate restTemplate() {
    return new RestTemplate();
  }
}

In popular-products.html, we print the time at which the page is rendered. The time should not change if the page comes from the cache.

<html>
  <body>
    <h1>
      Response At : <span th:text="${#dates.format(#dates.createNow(), 'dd MMM yyyy HH:mm:ss')}"></span>
    </h1>
  </body>
</html>

Browser Caching Test

Let's test if the caching is working in the browser.

Since we have to run the two applications together, we have to make them listen on different ports. We can do this by creating the file src/main/resources/application.properties in both projects. Put the following line in the product service's file so that it runs on port 8081:

server.port = 8081

For the web server, we want it to run on port 8080:

server.port = 8080

Now, start both applications together, open a web browser, and navigate to http://localhost:8080/popular-products. You will see the actual generation time of the response written on the web page.

Press F5 to refresh the page and you will see that the time doesn't change. But if you press Ctrl+F5 to force the browser to get a fresh copy of the resource from the server, you will see the time change.

If you use Google Chrome or a browser with similar features, you can press F12 to bring up the developer tools. Go to the "Network" tab and press F5 again; you will see that the response status is 304.

HTTP Caching Configuration

Let's configure caching for RestTemplate. Change the code in RestTemplateConfiguration to the following:

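import org.apache.http.client.HttpClient;
import org.apache.http.impl.client.cache.CacheConfig;
import org.apache.http.impl.client.cache.CachingHttpClientBuilder;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.http.client.BufferingClientHttpRequestFactory;
import org.springframework.http.client.HttpComponentsClientHttpRequestFactory;
import org.springframework.web.client.RestTemplate;

@Configuration
public class RestTemplateConfiguration {

  @Bean
  public RestTemplate restTemplate() {
    // BufferingClientHttpRequestFactory allows us to read the response more than once, which is useful for debugging.
    // The wrapped factory delegates to the caching Apache HTTP client defined below.
    return new RestTemplate(
      new BufferingClientHttpRequestFactory(
        new HttpComponentsClientHttpRequestFactory(httpClient())));
  }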

  @Bean
  public HttpClient httpClient() {
    return CachingHttpClientBuilder
      .create()
      .setCacheConfig(cacheConfig())
      .build();
  }

  @Bean
  public CacheConfig cacheConfig() {
    return CacheConfig
      .custom()
      .setMaxObjectSize(500000) // 500KB
      .setMaxCacheEntries(2000)
      // Set this to false so that a response to a request with a query string
      // is cached when it is explicitly cacheable.
      .setNeverCacheHTTP10ResponsesWithQueryString(false)
      .build();
  }
}

Most of the code is self-explanatory. There are other configuration options to try, but to use them properly you have to understand the behavior of the cache engine. Chances are that you will have to download the source code of the cache engine, step through the application in a debugger, and explore its implementation. The following are some notes from our investigation of the Apache Caching HTTP Client.

  • The cache key is formed by combining the following elements of the request URL to the API endpoint: hostname + port + path + query-string. This means you should be careful if the API can return different results depending on a value in the request headers.

  • By default, it does not cache requests with query strings in the URL, so you have to enable it on the CacheConfig builder like so: CacheConfig.custom().setNeverCacheHTTP10ResponsesWithQueryString(false)

  • To explicitly declare that the response is cacheable, the API should either put Expires and Date in the headers, with the value of Expires later than the value of Date, or put Cache-Control in the headers with one of the following directives: max-age, s-maxage, must-revalidate, proxy-revalidate or public.

  • Only requests with the methods GET or HEAD will be cached.

  • Only responses with the status 200, 203, 300, 301, 410 will be cached.

  • Responses with the header Content-Length greater than the configured maxObjectSize will not be cached.

  • Responses with the header Age greater than 0 will not be cached. Note that the header Age is the time in seconds that the object has been stored in a proxy cache. In this case, it means that only the response from origin server will be cached.

  • Responses without the header Date will not be cached.

  • Responses whose Expires value is earlier than or equal to Date will not be cached.

  • Responses with the header Vary = * will not be cached.

  • Responses whose Cache-Control header contains no-store or no-cache will not be cached.

  • If the cache is configured as a shared cache, it will not cache a response with the header Cache-Control: private.

  • If the cache is shared, it will not cache the response to a request with the header Authorization unless the response explicitly has a Cache-Control value of s-maxage, must-revalidate or public.
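
A few of the notes above can be restated as a small predicate. This is a deliberately simplified sketch for illustration, not the Apache client's actual implementation; the CacheabilitySketch class and its parameters are hypothetical, and it covers only the rules on method, status, Date, Content-Length, and the no-store and no-cache directives.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical, simplified restatement of some cacheability rules listed above.
final class CacheabilitySketch {
  private static final Set<String> METHODS = new HashSet<>(Arrays.asList("GET", "HEAD"));
  private static final Set<Integer> STATUSES = new HashSet<>(Arrays.asList(200, 203, 300, 301, 410));

  // True if the Cache-Control value contains the given directive (name only, case-insensitive).
  static boolean hasDirective(String cacheControl, String directive) {
    if (cacheControl == null) {
      return false;
    }
    for (String part : cacheControl.split(",")) {
      String name = part.trim().toLowerCase(Locale.ROOT);
      int eq = name.indexOf('=');
      if (eq >= 0) {
        name = name.substring(0, eq); // drop the argument, e.g. max-age=60
      }
      if (name.equals(directive)) {
        return true;
      }
    }
    return false;
  }

  // Applies the subset of rules named in the lead-in; all other rules are ignored.
  static boolean isCacheable(String method, int status, String cacheControl,
                             String date, long contentLength, long maxObjectSize) {
    return METHODS.contains(method)
        && STATUSES.contains(status)
        && date != null
        && contentLength <= maxObjectSize
        && !hasDirective(cacheControl, "no-store")
        && !hasDirective(cacheControl, "no-cache");
  }
}
```

Writing the rules down this way is mainly useful as a checklist when a response you expected to be cached is not.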

Web Server Caching Test

To test the caching of the Apache Caching HTTP Client, let's enable its logging by putting this line in the application.properties of the web server project.

logging.level.org.apache.http = TRACE

Start the two applications, navigate to http://localhost:8080/popular-products. 

For the first request, you will see a log similar to the following, which shows that there was a cache miss and that the API returned a response with status 200.

o.a.h.i.c.cache.CacheableRequestPolicy : Request was serveable from cache
o.a.http.impl.client.cache.CachingExec : Cache miss
o.a.http.impl.client.cache.CachingExec : Cache miss [host: http://localhost:8081; uri: /products?page=1&size=10&purchaseCount=50]
o.a.http.impl.client.cache.CachingExec : Calling the backend

...

org.apache.http.headers : http-outgoing-0 << HTTP/1.1 200
org.apache.http.headers : http-outgoing-0 << Last-Modified: Sun, 18 Jan 1970 17:45:17 GMT
org.apache.http.headers : http-outgoing-0 << Cache-Control: public

...

o.a.http.impl.client.cache.CachingExec : Handling Backend response

Press F5 in the browser and then come back to the log. You will see that there was a cache hit but that the entry needed re-validation from the API server, which then returned 304 to tell the cache engine that the cached entry is still valid.

o.a.h.i.c.cache.CacheableRequestPolicy : Request was serveable from cache
o.a.http.impl.client.cache.CachingExec : Cache hit [host: http://localhost:8081; uri: /products?page=1&size=10&purchaseCount=50]
h.i.c.c.CachedResponseSuitabilityChecker : Cache entry was not fresh enough
o.a.http.impl.client.cache.CachingExec : Revalidating cache entry

...

org.apache.http.headers : http-outgoing-1 >> GET /products?page=1&size=10&purchaseCount=50 HTTP/1.1
org.apache.http.headers : http-outgoing-1 >> If-Modified-Since: Sun, 18 Jan 1970 17:45:17 GMT

...

org.apache.http.headers : http-outgoing-1 << HTTP/1.1 304

Conclusion

Caching at the resource origin is hard. It is better to delegate the caching work to the client, so that you can scale out the resource server easily.

Caching in the web browser is free, as it is implicitly supported by most browsers, but it is not enough to replace the server cache because it is not shared across users.

Middle tier caching is the best of both worlds. It removes the complexity of caching at the resource server and can also serve the cached resource for many users.

Finally, you can find the complete source code for this article on GitHub.
