Resilient Microservice Design – Bulkhead Pattern

A system's ability to recover from failures and remain functional makes it more resilient. It also helps prevent cascading failures.

Vinoth Selvaraj

Nov. 04, 20 · Tutorial

Need For Resiliency:

Microservices are distributed by nature. They have more components and moving parts. In a distributed architecture, dealing with unexpected failures is one of the biggest challenges to solve: it could be a hardware failure, a network failure, etc. The ability of the system to recover from failures and remain functional makes it more resilient. It also helps prevent cascading failures.

Why Bulkhead?

A ship is split into multiple small compartments using bulkheads. Bulkheads seal off parts of the ship to prevent the entire ship from sinking in case of a flood. Similarly, we should expect failures when we design software. The application should be split into multiple components, and resources should be isolated in such a way that the failure of one component does not affect the others.

For example, let's assume that there are 2 services, A and B. Some of A's APIs depend on B. For some reason, B is very slow. When A receives many concurrent requests that depend on B, A's performance is affected too, because its threads are blocked waiting for B. As a result, A might not be able to serve other requests that do NOT depend on B. So the idea here is to isolate resources, that is, to allocate only a limited number of A's threads for calls to B. That way, calls to B cannot consume all of A's threads, and A does not hang for every request!
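The idea of dedicating some of A's threads to B can be sketched in plain Java (JDK only, not Resilience4j). This is a minimal sketch under assumptions: the class name ServiceBCompartment, the method callB, and the fallback value are hypothetical. Calls to B run on a small fixed-size pool with a bounded queue; once both are full, further calls get the fallback immediately instead of tying up another of A's request threads.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.Callable;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Future;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical compartment inside service A: calls to service B run only on
// this small, bounded pool, so a slow B can exhaust these threads but never
// the threads A uses for requests that do not depend on B.
final class ServiceBCompartment implements AutoCloseable {
    private final ThreadPoolExecutor poolForB;

    ServiceBCompartment(int threads, int queueCapacity) {
        poolForB = new ThreadPoolExecutor(
                threads, threads,                        // fixed number of threads for B
                0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<>(queueCapacity), // bounded backlog (capacity must be >= 1)
                new ThreadPoolExecutor.AbortPolicy());   // throw when threads and queue are full
    }

    /** Submits a call to B; if the compartment is full, returns the fallback at once. */
    Future<String> callB(Callable<String> call, String fallback) {
        try {
            return poolForB.submit(call);
        } catch (RejectedExecutionException e) {
            return CompletableFuture.completedFuture(fallback);
        }
    }

    @Override
    public void close() {
        poolForB.shutdownNow(); // interrupts any calls still waiting on B
    }
}
```

A request thread that needs B's answer should still wait on the returned Future with a timeout (Future.get(timeout, unit)); otherwise it can block for as long as B is slow.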

Sample Application

We are going to use the same application that we used in the previous articles.

The source code is here.

To understand the use of the bulkhead pattern, let's apply it to our application. Our product-service has 2 endpoints.

- /product/{id} – an endpoint that gives more details about a specific product, along with its ratings and related information. It depends on the results of the rating-service. Users updating their ratings, leaving comments, and replying to comments all go through this endpoint.
- /products – an endpoint that gives a list of the products in our catalog based on some search criteria. It does not depend on any other service. Users can order products (add to cart) directly from the list.

Product-service is a typical web application with multiple threads. We are going to limit the number of threads for the application to 15, which means product-service can handle at most 15 requests concurrently. If all 15 threads are busy serving users who are viewing product details, leaving comments, checking reviews, etc., users who are searching for products and trying to order them might experience slowness, even though /products does not depend on the rating-service. This is a problem.
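The thread exhaustion described above can be reproduced with a plain fixed-size ExecutorService standing in for Tomcat's request pool. This is a simplified simulation, not product-service itself: the class SharedPoolDemo and its method are hypothetical, and a CountDownLatch plays the role of a slow rating-service.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical simulation of a shared request pool: `slowRequests` tasks block on
// a slow dependency, then a /products-style request that needs no dependency arrives.
final class SharedPoolDemo {
    static boolean fastRequestCompletes(int poolSize, int slowRequests, long waitMillis) {
        ExecutorService requestPool = Executors.newFixedThreadPool(poolSize);
        CountDownLatch slowDependency = new CountDownLatch(1); // stands in for a slow rating-service
        try {
            for (int i = 0; i < slowRequests; i++) {
                requestPool.submit(() -> {
                    slowDependency.await(); // the request thread is stuck until the dependency answers
                    return null;
                });
            }
            Future<String> fast = requestPool.submit(() -> "product list");
            fast.get(waitMillis, TimeUnit.MILLISECONDS);
            return true;
        } catch (TimeoutException e) {
            return false; // the fast request is still queued behind the blocked threads
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } catch (ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            slowDependency.countDown(); // release the slow tasks so the pool can shut down
            requestPool.shutdown();
        }
    }
}
```

With all 15 threads blocked, fastRequestCompletes(15, 15, 200) returns false because the /products-style task waits in the queue behind them. With only 10 slow requests, fastRequestCompletes(15, 10, 5000) should return true because 5 threads are still free, which is the situation a bulkhead on rating-service calls aims to guarantee.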

ProductController

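import java.util.List;

import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.PathVariable;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RestController;

@RestController
@RequestMapping("v1")
public class ProductController {
    @Autowired
    private ProductService productService;

    // Product details plus ratings: depends on rating-service
    @GetMapping("/product/{id}")
    public ProductDTO getProduct(@PathVariable int id){
        return this.productService.getProduct(id);
    }

    // Product search/listing: no downstream dependency
    @GetMapping("/products")
    public List<ProductDTO> getProducts(){
        return this.productService.getProducts();
    }
}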

ProductService internally calls the RatingService, whose implementation is shown below.

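import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.stereotype.Service;
import org.springframework.web.client.RestTemplate;

@Service
public class RatingServiceImpl implements RatingService {
    @Value("${rating.service.url}")
    private String ratingServiceUrl;
    @Autowired
    private RestTemplate restTemplate;
    @Override
    public ProductRatingDTO getRatings(int productId) {
        String url = this.ratingServiceUrl + "/" + productId;
        ProductRatingDTO productRatingDTO = new ProductRatingDTO();
        try{
            // Blocking HTTP call: the request thread waits here as long as rating-service takes
            productRatingDTO = this.restTemplate.getForObject(url, ProductRatingDTO.class);
        }catch (Exception e){
            // On any error, log it and return an empty rating
            e.printStackTrace();
        }
        return productRatingDTO;
    }
}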

The product-service application.yaml is updated as shown below.

server:
  tomcat:
    max-threads: 15

If I run a performance test using JMeter, simulating many users accessing specific product details while some users access the list of products, I get the results shown here. Only 26 /products requests completed, with an average response time of 3.6 seconds, even though that endpoint has no dependency.

Let's see how bulkhead implementation can save us here!

Bulkhead Implementation:

I am using the Resilience4j library.
application.yaml changes
- maxConcurrentCalls: we allow at most 10 concurrent calls to the rating service, even though we have 15 threads.
- maxWaitDuration: when an additional request for the rating service arrives while 10 calls are already in progress, it waits at most 10 ms for one of them to finish and is then rejected.

server:
  tomcat:
    max-threads: 15
  port: 8082
rating:
  service:
    url: http://localhost:8081/v1/ratings
resilience4j.bulkhead:
  instances:
    ratingService:
      maxConcurrentCalls: 10
      maxWaitDuration: 10ms
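To make the semantics of maxConcurrentCalls and maxWaitDuration concrete, here is a JDK-only sketch of a semaphore bulkhead configured the same way. It is not Resilience4j's implementation; the class SemaphoreBulkheadSketch and its execute method are hypothetical. The call runs on the caller's own thread, and the semaphore only limits how many callers may be inside it at once.

```java
import java.time.Duration;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

// Hypothetical sketch of a semaphore bulkhead: maxConcurrentCalls permits,
// and a caller waits at most maxWaitDuration for a free permit.
final class SemaphoreBulkheadSketch {
    private final Semaphore permits;
    private final Duration maxWaitDuration;

    SemaphoreBulkheadSketch(int maxConcurrentCalls, Duration maxWaitDuration) {
        this.permits = new Semaphore(maxConcurrentCalls, true); // fair: waiters get permits in arrival order
        this.maxWaitDuration = maxWaitDuration;
    }

    <T> T execute(Supplier<T> call, Supplier<T> fallback) {
        boolean acquired;
        try {
            acquired = permits.tryAcquire(maxWaitDuration.toNanos(), TimeUnit.NANOSECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return fallback.get();
        }
        if (!acquired) {
            return fallback.get(); // no permit was released within maxWaitDuration
        }
        try {
            return call.get();     // runs on the caller's own (request) thread
        } finally {
            permits.release();     // always give the permit back
        }
    }
}
```

For the ratingService instance this would correspond to new SemaphoreBulkheadSketch(10, Duration.ofMillis(10)): while 10 calls are in progress, the next caller waits up to 10 ms for a permit and then gets the fallback result.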

RatingServiceImpl changes

@Bulkhead uses the ratingService instance we have defined in application.yaml.
fallbackMethod is optional. It is used when a call is rejected, that is, when 10 calls are already in progress and none of them finishes within 10 ms.

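import io.github.resilience4j.bulkhead.annotation.Bulkhead;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.stereotype.Service;
import org.springframework.web.client.RestTemplate;

@Service
public class RatingServiceImpl implements RatingService {
    @Value("${rating.service.url}")
    private String ratingServiceUrl;
    @Autowired
    private RestTemplate restTemplate;
    // "ratingService" refers to resilience4j.bulkhead.instances.ratingService in application.yaml
    @Override
    @Bulkhead(name = "ratingService", fallbackMethod = "getFallbackRatings", type = Bulkhead.Type.SEMAPHORE)
    public ProductRatingDTO getRatings(int productId) {
        String url = this.ratingServiceUrl + "/" + productId;
        ProductRatingDTO productRatingDTO = new ProductRatingDTO();
        try{
            productRatingDTO = this.restTemplate.getForObject(url, ProductRatingDTO.class);
        }catch (Exception e){
            e.printStackTrace();
        }
        return productRatingDTO;
    }
    // Fallback: same parameters as getRatings plus the exception that triggered it
    // (a BulkheadFullException when the call is rejected); returns an empty rating
    public ProductRatingDTO getFallbackRatings(int productId, Exception e) {
        System.out.println("Falling back : " + productId);
        return new ProductRatingDTO();
    }
}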

Now, after restarting our services, running the same test produces the result below, which is very interesting.

The average response time for /products requests is 106 milliseconds, compared to 3.6 seconds without the bulkhead implementation. This is because the calls to the rating-service can no longer exhaust the resources (threads) of product-service.
Thanks to the fallback method, any additional requests for /product/1 are answered with a default response.

Summary:

Using the bulkhead pattern, we allocate resources for a specific component so that we do not consume all the resources of the application unnecessarily. Our application remains functional even under unexpected load.

Other design patterns, such as the circuit breaker, can handle this even better in combination with the bulkhead pattern. Please take a look at these articles.
