Systems based on microservices architecture are becoming increasingly popular in IT environments. Integrating components, services, and applications is an integral part of any system: almost every application that does something useful for a business needs to be integrated with one or more other applications. This integration, however, presents significant challenges for the performance of the overall integrated system. In a microservices-based architecture, where the system is broken down into services according to the functionality each one offers, the number of integration points (touch points) increases dramatically. This, in turn, increases the performance challenges, which can affect the working of the whole system. This article discusses some of the key performance challenges that can affect microservices-based systems. It also presents some techniques and patterns that can be used to avoid these challenges.
Performance Challenges with Respect to Integrated Systems
Distributed computing has its own challenges. These challenges are not only well documented, but are also experienced almost daily by professionals working on distributed systems. With the number of touch points likely to be much higher than in a ‘normal’ integration scenario, things can get ugly quickly. When connecting to other microservices (within the same bounded context or in a remote, external system), a lot can go wrong: the microservice being called may be slow or down. If our application is not designed to handle this scenario gracefully, its performance and stability can suffer.
Mitigation of Performance Challenges
In this section, we talk about some approaches and design decisions that can help us achieve better performance, resilience, and overall stability when dealing with integration challenges in a microservices-based environment.
Throttling
Throttling is a technique that prevents a misbehaving or rogue application from overloading or bringing down our application by sending more requests than it can handle.
One simple way to implement throttling is to provide a fixed number of connections to each individual application. Suppose two vendors call our microservice to deduct money from an account. If one vendor is a big application such as Amazon, it is likely to consume our service more often than a vendor with a small user base. We can therefore give these two vendors separate, dedicated ‘entry points’, each with its own throttled connection limit. This way, a large number of requests coming from Amazon will not hamper requests coming from the second vendor. Moreover, we can throttle individual partners so that none can send requests faster than we can process them.
Generally, synchronous requests from external services/systems are throttled at the load balancer, HTTP server, or another such entry point.
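The per-vendor connection limit described above can be sketched in a few lines of Python. This is only an illustration, assuming the throttle lives inside the service itself rather than at a load balancer. The names ClientThrottle, handle_request, large_vendor, and small_vendor are hypothetical, and the limits are made-up values.

```python
import threading


class ClientThrottle:
    """Caps the number of in-flight requests per client (hypothetical helper)."""

    def __init__(self, limits):
        # limits maps a client id to its maximum number of concurrent requests.
        self._slots = {
            client: threading.BoundedSemaphore(limit)
            for client, limit in limits.items()
        }

    def try_acquire(self, client):
        # Non-blocking: reject at once instead of queueing when the client
        # has already used up its dedicated slots.
        return self._slots[client].acquire(blocking=False)

    def release(self, client):
        self._slots[client].release()


# Assumed limits: each vendor gets its own budget, so a burst from the large
# vendor cannot consume the slots reserved for the small one.
throttle = ClientThrottle({"large_vendor": 2, "small_vendor": 1})


def handle_request(client, work):
    if not throttle.try_acquire(client):
        return "429 Too Many Requests"
    try:
        return work()
    finally:
        throttle.release(client)
```

Rejecting immediately, rather than blocking, keeps a flood of requests from one vendor from tying up threads that other vendors need. Limiting the request rate (for example with a token bucket) is a separate concern that this sketch does not cover.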
Timeouts
If a microservice being invoked is responding slowly, our application takes longer to complete a request. Application threads remain busy for a longer duration. This can have a cascading impact on our application, leaving the application or server completely choked and unresponsive.
Most libraries, APIs, frameworks, and servers provide configurable settings for different kinds of timeouts. You may need to set read timeouts, write timeouts, wait timeouts, connection pool wait timeouts, keep-alive timeouts, and so on. The values of these timeouts should be determined only through proper performance testing, SLA validation, and similar exercises.
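As a minimal sketch of bounding the wait on a slow downstream call, the following Python snippet uses asyncio.wait_for from the standard library. The function slow_downstream is a hypothetical stand-in for a real remote call, and the timeout values are arbitrary. Real values should come from performance testing, as noted above.

```python
import asyncio


async def slow_downstream(delay_s):
    # Stand-in for a remote microservice call that takes delay_s seconds.
    await asyncio.sleep(delay_s)
    return "payload"


async def call_with_timeout(delay_s, timeout_s):
    try:
        # wait_for cancels the pending call once timeout_s has elapsed.
        return await asyncio.wait_for(slow_downstream(delay_s), timeout=timeout_s)
    except asyncio.TimeoutError:
        # The caller decides what to do next: fall back, retry, or fail.
        return None


result_fast = asyncio.run(call_with_timeout(0.01, 0.5))   # completes in time
result_slow = asyncio.run(call_with_timeout(0.5, 0.05))   # gives up early
```

Here the slow call is abandoned after the timeout instead of holding on to the caller for the full half second.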
Dedicated Thread Pools/Bulkheads
Another important design decision is to have separate, dedicated thread pools for different tasks or for connecting to different microservices. Consider a scenario where, in your application flow, you need to connect to five different microservices using REST over HTTP, and you are using a library that maintains these connections with a single common thread pool. If one of the five services starts misbehaving by responding slowly, all the pool members will be exhausted while waiting for responses from this service. It is therefore good practice to have a dedicated pool for each service. This limits the impact of a misbehaving service and allows your application to continue with other parts of the execution path.
This pattern is commonly known as Bulkheads. The following figure depicts a sample scenario of implementing a bulkhead. On the left-hand side, Microservice A, which calls both microservice X and microservice Y, uses a single common pool to connect to these microservices. If either service X or service Y misbehaves, the overall behavior of the flow can be affected because the connection pool is shared. If instead a bulkhead is implemented, as shown on the right-hand side of the figure, then even if microservice X misbehaves, only the pool for X is affected. The application can continue to offer the functionality that depends on microservice Y.
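The X/Y scenario can be approximated with two dedicated thread pools from Python's concurrent.futures module. This is a sketch under assumed conditions: call_service_x and call_service_y are hypothetical placeholders for real HTTP calls, the pool sizes are arbitrary, and a threading.Event simulates service X hanging.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

# One dedicated pool per downstream service (the bulkheads).
pool_x = ThreadPoolExecutor(max_workers=2, thread_name_prefix="svc-x")
pool_y = ThreadPoolExecutor(max_workers=2, thread_name_prefix="svc-y")

release_x = threading.Event()


def call_service_x():
    # Simulates a hung service X: blocks until released (or 2 s at most).
    release_x.wait(timeout=2)
    return "x-response"


def call_service_y():
    return "y-response"


# Four calls saturate X's two workers; the rest queue up behind them.
stuck = [pool_x.submit(call_service_x) for _ in range(4)]

# Y has its own workers, so it still answers while X is stuck.
y_result = pool_y.submit(call_service_y).result(timeout=1)

release_x.set()  # let service X "recover"
x_results = [f.result(timeout=3) for f in stuck]

pool_x.shutdown()
pool_y.shutdown()
```

With a single shared pool of two workers, the call to service Y would have queued behind the stuck calls to X. With separate pools it completes immediately.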
Circuit Breaker
Circuit Breaker is a design pattern used to minimize the impact of a downstream system being inaccessible or down (due to planned or unplanned outages). Circuit breakers check the availability of external systems/services and, when these are down, prevent the application from sending requests to them. This acts as a safety measure on top of timeouts and bulkheads, for cases where one may not even want to wait for the period specified by the timeout. If a downstream system is down, there is no use in waiting for the timeout period on each request only to get a timeout exception. Instead, requests should not even try to connect to these systems while they are down.
Circuit breakers can have built-in logic to perform the necessary health checks of external systems and to start forwarding requests again once these systems are up and working fine.
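A very small circuit breaker might look like the sketch below. It is not the API of any particular library. The class name, the failure threshold, and the reset timeout are assumptions chosen for illustration, and the code is not thread-safe. Instead of an active health check, it lets one trial request through after the reset timeout has elapsed and closes the circuit again if that request succeeds.

```python
import time


class CircuitOpenError(Exception):
    """Raised when a call is rejected because the circuit is open."""


class CircuitBreaker:
    def __init__(self, failure_threshold=3, reset_timeout_s=30.0,
                 clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout_s = reset_timeout_s
        self._clock = clock          # injectable so tests can control time
        self._failures = 0
        self._opened_at = None       # None means the circuit is closed

    def call(self, func, *args, **kwargs):
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self.reset_timeout_s:
                # Fail fast: do not even try to reach the downstream system.
                raise CircuitOpenError("downstream marked unavailable")
            # Reset timeout elapsed: allow one trial call through.
        try:
            result = func(*args, **kwargs)
        except Exception:
            self._failures += 1
            if (self._opened_at is not None
                    or self._failures >= self.failure_threshold):
                self._opened_at = self._clock()   # open, or re-open
            raise
        # Success: close the circuit and forget earlier failures.
        self._failures = 0
        self._opened_at = None
        return result
```

A caller would wrap each downstream request in breaker.call(...) and treat CircuitOpenError as a signal to use a fallback right away, rather than waiting for a timeout.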
Asynchronous Integration
Most of the performance issues related to integration can be avoided by decoupling the communication between microservices. An asynchronous integration approach provides one mechanism to achieve this decoupling. Take a look at the design of your microservices-based system, and give it serious thought wherever you see point-to-point integration between two microservices.
Any standard message broker can be used to provide publish-subscribe capabilities.
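To illustrate the decoupling that publish-subscribe provides, the sketch below uses an in-process stand-in for a message broker, built on Python's queue module. It is not a real broker. InMemoryBroker, the topic name order.created, and the billing and audit subscribers are all hypothetical.

```python
import queue
import threading
from collections import defaultdict


class InMemoryBroker:
    """Toy publish-subscribe broker; each subscriber gets its own queue."""

    def __init__(self):
        self._queues = defaultdict(list)   # topic -> subscriber queues

    def subscribe(self, topic):
        inbox = queue.Queue()
        self._queues[topic].append(inbox)
        return inbox

    def publish(self, topic, message):
        # Returns immediately; the publisher never waits for consumers.
        for inbox in self._queues[topic]:
            inbox.put(message)


broker = InMemoryBroker()
billing_inbox = broker.subscribe("order.created")
audit_inbox = broker.subscribe("order.created")
processed = []


def billing_worker():
    while True:
        message = billing_inbox.get()
        if message is None:          # sentinel: stop the worker
            break
        processed.append(("billing", message["order_id"]))


worker = threading.Thread(target=billing_worker)
worker.start()

broker.publish("order.created", {"order_id": 42})
billing_inbox.put(None)              # ask the billing worker to stop
worker.join()
```

The publisher finishes as soon as the message is queued. A slow or stopped consumer, such as the audit subscriber that never reads its inbox here, does not delay the publisher or the other consumers.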
Conclusion
In this article, we talked about some of the performance challenges faced while integrating microservices-based systems. We also presented some patterns that can be used when designing systems to avoid these performance issues: throttling, timeouts, bulkheads, and circuit breakers. Apart from these, we discussed the asynchronous integration approach.
In a nutshell, asynchronous integration should be preferred wherever possible. The other patterns should be used in integration scenarios to avoid the ripple/cascading side effects of a misbehaving downstream system.