Improving the Reliability of Microservices
A developer dives into how fault tolerance can be built into a microservices-based architecture and the benefits this has on our application's performance.
Some time back, deploying a monolithic application was simple. You'd provision a bare-metal server, install all the necessary software, and put all of your application code, data, and assets onto it. This kind of application could handle massive traffic and was easy to administer and deploy, since everything was on one machine. However, it had many inefficiencies that were eventually exposed. For example, you'd have to estimate your peak load and provision a server big enough to handle it, yet most of those resources sat idle at normal load and were a real waste at minimal load. Furthermore, scaling the machine up remained a painful manual task. If one component needed more resources than the machine had, the whole machine had to be brought down, affecting every other component! Bare metal, admittedly, is still the best fit for certain use cases, but for most applications these inefficiencies are exactly what the microservices architecture addresses.
What Is a Microservices Architecture?
A microservices architecture is a software development technique that structures an application as a collection of loosely coupled services. The services are fine-grained and the protocols lightweight, thus improving application modularity, making it easier to understand, develop, and test. It parallelizes development by enabling small autonomous teams to develop, deploy, and scale their respective services independently. Microservices-based architectures enable continuous delivery and deployment. (source: Wikipedia)
So, instead of deploying all your code, databases, and assets on one server, you divide each into separate groups, which can all be executed independently of one another. So, you have code that is responsible for uploading, manipulating, and saving an image and specific storage meant for it. You have code that is responsible for checking and creating sessions and a specific database for it. You have a specific block of code to handle new user posts, another block (or service) to check it for profanity, a database to store the comment, another one to search it by keywords, and so on.
The major benefit is that each microservice can now be scaled up or down according to usage, saving precious resources. Another benefit is a much better separation of concerns among developers and engineers: those responsible for databases, those who write backend code, the UI/UX people, and so on. A further major benefit is that since each service works independently of the others, individual components can achieve a higher uptime guarantee. This is because one stressed-out component cannot affect another, and because upgrading one component does not mean taking down the rest.
Limitations of Microservices Architecture
But they say everything has its advantages and disadvantages. One of the main disadvantages of the microservices architecture is its effect on the application's overall uptime guarantee. Huh? Yes, this can be a bit confusing. See, by decoupling the entire application into microservices, you improve each individual component's reliability, but at the cost of your application's overall reliability. Let's dive a bit deeper into this.
In the monolithic bare-metal application, if the server has an issue, be it network, hard drive, memory, or otherwise, the whole application goes down. So, if your provider gave you a 99.5% uptime guarantee, you are confident of being up 99.5% of the time. With the microservices architecture, however, each component has its own uptime guarantee. If your application depends on 10 services, each with a 99.5% guarantee, your overall guarantee is 99.5% to the power of 10, which is roughly 95.1%. That number is not so pretty: to put it in better context, you now expect to be down for about 5 seconds out of every 100. The situation is much worse if your provider gave you a 99% guarantee, which translates to roughly a 90.4% uptime guarantee for the entire application, or close to 10 seconds of downtime out of every 100.
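To see where these numbers come from, here is a quick sketch of the compounding arithmetic in Python (plain math, not tied to any particular provider's SLA):

```python
def composite_availability(per_service: float, num_services: int) -> float:
    """Overall availability when every service in a request chain must be up."""
    return per_service ** num_services

# Ten services, each with a 99.5% uptime guarantee:
print(round(composite_availability(0.995, 10) * 100, 1))  # ~95.1

# Ten services, each with a 99% uptime guarantee:
print(round(composite_availability(0.99, 10) * 100, 1))   # ~90.4
```

The key assumption is that the services fail independently and that every request touches all of them; relax either assumption and the picture changes, but the compounding effect is real.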
Does this mean that the microservices architecture is bad? Not at all, it just means that over and above using microservices, developers need to take measures to guard their application from potential downtime. One way to do this is, rather ironically, by introducing another microservice! And one such service is Alibaba Cloud's Message Service.
What Is the Alibaba Cloud Message Service?
Alibaba Cloud Message Service is a distributed message queuing and notification service that supports concurrent operations to facilitate message transfer between applications and decoupled systems. It enables users to move data between distributed applications to achieve complex tasks and to build decoupled, fault-tolerant applications.
If you haven't already, sign up on Alibaba Cloud. You can use this link to get $300 in free credit to test out the Message Service.
How Can Message Service Improve Uptime Reliability?
To get a good grasp of how the Message Service can improve reliability, let's take a look at a typical group chat application. Let's say you've built a football fans' group chat application. This hypothetical sports app has various chat groups named after popular football clubs like Chelsea FC (my personal favorite), Barcelona FC, Real Madrid, Bayern Munich, Manchester United, and so on.
Anybody can post a message into any group, and anyone can subscribe to their favorite club's group to get notifications of what other fans are saying. You developed this application with massive scalability in mind, since you know how loved the game of football is. You also want to make some money from it, monitor its growth, filter messages and images for profanity, provide a great message search function, and derive some business analytics from it. The result? You hook into a bunch of cloud services for each, and it works a treat.
In the end, you have a system architecture that resembles the structure below.
In the above typical setup, when a user wants to post a new message to a group, the Sports App Backend receives the request and passes it along to various microservices before informing the user of the result – or errors, if any occur.
For example, the backend might first check whether the user is authorized to post a message to that particular group, then sanitize the message by stripping HTML tags or other dangerous input. The message can then be checked for profanity via some artificial intelligence service and, if the check passes, saved in the messages database – another cloud service. The message can then be passed along to a search-optimized data store like Elasticsearch. The sports app may also add a separate service for data analytics, i.e. which club is mentioned the most and when, etc. The developer can also add application monitoring to know how the request performed. Finally, the sports app needs to alert all members that a new message has been posted to the group.
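The chain above can be sketched in a few lines of Python. Every service name here is hypothetical (stand-ins for the real cloud services); the point is that the calls run sequentially and any one failure aborts the whole user request:

```python
class ServiceError(Exception):
    """Raised when any downstream microservice rejects or fails the request."""

def post_message(user, group, message, services):
    """services: a dict of callables standing in for the real microservices."""
    if not services["auth"](user, group):
        raise ServiceError("user not authorized for this group")
    message = services["sanitize"](message)       # strip HTML tags etc.
    if not services["profanity_check"](message):  # AI-based profanity filter
        raise ServiceError("message rejected by profanity filter")
    services["store"](group, message)             # messages database
    services["index"](group, message)             # search store, e.g. Elasticsearch
    services["analytics"](group, message)         # business analytics
    services["notify"](group, message)            # alert all group members
    return "ok"
```

If, say, the search index is briefly unreachable, `post_message` raises and the user sees an error even though the message itself was perfectly fine and had already passed every earlier check.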
As we've seen, the whole process can be a bit lengthy, and even if each service responds in mere milliseconds, the chain introduces many potential points of failure. Moreover, if one service in the chain fails, the whole request errors out.
Let's take a few steps back and recheck the business logic of the request. Does the user posting the message really care that you have to perform authentication checks? Is he really bothered that you must conduct message sanitization and profanity checks? Does he care where you store messages, or that you have to alert each group member of the new message? No, he doesn't! This user has just seen his favorite team score and wants to post, "Yes! 1-0 to us, woo hoo!" In other words, the user just wants to post his message and be done with it; the specifics of how that happens are not his concern, especially if they fail.
So let's adjust our tech stack to cater for this, by using the Message Service. The new stack system architecture would resemble the structure below.
In this new architecture, when a user posts a new message, all we need to do is pass it to the Message Service and, if this succeeds, tell the user it's a success. That's it: our contract with the user is fulfilled, and it only took a few milliseconds, giving them a favorable perception of our app's performance.
In a separate request, the Message Service then calls back to the Sports App Backend, passing along the same credentials and parameters as if the user had made the request directly. The beauty here is that, if an error occurs, all the backend needs to do is respond with a standard error code like 503 Service Unavailable. The Message Service will then retry the same request, over and over, until it succeeds or until the message reaches its default retention limit of 7 days.
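A simplified model of this retry behaviour can be sketched as follows. This is not the actual Message Service API, just an illustration of the delivery loop: the queue keeps redelivering a message until the backend answers with a success status or the message expires.

```python
import time

MESSAGE_TTL_SECONDS = 7 * 24 * 3600  # default retention: 7 days

def deliver_with_retry(handle_request, message, retry_delay=1.0,
                       clock=time.monotonic, sleep=time.sleep):
    """Redeliver `message` to the backend until it succeeds or the TTL expires.

    handle_request is a stand-in for the backend endpoint; it returns an
    HTTP status code. clock/sleep are injectable so the loop can be tested.
    """
    deadline = clock() + MESSAGE_TTL_SECONDS
    while clock() < deadline:
        status = handle_request(message)
        if 200 <= status < 300:
            return True          # processed successfully; message is deleted
        sleep(retry_delay)       # e.g. the backend answered 503, try again
    return False                 # message expired without being delivered
```

The backend therefore needs no retry logic of its own: a transient failure is just a non-2xx response, and the queue shoulders the burden of trying again.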
No need to stop there. Using the Message Service, you can go a step further and decouple things even more, such that after each step, e.g. the authorization check or the profanity check, you enqueue the next step with the Message Service. That way, you never need to redo work that has already succeeded.
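That per-step decoupling can be sketched like this. The in-memory deques and stage names are stand-ins for real Message Service queues; the shape of the idea is that each stage consumes from its own queue and, on success, hands the result to the next stage's queue, so a failure at stage N never forces stages 1 through N-1 to be redone:

```python
from collections import deque

def run_pipeline(message, stages):
    """stages: an ordered list of (name, handler) pairs; handlers transform
    the message. Each stage has its own queue, modeling one Message Service
    queue per processing step."""
    queues = {name: deque() for name, _ in stages}
    queues[stages[0][0]].append(message)   # enqueue at the first stage
    done = []
    for i, (name, handler) in enumerate(stages):
        while queues[name]:
            result = handler(queues[name].popleft())
            if i + 1 < len(stages):
                # hand off to the next stage's queue; completed work
                # is never repeated if a later stage fails and retries
                queues[stages[i + 1][0]].append(result)
            else:
                done.append(result)
    return done
```

In production each queue would also carry the retry semantics shown earlier, so a stuck profanity check, for example, retries in isolation while the already-completed sanitization step stays done.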
Opinions expressed by DZone contributors are their own.