Microservice architecture is the new normal these days, especially with the growth of distributed systems and the need for scalability and fault tolerance. You partition your application into small “two pizza team” services, and they collaborate to achieve the application’s goal. This is really nice from an architectural point of view: development and maintenance are much easier, there is flexibility to change and adapt, domain knowledge is easier for new team members to grasp, and more. But these perks come at a price.
In our experience, the hard parts of microservice architecture are deployment, monitoring, and testing. An application consists of many small services with integration points between them, so there are more pieces to deploy and monitor, and conventional testing is not enough.
In this blog post, we will concentrate on the testing and monitoring aspects, especially blackbox testing as a health check that tells us whether the system is doing fine. We will explain why conventional unit and integration testing alone won’t be enough, explain blackbox testing as a concept, and list the prerequisites for successful blackbox testing and monitoring.
Types of Tests
In general, when it comes to testing levels, we have unit testing, integration testing, component interface testing and system testing. Unit testing is simply a must: it verifies that even the smallest piece of code works, and it gives developers confidence that the code they produce works as designed in isolation. Integration testing, on the other hand, verifies that a class or piece of code works well with other pieces of the application. These two levels are widespread and every developer should be familiar with them.
As the system grows in complexity, the other two levels become more and more important. Component interface testing verifies that data is passed correctly among components in the system and, beyond simple integration between components, that the application flow works. Usually, this is where a business flow is tested across different components in the system. Blackbox testing is a type of component interface testing.
At the highest level, we have system testing, which usually verifies the non-functional requirements of a system. Every system has them, and we must be sure that the system can support a given number of users, survive the failure of a certain number of nodes, stay within a certain latency, etc. For these types of tests, companies usually build software which deliberately simulates such conditions and monitors how the system reacts. A good example is Netflix’s Simian Army, a group of system tests whose goal is to destroy nodes, data centers and regions, add artificial latency, and test system performance.
Blackbox Testing for Microservices
Blackbox testing is a perfect fit for microservices. We divide the application into many single-purpose modules, and a typical business flow needs to touch at least a couple of them to finish its task. We need to be sure that those business flows work as expected. In our application we used Spring Boot for microservice configuration and the Spring Retry mechanism to orchestrate test execution. Some parts of the system are asynchronous or have noticeable latency, and a retry mechanism with a pause between attempts is a great way to give a test enough time to complete successfully.
We had a couple of prerequisites in order to perform blackbox tests successfully. The first was an HTTP client, since our tests needed to trigger the flow and communicate with the services programmatically. We used the Jersey HTTP client for that. Then we needed a way to generate data for our tests, so we created a microservice whose sole purpose was to generate near-production data using an HTTP client. We also needed to set up monitoring and scheduling of those tests, so we configured the main test scheduler to run all tests every 10 minutes. We save only the results of the latest run (for now; later we might save all runs so we can audit them) and expose them over HTTP as a JSON report. This way we can integrate a GUI for monitoring later, and we can query the results with an HTTP request.
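To make the scheduling and reporting part more concrete, here is a minimal sketch in plain Java. It is not our actual Spring Boot code: the class name BlackboxTestRunner, the shape of the report and the test names are hypothetical, and it assumes the scheduler fires on a fixed 10-minute interval. It keeps only the latest result per test and renders it as a JSON string that an HTTP endpoint could return.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.BooleanSupplier;
import java.util.stream.Collectors;

// Hypothetical sketch: runs registered blackbox tests on a schedule and
// keeps only the latest outcome of each one.
final class BlackboxTestRunner {

    // Outcome of a single run; a newer run overwrites the older one.
    static final class Result {
        final boolean passed;
        final String error;

        Result(boolean passed, String error) {
            this.passed = passed;
            this.error = error;
        }
    }

    private final Map<String, BooleanSupplier> tests = new ConcurrentHashMap<>();
    private final Map<String, Result> latest = new ConcurrentHashMap<>();

    void register(String name, BooleanSupplier test) {
        tests.put(name, test);
    }

    // Runs every registered test once; an exception counts as a failure.
    void runAll() {
        tests.forEach((name, test) -> {
            try {
                latest.put(name, new Result(test.getAsBoolean(), ""));
            } catch (RuntimeException e) {
                latest.put(name, new Result(false, String.valueOf(e.getMessage())));
            }
        });
    }

    // Schedules runAll() repeatedly, e.g. start(10, TimeUnit.MINUTES).
    // The caller is responsible for shutting the returned scheduler down.
    ScheduledExecutorService start(long delay, TimeUnit unit) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleWithFixedDelay(this::runAll, 0, delay, unit);
        return scheduler;
    }

    // Renders the latest results as a JSON array, sorted by test name.
    String reportJson() {
        return latest.entrySet().stream()
            .sorted(Map.Entry.comparingByKey())
            .map(e -> String.format("{\"test\":\"%s\",\"passed\":%b,\"error\":\"%s\"}",
                escape(e.getKey()), e.getValue().passed, escape(e.getValue().error)))
            .collect(Collectors.joining(",", "[", "]"));
    }

    // Minimal escaping for the sketch: handles backslashes and quotes only.
    private static String escape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }
}
```

In a real service you would most likely let a JSON library build the report and let the framework do the scheduling. The point of the sketch is only the shape: a periodic run, one stored result per test, and a report that can be queried at any time.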
After all this setup, we isolated a business flow in our application which was critical to customers and which we had to be sure worked properly. Basically, we have a batch job used for import: it starts with an FTP file upload, parses that file and generates requests to another microservice, which stores the data in the DB. So here we are testing the FTP server microservice, the batch microservice, the OAuth server, the gateway server and the application which stores the data. Cleanup is done at the end. The most important part, and the one where the retry mechanism helps, is waiting for the result to be stored. We start with the FTP upload, and since it can take some time to store the data, we needed a mechanism to wait (we did not want to use Thread.sleep), so we created a retry with a 30-second pause and a maximum of 5 attempts to read the data, which gives us a generous window to verify that this flow works.
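The waiting step can be sketched as a small polling helper. We used Spring Retry for this; the code below is a plain-Java stand-in so that it runs without dependencies, and the names RetryPoller, poll and blockingSleep are hypothetical. The pause is confined to one pluggable helper, so test bodies stay free of ad hoc Thread.sleep calls.

```java
import java.time.Duration;
import java.util.Optional;
import java.util.function.Consumer;
import java.util.function.Supplier;

// Hypothetical plain-Java stand-in for a retry with a fixed pause.
final class RetryPoller {

    // Calls `read` up to maxAttempts times and returns the first non-empty
    // result. Between attempts it hands `pause` to `sleeper`; there is no
    // pause after the last attempt.
    static <T> Optional<T> poll(Supplier<Optional<T>> read, int maxAttempts,
                                Duration pause, Consumer<Duration> sleeper) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            Optional<T> result = read.get();
            if (result.isPresent()) {
                return result;
            }
            if (attempt < maxAttempts) {
                sleeper.accept(pause);
            }
        }
        return Optional.empty();
    }

    // Default sleeper: blocks the current thread for the given pause.
    static void blockingSleep(Duration pause) {
        try {
            Thread.sleep(pause.toMillis());
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt flag
            throw new IllegalStateException("interrupted while waiting", e);
        }
    }
}
```

With the values from the text, a test would call something like RetryPoller.poll(() -> findImportedRow(), 5, Duration.ofSeconds(30), RetryPoller::blockingSleep), where findImportedRow is a hypothetical method that reads the stored data and returns an empty Optional while the data is not there yet. An empty result after the last attempt means the flow failed.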
As for monitoring, if a test fails we place a notification in a message queue which can be consumed by anything (currently there is a small service which sends an SMS after X failures), and we have an integration with Rollbar, which provides a nice overview of all failures.
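The "alert after X failures" idea can be illustrated with a small counter. This is only a sketch under assumptions: the class FailureAlerter is hypothetical, an in-memory BlockingQueue stands in for the real message queue, and it assumes that X means consecutive failures of the same test and that one alert per failure streak is enough.

```java
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: publishes one alert when a test has failed
// `threshold` times in a row.
final class FailureAlerter {

    private final int threshold;
    private final BlockingQueue<String> alerts; // stand-in for a message queue
    private final Map<String, Integer> consecutiveFailures = new ConcurrentHashMap<>();

    FailureAlerter(int threshold, BlockingQueue<String> alerts) {
        this.threshold = threshold;
        this.alerts = alerts;
    }

    // Records one test outcome. A pass resets the streak for that test.
    void record(String testName, boolean passed) {
        if (passed) {
            consecutiveFailures.remove(testName);
            return;
        }
        int failures = consecutiveFailures.merge(testName, 1, Integer::sum);
        // Alert exactly once, at the moment the streak reaches the threshold.
        if (failures == threshold) {
            alerts.offer(testName + " failed " + failures + " times in a row");
        }
    }
}
```

Whatever consumes the queue (an SMS sender, for example) then only has to deliver the message, and the threshold keeps a single flaky run from waking anyone up.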
Microservice architecture is great for flexibility: it enables you to change small parts of a system without too much risk. It is also great for maintenance, since parts of a system can be maintained by small teams which are familiar with the internals of that part down to the smallest detail. And it is great for scaling: if there is a bottleneck in some part, you can add more servers for it. But problems arise with testing, monitoring and debugging, since there are many more components which can produce errors. With systems that big, you must remain in control, and the only way to achieve that is through constant testing and good monitoring tools. Blackbox testing with notifications on failure provides that for important business flows.