Chaos Testing Your Microservices With Istio
In this post, we discuss the idea of chaos testing and how to perform fault injection with Istio to make sure your microservices are resilient.
While architecting distributed cloud applications, you should assume that failures will happen and design your applications for resiliency. A microservice ecosystem will fail at some point, so you need to learn to embrace failure. In short, design your microservices with failure in mind.
Chaos testing is the practice of intentionally introducing failures into your system to test the resiliency and recovery of your microservices architecture. Modern architectures need to minimize the Mean Time to Recovery (MTTR). Hence, it is beneficial to validate different failure scenarios ahead of time and to take the necessary steps to stabilize the system and make it more resilient.
Chaos Monkey is a popular resiliency tool created by Netflix that helps applications tolerate random instance failures. It randomly terminates virtual machine instances and containers running in your production environment to surface errors and exception scenarios. Exposing the development team to failures more frequently helps them build resilient services.
Fault Injection With Istio
With Istio, you can inject failures at the application layer, such as HTTP errors or delays, to test the resiliency of the application. You can configure faults to be injected into requests that match specific conditions. The injected fault can be either a delay or an error response, which mimics latency and service failures between service calls.
Injecting planned errors and delays into your production system shows how resilient your microservice ecosystem is. It's a good way to find out whether errors cascade, whether notifications reach the development teams when there is an outage, whether you have enough observability to identify the root cause of the outage and, most importantly, whether the system recovers from the failure.
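As a rough sketch of what "requests that match specific conditions" can look like, the manifest below limits the fault to requests that carry a particular header and lets everything else through untouched. The header name x-chaos-test and its value are hypothetical, chosen only for illustration; serviceB is the placeholder host used throughout this article.

```yaml
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: serviceB              # placeholder from this article; real Kubernetes names must be lowercase
spec:
  hosts:
  - serviceB
  http:
  # Rule 1: only requests with the (hypothetical) test header receive the fault.
  - match:
    - headers:
        x-chaos-test:         # hypothetical header name, an assumption for this sketch
          exact: "true"
    fault:
      abort:
        httpStatus: 503
        percent: 100          # abort every matching request
    route:
    - destination:
        host: serviceB
  # Rule 2: all other traffic is routed normally, with no fault applied.
  - route:
    - destination:
        host: serviceB
```

Rules in the http list are evaluated in order and the first match wins, so the catch-all route goes last. Scoping the fault to a test header is one way to run an experiment without affecting ordinary user traffic.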
Istio enables you to inject two types of faults: HTTP Error Codes and Time Delays.
Injecting HTTP Errors
The VirtualService manifest below introduces a fault injection rule that returns 503 errors for 50% of the traffic to ServiceB v2:
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: serviceB
spec:
  hosts:
  - serviceB
  http:
  - fault:
      abort:
        httpStatus: 503
        percent: 50
    route:
    - destination:
        host: serviceB
        subset: v2
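The subset names v1 and v2 used in these routes have to be defined somewhere, normally in a DestinationRule for the same host. The article does not show one, so the following is only a minimal sketch; it assumes the ServiceB pods carry a version label with the values v1 and v2.

```yaml
apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
  name: serviceB              # placeholder from this article; real Kubernetes names must be lowercase
spec:
  host: serviceB
  subsets:
  - name: v1
    labels:
      version: v1             # assumed pod label, adjust to your deployment
  - name: v2
    labels:
      version: v2             # assumed pod label, adjust to your deployment
```

If a VirtualService refers to a subset that no DestinationRule defines, the route has no endpoints to send traffic to, so it is worth applying the DestinationRule before the fault injection rule.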
Injecting Time Delays
The VirtualService manifest below introduces an HTTP delay of 10 seconds for 50% of the incoming traffic to ServiceB v1:
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: serviceB
spec:
  hosts:
  - serviceB
  http:
  - fault:
      delay:
        fixedDelay: 10s
        percent: 50
    route:
    - destination:
        host: serviceB
        subset: v1
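A delay and an abort can also be declared in the same rule. Newer versions of the Istio API reference mark the integer percent field as deprecated in favor of a percentage field with a nested value, so the sketch below uses that form. The 10% abort rate is an illustrative number, not taken from the article; check the reference for the Istio version you run before relying on either field.

```yaml
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: serviceB              # placeholder from this article; real Kubernetes names must be lowercase
spec:
  hosts:
  - serviceB
  http:
  - fault:
      delay:
        fixedDelay: 10s
        percentage:
          value: 50           # delay half of the requests, as in the manifest above
      abort:
        httpStatus: 503
        percentage:
          value: 10           # illustrative value, an assumption for this sketch
    route:
    - destination:
        host: serviceB
        subset: v1
```

The two percentages are configured separately, so a given request may be delayed, aborted, both, or neither.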
Conclusion
Istio provides an easy way to test the resiliency of your services. The injection of errors and delays is transparent to the application and does not require any code-level changes. Since Envoy intercepts all incoming and outgoing network traffic, it handles fault injection in the proxy itself, without involving the application.
Published at DZone with permission of Samir Behara, DZone MVB. See the original article here.