Mind the Timeout Configurations in WSO2 Enterprise Integrator Endpoints
In this post we will show how the timeout settings can impact the integration flows in WSO2 Enterprise Integrator.
WSO2 Micro Integrator (MI) acts as a middle layer in a distributed architecture, connecting different systems, so handling failures is one of its main concerns. Timeouts are a key part of that. They are often overlooked while developing integrations, yet they can cause serious problems in a production environment.
In this post, we will show how the timeout settings can impact the behavior of an integration in WSO2 Micro Integrator.
Happy Path Scenario
To illustrate the scenario, consider a typical MI use case like the one below:
It is a typical passthrough scenario where:
- Client sends a Service/API request to MI;
- MI sends the request to the Backend Service;
- MI sends back the response to the Client;
For this very simple scenario, we will use the ProxyService below:
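The original listing is not reproduced in this text, so here is a minimal sketch of what such a proxy could look like. The proxy name, the backend URL, and the payload content are hypothetical placeholders chosen for illustration, not the author's actual configuration.

```xml
<!-- Hypothetical proxy name; not taken from the original article -->
<proxy name="TimeoutDemoProxy" startOnLoad="true" transports="http https"
       xmlns="http://ws.apache.org/ns/synapse">
    <target>
        <inSequence>
            <!-- Forward the request to the REST backend (placeholder URL) -->
            <send>
                <endpoint>
                    <http method="get" uri-template="http://localhost:8080/backend"/>
                </endpoint>
            </send>
        </inSequence>
        <outSequence>
            <!-- Build a fixed XML response for the client -->
            <payloadFactory media-type="xml">
                <format>
                    <response xmlns="">
                        <status>OK</status>
                    </response>
                </format>
                <args/>
            </payloadFactory>
            <send/>
        </outSequence>
        <faultSequence>
            <!-- Log the error details when a fault handler is engaged -->
            <log level="custom" category="ERROR">
                <property name="ERROR_CODE" expression="get-property('ERROR_CODE')"/>
                <property name="ERROR_MESSAGE" expression="get-property('ERROR_MESSAGE')"/>
            </log>
        </faultSequence>
    </target>
</proxy>
```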
As we can see, it is a very simple proxy that forwards the request to a REST endpoint and then uses a payloadFactory in the response path to generate an XML response. Under normal load, it would not cause any problems, and we would have responses like below:
Then the Service Starts Receiving More Requests
Everything goes OK until the service starts receiving more requests and the backend service becomes slow:
The backend now responds very slowly, taking 30s, 60s, or 70s, but the client keeps receiving responses:
Then Backend Services Start Crashing…
Then something unexpected happens. The backend services start crashing and the response times become extremely slow:
As the response times start increasing, we start seeing errors in the client and in MI logs:
Client impact with the backend failure:
As the error message above shows, MI drops the message after the GLOBAL_TIMEOUT of 120 seconds. The first question that comes to mind is: why did it use the global timeout?
The answer is that the endpoint definition above does not specify any timeout settings, so, by default, it uses the value of the synapse.global_timeout_interval property set in deployment.toml.
A good practice when dealing with remote communication is to always specify how long the caller will wait for a response. An obvious solution, then, is to specify the timeout settings in the endpoint. In our example, the proxy will wait for 30 seconds. The endpoint will look like below:
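A minimal sketch of an endpoint with a 30-second timeout follows. The endpoint name and backend URL are hypothetical placeholders; the `<duration>` value is in milliseconds, and no `responseAction` is set, so the default applies.

```xml
<!-- Hypothetical endpoint name and URL, used only for illustration -->
<endpoint name="BackendEP" xmlns="http://ws.apache.org/ns/synapse">
    <http method="get" uri-template="http://localhost:8080/backend">
        <timeout>
            <!-- Wait at most 30 seconds (value in milliseconds) -->
            <duration>30000</duration>
        </timeout>
    </http>
</endpoint>
```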
If we try the same endpoint again, now we get the following error:
Again, the client did not receive any response, which brings us to the next parameter. The responseAction specifies what the Synapse engine does when no response arrives within the timeout defined in the endpoint. The possible values are:
- never: this is the default value. The message is discarded and no fault handlers are engaged, but the endpoint never goes to the timeout state;
- discard: likewise, the fault handlers are not engaged, but on a timeout the endpoint goes to the suspended state;
- fault: the fault handler is engaged on a timeout and the endpoint is suspended.
Let us see the behavior when changing the response action parameter value.
Using Response Action Discard
Let us change our definition a bit to add the responseAction and set it to discard.
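As a sketch, the change amounts to adding one element to the endpoint's timeout block. Only the `<timeout>` element is shown here, and the 30000 ms duration is assumed from the 30-second value used earlier in the post.

```xml
<timeout>
    <!-- 30 seconds, in milliseconds -->
    <duration>30000</duration>
    <!-- On timeout: suspend the endpoint, do not engage fault handlers -->
    <responseAction>discard</responseAction>
</timeout>
```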
As with never, it does not trigger the fault handling process, but we see different log messages saying the endpoint has been suspended:
We do not see the error logs specified in the faultSequence, and no response was sent to the client. But if we send a request within the next 30s, while the endpoint is suspended, the client receives a response, and we see the following in the logs:
So, unlike the default behavior, the endpoint goes into suspended mode after a timeout. While in this state, it does not forward messages to the backend; instead, it directs the flow to the faultSequence, which is why we now see the error logs we defined. By tuning the endpoint settings, we can change the number of retries before suspension, the maximum suspension time, and even whether the endpoint is suspended at all.
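The suspension settings mentioned above can be sketched as follows. The endpoint name, URL, and all numeric values are illustrative assumptions, not recommendations; the elements themselves (`markForSuspension`, `suspendOnFailure`) are standard Synapse endpoint settings.

```xml
<!-- Hypothetical endpoint name, URL and values, for illustration only -->
<endpoint name="BackendEP" xmlns="http://ws.apache.org/ns/synapse">
    <http method="get" uri-template="http://localhost:8080/backend">
        <timeout>
            <duration>30000</duration>
            <responseAction>fault</responseAction>
        </timeout>
        <markForSuspension>
            <!-- Retry 3 times, 1 second apart, before suspending -->
            <retriesBeforeSuspension>3</retriesBeforeSuspension>
            <retryDelay>1000</retryDelay>
        </markForSuspension>
        <suspendOnFailure>
            <!-- First suspension lasts 30 s, doubling up to a 60 s cap -->
            <initialDuration>30000</initialDuration>
            <progressionFactor>2.0</progressionFactor>
            <maximumDuration>60000</maximumDuration>
        </suspendOnFailure>
    </http>
</endpoint>
```

A commonly documented way to keep an endpoint from ever being suspended is to add `<errorCodes>-1</errorCodes>` inside `suspendOnFailure`; verify this against the documentation for the MI version in use.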
Using Response Action Fault
Now let us try our endpoint with the response action fault.
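A sketch of the corresponding timeout block, showing only the `<timeout>` element and again assuming the 30-second duration used earlier in the post:

```xml
<timeout>
    <!-- 30 seconds, in milliseconds -->
    <duration>30000</duration>
    <!-- On timeout: suspend the endpoint and engage the fault handler -->
    <responseAction>fault</responseAction>
</timeout>
```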
Now, when a request times out, MI suspends the endpoint and engages the faultSequence:
Subsequent messages also go to the fault sequence while the endpoint is suspended, and they produce the same error messages shown above.
As we can see, we need to carefully choose the timeout value we will use and the action that will be taken in case of a timeout. These can lead to unexpected behaviors, depending on the configurations applied.
A very good explanation of the different timeouts involved in a request/response flow can be found in the MI Best Practices page.
I hope this helps! See you in the next post.
Published at DZone with permission of Francisco Ribeiro, DZone MVB. See the original article here.