Is My Service Healthy?

Learn what it really means for a service to be healthy and how to evaluate this to ensure good performance in your applications.

It is good practice to monitor your service and check whether it is available and performing as expected. To do this, we first need to define what "service health" means. In this article, I will present two different definitions; keep in mind that you can have your own, project-specific one. All examples were prepared in Mule ESB 4.1. If you are familiar with Spring Boot Actuator, you should see some interface similarities: I decided to follow the Spring approach because it is clear and easy to read.

Service's Health

In order to efficiently monitor your services, a set of the service's health conditions should be chosen. It may be a universal list, but it may as well be tailored to your project's specification. Here are a couple of ideas that you can use:

  • Has the service started?
  • Does the service have a reachable endpoint?
  • Has the runtime created an anchor file?
  • Has the service successfully established a connection with another system via HTTP?
  • Has the service established a connection within a threshold with another system?

As you can see, this is just the beginning, and the list can be extended as needed. Keep in mind that health checks should be quick and simple; overly complex checks become difficult to maintain. I decided to implement two approaches. The first focuses entirely on whether my service has started. The second is more sophisticated: it checks whether my service has established a connection within an acceptable threshold.

Anchor File

Health check condition: has our service been deployed?

We have a couple of ways to check if a service is running in Mule ESB. First of all, we may look in mule-ee.log. After a Mule service starts, in the log file, you should see a table with applications' start-up statuses, like in the screenshot below. We can tell that the health-check application from the default domain has been DEPLOYED. Mule will set it to FAILED in case of any error.

[Screenshot: application start-up status table in mule-ee.log]

The Mule runtime creates a file named [application name]-anchor.txt when the service is deployed correctly. Note that the file has the .txt extension on both Windows and Linux systems. In this scenario, we need to check for that file in the apps directory. Using the previous example, I would look for health-check-anchor.txt. If my monitoring tool does not find this file, I should receive an alert that something went wrong.
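A monitoring script could perform the anchor-file check along the following lines. This is only a minimal sketch in Python, not something Mule ships; the function name is_deployed and its apps_dir argument are assumptions made for illustration.

```python
from pathlib import Path


def is_deployed(apps_dir: str, app_name: str) -> bool:
    """Return True if the anchor file for app_name exists in apps_dir."""
    # Mule writes "<application name>-anchor.txt" after a successful deployment.
    anchor = Path(apps_dir) / f"{app_name}-anchor.txt"
    return anchor.is_file()
```

Calling is_deployed with the path of your runtime's apps directory and "health-check" as the application name would look for health-check-anchor.txt; a False result is the cue to raise an alert.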

Health Endpoint

Spring Boot Actuator

While I was implementing microservices using Spring Boot, I encountered the Spring Boot Actuator library. This library enabled a couple of simple endpoints. The most important for me were /health and /info. The first one, shown below, allowed me to easily check my application's status. As you can see, although configService and Hystrix are marked as UP, the overall status is DOWN. This means that some other condition did not evaluate correctly.

[Screenshot: sample Spring Boot Actuator /health response]


Simple Health Check

Health check condition: Has our service been deployed? Is the service running?

How can we achieve that? Mule does not provide a built-in health endpoint that tells us whether a service is running. I think the easiest way is to expose an HTTP listener on a specific URI such as /health. This address should return clear status information. As in the diagram below, this can be as simple as a service that always returns the status UP with a 200 HTTP status code.

[Diagram: simple health check that always returns status UP]

If I am not able to reach the /health endpoint, I know promptly that something is wrong with my service. On the other hand, if I receive any response, I am happy to mark my service as running and working as expected. Let's see something more complex.
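Before moving on, the monitoring side of the simple check can be sketched as follows. This is a hedged illustration in Python using only the standard library; the function name is_running and the injectable opener parameter are assumptions introduced here so the logic can be exercised without a live service.

```python
import urllib.error
import urllib.request


def is_running(url: str, timeout: float = 2.0, opener=urllib.request.urlopen) -> bool:
    """Return True if the health endpoint answers at all, False if it cannot be reached."""
    try:
        with opener(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The service answered, just not with a 2xx code; it is still reachable.
        return True
    except OSError:
        # Connection refused, DNS failure, timeout, and similar problems.
        return False
```

This mirrors the rule described above: any response counts as "running", and only a failure to reach the endpoint counts as a problem.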

Complex Health Check

Health check condition: Has our service been deployed? Is the service running? Has the service established a connection with an external system within the defined timeout threshold?

Compared with the previous simple health check, here we have higher expectations of our service. We expect that it can connect to an external system over HTTP, or query a DB using a simple select statement. What is more, we may require a timeout threshold to be met. The diagram below depicts a simple process.

[Diagram: complex health check process]

In the presented example, we perform three different checks in parallel: two external HTTP calls and one DB call. For each call, we perform a custom status verification. For the HTTP call, this could be a check that a 200 or 201 HTTP status code has been returned. After all the steps have been performed, we compute the overall service status. Usually, if one of the calls is marked as DOWN, the overall service status is also DOWN. The most complex parts here are "Verify status" and "Compute status." In these two actions, you can put as much custom logic as you need.
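The "run checks in parallel, then compute the overall status" idea can be sketched outside Mule as well. The Python below is only an illustration under assumptions: each check is modelled as a zero-argument callable returning a dictionary with a "status" key, and the names compute_status and run_checks are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def compute_status(details):
    """Overall status is DOWN as soon as any single check reports DOWN."""
    return "UP" if all(d["status"] == "UP" for d in details) else "DOWN"


def run_checks(checks):
    """Run every check concurrently and return the combined health document."""
    with ThreadPoolExecutor(max_workers=max(1, len(checks))) as pool:
        # pool.map keeps the results in the same order as the input checks.
        details = list(pool.map(lambda check: check(), checks))
    return {"status": compute_status(details), "details": details}
```

Keeping the per-check verification inside each callable and the aggregation in one small function leaves both places free for custom logic.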

If you decide to expose the service's status using a REST endpoint, you should also consider changing the returned HTTP status. It is good practice to return a 200 code for status UP and a 503 code for status DOWN. Why? 200 means OK, and I reckon that a DOWN status is definitely not OK. More importantly, the client code will notice that a 5xx code occurred, which signals an exceptional situation that requires action.
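The mapping from health status to HTTP status code is small enough to show in full. This is a sketch in Python; the function name http_code_for is an assumption for illustration.

```python
from http import HTTPStatus


def http_code_for(status: str) -> int:
    """Map the overall health status to the HTTP status code of the response."""
    # UP -> 200 OK; anything else is reported as 503 Service Unavailable.
    if status == "UP":
        return HTTPStatus.OK.value
    return HTTPStatus.SERVICE_UNAVAILABLE.value
```

Treating every non-UP value as 503 is a deliberate choice in this sketch: an unknown status is safer to report as a failure than as a success.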

Implementation

After this brief introduction to services' health status, it is time to see the implementation in Mule ESB. I have prepared one application that has the /health endpoint exposed. This endpoint only accepts GET requests and returns content in JSON.

A Simple Scenario

The first and easiest scenario is to always return an UP status. As you can see, we perform this in three steps. We could do it in only one step; however, I decided to use a more generic flow. As a consequence, only the first message processor will change. We'll talk more about this in the next section.

[Image: simple health flow with three steps]

What this flow actually does is set the status to "successful." After calling GET /health, we should always receive:

{
 "status": "UP"
}

This solution is fairly simple, but it may meet your needs. If you have more sophisticated requirements, such as checking whether a connection has been established or whether a response arrives within specified time boundaries, go to the next section.

Verifying Connection

The flow health-status-flow is far more complex. First of all, we have a Scatter-Gather that calls two private flows concurrently. The next two steps are similar to what you already saw: computing the status and preparing the final response.

[Image: health-status-flow]

I am expecting a structure like in the example below:

{
  "status": "DOWN",
  "details": [
    {
      "serviceType": "http",
      "status": "DOWN",
      "errorCode": "THRESHOLD BREACHED",
      "statusCode": 200
    },
    {
      "serviceType": "db",
      "status": "DOWN",
      "errorCode": "CONNECTIVITY"
    }
  ]
}

In comparison to the previous example, I now have a details array. Each item is the result of a specific health check. In this particular example:

  • Getting a response took longer than expected.
  • The connection to the database did not work due to connectivity issues.

As a result, the overall status is DOWN.

Connecting to an HTTP Endpoint

The flow that checks health performs the request and then computes the status. The logic is fairly simple: if the HTTP response status code is 200, then the service's status is UP. By default, Mule ESB throws an exception for status codes greater than or equal to 400. We need to suppress this behavior. In order to treat any status code as a success, we need to configure the HTTP Request's response validator as shown below:

<http:response-validator>
  <http:success-status-code-validator values="100..599" />
</http:response-validator>

I have decided on a range from 100 to 599 because this is the standard range of HTTP status codes, and I should not receive anything outside it.

If you are not yet familiar with the match and if syntax in the newest DataWeave, you may find it useful to read the article DataWeave - Tip #1. In short, the transformation below sets the status variable. The DataWeave engine adds the errorCode and statusCode properties only when the status equals "DOWN."

%dw 2.0
output application/java
---
{
  serviceHealth: {
    serviceType: "http",
    // Bind status once, then reuse it in the object below
    (using (
      status = if (vars.service.statusCode == 200) "UP" else "DOWN"
    ) {
      status: status,
      // Add the error details only when the service is DOWN
      (status match {
        case "DOWN" -> {
          errorCode: vars.service.reasonPhrase,
          statusCode: vars.service.statusCode
        }
        else -> {}
      })
    })
  }
}
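For readers who do not know DataWeave, the same decision can be expressed in a few lines of Python. This is only an equivalent sketch of the logic, not part of the Mule application; the function name http_service_health and its two parameters (standing in for vars.service.statusCode and vars.service.reasonPhrase) are assumptions.

```python
def http_service_health(status_code: int, reason_phrase: str) -> dict:
    """Build the health entry for an HTTP dependency from its response."""
    status = "UP" if status_code == 200 else "DOWN"
    health = {"serviceType": "http", "status": status}
    if status == "DOWN":
        # Error details are added only for a failing check.
        health["errorCode"] = reason_phrase
        health["statusCode"] = status_code
    return {"serviceHealth": health}
```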

Timeout Threshold

We may also extend conditions and expect to receive a response within a specified time range. Both conditions should be fulfilled to consider the status as "running":

  • HTTP response status code is 200
  • Connection time is less than the defined threshold (if the threshold is specified)

In case of a breached threshold, I would like to provide an error code. Here is the excerpt from the transformation:

...

errorCode: vars.service.reasonPhrase match {
  case met if thresholdMet -> $
  else -> "THRESHOLD BREACHED"
},
...
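The two conditions and the error code selection can be sketched together in Python. This is an illustration under assumptions rather than the flow's actual implementation: call is a hypothetical zero-argument function returning a (status code, reason phrase) pair, and the clock parameter exists only so the timing can be controlled in tests.

```python
import time


def timed_check(call, threshold_seconds=None, clock=time.monotonic):
    """Run call() and report UP only if it returns 200 within the threshold."""
    start = clock()
    status_code, reason_phrase = call()
    elapsed = clock() - start
    # With no threshold configured, the timing condition is always satisfied.
    threshold_met = threshold_seconds is None or elapsed < threshold_seconds
    if status_code == 200 and threshold_met:
        return {"serviceType": "http", "status": "UP"}
    return {
        "serviceType": "http",
        "status": "DOWN",
        # A breached threshold takes precedence over the HTTP reason phrase.
        "errorCode": reason_phrase if threshold_met else "THRESHOLD BREACHED",
        "statusCode": status_code,
    }
```

A slow call that still returns 200 therefore produces the same shape as the first entry of the sample response: status DOWN, error code THRESHOLD BREACHED, status code 200.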

Connecting to DB

How can we check database health? In Mule ESB, we need to use the Try block to handle all exceptions that can occur during a call to the database. We can use On Error Continue to continue our flow. Then, in "Transform Message," we check whether we received any errors during the call and set the status appropriately.
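The same "try the call, continue on error, then set the status" pattern looks like this in Python. It is a rough analogy only: sqlite3 stands in for whatever database the service uses, the function name db_health and the connect parameter are assumptions, and mapping every database error to a single error code is a simplification.

```python
import sqlite3


def db_health(connect) -> dict:
    """Run a trivial query and translate any database error into a DOWN status."""
    try:
        connection = connect()
        try:
            # A cheap query is enough to prove the connection works.
            connection.execute("SELECT 1").fetchone()
        finally:
            connection.close()
    except sqlite3.Error:
        # Comparable to On Error Continue: swallow the error and report it as a status.
        return {"serviceType": "db", "status": "DOWN", "errorCode": "CONNECTIVITY"}
    return {"serviceType": "db", "status": "UP"}
```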

Source Code

The source code is available on GitHub.

Summary

To check whether a Mule ESB service has been deployed correctly, we can use anchor files. In advanced scenarios where conditions are much more complex, it is worth exposing a /health endpoint that reports the service's status. We can define a threshold, perform simple calls to the DB, and so on. It is entirely up to you and your requirements. Bear in mind that checks should not be too complex, or they will become cumbersome to maintain.

If you find this article interesting, please share it.

Published at DZone with permission of
