Monitoring Web Servers Should Never Be Complex
Monitoring web services can become very complex. But what is really needed to detect a failure? And how can you set up and simplify your monitoring?
If you run several web servers in your organization, or even public web servers on the internet, you need some kind of monitoring. If your servers go down for some reason, this is no fun for your colleagues, your customers, or even yourself. For that reason, we use monitoring tools, and there are a lot of them available, providing all kinds of features and concepts.
For example, you can monitor the behavior of your applications, the hardware usage of your server nodes, or even the network traffic between servers. One prominent solution is the open-source tool Nagios, which allows you to monitor hardware in every detail. In Kubernetes environments, you may use the Prometheus/Grafana Operator, which integrates into the concepts of Kubernetes and provides a lot of different exporter services to monitor a cluster in various ways.
There is also a large market of monitoring solutions running in the cloud. These cloud solutions advertise that no complex installation is required. But personally, I wonder if it is a good idea to send application and hardware metrics to a third-party service.
Monitoring Can Become Very Complex…
Independent of which solution you choose, monitoring can become very complex. For most tools, you need to install additional agents on your servers, and often you also need one or more databases to store metrics. In order to build a meaningful monitoring solution, you also need a deep understanding of your hardware and network infrastructure. And of course, you need to configure some kind of alerting to get informed when something goes wrong.
Monitor your Monitoring
Even if you have successfully set up your monitoring solution, you should start thinking about monitoring your monitoring tool. For example, if you run a Prometheus/Grafana Operator solution within your Kubernetes cluster and your internal network goes down, you won't be informed. In such a situation, your monitoring solution often cannot alert you, because it is no longer able to send out e-mails or even to detect the situation at all.
Kubernetes itself cannot heal this state, as it can no longer reach your nodes. A similar situation can occur if your public internet connection is down. All this can become very painful. So you should set up a second monitoring system for your monitoring tool. And even if you run your monitoring solution as a cluster, the situation only becomes more complex.
If you don't see any problems with all the points I listed so far, you are lucky; stop reading and save yourself the time for something more important. But if you know what I mean, and if you are already annoyed by the high complexity of all these different monitoring solutions, then read on…
The Most Minimalistic Monitoring
If you think about monitoring, the question is: what is really needed to detect a failure? A failure can mean that your hardware is defective, that you have run out of memory, or that your application behaves incorrectly. In the latter case, monitoring your hardware is of little use.
If we talk about web applications and web services, which is probably the most common case, you can test your application with a simple HTTP GET request. In the easiest case, this can be done with the 'curl' command-line tool. The result of a curl GET command is the content of your website, and it is quite easy to test the response for a specific string:
$ curl https://ralph.blog.imixs.com | grep 'Open Source Blog'
If you get a result, your website seems to be OK. And this solution is not as stupid as it may seem at first glance. You can grep the result for very specific content within your website. For example, if you run an online shop, you can test for a specific text identifying one of your products. This indicates not only that your shop is available, but also that your database is up and running and contains the data you expect. So with one HTTP GET request, you check the following things:
- The network is available
- The server is reachable and responding
- The web application is responding
- The database is up and running
- The database tables contain the expected data
If the request fails, you of course have to do something. But to be honest, as an administrator it takes you only seconds to check your server's disk, memory, and network status. You do not need any fancy chart diagrams or dashboards to see that your network is down.
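For unattended use, the curl-and-grep one-liner can be wrapped in a small shell script whose exit code signals success or failure. This is only a sketch: `check_page` is a name chosen here, not part of any tool, and the URL and pattern are the article's examples.

```shell
#!/bin/sh
# Sketch: the curl-and-grep check as a function with a clean exit code.
# -s silences curl's progress output, -f makes curl fail on HTTP
# errors (4xx/5xx), and grep -q only sets the exit status.
check_page() {
    # $1 = URL, $2 = expected text in the response body
    curl -sf "$1" | grep -q "$2"
}

# Example call (needs network access):
# check_page "https://ralph.blog.imixs.com" "Open Source Blog" \
#     || echo "site check FAILED" >&2

# The exit-code logic itself can be verified offline against a
# canned response instead of a live page:
printf 'Welcome to my Open Source Blog\n' | grep -q 'Open Source Blog' \
    && echo "check passed"
```

Dropped into a cron job or systemd timer, a non-zero exit code (or any output on stderr) is enough to trigger an alert without any extra tooling.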
The Muluk Web Monitoring Project
At Imixs, we have done exactly this to build a very robust and super easy monitoring solution. The open-source project Muluk provides a quite simple service to check any website or web service in your private or public network. The service runs in Docker; you can set up the monitor with one command and a single configuration file:
$ docker run \
    -e TZ="CET" \
    -e LANG="en_US.UTF-8" \
    -e MULUK_CONFIG_FILE="/opt/jboss/wildfly/config.xml" \
    -v $PWD/config.xml:/opt/jboss/wildfly/config.xml \
    -p "8080:8080" \
    imixs/muluk:latest
Within the config.xml file, you define the services you want to monitor and the alerting via e-mail notification:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<config xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xmlns:xs="http://www.w3.org/2001/XMLSchema">
    <cluster name="local-dev">
        <node>
            <target>http://localhost:8080</target>
        </node>
    </cluster>
    <mail host="mail.imap.server" port="465" user="xx" password="yyy"
          from="firstname.lastname@example.org">
        <recipients>email@example.com</recipients>
    </mail>
    <monitor>
        <object type="web">
            <target>https://www.imixs.org</target>
            <pattern>Imixs-Workflow supports the BPMN 2.0 standard</pattern>
        </object>
        <object type="web">
            <target>https://foo.com/</target>
            <pattern>my-data</pattern>
            <auth type="basic">
                <user>yyy</user>
                <password>xxx</password>
            </auth>
        </object>
    </monitor>
</config>
The monitor starts with a simple dashboard and sends out an E-Mail notification if something goes wrong.
A monitoring object can be configured with different authentication methods and regular expressions to test the content in various ways:
<object type="web">
    <target>https://foo.com/</target>
    <pattern>(^Muluk Web Monitor)\w</pattern>
    <auth type="basic">
        <user>yyy</user>
        <password>xxx</password>
    </auth>
</object>
If something goes wrong, the tool automatically sends a message to the mail recipients defined in the 'mail' section of the config.xml file. The tool also sends out a daily notification with an overall report.
Monitoring the Monitor
As the setup is quite easy, you can start the service in any environment, including Kubernetes clusters. To monitor whether your monitor is up and running, you simply add your monitor endpoints to the 'cluster' section of each monitor:
<cluster name="local-dev">
    <node>
        <target>http://my.first.cluster.com:8080</target>
    </node>
    <node>
        <target>http://my.second.cluster.com:8080</target>
    </node>
    <node>
        <target>http://my.third.cluster.com:8080</target>
    </node>
</cluster>
In this way, the Muluk monitoring tool monitors itself by checking the health status of each listed monitor node. You can even run the monitor from your personal notebook.
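Since each instance serves its dashboard over plain HTTP (port 8080 in the Docker example above), you could also point an ordinary 'web' monitoring object at a peer instance. This is only a sketch: it reuses the article's placeholder hostname and assumes the dashboard page contains the tool's name as matchable text.

```xml
<object type="web">
    <target>http://my.second.cluster.com:8080</target>
    <pattern>Muluk</pattern>
</object>
```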
The solution is open source and hosted on GitHub. If you have ideas or improvements, join the project now.
Published at DZone with permission of Ralph Soika. See the original article here.