Build a Smart Monitoring Tool With Machine Learning K-Means Clustering Algorithms


Learn how to build a smart system monitoring tool that uses microservices and the machine learning k-means clustering algorithm to turn an existing server-centric log event system into a predictive smart event system.

Why — The Need for a Smart Monitoring System

In the modern IT world, technology evolves rapidly, and businesses must keep up to stay relevant. More and more applications' continuous integration and deployment (CI/CD) processes depend on the stability of the IT infrastructure and platform. Most traditional system monitoring tools are server-centric and cannot handle trends based on application and system runtime experience.


In particular, a generic monitoring tool struggles to be effective and efficient at detecting, isolating, and predicting issues in custom applications. It is a burden for our developers to dig through numerous system logs to troubleshoot problems.

Many system monitoring tools focus on different aspects of IT infrastructure. We are not, however, looking for a real-time monitoring tool. Instead, we need a monitoring tool that can provide predictive alerts and trend analysis for specific applications, and in particular a logging model adapted to your own IT environment.

What — Challenges We Are Facing

In this context, we define a machine learning predictive model that applies the k-means clustering algorithm to existing system event logs to monitor and predict server and application availability.

K-means clustering is one of the simplest and most popular unsupervised machine learning algorithms. It can enhance system monitoring by analyzing the data and surfacing regularities in its findings.

To reduce troubleshooting time and proactively find system issues, the smart monitoring tool will address the following challenges:

  • Tracking and analyzing system performance over time via the existing application and system log events.
  • Timely prediction of system overload over the time stream.
  • Analyzing historical time-fragment k-means clustering results to provide predictions and recommendations on future system overload over the timeline.

The model is built as microservices with SpringBoot technology.

Where — Monitoring Infrastructure and Logical Structure

In this context, we collect various system events using small agents implemented with SpringBoot microservice technologies. Each agent traces and filters different system and application log files using predefined topics, and extracts topic-related information to send to the backend Machine Learning/AI processing module.
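As an illustration of the filtering step, a minimal topic matcher might look like the following sketch. The class name and example topic keywords are assumptions for illustration, not part of the original model; in practice the topics come from your own IT environment:

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical filter: keeps only log lines that mention a predefined topic.
public class TopicFilter {
    // Example topics; define these based on your own environment and applications.
    private static final List<String> TOPICS =
            Arrays.asList("ERROR", "TIMEOUT", "OutOfMemoryError");

    public static boolean matches(String logLine) {
        return TOPICS.stream().anyMatch(logLine::contains);
    }
}
```

Only lines that match a topic are forwarded to the backend, which keeps the published event volume small.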

The event topics are sent to the AI processing module with a publish-subscribe pattern via Hazelcast's distributed architecture. Hazelcast provides in-memory access to frequently used data across an elastically scalable data grid. Hazelcast IMDG is a clustered, in-memory data grid that uses sharding for data distribution and supports monitoring.

It provides redundancy for continuous cluster uptime and supports caching, distributed processing, and distributed messaging. Its capacity grows elastically with demand without compromising performance or availability.

The image below shows what the infrastructure looks like.

[Figure: Hazelcast IMDG monitoring infrastructure]

To collect enough information, we designed a microservice agent that keeps listening to system logs for predefined topics. The topics are defined based on your own IT environment and applications. Filtered events are encapsulated in entities and published via the Hazelcast IMDG framework.

As mentioned above, Hazelcast IMDG is a clustered, in-memory data grid used for data distribution and monitoring. It integrates easily with a SpringBoot microservice: simply configure it as a Spring bean and start it in a server instance.

@Bean
public HazelcastInstance hazelcastInstance(Config config) {
    return Hazelcast.newHazelcastInstance(config);
}

Using the Hazelcast IMDG framework's topic publish/subscribe pattern, a sample event entity can be defined as below. We can always build further sub-event entities to deal with various system logs.

public abstract class LogInfo {
    private String nodeName;
    private String projectName;
    private Date dateTime;
    private String ipAddress;
    private String sessionId;
    private boolean isNewSession;
    // getters and setters omitted
}

public class AppAccessInfo extends LogInfo {
    private String httpMethod;
    private String uri;
    private long responseTime;
    // getters and setters omitted
}

The log parser is a filter that collects auditing events from many system logs. It is implemented as a microservice and installed on each server node. By tailing system logs (here we use the Tailer from the Apache Commons IO 2.6 library), it filters out events and sends them to the Hazelcast IMDG pub/sub topics.

public class LogFileListener {
    private Tailer tailer;

    public void start() throws InterruptedException {
        TailerListener listener = new LogListener();
        tailer = new Tailer(new File(logfilePath), listener, interval);
    }
}

public class LogListener extends TailerListenerAdapter {
    @Override
    public void handle(String line) {
        processTheLine(line); // convert the log line to an object and publish it
    }
}


The Hazelcast IMDG pub/sub framework is used to publish the filtered log events from the clustered environment to the backend log event analyzer.

public class LogInfoPublisherHaszelcastImpl implements LogInfoPublisher {

    public final String TOPIC_NAME = "topic:Sample-log-info";
    private ITopic<String> sampleLogInfoTopic;

    public LogInfoPublisherHaszelcastImpl(HazelcastInstance hazelcastInstance) {
        sampleLogInfoTopic = hazelcastInstance.getTopic(TOPIC_NAME);
    }

    public void publish(LogInfo logInfo) throws JsonProcessingException {
        ObjectMapper mapper = new ObjectMapper();
        String json = mapper.writeValueAsString(logInfo);
        sampleLogInfoTopic.publish(json);
    }
}

How — Utilizing the K-Means Clustering Algorithm to Build an Auto-Predictive Smart Monitoring System

Having covered data collection, the topic event entity, publishing, and transformation, we now build the interesting part: the event processing and analysis pieces. Real environments are far more dynamic than static ones, so the model can have many event listeners dealing with multiple topics.

Once events reach the server side, we use a Machine Learning/AI algorithm to analyze the data, find correlations and regularities, and predict server performance.

In this model, we use the Apache Commons Math 3.6.1 library with its mean, standard deviation, and k-means clustering methods. K-means clustering is a popular unsupervised machine learning algorithm used to segment areas of interest from large amounts of unstructured background data.

We select the k-means method to quickly and objectively produce a first set of clustering results for system performance alerts, and to provide new predictions based on temporal differences.

@Scheduled(cron = "${appAccessLogMonitor.cron}")
public void doAnalyze() {
    logMonitorFactory.process(projectName, buffer);
    buffer = new ArrayList<LogInfo>();
}

Once the backend processor starts, it loads events into a buffer, since the processing pattern is designed to be asynchronous to avoid blocking the event stream from distributed nodes. A factory pattern is adopted for event processing to deal with different event types on the various audited topics.
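A minimal sketch of such a factory is shown below. The class and method names here are illustrative assumptions, not the article's actual implementation; each topic is mapped to a processor so new event types can be registered without touching the dispatch logic:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Consumer;

// Hypothetical factory that routes event batches to a processor by topic name.
public class LogMonitorFactory {
    private final Map<String, Consumer<String>> processors = new HashMap<>();

    // Register a processor for a given topic.
    public void register(String topic, Consumer<String> processor) {
        processors.put(topic, processor);
    }

    // Dispatch an event to the processor registered for its topic.
    public String process(String topic, String event) {
        Consumer<String> p = processors.get(topic);
        if (p == null) {
            return "unhandled";
        }
        p.accept(event);
        return "handled";
    }
}
```

This keeps the scheduled analyzer decoupled from the concrete event types: it simply hands the buffer to the factory and lets the registered processors do topic-specific work.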

KMeansPlusPlusClusterer<DoublePoint> kMeansClusterer =
        new KMeansPlusPlusClusterer<DoublePoint>(2, 100);
List<DoublePoint> listOfPoints = new ArrayList<DoublePoint>();
for (double point : points) {
    listOfPoints.add(new DoublePoint(new double[] { point }));
}
List<CentroidCluster<DoublePoint>> clusters = kMeansClusterer.cluster(listOfPoints);
CentroidCluster<DoublePoint> c1 = clusters.get(0);
CentroidCluster<DoublePoint> c2 = clusters.get(1);

Threshold configHitsAndTime = retrieveThreshold(statReport.getUri());

// compare the 2nd (slower) cluster's data to the threshold
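The comparison of the slow cluster against the threshold can be sketched in plain Java as below. The class and parameter names are hypothetical, not the article's actual code; the idea is simply that the slow cluster's centroid (its mean response time) is checked against the configured limit:

```java
// Hypothetical threshold check against a cluster's centroid response time.
public class ThresholdCheck {
    // Mean response time of the points assigned to a cluster (its 1-D centroid).
    public static double centroid(double[] responseTimesMs) {
        double sum = 0;
        for (double t : responseTimesMs) {
            sum += t;
        }
        return sum / responseTimesMs.length;
    }

    // Alert when the slow cluster's centroid exceeds the configured threshold.
    public static boolean isOverloaded(double[] slowCluster, double thresholdMs) {
        return centroid(slowCluster) > thresholdMs;
    }
}
```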

Training and Learning

Using the k-means clustering algorithm, system events are grouped into several categories based on response time. Initially, the algorithm generates several randomly initialized centroids in the data and assigns every data point to its nearest centroid.
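To make this concrete, here is a minimal 1-D version of the two steps the algorithm repeats: assign each point to its nearest centroid, then move each centroid to the mean of its assigned points. This is a sketch for illustration, not the library's actual internals:

```java
// Minimal 1-D k-means steps: assignment and centroid update.
public class KMeansStep {
    // Assign each point to the index of its nearest centroid.
    public static int[] assign(double[] points, double[] centroids) {
        int[] labels = new int[points.length];
        for (int i = 0; i < points.length; i++) {
            int best = 0;
            for (int c = 1; c < centroids.length; c++) {
                if (Math.abs(points[i] - centroids[c])
                        < Math.abs(points[i] - centroids[best])) {
                    best = c;
                }
            }
            labels[i] = best;
        }
        return labels;
    }

    // Move each centroid to the mean of the points assigned to it.
    public static double[] update(double[] points, int[] labels, int k) {
        double[] sums = new double[k];
        int[] counts = new int[k];
        for (int i = 0; i < points.length; i++) {
            sums[labels[i]] += points[i];
            counts[labels[i]]++;
        }
        double[] centroids = new double[k];
        for (int c = 0; c < k; c++) {
            centroids[c] = counts[c] == 0 ? 0 : sums[c] / counts[c];
        }
        return centroids;
    }
}
```

With response-time data, the two steps quickly separate a "normal" cluster of fast requests from a "slow" cluster that warrants attention.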

It is hard to get accurate predictions from the initial raw clustering centroids. The model needs to be trained in your system environment and on your preferred topics with the following approaches. In this model, we use web application server logs and exception logs for analyzing and adjusting the learning model.

  • Adjust the k number to reduce clustering centroids that produce noise.
  • Adjust the threshold for alerts deduced from the clustering centroids.
  • Adjust the event topics to enhance the accuracy of what the clustering centroids detect.
The training process can be simple or complex depending on your expectations. After training the model, the resulting clusters can be used to identify exceptions when one node's response time is far beyond normal. To present the results more directly, we use the open source JFreeChart library to display the clustering results and a probability histogram of system server performance exceptions.
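One common way to deduce the alert threshold from the normal cluster's statistics is mean plus a multiple of the standard deviation. The sketch below uses a three-sigma multiplier as an assumption; the multiplier is exactly the kind of value to tune during training:

```java
// Hypothetical threshold tuning from the normal cluster's statistics.
public class ThresholdTuner {
    // Deduce an alert threshold as mean + sigmaMultiplier * standard deviation
    // of the normal cluster's response times.
    public static double deduceThreshold(double[] normalResponseTimesMs,
                                         double sigmaMultiplier) {
        double mean = 0;
        for (double t : normalResponseTimesMs) {
            mean += t;
        }
        mean /= normalResponseTimesMs.length;

        double variance = 0;
        for (double t : normalResponseTimesMs) {
            variance += (t - mean) * (t - mean);
        }
        variance /= normalResponseTimesMs.length;

        return mean + sigmaMultiplier * Math.sqrt(variance);
    }
}
```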

Once a system issue is identified, an email is sent to the system administrator with the server name, URL, timestamp, and node details collected from the system log files.


Moreover, a second-pass k-means clustering can be run on the temporal difference.

The model and training process above are basic methods for identifying outstanding events or issues in a system. Compared with most commercial monitoring tools, this model is not limited to an in-house application, an operating system, a middleware application server, or a database application. It is implemented as an independent microservice, utilizes existing system or application log files, and can be customized for many purposes.

A one-time k-means analysis is still a reactive monitoring method that sends notifications only when system exceptions are detected. Once we have collected the first-pass k-means analysis results, the chronological results can be stored in a repository as structured data and processed again by an offline analyzer. The result becomes much more interesting once we add the time factor into the clustering.

The clustering result fluctuates with the duration of the monitoring period. That is exactly what we need in order to predict potential system or application exceptions within a time segment. As in the first-pass analysis, the threshold and k value are adjusted while training the second-pass duration analysis.
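For this second pass, the time factor can simply become a second dimension of each point, so clusters separate events both by response time and by when they occur. The sketch below uses hour-of-day as the time dimension; that choice, and the class name, are assumptions for illustration:

```java
// Hypothetical construction of 2-D points for temporal clustering:
// {hour of day, response time in ms}. Clustering these groups events that
// are both slow and concentrated in a particular time segment.
public class TemporalPoints {
    public static double[][] toTemporalPoints(int[] hoursOfDay,
                                              double[] responseTimesMs) {
        double[][] points = new double[hoursOfDay.length][2];
        for (int i = 0; i < hoursOfDay.length; i++) {
            points[i][0] = hoursOfDay[i];
            points[i][1] = responseTimesMs[i];
        }
        return points;
    }
}
```

In practice, the two dimensions should be scaled to comparable ranges before clustering, otherwise the larger-valued response-time axis dominates the distance calculation.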


In this article, we built a distributed smart monitoring system for tracing system performance and exceptions. We utilized the machine learning k-means clustering algorithm to analyze collected log events and provide recommendations to system administrators on exceptions. The implementation is entirely open source, using SpringBoot microservices, Hazelcast IMDG, Apache Commons Math, and JFreeChart.

It is a simple, independent monitoring tool that can be used for system analysis from many perspectives. Performance monitoring and prediction are the basic steps of the model. It has an open structure and is well suited to analyzing the behavior of many environments and their hosted applications.

As a next step, we can use the same event publish/subscribe framework and adjust the machine learning algorithm to add policy-iteration learning for exploitation and exploration of existing system logs.

I hope you found this article helpful.

Further Reading

Cluster Analysis and Big Data: The Basics

How to Cluster Images With the K-Means Algorithm

Deciphering K-Means Algorithm


Opinions expressed by DZone contributors are their own.
