The Challenges of Monitoring Data-First Applications

What makes monitoring data applications, microservices, and Docker containers so difficult?

It seems that every few years, there is a paradigm shift in the way applications are built. Back when I started, the shift was from monolithic applications to 3-tier architectures. 3-tier evolved to n-tier, which then gave way to service-oriented architecture (SOA). From that point on, cloud-based architectures have continued their rapid pace of change. SOA led to microservices, and now, thanks to Docker and Mesos, we're in the midst of moving to containerized architectures. While Docker and Mesos represent a change in the infrastructure architecture, an even larger shift is happening at the same time in how applications are built around huge volumes of unstructured data.

Data and data processing frameworks, which often process data in real time, sit at the core of data-first applications. Data-first applications are primarily driven by new digital business use cases such as real-time fraud detection, personalization, and real-time ad networks. These applications give enterprises an opportunity to create new user experiences quickly, engaging users on multiple devices on a more rapid cycle than has ever been possible before.

These architectural paradigm shifts occur every few years. It takes a few years for the architecture and underlying technologies to gain momentum, developer support, and stability. However, the operational tooling and support invariably lag behind the architectural changes. Consider the data-first application architecture. Today, it is easier than ever to put together a very complex, real-time data processing pipeline. You can pump data into Kafka from your application servers, process it with Storm or Spark in real time, then move it into a data store like HDFS/S3, Cassandra, or ElasticSearch. Many of these technologies weren't usable in a production environment just a few years ago, but now you'll be hard pressed to find a company not building something with them. OpsClarity, in fact, runs five such pipelines, using many of these technologies, as a part of our overall monitoring platform. Meanwhile, the tooling to monitor and support these technologies is far behind. In fact, the whole mindset of how to monitor data-first production systems needs to catch up. Existing monitoring tools and methodologies simply don't work in this domain.

Let me explain.

We are building and managing systems with more complexity and interdependence than ever. Problems that manifest in one part of the system can often originate in a completely different place. Say you detect that your Storm job is reporting a drop in the throughput of data processed. Is the problem upstream, in Kafka or in the system that is feeding Kafka? Or is it downstream in ElasticSearch, where there's a problem writing data to the store? Or is there something wrong with the Storm job itself? All very real possibilities!
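
To make the upstream/downstream question concrete, here is a minimal Python sketch of how such a drop might be localized. It is only an illustration under stated assumptions: the `StageSample` type, the `localize_drop` function, the 20% tolerance, the 5% error threshold, and all the numbers are hypothetical, and no real Kafka, Storm, or ElasticSearch API is involved.

```python
from dataclasses import dataclass


@dataclass
class StageSample:
    """One pipeline stage's metrics (hypothetical shape, for illustration)."""
    name: str
    baseline_rate: float      # typical events/sec for this stage
    current_rate: float       # events/sec observed right now
    error_rate: float = 0.0   # fraction of failed operations (e.g. writes)


def dropped(stage, tolerance=0.2):
    """True if throughput fell more than `tolerance` below its baseline."""
    return stage.current_rate < stage.baseline_rate * (1 - tolerance)


def localize_drop(pipeline, observed, tolerance=0.2, max_error_rate=0.05):
    """Guess where a throughput drop seen at stage `observed` originates.

    `pipeline` is ordered from upstream to downstream.
    """
    names = [s.name for s in pipeline]
    idx = names.index(observed)  # raises ValueError for an unknown stage

    # 1. If an upstream stage is already starved, the problem is upstream.
    for stage in pipeline[:idx]:
        if dropped(stage, tolerance):
            return f"upstream: {stage.name}"

    # 2. If a downstream stage is failing, it may be causing back-pressure.
    for stage in pipeline[idx + 1:]:
        if stage.error_rate > max_error_rate:
            return f"downstream: {stage.name}"

    # 3. Otherwise the observed stage itself is the most likely suspect.
    return f"local: {observed}"


pipeline = [
    StageSample("app-servers", 1000, 990),
    StageSample("kafka", 1000, 985),
    StageSample("storm", 1000, 400),
    StageSample("elasticsearch", 1000, 400, error_rate=0.3),
]
print(localize_drop(pipeline, "storm"))
```

With these made-up numbers, Kafka is still receiving data at its normal rate while ElasticSearch is failing writes, so the sketch points downstream rather than at Storm itself.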



Even if you've isolated it to Storm, do you know if it's something wrong with your application code? Or is something wrong with the Storm cluster infrastructure? Or with the ZooKeeper cluster it depends on? Is it something wrong with one of the hosts that is a part of the Storm cluster, or with one of the Storm executor processes running your application code? Again, all very real possibilities! Can your monitoring solution help you figure this out?


The odds are that it can't, because most existing monitoring solutions were never designed to help you navigate such complexities. The problems we're dealing with here are a matter of correlation and context. Because such systems are so interdependent, it's important to correlate horizontally and understand how different parts of the pipeline relate to each other. And, because even individual components of the system are so complex, it's important to correlate vertically and understand how the application, the service infrastructure, and the system infrastructure affect each other. Existing monitoring tools tend to focus on collecting a whole bunch of data and presenting a whole bunch of data (usually in the form of graph dashboards), but they do a poor job of helping you navigate, much less understand, your data-first application, either vertically or horizontally.
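
One way to picture correlating in both directions is to treat the pipeline as a dependency graph and look for the deepest unhealthy dependency. The Python sketch below is an illustration only, not a description of any particular monitoring product: the `root_causes` function, the component names, and the health states are all hypothetical.

```python
def root_causes(node, depends_on, unhealthy, cache=None):
    """Return the unhealthy components that best explain trouble at `node`.

    A component counts as a root cause if it is unhealthy and nothing it
    depends on (directly or transitively) is unhealthy.
    """
    cache = {} if cache is None else cache
    if node in cache:
        return cache[node]
    cache[node] = set()  # placeholder so dependency cycles terminate

    causes = set()
    for dep in depends_on.get(node, ()):
        causes |= root_causes(dep, depends_on, unhealthy, cache)

    # Blame this node only if none of its dependencies explain the problem.
    if not causes and node in unhealthy:
        causes = {node}

    cache[node] = causes
    return causes


# Hypothetical graph: horizontal edges (pipeline neighbours) and vertical
# edges (application -> service infrastructure -> system infrastructure).
depends_on = {
    "storm-job": ["kafka", "elasticsearch", "storm-cluster"],
    "storm-cluster": ["zookeeper", "host-1", "host-2"],
}

# Hypothetical health states, as some monitoring agent might report them.
unhealthy = {"storm-job", "storm-cluster", "zookeeper"}

print(root_causes("storm-job", depends_on, unhealthy))
```

Here the Storm job and the Storm cluster both look unhealthy, but walking the graph attributes the problem to ZooKeeper. If only `storm-job` were marked unhealthy, the same walk would point back at the job itself, which suggests looking at the application code.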

Published at DZone with permission of Alan Ngai, DZone MVB.
