How to Use Grafana for Technical Monitoring in Software Products
Learn the basics of Grafana for visualizing metrics and other data on app performance.
Definition of Grafana
Grafana is an open-source platform for data visualization, monitoring, and analysis. Our company uses it together with Graylog to monitor the technical state of the software systems we use internally or build for our customers. Grafana lets users create dashboards made of panels, each showing specific metrics over a set time frame. Dashboards are highly configurable, so each one can be tailored to a specific project or to particular development and business needs.
We mostly use Grafana with Elasticsearch and InfluxDB, but it supports a variety of other data sources (Prometheus, MySQL, and PostgreSQL, to name just a few). For each data source, Grafana provides a customized query editor with source-specific syntax.
- A Panel is the basic visualization building block; it displays the metrics selected for it. Grafana supports graph, singlestat, table, heatmap, and freetext panels, and it integrates with official and community-built plugins (such as world map or clock) and apps. Each panel can be customized in style and format, and all panels can be dragged, dropped, resized, and rearranged.
- A Dashboard is a set of individual panels arranged on a grid, together with a set of variables (such as server, application, and sensor name). By changing variables, you can switch the data displayed in a dashboard (for instance, between two separate servers). All dashboards can be customized to the user’s needs. Grafana has a large community of contributors and users, so there is a large ecosystem of ready-made dashboards for different data types and sources.
- Dashboards can use annotations to mark certain events across panels. In our setup, an annotation is defined by a custom query to Elasticsearch and appears as a vertical red line on the graph. Hovering over an annotation shows the event description and tags, which lets you track, for instance, when a server responds with a 5xx error code or when the system restarts. This makes it easy to correlate a specific event with its time and its consequences in the application, and to investigate system behaviour.
Our Best Practices With Grafana
Grafana in Internal Projects
For our internal IoT solution for office weather monitoring, we connected Grafana to InfluxDB, a time series database, to visualize changes in office weather parameters and react to them accordingly. A set of sensors measures temperature, humidity, atmospheric pressure, and CO2 level in every zone of our office; these parameters are collected and visualized as Grafana graphs on a large kitchen monitor and online.
This way, we keep constant track of air quality parameters, and our Office Manager reacts to changes by opening windows when the CO2 level is too high and by turning the AC and humidifiers on and off.
Through Grafana-displayed time series graphs and annotations, we analysed trends in office weather changes over months and seasons. We also used the tool to visualize some useful widgets and pieces of information (weather forecast, currency exchange rates, internal calendars) on a large kitchen monitor.
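To make the sensor-to-InfluxDB step more concrete, here is a minimal sketch of how one reading per office zone could be serialized into InfluxDB line protocol before being written to the database. The measurement name `office_weather`, the `zone` tag, and the field names are assumptions made for illustration; the article does not describe the actual schema or the way the sensors write their data.

```python
from datetime import datetime, timezone


def escape_key(value: str) -> str:
    """Escape commas, equals signs and spaces, which are special in tags and field keys."""
    return value.replace(",", "\\,").replace("=", "\\=").replace(" ", "\\ ")


def to_line_protocol(zone: str, readings: dict, taken_at: datetime) -> str:
    """Serialize one set of sensor readings as a single line-protocol line.

    `office_weather` and the `zone` tag are hypothetical names, not the real schema.
    """
    tags = f"zone={escape_key(zone)}"
    # Sort the fields so the output is deterministic; store every value as a float.
    fields = ",".join(
        f"{escape_key(name)}={float(value)}" for name, value in sorted(readings.items())
    )
    # Line protocol timestamps default to nanosecond precision.
    timestamp_ns = int(taken_at.timestamp()) * 1_000_000_000
    return f"office_weather,{tags} {fields} {timestamp_ns}"


if __name__ == "__main__":
    moment = datetime(2024, 1, 15, 9, 30, tzinfo=timezone.utc)
    print(to_line_protocol("kitchen", {"temperature": 21.5, "co2": 640}, moment))
```

Keeping the zone as a tag and the measurements as fields would let a Grafana dashboard variable switch between zones, in line with the dashboard variables described earlier.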
How to Use Grafana in Custom Web Apps
Grafana + Graylog
We use Graylog to store and manage the logs of web applications and to monitor their performance in both the development and production phases. Grafana is the tool that “translates” the logs stored in Graylog into visual form for analytical and system monitoring purposes. For one of our ongoing projects, Grafana can figuratively be called a UI for the web application’s load, performance, and customer flow. Graylog and Grafana exist independently of each other, and we built no custom or complex integration to connect them. Since Graylog stores all log data in Elasticsearch, which is one of Grafana’s data sources, we simply point Grafana at the Elasticsearch index where the logs are stored.
Visualizing Metrics in Grafana for Web Applications
Pure text logs or error notifications are not “interesting” to Grafana as its main purpose is to visualize the data in graphs, charts, and tables. We wrote a custom module for Django to collect the data we’d like to track for every web/worker request and response processed. It is not just the success/failure status but a set of structured fields (both general and project-specific), such as
- App version
- Unique ID of every request
- Response time and status
- Error code (if any)
- IP address from where the request was sent
- User info (e-mail, username for registered users, role, permissions)
Django pushes custom structured analytical records into Graylog, which stores them in a separate stream. Though these data could be visualized by native Graylog dashboards, those are not as good-looking as Grafana’s. So we have Grafana read these analytical data and visualize them. This way, we keep track of application performance and load both in real time and in retrospect.
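The structured record described above could look roughly like the sketch below. It assumes the record is shaped as a GELF 1.1 message (Graylog's native log format, where custom fields carry a leading underscore); the article does not say which transport or field names the custom Django module actually uses, so `build_gelf_record` and every `_field` name here are hypothetical.

```python
import json
import socket
import time
import uuid


def build_gelf_record(app_version, status, response_ms, client_ip,
                      user_email=None, error_code=None, request_id=None):
    """Build one structured analytics record as a GELF 1.1 dictionary.

    All additional (underscore-prefixed) field names are illustrative assumptions.
    """
    record = {
        "version": "1.1",                      # required by GELF
        "host": socket.gethostname(),          # required by GELF
        "short_message": f"request finished with status {status}",
        "timestamp": time.time(),              # seconds since the epoch
        "level": 3 if status >= 500 else 6,    # syslog levels: 3 = error, 6 = info
        "_app_version": app_version,
        "_request_id": request_id or str(uuid.uuid4()),
        "_response_ms": response_ms,
        "_status": status,
        "_client_ip": client_ip,
    }
    # Optional fields are added only when present, so queries can test for existence.
    if user_email is not None:
        record["_user_email"] = user_email
    if error_code is not None:
        record["_error_code"] = error_code
    return record


if __name__ == "__main__":
    # In a real project this would be called from middleware and sent to a Graylog input.
    print(json.dumps(build_gelf_record("1.4.2", 200, 87.3, "203.0.113.7",
                                       user_email="user@example.com")))
```

Numeric fields such as the response time and status are kept as numbers rather than strings, so that Grafana can aggregate them once they land in Elasticsearch.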
Grafana as a Debugging Tool
Primarily, Grafana dashboards help us debug the application. If the end customer reports a problem, Grafana gives us a way to distinguish between errors on the customer or server side and real bugs or loopholes in the application logic. We track all web requests initiated by the customer (using their e-mail address), the app admins, and the application itself within a given time slot, and find the cause by elimination.
We also debug and fix bugs when we notice an anomaly on the dashboard with application load and performance graphs. The following example of a Grafana graph shows the response time to web requests during a certain time frame. For every web request, we can track the maximum, minimum, and average response time. If we see a request that took too long to process, we can zoom in on that part of the graph and investigate the issue.
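The max, min, and average series on such a graph come from grouping requests into time buckets and aggregating each bucket. In practice the data source does this work when Grafana queries it; the sketch below only mirrors that computation in plain Python to show what each point on the graph represents. The function name and the `(timestamp, response_ms)` input shape are assumptions for illustration.

```python
from collections import defaultdict
from statistics import mean


def summarize_response_times(samples, bucket_seconds=60):
    """Group (unix_timestamp, response_ms) samples into fixed-width time buckets.

    Returns a dict mapping each bucket's start time to its min, max and average.
    """
    buckets = defaultdict(list)
    for timestamp, response_ms in samples:
        # Round the timestamp down to the start of its bucket.
        bucket_start = int(timestamp // bucket_seconds) * bucket_seconds
        buckets[bucket_start].append(response_ms)
    return {
        start: {"min": min(values), "max": max(values), "avg": mean(values)}
        for start, values in sorted(buckets.items())
    }


if __name__ == "__main__":
    samples = [(0, 100), (30, 300), (61, 50)]
    for start, stats in summarize_response_times(samples).items():
        print(start, stats)
```

A single slow request barely moves the average of a busy bucket but shows up immediately in the max series, which is why tracking all three values helps when hunting for outliers.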
Another graph shows system load over a set time frame and is useful for traffic tracking. If we see an unusual spike in activity on the graph, e.g. during non-business hours or on weekends, we investigate it. It could be caused, for instance, by Google crawlers indexing the website content or by malicious bots scanning our system for vulnerabilities. Again, each case is investigated and addressed accordingly.
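We spot such spikes by eye on the graph, but the same reasoning can be expressed as a rough check. The sketch below is an illustration only, not part of our setup: it flags off-hours buckets whose request count is several times the off-hours median. The business hours (09:00 to 18:00, Monday to Friday) and the threshold factor are assumed values.

```python
from datetime import datetime
from statistics import median


def is_off_hours(moment, open_hour=9, close_hour=18):
    """True on weekends or outside the assumed business hours."""
    return moment.weekday() >= 5 or not (open_hour <= moment.hour < close_hour)


def find_off_hours_spikes(hourly_counts, factor=3.0):
    """Return the off-hours buckets whose count exceeds `factor` times the median.

    hourly_counts maps the start of each hour (datetime) to a request count.
    """
    off_hours = {m: c for m, c in hourly_counts.items() if is_off_hours(m)}
    if not off_hours:
        return []
    # Use the median as a baseline so that the spikes themselves do not inflate it.
    baseline = max(median(off_hours.values()), 1)
    return sorted(m for m, c in off_hours.items() if c > factor * baseline)


if __name__ == "__main__":
    counts = {
        datetime(2024, 1, 15, 2): 10,    # Monday night
        datetime(2024, 1, 15, 3): 12,
        datetime(2024, 1, 15, 4): 11,
        datetime(2024, 1, 13, 14): 200,  # Saturday afternoon
        datetime(2024, 1, 15, 10): 500,  # business hours, ignored
    }
    print(find_off_hours_spikes(counts))
```

A flagged bucket is only a starting point: whether it is a crawler, a bot, or legitimate traffic still has to be determined from the underlying logs.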
Grafana has a built-in alerting engine that sends notifications (e.g. by email or Slack) based on conditional rules. We do not use this feature, as we have all notifications configured on the Graylog side. However, some performance issues become visible only once the system is running, e.g. an unusually long response time to a web request. We would not receive a Graylog notification about this, yet the anomaly would be clearly visible in a Grafana graph. So the two tools go hand in hand when we learn about an issue: we check Grafana to understand what happened and why at a high level, then dig deeper in Graylog using the specific ID of a request.
Unlike Graylog, which we use for apps both under development and in production, Grafana is used only for apps in production. The only exception, when it is used for an app still in development, is performance testing: we emulate system load with JMeter, then check the Grafana dashboards to see how the system responds.
Grafana for Business Analytics
Apart from performance tracking and debugging, Grafana dashboards are a powerful tool for informed business decisions. When set up properly (preferably in tandem with Google Analytics), Grafana can visualize custom analytics on user behavior in the system in the form of pie charts, time bar graphs, and other graphics. Based on these, product stakeholders can make decisions on further scaling the application, adding or removing functionality, and improving the customer journey.
Since the above dashboard is more business-oriented, developers use it internally, more as a supplementary tool to keep abreast of customer flow in e-commerce applications: signups, logins, orders placed within a set time interval, etc.
Here are a couple of real project cases where Grafana helped to improve the usability of a web app.
- Via Grafana, we regularly monitor the status of recurring orders in the system and filter out the failed ones. These orders are subscription-based, which means they are generated in the system each month and the money is automatically withdrawn from the customers’ bank accounts. Sometimes payments fail (insufficient funds or a refusal by the financial institution), so system admins check the case and contact the client to re-generate the order manually. This way, no order is left behind, and both clients and vendors are satisfied.
- Using Grafana-generated reports for an eCommerce app, we found that a large percentage of new clients left at the Checkout page even though they already had products in their carts. This finding was backed up by Google Analytics reports, so the checkout procedure was analyzed step by step and improved; users can now complete an order in just a couple of mouse clicks. This increased the conversion rate and, consequently, the vendor’s profit.
Grafana is an important component of Logicify’s monitoring system for both internal and external projects. It is open-source software with a large and active community of contributors, but what we like most about it is its flexibility: it supports multiple data sources and allows easy customization of dashboards and panels.