I recently came across a blog post on OpenOpsIQ asking: “With the introduction of real-time logging, why can’t we have a single solution for monitoring the entire stack?” In my opinion, you can! A survey we recently carried out across a sample of Logentries’ 25,000+ users shows that organizations are starting to apply logs to a wider range of use cases. So why would you do this, and why does it make so much sense to use log data to build real-time dashboards that give different views into your system?
Here are a few reasons why I think creating dashboards from your logs is what I like to call a ‘no-brainer’:
- Logs already contain valuable data on your systems: Before you add any log events from your applications, your system components (operating systems, web servers, application servers, databases, load balancers, routers, firewalls, etc.) already produce log data that contains valuable information on performance, response times, who is accessing your system and from where, and so on. Adding your own log events from your software applications can give you the entire picture, but there’s already a lot available for free that you should take advantage of.
- Adding new log events is easy peasy: logging scales when you need to capture more data, and adding events doesn’t require any complex library integration. Simply add a few well-structured log events to gain more insight into your system. For example, when you release a new feature, add log events that capture its usage at the individual user level, then track those events to understand if, when and how the feature is being used. Some tips on how to better structure your log events can be found in this great post by Ryan Daigle entitled “5 steps to better application logging.”
- Logs are decoupled from your system: One of the beautiful simplicities of using your logs as data is that your system does not end up tightly coupled to your APM tool or web analytics solution. What do I mean by tightly coupled? If you are using an APM tool, for example, you generally have to integrate its monitoring libraries or agents into your system so that it is instrumented and the tool can capture system traces, performance metrics and resource usage information. This can not only affect your application’s performance but also means your application is essentially locked into that solution unless you are prepared to rip the library out of your application code. With logs, this isn’t the case. You simply log your events to disk or to syslog, for example, and then use a log management solution to extract and visualize the important data. If you decide you don’t like your logging provider, you can simply send your logs to another service or solution without ripping out any libraries or touching your application source code.
- You can visualize whatever data you add to your logs: With log data you are really only limited by your imagination; what you can use logs for depends on what you put into them. Internally at Logentries, a few things we use our logs for include tracking user sign-ups and feature usage, identifying performance threshold breaches, understanding system resource usage, tracking marketing campaigns via pixel tracking, visualizing total $$$ sales per day… the list goes on.
- Logs can be generated from every component and device in your stack: Logs are produced by components in every layer of your stack, so they can give a complete end-to-end view of your system. I recently wrote a blog post on how logs are particularly useful for getting visibility into cloud components that would otherwise be black boxes. In short, cloud services that you cannot instrument with traditional APM solutions still produce log data that you can use to see inside those components and services. Furthermore, you can now also capture logs in real time from your users’ web browsers or mobile devices, giving true end-to-end visibility of your application from the client device, through your middleware components, all the way to the database, so that you can also track events through complex stacks.
- Logs maintain the evidence: Finally, and most important of all in my opinion, dashboards based on log data have an important property that dashboards built with many other approaches lack: your logs maintain the evidence! This means that if there is a spike in the number of signups, or an increase in customers using a particular feature, you can quickly validate what caused that change. Validating your data can be particularly painful when using APM, web analytics tools or home-grown metrics dashboards.
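To make the points about adding well-structured log events and staying decoupled a little more concrete, here is a minimal Python sketch that uses only the standard `logging` module. It assumes a simple key=value line format (one reasonable convention, not a Logentries requirement); the `kv_event` and `build_logger` helpers and the `feature_used` event name are hypothetical.

```python
import logging
import sys


def kv_event(event, **fields):
    """Render an event as one key=value line (hypothetical format)."""
    parts = [f"event={event}"]
    for key, value in sorted(fields.items()):
        text = str(value)
        # Quote values containing spaces so the line stays easy to parse later.
        if " " in text:
            text = '"' + text.replace('"', '\\"') + '"'
        parts.append(f"{key}={text}")
    return " ".join(parts)


def build_logger(stream):
    """Create an event logger; the handler is the only destination-specific part."""
    logger = logging.getLogger("app.events")
    logger.setLevel(logging.INFO)
    logger.handlers.clear()  # avoid duplicate handlers if called twice
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
    logger.addHandler(handler)
    logger.propagate = False
    return logger


if __name__ == "__main__":
    log = build_logger(sys.stdout)
    # Capture per-user usage of a newly released feature.
    log.info(kv_event("feature_used", feature="export_csv", user_id=42))
```

Because the application only talks to the standard `logging` API, changing where events go means swapping the handler, e.g. `logging.FileHandler("events.log")` to write to disk or `logging.handlers.SysLogHandler(address="/dev/log")` to reach a local syslog daemon on Linux, without touching the code that emits the events.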
When trying to validate metrics with other monitoring approaches the process usually goes as follows:
- you see a sudden spike in one of your metrics, for example signups are up 200% from yesterday.
- knowing that there is no such thing as a free lunch, and that you didn’t kick off any new marketing campaigns recently, you wonder how signups could have increased so sharply.
- you ask one of your developers, who is responsible for building your home-grown metrics dashboard, to check this out and figure out what was responsible for the change.
- the developer is pretty busy and comes back a day or two later, after checking the code and one or two databases, explaining that you’ve been spammed and that the increase in signups was the result of a spammer signing up for a bunch of accounts.
- you knew there was no such thing as a free lunch, but it’s two days later and you’re a little frustrated at having waited so long for an answer.
If you’ve created your dashboard from your logs the process looks more like this:
- you see a sudden spike in one of your metrics, for example signups are up 200% from yesterday
- knowing that there is no such thing as a free lunch, and that you didn’t kick off any new marketing campaigns recently, you click on the spike to drill down into the log data (i.e. the evidence), look at the list of people who signed up today, and immediately see the same email address pattern over and over again: someone has been spamming you. QED
- No search required: Our new dashboard is available out of the box and requires no setup and no complex search queries on your data. It gives you an immediate view of important trending events and data volumes from your different systems.
- Track event volume and identify trends in important events: The dashboards show the volume of data from your different components, the distribution of these events over time, and how the important events you have tagged are trending across your systems.
- Drill down to view the evidence: The dashboards are completely clickable, so you can easily spot spikes and trends in your data and drill down into the underlying logs to validate them and understand their root cause.
- Share insights across your team(s): Because they are designed to be easy to use and don’t require complex search queries to build, the dashboards make it easy to share insights about your systems with different teams in your organization, such as development, test, support, devops, product… and more. As I said above, what you can see is really only limited by the data you capture in your logs.
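The drill-down workflow described above is what a log-based dashboard does for you, but the underlying idea is simple enough to sketch. The minimal Python example below assumes signups are logged one per line as an ISO-8601 timestamp followed by key=value pairs (a hypothetical format). The function names, the sample data and the “at least double the previous day” spike rule are illustrative choices, not how Logentries implements its dashboards. It builds the per-day series a chart would plot, flags the spike, and returns the raw lines behind it, which is where the evidence lives.

```python
import re
from collections import Counter

# Assumed (hypothetical) line layout: an ISO-8601 timestamp followed by
# space-separated key=value pairs, e.g.
#   2024-05-02T08:01:00Z event=signup email=x1@spam.test
LINE_RE = re.compile(r"^(?P<date>\d{4}-\d{2}-\d{2})T\S*\s+(?P<body>.*)$")
PAIR_RE = re.compile(r"(\w+)=(\S+)")

SAMPLE = [
    "2024-05-01T09:12:00Z event=signup email=ann@example.com",
    "2024-05-01T14:30:00Z event=signup email=bob@example.org",
    "2024-05-02T08:01:00Z event=signup email=x1@spam.test",
    "2024-05-02T08:01:05Z event=signup email=x2@spam.test",
    "2024-05-02T08:01:09Z event=signup email=x3@spam.test",
    "2024-05-02T08:01:14Z event=signup email=x4@spam.test",
    "2024-05-02T11:45:00Z event=signup email=cat@example.com",
    "2024-05-02T12:00:00Z event=feature_used feature=export_csv",
]


def parse(line):
    """Return (date, fields) for a matching line, or None otherwise."""
    match = LINE_RE.match(line)
    if match is None:
        return None
    return match.group("date"), dict(PAIR_RE.findall(match.group("body")))


def daily_counts(lines, event):
    """Count `event` per calendar day: the series a dashboard would plot."""
    counts = Counter()
    for line in lines:
        parsed = parse(line)
        if parsed and parsed[1].get("event") == event:
            counts[parsed[0]] += 1
    # ISO dates sort chronologically as plain strings.
    return dict(sorted(counts.items()))


def find_spikes(counts, factor=2.0):
    """Flag days with at least `factor` times the previous listed day's count.

    Days with no events are absent from `counts`, so each day is compared
    with the most recent earlier day that had events.
    """
    days = list(counts)
    return [day for prev, day in zip(days, days[1:])
            if counts[day] >= factor * counts[prev]]


def evidence(lines, event, day):
    """Drill down: return the raw log lines behind one point on the chart."""
    result = []
    for line in lines:
        parsed = parse(line)
        if parsed and parsed[0] == day and parsed[1].get("event") == event:
            result.append(line)
    return result


def email_domains(lines):
    """Tally e-mail domains in the evidence to expose repeated patterns."""
    domains = Counter()
    for line in lines:
        parsed = parse(line)
        if parsed and "email" in parsed[1]:
            domains[parsed[1]["email"].rpartition("@")[2]] += 1
    return domains


if __name__ == "__main__":
    counts = daily_counts(SAMPLE, "signup")
    for day in find_spikes(counts):
        raw = evidence(SAMPLE, "signup", day)
        print(day, counts[day], email_domains(raw).most_common(1))
```

On the sample data, the flagged day’s evidence is dominated by a single throwaway domain, the same kind of pattern as in the signup example. In practice the lines would come from a log file or a log service export rather than a hard-coded list.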