On the Merits of PubSub and Workflows
Learn more about Sensu, why we migrated, and why you should, too!
Not too long ago, someone in the Sensu Community Slack asked, “Why Sensu instead of Nagios?” More specifically: “How do I convince my boss to choose Sensu over Nagios?” I responded in the thread, but decided the answer was worth sharing with the wider community. At Willis Towers Watson, we moved from Nagios to Sensu 1.2 almost a year ago (and we’re now upgrading to Sensu Go). In this post, I’ll share what we learned, why we migrated, and why you should, too.
Matching modern technologies with the pubsub model
First of all, Sensu uses the pubsub (AKA “publish-subscribe”) model for subscription checks: because Sensu communicates over a message bus, messages are published to a specific “topic,” and consumers subscribe to one or more topics. This is in stark contrast to Nagios, where, to manage which checks run where and who gets alerted on what, you have to set up command blocks, configure each host to use service templates, or write some sort of config management. In other words, Nagios is very manual, whereas Sensu is built to make this easy to customize. With Sensu, you define all your checks (commands) in JSON files and assign each one to a subscription name; after that, all you need to do is bootstrap the sensu-agent on the remote host and assign it to the subscriptions you want it to use.
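To make the pubsub idea concrete, here is a minimal in-memory sketch of topic-based scheduling. It is not Sensu's transport or scheduler; the check names, commands (disk_check, check_nginx, check_iis), and agent names are hypothetical. It only shows the key property: checks are published to subscription topics, and each agent receives exactly the checks for the topics it subscribed to, with no per-host service wiring.

```python
from collections import defaultdict

# Hypothetical checks, each tagged with the subscription topics it should run on.
CHECKS = {
    "disk_usage": {"command": "disk_check -w 80 -c 90", "subscriptions": ["linux"]},
    "nginx_status": {"command": "check_nginx", "subscriptions": ["web"]},
    "iis_status": {"command": "check_iis", "subscriptions": ["windows"]},
}


class Bus:
    """Minimal in-memory publish/subscribe bus: topic name -> subscribed agents."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, agent, topics):
        # An agent declares its topics once, at bootstrap time.
        for topic in topics:
            self._subscribers[topic].append(agent)

    def publish(self, topic, message):
        # Deliver the message to every agent subscribed to the topic; report who got it.
        return [(agent, message) for agent in self._subscribers[topic]]


def schedule(bus, checks):
    """Publish every check request to its topics and collect the checks each agent receives."""
    work = defaultdict(set)
    for name, check in checks.items():
        for topic in check["subscriptions"]:
            for agent, message in bus.publish(topic, name):
                work[agent].add(message)
    return {agent: sorted(names) for agent, names in work.items()}


bus = Bus()
bus.subscribe("web01", ["linux", "web"])
bus.subscribe("win01", ["windows"])
print(schedule(bus, CHECKS))
# {'web01': ['disk_usage', 'nginx_status'], 'win01': ['iis_status']}
```

Adding a new host is just another subscribe() call; none of the check definitions change.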
Why does this matter? Legacy monitoring tools like Nagios were designed for infrastructure that rarely changed, and that assumption increasingly no longer holds. With Infrastructure as Code (IaC) and container- and cloud-based infrastructure becoming the norm, keeping monitoring up to date and cleaning up stale services and devices has become a heavy burden for many companies.
Now, we can easily spin up new infrastructure and — with Sensu — programmatically assign subscriptions to known checks.
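As a rough illustration of assigning subscriptions programmatically at provisioning time, the sketch below builds a Sensu 1.x-style client definition (a "client" object with name, address, and subscriptions) from a host's roles. The role names and the role-to-subscription mapping are hypothetical; in practice this data would come from your IaC tooling, and the output path mentioned in the comment is an assumption.

```python
import json

# Hypothetical mapping from an infrastructure role (e.g. an IaC tag) to Sensu subscriptions.
ROLE_SUBSCRIPTIONS = {
    "web": ["linux", "web"],
    "db": ["linux", "postgres"],
}


def client_config(name, address, roles):
    """Build a Sensu 1.x-style client definition whose subscriptions follow the host's roles."""
    subscriptions = sorted({sub for role in roles for sub in ROLE_SUBSCRIPTIONS.get(role, [])})
    return {"client": {"name": name, "address": address, "subscriptions": subscriptions}}


if __name__ == "__main__":
    # A provisioning step could write this out as the host's client.json.
    print(json.dumps(client_config("web01", "10.0.0.5", ["web", "db"]), indent=2))
```

Because subscriptions are derived from roles, a freshly built host picks up the right checks without anyone editing monitoring config by hand.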
In traditional monitoring, we’d watch for an event and take an action based on it, all of which is extremely manual, requires constant vigilance, and in turn contributes to alert fatigue. Approaching monitoring as a workflow can help: with a set of building blocks in place, you can design your own workflow and automate it. Sensu builds on this concept with several tools, including check hooks, mutators, and handlers, each of which plays a specific role:
- Check hooks in Sensu 1.x are a great way to gain additional context into an event. By adding a hooks key to your check JSON, you can run any arbitrary command on the remote host to gather more information. For example, say you're using a disk_check plugin. This plugin takes two arguments, warning and critical, and runs a WMI call or Linux subsystem command to report the available space on your drives. You can set up a hook that says: for any non-zero status, or specifically for status code 2 (critical), run another command, e.g., du -a / | sort -n -r | head -n 5, which lists the five largest files and directories under a path, sorted largest first. Learn more about check hooks in the Sensu docs. Bonus: Sensu Go makes check hooks even easier by making them a reusable block that can be shared between checks!
- Mutators are interesting because they can inspect the event data and change the event payload based on parameters you set up. Say that, for a particular keepalive failure or disk space event, you need a different team to handle it, and you want to change which handler receives the event. Or say you’re using PagerDuty and want to automatically move an event to a lower priority based on a custom status code set in your check script/plugin. All of this becomes possible with mutators. This also gives you the ability to obfuscate data in your event or change the output data block in any way; for example, you can sanitize data before sending it to an external service, or add context by running server-side infrastructure queries that the client doesn’t need to run locally. Learn more about mutators in the Sensu docs.
- Handlers are where workflows live in Sensu. A handler's job is to take an event (the check definition specifies which handler to use) and route the payload to its final destination. I've rewritten a custom PagerDuty handler in Python where, based on a key in the check definition, I can:
  - Tell the handler which PagerDuty integration key (PagerDuty service) to route the event to.
  - Choose not to forward an event based on how many times it has occurred within a timeframe, to help mitigate alert fatigue.
  - Inspect the status code and choose to send an email instead of an alert.
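To make the disk-space check hook from the list above concrete, here is a sketch of a Sensu 1.x-style check definition with a hooks key, expressed as a Python dict, plus a small function that picks which hook applies to an exit status. The "critical" key with command and timeout reflects my understanding of the 1.x hook format; the disk_check command and its -w/-c thresholds are hypothetical, and the lookup order in select_hook is an assumption for illustration, not Sensu's documented precedence.

```python
import json

# Sensu 1.x-style check definition. "disk_check" and its -w/-c thresholds are hypothetical.
CHECK = {
    "checks": {
        "disk_check": {
            "command": "disk_check -w 80 -c 90",
            "subscriptions": ["linux"],
            "interval": 60,
            "hooks": {
                # Gather context only when the check goes critical (exit status 2).
                "critical": {"command": "du -a / | sort -n -r | head -n 5", "timeout": 30},
            },
        }
    }
}

SEVERITIES = {0: "ok", 1: "warning", 2: "critical", 3: "unknown"}


def select_hook(hooks, status):
    """Return the hook to run for an exit status, or None.

    Lookup order (exact code, then severity name, then "non-zero") is an assumption
    made for this sketch.
    """
    for key in (str(status), SEVERITIES.get(status)):
        if key in hooks:
            return hooks[key]
    if status != 0:
        return hooks.get("non-zero")
    return None


if __name__ == "__main__":
    print(json.dumps(CHECK, indent=2))
```

The hook output travels with the event, so whoever gets paged already sees which files are eating the disk.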
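For the mutator bullet above, a minimal sketch of a pipe-style mutator: it reads a Sensu 1.x event as JSON, redacts credentials from the check output, and tags a priority based on a custom exit code. The event["check"]["output"] and event["check"]["status"] fields follow the 1.x event shape as I understand it; the exit code 4, the "priority" field, and the redaction pattern are hypothetical choices, not anything Sensu defines.

```python
import json
import re
import sys

LOW_PRIORITY_STATUS = 4  # hypothetical custom exit code meaning "handle, but don't page"
SECRET = re.compile(r"(password|token)=\S+", re.IGNORECASE)


def mutate(event):
    """Return a copy of the event with secrets redacted and a priority hint added."""
    check = dict(event.get("check", {}))
    # Sanitize output before it leaves for an external service.
    check["output"] = SECRET.sub(r"\1=REDACTED", check.get("output", ""))
    # "priority" is a hypothetical field a downstream handler might read.
    check["priority"] = "low" if check.get("status") == LOW_PRIORITY_STATUS else "high"
    return {**event, "check": check}


def main():
    # A pipe mutator gets the event as JSON on stdin and writes the mutated payload to stdout.
    json.dump(mutate(json.load(sys.stdin)), sys.stdout)
```

To use it as a mutator, the script's entry point would call main(), and the mutator definition would point at the script.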
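The three handler options listed above can be sketched as one routing function. This is not my actual PagerDuty handler and it does not call PagerDuty; it only returns a decision. The pagerduty_integration_key check attribute, the thresholds, and the placeholder key are hypothetical, and "occurred within a timeframe" is approximated here with the event's occurrences counter.

```python
import json
import sys

DEFAULT_INTEGRATION_KEY = "REPLACE-ME"  # placeholder, not a real PagerDuty key
MAX_OCCURRENCES = 3                      # hypothetical: stop paging after this many repeats
EMAIL_STATUSES = {1}                     # hypothetical: warnings go to email, not PagerDuty


def route(event):
    """Decide what a PagerDuty-style handler should do with one Sensu event.

    Returns (action, integration_key) where action is "resolve", "drop", "email" or "page".
    """
    check = event["check"]
    status = check["status"]
    # The check definition picks the PagerDuty service via a (hypothetical) custom key.
    key = check.get("pagerduty_integration_key", DEFAULT_INTEGRATION_KEY)
    if status == 0:
        return ("resolve", key)
    # Suppress repeats to mitigate alert fatigue.
    if event.get("occurrences", 1) > MAX_OCCURRENCES:
        return ("drop", None)
    # Low-severity statuses become an email rather than a page.
    if status in EMAIL_STATUSES:
        return ("email", None)
    return ("page", key)


def main():
    # Pipe handlers receive the event JSON on stdin.
    action, key = route(json.load(sys.stdin))
    print(json.dumps({"action": action, "integration_key": key}))
```

Keeping the decision in a pure function makes it easy to unit-test before any real notifications are sent.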
As you can see, the options are numerous. One more thing to note: Sensu has a concept of handler sets, where multiple handlers are defined in a list and act on the event data together. In Sensu 1.x, however, they don't fire in the order they appear in the list. A community plugin, sensu-handlers, solves this: it lets you define handlers that act as remediation steps, with much more power and flexibility than hooks because they can work within the bounds of the event data. For example: if the status code is 3, use handler A; if handler A succeeds, change the status code to 0; otherwise, change the status code to 4 and move on to the next handler. Learn more about handlers in the Sensu docs.
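The ordered remediation flow described above can be sketched in a few lines. This is a hypothetical Python illustration, not the sensu-handlers plugin's code or configuration: handlers run strictly in list order, each fires only when the current status matches its trigger, success sets the status to 0, and failure sets an escalation code (4, as in the example) so the next step can take over. restart_service and page_oncall are hypothetical stand-ins.

```python
from typing import Callable, Dict, List, Tuple

# A handler takes the event and reports whether it succeeded.
Handler = Callable[[Dict], bool]

ESCALATE_STATUS = 4  # custom code meaning "previous remediation failed"


def run_chain(event: Dict, steps: List[Tuple[int, Handler]]) -> Dict:
    """Run handlers strictly in list order, each only when the status matches its trigger.

    Success sets the status to 0 and stops the chain; failure sets ESCALATE_STATUS
    so a later step keyed on that code can take over.
    """
    event = {**event, "check": dict(event["check"])}  # don't mutate the caller's event
    for trigger_status, handler in steps:
        if event["check"]["status"] != trigger_status:
            continue
        if handler(event):
            event["check"]["status"] = 0
            break
        event["check"]["status"] = ESCALATE_STATUS
    return event


def restart_service(event: Dict) -> bool:
    return False  # hypothetical remediation that fails


def page_oncall(event: Dict) -> bool:
    return True  # hypothetical escalation that succeeds


result = run_chain({"check": {"status": 3}}, [(3, restart_service), (ESCALATE_STATUS, page_oncall)])
print(result["check"]["status"])  # 0
```

Making order explicit in the data, rather than relying on how a handler set happens to fire, is what turns a list of handlers into a workflow.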
I hope this helps answer at least part of the question of why to use Sensu versus Nagios. Ultimately, it’s all about finding a monitoring solution that can keep up with modern infrastructure and lets you customize your workflows. Questions or comments? Hit me up in the Sensu Community Slack: @Darth Scrumlord.
Published at DZone with permission of Jason Anderson, DZone MVB.