At a certain point in the growth of the DataSift platform our network stopped being something that silently worked away in the background whilst we concentrated on higher level problems. Soon enough TCP buffers, switch fabrics, MTU sizes, uplink subscription, global latency, jitter, router interfaces and BGP peers became everyday considerations. Whilst this is something an Operations team enjoys (one moment you’re pulling apart a TCP dump to evaluate something, an hour later you’re at the other end of the stack spinning up 200 more servers that’ll auto-provision themselves at the press of a key), it presented us with a problem: how should we monitor and respond to changes in the network?
DataSift’s Chief Architect, Lorenzo Alberton, recently presented a talk about intuitive dashboard design which covers the many ways in which we monitor our platform, but many of our fancy tools require modern tricks like 0MQ, Graphite, MCollective or Ruby. Few (if any) of these tools consider the Simple Network Management Protocol (which some would call archaic) that we networkers have come to rely on.
I considered polling SNMP and pushing the data upstream into our monitoring / alerting stream to evaluate whether anything was wrong, but that would require another Chef role on yet another systems management server, it would require parsing the data and setting baselines, and it could introduce delay. For something as important as a downed supervisor, a link loss or a coldstart, I didn’t want to wait.
Looking around I found various pieces of software that would receive traps and pass the alert off to an SMS provider or email, but they were either paid-for offerings or required a Microsoft server or an on-premise device, and none of them really felt any better than just sending a trap to Zenoss and waiting for the SMS to come through (or using Rhybudd). There were lots of bells and whistles about dashboards, integrated reporting etc., but none of it addressed immediate alerting and it all felt a bit dated. Granted, calling software dated might sound a strange argument when discussing SNMP, whose first RFC was published in 1988, but I wanted to send alerts in a modern, fast and reliable way (the PaaS way, if you will) and I didn’t want to have to pay for it.
A few months later and I’m proud to announce: ColdStart.io
With an in-house fork of the net-snmp daemons, some backend servers, Google’s GCM infrastructure and an Android device, you can have your SNMP traps alerting on your phone less than a second after they are sent, for free, using nothing more than the standard SNMP trap configuration of your network devices.
Getting started: An Arista Example
- Download ColdStart from Google Play
- Start the app to create a new Authorisation key
- Paste the following into your Arista switch:
snmp-server host trap.coldstart.io informs version 2c AuthorisationKey
- Done. Your Android device will receive traps within a few hundred milliseconds of them being dispatched from the switch
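In a version 2c `snmp-server host` line the final argument is normally the community string, so the authorisation key presumably travels in the community field of every inform the switch sends. The sketch below shows how a receiver could pull that field out of a raw SNMP v1/v2c message using only the Python standard library. It is an illustration of the wire format, not ColdStart's actual server code: the function names `read_tlv` and `parse_header` are hypothetical, and the sample packet is hand-built with an empty InformRequest PDU purely for demonstration.

```python
def read_tlv(buf: bytes, pos: int):
    """Read one BER tag-length-value at pos; return (tag, value, next_pos)."""
    tag = buf[pos]
    length = buf[pos + 1]
    pos += 2
    if length & 0x80:
        # Long form: the low 7 bits give the number of length bytes that follow.
        n = length & 0x7F
        length = int.from_bytes(buf[pos:pos + n], "big")
        pos += n
    return tag, buf[pos:pos + length], pos + length


def parse_header(packet: bytes):
    """Return (version, community) from an SNMP v1/v2c message."""
    tag, body, _ = read_tlv(packet, 0)
    if tag != 0x30:
        raise ValueError("not a BER SEQUENCE")
    tag, version, pos = read_tlv(body, 0)
    if tag != 0x02:
        raise ValueError("expected INTEGER version")
    tag, community, _ = read_tlv(body, pos)
    if tag != 0x04:
        raise ValueError("expected OCTET STRING community")
    # On the wire, version 0 means SNMPv1 and version 1 means SNMPv2c.
    return int.from_bytes(version, "big"), community.decode("ascii")


# Hand-built sample: version 1 (v2c), community "AuthorisationKey",
# followed by an empty InformRequest PDU (tag 0xA6) as a placeholder.
community = b"AuthorisationKey"
body = bytes([0x02, 0x01, 0x01, 0x04, len(community)]) + community + bytes([0xA6, 0x00])
sample = bytes([0x30, len(body)]) + body

print(parse_header(sample))
```

A real inform carries variable bindings inside the PDU, which this sketch deliberately ignores; it only reads far enough to recover the key.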
The ColdStart SNMP trap servers have over 1000 MIBs deployed and a crowd-sourced database of MIB descriptions to ensure you receive meaningful versions of the trap instead of cryptic OIDs or abbreviated snippets of words. If we are missing any MIBs, let us know. If a trap description is wrong, you can edit it in-app and submit it back to the database for review.
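To make the idea concrete, here is a minimal sketch of the kind of lookup described above: a trap OID is first resolved to its MIB name, then to a human-written description, falling back gracefully when either is missing. The OIDs are the standard notifications from SNMPv2-MIB; the description texts, the two dictionaries and the `describe` function are hypothetical stand-ins, not ColdStart's real database or API.

```python
# Standard notification OIDs from SNMPv2-MIB (the snmpTraps subtree).
MIB_NAMES = {
    "1.3.6.1.6.3.1.1.5.1": "coldStart",
    "1.3.6.1.6.3.1.1.5.3": "linkDown",
    "1.3.6.1.6.3.1.1.5.4": "linkUp",
}

# Hypothetical crowd-sourced descriptions, keyed by MIB name.
DESCRIPTIONS = {
    "coldStart": "The device has rebooted and reinitialised itself.",
    "linkDown": "An interface has gone down.",
}


def describe(oid: str) -> str:
    """Turn a trap OID into the most readable text available."""
    oid = oid.lstrip(".")  # accept both ".1.3.6..." and "1.3.6..."
    name = MIB_NAMES.get(oid)
    if name is None:
        return oid  # no MIB loaded for this OID: show the raw OID
    text = DESCRIPTIONS.get(name)
    return f"{name}: {text}" if text else name


print(describe(".1.3.6.1.6.3.1.1.5.1"))
```

An OID with a loaded MIB but no submitted description (here `linkUp`) falls back to the bare name, which is the gap an in-app edit would fill.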
Elements of ColdStart (such as the Android client) are open-sourced on GitHub and you can find a great deal of other open source software written by or contributed by DataSift employees at http://datasift.github.io/