The first component of any security program should be an alert system. Alerts are typically the fastest and most effective way to be notified when something goes wrong so you can jump into action. But alerts also carry the stigma of being too noisy, throwing false positives, or requiring a lot of fine-tuning to get right. After all, a minor bug in the code that doesn’t affect end users isn’t the type of thing you should be woken up in the middle of the night for.
So, what is the best approach to setting up your threat alert processing in a way that can be realistically followed in the event of an actual incident? Check out some best practices below for security alerting that are covered in the Threat Stack Cloud Security Playbook.
Dodging the "Noise": How to Set Alert Severity Levels
When something unusual happens in your cloud environment, you want to be alerted so you can respond in a timely manner. But a bunch of noisy alerts on every anomalous behavior, including brief downtime, won’t do you any good either. You need consistently accurate alerts, and you need them to be packed with context so you can always make a quick decision on whether it is a true threat that requires action. In other words, you need a Goldilocks system: one that delivers not too few alerts, not too many, but just the right amount.
A mistake that many organizations make is trying to put too many alert levels into this system. In fact, the traditional security escalation process has more than seven levels (P0 – P7). While it may seem comforting to have this many levels laid out, the reality is that it won’t scale. That’s why we recommend having three types of alerts and corresponding processes: Critical, Warning, and Info/Audit/Log, depending on the severity of the threat.
Here’s what a simple, three-level escalation process should look like:
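As a rough illustration, the three levels could be modeled in code along the following lines. This is a minimal sketch, not Threat Stack's implementation: the `Severity` enum, the `route_alert` function, and the response attached to each level are all assumptions made for the example.

```python
from enum import Enum


class Severity(Enum):
    CRITICAL = 1  # needs immediate action
    WARNING = 2   # needs review, but not at 3 a.m.
    INFO = 3      # Info/Audit/Log: record only


# Hypothetical response for each level; replace with your own process.
RESPONSES = {
    Severity.CRITICAL: "page on-call responder",
    Severity.WARNING: "open ticket for next business day",
    Severity.INFO: "write to audit log",
}


def route_alert(title: str, severity: Severity) -> str:
    """Return the action to take for an alert of the given severity."""
    return f"{RESPONSES[severity]}: {title}"


print(route_alert("Root login from unknown IP", Severity.CRITICAL))
print(route_alert("Scheduled backup completed", Severity.INFO))
```

Keeping the mapping in one small table makes it easy to see, and to change, what each level actually triggers.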
Staying Out of the Weeds: Eliminating False Positive Alerts
In addition to a three-tiered alert escalation process similar to the one outlined above, you should be continually baselining what is "normal" for your systems to avoid false positives. To do this, choose a cloud security platform that can aggregate historical data to build a baseline understanding of what constitutes "normal" versus "abnormal" activity on your servers. This is best done in an automated fashion, since manual baselining is simply too difficult in the world of Big Data, IoT, and BYOD, and with an ever-evolving threat landscape.
By understanding patterns in activity across your cloud environment, you can much more accurately determine what is and isn’t worth actively logging or monitoring.
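To make the idea of a baseline concrete, here is a minimal sketch that flags a measurement when it falls far outside its own history. It assumes a simple standard-deviation test, which is only one of many possible techniques and not necessarily what any particular platform uses; the function name `is_abnormal`, the threshold of 3, and the sample data are hypothetical.

```python
from statistics import mean, stdev
from typing import Sequence


def is_abnormal(history: Sequence[float], observed: float,
                threshold: float = 3.0) -> bool:
    """Flag `observed` if it lies more than `threshold` standard
    deviations away from the mean of `history`."""
    if len(history) < 2:
        return False  # too little data to form a baseline
    baseline = mean(history)
    spread = stdev(history)
    if spread == 0:
        # History is perfectly flat, so any change is a deviation.
        return observed != baseline
    return abs(observed - baseline) / spread > threshold


# Hypothetical hourly counts of SSH logins on one server.
logins_per_hour = [4, 5, 6, 5, 4, 6, 5, 5]
print(is_abnormal(logins_per_hour, 5))   # typical hour
print(is_abnormal(logins_per_hour, 40))  # sudden spike
```

A real deployment would likely account for time-of-day and day-of-week patterns, but the principle is the same: the alert threshold comes from observed history rather than from a hand-picked number.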
Back to Basics: Streamlining the Setup Process
You need a system of alerts that will get your attention when you need to take action on something, but you also don’t have all the time in the world to set up and fine-tune each alert.
In reality, the more you can streamline this process, the more time you will have to focus on responding.
Base rule sets are a good place to start and are baked into many products. They provide automatic alert levels based on what has been observed in other environments. For example, a base rule set can notify you if a new node is detected on the network, or if there are unauthorized configuration changes, new users, or changes to access rights. There is usually some room to turn individual rules on and off or adjust their severity to suit your organization, but they offer a basic framework to get started.
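The sketch below shows one way a base rule set with per-organization overrides might be represented. It is illustrative only: the rule names mirror the examples above, but the `Rule` class, the default severities, and the `apply_overrides` helper are assumptions rather than any product's actual format.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Rule:
    name: str
    severity: str        # "critical", "warning", or "info"
    enabled: bool = True


# Hypothetical base rule set; severities are illustrative defaults.
BASE_RULES = {
    "new_node_detected": Rule("new_node_detected", "warning"),
    "unauthorized_config_change": Rule("unauthorized_config_change", "critical"),
    "new_user_created": Rule("new_user_created", "warning"),
    "access_rights_changed": Rule("access_rights_changed", "critical"),
}


def apply_overrides(base, overrides):
    """Return a copy of the base rules with organization-specific
    changes applied. Raises KeyError for an unknown rule name."""
    rules = dict(base)
    for name, changes in overrides.items():
        rules[name] = replace(rules[name], **changes)
    return rules


# Example: an autoscaling environment where new nodes are routine.
my_rules = apply_overrides(BASE_RULES, {
    "new_node_detected": {"enabled": False},
    "new_user_created": {"severity": "critical"},
})
print(my_rules["new_node_detected"])
print(my_rules["new_user_created"])
```

Expressing local changes as overrides leaves the base set untouched, so it stays clear which settings are defaults and which were deliberate choices for your environment.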
Implementing Your Cloud Security Alerting System
The best way to assess what your organization’s alerting system should look like is to develop a clear understanding of what constitutes a tier one, two, or three alert within your environment, as outlined above. Keep in mind that what one company designates as a tier one alert may not be one for you, so be sure to focus on what makes sense for your unique environment and use case. From there, select a cloud security solution that can automatically baseline activity and offer you a base rule set so you can spend less time configuring and more time acting on real issues.