Welcome back to my blogging adventure. I want to take a break today from walking through the cybersecurity architecture for an analytic-based incident response process to step back and look at the big picture of applying a mature analytic lifecycle to a cybersecurity analytic platform. I’ve had many conversations and read many articles recently debating what type of user interface is best, what the best types of visualizations are, and what the best way to interact with this type of platform is. Like folks arguing over which species of tree is best, I think we need to step back a bit and take in the beauty of the forest at large. Each tree has its place in the larger ecosystem, and it’s more about balance than about which monoculture is the best one.
One Size Doesn’t Fit All
Now, repeat after me: incident response is a (complex) process, not a product or platform. Our job as technologists is to assist in automating that process, not to create a new shiny dashboard. I believe Wolfgang Pauli would look at the debate over what is the best visualization or user interface and repeat his famous quip: “Not only is it not right, it is not even wrong.” The fallacy is believing that this complex process has a single user interface, a single user, or even a single goal to achieve through that new shiny dashboard chock full of your neat visualizations. To really understand the problem space we need to step back and look at the big picture – the analytic lifecycle.
Making the Unknown Known
The core problem we are trying to solve by applying analytic concepts to the cybersecurity space is to detect the unknown and make it known. So, what do I mean by the unknown? Within the incident response domain there are two major challenges to an efficient and automated response:
- Our ability to detect
- Our ability to respond
If we plot these two challenges as the axes of our problem graph, from “I don’t have a clue” (unknown) to “I have an automated solution” (known), we can graph our relative ability against each threat vector (motive, method, opportunity) into a solution space.
Detect and Respond Graph
Now, if we plot where our traditional rules/signature based security tools work along with the risk of new emerging targeted attacks we generally look like the below. Yes, your graph will look different; however, the general point is that there is little, if any, overlap with our existing tools and where the risk lies.
Rules Versus Risk
Taking this approach further, and bringing us back to my main argument that there isn’t one dashboard or visualization that fits all, let’s decompose this graph by splitting each axis into unknown/known. This derives the four-panel graph below.
Unknown/Known Four Panel Graph
Now, we’re getting somewhere. Looking at this graph we see we have four high-level problems we are trying to solve:
- Unknown/Unknown: The first step in realizing that we have a problem is accepting that we may not have the answer. We may not have the right mental or computational models, or even the right data, to find bad things.
- Known/Unknown: We’ve invested time and energy brainstorming what could happen, sought out and collected the data we believe will help, and created mental and conceptual models that SHOULD detect/visualize these bad things. Now, we need to hunt and seek to see if we’re right.
- Unknown/Known: We’ve been hunting and seeking for some time tuning and training our analytical models until they can automatically detect this new bad thing. Now we need to spend some time formalizing our response process to this new use case.
- Known/Known: Great, we’ve matured this use case to a point that we can trust our ability to detect; maybe even to the point of efficient rules and signatures. We have mature response playbooks written for our SOC analysts to follow. Now we can feel comfortable enough to design and implement an automated response for this use case.
If you look carefully, you’ll see that there isn’t a single path from Unknown/Unknown to Known/Known but a cycle where each piece of new information starts the analytic lifecycle anew.
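The cycle described above can be sketched as a simple lookup. This is a toy illustration only: the stage labels and role names come from the four panels, but the `next_stage` helper itself is hypothetical.

```python
# The four panels of the analytic lifecycle, in the order a maturing
# use case moves through them. (Labels and roles from the post; the
# helper itself is a hypothetical illustration.)
LIFECYCLE = [
    ("Unknown/Unknown", "Data Science"),
    ("Known/Unknown", "Hunter/Seeker"),
    ("Unknown/Known", "SOC Analyst / Model Trainer"),
    ("Known/Known", "Automated Response"),
]

def next_stage(current: str) -> str:
    """Each piece of new information starts the lifecycle anew: the
    cycle wraps from Known/Known back to Unknown/Unknown."""
    labels = [label for label, _ in LIFECYCLE]
    return labels[(labels.index(current) + 1) % len(labels)]
```

Note that `next_stage("Known/Known")` wraps back around to `"Unknown/Unknown"`: there is no terminal state, only a loop.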
Analytic Lifecycle: Four Panels, Four Skills, Four Needs
Hopefully by now you see the logical fallacy of arguing over what is the best user interface or visualization. At a minimum, we need four different user interfaces, each with its own specific visualization and workflow needs.
The four roles of the analytic lifecycle
Unknown/Unknown: Data Science
The act of coming up with new mental and computational models is the role of the Cybersecurity Data Science Group. This group trolls the underground communities for chatter on new approaches and techniques to break into systems (methods), for new threats such as financial gain and hacktivism (motives), and for the ever-present feed of new vulnerabilities in our systems (opportunities). Using this information, domain experts (ethical hackers) come up with scenarios to leverage this new information to compromise our systems in new and unique ways. Individuals trained in statistical techniques, data engineering, and programming — Data Scientists — design and test new mental and computational models that may have the ability to detect these new techniques. This may involve creating and/or collecting new data to make the models work. While a model is still emerging (descriptive analytics), new ways of visualizing its output may need to be created so that humans (the hunter/seeker team) can verify and validate the utility of the new model and provide feedback to the Cybersecurity Data Science Group. As the models mature they shift from descriptive in nature to anomaly detection (predictive analytics); this drives the new requirement of shifting from batch/historical data modeling to stream-based models. These models are injected first into the hunter/seeker team for validation and then into the SOC analyst team for training.
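As a deliberately simplified illustration of the descriptive-to-predictive shift described above, here is a toy baseline-and-score model. The z-score approach and the names `fit_baseline` and `score` are my own sketch, not a technique the post prescribes:

```python
import statistics

def fit_baseline(history: list[float]) -> tuple[float, float]:
    # Descriptive step: batch modelling over historical data
    # summarizes what "normal" has looked like.
    return statistics.mean(history), statistics.pstdev(history)

def score(value: float, baseline: tuple[float, float]) -> float:
    # Predictive step: score a new streaming observation against
    # the baseline. Higher z-score = further from "normal".
    mean, stdev = baseline
    if stdev == 0:
        return 0.0
    return abs(value - mean) / stdev
```

For example, a baseline fit over hourly login counts of around 10 will give a sudden spike to 30 a very high score, while another reading of 11 scores near zero; a real model would be far richer, but the batch-fit/stream-score split is the point.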
Known/Unknown: Hunter/Seeker
Better known as the impossible task of digging through hayfields of haystacks for that bright new shiny needle of bad activity, the hunter/seeker group consists of domain experts (ethical hackers and senior SOC analysts) trained in statistical techniques who sift through the collected historical data, the output generated by the computational models, and raw data that has been enhanced and enriched, using the library of visualizations at hand. Their primary responsibilities are: providing feedback to the Cybersecurity Data Science Group on the quality of the models, input that is used to convert the descriptive analytic models into predictive analytic models that automate detection; providing feedback on visualization techniques that may make their jobs easier and more efficient; and shortcutting manual response as they trip over issues. They need a user interface that they can customize and adapt in an ever-evolving manner as new tools and techniques are discovered and applied in unexpected ways. They also need to be tied into the response workflow so they can group the data relevant to the shortcut response and inject it into the incident response workflow and case management process. The hunter/seeker team, working with the incident responders, creates the playbook that is injected into the incident response workflow knowledge base. As these predictive models mature, thresholds are created based on model scoring data that automate the determination of what is normal versus abnormal (prescriptive analytics); it is the prescriptive analytic result that is injected into the SOC Analyst triage process.
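That thresholding step, turning model scoring data into an automated normal-versus-abnormal determination, might look like this in miniature. The percentile approach and the `derive_threshold` name are illustrative assumptions, not a mandated method:

```python
def derive_threshold(normal_scores: list[float], percentile: float = 99.0) -> float:
    """Pick a cut-off so that only the rare tail of scores observed on
    known-normal activity would be flagged as abnormal (prescriptive
    analytics). A toy percentile rule; real systems would weigh false
    positive/negative costs."""
    ranked = sorted(normal_scores)
    k = int(round((percentile / 100.0) * (len(ranked) - 1)))
    return ranked[k]
```

The hunter/seeker team's validation work is what supplies the "known-normal" scores here; the cut-off they settle on is what flows downstream into the SOC Analyst triage process.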
Unknown/Known: SOC Analyst/Model Trainer
Once the Cybersecurity Data Science Group has created a streaming prescriptive analytic model for the new use case, the model is injected into the normal SOC triage process along with a customized visualization that enriches the model results with data that assists the SOC Analyst in triaging and training the model. This new visualization is tied to the emerging playbook from the knowledge base to assist and guide the SOC Analyst. The SOC analyst has one job: reviewing alerts and events in the queue. These events are different from the types of events that a SOC Analyst would review from a traditional rules-based tool. The prescriptive model is scoring data and determining whether the activity is normal or abnormal. Only the activity that the predictive model can’t confidently score is turned into an event for triage. As the SOC analyst triages these alerts, the result is leveraged by the machine learning model to train the scoring engine. As the model matures and becomes well trained, fewer events are created and more response automation is enabled.
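The triage-and-train feedback loop could be sketched as follows. `TriageTrainer`, its threshold-nudging update rule, and all parameter values are hypothetical simplifications of whatever real machine-learning model would sit behind the scoring engine:

```python
class TriageTrainer:
    """Hypothetical analyst-in-the-loop trainer: ambiguous scores go to
    the SOC queue, and analyst verdicts nudge the decision threshold."""

    def __init__(self, threshold: float = 0.5, lr: float = 0.05):
        self.threshold = threshold  # score above this => abnormal
        self.lr = lr                # how far each verdict moves the threshold

    def needs_triage(self, score: float, margin: float = 0.15) -> bool:
        # Only activity the model can't confidently score becomes an event.
        return abs(score - self.threshold) < margin

    def feedback(self, score: float, is_abnormal: bool) -> None:
        # The analyst's verdict trains the scoring engine: shift the
        # threshold toward correctly classifying this labelled example.
        if is_abnormal and score < self.threshold:
            self.threshold -= self.lr
        elif not is_abnormal and score > self.threshold:
            self.threshold += self.lr
```

As the threshold converges, fewer scores fall inside the ambiguous margin, so fewer events reach the queue: exactly the "fewer events, more automation" maturation described above.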
Known/Known: Automated Response
Once the prescriptive models are trained sufficiently to detect abnormal activity, the driving concern is automating response. This happens in two phases. The first phase is bypassing the SOC Analyst triage process and opening new incident response workflow cases for escalation and remediation when activity is scored as definitely abnormal. The second phase is engineering an automated, preventative response capable of reacting fast enough to stop the abnormal activity before it can cause harm. This second phase requires the ability to mock up and simulate the response action to determine both that it would be effective in preventing harm and that it would not disrupt normal business activity.
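The two phases can be expressed as a small gating function. The score thresholds and the `prevention_enabled` flag, standing in for "phase two is only switched on after simulation has validated it", are assumptions for illustration:

```python
def respond(score: float,
            case_threshold: float = 0.9,
            prevent_threshold: float = 0.99,
            prevention_enabled: bool = False) -> str:
    """Route a model score through the two automation phases."""
    if prevention_enabled and score >= prevent_threshold:
        return "block"      # phase 2: preventative response (post-simulation)
    if score >= case_threshold:
        return "open_case"  # phase 1: bypass triage, open an IR case
    return "triage"         # everything else stays in the SOC queue
```

With `prevention_enabled=False`, even a near-certain score only opens a case; flipping the flag on is the deliberate, simulation-backed step that grants the system authority to block.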
TL;DR (Too Long; Didn’t Read)
The debate over what is the best user interface or visualization suffers from the logical fallacy of assuming that a single user interface or visualization will suffice. “Not only is it not right, it is not even wrong.” – Wolfgang Pauli. By decomposing the problem, we see we have four areas, each with its own skills and user interface needs, all part of the analytic lifecycle.
Join me next time when I return to my cybersecurity architecture series, where we’ll look at the cybersecurity data bus and how Apache NiFi can be leveraged to simplify this complex problem.