Building a Big Data Architecture for Cyber Attack Graphs
Cybersecurity is a huge deal. It can mean the difference between your enterprise thriving or failing horribly. Read on for details on how to protect your "big data" by harnessing the power of the graph.
MITRE Corporation is a federally funded, non-profit company that manages seven national research and development laboratories around the country, including the Center for National Security, to address issues of cybersecurity.
To be successful at cybersecurity, analysts have to keep track of large amounts of detailed information. This includes examining and tracking network and endpoint vulnerabilities, reviewing firewall configurations to ensure vulnerable systems are not exposed, and tracking an ongoing deluge of intrusion detection events that necessitate responses.
To determine the appropriate response to an alert, a number of questions need to be answered:
- Is the threat legitimate?
- What does it really mean if an alert happens to be true?
- Is it related to a system that needs to be protected?
- Is it a system that could ultimately be used as a stepping stone to a critical service in my enterprise?
Data is continuously received from a variety of platforms and can be placed in a security information and event management (SIEM) system. A SIEM places all the data under one analytic umbrella where it can be queried, but the system only tracks individual data points.
The recent Target data breach was a very methodical campaign with multiple steps that took place over a month. The attackers got in through one of Target’s contractors, which received a security alert at the earliest stage of the attack but identified it as a false alarm. Had they been able to look at that event in a larger context and examine the potential repercussions of a breach in that area, they might have responded differently.
One of my favorite quotes about cybersecurity is from Steve Ragan of CSO Online: “…information exists without the means to process it in a way that’s meaningful…the little links between incidents, which on the surface look like random, meaningless threats, are often what cause the largest problems.”
In other words, it’s not the individual data points that are important, but how they are related. This suggests a graph model.
Starting Points for Cybersecurity Attack Analysis
Since 2001, first at George Mason and now at MITRE, we’ve been working to build a way to analyze and pull together all these relevant pieces of information into a graph model.
We’ve built a tool called Cauldron to analyze data in a way that helps prevent cyber attacks. It first takes a description of how a network is segmented and how those segments fit together, and then determines where the firewalls are located and the rules that are applied to each.
Next, it examines the connectivity at a logical level and looks at known vulnerabilities across the endpoints. Finally, it determines all the different ways in which an intrusion could be routed through the network, including which firewalls it would pass and the rules applied at each of those firewalls. Each of those source-destination pairs could be a single step in a potential multi-step, stepping-stone attack moving through your environment.
Cauldron maps these steps, exposes bottlenecks and shows how an attacker could navigate through the environment.
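As a rough illustration of this kind of analysis (not Cauldron's actual implementation), the sketch below enumerates stepping-stone paths over a small reachability map and reports the machines that every path passes through. The machine names, the `REACHABLE` map and both helper functions are assumptions made up for the example.

```python
from collections import Counter

# Hypothetical reachability: machine -> machines whose vulnerable services it can reach.
REACHABLE = {
    "internet": ["web"],
    "web": ["app1", "app2"],
    "app1": ["db"],
    "app2": ["db"],
    "db": [],
}

def attack_paths(graph, source, target, path=None):
    """Yield every cycle-free path from source to target (depth-first)."""
    path = (path or []) + [source]
    if source == target:
        yield path
        return
    for nxt in graph.get(source, []):
        if nxt not in path:  # never revisit a machine on the same path
            yield from attack_paths(graph, nxt, target, path)

def bottlenecks(paths):
    """Return the intermediate machines that appear on every path."""
    counts = Counter(m for p in paths for m in p[1:-1])
    return sorted(m for m, c in counts.items() if c == len(paths))

paths = list(attack_paths(REACHABLE, "internet", "db"))
for p in paths:
    print(" -> ".join(p))
print("bottlenecks:", bottlenecks(paths))
```

In this toy network both paths run through `web`, so hardening that one machine would cut off every route to `db`.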
Cauldron also allows you to constrain a graph to the points that you think are most vulnerable as well as the information you most want to protect, which is particularly helpful in a large environment:
Another kind of analytic provides a ranked list of exposed vulnerabilities and how frequently they’re exposed, which provides a starting point for addressing security issues:
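A ranking of this kind can be sketched with a simple frequency count. The exposure records and the `VULN-*` identifiers here are hypothetical placeholders, not output from the tool.

```python
from collections import Counter

# Hypothetical exposure records: (source machine, victim machine, vulnerability id).
EXPOSURES = [
    ("10.0.1.5", "10.0.2.7", "VULN-A"),
    ("10.0.1.6", "10.0.2.7", "VULN-A"),
    ("10.0.1.5", "10.0.2.8", "VULN-B"),
    ("10.0.1.6", "10.0.2.9", "VULN-A"),
]

def rank_exposures(exposures):
    """Count how often each vulnerability is exposed, most frequent first."""
    counts = Counter(vuln for _src, _dst, vuln in exposures)
    # Sort by descending count, then by id so ties come out in a stable order.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))

ranking = rank_exposures(EXPOSURES)
for vuln, count in ranking:
    print(f"{vuln}: exposed {count} time(s)")
```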
Examining firewall rules to pinpoint vulnerable services, by source and destination, is another great place to start. For example, the destination of one rule could be the source of another rule, so you can chain the rules into a graph. Regardless of whether there are actually known vulnerabilities on those services at each point, you could then postulate zero-day attacks along that graph.
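The rule-chaining idea can be sketched as follows. The rule format (`src`, `dst`, `port`) and the zone names are simplifying assumptions; real firewall rules carry far more detail.

```python
from collections import defaultdict

# Hypothetical, simplified rules: traffic allowed from a source zone to a destination zone.
RULES = [
    {"src": "internet", "dst": "dmz", "port": 443},
    {"src": "dmz", "dst": "app", "port": 8080},
    {"src": "app", "dst": "db", "port": 5432},
    {"src": "office", "dst": "app", "port": 22},
]

def build_rule_graph(rules):
    """Link rule i to rule j when the destination of i is the source of j."""
    by_source = defaultdict(list)
    for index, rule in enumerate(rules):
        by_source[rule["src"]].append(index)
    return {i: list(by_source.get(rule["dst"], [])) for i, rule in enumerate(rules)}

def reachable_zones(rules, start):
    """Zones reachable from `start` if every allowed service is assumed exploitable."""
    seen, frontier = {start}, [start]
    while frontier:
        zone = frontier.pop()
        for rule in rules:
            if rule["src"] == zone and rule["dst"] not in seen:
                seen.add(rule["dst"])
                frontier.append(rule["dst"])
    return seen - {start}

rule_graph = build_rule_graph(RULES)
print(rule_graph)
print(sorted(reachable_zones(RULES, "internet")))
```

Treating every allowed service as exploitable is the zero-day assumption from the text: the result is a worst-case reachability picture rather than a list of confirmed vulnerabilities.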
Even while we built this custom code, we didn’t have a database, relational or otherwise, as a backend; everything was in XML. As we developed our code, we had some predetermined notions regarding the type of analytics we were hoping to capture and the queries we were planning to run.
However, as our queries became more extensive, it became clear that they would require custom code, which is expensive. There were a number of things we wanted to run that we simply didn’t have time to code.
CyGraph: Data-Driven Architecture
The following is everything that should be considered when building a cybersecurity system:
Unfortunately, a lot of times the cybersecurity left hand doesn’t know what the right hand is doing. To be successful, however, you need to know what’s in your environment, how the environment is configured, how that configuration lends itself to a particular security posture, and how what you know about your environment maps to its potential vulnerabilities.
A focus on mission criticality (i.e., understanding what’s important to your mission and which IT or cyber assets support those mission functions) allows you to develop even better cybersecurity systems. Again, the idea is that you need to have an environment, a data model and a way to query and analyze all the information.
We have combined the lessons learned from Cauldron with newer technologies such as the Neo4j graph database to develop CyGraph, a small research project that has been under development for about a year.
Rather than designing a data model by building code to do the analysis, deciding the queries up front and coding only to that set of requirements, we built a generic data-driven architecture:
Capturing data in a very generic form and building the analytics on that generic pattern provides the flexibility to extend the data model, morph the data model and then morph the analytics which, in our case, are graph queries.
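To make the "generic form" concrete, here is a minimal in-memory property graph. It is only a sketch of the idea and not how CyGraph or Neo4j store data; the `PropertyGraph` class, its methods and the labels used are all assumptions for illustration.

```python
from dataclasses import dataclass, field

@dataclass
class PropertyGraph:
    """Generic store: nothing here knows about machines, alerts or firewalls."""
    nodes: dict = field(default_factory=dict)  # node id -> {"label": ..., **properties}
    edges: list = field(default_factory=list)  # (source id, type, target id, properties)

    def add_node(self, node_id, label, **props):
        self.nodes[node_id] = {"label": label, **props}

    def add_edge(self, src, rel_type, dst, **props):
        self.edges.append((src, rel_type, dst, props))

    def match(self, rel_type=None, src_label=None, dst_label=None):
        """Return edges that fit a pattern; None acts as a wildcard."""
        return [
            (s, r, d, p) for s, r, d, p in self.edges
            if (rel_type is None or r == rel_type)
            and (src_label is None or self.nodes[s]["label"] == src_label)
            and (dst_label is None or self.nodes[d]["label"] == dst_label)
        ]

g = PropertyGraph()
g.add_node("m1", "Machine", ip="10.0.1.5")
g.add_node("v1", "Vulnerability", score=7.5)
g.add_edge("m1", "HAS", "v1")
# Extending the model later only needs new data, not new code:
g.add_node("a1", "Alert", sensor="ids")
g.add_edge("a1", "TARGETS", "m1")
print(g.match(dst_label="Machine"))
```

Because labels and relationship types are plain data, adding alerts to the model required no change to the class or to `match`.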
But what do you do once you have graph queries? Part of the work is understanding how to capture the problem domain as a graph: What are the nodes, what are the relationships and what are the attributes that need to be captured? How do you formulate the queries that solve important analytic problems?
Narrowing Your Results
We’ve also spent a fair amount of time on graph visualization.
If you know the pattern you’re looking for, you may want to hard-code a query with a few parameters and then execute it. While we support this style, sophisticated analysts are going to need to do things that are outside the scope of the canned queries that have been coded. It’s important to provide the analyst with flexibility so they can perform exploratory ad hoc queries and pull the data they need.
What do you do with the information once it’s been returned by your query? If it’s a simple list of relationships, it can be placed in a table. However, if it returns inherently unpredictable graph patterns, visualization becomes an important component of the user experience.
Consider the below “attack graph” developed using the Cauldron tool:
The data model is a set of subnets, each of which contains a set of machines. Each machine has one or more vulnerabilities that could potentially be exploited. Machines within each subnet are connected to machines within the other subnets, so machines within a particular subnet can reach the vulnerabilities of machines in other subnets.
An edge says that a particular source machine can connect to a particular destination, or victim, machine that has a certain set of vulnerabilities. You could then click on one of the edges and get the details for that particular set of vulnerabilities from one machine to another.
The first litmus test is to see if we can capture that data model in a Neo4j property graph. The next is to see if we can get the same result through a Cypher query.
In the above graph, all of the nodes are IP addresses. The blue relationships show the subnet of each machine, and the red relationships are vulnerability exposures extending from an attacker machine to a victim machine across subnet boundaries. The graph effectively shows that there are certain machines that act as bottlenecks and potential sources of attacks.
If you only want to examine vulnerabilities across subnet boundaries, you can restrict the query to only examine those relationships:
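The original Cypher is not reproduced here, but the restriction itself can be sketched in plain Python. The IP addresses, subnet assignments and `VULN-*` identifiers are made up for the example.

```python
# Hypothetical data: machine IP -> subnet, and attacker -> victim exposures.
SUBNET_OF = {
    "10.0.1.5": "10.0.1.0/24",
    "10.0.1.6": "10.0.1.0/24",
    "10.0.2.7": "10.0.2.0/24",
}
EXPOSES = [
    ("10.0.1.5", "10.0.1.6", ["VULN-A"]),            # same subnet
    ("10.0.1.5", "10.0.2.7", ["VULN-B", "VULN-C"]),  # crosses a boundary
    ("10.0.1.6", "10.0.2.7", ["VULN-B"]),            # crosses a boundary
]

def cross_subnet(exposes, subnet_of):
    """Keep only exposures whose attacker and victim sit in different subnets."""
    return [
        (src, dst, vulns) for src, dst, vulns in exposes
        if subnet_of[src] != subnet_of[dst]
    ]

crossing = cross_subnet(EXPOSES, SUBNET_OF)
for src, dst, vulns in crossing:
    print(f"{src} -> {dst}: {', '.join(vulns)}")
```

In a graph database the same filter would typically be expressed as a pattern over the subnet and exposure relationships rather than a list comprehension.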
Below we’ve extended our domain by adding more things to our stack, which include those listed on the left of the slide:
CVE, Common Vulnerabilities and Exposures, is an effort started by MITRE, and its entries feed the National Vulnerability Database maintained by the National Institute of Standards and Technology (NIST). CVE is the nomenclature for a standard system for reporting known vulnerabilities in software.
CWE stands for Common Weakness Enumeration and categorizes the types of weaknesses that vulnerabilities exploit. CPE (Common Platform Enumeration) platforms are the actual software platforms on which the vulnerabilities are exhibited, and CVSS (the Common Vulnerability Scoring System) provides a number that ranks how severe each vulnerability is.
There are two important takeaways here. One is to be able to explore, understand and feel confident that you can capture the semantics of your environment as a graph, perform queries and get the analytic result that you need. Two, we almost always end up with a large, sprawling graph, which can be reduced in size by applying specific queries that narrow the scope of the search.
Using Graphs to Analyze Multiple Threat Alerts
In the following example, Snort, an intrusion detection system, has sent an intrusion alert. You need to determine how the alert is related to your environment, which you hope to answer by running the query included at the top of the graph:
This particular alert includes a source (an outside domain) and a destination (one of our clients). The alert detects a certain kind of attack pattern from a standardized taxonomy called CAPEC, Common Attack Pattern Enumeration and Classification.
It has detected that this event is a certain kind of attack, which we’ve correlated with a known vulnerability. We also know that the attack type works against a known vulnerability that’s associated with the destination machine of that alert. This indicates a legitimate alert that requires a response.
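The correlation described above amounts to intersecting two sets: the vulnerabilities an attack pattern can exploit, and the vulnerabilities present on the alert's destination. A minimal sketch, with placeholder identifiers that are not real CAPEC or CVE entries:

```python
# Hypothetical lookup tables; the ids are placeholders, not real CAPEC or CVE entries.
PATTERN_EXPLOITS = {"PATTERN-1": {"VULN-A", "VULN-B"}, "PATTERN-2": {"VULN-C"}}
MACHINE_VULNS = {"10.0.2.7": {"VULN-B", "VULN-D"}, "10.0.2.8": {"VULN-D"}}

def corroborating_vulns(alert):
    """Vulnerabilities on the alert's destination that the detected pattern can exploit."""
    exploitable = PATTERN_EXPLOITS.get(alert["pattern"], set())
    present = MACHINE_VULNS.get(alert["dst"], set())
    return exploitable & present

alert = {"src": "203.0.113.9", "dst": "10.0.2.7", "pattern": "PATTERN-1"}
matches = corroborating_vulns(alert)
print("corroborated" if matches else "no known match", sorted(matches))
```

An empty result only means no known vulnerability on the destination matches the pattern; it does not prove the alert is harmless.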
At a later point, you receive a second alert and aren’t sure whether or not to take it seriously. Because you’re receiving a constant stream of alerts, without an investigation you can’t determine whether alerts are related or whether there’s a larger pattern at play.
This particular query is looking for relationships between alert X and alert Y. Based on the below results, we can determine that the two alerts are part of a chain of known stepping stones, which gives you more evidence that the attack is one that needs to be taken seriously:
You can also take information from recent alerts and, based on your environment, predict other attacks that could be made downstream:
This analysis could show that the attack on this certain database could lead to a mission-critical function, putting your entire mission at risk.
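Predicting what lies downstream of a compromised machine is a reachability search. The sketch below uses breadth-first search over a made-up dependency map; the node names, the `NEXT_STEP` relation and the set of mission-critical nodes are assumptions for illustration.

```python
from collections import deque

# Hypothetical edges: "from X an attacker can move on to Y", including mission dependencies.
NEXT_STEP = {
    "client": ["fileserver"],
    "fileserver": ["database"],
    "database": ["billing-function"],
    "mailserver": ["client"],
}
MISSION_CRITICAL = {"billing-function"}

def downstream_path(start, critical):
    """Breadth-first search for the shortest path from a compromised node to a critical one."""
    queue, parent = deque([start]), {start: None}
    while queue:
        node = queue.popleft()
        if node in critical:
            path = []
            while node is not None:  # walk the parent links back to the start
                path.append(node)
                node = parent[node]
            return path[::-1]
        for nxt in NEXT_STEP.get(node, []):
            if nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)
    return None  # no critical node is reachable

print(downstream_path("client", MISSION_CRITICAL))
```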
Using Graph Databases to Determine Attack Response
Knowing that this is a serious threat and also understanding the entirety of your environment, how should you respond?
Based on the previous actions of the attackers, you can write a query that shows all of the topological infrastructure paths between those two machines, such as the routing and the firewalls that traffic passes through, to point out where to make changes in the environment that will block further access.
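One way to picture this query: enumerate the infrastructure paths between the two machines and look for firewalls that sit on every one of them. The topology and the convention that firewall names start with `fw` are assumptions for this sketch.

```python
# Hypothetical topology: hosts, routers (r*) and firewalls (fw*) joined by undirected links.
LINKS = [
    ("client", "r1"), ("r1", "fw1"), ("r1", "fw2"),
    ("fw1", "r2"), ("fw2", "r2"), ("r2", "fw3"), ("fw3", "db"),
]

def neighbors(links):
    """Build an adjacency map from undirected links."""
    adj = {}
    for a, b in links:
        adj.setdefault(a, []).append(b)
        adj.setdefault(b, []).append(a)
    return adj

def simple_paths(adj, src, dst, path=()):
    """Yield every path from src to dst that visits no node twice."""
    path = path + (src,)
    if src == dst:
        yield path
        return
    for nxt in adj.get(src, []):
        if nxt not in path:
            yield from simple_paths(adj, nxt, dst, path)

def firewalls_on_every_path(links, src, dst):
    """Firewalls that all traffic between src and dst must cross."""
    paths = list(simple_paths(neighbors(links), src, dst))
    if not paths:
        return set()
    per_path = [{n for n in p if n.startswith("fw")} for p in paths]
    return set.intersection(*per_path)

print(firewalls_on_every_path(LINKS, "client", "db"))
```

Here traffic can take either `fw1` or `fw2`, but must always cross `fw3`, so a rule change on `fw3` alone would block the route.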
Similarly, after the attack has concluded, you may want to do some forensics that will help inform future responses to attacks. A lot of times, intrusion detection systems only pick up some activities, so you may want to review logs in more detail.
Given where the attacker has reached, you can query the different paths the attacker could have taken to get to that point and work to close them off in the future.
Models and queries have the potential to become extremely complicated. To address this, we’ve developed a domain-specific language for CyGraph that encompasses the scope of our data model, the examination of that model, queries and subqueries.
ANTLR is a framework for automating the construction of language parsers. In ANTLR, you define the grammar of your language, which in this case is domain-specific.
The generated parser can recognize any string that’s supposed to fit that input grammar. Then, for each element of the parse tree, we have code that converts it into Cypher. So we have a language written specifically for our domain, with all the knowledge about how the model is represented built in, which provides a layer of abstraction over Cypher.
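The translation step can be illustrated with a deliberately tiny stand-in. This sketch uses a regular expression instead of an ANTLR grammar, and the one-sentence query language, the `Machine` label and the `CAN_EXPLOIT` relationship type are all invented for the example; none of it reflects the actual CyGraph language.

```python
import re

# Hypothetical one-line DSL: paths from <name> to <name> [within <n> steps]
PATTERN = re.compile(
    r"^paths from (?P<src>[\w.\-]+) to (?P<dst>[\w.\-]+)(?: within (?P<hops>\d+) steps)?$"
)

def to_cypher(query, default_hops=5):
    """Translate one DSL statement into a Cypher string (labels and types are assumed)."""
    match = PATTERN.match(query.strip().lower())
    if match is None:
        raise ValueError(f"not a valid query: {query!r}")
    hops = int(match.group("hops") or default_hops)
    # The regex limits names to word characters, dots and hyphens,
    # so they cannot break out of the quoted literals below.
    return (
        f"MATCH p = (a:Machine {{name: '{match.group('src')}'}})"
        f"-[:CAN_EXPLOIT*1..{hops}]->"
        f"(b:Machine {{name: '{match.group('dst')}'}}) RETURN p"
    )

print(to_cypher("paths from web01 to db01 within 3 steps"))
```

The analyst writes a short domain sentence and never sees the variable-length path syntax. A production translator would more likely pass the names as query parameters than embed them in the string.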
Below is a screenshot of an actual customer dependency graph that was built over a period of years. Each person in the organization was tasked with capturing the information and network assets they depended on to perform both standard and mission-critical roles.
We’ve also looked quite a bit at packet capture data, for example to try to detect malicious activity within a network. One of the challenges we faced was that large graphs can become too cluttered.
When walking through a play-by-play of attacks, you frequently need to focus on a particular moment. As part of the user interface visualization, we added a behavior that highlights the most recent activity within a sliding window.
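The sliding-window behavior can be sketched as a simple partition of events by timestamp. The event tuples and the `highlight` function are hypothetical; the actual visualization code is not shown in the source.

```python
# Hypothetical timestamped events: (seconds since start, source, destination).
EVENTS = [
    (5, "a", "b"), (40, "b", "c"), (95, "c", "d"), (100, "a", "d"),
]

def highlight(events, now, window):
    """Split events seen so far into (recent, older) using a window that ends at `now`."""
    seen = [e for e in events if e[0] <= now]
    recent = [e for e in seen if e[0] > now - window]
    older = [e for e in seen if e[0] <= now - window]
    return recent, older

recent, older = highlight(EVENTS, now=100, window=30)
print("highlight:", recent)
print("dim:", older)
```

Replaying the attack then just means advancing `now` and redrawing: edges drift from highlighted to dimmed as they fall out of the window.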
Another use case is modeling and simulation.
We went through a project where we applied CyGraph and captured a process model, which is a process flow that includes all the timing and relationships regarding how a process gets completed. We captured not only the mission processes but also the cyber attackers, the cyber defenders and all the corresponding required resources. CyGraph becomes a window into the simulation.
Another application I mentioned is CAPEC, the Common Attack Pattern Enumeration and Classification taxonomy. It’s very laborious to navigate through the CAPEC site and difficult to understand the big picture about large categories of different types of attacks and the more refined kinds of attacks within those larger groups.
We built the below navigation system for the CAPEC taxonomy to address these issues:
We’ve also done a lot of work relating to Bitcoin transactions, which have a lot of issues related to cyber attacks:
In summary, the best way to build a cybersecurity tool is to build generic property graphs for flexible representation.
Graph queries make it possible to focus your analysis on the relevant portions of the graphs, allowing you to pinpoint vulnerabilities and target responses. Use of a domain-specific query language allows you to simplify queries, and relying on a data-driven architecture to inform your model provides added flexibility.
Published at DZone with permission of Andreas Kollegger, DZone MVB. See the original article here.