Over a million developers have joined DZone.
{{announcement.body}}
{{announcement.title}}

Graph Algorithms in Neo4j: 15 Different Graph Algorithms and What They Do

DZone's Guide to

Graph Algorithms in Neo4j: 15 Different Graph Algorithms and What They Do

If you're using Neo4j, then you need to be familiar with these powerful graph algorithms. Who doesn't want to make their jobs easier?

· Big Data Zone ·
Free Resource

Hortonworks Sandbox for HDP and HDF is your chance to get started on learning, developing, testing and trying out new features. Each download comes preconfigured with interactive tutorials, sample data and developments from the Apache community.

Graph analytics have value only if you have the skills to use them and if they can quickly provide the insights you need. Therefore, the best graph algorithms are easy to use, are fast to execute, and produce powerful results.

Neo4j includes a growing, open library of high-performance graph algorithms that reveal the hidden patterns and structures in your connected data.

In this series on graph algorithms, we'll discuss the value of graph algorithms and what they can do for you. Previously, we explored how data connections drive future discoveries and how to streamline those data discoveries with graph analytics.

This week, we'll take a detailed look at the many graph algorithms available in Neo4j and what they do.

Using Neo4j graph algorithms, you'll have the means to understand, model, and predict complicated dynamics such as the flow of resources or information, the pathways that contagions or network failures spread, and the influences on and resiliency of groups.

And because Neo4j brings together analytics and transaction operations in a native graph platform, you'll not only uncover the inner nature of real-world systems for new discoveries but also develop and deploy graph-based solutions faster and have easy-to-use, streamlined workflows. That's the power of an optimized approach.

Here is a list of the many algorithms that Neo4j uses in its graph analytics platform, along with an explanation of what they do.

Traversal and Pathfinding Algorithms

1. Parallel Breadth-First Search (BFS)

What it does: Traverses a tree data structure by fanning out to explore the nearest neighbors and then their sub-level neighbors. It's used to locate connections and is a precursor to many other graph algorithms.

BFS is preferred when the tree is less balanced or the target is closer to the starting point. It can also be used to find the shortest path between nodes or avoid the recursive processes of depth-first search.

How it's used: Breadth-first search can be used to locate neighbor nodes in peer-to-peer networks like BitTorrent, GPS systems to pinpoint nearby locations, and social network services to find people within a specific distance.

2. Parallel Depth-First Search (DFS)

What it does: Traverses a tree data structure by exploring as far as possible down each branch before backtracking. It's used on deeply hierarchical data and is a precursor to many other graph algorithms. Depth-first search is preferred when the tree is more balanced or the target is closer to an endpoint.

How it's used: Depth-first search is often used in gaming simulations where each choice or action leads to another, expanding into a tree-shaped graph of possibilities. It will traverse the choice tree until it discovers an optimal solution path (i.e. win).

3. Single-Source Shortest Path

What it does: Calculates a path between a node and all other nodes whose summed value (weight of relationships such as cost, distance, time, or capacity) to all other nodes is minimal.

How it's used: Single-source shortest path is often applied to automatically obtain directions between physical locations, such as driving directions via Google Maps. It's also essential in logical routing, such as telephone call routing (least-cost routing).

4. All-Pairs Shortest Path

What it does: Calculates a shortest path forest (group) containing all shortest paths between the nodes in the graph. It's commonly used for understanding alternate routing when the shortest route is blocked or becomes sub-optimal.

How it's used: All-pairs shortest path is used to evaluate alternate routes for situations, such as a freeway backup or network capacity. It's also key in logical routing to offer multiple paths; for example, call routing alternatives.

5. Minimum Weight Spanning Tree (MWST)

What it does: Calculates the paths along a connected tree structure with the smallest value (weight of the relationship such as cost, time, or capacity) associated with visiting all nodes in the tree. It's also employed to approximate some NP-hard problems such as the traveling salesman problem and randomized or iterative rounding.

How it's used: Minimum weight spanning tree is widely used for network designs: least-cost logical or physical routing such as laying cable, fastest garbage collection routes, capacity for water systems, efficient circuit designs, and much more. It also has real-time applications with rolling optimizations, such as processes in a chemical refinery or driving route corrections.

Centrality Algorithms

6. PageRank

What it does: Estimates a current node's importance from its linked neighbors and then again from their neighbors. A node's rank is derived from the number and quality of its transitive links to estimate influence. Although popularized by Google, it's widely recognized as a way of detecting influential nodes in any network.

How it's used: PageRank is used in quite a few ways to estimate importance and influence. It's used to suggest Twitter accounts to follow and for general sentiment analysis.

PageRank is also used in machine learning to identify the most influential features for extraction. In biology, it's been used to identify which species extinctions within a food web would lead to biggest chain reaction of species death.

7. Degree Centrality

What it does: Measures the number of relationships a node (or an entire graph) has. It's broken into indegree (flowing in) and outdegree (flowing out) where relationships are directed.

How it's used: Degree centrality looks at immediate connectedness for uses such as evaluating the near-term risk of a person catching a virus or hearing information. In social studies, the indegree of friendship can be used to estimate popularity and outdegree as gregariousness.

8. Closeness Centrality

What it does: Measures how central a node is to all its neighbors within its cluster. Nodes with the shortest paths to all other nodes are assumed to be able to reach the entire group the fastest.

How it's used: Closeness centrality is applicable in a number of resources, communication, and behavioral analysis, especially when interaction speed is significant. It has been used for identifying the best location of new public services for maximum accessibility.

In social network analysis, it is used to find people with the ideal social network location for faster dissemination of information.

9. Betweenness Centrality

What it does: Measures the number of shortest paths (first found with breadth-first search) that pass through a node. Nodes that most frequently lie on shortest paths have higher betweenness centrality scores and are the bridges between different clusters. It is often associated with the control over the flow of resources and information.

How it's used: Betweenness centrality applies to a wide range of problems in network science and is used to pinpoint bottlenecks or likely attack targets in communication and transportation networks. In genomics, it has been used to understand the control certain genes have in protein networks for improvements such as better drug/disease targeting.

Betweenness Centrality has also be used to evaluate information flows between multiplayer online gamers and expertise sharing communities of physicians.

Community Detection Algorithms

This category is also known as clustering algorithms or partitioning algorithms.

10. Label Propagation

What it does: Spreads labels based on neighborhood majorities as a means of inferring clusters. This extremely fast graph partitioning requires little prior information and is widely used in large-scale networks for community detection. It's a key method for understanding the organization of a graph and is often a primary step in other analysis.

How it's used: Label propagation has diverse applications, from understanding consensus formation in social communities to identifying sets of proteins that are involved together in a process (functional modules) for biochemical networks. It's also used in semi- and unsupervised machine learning as an initial preprocessing step.

11. Strongly Connected

What It Does: Locates groups of nodes where each node is reachable from every other node in the same group following the direction of relationships. It's often applied from a depth-first search.

How it's used:Strongly connected is often used to enable running other algorithms independently on an identified cluster. As a preprocessing step for directed graphs, it helps quickly identify disconnected groups. In retail recommendations, it helps identify groups with strong affinities that then are used for suggesting commonly preferred items to those within that group who have not yet purchased the item.

12. Union-Find/Connected Components/Weakly Connected

What it does: Finds groups of nodes where each node is reachable from any other node in the same group, regardless of the direction of relationships. It provides near constant-time operations (independent of input size) to add new groups, merge existing groups, and determine whether two nodes are in the same group

How it's used: Union-find/connected components is often used in conjunction with other algorithms, especially for high-performance grouping. As a preprocessing step for undirected graphs, it helps quickly identify disconnected groups.

13. Louvain Modularity

What it does: Measures the quality (i.e. presumed accuracy) of a community grouping by comparing its relationship density to a suitably defined random network. It's often used to evaluate the organization of complex networks and community hierarchies in particular. It's also useful for initial data preprocessing in unsupervised machine learning.

How it's used: Louvain is used to evaluate social structures on Twitter, LinkedIn, and YouTube. It's used in fraud analytics to evaluate whether a group has just a few bad behaviors or is acting as a fraud ring that would be indicated by a higher relationship density than average. Louvain revealed a six-level customer hierarchy in a Belgian telecom network.

14. Local Clustering Coefficient/Node Clustering Coefficient

What it does: For a particular node, it quantifies how close its neighbors are to being a clique (every node is directly connected to every other node). For example, if all your friends knew each other directly, your local clustering coefficient would be 1. Small values for a cluster would indicate that although a grouping exists, the nodes are not tightly connected.

How it's used: Local cluster coefficient is important for estimating resilience by understanding the likelihood of group coherence or fragmentation. An analysis of a European power grid using this method found that clusters with sparsely connected nodes were more resilient against widespread failures.

15. Triangle-Count and Average Clustering Coefficient

What it does: Measures how many nodes have triangles and the degree to which nodes tend to cluster together. The average clustering coefficient is 1 when there is a clique and 0 when there are no connections. For the clustering coefficient to be meaningful, it should be significantly higher than a version of the network where all of the relationships have been shuffled randomly.

How it's used: The average clustering coefficient is often used to estimate whether a network might exhibit "small-world" behaviors that are based on tightly knit clusters. It's also a factor for cluster stability and resiliency. Epidemiologists have used the average clustering coefficient to help predict various infection rates for different communities.

Conclusion

The world is driven by connections. Neo4j graph analytics reveals the meaning of those connections using practical, optimized graph algorithms including the ones detailed above.

This concludes our series on graph algorithms in Neo4j. We hope these algorithms help you make sense of your connected data in more meaningful and effective ways.

Hortonworks Community Connection (HCC) is an online collaboration destination for developers, DevOps, customers and partners to get answers to questions, collaborate on technical articles and share code examples from GitHub.  Join the discussion.

Topics:
neo4j ,graph database ,algorithms ,big data ,graph algorithms ,data visualization ,data analytics

Published at DZone with permission of

Opinions expressed by DZone contributors are their own.

{{ parent.title || parent.header.title}}

{{ parent.tldr }}

{{ parent.urlSource.name }}