# Weekly Algorithm: Property Graph Algorithms

An example of a property graph with two vertices and one edge is diagrammed below.

Property graphs are more complex than the familiar single-relational graph because they contain different types of vertices (e.g. *people*, *companies*, *software*) and different types of edges (e.g. *knows*, *works_for*, *imports*). The complexities added by this data structure (and by multi-relational graphs in general, e.g. RDF graphs) affect how graph algorithms are defined and evaluated.
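
To make the data structure concrete, here is a minimal Python sketch of a property graph. It is not tied to any property graph library: the `Vertex` and `Edge` classes, the `type` property, and all of the sample data are assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Vertex:
    id: int
    properties: dict = field(default_factory=dict)  # arbitrary key/value pairs


@dataclass
class Edge:
    tail: int   # id of the vertex the edge leaves
    label: str  # the relationship type, e.g. "knows"
    head: int   # id of the vertex the edge enters
    properties: dict = field(default_factory=dict)


# Hypothetical data: two people, one company, one piece of software.
vertices = {
    1: Vertex(1, {"type": "person", "name": "alice"}),
    2: Vertex(2, {"type": "person", "name": "bob"}),
    3: Vertex(3, {"type": "company", "name": "ACME"}),
    4: Vertex(4, {"type": "software", "name": "Blueprints"}),
}
edges = [
    Edge(1, "knows", 2, {"since": 2009}),
    Edge(1, "works_for", 3),
    Edge(2, "works_for", 3),
    Edge(2, "develops", 4),
]

# A single-relational graph has one kind of edge; this one has three.
edge_labels = sorted({e.label for e in edges})
vertex_types = sorted({v.properties["type"] for v in vertices.values()})
print(edge_labels)
print(vertex_types)
```

Once several edge labels and vertex types coexist, "the neighbors of a vertex" is no longer a single well-defined set, which is the difficulty discussed in the rest of this post.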

Standard graph theory textbooks typically present common algorithms such as various centralities, geodesics, assortative mixings, etc. These algorithms usually come pre-packaged with single-relational graph toolkits and frameworks (e.g. NetworkX, iGraph).

It is common for people to desire such graph algorithms when they begin to work with property graph software. I have been asked many times:

“Does the property graph software you work on support any of the common centrality algorithms? For example, PageRank, closeness, betweenness, etc.?”

My answer to this question is always:

“What do you mean by centrality in a property graph?”

When a heterogeneous set of vertices can be related by a heterogeneous set of edges, there are numerous ways in which to calculate centrality (**or any other standard graph algorithm for that matter**).

- Ignore edge labels and use standard single-relational graph centrality algorithms.
- Isolate a particular “slice” of the graph (e.g. the *knows* subgraph) and use standard single-relational graph centrality algorithms.
- Make use of abstract adjacencies to compute centrality with higher-order semantics.
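
Each of the three options amounts to choosing a different single-relational adjacency from the same data. The Python sketch below illustrates this under stated assumptions: edges are plain `(tail, label, head)` triples, the sample data is hypothetical, and the "friend of a friend" adjacency is just one simple example of an abstract adjacency.

```python
from collections import defaultdict

# Hypothetical edges as (tail, label, head) triples.
EDGES = [
    ("alice", "knows", "bob"),
    ("bob", "knows", "carol"),
    ("alice", "works_for", "ACME"),
    ("bob", "works_for", "ACME"),
]


def ignore_labels(edges):
    """Option 1: treat every edge alike."""
    adj = defaultdict(set)
    for tail, _label, head in edges:
        adj[tail].add(head)
    return dict(adj)


def label_slice(edges, label):
    """Option 2: keep only the edges carrying one label."""
    return ignore_labels([e for e in edges if e[1] == label])


def two_hop(edges, first, second):
    """Option 3: link a to c whenever a -first-> b -second-> c."""
    step1 = label_slice(edges, first)
    step2 = label_slice(edges, second)
    adj = defaultdict(set)
    for a, mids in step1.items():
        for b in mids:
            adj[a] |= step2.get(b, set())
    # Drop vertices whose two-hop walk reached nothing.
    return {a: targets for a, targets in adj.items() if targets}


everything = ignore_labels(EDGES)
knows_only = label_slice(EDGES, "knows")
friend_of_friend = two_hop(EDGES, "knows", "knows")
```

Any of the three resulting dictionaries could be handed to an ordinary single-relational centrality routine; they simply encode different notions of who is adjacent to whom.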

The purpose of this blog post is to stress point #3 and the power of property graph algorithms. In Gremlin, you can calculate numerous eigenvector centralities for the same property graph instance. At this point, you might ask: “How can a graph have more than one primary eigenvector?” The answer lies in seeing all the graphs that exist within the graph, i.e. all the higher-order, derived, implicit, virtual, abstract adjacencies. Each line below exemplifies points #1, #2, and #3 in the list above, respectively. The code examples use the power method to calculate the vertex centrality rankings, which are stored in the map *m*.

    g.V.outE.inV.groupCount(m).loop(3){c++ < 10000} // point #1
    g.V.outE[[label:'knows']].inV.groupCount(m).loop(4){c++ < 10000} // point #2
    g.V.???.groupCount(m).loop(?){c++ < 10000} // point #3
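
For readers who do not use Gremlin, the general shape of this loop can be sketched in Python. This is not a transcription of the Gremlin expressions: the `rank` function, its `step` parameter, and the sample adjacency are hypothetical, and the sketch normalizes the weights on every pass to keep the numbers bounded, which is an addition of the sketch.

```python
from collections import Counter


def rank(vertices, step, iterations=50):
    """Push weight along `step` repeatedly, tallying where it lands in m."""
    weights = {v: 1.0 for v in vertices}  # start at every vertex
    m = Counter()
    for _ in range(iterations):
        landed = Counter()
        for v, w in weights.items():
            for u in step(v):
                landed[u] += w
        total = sum(landed.values())
        if total == 0:  # nothing reachable; stop early
            break
        # Normalizing is an addition of this sketch, not part of the original.
        weights = {v: w / total for v, w in landed.items()}
        m.update(weights)  # accumulate this pass into the running tally
    return m


# Hypothetical "knows" adjacency.
KNOWS = {"a": ["b"], "b": ["c"], "c": ["a", "b"], "d": ["c"]}
m = rank(KNOWS, lambda v: KNOWS.get(v, []))
ranking = [v for v, _ in m.most_common()]
print(ranking)
```

The `step` argument is where the choice of adjacency enters: swapping in a different function changes what "central" means without touching the loop itself.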

The *???* on the third line (the one marked point #3) stands in for any arbitrary computation. For example, *???* can be:

    outE[[label:'works_for']].inV.inE[[label:'works_for']].outV
    outE[[label:'works_for']].inV[[name:'ACME']].inE[[label:'works_for']].outV
    outE[[label:'develops']].inV.outE[[label:'imports']].inV[[name:'Blueprints']].back(7).outE[[label:'works_for']].inV.inE[[label:'works_for']].outV.outE[[label:'develops']].inV.outE[[label:'imports']].inV[[name:'Blueprints']].back(7)

The above expressions have the following meaning:

- Coworker centrality
- ACME Corporation coworker centrality
- Coworkers who import Blueprints into their software centrality
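
The three meanings above can be approximated in Python by chaining small step functions. This is a sketch under assumptions, not the Gremlin semantics verbatim: vertices are identified by name, the helpers `out`, `in_`, `named`, `path`, and `keep_if` are hypothetical, `keep_if` stands in for the filter-then-`back` pattern, and the data is invented.

```python
# Hypothetical edges as (tail, label, head) triples; vertices are plain names.
EDGES = [
    ("alice", "works_for", "ACME"),
    ("bob", "works_for", "ACME"),
    ("carol", "works_for", "Initech"),
    ("dave", "works_for", "Initech"),
    ("alice", "develops", "app1"),
    ("bob", "develops", "app2"),
    ("carol", "develops", "app3"),
    ("app1", "imports", "Blueprints"),
    ("app2", "imports", "Blueprints"),
    ("app3", "imports", "Pipes"),
]


def out(label):
    """Follow outgoing edges with this label."""
    return lambda v: [h for t, lab, h in EDGES if t == v and lab == label]


def in_(label):
    """Follow incoming edges with this label, backwards."""
    return lambda v: [t for t, lab, h in EDGES if h == v and lab == label]


def named(name):
    """Pass v through only if it is the named vertex."""
    return lambda v: [v] if v == name else []


def path(*steps):
    """Chain steps: the output of each step feeds the next."""
    def walk(v):
        frontier = [v]
        for step in steps:
            frontier = [u for x in frontier for u in step(x)]
        return frontier
    return walk


def keep_if(probe):
    """Pass v through only if `probe` reaches something from it."""
    return lambda v: [v] if probe(v) else []


coworker = path(out("works_for"), in_("works_for"))
acme_coworker = path(out("works_for"), named("ACME"), in_("works_for"))
uses_blueprints = keep_if(
    path(out("develops"), out("imports"), named("Blueprints"))
)
blueprints_coworker = path(uses_blueprints, coworker, uses_blueprints)

print(coworker("alice"))
print(blueprints_coworker("alice"))
```

In this sketch every person counts as their own coworker, because walking out along *works_for* and back in again returns to the starting vertex as well. Each of the three functions defines its own person-to-person adjacency, and therefore its own centrality.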

There are numerous graphs within the graph. As such, “what do you mean by centrality?”

These ideas are explored in more detail in the following article and slideshow.

Rodriguez, M.A., Shinavier, J., “Exposing Multi-Relational Networks to Single-Relational Network Analysis Algorithms,” Journal of Informetrics, 4(1), pp. 29-41, Elsevier, doi:10.1016/j.joi.2009.06.004, 2009.

**The Gremlin in the Graph**

Source: http://markorodriguez.com/2011/02/08/property-graph-algorithms/
