# Managing Logistical Relationships in a Graph Database

# Managing Logistical Relationships in a Graph Database

### To generate useful insights in the world of Big Data today, we don't only need to manage large volumes of data — we also need to concentrate on data relationships.

Join the DZone community and get the full member experience.

Join For Free**Hortonworks Sandbox for HDP and HDF is your chance to get started on learning, developing, testing and trying out new features. Each download comes preconfigured with interactive tutorials, sample data and developments from the Apache community.**

In today’s Big Data world, it’s not only about managing large volumes of data but also about generating useful insights from them. To generate insights, we also need to concentrate on the relationships between data. What better way to leverage the relationships between data than using a database where relationships take first priority? We have such a database in a graph database that can represent our data as nodes and the relationships between them as edges.

Among many scenarios that can be naturally represented as graphs, maps are the most obvious. But when we think of maps, we can also think of logistics and geospatial applications, which are actually the original graph use cases. Geospatial applications of graph databases range from calculating routes between locations in networks such as a road or rail, airspace, or logistical network to spatial operations such as finding all points of interest in a bounded area, finding the center of a region, and calculating the intersection between two or more regions.

One concrete example of graph databases being used in logistics is eBay, who (owing to the acquisition of Shutl) provides a service that uses graphs to compute fast, localized door-to-door delivery of goods between buyers and sellers, scaling their business to include the supply chain. Incidentally, eBay observed that before turning to graphs the latency of their longest query was higher than their shortest physical delivery, both around 15 minutes — something that can’t now be replicated when an average query is powered by a graph database and takes 1/50th of a second!

The eBay example is not isolated. Organizations large and small are adopting and winning with graphs in retail, finance, telecoms, IT, gaming, real estate, healthcare, science, and dozens of more areas.

This article gives an example of how a use case for delivery can be modeled as a graph and solved through simple queries in the graph database.

*Figure 1: Nodes in a typical logistical distribution network.*

## The Use Case for Geospatial Data and Logistics

Let me present a highly simplified scenario. A parcel needs to be delivered and the delivery man needs to deliver it in the earliest time possible. So, he has to choose the fastest route to the customer’s doorstep from the delivery center.

### Graph Data Model

To solve his problem, we need to represent the streets between the delivery center and the customer as a graph. Hence, the intersections/road junctions are considered as nodes of the graph and the edges of the graph as the lanes connecting the junctions. In graph database terms, edges (lanes) are the relationship between the nodes (intersections) and we name the relationship as Road which has the following attributes:

**Distance**: The actual distance between the intersections.**Coverability**: Depends on a lot of factors like traffic, road condition, terrain, etc.

The delivery center and the customer address are also represented as nodes in the graph. So essentially, we have three types of nodes:

**sNode**: The Start node, i.e., Delivery center.**eNode**: The Finish node, i.e., Customer’s address.**Intersection**: Road junctions.

So now, finding the fastest route boils down to a weighted shortest path calculation. We calculate it keeping in mind the two attributes of the relationship: distance (shorter is better) and coverability (higher is better).

Here we use the world’s leading graph database, Neo4j, which is a highly scalable and ACID-compliant database and its declarative query language Cypher.

A sample dataset is created in Neo4j using the CREATE clause in Cypher as given in Query 1 (create clause in Cypher). This loads the data into Neo4j and generates the graph database as shown in Figure 2.

Neo4j has a lot of graph algorithms shipped with it as a package and those are accessible only from the JAVA API. Implementing some of these algorithms in Cypher is quite complex and time consuming. From Neo4j 3.x, the concept of user defined procedures had been introduced called APOC (Awesome Procedures On Cypher). Those are custom implementations of certain functionality, that can’t be (easily) expressed in Cypher itself. The APOC library consists of many (about 300) procedures to help with many different tasks in areas like data integration, graph algorithms or data conversion.

```
create (n0:sNode {name:'Start'}),
(n1:Intersection {name:'1'}),
(n2:Intersection {name:'2'}),
(n3:eNode {name:'Finish'}),
(n4:Intersection {name:'4'}),
(n5:Intersection {name:'5'}),
(n6:Intersection {name:'6'}),
(n7:Intersection {name:'7'}),
(n8:Intersection {name:'8'}),
(n9:Intersection {name:'9'}),
(n10:Intersection {name:'10'}),
(n11:Intersection {name:'11'}),
(n12:Intersection {name:'12'}),
(n13:Intersection {name:'13'}),
(n14:Intersection {name:'14'}),
(n15:Intersection {name:'15'}),
(n16:Intersection {name:'16'}),
(n17:Intersection {name:'17'}),
(n18:Intersection {name:'18'}),
(n19:Intersection {name:'19'}),
(n0)-[:Road {distance:5,coverability:1}]->(n4),
(n4)-[:Road {distance:15,coverability:1}]->(n5),
(n5)-[:Road {distance:5,coverability:1}]->(n6),
(n6)-[:Road {distance:100,coverability:1}]->(n7),
(n7)-[:Road {distance:80,coverability:0.9}]->(n8),
(n8)-[:Road {distance:5,coverability:0.8}]->(n1),
(n0)-[:Road {distance:100,coverability:0.9}]->(n9),
(n9)-[:Road {distance:10,coverability:0.8}]->(n10),
(n10)-[:Road {distance:10,coverability:0.8}]->(n1),
(n1)-[:Road {distance:5,coverability:0.8}]->(n11),
(n11)-[:Road {distance:10,coverability:0.8}]->(n12),
(n12)-[:Road {distance:30,coverability:0.8}]->(n13),
(n13)-[:Road {distance:60,coverability:0.8}]->(n2),
(n1)-[:Road {distance:5,coverability:0.8}]->(n8),
(n8)-[:Road {distance:105,coverability:0.9}]->(n14),
(n14)-[:Road {distance:5,coverability:0.8}]->(n2),
(n2)-[:Road {distance:15,coverability:0.8}]->(n15),
(n15)-[:Road {distance:75,coverability:0.9}]->(n3),
(n2)-[:Road {distance:20,coverability:0.7}]->(n16),
(n16)-[:Road {distance:15,coverability:0.9}]->(n17),
(n17)-[:Road {distance:75,coverability:1}]->(n3),
(n2)-[:Road {distance:5,coverability:0.9}]->(n14),
(n14)-[:Road {distance:40,coverability:0.9}]->(n18),
(n18)-[:Road {distance:40,coverability:0.9}]->(n19),
(n19)-[:Road {distance:70,coverability:1}]->(n3);
```

*Query 1: Query to add data into the database.*

*Figure 2: The sample graph database.*

We need to download the latest APOC library, drop the JAR file into the `. /plugins`

directory of our Neo4j installation, and then just run Query 2. We use the Dijkstra's algorithm for this task.

```
MATCH (startNode:sNode {name:"Start"}),(endNode:eNode {name:"Finish"})
call apoc.algo.dijkstra(startNode, endNode, 'Road', 'distance') YIELD path, weight
return path;
```

*Query 2: Query to apply Dijkstra’s algorithm using only distance as the edge weights.*

*Figure 3: Shortest path from Start to End using distance as edge weights.*

In Query 2, we just consider the distance property. To also consider the coverability property, we need to explicitly set the attribute, factor (= distance/coverability) into the relationships and then use Dijkstra’s algorithm. Run Query 3 to achieve the desired results.

```
//set distance/coverability to each relationship
MATCH (a)-[r:Road]->(b)
SET r.factor=r.distance/r.coverability;
//apply Dijkstra’s algorithm with factor as the weight
MATCH (startNode:sNode {name:"Start"}),
(endNode:eNode {name:"Finish"})
call apoc.algo.dijkstra(startNode, endNode, 'Road', 'factor') YIELD path, weight
return path;
```

*Query 3: Adding attribute distance/coverability and applying Dijkstra’s algorithm.*

*Figure 4: Shortest path from start to end with distance/coverability as edge weight.*

So, Figure 4 gives the route that the delivery man needs to follow to deliver the parcel on time.

## Conclusion

In logistics, planning, impact analysis, and many other applications, weighted shortest path algorithms have a great potential. The graph data model and graph databases can be used to generate competitive insights and significant business value in various use cases like social and professional networking (i.e., LinkedIn, Facebook), real-time recommendations, fraud detection, master data management, authorization and access control, etc.

Dr. Roy Marsten in a *WIRED* article says:

“The data that we have today, and in often the ways we look at data, are already steeped in the theory of graphs. Creating and analyzing graphs will bring us to answers automatically. When we let data connect itself, meaning will begin to emerge automatically.”

So, with all the chatter around graph databases, it won’t be completely unwise to assume that the future is graph shaped.

### Further Reads

### References

**Hortonworks Community Connection (HCC) is an online collaboration destination for developers, DevOps, customers and partners to get answers to questions, collaborate on technical articles and share code examples from GitHub. Join the discussion.**

Published at DZone with permission of Angshuman Talukdar , DZone MVB. See the original article here.

Opinions expressed by DZone contributors are their own.

## {{ parent.title || parent.header.title}}

## {{ parent.tldr }}

## {{ parent.linkDescription }}

{{ parent.urlSource.name }}