Over a million developers have joined DZone.
{{announcement.body}}
{{announcement.title}}

Managing Logistical Relationships in a Graph Database

DZone's Guide to

Managing Logistical Relationships in a Graph Database

To generate useful insights in the world of Big Data today, we don't only need to manage large volumes of data — we also need to concentrate on data relationships.

· Big Data Zone
Free Resource

Learn best practices according to DataOps. Download the free O'Reilly eBook on building a modern Big Data platform.

In today’s Big Data world, it’s not only about managing large volumes of data but also about generating useful insights from them. To generate insights, we also need to concentrate on the relationships between data. What better way to leverage the relationships between data than using a database where relationships take first priority? We have such a database in a graph database that can represent our data as nodes and the relationships between them as edges.

Among many scenarios that can be naturally represented as graphs, maps are the most obvious. But when we think of maps, we can also think of logistics and geospatial applications, which are actually the original graph use cases. Geospatial applications of graph databases range from calculating routes between locations in networks such as a road or rail, airspace, or logistical network to spatial operations such as finding all points of interest in a bounded area, finding the center of a region, and calculating the intersection between two or more regions.

One concrete example of graph databases being used in logistics is eBay, who (owing to the acquisition of Shutl) provides a service that uses graphs to compute fast, localized door-to-door delivery of goods between buyers and sellers, scaling their business to include the supply chain. Incidentally, eBay observed that before turning to graphs the latency of their longest query was higher than their shortest physical delivery, both around 15 minutes — something that can’t now be replicated when an average query is powered by a graph database and takes 1/50th of a second!

The eBay example is not isolated. Organizations large and small are adopting and winning with graphs in retail, finance, telecoms, IT, gaming, real estate, healthcare, science, and dozens of more areas.

This article gives an example of how a use case for delivery can be modeled as a graph and solved through simple queries in the graph database.


Typical nodes in a logistical distribution network

Figure 1: Nodes in a typical logistical distribution network.

The Use Case for Geospatial Data and Logistics

Let me present a highly simplified scenario. A parcel needs to be delivered and the delivery man needs to deliver it in the earliest time possible. So, he has to choose the fastest route to the customer’s doorstep from the delivery center.

Graph Data Model

To solve his problem, we need to represent the streets between the delivery center and the customer as a graph. Hence, the intersections/road junctions are considered as nodes of the graph and the edges of the graph as the lanes connecting the junctions. In graph database terms, edges (lanes) are the relationship between the nodes (intersections) and we name the relationship as Road which has the following attributes:

  • Distance: The actual distance between the intersections.
  • Coverability: Depends on a lot of factors like traffic, road condition, terrain, etc.

The delivery center and the customer address are also represented as nodes in the graph. So essentially, we have three types of nodes:

  • sNode: The Start node, i.e., Delivery center.
  • eNode: The Finish node, i.e., Customer’s address.
  • Intersection: Road junctions.

So now, finding the fastest route boils down to a weighted shortest path calculation. We calculate it keeping in mind the two attributes of the relationship: distance (shorter is better) and coverability (higher is better).

Here we use the world’s leading graph database, Neo4j, which is a highly scalable and ACID-compliant database and its declarative query language Cypher.

A sample dataset is created in Neo4j using the CREATE clause in Cypher as given in Query 1 (create clause in Cypher). This loads the data into Neo4j and generates the graph database as shown in Figure 2.

Neo4j has a lot of graph algorithms shipped with it as a package and those are accessible only from the JAVA API. Implementing some of these algorithms in Cypher is quite complex and time consuming. From Neo4j 3.x, the concept of user defined procedures had been introduced called APOC (Awesome Procedures On Cypher). Those are custom implementations of certain functionality, that can’t be (easily) expressed in Cypher itself. The APOC library consists of many (about 300) procedures to help with many different tasks in areas like data integration, graph algorithms or data conversion.

create (n0:sNode {name:'Start'}),
(n1:Intersection {name:'1'}),
(n2:Intersection {name:'2'}),
(n3:eNode {name:'Finish'}),
(n4:Intersection {name:'4'}),
(n5:Intersection {name:'5'}),
(n6:Intersection {name:'6'}),
(n7:Intersection {name:'7'}),
(n8:Intersection {name:'8'}),
(n9:Intersection {name:'9'}),
(n10:Intersection {name:'10'}),
(n11:Intersection {name:'11'}),
(n12:Intersection {name:'12'}),
(n13:Intersection {name:'13'}),
(n14:Intersection {name:'14'}),
(n15:Intersection {name:'15'}),
(n16:Intersection {name:'16'}),
(n17:Intersection {name:'17'}),
(n18:Intersection {name:'18'}),
(n19:Intersection {name:'19'}),

(n0)-[:Road {distance:5,coverability:1}]->(n4),
(n4)-[:Road {distance:15,coverability:1}]->(n5),
(n5)-[:Road {distance:5,coverability:1}]->(n6),
(n6)-[:Road {distance:100,coverability:1}]->(n7),
(n7)-[:Road {distance:80,coverability:0.9}]->(n8),
(n8)-[:Road {distance:5,coverability:0.8}]->(n1),
(n0)-[:Road {distance:100,coverability:0.9}]->(n9),
(n9)-[:Road {distance:10,coverability:0.8}]->(n10),
(n10)-[:Road {distance:10,coverability:0.8}]->(n1),
(n1)-[:Road {distance:5,coverability:0.8}]->(n11),
(n11)-[:Road {distance:10,coverability:0.8}]->(n12),
(n12)-[:Road {distance:30,coverability:0.8}]->(n13),
(n13)-[:Road {distance:60,coverability:0.8}]->(n2),
(n1)-[:Road {distance:5,coverability:0.8}]->(n8),
(n8)-[:Road {distance:105,coverability:0.9}]->(n14),
(n14)-[:Road {distance:5,coverability:0.8}]->(n2),
(n2)-[:Road {distance:15,coverability:0.8}]->(n15),
(n15)-[:Road {distance:75,coverability:0.9}]->(n3),
(n2)-[:Road {distance:20,coverability:0.7}]->(n16),
(n16)-[:Road {distance:15,coverability:0.9}]->(n17),
(n17)-[:Road {distance:75,coverability:1}]->(n3),
(n2)-[:Road {distance:5,coverability:0.9}]->(n14),
(n14)-[:Road {distance:40,coverability:0.9}]->(n18),
(n18)-[:Road {distance:40,coverability:0.9}]->(n19),
(n19)-[:Road {distance:70,coverability:1}]->(n3);

Query 1: Query to add data into the database.

sample graph databaseFigure 2: The sample graph database.

We need to download the latest APOC library, drop the JAR file into the  . /plugins directory of our Neo4j installation, and then just run Query 2. We use the Dijkstra's algorithm for this task.

MATCH (startNode:sNode {name:"Start"}),(endNode:eNode {name:"Finish"})
call apoc.algo.dijkstra(startNode, endNode, 'Road', 'distance') YIELD path, weight
return path;

Query 2: Query to apply Dijkstra’s algorithm using only distance as the edge weights.

shortest path graph database

Figure 3: Shortest path from Start to End using distance as edge weights.

In Query 2, we just consider the distance property. To also consider the coverability property, we need to explicitly set the attribute, factor (= distance/coverability) into the relationships and then use Dijkstra’s algorithm. Run Query 3 to achieve the desired results.

//set distance/coverability to each relationship
MATCH (a)-[r:Road]->(b)
SET r.factor=r.distance/r.coverability;
//apply Dijkstra’s algorithm with factor as the weight
MATCH (startNode:sNode {name:"Start"}),
(endNode:eNode {name:"Finish"})
call apoc.algo.dijkstra(startNode, endNode, 'Road', 'factor') YIELD path, weight
return path;

Query 3: Adding attribute distance/coverability and applying Dijkstra’s algorithm.

shortest path graph database with edge weightsFigure 4: Shortest path from start to end with distance/coverability as edge weight.

So, Figure 4 gives the route that the delivery man needs to follow to deliver the parcel on time.

Conclusion

In logistics, planning, impact analysis, and many other applications, weighted shortest path algorithms have a great potential. The graph data model and graph databases can be used to generate competitive insights and significant business value in various use cases like social and professional networking (i.e., LinkedIn, Facebook), real-time recommendations, fraud detection, master data management, authorization and access control, etc.

Dr. Roy Marsten in a WIRED article says:

“The data that we have today, and in often the ways we look at data, are already steeped in the theory of graphs. Creating and analyzing graphs will bring us to answers automatically. When we let data connect itself, meaning will begin to emerge automatically.”

So, with all the chatter around graph databases, it won’t be completely unwise to assume that the future is graph shaped.

Further Reads

References

Find the perfect platform for a scalable self-service model to manage Big Data workloads in the Cloud. Download the free O'Reilly eBook to learn more.

Topics:
graph database ,big data ,logistics ,geospatial data ,data visualization

Published at DZone with permission of Angshuman Talukdar, DZone MVB. See the original article here.

Opinions expressed by DZone contributors are their own.

{{ parent.title || parent.header.title}}

{{ parent.tldr }}

{{ parent.urlSource.name }}