# Weighted Graph Queries in InfiniteGraph

### In this article, learn the concept of a weighted graph and queries in the InfiniteGraph database and show how InfiniteGraph offers several unique advantages.

· Big Data Zone · Tutorial
Save
8.80K Views

## 1. Introduction

Graph databases have moved to the forefront of trendy technologies. There are a lot of mature companies with graph database technologies and a lot of new players seem to be arriving on the scene almost daily, and for good reason; graphs are a more natural way to represent many kinds of data. They excel at modeling and managing the connections between data elements and this opens up new possibilities for what we can accomplish with our data.

In this blog, we will introduce the concept of a weighted graph and weighted graph queries in the InfiniteGraph database, and show how InfiniteGraph offers several unique advantages for performing weighted graph queries.

### 1.1 What Is a Graph?

A graph is made up of nodes, sometimes called vertices, and edges. The nodes of a graph typically represent a person, place, or thing and are drawn using a circle. Edges represent relationships between nodes and are drawn as a line between two nodes.

Figure 1: A Graph

### 1.2 What Is InfiniteGraph?

InfiniteGraph is a massively scalable, distributable, graph database. It is a schema-based graph database and has its own query language called “DO”.

## 2. Simple Paths

Consider the graph shown in Figure 2.

Figure 2: A Simple Graph

In this graph, all we have are nodes with labels, and edges that connect the nodes. If this graph represented people and their social connections to other people, you could figure out how two people were connected. You could figure out that A and C are a separated by minimum of two degrees, and this graph is perfectly fine for this kind of question. However, if this graph represented towns and the roads that connect them, this graph could be used to find routes between towns but it doesn’t provide enough information to find the best route since it doesn’t provide information on how far apart the towns are. For this, and many other kinds of problems, we need weighted graphs.

## 3. Weighted Graphs

A weighted graph is a graph where a weight value is associated with each edge. The weight value, called an edge-weight, typically serves as some kind of cost factor that is incurred when traversing the edge. In our map example above, the cost factor might be the distance associated with each road represented by the edge. Consider the following graph in Figure 3:

Figure 3: A Weighted Graph

Here, the edge weight on the edge from A to B represents the distance from A to B and has a value of 4. By summing the edge weights for a particular route, we will know the total distance for that route. To go from A to B to C, we add 4 + 5 and get a total distance of 9. We can compare the total distances for the different routes between a given starting point and end point to determine which route is the shortest.

In many cases, the edge-weight will simply be the value of an attribute on the edge data item. In our case, we would have a Road data item with a distance attribute.

In other cases, the edge-weight might be more complicated. It can be useful to be able to build a calculated edge-weight based on a variety of edge attributes or other data features. Consider our map example again. What if the shortest path by distance had considerably lower speed limits or considerably more stoplights? If we cared about travel time instead of just travel distance, we might want to compute an edge-weight based on some function of the distance, speed limits, and the number of stoplights along each edge. This graph is shown in Figure 4.

Figure 4: Weighted Graph, Distance, Speed Limit, and Stoplights

This is still a weighted graph, but now the definition of the edge-weight is much more complex. How do we deal with this? In many graph databases, the edge-weight must be a single attribute value stored in the edge data item. If the edge-weight is something more complex, like in Figure 4, we’d have to write and execute an update statement to sweep across the entire graph, perform the required computation on all of the edges, and then store the result as an attribute on each edge data item. Then, we’d be back in the situation where we could run our weighted graph query using that new attribute as the weight. This is terribly inefficient.

## 4. Introducing the User-Defined Weight Calculator

What if there was another way? What if, instead of having to sweep across our entire dataset updating the edges with a newly computed value, we could define an operation that could be used to compute the edge-weights on the fly at query time? And, what if the operation only needed to be executed for those edges that were actually touched by the navigational query? That would save an enormous amount of compute time. We would no longer have to compute the edge-weight on edges that were going to be trimmed during path processing.

This is exactly what InfiniteGraph does. InfiniteGraph’s DO query language allows the user to define a weight calculator operator and then use it in navigational path queries.

Let’s start with a simple example, using Figure 3, and find the shortest path from A to H.

First, we need to define our weight calculator as follows:

DO Query Language

``````CREATE WEIGHT CALCULATOR routeDistance {
minimum: 0,
default: 0,
edges: {
}
};``````

The weight calculator allows the user to specify a minimum weight, a default weight, and a set of edge patterns and their associated edge-weight functions. When a navigational query is executed using a weight calculator, the weight calculator is invoked only on the edges along candidate paths.

In this example, there is one pattern, `()-[r:Road]->()`, in the edges block that matches all Road edges and the edge-weight function states that the weight of edges of type Road, represented by the variable “r”, shall be defined as r.distance, the value of the distance attribute on the current Road edge.

The edges block within the weight calculator can be used to define several different edge-weight functions based on the type of the “from” node, “to” node, edge type, or values on them. If our schema contained definitions for Path, Road, and Highway, we might define a route distance calculator as follows with different multipliers based on the type of the edge:

DO Query Language

``````CREATE WEIGHT CALCULATOR routeDistance {
minimum: 0,
default: 0,
edges: {
()-[p:Path]->()   : p.cost * 2.0,
()-[h:Highway]->(): h.cost * 0.5
}
};``````

We can now use our newly defined routeDistance weight calculator to execute a query to find the shortest path. The query is as follows:

DO Query Language

``````MATCH m = LIGHTEST routeDistance
((:Town {name == ‘TownA’})-[*]->(:Town {name == ‘TownH’}))
RETURN m;``````

Here, we want to find the path from TownA to TownH through any number of edges (“-[*]->”) and we want the path with the lightest weight based on the routeDistance weight calculator.

When we execute the query, we get the following results:

DO Query Language

``````{
_Projection
{
m:WALK
{
length:2,         <—- The number of edges in the path.
weight:11.0000,   <—- The weight of the lightest path.
edges:
{
EDGE
{
weight:3.00000,
from:3-3-1-4,             <—- TownA
fromClass:’Town’,
attribute:’to’,
edgeData:3-3-1-35,
to:3-3-1-10,              <—- TownD
toClass:’Town’
},
EDGE
{
weight:8.00000,
from:3-3-1-10,            <—- TownD
fromClass:’Town’,
attribute:’to’,
edgeData:3-3-1-41,
to:3-3-1-18,              <—- TownH
toClass:’Town’
}
}
}
}
}``````

The following image shows the object IDs of the nodes in the database:

Figure 5: Weighted Graph Showing Object IDs

## 5. A More Complex Example

How complex can a weight calculator be? The answer to that question is very.

Graphs are great for keeping track of social networks including who emailed who or who called who and when. In this example, we will build a model of telephone calls between friends and then demonstrate the real power of the weight calculator.

### 5.1 Our Data Model

The thing about building a graph of interactions like emails, text messages, or telephone calls is that you don’t want to create a new edge between two people for every interaction. What you want to do is to create a single edge that represents the fact that two people have interacted and then add a detailed record to that edge to document each interaction.

The graph representation looks like this:

Figure 6: CallDetails on a Call

In DO, the schema definition looks like this:

DO Query Language

``````UPDATE SCHEMA {
CREATE CLASS Phone  {
number      : STRING,
callsIn     : List {
Element: Reference {
EdgeClass: Call,
EdgeAttribute: from
},
CollectionTypeName: TreeListOfReferences
},

callsOut    : List {
Element: Reference {
EdgeClass: Call,
EdgeAttribute: to
},
CollectionTypeName: TreeListOfReferences
}
}

CREATE CLASS CallDetail {
start       : DateTime,
end         : DateTime,
duration    : REAL { Storage: B32 }
}

CREATE CLASS Call {
from        : Reference { referenced: Phone, Inverse: callsOut },
to          : Reference { referenced: Phone, Inverse: callsIn },
callDetails : List {
Element: Reference {
referenced: CallDetail
},
CollectionTypeName: TreeListOfReferences
}
}
};``````

Note that in the Call class, there is an attribute, callDetails, that is a list of references to CallDetail objects.

The graph of our data looks like this:

Figure 7: A Call Graph with CallDetails

In our example, there are three CallDetail objects hanging off of each edge. The AB edge has three CallDetail objects in the callDetails list and those three CallDetail objects have durations of 18.5, 22.3, and 26.0.

Let’s define a weight calculator, callVolume, that defines the edge weight as the sum of all of the CallDetail durations in the callDetails list:

DO Query Language

``````CREATE WEIGHT CALCULATOR callVolume {
minimum:       0,
default: 0,
edges: {
()-[c:Call]->(): SUM(c.callDetails.duration)
}
};``````

In this example, “c” is the current Call edge object. The weight of the edge becomes the sum of the CallDetail duration values in the callDetails list in “c”. For the AB edge, this would be 18.5 + 22.3 + 26.0 = 66.8.

We can then use our callVolume weight calculator to find the path from A to C with the lightest call volume with the following query:

DO Query Language

``````MATCH m = LIGHTEST callVolume
((:Phone {number == ‘555-1111’})
-[*..10]->
(:Phone {number == ‘555-3333’}))
RETURN m;``````

The resulting path with its calculated weight is shown below:

DO Query Language

``````{
_Projection
{
m:WALK
{
length:2,               <— The length of the lightest path.
weight:31.0000,         <— The weight of the lightest path.
edges:
{
EDGE
{
weight:16.2000,

from:3-3-1-4,             <— Phone A (555-1111)
fromClass:’Phone’,
attribute:’callsOut’,
edgeData:3-3-1-32,
edgeDataClass:’Call’,
to:3-3-1-10,              <— Phone D (555-4444)
toClass:’Phone’
},
EDGE
{
weight:14.8000,
from:3-3-1-10,            <— Phone D (555-4444)
fromClass:’Phone’,
attribute:’callsOut’,
edgeData:3-3-1-40,
edgeDataClass:’Call’,
to:3-3-1-8,               <— Phone C (555-3333)
toClass:’Phone’
}

}

}

}

}``````

As you can see, the lightest path based on the callVolume weight calculator is the path from A to D to C, with a length of 2 and weight of 31.0000.

This is just the beginning. The function definitions inside the edge-weight calculator can use any of the available DO operators to build up a complex operation to calculate the edge-weight. And it is all done without updating any data in the graph.

There are many advantages to the InfiniteGraph user-defined weight calculator:

1. The edge-weight can be taken directly from the value of an edge attribute.
2. The edge-weight can be computed from one or more edge attributes.
3. The edge-weight can be computed from a wide range of data associated with the edge as was shown in the CallDetail example.
4. Weight calculators are easy to define and modify and support tying out different scenarios quickly.
5. The cost of executing the weight calculator is only incurred on those edges that are actually examined by the navigational query.
6. InfiniteGraph filters paths on the fly, not after all potential paths have been found. This can lead to a significant performance advantage over other graph database products.

## 7. Conclusion

Nearly all graph databases have some support for weighted queries. However, most graph databases use implementations that limit the weight of an edge to the value of a single attribute stored in the edge attribute data. InfiniteGraph’s weight calculator provides a powerful yet simple and effective way to define an operator that can be used to compute both simple and complex edge weights on the fly and without updating the data in the database.

Topics:
graph database, graph database analytics, big data, tutorial, infinite graph, weighted graphs

Published at DZone with permission of Daniel Hall.

Opinions expressed by DZone contributors are their own.