Over a million developers have joined DZone.
{{announcement.body}}
{{announcement.title}}

Database Building 101: The Cost of Graphing

DZone's Guide to

Database Building 101: The Cost of Graphing

As you might expect, graph databases have different issues than table-based databases. In this article we examine some of the computational costs.

· Database Zone
Free Resource

Download the Guide to Open Source Database Selection: MySQL vs. MariaDB and see how the side-by-side comparison of must-have features will ease the journey. Brought to you in partnership with MariaDB.

So far we looked into how we can store the nodes and the edges, and we explored some interesting data structures inside Voron. Now, let’s see how we can traverse the graph.

public NameValueCollection GetNode(long id)
{
    using(var tx = _env.ReadTransaction())
    {
        var nodesTable = new Table(_emptySchema, "Nodes", tx);
    
        var reader = new TableValueReader(table.DirectRead(id, out size), size);
        var data = reader.Read(0, out size);
        var nodeStr = Encoding.UTF8.GetString(data, size);
        return ParseNode(nodeStr);
    }
}


So getting the node is pretty easy, and remember that to access the table by ID gives us an O(1) cost.

Now, what about finding all the edges from a node?

public IEnumerable<Tuple<long, NameValueCollection>> GetEdgesFor(string type, long id)
{
    using(var tx = _env.ReadTransaction())
    {
        var edgesDataTable = new Table(_emptySchema, "EdgesData", tx);
        var edgeTypeTree = tx.CreateTree("Edges_"+ type);
        var fixedSizeTree = edgeTypeTree.FixedTreeFor(from, valSize: 8);
        using(var it = fixedSizeTree.Iterate())
        {
            if(it.Seek(Slices.BeforeAllKeys))
            yield break;
            do
            {
                var destNodeId = it.CurrentKey;
                var edgeDataId = it.CreateReaderForCurrent().ReadLittleEndianInt64();
        
                NameValueCollection edgeData = null;
                if(edgeDataId != null)
                {
                    var reader = new TableValueReader(edgesDataTable.DirectRead(edgeDataId, out size), size);
                    var data = reader.Read(0, out size);
                    var edgeStr = Encoding.UTF8.GetString(data, size);
                    edgeData = ParseNode(nodeStr);
                }
                yield return Tuple.Create(destNodeId, edgeDataId);
            }while(it.MoveNext());
        }
    }
}


This is a lot heftier, but let’s try to break it into individual pieces. First, we find the tree holding all the edges of that particular type, then we access (by the ID of the source node) the fixed size tree that holds all the connected nodes and the edge data ID.

Finally, we access the edges data (if it exists) to get the data about the edge itself, after which, we return it all to the caller.

Unlike the previous method, here we can’t claim to have O(1) costs. Instead, our costs are composed of:

  1. O(logN): where N is the number of edges types (typically, very few), to search for the right edges tree.
  2. O(logN): where N is the number of source nodes (typically, very large, but the fixed size trees are very good at being fast, and they don’t have to do much).
  3. O(N): where N is the number of connection between the source node and the destination node.

I’m excluding the cost of loading the edge data because this is an O(1) operation and is directly relevant to the iteration over all nodes connected to the source node.

Ideally, we can find a way to turn the second operation into an O(1) cost, but that should be more than good enough for the moment.

Now, this just gives us traversal of the nodes, but going from here to Breadth First Search/Depth First Search and doing them well is fairly trivial, although there are a few things that we'll need to talk about, but that would serve as a separate post.

Interested in reducing database costs by moving from Oracle Enterprise to open source subscription?  Read the total cost of ownership (TCO) analysis. Brought to you in partnership with MariaDB.

Topics:
graph database ,voron ,performance ,cost

Published at DZone with permission of Oren Eini, DZone MVB. See the original article here.

Opinions expressed by DZone contributors are their own.

THE DZONE NEWSLETTER

Dev Resources & Solutions Straight to Your Inbox

Thanks for subscribing!

Awesome! Check your inbox to verify your email so you can start receiving the latest in tech news and resources.

X

{{ parent.title || parent.header.title}}

{{ parent.tldr }}

{{ parent.urlSource.name }}