Over a million developers have joined DZone.
{{announcement.body}}
{{announcement.title}}

Working With Hierarchies in a NoSQL Database

DZone's Guide to

Working With Hierarchies in a NoSQL Database

See how a little bit of database-near code (a codehook) can automatically produce synthetic fields which make hierarchies much simpler to query.

· Database Zone
Free Resource

Running out of memory? Learn how Redis Enterprise enables large dataset analysis with the highest throughput and lowest latency while reducing costs over 75%! 

Organizing information in hierarchies is something programmers have to deal with from time to time. Examples are:

  • Threaded discussions/comments.
  • Addresses on a map.
  • Folders and documents.
  • Organizational structures.
  • Storage/shelf locations in a warehouse.
  • Pages on a web site.
  • Link referrals.

Using a NoSQL document database (or SQL, for that matter), it is quite easy to create a structure to organize this type of information. For each record/document/node, you simply need a reference to the parent (except for the top node).

So what's the problem?

The main problem is that querying and retrieving the whole or parts of the hierarchy will require complex code with recursion and an unknown number of queries which can put a big load on our database. Ouch!

In this blog post, we will walk through a simple solution which lets you retrieve any (or all) parts of a hierarchy (a threaded discussion) using only one efficient database query in restdb.io. To follow along, you can just create a new RestDB database yourself. Everything is done online — you only need a modern browser.

Approaching the Problem

If you have your own server with plenty of memory, a complete hierarchy structure can, of course, be created/recreated after adding a new record by programmatically creating the data structure and then cache it in memory. This is cumbersome — and what do you do if the structure is really big? It will clearly have performance implications.

We will not dive into the various ways of solving the problem this way (depth-first, breadth-first, iterative, a combination, etc.). So how can we avoid this complexity? As pragmatic programmers, we know from filesystems that a path can take us directly to a folder/file. So how can we create a path for each of our threaded comments? We need to somehow generate a new field and insert the path into it automatically on create/update. This synthetic field can then be queried with a regular expression as we'll see later.

We'll use a little snippet of JavaScript code and write a restdb.io Codehook to help us create two synthetic fields. Codehooks are JavaScript functions which can intercept REST API calls to your database, in many ways similar to SQL triggers/stored procedures.

Setting Up the Database Collection

For this simple demonstration, we're going to create a collection/table structure to represent the discussion threads. We'll create a new collection in restdb.io, comments, and add two fields:

  • Text (of type "text").
  • Replies (of type "relation/child-list").

"text" will hold just the comment itself, and "replies" is a restdb.io child list relation to comments (read more about parent-child relations in the documentation). Although we could have used a simple text field to hold the unique name of a parent, using a restdb.io child relation will make it easy to navigate the data in the backend, in this case to moderate the discussion (see screenshot below). Navigating hierarchy in the backend UI

Adding Synthetic Fields With Codehooks

As mentioned, what we need in order to query and retrieve the structure in the correct order is something we're familiar with: a path. Each record (or "document" in NoSQL speak) will need a path including the IDs of all its parents and its own id. For example:

"path": "/55bf2d54a8a9a00900000004/55bf2d54a8a9a00900000005/55bf2d54a8a9a00900000006"

In restdb.io, each record gets a unique id when saved. So upon each record creation (POST), we'll run a beforePOST codehook, and add two synthetic fields: thread and level. With these two fields available, we'll see that it becomes much easier to query and display the hierarchy.

After POSTing (creating) a new record, the following script will insert the synthetic variables we need (thread, level). Check out the JavaScript source with comments below.

function afterPOST(req,res){
    var parent_id = req.body._parent_id || "";
    // get the parent record (if any)
    db.get("/rest/comments",{_id:parent_id},{},function(err,parentresult){
        if (!err){
           var parent = parentresult[0];
           log.debug("parent",parent);
           log.debug("body",req.body);
           if (req.body._id){
               var thread = null;
               // we'll use _id in the thread
               if (parent){
                   // add the parents thread in addition to the _id of this record
                   thread = parent.thread + "/" + req.body._id;
               }else{
                   // no parent - the tread is just the _id
                   thread = "/" + req.body._id;
               }
               // find the level of this new node
               var level = thread.split("/").length-1;
               
               // update this record with the thread and the level we have computed
               db.patch("/rest/comments/"+req.body._id,{thread:thread, level:level},function(err,result){
                    if (err){
                        log.error("Unable to patch");
                    }
                    res.end();
               });
           }else{
                log.error("No id");
                res.end();      
           }
       }
       else{
           log.error("Error in query ",err);
           res.end();
       }
   })
}

With this codehook in place, a comments record will now look like this after save:

{
  "_id": "5901b44587cad85a000009d5",
  "text": "What a headless comment! Disagree.",
  "thread": "/5900a9a787cad85a000005ec/5901b44587cad85a000009d5",
  "level": 2,
  "_parent_id": "5900a9a787cad85a000005ec",
  "_parent_def": "comments",
  "_parent_field": "replies",
  "_created": "2017-04-27T09:05:09.755Z",
  "_changed": "2017-04-27T09:05:09.974Z",
  "_createdby": "knutmt@restdb.io",
  "_changedby": "api",
  "_version": 1
}

The fields with underscore (_) are restdb.io system fields. We see that this record has two new fields: thread and level. thread is a path-like string containing the id of the parent and the id of the record itself and level is the count of the number of parents + 1.

Querying (MongoDB Syntax)

If you know MongoDB NoSQL query syntax, you'll feel at home querying with the restdb.io REST API. Here, we'll show how simple it can be to:

  • Fetch the hierarchy of all comments in sorted order.
  • Fetch a comment and all its children.
  • Fetch all children comments on a specific level.
  • Use a restdb.io Page to query and show the results in HTML using only serverside HandlebarsJS tags.

Fetching the Hierarchy of All Comments in Correct Order

This is the simplest query and only requires that we sort on the thread field.

https://hierarchies-fa4f.restdb.io/rest/comments?sort=thread 

We'll not show you all the JSON objects from the query result here, but below at the end of the article, you can see the result of this query used on a dynamic HTML page hosted in the database.

Fetching a Comment and All Children

Let's check out a REST API query and the query result.

REST API Query

The $regex query is perfect to query path strings. Here, we query all comments starting with the parent comment's ID. We also add a sort parameter to get out the subtree in the correct order.

https://hierarchies-fa4f.restdb.io/rest/comments?q={"thread":{"$regex":"^\/5900a9a787cad85a000005ec"}&sort=thread 

Query Result

[{
    _id: "5900a9a787cad85a000005ec",
    text: "The word serverless is meaningless",
    thread: "/5900a9a787cad85a000005ec",
    level: 1
},
{
    _id: "5901b44587cad85a000009d5",
    text: "What a headless comment! Disagree.",
    thread: "/5900a9a787cad85a000005ec/5901b44587cad85a000009d5",
    level: 2
},
{
    _id: "5978557f922946410001fc5a",
    text: "The word meaningless is without meaning",
    thread: "/5900a9a787cad85a000005ec/5978557f922946410001fc5a",
    level: 2
},
{
    _id: "597856b2922946410001fe51",
    text: "I would say (meaning)less is more",
    thread: "/5900a9a787cad85a000005ec/5978557f922946410001fc5a/597856b2922946410001fe51",
    level: 3
}]

Fetching a Specific Level Only

Just adding {..., "level": 2} to our query will give us only child comments from level 2:

https://hierarchies-fa4f.restdb.io/rest/comments?q={"thread":{"$regex":"^\/5900a9a787cad85a000005ec"},"level":2}&sort=thread 

...gives:

[{
    _id: "5901b44587cad85a000009d5",
    text: "What a headless comment! Disagree.",
    thread: "/5900a9a787cad85a000005ec/5901b44587cad85a000009d5",
    level: 2
},
{
    _id: "5978557f922946410001fc5a",
    text: "The word meaningless is without meaning",
    thread: "/5900a9a787cad85a000005ec/5978557f922946410001fc5a",
    level: 2
}]

Other Queries

There are, of course, many ways you can query the database. You can use operators like {$and: [...]} to get out two levels for example. This is for you to try out. The core idea is that you can query hierarchies much more efficiently when you can use a path and a level as we've shown.

Displaying the Hierarchy Using a Little HTML in the Database

To conclude this article, we'll quickly show how you can run a server-side query to display data from the database. This can be useful for reporting and for setting up simple database-driven websites using HTML and no JavaScript. For more interactive web sites, you would, of course, use a framework like React, Angular, VueJS, or good old jQuery.

The code below shows our database query and all the CSS and HTML needed.

{{#context}}
{
    "comments": {
        "collection": "comments",
        "query": {},
        "hints": {"$orderby":{"thread":1}}
    }
 }
{{/context}}


<html>
    <head>
        <link href="https://maxcdn.bootstrapcdn.com/bootswatch/3.3.7/flatly/bootstrap.min.css" rel="stylesheet">
        <style>
           .media-heading{
               margin-left:10px;
           }
           span.space + span.space {
            margin-left:35px;
            }
            #main-container{
                padding:15px;
            }
            .thread{
                margin-top:35px;
            }
        </style>
    </head>
    <body>
        <div id="main-container">
            <h2>Threaded discussion output</h2>
            <ul class="media-list thread">
        {{#each comments}}
            <li class="media">
                <div class="media-left">
                {{{_ 'repeat' '<span class="space"></span>' level}}}
                </div>
                <div class="media-body">
                    <div class="media-heading">{{text}}</div>
                </div>
            </li>
        {{/each}}
            </ul>
        </div>
    </body>
</html>

The screenshot below shows the output of our Page with comments and sub-comments indented correctly. Output of HTML and serverside query

Summary

We have shown how a little bit of database-near code (a codehook) can automatically produce synthetic fields which make hierarchies much simpler to query. The threaded discussion example used here is just a demonstration. A real example would involve a way to actually add comments on a web page (POSTing using the restdb.io REST API and Ajax) and perhaps also authentication using JWT with Auth0. Watch out for a complete example on our demo site: restdb.site.

Please feel free to share this article if you enjoyed it!

Running out of memory? Never run out of memory with Redis Enterprise databaseStart your free trial today.

Topics:
database ,nosql ,hierarchies ,tutorial ,codehook ,queries

Published at DZone with permission of Knut Martin Tornes, DZone MVB. See the original article here.

Opinions expressed by DZone contributors are their own.

{{ parent.title || parent.header.title}}

{{ parent.tldr }}

{{ parent.urlSource.name }}