Working With Hierarchies in a NoSQL Database
See how a little bit of database-near code (a codehook) can automatically produce synthetic fields which make hierarchies much simpler to query.
Join the DZone community and get the full member experience.
Join For FreeOrganizing information in hierarchies is something programmers have to deal with from time to time. Examples are:
- Threaded discussions/comments.
- Addresses on a map.
- Folders and documents.
- Organizational structures.
- Storage/shelf locations in a warehouse.
- Pages on a web site.
- Link referrals.
Using a NoSQL document database (or SQL, for that matter), it is quite easy to create a structure to organize this type of information. For each record/document/node, you simply need a reference to the parent (except for the top node).
So what's the problem?
The main problem is that querying and retrieving the whole or parts of the hierarchy will require complex code with recursion and an unknown number of queries which can put a big load on our database. Ouch!
In this blog post, we will walk through a simple solution which lets you retrieve any (or all) parts of a hierarchy (a threaded discussion) using only one efficient database query in restdb.io. To follow along, you can just create a new RestDB database yourself. Everything is done online — you only need a modern browser.
Approaching the Problem
If you have your own server with plenty of memory, a complete hierarchy structure can, of course, be created/recreated after adding a new record by programmatically creating the data structure and then cache it in memory. This is cumbersome — and what do you do if the structure is really big? It will clearly have performance implications.
We will not dive into the various ways of solving the problem this way (depth-first, breadth-first, iterative, a combination, etc.). So how can we avoid this complexity? As pragmatic programmers, we know from filesystems that a path can take us directly to a folder/file
. So how can we create a path for each of our threaded comments? We need to somehow generate a new field and insert the path into it automatically on create/update. This synthetic field can then be queried with a regular expression as we'll see later.
We'll use a little snippet of JavaScript code and write a restdb.io Codehook to help us create two synthetic fields. Codehooks are JavaScript functions which can intercept REST API calls to your database, in many ways similar to SQL triggers/stored procedures.
Setting Up the Database Collection
For this simple demonstration, we're going to create a collection/table structure to represent the discussion threads. We'll create a new collection in restdb.io, comments, and add two fields:
- Text (of type "text").
- Replies (of type "relation/child-list").
"text" will hold just the comment itself, and "replies" is a restdb.io child list relation to comments (read more about parent-child relations in the documentation). Although we could have used a simple text field to hold the unique name of a parent, using a restdb.io child relation will make it easy to navigate the data in the backend, in this case to moderate the discussion (see screenshot below).
Adding Synthetic Fields With Codehooks
As mentioned, what we need in order to query and retrieve the structure in the correct order is something we're familiar with: a path. Each record (or "document" in NoSQL speak) will need a path including the IDs of all its parents and its own id. For example:
"path": "/55bf2d54a8a9a00900000004/55bf2d54a8a9a00900000005/55bf2d54a8a9a00900000006"
In restdb.io, each record gets a unique id when saved. So upon each record creation (POST), we'll run a beforePOST codehook, and add two synthetic fields: thread and level. With these two fields available, we'll see that it becomes much easier to query and display the hierarchy.
After POSTing (creating) a new record, the following script will insert the synthetic variables we need (thread, level). Check out the JavaScript source with comments below.
function afterPOST(req,res){
var parent_id = req.body._parent_id || "";
// get the parent record (if any)
db.get("/rest/comments",{_id:parent_id},{},function(err,parentresult){
if (!err){
var parent = parentresult[0];
log.debug("parent",parent);
log.debug("body",req.body);
if (req.body._id){
var thread = null;
// we'll use _id in the thread
if (parent){
// add the parents thread in addition to the _id of this record
thread = parent.thread + "/" + req.body._id;
}else{
// no parent - the tread is just the _id
thread = "/" + req.body._id;
}
// find the level of this new node
var level = thread.split("/").length-1;
// update this record with the thread and the level we have computed
db.patch("/rest/comments/"+req.body._id,{thread:thread, level:level},function(err,result){
if (err){
log.error("Unable to patch");
}
res.end();
});
}else{
log.error("No id");
res.end();
}
}
else{
log.error("Error in query ",err);
res.end();
}
})
}
With this codehook in place, a comments record will now look like this after save:
{
"_id": "5901b44587cad85a000009d5",
"text": "What a headless comment! Disagree.",
"thread": "/5900a9a787cad85a000005ec/5901b44587cad85a000009d5",
"level": 2,
"_parent_id": "5900a9a787cad85a000005ec",
"_parent_def": "comments",
"_parent_field": "replies",
"_created": "2017-04-27T09:05:09.755Z",
"_changed": "2017-04-27T09:05:09.974Z",
"_createdby": "knutmt@restdb.io",
"_changedby": "api",
"_version": 1
}
The fields with underscore (_
) are restdb.io system fields. We see that this record has two new fields: thread
and level
. thread
is a path-like string containing the id of the parent and the id of the record itself and level
is the count of the number of parents + 1
.
Querying (MongoDB Syntax)
If you know MongoDB NoSQL query syntax, you'll feel at home querying with the restdb.io REST API. Here, we'll show how simple it can be to:
- Fetch the hierarchy of all comments in sorted order.
- Fetch a comment and all its children.
- Fetch all children comments on a specific level.
- Use a restdb.io Page to query and show the results in HTML using only serverside HandlebarsJS tags.
Fetching the Hierarchy of All Comments in Correct Order
This is the simplest query and only requires that we sort on the thread
field.
https://hierarchies-fa4f.restdb.io/rest/comments?sort=thread
We'll not show you all the JSON objects from the query result here, but below at the end of the article, you can see the result of this query used on a dynamic HTML page hosted in the database.
Fetching a Comment and All Children
Let's check out a REST API query and the query result.
REST API Query
The $regex
query is perfect to query path strings. Here, we query all comments
starting with the parent comment's ID. We also add a sort parameter to get out the subtree in the correct order.
https://hierarchies-fa4f.restdb.io/rest/comments?q={"thread":{"$regex":"^\/5900a9a787cad85a000005ec"}&sort=thread
Query Result
[{
_id: "5900a9a787cad85a000005ec",
text: "The word serverless is meaningless",
thread: "/5900a9a787cad85a000005ec",
level: 1
},
{
_id: "5901b44587cad85a000009d5",
text: "What a headless comment! Disagree.",
thread: "/5900a9a787cad85a000005ec/5901b44587cad85a000009d5",
level: 2
},
{
_id: "5978557f922946410001fc5a",
text: "The word meaningless is without meaning",
thread: "/5900a9a787cad85a000005ec/5978557f922946410001fc5a",
level: 2
},
{
_id: "597856b2922946410001fe51",
text: "I would say (meaning)less is more",
thread: "/5900a9a787cad85a000005ec/5978557f922946410001fc5a/597856b2922946410001fe51",
level: 3
}]
Fetching a Specific Level Only
Just adding {..., "level": 2}
to our query will give us only child comments from level 2:
https://hierarchies-fa4f.restdb.io/rest/comments?q={"thread":{"$regex":"^\/5900a9a787cad85a000005ec"},"level":2}&sort=thread
...gives:
[{
_id: "5901b44587cad85a000009d5",
text: "What a headless comment! Disagree.",
thread: "/5900a9a787cad85a000005ec/5901b44587cad85a000009d5",
level: 2
},
{
_id: "5978557f922946410001fc5a",
text: "The word meaningless is without meaning",
thread: "/5900a9a787cad85a000005ec/5978557f922946410001fc5a",
level: 2
}]
Other Queries
There are, of course, many ways you can query the database. You can use operators like {$and: [...]}
to get out two levels for example. This is for you to try out. The core idea is that you can query hierarchies much more efficiently when you can use a path and a level as we've shown.
Displaying the Hierarchy Using a Little HTML in the Database
To conclude this article, we'll quickly show how you can run a server-side query to display data from the database. This can be useful for reporting and for setting up simple database-driven websites using HTML and no JavaScript. For more interactive web sites, you would, of course, use a framework like React, Angular, VueJS, or good old jQuery.
The code below shows our database query and all the CSS and HTML needed.
{{#context}}
{
"comments": {
"collection": "comments",
"query": {},
"hints": {"$orderby":{"thread":1}}
}
}
{{/context}}
<html>
<head>
<link href="https://maxcdn.bootstrapcdn.com/bootswatch/3.3.7/flatly/bootstrap.min.css" rel="stylesheet">
<style>
.media-heading{
margin-left:10px;
}
span.space + span.space {
margin-left:35px;
}
#main-container{
padding:15px;
}
.thread{
margin-top:35px;
}
</style>
</head>
<body>
<div id="main-container">
<h2>Threaded discussion output</h2>
<ul class="media-list thread">
{{#each comments}}
<li class="media">
<div class="media-left">
{{{_ 'repeat' '<span class="space"></span>' level}}}
</div>
<div class="media-body">
<div class="media-heading">{{text}}</div>
</div>
</li>
{{/each}}
</ul>
</div>
</body>
</html>
The screenshot below shows the output of our Page with comments and sub-comments indented correctly.
Summary
We have shown how a little bit of database-near code (a codehook) can automatically produce synthetic fields which make hierarchies much simpler to query. The threaded discussion example used here is just a demonstration. A real example would involve a way to actually add comments on a web page (POSTing using the restdb.io REST API and Ajax) and perhaps also authentication using JWT with Auth0. Watch out for a complete example on our demo site: restdb.site.
Please feel free to share this article if you enjoyed it!
Published at DZone with permission of Knut Martin Tornes, DZone MVB. See the original article here.
Opinions expressed by DZone contributors are their own.
Comments