Over a million developers have joined DZone.
{{announcement.body}}
{{announcement.title}}

Couchbase N1QL Query With Self-Referencing Hierarchy

DZone 's Guide to

Couchbase N1QL Query With Self-Referencing Hierarchy

What are hierarchical data structures? Check out a Couchbase N1QL query with self-referencing hierarchy.

· Database Zone ·
Free Resource

Couchbase N1QL

Let's look at a Couchbase N1QL query.

A data construct that often appears in business applications is the hierarchical data structure. Hierarchy captures the parent-child relationship often between the same object. For instance, a company structure captures the reporting line between employees. A business organization captures the relationship between parent companies and subsidiaries. Territory hierarchies in Sales. Book of accounts in financial applications.

Due to the self-referencing nature of hierarchy, querying the structure efficiently along with its associated data can be a challenge for RDBMSs, particularly from a performance perspective. In this article, I will discuss how a traditional RDBMS handles hierarchical queries, the challenges that it has to deal with, and how this issue can be similarly addressed with Couchbase N1QL and Couchbase GSI.

Hierarchy Structure in Applications

The main reason for gathering information in a hierarchical structure is to improve the understanding of the information. The company reporting structure is designed not only to help with how the organization is managed but also to provide a structure to measure and optimize the effectiveness of each group.

In a large organization, sales performance is often assessed not only at the individual level but at the sales team. In short, businesses organize information in a hierarchical structure so that they can get a better understanding of business performance. In order to achieve that goal, the business needs an efficient means to query the hierarchical data.

Representation of Company Hierarchy

While the database data model is capable of efficiently capturing the hierarchical structure, the difficulty arises when you need to query the hierarchical data and the related information.

Consider this requirement: Get a total sales order value for all sales reps who report up to ThomasH-Sales VP

While the data model is relatively simple, the hierarchical nature of the sales organization suggests an inherent dynamic structure in the reporting hierarchy. AjayW, who heads the Region1 territory, also manages the sales team members directly. Whereas in Region2, Liz L manages two Managers who then manages the sales team. This is typical in most application data hierarchies.

RDBMS Approach

In order to query hierarchical data, more established RDBMSs such as Oracle and SQL Server provide the CONNECT BY/START WITH construct to allow for a single query to recursively traverse the hierarchy employee structure.

Query: Sales orders generated by members who report up to ThomasH-Sales VP</strong></span>

SELECT e.EmpID, e.Name, e.ManagerID, sum(o.orderVal)
   FROM employee e
   INNER JOIN sales_order o ON o.EmpID = o.EmpId
   START WITH EmpID = 101
   CONNECT BY PRIOR EmpID = ManagerID
   GROUP BY  e.EmpID, e.Name, e.ManagerID;

While the above query may look simple, the query performance is difficult to improve with indexes due to the recursive nature of the CONNECT BY implementation. For this reason, this technique is not popular in application development for systems with a large volume of data. Instead, enterprise applications rely on a pre-flattened object structure for a more predictable query performance. The flattened hierarchy technique is described in the Couchbase N1QL section below.

Couchbase N1QL

In order to get the best query performance, N1QL applications should use the flattened hierarchy structure. The approach provides more predictable performance and is better at leveraging Couchbase GSI.

The diagram below shows an example of the flattening transformation of a self-referencing hierarchical structure, such as the employee object. I also include a Python code snippet below that you can use to flatten the hierarchical structure. The flatten_hierarchy function takes a self-referencing hierarchical JSON object and generates two new objects in the same keyspace but with different type values.

  • The object_hier structure works with aggregated BI queries where query results can be rolled up for each level.
  • The object_hier_level structure provides the same result as the CONNECT BY/START WITH clause, which returns all of the child objects for any given node. This is the object that we will use in our N1QL query to provide the query solution.
N1QL Query: Sales orders generated by members who report up to ThomasH-Sales VP


SELECT e.id,sum(a.value)
FROM crm a
INNER JOIN (
      SELECT uhl.id FROM crm uhl WHERE uhl.type ='_employee_hier_level'
     AND uhl.parent='101') e USE HASH(probe) ON a.owner = e.id
WHERE a.type='sales_order'
GROUP BY e.id

Recommended GS Index:

CREATE INDEX `crm_employee_hier_level` ON `crm`(`parent`) WHERE (`type` = "_employee_hier_level")

Notes:

  1. The main query retrieves all sales order in the crm bucket with type value = 'salesorder'
  2. The query performs a HASH JOIN with another query (N1QL 6.5 feature) that retrieves all the employee ids who report to user101, i.e ThomasH-SalesVP
  3. Additional covering indexes can also improve query performance
  4. The query uses N1QL 6.5 ANSI JOIN Support for Expression and Subquery Term
# Python code to flatten a JSON document in Couchbase bucket
from couchbase.cluster import Cluster
from couchbase.cluster import PasswordAuthenticator
from couchbase.exceptions import NotFoundError
from couchbase.bucket import Bucket
from couchbase.n1ql import N1QLQuery

def flatten_hierarchy(cb,bucketname,src_doc_type,num_hier_level,node_id_col, parent_id_col):
# Example: flatten_hierarchy(cb,args.bucket,'user',4,'id','managerid')
# cb - couchbase bucket handle
# bucketname - name of the bucket for the source and target documents 
# src_doc_type - the type value of the source document
# num_hier_level - specifies the number of levels that are needed. Should be the max depth of the hierarchy
# node_id - the field name in the document for the key node id
# parent_node_id - the parent field 
#
    gen_doc_type = '_'+src_doc_type+'_hier'
    if (num_hier_level > 1):
        qstr_ins = 'INSERT into '+bucketname+' (KEY UUID(), VALUE ndoc) '
        qstr_sel = 'SELECT { "type":"'+gen_doc_type+'"'     
        for i in range(1,num_hier_level+1):
            qstr_sel += ',"id'+str(i)+'":l'+str(i)+'.'+node_id_col
        qstr_sel += '} ndoc'     
        qstr_sel_one =  ' SELECT '+node_id_col+','+parent_id_col+' FROM '+bucketname+' WHERE type="'+src_doc_type+'"'
        for i in range(1,num_hier_level+1):
            if (i==1):
                qstr_sel += ' FROM ('+qstr_sel_one+') l'+str(i)
            else:
                qstr_sel += ' LEFT OUTER JOIN ('+qstr_sel_one
                qstr_sel += ') l'+str(i)+' ON l'+str(i-1)+'.'+parent_id_col+' = l'+str(i)+'.'+node_id_col
        try:
            #q = N1QLQuery(qstring)
            rows = cb.n1ql_query(qstr_ins+qstr_sel).execute()
        except Exception as e:
                print("query error",e)
        # generate connect by

    if (num_hier_level > 1):
        qstr_ins = 'INSERT into '+bucketname+' (KEY UUID(), VALUE ndoc) '
        qstr_sel = 'SELECT { "id":ll.child,"parent":ll.parent,"level":ll.level ,"type":"'+gen_doc_type+'_level" } ndoc FROM ('
        for i in range(1,num_hier_level):
            if (i>1):
                qstr_sel += 'UNION ALL '
            qstr_sel += 'SELECT id'+str(i+1)+' parent, id1 child,'+str(i)+' level FROM '+bucketname+' WHERE type="'+gen_doc_type+'" and id'+str(i+1)+' IS NOT NULL '
        qstr_sel += ') ll'
        try:
            rows = cb.n1ql_query(qstr_ins+qstr_sel).execute()
        except Exception as e:
            print("query error",e)
    return

Conclusion

We would love to hear from you on how you liked the 6.5 features and how it'll benefit your business going forward. Please share your feedback via the comments.


Further Reading

Using N1QL With Couchbase Eventing Functions

N1QL (Query) Basic Order of Execution

Topics:
database ,n1ql query ,self-referencing hierarchy ,chouchbase n1ql ,rdbms ,couchbase gsi

Published at DZone with permission of

Opinions expressed by DZone contributors are their own.

{{ parent.title || parent.header.title}}

{{ parent.tldr }}

{{ parent.urlSource.name }}