Over a million developers have joined DZone.

Modelling Hierarchical Data with RavenDB

· Java Zone

Check out this 8-step guide to see how you can increase your productivity by skipping slow application redeploys and by implementing application profiling, as you code! Brought to you in partnership with ZeroTurnaround.

A very common requirement with any database engine is to be able to store and query hierarchical data. This usually comes up in the context of categories for products in an e-commerce website, in a site-structure context (this page needs to be a parent of that page) or when trying to model company hierarchies tree (for deciding on permissions, for example).

While RavenDB applications need to solve the same kind of problems, a good solution often times looks very different from what you'd expect.

The immediate solution seems quite obvious, especially if you've solved a similar problem before with other types of databases (relational, for example). A Category document (shown here as a POCO) will contain its parent ID, and using some advanced indexing techniques one can get standard hierarchical queries working:

public class Category
    public string Id { get; set; }
    public string ParentId { get; set; }
    public string Name { get; set; }


However, there's a significant indexing latency and performance hit involved using this technique. All-in-all, this is usually a method unfit for document databases, as it employs relational thinking and takes no advantage whatsoever of the features available provided by the document database.

Instead, try thinking of a hierarchy tree as an entity of its own.

With a document-database, hierarchies are usually best modeled in one document that defines the hierarchy. In our scenario of categories that would be to define the categories tree, where the categories themselves can be represented by standalone documents (and thus hold Name, Description etc, and allow for other collection to reference them), or not if you don't need them to exist separately (less common, and will somewhat bloat the tree document).

Modeled from code, a Category document would look something like this:

public class Category
    public string Id { get; set; }
    public string Name { get; set; }
    // other meta-data that you want to store per category, like image etc

And the hierarchy tree document can be serialized from a class like the following, where this class can have methods for making nodes in it easily accessible:

public class CategoriesHierarchyTree
    public class Node
       public string CategoryId { get; set; }
       public List<Node> Children { get; set; }

    public List<Node> RootCategories { get; set; }

    // various methods for looking up and updating tree structure

This approach of hierarchy-tree has several important advantages:

  1. One transactional scope - when the tree changes, the tree changes in one transaction, always. You cannot get affected by multiple concurrent changes to the tree since you can leverage optimistic concurrency when editing this one document. Using the approach you propose it is impossible to guarantee that therefore harder to guarantee the completeness and correctness of the hierarchy tree over time. If you think of a hierarchy as a tree, it actually makes a lot of sense to have each change lock the entire tree until it completes. The hierarchy tree is one entity.
  2. Caching - the entire hierarchy can be quickly and efficiently cached, even using aggressive caching which will minimize the times the server is accessed with queries on hierarchy.
  3. All operations are done entirely in-memory - since its one document, aka object, all queries on the hierarchy (whose the parent of, list of children etc) are made entirely in-memory and effectively cost close to nothing to perform. Using an index with Recurse() to answer such queries is order of magnitude costlier (network costs and computational). If performance is your biggest concern - this is definitely a winner.
  4. Multiple parents per category, no denormalization - if a category document is saved outside the hierarchy tree, like demonstrated above, you can effectively put a category under multiple parents without the need to denormalize. All category data is in one place, in a document outside of the tree, and the tree only holds a reference to the category.

I highly recommend going with this approach. It is a bit of a shift from the relational mindset, but its so worth it, even when the tree grows big.

The Java Zone is brought to you in partnership with ZeroTurnaround. Check out this 8-step guide to see how you can increase your productivity by skipping slow application redeploys and by implementing application profiling, as you code!


Opinions expressed by DZone contributors are their own.

The best of DZone straight to your inbox.

Please provide a valid email address.

Thanks for subscribing!

Awesome! Check your inbox to verify your email so you can start receiving the latest in tech news and resources.

{{ parent.title || parent.header.title}}

{{ parent.tldr }}

{{ parent.urlSource.name }}