Over a million developers have joined DZone.
{{announcement.body}}
{{announcement.title}}

Joining NoSQL Documents: MongoDB Query Language vs. Couchbase N1QL

DZone's Guide to

Joining NoSQL Documents: MongoDB Query Language vs. Couchbase N1QL

When it comes to bringing NoSQL documents together, which is better: MongoDB or Couchbase? Let's see how their query languages stack up for joins.

· Database Zone
Free Resource

Learn how to move from MongoDB to Couchbase Server for consistent high performance in distributed environments at any scale.

One of the most frequent questions I receive when it comes to NoSQL is on the subject of joining data from multiple documents into a single query result. While this question is brought up more frequently from RDBMS developers, I also receive it from NoSQL developers.

When it comes to data joining, every database does it different, and some require it to be done through the application layer, rather than the database layer. We’re going to explore some data joining options between database technologies.

MongoDB is a popular NoSQL technology, so we’ll be seeing how much easier it is to join documents in Couchbase by comparison.

The Sample Data

For this example, we’ll be basing both MongoDB and Couchbase off two sample documents. Assume we’re working with a classic order and inventory example. For inventory, our documents might look something like this:

{
    "id": "product-1",
    "type": "product",
    "name": "Pokemon Red",
    "price": 29.99
}


While flat, the above document can properly explain one particular product. It has a unique id which will be involved during the join process. For orders, we might have a document that looks like the following:

{
    "id": "order-1",
    "type": "order",
    "products": [
        {
            "product_id": "product-1",
            "quantity": 2
        }
    ]
}


The goal here will be to join these two documents in a single query using both MongoDB and Couchbase. However, query language aside, these documents can always be joined via the application layer through multiple queries. This is not the result we’re after though.

Joining Documents with MongoDB and the $lookup Operator

In recent versions of MongoDB there is a $lookup operator that is part of the aggregation queries. Per the MongoDB documentation, this operator performs as the following:

Performs a left outer join to an unsharded collection in the same database to filter in documents from the “joined” collection for processing. The $lookup stage does an equality match between a field from the input documents with a field from the documents of the “joined” collection.

To use the $lookup operator, you’d have something like this:

db.collection.aggregate([
    {
       $lookup:
         {
           from: <collection to join>,
           localField: <field from the input documents>,
           foreignField: <field from the documents of the "from" collection>,
           as: <output array field>
         }
    }
])


Now this is great, but it doesn’t work on relationships found in arrays. This means that the $lookup operation cannot join the product_id found in the products array to another document. Instead the array must be “unwound” or “unnested” first which adds extra complexity to our query:

db.orders.aggregate([
    { $unwind: "$products" },
    {
        $lookup: {
            from: "products",
            localField: "products.product_id",
            foreignField: "_id",
            as: "productObjects"
        }
    }
])


The $unwind operator will flatten the array and then do a join on the now flat objects that were produced. The result of such query would look like this:

{
    "_id" : ObjectId("58a3869acbf64c4ace55e713"),
    "products" : {
        "product_id" : ObjectId("58a3851b2f14a900caa7a731"),
        "quantity" : 2
    },
    "productObjects" : [
        {
            "_id" : ObjectId("58a3851b2f14a900caa7a731"),
            "name" : "Pokemon Red",
            "price" : 29.99
        }
    ]
}


Had there been more than one reference in the array, there would have been more results returned. However, what is returned isn’t very attractive. We still have the old productsobject and now a productsObject array. Further manipulations to the data stream needs to happen.

The productsObject array should be “unwound” and then reconstructed to how we want it. This can be accomplished by doing the following:

db.orders.aggregate([
    { $unwind: "$products" },
    {
        $lookup: {
            from: "products",
            localField: "products.product_id",
            foreignField: "_id",
            as: "productObjects"
        }
    },
    { $unwind: "$productObjects"},
    {
        $project: {
            products: {
                "quantity": "$products.quantity",
                "name": "$productObjects.name",
                "price": "$productObjects.price"
            }
        }
    }
])


Notice that the aggregate query is now getting more complex. After doing the join, the result is “unwound” and then the result is reconstructed using the $project operator.

At this point further manipulations to the result can be made such as grouping the results so that the products objects become a single array again. Each manipulation to the data set requires more aggregation code which can easily become messy, complicated, and difficult to read.

This is where Couchbase N1QL becomes so much more pleasant to work with.

Using Couchbase and N1QL to Join NoSQL Documents

Let’s use the same document example that we used for MongoDB. This time we’re going to write SQL queries with N1QL to get the job done.

The first thing that comes to mind might be to use a JOIN in SQL. Our query might look something like this:

SELECT orders.*, product
FROM example AS orders
JOIN example AS product ON KEYS orders.products[*].product_id
WHERE orders.type = 'order'


In the above example, both the documents exist in the same Couchbase Bucket. A JOINagainst document ids happens based on the product_id values found in the productsarray. The above query would yield results that look like this:

[
  {
    "id": "order-1",
    "product": {
      "id": "product-1",
      "name": "Pokemon Red",
      "price": 29.99,
      "type": "product"
    },
    "products": [
      {
        "product_id": "product-1",
        "quantity": 2
      }
    ],
    "type": "order"
  }
]


Like with MongoDB, there will be a result for every item of the products array that matches. In fairness, while the N1QL version was easier to write, it wasn’t necessarily any more difficult than the MongoDB Query Language at this point. As we manipulate the data more, Couchbase becomes a lot easier by comparison.

For example, let’s say we wanted to clean up the results:

SELECT orders.id, orders.type, OBJECT_PUT(product, "quantity", products.quantity) AS product
FROM example AS orders
UNNEST orders.products AS products
JOIN example AS product ON KEYS products.product_id
WHERE orders.type = 'order'


There are some major differences in what we’re doing in the above, but minor differences in how we’re doing them. Instead of joining directly on the array, we are first flattening or “unnesting” the array, like what we saw in the MongoDB $unwind operator. The join is now happening on each of the flattened results. Finally, the quantity from the original object is added to the new object.

The result to the above query would look something like this:

[
  {
    "id": "order-1",
    "product": {
      "id": "product-1",
      "name": "Pokemon Red",
      "price": 29.99,
      "quantity": 2,
      "type": "product"
    },
    "type": "order"
  }
]


Let’s say that the original products array had more than one product reference in it. Instead of returning several objects based on the JOIN criteria we saw above, it might make sense to re-pack that original array.

SELECT orders.id, orders.type, ARRAY_AGG(OBJECT_PUT(product, "quantity", products.quantity)) AS products
FROM example AS orders
UNNEST orders.products AS products
JOIN example AS product ON KEYS products.product_id
WHERE orders.type = 'order'
GROUP BY orders


In the above query we’ve only added ARRAY_AGG and a GROUP BY, but as a result, each joined document shows up in the products array instead of the id value.

Don’t want to use an actual JOIN operator? Try using a SQL subquery instead.

Conclusion

Joining data in NoSQL is a very popular concern for developers that are seasoned RDBMS veterans. Because MongoDB is a very popular NoSQL technology, I thought it would be good to use as a comparison to how Couchbase handles document joining. For light operations, MongoDB’s $lookup operator is tolerable, but as queries become more complex, you may need to take a step back. With N1QL, writing complex queries that include joining operations become very easy and stay easy regardless on how complex the query is.

For more information on N1QL and Couchbase, visit the Couchbase Developer Portal.

Want to deliver a whole new level of customer experience? Learn how to make your move from MongoDB to Couchbase Server.

Topics:
nosql ,document database ,mongodb ,couchbase ,database

Published at DZone with permission of Nic Raboy, DZone MVB. See the original article here.

Opinions expressed by DZone contributors are their own.

THE DZONE NEWSLETTER

Dev Resources & Solutions Straight to Your Inbox

Thanks for subscribing!

Awesome! Check your inbox to verify your email so you can start receiving the latest in tech news and resources.

X

{{ parent.title || parent.header.title}}

{{ parent.tldr }}

{{ parent.urlSource.name }}