Over a million developers have joined DZone.
{{announcement.body}}
{{announcement.title}}

Couchbase: Achieving Indexing for Dynamic Fields

DZone's Guide to

Couchbase: Achieving Indexing for Dynamic Fields

Let's take a look at a tutorial that explains how to achieve indexes for dynamic fields and see how to get the solution.

· Database Zone ·
Free Resource

Download the Scale-Out and High Availability whitepaper. Learn why leading enterprises choose the Couchbase NoSQL database over MongoDB™ after evaluating side by side.

The Problem

After deciding to use Couchbase 5.1 instead of Cassandra in a certain project, there were different challenges to face, although there were also many benefits and features that it provided.

One of the most challenging things that I faced was how I was going to index a sub-document with a dynamic key inside the main document. Look at the following example.

Suppose that we have a Couchbase bucket called "cars." This bucket is storing car dealership data, which has different types of cars. In this bucket, the car's information is stored as follows:

JSON document 1

JSON document 2

{
  "company": "Fiat",
  "model": "Punto",
  "types": {
    "fiat-punto-01": {
      "subModel": "classic",
      "engineData": {
        "cc": "1300",
        "hp": 60
      },
      "price": 10000
    },
    "fiat-punto-02": {
      "subModel": "evo",
      "engineData": {
        "cc": "1400",
        "hp": 100
      },
      "price": 14000
    }
  }
}
{
  "company": "Toyota",
  "model": "Corolla",
  "types": {
    "toyo-coro-01": {
      "subModel": "2018",
      "engineData": {
        "cc": "1400",
        "hp": 100
      },
      "price": 12000
    },
    "toyo-coro-02": {
      "subModel": "2018",
      "engineData": {
        "cc": "1500",
        "hp": 120
      },
      "price": 15000
    }
  }
}

So, what if we want to query this bucket by the "enginData". To do that, we need to get all sub-models dynamic IDs (for ex. "fiat-punto-2" and "toyo-coro-01") and their corresponding document IDs, which has the same engine data (for ex. enginData={"cc": "1400", "hp": 100}).

This looks complex, doesn't it?

This is an issue that had appeared while implementing one of the business requirements. Of course, one of the possible solutions is to restructure the car's JSON document to make it easier to create an index on the "engineData". But actually, due to some technical difficulties and also the fact that this structure has some more benefits for other business requirements, this solution is not an option.

Let's have a look at the solution on how to solve this issue using Couchbase global secondary index GSI.

The Solution

The issue with Couchbase GSI is that the field that you need to index should be a static field (just the opposite of dynamic). 

-- N1QL Syntax
CREATE INDEX index_name ON (index_key1, ..., index_keyN) USING GSI;

And this is not going to work in our case. However, after some searching, I found that there is something called "Array Indexing," which can be used to index the elements inside an array.

-- N1QL Syntax
CREATE INDEX [ index_name ] 
ON named_keyspace_ref ( index_key1, index_key2 , index_key3, ... )
 [ WHERE filter_expr ]
 [ USING GSI ]
 [ WITH { "nodes": [ "node_name" ], 
          "defer_build": true | false
        }
 ];

But wait, how can we use this type of index in our case? We need to transform the contents of "types" in the previous car's documents into an array.

Here, Couchbase object functions come to the rescue. After some searching in Couchbase object functions, I found there is a function called OBJECT_PAIRS, which actually returns an array of arrays of values that contain the attribute name and value pairs of the given object. So, the dynamic "sub-model" IDs will be converted to static fields now. Let's see an example.

If we run the following N1QL against the "car's" bucket:

-- N1QL Syntax
SELECT OBJECT_PAIRS(types) FROM cars USE KEYS ["1", "2"];

The result will be:

[
  {
    "$1": [
      {
        "name": "fiat-punto-01",
        "val": {
          "engineData": {
            "cc": "1300",
            "hp": 60
          },
          "price": 10000,
          "subModel": "classic"
        }
      },
      {
        "name": "fiat-punto-02",
        "val": {
          "engineData": {
            "cc": "1400",
            "hp": 100
          },
          "price": 14000,
          "subModel": "evo"
        }
      }
    ]
  },
  {
    "$1": [
      {
        "name": "toyo-coro-01",
        "val": {
          "engineData": {
            "cc": "1400",
            "hp": 100
          },
          "price": 12000,
          "subModel": "2018"
        }
      },
      {
        "name": "toyo-coro-02",
        "val": {
          "engineData": {
            "cc": "1500",
            "hp": 120
          },
          "price": 15000,
          "subModel": "2018"
        }
      }
    ]
  }
]

That's great, now we have the dynamic values converted to an array with "name" field equal to the dynamic ID and "value" field equal to the data of the given ID.

Then, we can use the array indexing on this array, so the index will be:

-- N1QL Syntax
CREATE INDEX idx_sub_models_id 
  ON `cars` ( DISTINCT ARRAY t.val.engineData FOR t IN object_pairs(`types`) END ) 
  using GSI;

This N1QL is doing the following.

  1. It uses the object_pairs  function to convert "types" into an array of arrays of "name" and "val".

  2. It uses Array Indexing to index the result array on "val.enginData"

After creating this index, we need to write the appropriate N1QL query to get the sub-model's IDs and the corresponding document ID for a given engine data. Let me jump directly to the N1QL and then I'll explain it.

-- N1QL Syntax
SELECT
  meta(cars).id AS carId,
  ARRAY t.name FOR t IN object_pairs(`types`) WHEN t.val.engineData = {"cc":"1400","hp":100} END AS carSubId
FROM cars
WHERE ANY t IN object_pairs(`types`) SATISFIES t.val.engineData = {"cc":"1400","hp":100} END;

What this query does:

  1. It selects the document ID as "carId".

  2. It selects the "name" as "carSubId" from the returned array from object_pairs function given a specific engineData in the WHEN clause.

  3. The second part of the select is using ARRAY t.name ... because this is called "Collection operations," which means we have a collection returned from object_pairs function, and we need to filter it using a specific condition, which is, in our case, the given "engineData".

  4. It uses WHERE clause to use the created index and to query the documents with the given "engineData".

The result of the previous query is:

[
  {
    "carId": "1",
    "carSubId": [
      "fiat-punto-02"
    ]
  },
  {
    "carId": "2",
    "carSubId": [
      "toyo-coro-01"
    ]
  }
]

Finally, we have what we were asking for.

One final explanation about the previous N1QL. We used the collections operations because without it, the query will not work correctly. This is because if the condition in the WHERE clause is true, the full document will be returned, not only the sub-model ID that satisfies the index and the query. So, we should filter the returned documents using this collections operation.

So, if we run the following query only:

-- N1QL Syntax
SELECT
  meta(cars).id AS carId,
  object_pairs(`types`)[*].name AS carSubId
FROM cars
WHERE ANY t IN object_pairs(`types`) SATISFIES t.val.engineData = {"cc":"1400","hp":100} END;

The result will be:

[
  {
    "carId": "1",
    "carSubId": [
      "fiat-punto-01",
      "fiat-punto-02"
    ]
  },
  {
    "carId": "2",
    "carSubId": [
      "toyo-coro-01",
      "toyo-coro-02"
    ]
  }
]

This is obviously wrong because full documents are returned since both documents are satisfying the index condition and there is no filtering for the sub-model IDs.

Long story short, combining "Array Indexing" with "Object Functions" enriches N1QL and is really useful for such complex queries.

Let me know your thoughts in the comments!

The Forrester Wave™: Big Data NoSQL report. See how the top NoSQL providers stack up. Download now.

Topics:
couchbase ,n1ql ,nosql ,database ,tutorial ,json

Opinions expressed by DZone contributors are their own.

{{ parent.title || parent.header.title}}

{{ parent.tldr }}

{{ parent.urlSource.name }}