DZone
Thanks for visiting DZone today,
Edit Profile
  • Manage Email Subscriptions
  • How to Post to DZone
  • Article Submission Guidelines
Sign Out View Profile
  • Post an Article
  • Manage My Drafts
Over 2 million developers have joined DZone.
Log In / Join
Refcards Trend Reports Events Over 2 million developers have joined DZone. Join Today! Thanks for visiting DZone today,
Edit Profile Manage Email Subscriptions Moderation Admin Console How to Post to DZone Article Submission Guidelines
View Profile
Sign Out
Refcards
Trend Reports
Events
Zones
Culture and Methodologies Agile Career Development Methodologies Team Management
Data Engineering AI/ML Big Data Data Databases IoT
Software Design and Architecture Cloud Architecture Containers Integration Microservices Performance Security
Coding Frameworks Java JavaScript Languages Tools
Testing, Deployment, and Maintenance Deployment DevOps and CI/CD Maintenance Monitoring and Observability Testing, Tools, and Frameworks
Culture and Methodologies
Agile Career Development Methodologies Team Management
Data Engineering
AI/ML Big Data Data Databases IoT
Software Design and Architecture
Cloud Architecture Containers Integration Microservices Performance Security
Coding
Frameworks Java JavaScript Languages Tools
Testing, Deployment, and Maintenance
Deployment DevOps and CI/CD Maintenance Monitoring and Observability Testing, Tools, and Frameworks
  1. DZone
  2. Data Engineering
  3. Data
  4. Couchbase: Achieving Indexing for Dynamic Fields

Couchbase: Achieving Indexing for Dynamic Fields

Let's take a look at a tutorial that explains how to achieve indexes for dynamic fields and see how to get the solution.

Ayman Imam user avatar by
Ayman Imam
·
Aug. 28, 18 · Tutorial
Like (9)
Save
Tweet
Share
6.73K Views

Join the DZone community and get the full member experience.

Join For Free

The Problem

After deciding to use Couchbase 5.1 instead of Cassandra in a certain project, there were different challenges to face, although there were also many benefits and features that it provided.

One of the most challenging things that I faced was how I was going to index a sub-document with a dynamic key inside the main document. Look at the following example.

Suppose that we have a Couchbase bucket called "cars." This bucket is storing car dealership data, which has different types of cars. In this bucket, the car's information is stored as follows:

JSON document 1

JSON document 2

{
  "company": "Fiat",
  "model": "Punto",
  "types": {
    "fiat-punto-01": {
      "subModel": "classic",
      "engineData": {
        "cc": "1300",
        "hp": 60
      },
      "price": 10000
    },
    "fiat-punto-02": {
      "subModel": "evo",
      "engineData": {
        "cc": "1400",
        "hp": 100
      },
      "price": 14000
    }
  }
}
{
  "company": "Toyota",
  "model": "Corolla",
  "types": {
    "toyo-coro-01": {
      "subModel": "2018",
      "engineData": {
        "cc": "1400",
        "hp": 100
      },
      "price": 12000
    },
    "toyo-coro-02": {
      "subModel": "2018",
      "engineData": {
        "cc": "1500",
        "hp": 120
      },
      "price": 15000
    }
  }
}

So, what if we want to query this bucket by the "enginData". To do that, we need to get all sub-models dynamic IDs (for ex. "fiat-punto-2" and "toyo-coro-01") and their corresponding document IDs, which has the same engine data (for ex. enginData={"cc": "1400", "hp": 100}).

This looks complex, doesn't it?

This is an issue that had appeared while implementing one of the business requirements. Of course, one of the possible solutions is to restructure the car's JSON document to make it easier to create an index on the "engineData". But actually, due to some technical difficulties and also the fact that this structure has some more benefits for other business requirements, this solution is not an option.

Let's have a look at the solution on how to solve this issue using Couchbase global secondary index GSI.

The Solution

The issue with Couchbase GSI is that the field that you need to index should be a static field (just the opposite of dynamic). 

-- N1QL Syntax
CREATE INDEX index_name ON (index_key1, ..., index_keyN) USING GSI;

And this is not going to work in our case. However, after some searching, I found that there is something called "Array Indexing," which can be used to index the elements inside an array.

-- N1QL Syntax
CREATE INDEX [ index_name ] 
ON named_keyspace_ref ( index_key1, index_key2 , index_key3, ... )
 [ WHERE filter_expr ]
 [ USING GSI ]
 [ WITH { "nodes": [ "node_name" ], 
          "defer_build": true | false
        }
 ];

But wait, how can we use this type of index in our case? We need to transform the contents of "types" in the previous car's documents into an array.

Here, Couchbase object functions come to the rescue. After some searching in Couchbase object functions, I found there is a function called OBJECT_PAIRS, which actually returns an array of arrays of values that contain the attribute name and value pairs of the given object. So, the dynamic "sub-model" IDs will be converted to static fields now. Let's see an example.

If we run the following N1QL against the "car's" bucket:

-- N1QL Syntax
SELECT OBJECT_PAIRS(types) FROM cars USE KEYS ["1", "2"];

The result will be:

[
  {
    "$1": [
      {
        "name": "fiat-punto-01",
        "val": {
          "engineData": {
            "cc": "1300",
            "hp": 60
          },
          "price": 10000,
          "subModel": "classic"
        }
      },
      {
        "name": "fiat-punto-02",
        "val": {
          "engineData": {
            "cc": "1400",
            "hp": 100
          },
          "price": 14000,
          "subModel": "evo"
        }
      }
    ]
  },
  {
    "$1": [
      {
        "name": "toyo-coro-01",
        "val": {
          "engineData": {
            "cc": "1400",
            "hp": 100
          },
          "price": 12000,
          "subModel": "2018"
        }
      },
      {
        "name": "toyo-coro-02",
        "val": {
          "engineData": {
            "cc": "1500",
            "hp": 120
          },
          "price": 15000,
          "subModel": "2018"
        }
      }
    ]
  }
]

That's great, now we have the dynamic values converted to an array with "name" field equal to the dynamic ID and "value" field equal to the data of the given ID.

Then, we can use the array indexing on this array, so the index will be:

-- N1QL Syntax
CREATE INDEX idx_sub_models_id 
  ON `cars` ( DISTINCT ARRAY t.val.engineData FOR t IN object_pairs(`types`) END ) 
  using GSI;

This N1QL is doing the following.

  1. It uses the object_pairs  function to convert "types" into an array of arrays of "name" and "val".

  2. It uses Array Indexing to index the result array on "val.enginData"

After creating this index, we need to write the appropriate N1QL query to get the sub-model's IDs and the corresponding document ID for a given engine data. Let me jump directly to the N1QL and then I'll explain it.

-- N1QL Syntax
SELECT
  meta(cars).id AS carId,
  ARRAY t.name FOR t IN object_pairs(`types`) WHEN t.val.engineData = {"cc":"1400","hp":100} END AS carSubId
FROM cars
WHERE ANY t IN object_pairs(`types`) SATISFIES t.val.engineData = {"cc":"1400","hp":100} END;

What this query does:

  1. It selects the document ID as "carId".

  2. It selects the "name" as "carSubId" from the returned array from object_pairs function given a specific engineData in the WHEN clause.

  3. The second part of the select is using ARRAY t.name ... because this is called "Collection operations," which means we have a collection returned from object_pairs function, and we need to filter it using a specific condition, which is, in our case, the given "engineData".

  4. It uses WHERE clause to use the created index and to query the documents with the given "engineData".

The result of the previous query is:

[
  {
    "carId": "1",
    "carSubId": [
      "fiat-punto-02"
    ]
  },
  {
    "carId": "2",
    "carSubId": [
      "toyo-coro-01"
    ]
  }
]

Finally, we have what we were asking for.

One final explanation about the previous N1QL. We used the collections operations because without it, the query will not work correctly. This is because if the condition in the WHERE clause is true, the full document will be returned, not only the sub-model ID that satisfies the index and the query. So, we should filter the returned documents using this collections operation.

So, if we run the following query only:

-- N1QL Syntax
SELECT
  meta(cars).id AS carId,
  object_pairs(`types`)[*].name AS carSubId
FROM cars
WHERE ANY t IN object_pairs(`types`) SATISFIES t.val.engineData = {"cc":"1400","hp":100} END;

The result will be:

[
  {
    "carId": "1",
    "carSubId": [
      "fiat-punto-01",
      "fiat-punto-02"
    ]
  },
  {
    "carId": "2",
    "carSubId": [
      "toyo-coro-01",
      "toyo-coro-02"
    ]
  }
]

This is obviously wrong because full documents are returned since both documents are satisfying the index condition and there is no filtering for the sub-model IDs.

Long story short, combining "Array Indexing" with "Object Functions" enriches N1QL and is really useful for such complex queries.

Let me know your thoughts in the comments!

Database code style Data structure

Opinions expressed by DZone contributors are their own.

Popular on DZone

  • Utilize OpenAI API to Extract Information From PDF Files
  • Taming Cloud Costs With Infracost
  • Building a Scalable Search Architecture
  • How and Why You Should Start Automating DevOps

Comments

Partner Resources

X

ABOUT US

  • About DZone
  • Send feedback
  • Careers
  • Sitemap

ADVERTISE

  • Advertise with DZone

CONTRIBUTE ON DZONE

  • Article Submission Guidelines
  • Become a Contributor
  • Visit the Writers' Zone

LEGAL

  • Terms of Service
  • Privacy Policy

CONTACT US

  • 600 Park Offices Drive
  • Suite 300
  • Durham, NC 27709
  • support@dzone.com
  • +1 (919) 678-0300

Let's be friends: