ElasticSearch: Parent and Child Joins — Game of Thrones Edition

ElasticSearch is not a relational database; it is all about search efficiency, not storage efficiency.

By Sohan Ganapathy · Jun. 17, 19 · Tutorial

In a relational database, a child table references its parent through a foreign key, and a join combines the two tables at query time. The design typically involves normalizing the data.

ElasticSearch is not a relational database; it is all about search efficiency, not storage efficiency. The stored data is denormalized and pretty much flat. ElasticSearch is built for speed, and traditional joins would run too slowly, so joins cannot span indexes. Both the child and parent documents must be in the same index and on the same shard.

Example Parent/Child Relationship

Let’s consider two famous houses from the HBO series Game of Thrones. (For those worried about spoilers, I have faked the isAlive status of the characters.) The family tree depicted in Image 1 has four parents and nine children. Each character has a gender and an isAlive status.

Image 1: The Starks and Lannister family tree with Parent and Child relationships.

Creating the “Game_Of_Thrones” Index

The code below creates an index for the above relationship (see the setup guide for ElasticSearch). Starting with ElasticSearch 7, a mapping type is no longer required when creating an index, unlike in previous versions.

createIndex.sh — Create the game_of_thrones Index

Line 23: relation_type is the name of the join field.

Line 24: The type join is a special field type that creates a parent/child relation between documents of the same index.

Line 25: Parent-child joins use global ordinals to speed up lookups.

Lines 26–28: The relations section defines the set of possible relations within the documents, each relation being a parent name and a child name.
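
The createIndex.sh script itself is not reproduced in this text, so the line numbers above refer to the original script. As a minimal sketch of the pieces they describe, the request body for PUT /game_of_thrones might look like the following. It assumes the join field is the only explicitly mapped field; the helper name build_index_body is hypothetical, and the other fields (house, gender, isAlive) are left to dynamic mapping here, which may differ from the author's script.

```python
import json

INDEX_NAME = "game_of_thrones"


def build_index_body():
    """Return a create-index body containing a single join field.

    Assumption: only relation_type is mapped explicitly in this sketch.
    """
    return {
        "mappings": {
            "properties": {
                "relation_type": {
                    # Special field type for parent/child relations.
                    "type": "join",
                    # Build global ordinals eagerly to speed up joins.
                    "eager_global_ordinals": True,
                    # One relation: "parent" documents have "child" documents.
                    "relations": {"parent": "child"},
                }
            }
        }
    }


# Body to send with: PUT /game_of_thrones
print(json.dumps(build_index_body(), indent=2))
```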

Inserting the Parent Data

Let’s walk through the code for one parent insert before running a script to insert the other parents depicted in Image 1.

Create Eddard Stark
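
The original snippet is not preserved in this text. A minimal sketch of such a parent insert, assuming the document ID "1" and routing value Eddard mentioned elsewhere in the article, could be built as shown here. The helper name parent_insert_request is hypothetical, and the isAlive value passed in the example is a placeholder rather than the author's data.

```python
import json


def parent_insert_request(doc_id, name, house, gender, is_alive):
    """Build the path and body for indexing one parent document."""
    # Each parent routes on its own name, so its children can later
    # be sent to the same shard with the same routing value.
    path = f"/game_of_thrones/_doc/{doc_id}?routing={name}"
    body = {
        "name": name,
        "house": house,
        "gender": gender,
        "isAlive": is_alive,
        # Marks this document as the "parent" side of the join.
        "relation_type": {"name": "parent"},
    }
    return path, body


# isAlive=True is a placeholder value for illustration only.
eddard_path, eddard_body = parent_insert_request("1", "Eddard", "Stark", "Male", True)
print("PUT", eddard_path)
print(json.dumps(eddard_body, indent=2))
```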

The above code creates a new document for Eddard Stark and marks it as a parent document using the relation_type field. The value parent is assigned as the name of the relation. Along with the relation, it also adds the other fields needed, such as house, gender, and isAlive.

One key thing to notice here is the routing query parameter. Each parent assigns its own name to the parameter. The routing field helps us control which shard the document is going to be indexed on. The shard is identified using the below equation:

shard = hash(routing_value) % number_of_primary_shards

We can insert the remaining parents using the script here.
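
To illustrate the routing equation above, here is a small sketch. It uses zlib.crc32 purely as a stand-in hash, which is an assumption made for illustration and not the hash function ElasticSearch uses internally, so the shard numbers it produces will not match a real cluster. The point it shows is that the same routing value always maps to the same shard.

```python
import zlib


def pick_shard(routing_value: str, number_of_primary_shards: int) -> int:
    """Apply shard = hash(routing_value) % number_of_primary_shards.

    zlib.crc32 is a stand-in hash for illustration only.
    """
    hashed = zlib.crc32(routing_value.encode("utf-8"))
    return hashed % number_of_primary_shards


# A parent and its child use the same routing value, so they
# are always assigned to the same shard.
parent_shard = pick_shard("Eddard", 5)
child_shard = pick_shard("Eddard", 5)
print(parent_shard == child_shard)
```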

Inserting the Children Data

Similarly, let’s walk through one child insert before running a bulk insert of the nine children depicted in Image 1.

Create Arya Stark
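
The original snippet is not preserved here. A minimal sketch of a child insert, assuming Eddard's ID "1" and routing value Eddard, might look like this. The helper name child_insert_request is hypothetical, and the document ID and isAlive value used for Arya are placeholders, since the article does not state them.

```python
import json


def child_insert_request(doc_id, name, house, gender, is_alive, parent_id, routing):
    """Build the path and body for indexing one child document."""
    # The child must use the parent's routing value so that both
    # documents end up on the same shard.
    path = f"/game_of_thrones/_doc/{doc_id}?routing={routing}"
    body = {
        "name": name,
        "house": house,
        "gender": gender,
        "isAlive": is_alive,
        # "name" is the relation; "parent" is the parent's document ID.
        "relation_type": {"name": "child", "parent": parent_id},
    }
    return path, body


# doc_id "5" and isAlive=True are hypothetical placeholders.
arya_path, arya_body = child_insert_request(
    "5", "Arya", "Stark", "Female", True, parent_id="1", routing="Eddard"
)
print("PUT", arya_path)
print(json.dumps(arya_body, indent=2))
```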

In our example, Arya Stark is a child of Eddard Stark. Notice that we use the same routing query parameter that we used to create a record for Eddard. This is because of the restriction where both the child and parent documents must be on the same shard.

The join between this record and Eddard’s is made by the relation_type field, where we set the name of the relation to child, making Arya Stark a child of the parent whose ID is “1” (the same ID we created Eddard with).

We can insert the remaining children using the script here.

Querying Our Data

Now for the fun part: executing and understanding the queries we can run on the relationship we just created.

Searching and Filtering Specific Parents

  • Get all children of Lyanna Stark: The parent_id query can be used to find child documents which belong to a particular parent.


Get all children of Lyanna Stark
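
The original query is not preserved in this text. As a minimal sketch, a parent_id search body for Lyanna, whose document ID is "2" according to the response shown in this section, could be built like this. The helper name children_of_query is hypothetical.

```python
import json


def children_of_query(parent_id, child_type="child"):
    """Build a parent_id query for the children of one parent document."""
    return {
        "query": {
            "parent_id": {
                "type": child_type,  # name of the child relation
                "id": parent_id,     # document ID of the parent
            }
        }
    }


# Body to send with: GET /game_of_thrones/_search
lyanna_children = children_of_query("2")
print(json.dumps(lyanna_children, indent=2))
```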


Executing the above query gets the John Snow document.

{
    "took": 2,
    ..."hits": [{
        "_index": "game_of_thrones",
        "_type": "_doc",
        "_id": "10",
        "_routing": "Lyanna",
        "_source": {
            "name": "John",
            "house": "Snow",
            "gender": "Male",
            "isAlive": true,
            "relation_type": {
                "name": "child",
                "parent": "2"
            }
        }
    }]...
}

  • Get all children of Eddard who are alive: The bool and must query keywords can be used to fetch the records.


Get All children of Eddard who are alive
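
The original query is not preserved here. One way to combine the two conditions, sketched under the assumption that isAlive is indexed as a boolean and Eddard's ID is "1", is a bool query with two must clauses. The helper name living_children_query is hypothetical.

```python
import json


def living_children_query(parent_id):
    """Build a bool/must query: children of parent_id that are alive."""
    return {
        "query": {
            "bool": {
                "must": [
                    # Clause 1: the document is a child of this parent.
                    {"parent_id": {"type": "child", "id": parent_id}},
                    # Clause 2: the child is marked as alive.
                    {"match": {"isAlive": True}},
                ]
            }
        }
    }


# Body to send with: GET /game_of_thrones/_search
eddard_living = living_children_query("1")
print(json.dumps(eddard_living, indent=2))
```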


Executing the above query will get the records for Arya, Sansa, Bran, and Rickon Stark.

Has Child and Has Parent Queries

The query keywords has_child and has_parent help query data with parent-child relationships.

  • Get all parents who have daughters who are dead: The has_child keyword fetches the parent records whose children match a filter.


Get All parents who have daughters who are dead
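
The original query is not preserved here. A minimal sketch of a has_child query with two filters on the child documents follows; it assumes gender holds the value "Female" and isAlive is a boolean, as in the sample response earlier in the article. The helper name parents_with_dead_daughters_query is hypothetical.

```python
import json


def parents_with_dead_daughters_query():
    """Build a has_child query: parents with a female child that is not alive."""
    return {
        "query": {
            "has_child": {
                "type": "child",  # relation name of the documents to inspect
                "query": {
                    "bool": {
                        "must": [
                            {"match": {"gender": "Female"}},
                            {"match": {"isAlive": False}},
                        ]
                    }
                },
            }
        }
    }


# Body to send with: GET /game_of_thrones/_search
dead_daughters = parents_with_dead_daughters_query()
print(json.dumps(dead_daughters, indent=2))
```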


Executing the above query gets the record of Tywin Lannister, who is the only parent with a dead daughter, Cersei.

  • Get all children whose parent has gender as Female: The has_parent keyword fetches the child records whose parents match a filter.


Get All Children whose Parent has gender as Female
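
The original query is not preserved here. A minimal sketch of a has_parent query, assuming the gender field holds the value "Female", could look like this. The helper name children_of_female_parents_query is hypothetical.

```python
import json


def children_of_female_parents_query():
    """Build a has_parent query: children whose parent is female."""
    return {
        "query": {
            "has_parent": {
                # has_parent names the relation with "parent_type".
                "parent_type": "parent",
                "query": {"match": {"gender": "Female"}},
            }
        }
    }


# Body to send with: GET /game_of_thrones/_search
female_parent_children = children_of_female_parents_query()
print(json.dumps(female_parent_children, indent=2))
```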


Executing the above query gets the record of John Snow, whose parent is Lyanna Stark; all other parents are male.

Having Multiple Children per Parent

Let us add Catelyn Stark as a wife to Eddard Stark, as depicted in Image 2 below. Eddard now has both child and wife documents attached.

Image 2: The Starks and Lannister family tree with Parent, Wife and Child relationships.

The Index can be changed using the code below:

Modify Index Adding a New Child to Parent  — Wife.

Line 9: We now have an array of relationships associated with the Parent which are “child” and “wife”.
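
The original script is not preserved in this text, so "Line 9" refers to that script. A minimal sketch of a mapping update that turns the parent's relation into an array, assuming it is sent to the index's _mapping endpoint, might be the following. The helper name add_wife_relation_body is hypothetical.

```python
import json


def add_wife_relation_body():
    """Build a mapping-update body giving "parent" two child relations."""
    return {
        "properties": {
            "relation_type": {
                "type": "join",
                # A parent can now have "child" and "wife" documents.
                "relations": {"parent": ["child", "wife"]},
            }
        }
    }


# Body to send with: PUT /game_of_thrones/_mapping
wife_mapping = add_wife_relation_body()
print(json.dumps(wife_mapping, indent=2))
```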

Inserting a “Catelyn Stark” document is similar to inserting the child record we created earlier. It uses the same routing parameter as the parent (routing=Eddard) and “wife” as the relation_type name.

Creating Catelyn Stark Record
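
The original snippet is not preserved here. A minimal sketch, assuming Eddard's ID "1" and routing=Eddard, follows. The helper name wife_insert_request is hypothetical, and the document ID and isAlive value are placeholders because the article does not state them.

```python
import json


def wife_insert_request(doc_id, name, house, gender, is_alive, parent_id, routing):
    """Build the path and body for indexing a "wife" document."""
    path = f"/game_of_thrones/_doc/{doc_id}?routing={routing}"
    body = {
        "name": name,
        "house": house,
        "gender": gender,
        "isAlive": is_alive,
        # Same shape as a child insert, but with the "wife" relation.
        "relation_type": {"name": "wife", "parent": parent_id},
    }
    return path, body


# doc_id "14" and isAlive=True are hypothetical placeholders.
catelyn_path, catelyn_body = wife_insert_request(
    "14", "Catelyn", "Stark", "Female", True, parent_id="1", routing="Eddard"
)
print("PUT", catelyn_path)
print(json.dumps(catelyn_body, indent=2))
```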

Query the wife data:

  • Get the Lords who have a wife: The query uses the has_child keyword and filters by the relation type “wife”.


Get the Lords who have a wife
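
The original query is not preserved here. As a minimal sketch, a has_child query that only checks for the existence of a “wife” document can use match_all as its inner query. The helper name lords_with_wife_query is hypothetical.

```python
import json


def lords_with_wife_query():
    """Build a has_child query: parents with at least one "wife" document."""
    return {
        "query": {
            "has_child": {
                "type": "wife",
                # Any wife document qualifies, so no further filter is needed.
                "query": {"match_all": {}},
            }
        }
    }


# Body to send with: GET /game_of_thrones/_search
lords_query = lords_with_wife_query()
print(json.dumps(lords_query, indent=2))
```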


Executing the above query gets the record of Eddard Stark.

Multiple Levels of Relationship (Grandchildren)

Let us add Grandchildren to the Starks and Lannisters as depicted in the below Image 3.


Image 3: Adding grandchildren.

The index needs to be recreated here. This is because of another restriction: it is possible to add a child to an existing element only if that element is already a parent. Since the “child” type was not a parent when we created the index earlier, we need to drop the earlier index, create a new one with the code below, and re-insert all the data.

Line 16: The child is also made a parent here, of the type grandchild. This lets us have the relationship PARENT → CHILD → GRANDCHILD.
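
The original script is not preserved in this text, so "Line 16" refers to that script. A minimal sketch of a create-index body with the three-level relation, assuming the “wife” relation added earlier is kept, might look like this. The helper name build_multilevel_index_body is hypothetical, and only the join field is mapped explicitly.

```python
import json


def build_multilevel_index_body():
    """Return a create-index body for PARENT -> CHILD -> GRANDCHILD."""
    return {
        "mappings": {
            "properties": {
                "relation_type": {
                    "type": "join",
                    "relations": {
                        # A parent has "child" and "wife" documents.
                        "parent": ["child", "wife"],
                        # A child is itself the parent of "grandchild".
                        "child": "grandchild",
                    },
                }
            }
        }
    }


# Body to send with: PUT /game_of_thrones (after deleting the old index)
multilevel_body = build_multilevel_index_body()
print(json.dumps(multilevel_body, indent=2))
```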

Inserting Grandchildren documents is very similar to inserting child records.

In our example, “Ned Jr Something” is a child of Sansa Stark and a grandchild of Eddard Stark. Notice that we use the same routing query parameter that we used to create a record for Eddard. This is to ensure all the children associated with the super parent, Eddard, are indexed on the same shard.

The join between this record and Sansa’s is made by the relation_type field, where we add the name of the relation as a “grandchild” making “Ned Jr” a grandchild of the parent whose Id is “6” (The same Id we created Sansa with).

We can insert the remaining grandchildren using the bulk script here.
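
The grandchild insert described above is not preserved in this text. A minimal sketch, assuming Sansa's ID "6" and the grandparent's routing value Eddard, follows. The helper name grandchild_insert_request and the document ID are hypothetical, and only the name field is filled in because the article does not list the other values.

```python
import json


def grandchild_insert_request(doc_id, name, parent_id, routing):
    """Build the path and body for indexing one grandchild document."""
    # Routing uses the grandparent's value so the whole lineage
    # (parent, child, grandchild) lives on the same shard.
    path = f"/game_of_thrones/_doc/{doc_id}?routing={routing}"
    body = {
        "name": name,
        # "parent" is the direct parent's ID (Sansa), not the grandparent's.
        "relation_type": {"name": "grandchild", "parent": parent_id},
    }
    return path, body


# doc_id "15" is a hypothetical placeholder.
ned_jr_path, ned_jr_body = grandchild_insert_request(
    "15", "Ned Jr", parent_id="6", routing="Eddard"
)
print("PUT", ned_jr_path)
print(json.dumps(ned_jr_body, indent=2))
```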

Querying GrandParent Data

  • Get All Grandparents who have grand-daughters:
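
The original query is not preserved here. One way to express it, sketched under the assumption that the relations are named child and grandchild and that gender holds the value "Female", is to nest one has_child query inside another. The helper name grandparents_with_granddaughters_query is hypothetical.

```python
import json


def grandparents_with_granddaughters_query():
    """Build nested has_child queries: parent -> child -> female grandchild."""
    return {
        "query": {
            "has_child": {
                # Outer level: the grandparent must have a matching child...
                "type": "child",
                "query": {
                    "has_child": {
                        # ...and that child must have a female grandchild.
                        "type": "grandchild",
                        "query": {"match": {"gender": "Female"}},
                    }
                },
            }
        }
    }


# Body to send with: GET /game_of_thrones/_search
granddaughters_query = grandparents_with_granddaughters_query()
print(json.dumps(granddaughters_query, indent=2))
```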

Executing this query gets us the “Tywin Lannister” record, since he is the only grandparent with a granddaughter Myrcella, as depicted in Image 3.

Using multiple levels of relations to replicate a relational model is not recommended. Each level of relation adds an overhead at query time in terms of memory and computation. You should de-normalize your data if you care about performance.  —  elastic.co

Restrictions of Joins in ElasticSearch

Now that we have seen the join feature in action, let’s go over the restrictions noticed above.

  • Parent and child documents must be indexed on the same shard
  • Only one join field mapping is allowed per index
  • An element can have multiple children but only one parent
  • It is possible to add a new relation to an existing join field
  • It is also possible to add a child to an existing element but only if the element is already a parent

Conclusion

Parent-child joins can be a useful technique for managing relationships when index-time performance is more important than search-time performance, but they come at a significant cost. One must be aware of the tradeoffs, such as the physical storage constraint on parent and child documents and the added complexity. Another precaution is to avoid multi-level parent-child relationships, since each level consumes more memory and computation.
