(Drill)ing Down Into Your Data

Learn how you can query your disjointed databases in parallel, then combine, merge, and display your findings using Drill.

Instead of using MongoDB as a single or clustered data store, we can partition the data across independent MongoDB instances that are hosted remotely. We can then query each instance through Drill and combine the results with Drill's UNION operator (or UNION ALL, to keep duplicates).

Why would we do this?

  • The data may already be partitioned across different sources.

  • Using domain knowledge, we may be able to partition the data more effectively ourselves.

  • Even with naive partitioning, Drill scales and performs well.

  • There are interesting research questions around leveraging data locality to produce better and faster results than a clustered or distributed Mongo deployment.

In this post, we will see the simplest example of achieving this.

1. Define the Mongo Storage Plugins

For each of the Mongo servers, define a separate storage plugin in Drill:

[Screenshot: multiple definitions of the Mongo storage plugin, each pointing to a different Mongo deployment]

For example, mongo3 is defined as follows at http://localhost:8047/storage/mongo3:

{
  "type": "mongo",
  "connection": "mongodb://",
  "enabled": true
}

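The same kind of plugin definition can also be submitted programmatically instead of through the web console. The following is a minimal Python sketch, assuming Drill's REST endpoint for storage plugins (a POST to /storage/<name>.json) is reachable at http://localhost:8047. The host names in the connection strings are hypothetical placeholders, not values from the original setup, and the helper names plugin_payload and register_plugin are illustrative only.

```python
import json
import urllib.request

DRILL_URL = "http://localhost:8047"  # Drill web console address used in this post


def plugin_payload(name, connection):
    """Build the request body for one Mongo storage plugin definition."""
    return {
        "name": name,
        "config": {"type": "mongo", "connection": connection, "enabled": True},
    }


def register_plugin(payload, drill_url=DRILL_URL):
    """POST one plugin definition to Drill (assumed endpoint: /storage/<name>.json)."""
    request = urllib.request.Request(
        f"{drill_url}/storage/{payload['name']}.json",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


# Hypothetical hosts: replace with the addresses of your own Mongo deployments.
MONGO_DEPLOYMENTS = {
    "mongo": "mongodb://mongo-host-1.example:27017/",
    "mongo2": "mongodb://mongo-host-2.example:27017/",
    "mongo3": "mongodb://mongo-host-3.example:27017/",
}

# One payload per deployment; nothing is sent to Drill until register_plugin is called.
payloads = [plugin_payload(name, conn) for name, conn in MONGO_DEPLOYMENTS.items()]

for payload in payloads:
    print(json.dumps(payload, indent=2))
```

Against a running Drill instance, calling register_plugin(payload) for each entry in payloads should create the three plugins. Keeping payload construction separate from the network call makes it easy to inspect the JSON before anything is sent.
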
2. Now Query Through the Query Browser

Query the multiple Mongo deployments and combine the results with UNION ALL:

select last_name as id from mongo.employee.empinfo
union all
select first_name as id from mongo2.employee.empinfo
union all
select first_name as id from mongo3.employee.empinfo
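
With more than a handful of deployments, writing the UNION ALL query by hand gets tedious. As a rough Python sketch, the query can be generated from a list of plugin names and submitted through Drill's REST API, assumed here to accept a POST to /query.json at http://localhost:8047. The helper names build_union_all and run_query are illustrative and not part of Drill.

```python
import json
import urllib.request

DRILL_URL = "http://localhost:8047"  # Drill web console address used in this post


def build_union_all(sources, table="employee.empinfo", alias="id"):
    """Build one SELECT per (plugin, column) pair and chain them with UNION ALL."""
    selects = [
        f"select {column} as {alias} from {plugin}.{table}"
        for plugin, column in sources
    ]
    return "\nunion all\n".join(selects)


def run_query(sql, drill_url=DRILL_URL):
    """Submit SQL to Drill over REST (assumed endpoint: /query.json) and return the rows."""
    body = json.dumps({"queryType": "SQL", "query": sql}).encode("utf-8")
    request = urllib.request.Request(
        f"{drill_url}/query.json",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8")).get("rows", [])


# Same plugins and columns as the hand-written query above.
SOURCES = [
    ("mongo", "last_name"),
    ("mongo2", "first_name"),
    ("mongo3", "first_name"),
]

query = build_union_all(SOURCES)
print(query)
```

Calling run_query(query) against a running Drill instance should return the combined rows. Plugin and column names are interpolated directly into the SQL text, so they should come only from trusted configuration.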

Execute the query to get the combined results. Depending on the nature of the query and on how the data is partitioned and scaled, you may see performance benefits from the partitioning. How best to partition the data across the MongoDB deployments, so that related items are co-located in a single partition, is a research question that probably deserves another post.

Published at DZone with permission of Pradeeban Kathiravelu, DZone MVB. See the original article here.

