
(Drill)ing Down Into Your Data

Learn how you can query your disjointed databases in parallel, then combine, merge, and display your findings using Drill.


Instead of using MongoDB as a single or clustered data store, we may partition the data across independent MongoDB instances that are hosted remotely. We can then use Drill's UNION operator to combine the results of querying each instance.

Why would we want to do this?

  • Because we may already have the data partitioned across different sources.

  • With domain knowledge, we may do a better job of partitioning the data ourselves.

  • Even with a naive partitioning, Drill scales and performs well.

  • There are interesting research questions around leveraging data locality to produce better and faster results than a clustered or distributed Mongo deployment.

In this post, we will walk through the simplest example of this approach.

1. Define the Mongo Storage Plugins

For each of the Mongo servers, define a separate storage plugin in Drill:

Multiple definitions of the Mongo storage plugin, each pointing to a different Mongo deployment

For example, the mongo3 plugin shown above is defined as follows at http://localhost:8047/storage/mongo3:

{
  "type": "mongo",
  "connection": "mongodb://",
  "enabled": true
}
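
When there are many Mongo servers, writing each plugin definition by hand gets tedious. Here is a minimal Python sketch that builds one definition per server in the same shape as the mongo3 example. The host names are hypothetical placeholders, and the helper names are ours. The optional register_plugin function assumes Drill's REST API accepts a POST to /storage/{name}.json with a "name" and "config" body, as its documentation describes; verify this against your Drill version before relying on it.

```python
import json
import urllib.request


def mongo_plugin_payload(name, connection):
    """Build the request body for one Mongo storage plugin definition."""
    return {
        "name": name,
        "config": {
            "type": "mongo",
            "connection": connection,
            "enabled": True,
        },
    }


def register_plugin(payload, drill_url="http://localhost:8047"):
    """POST one plugin definition to Drill (assumed endpoint, see the note above)."""
    request = urllib.request.Request(
        f"{drill_url}/storage/{payload['name']}.json",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return response.status


# Hypothetical hosts: replace these with your real connection strings.
SERVERS = {
    "mongo": "mongodb://host-a.example.com:27017/",
    "mongo2": "mongodb://host-b.example.com:27017/",
    "mongo3": "mongodb://host-c.example.com:27017/",
}

payloads = [mongo_plugin_payload(name, uri) for name, uri in SERVERS.items()]

# Inspect one definition before sending anything to Drill.
print(json.dumps(payloads[2], indent=2))
```

Calling register_plugin(payload) for each entry in payloads would then create or update the plugins on a running Drill instance; nothing is sent until you call it.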

2. Now Query Through the Query Browser

Querying multiple Mongo deployments and combining the results with UNION ALL.

select last_name as id from mongo.employee.empinfo
union all
select first_name as id from mongo2.employee.empinfo
union all
select first_name as id from mongo3.employee.empinfo
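
As the number of deployments grows, the UNION ALL query can be generated rather than typed. The following Python sketch rebuilds the query above from a list of plugin names and columns. The helper names are ours, and the query_payload function assumes the body shape that Drill's REST API documents for POST /query.json ("queryType" and "query"); treat it as an assumption to check against your Drill version.

```python
def select_for(plugin, column, alias="id", database="employee", collection="empinfo"):
    """Build the SELECT for one Mongo deployment, addressed by its plugin name."""
    return f"select {column} as {alias} from {plugin}.{database}.{collection}"


def union_all_query(selects):
    """Combine the per-deployment SELECT statements with UNION ALL."""
    return "\nunion all\n".join(selects)


def query_payload(sql):
    """Wrap a SQL string in the body shape assumed for Drill's /query.json."""
    return {"queryType": "SQL", "query": sql}


# Same plugins and columns as in the query above.
query = union_all_query([
    select_for("mongo", "last_name"),
    select_for("mongo2", "first_name"),
    select_for("mongo3", "first_name"),
])

print(query)
```

The generated string can be pasted into the query browser as-is, or wrapped with query_payload(query) and submitted over HTTP.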

Now execute the query to get the combined results. Depending on the nature of the query and on how the data is partitioned and scaled, you may see performance benefits from the partitioning. How best to partition the data across the MongoDB deployments, so that related items are co-located in a single partition, is a research question and probably deserves another post.



Published at DZone with permission of Pradeeban Kathiravelu, DZone MVB. See the original article here.
