
(Drill)ing Down Into Your Data

Learn how you can query your disjointed databases in parallel, then combine, merge, and display your findings using Drill.

Instead of using MongoDB as a single or clustered data store, we can partition the data across independent MongoDB instances that are hosted remotely. We can then query each instance through Drill and use its UNION operator (UNION ALL in the example here) to combine the results.

Why would we want to do this?

  • The data may already be partitioned across different sources.

  • With domain knowledge, we may be able to partition the data better than a generic scheme would.

  • Even with naive partitioning, Drill scales and performs well.

  • There are interesting research questions around leveraging data locality to produce better and faster results than a clustered or distributed Mongo deployment.

In this post, we will see the simplest example of achieving this.

1. Define the Mongo Storage Plugins

For each of the Mongo Servers, define the storage plugin separately in Drill:

[Figure: multiple Mongo storage plugin definitions in Drill, each pointing to a different Mongo deployment]

For example, the mongo3 plugin from the figure is defined as follows at http://localhost:8047/storage/mongo3:

{
  "type": "mongo",
  "connection": "mongodb://184.72.102.246:27017/",
  "enabled": true
}
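Defining the same plugin for several servers by hand gets repetitive, so the definitions can also be generated. The following is a minimal sketch, not the author's method: it assumes Drill's REST interface accepts a POST to /storage/{name}.json with a body of the form {"name": ..., "config": ...}, and the hosts for mongo and mongo2 are placeholders, since the post only gives the address of mongo3.

```python
import json
import urllib.request

DRILL_URL = "http://localhost:8047"  # Drill web address used in this post


def mongo_plugin(name, host, port=27017):
    """Build the request body for one Mongo storage plugin definition."""
    return {
        "name": name,
        "config": {
            "type": "mongo",
            "connection": f"mongodb://{host}:{port}/",
            "enabled": True,
        },
    }


def register_plugin(plugin, drill_url=DRILL_URL):
    """POST one plugin definition to Drill.

    Assumption: the endpoint is /storage/{name}.json. This needs a running
    Drill instance, so it is not called in this sketch.
    """
    request = urllib.request.Request(
        f"{drill_url}/storage/{plugin['name']}.json",
        data=json.dumps(plugin).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


# Only the mongo3 address comes from this post; the other hosts are hypothetical.
servers = {
    "mongo": "localhost",
    "mongo2": "mongo2.example.com",
    "mongo3": "184.72.102.246",
}
plugins = [mongo_plugin(name, host) for name, host in servers.items()]
print(json.dumps(plugins[2]["config"], indent=2))
```

Calling register_plugin(p) for each entry in plugins would then create the definitions, assuming Drill is reachable at DRILL_URL.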


2. Now Query Through the Query Browser

Query each of the Mongo deployments and combine the results with UNION ALL:

select last_name as id from mongo.employee.empinfo
union all
select first_name as id from mongo2.employee.empinfo
union all
select first_name as id from mongo3.employee.empinfo
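As the number of deployments grows, the query can be assembled from the list of plugin names instead of being written by hand. This is a rough sketch under stated assumptions: the helper union_all_query is hypothetical, it selects the same column from every plugin (the query above reads last_name from the first one), and the payload shape {"queryType": "SQL", "query": ...} is what Drill's /query.json REST endpoint is assumed to expect.

```python
import json


def union_all_query(plugins, column="first_name", table="employee.empinfo", alias="id"):
    """Build one SELECT per storage plugin and chain them with UNION ALL."""
    selects = [f"select {column} as {alias} from {plugin}.{table}" for plugin in plugins]
    return "\nunion all\n".join(selects)


# Plugin names as defined in step 1.
query = union_all_query(["mongo", "mongo2", "mongo3"])

# Assumed request body for a POST to http://localhost:8047/query.json.
payload = json.dumps({"queryType": "SQL", "query": query})

print(query)
```

The generated text can also be pasted straight into the query browser.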


Now execute the query to get the combined results. Depending on the nature of the query and on how the data is partitioned and scaled, you may see performance benefits from the partitioning. How best to partition the data across the MongoDB deployments, with related items co-located in a single partition, is a research question and probably deserves another post.
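To make the idea of co-location concrete, here is one very simple way to route documents: hash a chosen key and use the result to pick a deployment, so that all documents sharing that key land on the same instance. This only illustrates naive key-based partitioning and is not a recommendation from this post; the department field, the sample records, and the partition_for helper are all hypothetical.

```python
import hashlib

# Storage plugin names from step 1, one per MongoDB deployment.
PARTITIONS = ["mongo", "mongo2", "mongo3"]


def partition_for(key, partitions=PARTITIONS):
    """Map a key to a partition with a stable hash.

    hashlib is used instead of the built-in hash(), which is randomized
    per process for strings and so would not give a stable placement.
    """
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(partitions)
    return partitions[index]


# Hypothetical sample records; related items share a department.
employees = [
    {"first_name": "Ada", "department": "engineering"},
    {"first_name": "Grace", "department": "engineering"},
    {"first_name": "Linus", "department": "operations"},
]

placement = {}
for employee in employees:
    target = partition_for(employee["department"])
    placement.setdefault(target, []).append(employee["first_name"])

print(placement)
```

A partitioning scheme informed by domain knowledge would replace the hash with whatever rule keeps items that are queried together in the same deployment.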


Published at DZone with permission of
