
(Drill)ing Down Into Your Data

Learn how you can query your disjointed databases in parallel, then combine, merge, and display your findings using Drill.

Instead of using MongoDB as a single or clustered data store, we may partition the data across independent MongoDB instances that are hosted remotely. We can then use Drill's UNION operator to combine the results of querying each instance.

Why would we want to do this?

  • Because we may already have the data partitioned across different sources.

  • Because domain knowledge may let us do a better job of partitioning the data.

  • Because even with a naive partitioning, Drill scales and performs well.

  • Because it raises some interesting research questions, such as leveraging data locality to provide better and faster results than a clustered or distributed Mongo deployment.

In this post, we will see the simplest example of achieving this.

1. Define the Mongo Storage Plugins

For each of the Mongo servers, define a separate storage plugin in Drill:

[Figure: Multiple definitions of the Mongo storage plugin, pointing to various Mongo deployments]

For example, the mongo3 plugin shown above is defined as follows at http://localhost:8047/storage/mongo3:

{
  "type": "mongo",
  "connection": "mongodb://184.72.102.246:27017/",
  "enabled": true
}
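
When there are many Mongo deployments, writing each definition by hand in the web UI gets repetitive. The sketch below generates the same kind of definition for any number of hosts. It assumes that Drill's REST interface accepts a storage plugin definition as a POST to /storage/<name>.json on the same port as the web UI; verify this against the documentation for your Drill version. The helper names build_plugin_payload and register_plugin are hypothetical and are not part of Drill.

```python
import json
import urllib.request

# Assumption: Drill's web/REST interface runs on the port used in this post.
DRILL_URL = "http://localhost:8047"


def build_plugin_payload(name, host, port=27017):
    """Build the request body for one Mongo storage plugin definition."""
    return {
        "name": name,
        "config": {
            "type": "mongo",
            "connection": f"mongodb://{host}:{port}/",
            "enabled": True,
        },
    }


def register_plugin(payload, drill_url=DRILL_URL):
    """POST one definition to Drill. Requires a running Drill instance.

    Assumption: the endpoint is /storage/<name>.json and replies with JSON.
    """
    request = urllib.request.Request(
        f"{drill_url}/storage/{payload['name']}.json",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling register_plugin(build_plugin_payload("mongo3", "184.72.102.246")) against a running Drill should be equivalent to saving the mongo3 definition above in the web UI. Keeping the payload builder separate from the network call makes it easy to inspect the generated JSON before sending anything.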


2. Now Query Through the Query Browser

Query the multiple Mongo deployments and combine their results with UNION ALL:

select last_name as id from mongo.employee.empinfo
union all
select first_name as id from mongo2.employee.empinfo
union all
select first_name as id from mongo3.employee.empinfo
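
As the number of partitions grows, the query above can be generated rather than typed. The following is a minimal sketch, assuming every plugin exposes the same employee.empinfo collection and that a single column is selected from each one (the query above reads last_name from the first plugin; this sketch uses one column throughout for simplicity). It also assumes that Drill accepts SQL over REST as a POST to /query.json with a queryType of "SQL"; check this against your Drill version. The helper names build_union_query and run_query are hypothetical.

```python
import json
import urllib.request

# Assumption: Drill's web/REST interface runs on the port used in this post.
DRILL_URL = "http://localhost:8047"


def build_union_query(plugins, column="first_name", alias="id",
                      table="employee.empinfo"):
    """Build one SELECT per storage plugin and chain them with UNION ALL."""
    if not plugins:
        raise ValueError("at least one storage plugin name is required")
    selects = [
        f"select {column} as {alias} from {plugin}.{table}"
        for plugin in plugins
    ]
    return "\nunion all\n".join(selects)


def run_query(sql, drill_url=DRILL_URL):
    """Submit the SQL to Drill over REST. Requires a running Drill instance.

    Assumption: the endpoint is /query.json and replies with JSON.
    """
    body = json.dumps({"queryType": "SQL", "query": sql}).encode("utf-8")
    request = urllib.request.Request(
        f"{drill_url}/query.json",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

For example, build_union_query(["mongo", "mongo2", "mongo3"]) produces a three-way UNION ALL of the same shape as the query above, which can then be pasted into the query browser or passed to run_query.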


You can now execute the query and get the combined results. Depending on the nature of the query and on how the data is partitioned and scaled, you may see performance benefits from the partitioning. How to actually partition the data across the MongoDB deployments, with related items co-located in a single partition, is a research question and probably deserves another post.

Topics:
data store, drill, mongodb, database

Published at DZone with permission of Pradeeban Kathiravelu, DZone MVB. See the original article here.
