
Dealing With Jumbo Chunks in MongoDB

In this blog post, we will discuss how to deal with jumbo chunks in MongoDB. Check out the following examples and explanation of jumbo chunks for a better understanding.


You are a MongoDB DBA, and your first task of the day is to remove a shard from your cluster. It sounds scary at first, but you know it is pretty easy. You can do it with a simple command, issued through mongos against the admin database:

db.adminCommand({ removeShard: "server1_set6" })

MongoDB then does its magic. It finds the chunks and databases and balances them across all other servers. You can go to sleep without any worry.

The next morning when you wake up, you check the status of that particular shard and you find the process is stuck:

"msg" : "draining ongoing",
"state" : "ongoing",
"remaining" : {
    "chunks" : NumberLong(3),
    "dbs" : NumberLong(0)
}

There are three chunks that for some reason haven’t been migrated, so the removeShard command is stalled! Now, what do you do?

Find Chunks That Cannot Be Moved

We need to connect to mongos and check the catalog:

mongos> use config
switched to db config
mongos> db.chunks.find({ shard: "server1_set6" })

The output will show three chunks, with their minimum and maximum _id keys, along with the namespace they belong to. The last part of each document is what we really need to check:

[...]
"min" : {
    "_id" : "17zx3j9i60180"
},
"max" : {
    "_id" : "30td24p9sx9j0"
},
"shard" : "server1_set6",
"jumbo" : true

So, the chunk is marked as "jumbo." We have found the reason the balancer cannot move the chunk!
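With many chunks on a shard, it helps to narrow the list down to the flagged ones. The sketch below shows the idea in plain JavaScript over sample documents shaped like the output above. The "ns" value and the second chunk are made up for illustration. On a real cluster, adding jumbo: true to the find() filter should give the same result.

```javascript
// Sample documents shaped like entries in config.chunks.
// The "ns" value and the second chunk's keys are hypothetical.
const chunks = [
  {
    ns: "dbname.collection",
    min: { _id: "17zx3j9i60180" },
    max: { _id: "30td24p9sx9j0" },
    shard: "server1_set6",
    jumbo: true,
  },
  {
    ns: "dbname.collection",
    min: { _id: "30td24p9sx9j0" },
    max: { _id: "45kq81m2vb7c0" },
    shard: "server1_set6",
  },
];

// Keep only the chunks on the given shard that carry the jumbo flag.
// Rough equivalent on mongos (config database):
//   db.chunks.find({ shard: "server1_set6", jumbo: true })
function jumboChunksOn(chunkDocs, shardName) {
  return chunkDocs.filter((c) => c.shard === shardName && c.jumbo === true);
}

const stuck = jumboChunksOn(chunks, "server1_set6");
```

Here stuck holds only the first chunk, the one the balancer refuses to move.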

Jumbo Chunks and How to Deal With Them

So, what is a "jumbo chunk"? It is a chunk whose size exceeds the maximum specified in the chunk size configuration parameter (which has a default value of 64 MB). Once a chunk has grown past that limit and been flagged as jumbo, the balancer won’t move it.
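The size rule itself is simple arithmetic. Here is a minimal sketch of the comparison, assuming you already know a chunk's size in bytes (how to measure it is outside the scope of this post, and the sizes used here are made up). It only illustrates the threshold. It is not how MongoDB sets the flag internally.

```javascript
// Default chunk size mentioned in this post, in megabytes.
const DEFAULT_CHUNK_SIZE_MB = 64;

// Returns true when a chunk of sizeBytes is larger than the configured limit.
// chunkSizeMB is a hypothetical parameter standing in for the cluster setting.
function exceedsChunkSize(sizeBytes, chunkSizeMB = DEFAULT_CHUNK_SIZE_MB) {
  const limitBytes = chunkSizeMB * 1024 * 1024;
  return sizeBytes > limitBytes;
}

// Hypothetical sizes: a 100 MB chunk and a 10 MB chunk.
const bigChunkIsOversized = exceedsChunkSize(100 * 1024 * 1024);
const smallChunkIsOversized = exceedsChunkSize(10 * 1024 * 1024);
```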

The way to remove the flag from those chunks is to manually split them. There are two ways to do it:

  1. You can specify at what point to split the chunk, specifying the corresponding _id value. To do this, you really need to understand how your data is distributed and what the settings are for min and max in order to select a good splitting point.
  2. You can just tell MongoDB to split it in half, letting it decide which is the best possible _id. This is easier and less error-prone.

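To do it manually, you need to use sh.splitAt(). Its first argument is the full namespace of the sharded collection (database and collection name; dbname.collection below is a placeholder). For example:

sh.splitAt("dbname.collection", { _id: "19fr21z5sfg2j0" })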

In this command, you are telling MongoDB to split the chunk in two using that _id as the cut point.
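Before running a manual split, it is worth checking that the cut point really falls inside the chunk. The sketch below does that check in plain JavaScript using the min and max values from the output shown earlier. It assumes string _id values that compare the same way in JavaScript as in MongoDB, which should hold for plain ASCII keys like these under the default collation.

```javascript
// Boundaries of the jumbo chunk, copied from the config.chunks output.
const chunk = {
  min: { _id: "17zx3j9i60180" },
  max: { _id: "30td24p9sx9j0" },
};

// A usable split point must fall strictly between min and max;
// splitting exactly on an existing boundary would not create a new chunk.
function isValidSplitPoint(chunkDoc, splitId) {
  return chunkDoc.min._id < splitId && splitId < chunkDoc.max._id;
}

const candidateIsValid = isValidSplitPoint(chunk, "19fr21z5sfg2j0");
```

The _id used in the example command passes the check, because it sorts after the chunk's min and before its max.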

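If you want MongoDB to find the best split point for you, use the sh.splitFind() helper. In this case, you only need to specify a shard key value (any value) that falls inside the chunk you want to split. Keep in mind that a chunk's range includes its min value but not its max, so the min value is a safe choice. MongoDB will use that value to find the chunk, and then divide it into two parts using the _id that sits in the middle of the list. The first argument is the namespace of the sharded collection (dbname.collection is a placeholder).

sh.splitFind("dbname.collection", { _id: "17zx3j9i60180" })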
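To picture what "the _id that sits in the middle" means, here is a conceptual sketch in plain JavaScript. It is not MongoDB's actual implementation. The list of _id values is made up, and it assumes the keys are already sorted.

```javascript
// Hypothetical, already sorted _id values stored in one chunk.
const idsInChunk = [
  "17zx3j9i60180",
  "19fr21z5sfg2j0",
  "22mk07c4hd1p0",
  "26wb53q8ye6t0",
  "29la90r3nu2k0",
];

// Pick the key in the middle of the sorted list as the split point.
// Returns null when there are too few keys to split.
function middleKey(sortedIds) {
  if (sortedIds.length < 2) return null;
  return sortedIds[Math.floor(sortedIds.length / 2)];
}

// Divide the keys into two halves. The split point becomes the
// (inclusive) lower bound of the upper half.
function splitKeysAt(sortedIds, splitId) {
  return {
    lower: sortedIds.filter((id) => id < splitId),
    upper: sortedIds.filter((id) => id >= splitId),
  };
}

const splitPoint = middleKey(idsInChunk);
const halves = splitKeysAt(idsInChunk, splitPoint);
```

With five keys, the third one becomes the split point, leaving two keys in the lower chunk and three in the upper one.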

Once the three chunks have been split, the jumbo flag is removed and the balancer can move them to a different server. removeShard will complete the process and you can drink a well-deserved coffee.



Published at DZone with permission of Miguel Angel Nieto. See the original article here.
