How to Stop a Runaway Index Build in MongoDB
Oh no! You've triggered your index build on a large collection and now your cluster is unresponsive. No matter what you do, you can't stop it. What to do?!
Index builds in MongoDB can have an adverse impact on the availability of your MongoDB cluster. If you trigger a foreground index build on a large collection on your production server, you may find that your cluster is unresponsive until the index build is complete. On a large collection, this could take several hours or days, as described in the perils of index building in MongoDB.
The recommended best practice is to trigger index builds in the background. However, for indexes on large collections, we've seen multiple problems with this approach. In the case of a three-node cluster, both secondaries start building the index and stop responding to any requests. Consequently, the primary does not have quorum and moves to the secondary state, taking your cluster down. Also, index builds triggered from the command line are foreground builds by default, which makes this a widespread problem. We're hopeful that future releases will make background builds the default.
Once you've triggered an index build, simply restarting the server does not solve the problem. MongoDB will pick up the index build from where it left off. If you were previously running a background index build, it becomes a foreground index build after the restart, so in this case the restart could make the problem worse.
If you've already triggered an index build, how do you stop it? Luckily, it's relatively easy to stop an index build.
Option 1: Kill the Index Build Process
Locate the index build process using db.currentOp() and then kill the operation using db.killOp(<opid>). The index operation will look something like this:
{
"opid" : 820659355,
"active" : true,
"lockType" : "write",
....
"op" : "insert",
"ns" : "xxxx",
"query" : {
},
"client" : "xxxx",
"desc" : "conn",
"msg" : "index: (2/3) btree bottom up 292168587/398486401 64%"
}
If the node where the index is building does not respond to new connections or the killOp does not work, use Option 2 below.
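The lookup-and-kill step in Option 1 can be sketched as a pair of small shell functions. This is only a sketch under assumptions: the helper names findIndexBuildOps and killIndexBuilds are hypothetical, and matching on a msg field that starts with "index:" is inferred purely from the sample output above, so the pattern may need adjusting for your MongoDB version. Only db.currentOp() and db.killOp() come from the article itself.

```javascript
// Keep only the operations whose progress message marks them as index builds.
// ASSUMPTION: the "index:" prefix is inferred from the sample output above;
// other MongoDB versions may word this message differently.
function findIndexBuildOps(inprog) {
  return inprog.filter(function (op) {
    return typeof op.msg === "string" && op.msg.indexOf("index:") === 0;
  });
}

// Kill every matching operation and return the opids that were targeted.
// `database` is the shell's `db` object (or any object exposing the same
// currentOp() and killOp() methods).
function killIndexBuilds(database) {
  var ops = findIndexBuildOps(database.currentOp().inprog);
  ops.forEach(function (op) {
    database.killOp(op.opid);
  });
  return ops.map(function (op) {
    return op.opid;
  });
}
```

In the mongo shell this would be called as killIndexBuilds(db). Running findIndexBuildOps(db.currentOp().inprog) on its own first is a safer way to confirm that only the intended operation matches before anything is killed.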
Option 2: Configuring noIndexBuildRetry and Restart
MongoDB provides a --noIndexBuildRetry option, which instructs MongoDB to stop building incomplete indexes on restart.
This parameter doesn't appear to be supported by the config file, only as a parameter for the mongod process. We prefer not to run mongod manually with this option because if you accidentally run the mongod process as an elevated user (e.g., root), it ends up changing the permissions of all the files. Also, once run as root, we've had intermittent problems running the process as the mongod user again.
A simpler option is to edit the /etc/init.d/mongod file. Look for this line:
OPTIONS=" -f $CONFIGFILE"
Replace with this line:
OPTIONS=" -f $CONFIGFILE --noIndexBuildRetry"
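If you would rather script the edit than make it by hand, the replacement above can be expressed as a small text transformation. This is a rough sketch, not a tested deployment tool: the function name setNoIndexBuildRetry is hypothetical, and it assumes the init script contains a double-quoted OPTIONS= line like the one shown above. It works on the file contents as a string, so reading and writing /etc/init.d/mongod is left to whatever tooling you already use.

```javascript
var FLAG = "--noIndexBuildRetry";

// Return the init script text with the flag added to (enabled = true) or
// removed from (enabled = false) the OPTIONS line. All other lines are
// returned unchanged.
// ASSUMPTION: the OPTIONS value is wrapped in double quotes, as in the
// line shown above.
function setNoIndexBuildRetry(scriptText, enabled) {
  return scriptText.split("\n").map(function (line) {
    if (!/^\s*OPTIONS=/.test(line)) {
      return line;
    }
    // Strip any existing copy first so repeated calls stay idempotent.
    var stripped = line.replace(" " + FLAG, "");
    if (!enabled) {
      return stripped;
    }
    // Re-insert the flag just before the closing quote.
    return stripped.replace(/"\s*$/, " " + FLAG + '"');
  }).join("\n");
}
```

Calling the function with enabled set to false undoes the change, which is what you need once the incomplete index has been dropped.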
Detailed Steps
For the purposes of this discussion, we're providing instructions for CentOS/RedHat/Amazon Linux.
- Configure --noIndexBuildRetry. Add the --noIndexBuildRetry option to all your data nodes, as explained above.
- Restart all the nodes building the index. Look at the mongod log file for each data server and determine if it's building the index. If it is, then restart the server using service mongod restart.
- Drop the incomplete index. Once all of the relevant nodes are restarted, look at the list of indexes and drop the incomplete index if you see it on the list.
- Remove --noIndexBuildRetry. Edit the /etc/init.d/mongod file to remove the --noIndexBuildRetry option that you added in the first step, which restores the default behavior of resuming the index build.
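The "drop the incomplete index" step can be sketched in the mongo shell as a check-then-drop helper. The function name dropIndexIfPresent is hypothetical, and the collection and index names in the usage note are placeholders. The sketch relies only on the shell's getIndexes() and dropIndex() collection methods, and it assumes you already know the name of the index you tried to build.

```javascript
// Drop the named index only if it appears in the collection's index list.
// Returns true if a drop was issued, false if the index was not listed.
// `collection` is a shell collection object such as db.mycollection.
function dropIndexIfPresent(collection, indexName) {
  var present = collection.getIndexes().some(function (ix) {
    return ix.name === indexName;
  });
  if (present) {
    collection.dropIndex(indexName);
  }
  return present;
}
```

For example, dropIndexIfPresent(db.mycollection, "myfield_1") would drop an index named myfield_1 on a collection named mycollection if it is listed, and do nothing otherwise.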
Published at DZone with permission of Dharshan Rangegowda, DZone MVB. See the original article here.