This is probably exclusively of interest to my coworkers, but MongoDB has a new fail points framework. Fail points make it easier to test things that are hard to fake, like page faults or network errors. Basically, you create a glorified boolean called a fail point, which you can turn on and off while mongod is running.

To show how this works, I’ll modify the humble “ping” command. “ping” is fairly simple:

> db.runCommand({"ping" : 1})
{ "ok" : 1 }

I’d like to make the response include a "pong" : 1 field, on command.

The ping command is defined in src/mongo/db/dbcommands_generic.cpp. To add a fail point, we first have to include the fail points header at the top of the file:

#include "repl/multicmd.h"
#include "server.h"
#include "mongo/util/fail_point_service.h" 
namespace mongo {

Then, below the namespace mongo line, add a declaration for the fail point:

namespace mongo {

    MONGO_FP_DECLARE(pingPongPoint);

Feel free to call your fail point whatever you want.

The ping command is defined lower down in the file in a section that looks like this:

    class PingCommand : public Command {
    public:
        PingCommand() : Command( "ping" ) {}
        virtual bool slaveOk() const { return true; }
        virtual void help( stringstream &help ) const { help << "a way to check that the server is alive. responds immediately even if server is in a db lock."; }
        virtual LockType locktype() const { return NONE; }
        virtual bool requiresAuth() { return false; }
        virtual void addRequiredPrivileges(const std::string& dbname,
                                           const BSONObj& cmdObj,
                                           std::vector<Privilege>* out) {} // No auth required
        virtual bool run(const string& badns, BSONObj& cmdObj, int, string& errmsg, BSONObjBuilder& result, bool) {
            // IMPORTANT: Don't put anything in here that might lock db - including authentication
            return true;
        }
    } pingCmd;

Now, in the run() method of the code above, you can trigger certain actions when the fail point is turned on:

        virtual bool run(const string& badns, BSONObj& cmdObj, int, string& errmsg, BSONObjBuilder& result, bool) {
            // IMPORTANT: Don't put anything in here that might lock db - including authentication
            if (MONGO_FAIL_POINT(pingPongPoint)) {
                // Only reached while the fail point is turned on
                result.append("pong", 1.0);
            }
            return true;
        }

Now recompile the database. By default, mongod doesn’t allow fail points to be triggered at all. To make it possible, you have to start mongod with the --setParameter enableTestCommands=1 option.

$ ./mongod --setParameter enableTestCommands=1

Note: as of this writing, you cannot enable fail points with the setParameter command; you must start the database with this option.

The fail point still isn’t turned on, so if you run db.runCommand({ping:1}), you’ll see that there’s still just the “ok” field. You can enable the fail point with the configureFailPoint command:

> db.adminCommand({"configureFailPoint" : 'pingPongPoint', "mode" : 'alwaysOn'})
{ "ok" : 1 }
> db.runCommand({ping:1})
{ "pong" : 1, "ok" : 1 }
> db.adminCommand({"configureFailPoint" : 'pingPongPoint', "mode" : 'off'})
{ "ok" : 1 }
> db.runCommand({ping:1})
{ "ok" : 1 }

Possible modes are "alwaysOn", "off", and {"times" : 37} (which would be on for the next 37 times the fail point is hit… obviously the value for “times” is configurable).
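
To make the three modes concrete, here is a minimal standalone sketch of how a mode-driven "glorified boolean" could behave. This is not MongoDB's implementation: ToyFailPoint, Mode, configure, and shouldFail are names I made up for illustration, and unlike the real thing this toy is not thread-safe.

```cpp
// Toy stand-in for a fail point (hypothetical, NOT MongoDB's code).
class ToyFailPoint {
public:
    enum class Mode { Off, AlwaysOn, Times };

    // Loosely mirrors the "mode" argument; `times` only matters for Mode::Times.
    void configure(Mode mode, int times = 0) {
        mode_ = mode;
        remaining_ = (mode == Mode::Times) ? times : 0;
    }

    // Returns true if the fail point should fire on this hit.
    bool shouldFail() {
        switch (mode_) {
            case Mode::Off:
                return false;
            case Mode::AlwaysOn:
                return true;
            case Mode::Times:
                if (remaining_ > 0) {
                    --remaining_;      // use up one of the configured hits
                    return true;
                }
                mode_ = Mode::Off;     // budget exhausted: behave like "off"
                return false;
        }
        return false;
    }

private:
    Mode mode_ = Mode::Off;
    int remaining_ = 0;
};
```

With this toy, configuring Mode::Times with a count of 2 makes the next two shouldFail() calls return true and every call after that return false, which is the behavior described for {"times" : N}.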

This is a derpy example, but I’ve found it super helpful for debugging concurrency issues where I need to force a thread to block until another thread has done something. You can do that with something like:

while (MONGO_FAIL_POINT(looper)) {
    // Poll until the fail point is turned off
    sleepsecs(1);
}
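
If you want to play with this blocking pattern outside of mongod, the same idea can be sketched with plain standard-library pieces. Everything here is an assumption for illustration: looperEnabled is just a std::atomic<bool> standing in for the fail point, and gatedHandoff is a made-up function name.

```cpp
#include <atomic>
#include <chrono>
#include <thread>

// Hypothetical stand-in for a fail point flag (not MongoDB's implementation).
std::atomic<bool> looperEnabled{false};

// Parks a worker thread behind the flag, does "other thread" work first,
// then opens the gate. Encodes the observed ordering in the return value.
int gatedHandoff() {
    looperEnabled.store(true);
    std::atomic<int> progress{0};

    std::thread worker([&] {
        // Same shape as the loop above: spin (politely) while the flag is on.
        while (looperEnabled.load()) {
            std::this_thread::sleep_for(std::chrono::milliseconds(1));
        }
        progress.store(2);  // only reachable once the gate is open
    });

    progress.store(1);                       // work that must happen first
    int seenWhileBlocked = progress.load();  // worker cannot have written yet
    looperEnabled.store(false);              // "turn the fail point off"
    worker.join();

    return seenWhileBlocked * 10 + progress.load();
}
```

Because the worker cannot get past the loop until the flag is cleared, the main thread's write is guaranteed to land first, which is exactly the kind of forced ordering that makes a race reproducible.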

If you merely want to delay something, say, to imitate a slow connection, you can use MONGO_FAIL_POINT_BLOCK to pass in information:

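MONGO_FAIL_POINT_BLOCK(pingPongPoint, myDelay) {
    const BSONObj& data = myDelay.getData();
    // Sleep for the number of seconds given in the "delay" field
    sleepsecs(data["delay"].numberInt());
}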

Then you’d pass in a delay like so:

> db.adminCommand({"configureFailPoint" : 'pingPongPoint', "mode" : 'alwaysOn', "data" : {"delay" : 5}})
{ "ok" : 1 }

Now, if you run the ping command, it’ll take 5 seconds to return.
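
The "fail point plus payload" idea can also be sketched without any MongoDB code. This is only a toy under my own assumptions: ToyDataFailPoint and maybeDelay are invented names, the payload is a plain std::map instead of a BSONObj, and the delay is in milliseconds (key "delayMs") so it's quick to try.

```cpp
#include <chrono>
#include <map>
#include <string>
#include <thread>

// Toy fail point that carries a payload (hypothetical, not MongoDB's code).
struct ToyDataFailPoint {
    bool enabled = false;
    std::map<std::string, int> data;  // stand-in for the "data" document
};

// If the fail point is on and has a "delayMs" entry, sleep that long.
// Returns the delay applied in milliseconds, or 0 if nothing happened.
int maybeDelay(const ToyDataFailPoint& fp) {
    if (!fp.enabled) {
        return 0;
    }
    auto it = fp.data.find("delayMs");
    if (it == fp.data.end()) {
        return 0;  // enabled, but no delay was configured
    }
    std::this_thread::sleep_for(std::chrono::milliseconds(it->second));
    return it->second;
}
```

Calling maybeDelay at the top of a request handler gives the same effect as the block above: the code path is a no-op until someone flips the flag and supplies a delay.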


