Save the Monkey: Reliably Writing to MongoDB

MongoDB replica sets claim “automatic failover” when a primary server goes down, and they live up to the claim, but handling failover in your application code takes some care. I’ll walk you through writing a failover-resistant application in Python using a new feature in PyMongo 2.1: the ReplicaSetConnection.

Setting the Scene

Mabel the Swimming Wonder Monkey is participating in your cutting-edge research on simian scuba diving. To keep her alive underwater, your application must measure how much oxygen she consumes each second and pipe the same amount of oxygen to her scuba gear. In this post, I’ll only cover writing reliably to Mongo. I’ll get to reading later.

MongoDB Setup

Since Mabel’s life is in your hands, you want a robust Mongo deployment. Set up a 3-node replica set. We’ll do this on your local machine using three TCP ports, but of course in production you’ll have each node on a separate machine:

$ mkdir db0 db1 db2
$ mongod --dbpath db0 --logpath db0/log --pidfilepath db0/pid --port 27017 --replSet foo --fork
$ mongod --dbpath db1 --logpath db1/log --pidfilepath db1/pid --port 27018 --replSet foo --fork
$ mongod --dbpath db2 --logpath db2/log --pidfilepath db2/pid --port 27019 --replSet foo --fork

(Make sure you don’t have any mongod processes running on those ports first.)

Now connect up the nodes in your replica set. My machine’s hostname is ‘emptysquare.local’; replace it with yours when you run the example:

$ hostname
emptysquare.local
$ mongo
> rs.initiate({
  _id: 'foo',
  members: [
    {_id: 0, host:'emptysquare.local:27017'},
    {_id: 1, host:'emptysquare.local:27018'},
    {_id: 2, host:'emptysquare.local:27019'}
  ]
})

The first _id, ‘foo’, must match the name you passed with --replSet on the command line, otherwise Mongo will complain. If everything’s correct, Mongo replies with, “Config now saved locally. Should come online in about a minute.” Run rs.status() a few times until you see that the replica set has come online—the first member’s stateStr will be “PRIMARY” and the other two members’ stateStrs will be “SECONDARY”. On my laptop this takes about 30 seconds.

Voilà: a bulletproof 3-node replica set! Let’s start the Mabel experiment.

Definitely Writing

Install PyMongo 2.1 and create a Python script called mabel.py with the following:

import datetime, random, time
import pymongo

mabel_db = pymongo.ReplicaSetConnection(
    'localhost:27017,localhost:27018,localhost:27019',
    replicaSet='foo'
).mabel

while True:
    time.sleep(1)
    mabel_db.breaths.insert({
        'time': datetime.datetime.utcnow(),
        'oxygen': random.random()
    }, safe=True)

    print 'wrote'

mabel.py will record the amount of oxygen Mabel consumes (or, in our test, a random amount) and insert it into Mongo once per second. Run it:

$ python mabel.py
wrote
wrote
wrote

Now, what happens when our good-for-nothing sysadmin unplugs the primary server? Let’s simulate that in a separate terminal window by grabbing the primary’s process id and killing it:

$ kill `cat db0/pid`

Switching back to the first window, all is not well with our Python script:

Traceback (most recent call last):
  File "mabel.py", line 10, in <module>
    'oxygen': random.random()
  File "/Users/emptysquare/.virtualenvs/pymongo/mongo-python-driver/pymongo/collection.py", line 310, in insert
    continue_on_error, self.__uuid_subtype), safe)
  File "/Users/emptysquare/.virtualenvs/pymongo/mongo-python-driver/pymongo/replica_set_connection.py", line 738, in _send_message
    raise AutoReconnect(str(why))
pymongo.errors.AutoReconnect: [Errno 61] Connection refused

This is terrible. WTF happened to “automatic failover”? And why does PyMongo raise an AutoReconnect error rather than actually automatically reconnecting?

Well, automatic failover does work, in the sense that one of the secondaries will quickly take over as a new primary. Do rs.status() in the mongo shell to confirm that:

$ mongo --port 27018 # connect to one of the surviving mongod's
PRIMARY> rs.status()
// edited for readability ...
{
	"set" : "foo",
	"members" : [ {
			"_id" : 0,
			"name" : "emptysquare.local:27017",
			"stateStr" : "(not reachable/healthy)",
			"errmsg" : "socket exception"
		}, {
			"_id" : 1,
			"name" : "emptysquare.local:27018",
			"stateStr" : "PRIMARY"
		}, {
			"_id" : 2,
			"name" : "emptysquare.local:27019",
			"stateStr" : "SECONDARY",
		}
	]
}

Depending on which mongod took over as the primary, your output could be a little different. Regardless, there is a new primary, so why did our write fail? The answer is that PyMongo doesn’t try repeatedly to insert your document—it just tells you that the first attempt failed. It’s your application’s job to decide what to do about that. To explain why, let us indulge in a brief digression.

Brief Digression: Monkeys vs. Kittens

If what you’re inserting is voluminous but no single document is very important, like pictures of kittens or web analytics, then in the extremely rare event of a failover you might prefer to discard a few documents rather than block your application while it waits for the new primary. Throwing an exception when the primary dies is often the right thing to do: you can notify your user to try uploading his kitten picture again in a few seconds, once a new primary has been elected.

But if your updates are infrequent and tremendously valuable, like Mabel’s oxygen data, then your application should try very hard to write them. Only you know what’s best for your data, so PyMongo lets you decide. Let’s return from this digression and implement that.
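To make the kitten-style strategy concrete, here is a minimal sketch of a one-attempt insert that discards the document on failover. The AutoReconnect class and the insert_one_shot and failing_insert names are stand-ins invented for this example so it runs without a MongoDB server; with a real collection you would catch pymongo.errors.AutoReconnect around collection.insert instead.

```python
class AutoReconnect(Exception):
    """Stand-in for pymongo.errors.AutoReconnect, so this sketch
    runs without a server."""

def insert_one_shot(insert, doc):
    """Attempt the insert exactly once; on failover, drop the
    document and tell the caller, instead of blocking."""
    try:
        insert(doc)
        return True
    except AutoReconnect as e:
        print('Discarded a document during failover:', e)
        return False

def failing_insert(doc):
    """Simulates a collection whose primary just died."""
    raise AutoReconnect('[Errno 61] Connection refused')

insert_one_shot(failing_insert, {'kitten': 'fluffy.jpg'})  # prints a warning, returns False
```

The caller gets an immediate True or False and can tell the user to retry, rather than stalling the whole application.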

Trying Hard to Write

Let’s bring up the mongod we just killed:

$ mongod --dbpath db0 --logpath db0/log --pidfilepath db0/pid --port 27017 --replSet foo --fork

And update mabel.py with the following armor-plated loop:

while True:
    time.sleep(1)
    data = {
        'time': datetime.datetime.utcnow(),
        'oxygen': random.random()
    }

    # Try for five minutes to recover from a failed primary
    for i in range(60):
        try:
            mabel_db.breaths.insert(data, safe=True)
            print 'wrote'
            break # Exit the retry loop
        except pymongo.errors.AutoReconnect, e:
            print 'Warning', e
            time.sleep(5)

Now run python mabel.py, and again kill the primary. Do either “kill `cat db1/pid`” or “kill `cat db2/pid`”, depending on which mongod is the primary right now. mabel.py’s output will look like:

wrote
Warning [Errno 61] Connection refused
Warning emptysquare.local:27017: [Errno 61] Connection refused, emptysquare.local:27019: [Errno 61] Connection refused, emptysquare.local:27018: [Errno 61] Connection refused
Warning emptysquare.local:27017: not primary, emptysquare.local:27019: [Errno 61] Connection refused, emptysquare.local:27018: not primary
wrote
wrote
.
.
.

mabel.py goes through a few stages of grief when the primary dies, but in a few seconds it finds a new primary, inserts its data, and continues happily.
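The retry loop above can be factored into a reusable helper. This is only a sketch: AutoReconnect is stubbed out and insert is any callable, so you can see the control flow without a live replica set. The insert_with_retry and fake_insert names are invented for this example.

```python
import time

class AutoReconnect(Exception):
    """Stand-in for pymongo.errors.AutoReconnect, so this sketch
    runs without a server."""

def insert_with_retry(insert, doc, attempts=60, delay=5):
    """Call insert(doc) until it succeeds, sleeping `delay` seconds
    after each failure; give up after `attempts` tries."""
    for _ in range(attempts):
        try:
            insert(doc)
            return True
        except AutoReconnect as e:
            print('Warning:', e)
            time.sleep(delay)
    return False

# Demonstrate with a fake insert that fails twice, then succeeds,
# as if a new primary were elected on the third attempt.
calls = []
def fake_insert(doc):
    calls.append(doc)
    if len(calls) < 3:
        raise AutoReconnect('[Errno 61] Connection refused')

insert_with_retry(fake_insert, {'oxygen': 0.5}, delay=0)
print(len(calls))  # 3: two failed attempts, then success
```

With the defaults of 60 attempts and a 5-second pause, the helper tries for five minutes, the same budget as mabel.py.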

What About Duplicates?

Leaving monkeys and kittens aside, another reason PyMongo doesn’t automatically retry your inserts is the risk of duplication: If the first attempt caused an error, PyMongo can’t know if the error happened before Mongo wrote the data, or after. What if we end up writing Mabel’s oxygen data twice? Well, there’s a trick you can use to prevent this: generate the document id on the client.

Whenever you insert a document, Mongo checks if it has an “_id” field and if not, it generates an ObjectId for it. But you’re free to choose the new document’s id before you insert it, as long as the id is unique within the collection. You can use an ObjectId or any other type of data. In mabel.py you could use the timestamp as the document id, but I’ll show you the more generally applicable ObjectId approach:

from pymongo.objectid import ObjectId

while True:
    time.sleep(1)
    data = {
        '_id': ObjectId(),
        'time': datetime.datetime.utcnow(),
        'oxygen': random.random()
    }

    # Try for five minutes to recover from a failed primary
    for i in range(60):
        try:
            mabel_db.breaths.insert(data, safe=True)
            print 'wrote'
            break # Exit the retry loop
        except pymongo.errors.AutoReconnect, e:
            print 'Warning', e
            time.sleep(5)
        except pymongo.errors.DuplicateKeyError:
            # The insert worked the first time; only the reply was lost
            break # Exit the retry loop

We set the document’s id to a newly generated ObjectId in our Python code, before entering the retry loop. If the insert succeeds just before the primary dies, we catch the AutoReconnect exception and retry; the second attempt raises a DuplicateKeyError, which tells us for sure that the insert succeeded. You can use this technique for safe, reliable writes in general.
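The whole technique can be simulated end to end without a server. In this sketch the AutoReconnect, DuplicateKeyError, FlakyCollection, and reliable_insert names are invented for illustration, and a uuid string stands in for the ObjectId so no pymongo install is needed; the point is that choosing the _id client-side makes the retry idempotent.

```python
import uuid

class AutoReconnect(Exception):
    """Stand-in for pymongo.errors.AutoReconnect."""

class DuplicateKeyError(Exception):
    """Stand-in for pymongo.errors.DuplicateKeyError."""

class FlakyCollection:
    """Simulates the dangerous case: the first insert reaches the
    server, but the connection dies before the client hears back."""
    def __init__(self):
        self.docs = {}
        self.failed_once = False

    def insert(self, doc):
        if doc['_id'] in self.docs:
            raise DuplicateKeyError(doc['_id'])
        self.docs[doc['_id']] = doc  # the write really happens
        if not self.failed_once:
            self.failed_once = True
            raise AutoReconnect('connection lost after the write')

def reliable_insert(collection, doc):
    """Generate the _id client-side, then retry; a DuplicateKeyError
    on retry proves the earlier attempt succeeded."""
    doc.setdefault('_id', str(uuid.uuid4()))
    for _ in range(60):
        try:
            collection.insert(doc)
            return
        except AutoReconnect:
            continue  # retry with the same _id
        except DuplicateKeyError:
            return  # the first attempt actually worked

collection = FlakyCollection()
reliable_insert(collection, {'oxygen': 0.7})
print(len(collection.docs))  # 1: no duplicate, despite the retry
```

Without the pre-generated _id, the retry would have inserted a second document with a fresh server-assigned id, silently doubling Mabel’s oxygen reading.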


Bibliography

Apocryphal story of Mabel, the Swimming Wonder Monkey

More likely true, very brutal story of 3 monkeys killed by a computer error

Source: http://emptysquare.net/blog/save-the-monkey-reliably-writing-to-mongodb/