
Backing up MongoDB Instances on EBS


Even with replica sets and journaling, it’s still a good idea to back up your data regularly. I’m frequently surprised to hear from relatively sophisticated services that don’t have regularly scheduled backups (MongoDB users and others alike). I thought it’d be helpful to give a quick overview of our (simple) backup infrastructure at Fiesta; if anybody else wants to chime in about how they handle this stuff, we’d love to hear from you!

Architecture

We run a single replica set on Amazon EC2. We have a passive node that we use for all of our backups. Since we’re on EBS, we can use Amazon’s snapshotting to take the actual backup, which will then be hosted on S3. All of our data is split between MongoDB and S3, so the focus of our backups is just MongoDB data.

Journaling

In 1.8, MongoDB introduced journaling. If journaling is enabled, it’s possible to take hot snapshots of a MongoDB data directory. We don’t currently run with journaling, so we need to be a bit more careful. The approach we take is to fsync and lock the passive node and then take the snapshot from there. That’s all handled by the code below.

The code

The following code is what actually handles taking a backup. The lock_and_backup() function gets run from what’s essentially a nightly cron job.

import subprocess

from pymongo import Connection
import settings


def do_ebs_backup():
    env = {"EC2_HOME": settings.ec2_home,
           "JAVA_HOME": settings.java_home}
    # Credentials for the EC2 command-line tools
    keyfile = settings.ec2_key
    certfile = settings.ec2_cert
    args = ["-K", keyfile, "-C", certfile]

    # We use the 'backup' tag to find the volumes to back up
    out = subprocess.Popen([settings.ec2_tools + "bin/ec2-describe-volumes",
                            "-F", "tag:env=backup"] + args,
                           stdout=subprocess.PIPE,
                           env=env).communicate()[0]
    volumes = set([v for v in out.split() if v.startswith("vol-")])
    if not volumes:
        raise Exception("No volumes to back up?")

    # Create a snapshot for each volume we found above
    snaps = []
    for volume in volumes:
        snap = subprocess.Popen([settings.ec2_tools + "bin/ec2-create-snapshot"] + \
                                     args + [volume],
                                stdout=subprocess.PIPE,
                                env=env).communicate()[0]
        snaps.append(snap)
    return snaps


def lock_and_backup():
    conn = Connection(slave_okay=True)
    try:
        conn.admin.command("fsync", lock=True)
        do_ebs_backup()
    finally:
        conn.admin["$cmd.sys.unlock"].find_one()

We use the EC2 command-line tools to do the snapshot, and use Python’s subprocess module to interact with those tools. The important bit really happens in lock_and_backup(), where we run the “fsync” command and write-lock the server before actually taking the snapshot.
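The volume-discovery step relies on a terse string filter over the raw ec2-describe-volumes output. Pulled out as a standalone helper (the function name and sample output below are my own illustration, not part of the original script), it's easy to sanity-check in isolation:

```python
def parse_volume_ids(describe_output):
    """Extract unique EBS volume IDs ("vol-...") from the text output of
    ec2-describe-volumes, mirroring the filter in do_ebs_backup()."""
    return sorted({tok for tok in describe_output.split()
                   if tok.startswith("vol-")})


# Hypothetical describe-volumes output: one VOLUME row plus its TAG row,
# so the same volume ID appears twice.
sample = ("VOLUME\tvol-1a2b3c4d\t8\tsnap-00aa11bb\tus-east-1a\tin-use\n"
          "TAG\tvolume\tvol-1a2b3c4d\tenv\tbackup\n")
print(parse_volume_ids(sample))  # ['vol-1a2b3c4d']
```

Using a set here is what deduplicates volume IDs that show up in both the VOLUME and TAG rows, so each volume is snapshotted only once.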


source: http://blog.fiesta.cc/post/13127613694/backing-up-mongodb-instances-on-ebs
