
Backing up MongoDB Instances on EBS


Even with replica sets and journaling, it’s still a good idea to back up your data regularly. I’m frequently surprised to hear from relatively sophisticated services (MongoDB users and others) that don’t have regularly scheduled backups. I thought it’d be helpful to give a quick overview of our (simple) backup infrastructure at Fiesta; if anybody else wants to chime in about how they’re handling this stuff, we’d love to hear from you!

Architecture

We run a single replica set on Amazon EC2. We have a passive node that we use for all of our backups. Since we’re on EBS, we can use Amazon’s snapshotting to take the actual backup, which will then be hosted on S3. All of our data is split between MongoDB and S3, so the focus of our backups is just MongoDB data.

Journaling

In 1.8, MongoDB introduced journaling. If journaling is enabled, it’s possible to take hot snapshots of a MongoDB data directory. We don’t currently run with journaling, so we need to be a bit more careful. The approach we take is to fsync and lock the passive node and then take the snapshot from there. That’s all handled by the code below.
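For context, journaling is what makes hot snapshots safe: if it were enabled, the fsync-and-lock step below would be unnecessary. A minimal sketch of starting mongod with journaling turned on (the `--journal` flag, available since 1.8; the dbpath here is illustrative):

```shell
# Start mongod with journaling enabled (MongoDB 1.8+).
# /data/db is an illustrative dbpath, not a path from this article.
mongod --journal --dbpath /data/db
```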

The code

The following code is what actually handles taking a backup. The lock_and_backup() function gets run from what’s essentially a nightly cron job.
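As a sketch, the nightly schedule could be a plain crontab entry like the one below; the script path and run time are hypothetical, standing in for whatever wrapper invokes lock_and_backup():

```shell
# Hypothetical crontab entry: run the backup nightly at 03:00.
# /opt/fiesta/backup.py is an assumed wrapper that calls lock_and_backup().
0 3 * * * /usr/bin/python /opt/fiesta/backup.py
```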

import subprocess

from pymongo import Connection
import settings


def do_ebs_backup():
    env = {"EC2_HOME": settings.ec2_home,
           "JAVA_HOME": settings.java_home}
    # Key and certificate paths come from settings, like the other EC2 config
    keyfile = settings.ec2_key
    certfile = settings.ec2_cert
    args = ["-K", keyfile, "-C", certfile]

    # We use the 'backup' tag to find the volumes to back up
    out = subprocess.Popen([settings.ec2_tools + "bin/ec2-describe-volumes",
                            "-F", "tag:env=backup"] + args,
                           stdout=subprocess.PIPE,
                           env=env).communicate()[0]
    volumes = set([v for v in out.split() if v.startswith("vol-")])
    if not volumes:
        raise Exception("No volumes to backup?")

    # Create a snapshot for each volume we found above
    snaps = []
    for volume in volumes:
        snap = subprocess.Popen([settings.ec2_tools + "bin/ec2-create-snapshot"] + \
                                     args + [volume],
                                stdout=subprocess.PIPE,
                                env=env).communicate()[0]
        snaps.append(snap)
    return snaps


def lock_and_backup():
    conn = Connection(slave_okay=True)
    try:
        # Flush all writes to disk and block further writes while we snapshot
        conn.admin.command("fsync", lock=True)
        do_ebs_backup()
    finally:
        # Release the write lock (the server's unlock mechanism)
        conn.admin["$cmd.sys.unlock"].find_one()

We use the EC2 command-line tools to take the snapshot, and Python’s subprocess module to drive them. The important bit happens in lock_and_backup(), where we run the “fsync” command with lock=True to flush and write-lock the server before actually taking the snapshot.
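The volume-discovery step just scans the tool’s stdout for tokens starting with “vol-”, so that parsing can be pulled out and exercised on its own. A minimal sketch (the sample text below is invented for illustration, not real ec2-describe-volumes output):

```python
def parse_volume_ids(describe_output):
    """Extract the set of EBS volume IDs from ec2-describe-volumes output."""
    return set(token for token in describe_output.split()
               if token.startswith("vol-"))

# Invented sample resembling the tool's tab-separated output;
# note the same volume ID can appear on both VOLUME and TAG lines.
sample = """VOLUME\tvol-1a2b3c4d\t100\tus-east-1a\tin-use
TAG\tvolume\tvol-1a2b3c4d\tenv\tbackup
VOLUME\tvol-9f8e7d6c\t100\tus-east-1a\tin-use"""

print(sorted(parse_volume_ids(sample)))  # → ['vol-1a2b3c4d', 'vol-9f8e7d6c']
```

Using a set here deduplicates IDs that appear on both the VOLUME and TAG lines, which is why the original code wraps the comprehension in set().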


source: http://blog.fiesta.cc/post/13127613694/backing-up-mongodb-instances-on-ebs
