Backing up MongoDB Instances on EBS
Even with replica sets and journaling, it’s still a good idea to regularly back up your data. I’m actually frequently surprised to hear from relatively sophisticated services that don’t have regularly scheduled backups (both MongoDB users & others). I thought it’d be helpful to give a quick overview of our (simple) backup infrastructure at Fiesta; if anybody else wants to chime in about how they’re handling this stuff we’d love to hear from you!
Architecture
We run a single replica set on Amazon EC2. We have a passive node that we use for all of our backups. Since we’re on EBS, we can use Amazon’s snapshotting to take the actual backup, which will then be hosted on S3. All of our data is split between MongoDB and S3, so the focus of our backups is just MongoDB data.
Journaling
In 1.8, MongoDB introduced journaling. If journaling is enabled, it’s possible to take hot snapshots of a MongoDB data directory. We don’t currently run with journaling, so we need to be a bit more careful. The approach we take is to fsync and lock the passive node and then take the snapshot from there. That’s all handled by the code below.
The code
The following code is what actually handles taking a backup. The lock_and_backup() function gets run from what’s essentially a nightly cron job.
    import subprocess

    from pymongo import Connection

    import settings


    def do_ebs_backup():
        env = {"EC2_HOME": settings.ec2_home,
               "JAVA_HOME": settings.java_home}
        # EC2_KEY and EC2_CERT are not defined in this snippet; they stand for
        # the paths to the EC2 private key (-K) and certificate (-C) files.
        keyfile = EC2_KEY
        certfile = EC2_CERT
        args = ["-K", keyfile, "-C", certfile]

        # We use the 'backup' tag to find the volumes to back up
        out = subprocess.Popen([settings.ec2_tools + "bin/ec2-describe-volumes",
                                "-F", "tag:env=backup"] + args,
                               stdout=subprocess.PIPE, env=env).communicate()[0]
        volumes = set([v for v in out.split() if v.startswith("vol-")])
        if not volumes:
            raise Exception("No volumes to backup?")

        # Create a snapshot for each volume we found above
        snaps = []
        for volume in volumes:
            snap = subprocess.Popen([settings.ec2_tools + "bin/ec2-create-snapshot"] +
                                    args + [volume],
                                    stdout=subprocess.PIPE, env=env).communicate()[0]
            snaps.append(snap)
        return snaps


    def lock_and_backup():
        conn = Connection(slave_okay=True)
        try:
            # Flush pending writes to disk and block new ones
            conn.admin.command("fsync", lock=True)
            do_ebs_backup()
        finally:
            # Always release the lock, even if the snapshot step failed
            conn.admin["$cmd.sys.unlock"].find_one()
We use the EC2 command-line tools to do the snapshot, and use
Python’s subprocess module to interact with those tools. The important
bit really happens in lock_and_backup(), where we run the “fsync”
command and write-lock the server before actually taking the snapshot.
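As a rough sketch of the same lock-then-snapshot pattern, the locking could also be wrapped in a context manager so that the unlock can't be forgotten. This is not the code we run: fsync_locked and the take_snapshot parameter are hypothetical names, conn is assumed to be a pymongo client connected to the passive node, and the fsyncUnlock command is assumed to be available on the server (older servers need the $cmd.sys.unlock query used above).

```python
from contextlib import contextmanager


@contextmanager
def fsync_locked(conn):
    """Hold MongoDB's fsync write lock for the duration of a `with` block.

    `conn` is assumed to be a pymongo client connected to the passive node.
    """
    # Flush pending writes to disk and block further writes.
    conn.admin.command("fsync", lock=True)
    try:
        yield conn
    finally:
        # Always release the lock, even if the snapshot step raised.
        # Assumption: the server supports the fsyncUnlock command.
        conn.admin.command("fsyncUnlock")


def lock_and_backup(conn, take_snapshot):
    """Run `take_snapshot()` while the node is locked and return its result."""
    with fsync_locked(conn):
        return take_snapshot()
```

Here take_snapshot would be something like do_ebs_backup from the code above. One difference from the original: the lock is acquired before the try block, so if the fsync command itself fails, no unlock is attempted.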
source: http://blog.fiesta.cc/post/13127613694/backing-up-mongodb-instances-on-ebs