Personal Off-Site Backups
Unlike many, I’m actually a good boy and do backups of my personal data (for which I can mostly thank my obsessive-compulsive side). However, up until now I’ve been remiss in my duty to also take these backups off-site in case of fire, theft, acts of god or gods, etc. Without a tape system or a rotation of hard drives (not to mention an actual “off-site” site to store them), this ends up being a little tricky to pull off.
Some of my colleagues use various online backup services, many of which are full-service offerings with a custom client or fixed workflow for performing the backups. At least one person I know backs up (or used to) directly to Amazon S3, but even in the cheapest of their regions the cost is significant for what could remain an effectively cold backup. It may be somewhat easier to swallow now that they have recently reduced their pricing across the board.
Glacier is a really interesting offering from Amazon that I’ve been playing with a bit recently. While its price point is squarely aimed at businesses that want to back up really large amounts of data, it also makes a lot of sense for personal backups. Initially the interface was similar to what you would expect from a tape system: collect your files together as a vaguely linear archive and upload it with some checksum information. I was considering writing a small backup tool that would make backing up to Glacier reasonably simple, but didn’t quite get around to it in time.
Fortunately for me, waiting paid off, as they recently added support for transitioning S3 objects to Glacier automatically. This means you get to use the regular S3 interface for uploading and downloading individual objects/files, while letting the automatic archival mechanism move them into Glacier for long-term storage. This makes cost-effective remote backups ridiculously trivial, but I still wrote a small tool to automate it a little bit.
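As a rough illustration of what such a transition policy looks like, here is a minimal sketch that builds a lifecycle configuration as a plain Ruby hash. It assumes the request shape used by the current aws-sdk-s3 gem, which may differ from the SDK version glacier_backup was written against; the helper name, the rule id and the bucket name in the comment are all made up for the example.

```ruby
# Build a lifecycle configuration that moves every object to Glacier.
# `days: 0` asks S3 to transition objects as soon as the rule is next evaluated.
# The method name and default rule id are hypothetical, chosen for this sketch.
def glacier_lifecycle_configuration(rule_id: "archive-to-glacier", days: 0)
  {
    rules: [
      {
        id: rule_id,
        status: "Enabled",
        filter: { prefix: "" }, # empty prefix matches every object in the bucket
        transitions: [
          { days: days, storage_class: "GLACIER" }
        ]
      }
    ]
  }
end

# Applying it (assumes the aws-sdk-s3 gem and a hypothetical bucket name):
#
#   require "aws-sdk-s3"
#   Aws::S3::Client.new.put_bucket_lifecycle_configuration(
#     bucket: "my-backup-bucket",
#     lifecycle_configuration: glacier_lifecycle_configuration
#   )
```

Keeping the hash separate from the API call makes it easy to inspect or tweak the rule (for example, a 30-day delay via `days: 30`) before sending anything to S3.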
Hence, glacier_backup. It just uses a bit of Ruby, the Amazon Ruby SDK (which is a very nice library, incidentally), ActiveRecord and progressbar. Basically, it sets up a bucket of your choosing with a policy to transition all objects to Glacier immediately, then traverses the directories you configure it with and uploads any readable file it finds to S3. Some metadata is stored locally using ActiveRecord, not because it is strictly necessary (you can store a wealth of metadata on S3 objects themselves), but because each S3 request costs something, so it’s helpful to avoid making requests when they are not needed.
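The traverse-and-skip idea can be sketched in a few lines of plain Ruby. This is not the code from glacier_backup: it swaps ActiveRecord for a JSON file as the local metadata store, treats a file as unchanged when its size and modification time match the last run, and leaves the actual upload to a block supplied by the caller. The method name `backup_changed_files` and the manifest layout are assumptions made for the example.

```ruby
require "find"
require "json"

# Walk each directory and yield every readable regular file that is new or
# changed since the last run. Returns the list of paths that were yielded.
# The JSON manifest is a stand-in for the tool's ActiveRecord metadata store.
def backup_changed_files(directories, manifest_path)
  manifest = File.exist?(manifest_path) ? JSON.parse(File.read(manifest_path)) : {}
  uploaded = []

  directories.each do |dir|
    Find.find(dir) do |path|
      next unless File.file?(path) && File.readable?(path)

      stat = File.stat(path)
      fingerprint = { "size" => stat.size, "mtime" => stat.mtime.to_i }
      next if manifest[path] == fingerprint # unchanged: no S3 request needed

      yield path # the caller performs the upload
      manifest[path] = fingerprint
      uploaded << path
    end
  end

  # Persist what was seen so the next run can skip unchanged files.
  File.write(manifest_path, JSON.generate(manifest))
  uploaded
end
```

With the aws-sdk-s3 gem, the block passed in might do something like `bucket.object(path).upload_file(path)`, where `bucket` is an `Aws::S3::Bucket`. The point of the local manifest is that an unchanged file costs nothing: no request is made to S3 for it at all.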
It’s not an amazing bit of code, but it gets the job done, and it is somewhat satisfying to see the progress bar flying past as it archives my personal files up to the cloud. Give it a try if you have a need for remote backups. Pull requests, feature requests and issues are of course welcome, and I hope you find it useful!