5 Reasons to Back Up Your Hadoop Environment

Lots of people think they don't need backup and recovery for their Hadoop platforms. But that couldn't be further from the truth.


Since announcing support for HDFS with the release of RecoverX 2.0 earlier this year, one of the most frequent comments we get from customers and prospects is, "I don’t need backup and recovery for my Hadoop platforms."

Unfortunately, that could not be further from the truth.

Although HDFS filesystems offer replication and local snapshots, they lack the point-in-time backup and recovery capabilities required to achieve and maintain enterprise-grade data protection. As enterprises increasingly rely on big data applications for decision support and customer analytics, it’s critically important to understand the need for backup and recovery of Hadoop environments.
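
For reference, this is roughly what HDFS's native snapshot facility looks like. These are real HDFS CLI commands, but the directory path and snapshot name are purely illustrative:

    # Mark a directory as snapshottable (requires HDFS admin privileges)
    hdfs dfsadmin -allowSnapshot /data/warehouse

    # Take a named, read-only snapshot of the directory's current state
    hdfs dfs -createSnapshot /data/warehouse daily-2017-06-01

    # Snapshots are browsable under the hidden .snapshot directory
    hdfs dfs -ls /data/warehouse/.snapshot/daily-2017-06-01

Note what's missing: these snapshots live on the same cluster as the data they protect, there is no catalog or versioning across clusters, and recovery is a manual copy operation rather than a true point-in-time restore.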

Here’s a quick primer on five reasons you need to back up your Hadoop environment!

1. Replication Isn't the Same as Point-in-Time Backup

Replication provides high availability, but it offers no protection against logical or human errors, which can lead to data loss and, ultimately, to failures to meet compliance and governance standards.
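
A quick sketch makes the distinction concrete (the path and replication factor are illustrative): replication copies every change faithfully, including the destructive ones.

    # Three replicas protect against disk and node failure...
    hdfs dfs -setrep -w 3 /data/warehouse/events

    # ...but an accidental or malicious delete removes all three at once
    hdfs dfs -rm -r -skipTrash /data/warehouse/events

A point-in-time backup, by contrast, is an independent copy frozen at a known moment, so a bad write or delete after that moment cannot touch it.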

2. Data Loss Is as Real as It Always Has Been

Studies suggest that more than 70 percent of data loss events are caused by human error. Moreover, filesystems such as HDFS offer no protection against such accidental deletion of data.
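
HDFS's only built-in safety net for deletions is the trash, and it is easy to bypass or outlive. A minimal sketch, assuming trash is enabled (fs.trash.interval > 0) and using an illustrative user and path:

    # A plain delete moves data into the user's trash directory...
    hdfs dfs -rm -r /data/warehouse/events

    # ...from which it can be moved back, until the trash interval expires
    hdfs dfs -mv /user/alice/.Trash/Current/data/warehouse/events /data/warehouse/events

    # A delete with -skipTrash, however, is immediate and permanent
    hdfs dfs -rm -r -skipTrash /data/warehouse/events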

3. Reconstruction of Data Is Too Expensive

It is theoretically possible to reconstruct lost data from its original sources. In practice, however, either the data is gone at the source as well, or reconstruction takes weeks or months.

4. Application Downtime Should Be Minimized

Data has no value when it cannot be accessed. Granular, file-level recovery is essential to minimizing application downtime.
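
Native snapshots at least make manual, file-level restores possible without rolling back an entire dataset. A sketch, reusing the illustrative snapshot name from above:

    # Copy a single lost file back from a read-only snapshot,
    # leaving the rest of the live dataset untouched
    hdfs dfs -cp /data/warehouse/.snapshot/daily-2017-06-01/events/part-00000 \
                 /data/warehouse/events/

At enterprise scale, though, this only works if someone happened to take the right snapshot and can find the right file; a backup product automates that catalog and restore workflow.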

5. Cost

Big data is… big, with data lakes quickly growing to multi-petabyte scale. Keeping multiple replicas of all of that data on primary cluster storage is expensive, and enterprise backup and recovery can also enable organizations to archive data to cost-effective object storage systems.
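
In plain Hadoop terms, the archiving pattern is essentially a bulk copy to an object store. A rough sketch using DistCp (the bucket name is hypothetical, and it assumes the hadoop-aws connector and S3 credentials are already configured; a backup product would manage this workflow rather than leave it to hand-run jobs):

    # Copy a cold dataset off the cluster to S3-compatible object storage
    hadoop distcp /data/warehouse/events s3a://archive-bucket/warehouse/events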

To help spread the word about the critical need for enterprise backup and recovery for Hadoop platforms, I recently contributed an article to InsideBigData. You can read that article in its entirety here.

If you’d like to learn more about how Datos IO supports Hadoop backup and recovery, check out these useful resources:


Topics:
big data, hadoop, backup, recovery

Published at DZone with permission of Peter Smails, DZone MVB. See the original article here.

