Big Data and Open-Source Protection
What's the right solution for your business?
Why is this important? Big data is driving significant open-source database adoption. In conjunction with digital transformation initiatives, enterprises are pursuing hybrid multi-cloud infrastructure to avoid vendor lock-in, maintain agility, and reduce cost.
NoSQL/open-source databases are now the norm. 78 percent of organizations report using NoSQL or open source databases. 50 percent of organizations are using Hadoop/Apache HBase. Here's how the databases currently in use stack up:
- MySQL = 59%
- MongoDB = 54%
- PostgreSQL = 27%
- Cassandra = 19%
- MariaDB = 15%
Data protection challenges stem from several sources, by the share of respondents identifying each:
- Big data analytics = 40%
- Open-source databases = 28%
- Virtualized environments = 16%
- Public cloud = 12%
- Hyper-converged infrastructure = 4%
Governance has become an issue for corporate management. As such, business strategy now includes policy management, risk identification, evaluation, reporting/auditing, compliance, and measures to ensure conformity of policies and laws. You need to think about how governance and risk fit together to meet compliance requirements. Are you able to remove data if asked?
With the criticality of data to the organization, companies have to think about natural disasters with a regional impact, such as hurricanes, tornadoes, floods, and fires. They must have backups available in the event of a natural disaster.
Likewise, there are localized business impacts: unplanned outages caused by a power failure, network outage, software error, or a cybersecurity incident such as ransomware. Attacks are becoming more frequent, and according to the FBI, ransomware attacks cost companies an average of $2.3 million.
There are a number of data protection strategy considerations:
- Downtime and mean time to recover: if the site is down, customers go elsewhere
- Protection and recovery tasks
- Budget and cost
- Infrastructure: how much redundancy and storage, how big the network pipes are, and how to handle scale
- Complexity and management: how many DBAs and network administrators are needed
Mark suggests thinking about two ends of the spectrum: recovery point objective (RPO) and recovery time objective (RTO). For the recovery point, identify when the incident occurred and how far back you have to go to recover (i.e., the last time you had usable data). For recovery time, ask how long it will take to make the system fully operational again. You then need to evaluate priorities from a business application perspective.
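One way to make these objectives concrete is to compute them from timestamps. Below is a minimal sketch; the incident time, backup time, and objective thresholds are hypothetical values for illustration, not figures from the talk:

```python
from datetime import datetime, timedelta

def recovery_point_loss(incident_time, last_good_backup):
    """Data-loss window (RPO exposure): the time between the last
    usable backup and the moment the incident occurred."""
    return incident_time - last_good_backup

def meets_objectives(loss, downtime, rpo, rto):
    """Check an incident against the agreed RPO/RTO targets."""
    return loss <= rpo and downtime <= rto

# Hypothetical incident: last good backup at 02:00, failure at 09:30,
# service restored after 4 hours of recovery work.
incident = datetime(2020, 6, 1, 9, 30)
last_backup = datetime(2020, 6, 1, 2, 0)
loss = recovery_point_loss(incident, last_backup)  # 7.5 hours of lost data
ok = meets_objectives(loss, timedelta(hours=4),
                      rpo=timedelta(hours=24), rto=timedelta(hours=8))
print(loss, ok)
```

The point of the exercise is that RPO and RTO are measurable quantities you can test your strategy against, not abstract goals.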
Mark ran through three types of protection and recovery to evaluate.

The first is a do-it-yourself approach: this may be a script or plugging in a drive. On the plus side:
- Fewer infrastructure resources are required
- Less planning is required
- Upfront costs are lower

On the downside:
- You are not protected against human error or malicious activity
- The data you lose may be unrecoverable or impossible to recreate
- There is no compliance verification
- Customer confidence will be low
- RTO is high and there is little to no RPO
- There is no long-term retention of data
- It is very time-consuming for personnel to perform repetitive, manual processes
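To illustrate how bare-bones this option is, here is a minimal do-it-yourself sketch: it copies a data directory to a timestamped folder and does nothing else. The paths and naming scheme are hypothetical, and the gaps (no verification, no retention policy, no alerting) are exactly the weaknesses listed above:

```python
import shutil
import time
from pathlib import Path

def diy_backup(source: str, dest_root: str) -> Path:
    """Copy a data directory to a timestamped folder.
    No verification, no retention policy, no alerting."""
    stamp = time.strftime("%Y%m%d-%H%M%S")
    target = Path(dest_root) / f"backup-{stamp}"
    shutil.copytree(source, target)
    return target
```

A cron job calling something like this is where many teams start, and where the risks above come from.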
The second approach builds intelligence into the protection capabilities: a snapshot gives a local point-in-time view of the data, and replication sends data from one host or location to another. On the plus side:
- Multiple copies of the data
- Minimal or no data loss
- Short RTO and RPO
- Failover/failback (automatic or manual)
- Fast restore from snapshot

On the downside:
- Does not protect against human error or malicious activity
- High hardware and network costs: double the storage space and pipelines are required to move the data around
- High maintenance costs
- Higher complexity
- Snapshot retention requirements
- Susceptible to data corruption
- No long-term retention capabilities
- Planning and design consideration is necessary
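Snapshot retention, one of the drawbacks above, usually comes down to deciding how many point-in-time copies to keep before storage costs run away. A small sketch, assuming snapshots are named with sortable timestamps (a hypothetical convention, not tied to any particular product):

```python
def prune_snapshots(snapshots, keep_last):
    """Given snapshot names carrying sortable timestamps
    (e.g. 'snap-20200601T0200'), return the ones to delete,
    keeping only the newest `keep_last` snapshots."""
    ordered = sorted(snapshots)  # timestamp order == lexical order
    return ordered[:-keep_last] if keep_last else ordered

snaps = ["snap-20200601T0200", "snap-20200530T0200", "snap-20200531T0200"]
print(prune_snapshots(snaps, keep_last=2))  # the oldest snapshot is pruned
```

Even this trivial policy has to be planned: too few retained snapshots shortens how far back you can recover, too many doubles down on the storage cost.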
The third approach is dedicated backup software, which automates much of the process, such as setting up the policies and procedures of your data protection strategy. The software can cover many different platforms: operating systems, file systems, and databases. On the plus side:
- Long-term retention
- Resources based on use
- Compliance verification
- Confidence in recoverability
- RTO/RPO based on needs
- Can leverage snapshots
- Protects against human error or malicious activity

On the downside:
- Initial investment in time and capital
- Planning and design consideration
- Media failure is still a possibility over the long term
- Cost to manage: training and education
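Compliance verification and confidence in recoverability, listed among the pros, ultimately rest on being able to prove that a restored copy matches the original. One common technique is checksum comparison; here is a hedged sketch of that idea, not the mechanism of any particular backup product:

```python
import hashlib
from pathlib import Path

def file_digest(path: Path) -> str:
    """SHA-256 of a file, read in chunks to handle large files."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()

def verify_restore(original: Path, restored: Path) -> bool:
    """A restore is verified when source and restored checksums match."""
    return file_digest(original) == file_digest(restored)
```

Dedicated backup products run this kind of check automatically and record the results, which is what makes auditing and compliance reporting possible.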
When deciding on the right strategy for your business, ask: how much is our data worth, and how much are we willing to risk?
The mean cost of an unplanned outage is $8,851 per minute, or roughly $530,000 per hour (Ponemon, "The Cost of Data Center Outages"). The average outage lasts 3.5 hours. Can your company afford a $1.8 million hit to its bottom line?
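The $1.8 million figure follows directly from the quoted hourly rate and average duration:

```python
COST_PER_HOUR = 530_000   # Ponemon hourly figure quoted above
AVG_OUTAGE_HOURS = 3.5    # average outage duration quoted above

total = COST_PER_HOUR * AVG_OUTAGE_HOURS
print(f"${total:,.0f}")   # prints $1,855,000
```

Multiplying your own outage history by your own downtime cost is the simplest way to price the "how much are we willing to risk" question.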
Opinions expressed by DZone contributors are their own.