Data Protection for Operational Reporting Platforms
As enterprises onboard business-critical applications on this next-generation technology stack, they need to prepare for failures or errors that may disrupt customer-facing applications.
Making real-time business decisions based on data from social, mobile, and cloud platforms is a requirement in today's world. That's why an ecosystem of next-generation operational reporting platforms — Spark, Apache Cassandra, Kafka, Docker, Mesos, Marathon, and more — has come to life. Enterprises are using these technologies to re-architect their data storage and processing frameworks in order to achieve lower costs, scalability, and faster response times. However, as enterprises onboard business-critical applications on this next-generation technology stack, they need to prepare for failures or errors that may disrupt customer-facing applications.
One of the key benefits that operational reporting platforms provide is real-time visibility, which allows businesses to make strategic decisions based on actionable insights. However, modern applications generate large volumes of data, and legacy operational reporting platforms (those based on traditional data warehouses, data marts, and ETL logic) are unable to keep up. As a result:
- Business decisions are based on stale data
- Fragmented data increases the risk of errors
- Multiple copies of data increase storage costs
These issues have driven many progressive, data-centric enterprises to re-architect their operational reporting platforms. In this new architecture, Kafka serves as a persistent message bus that ingests data from relational stores and provides failure resiliency and message buffering. Spark or Spark Streaming may be used for transformation, aggregation, and other lightweight stream processing, with processing nodes deployed in Docker containers. Finally, Apache Cassandra serves as a distributed data storage layer that provides linear scalability and high availability. Kafka also makes it possible to add further data consumers, such as Hadoop/HBase via Flafka.
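The pipeline above — ingest from a Kafka topic, lightweight aggregation in a Spark Streaming micro-batch, upserts into Cassandra — can be sketched in plain Python. This is a toy illustration, not production code: the in-memory dictionaries stand in for the real Kafka, Spark, and Cassandra APIs, and the topic name, field names, and function names are illustrative assumptions.

```python
from collections import defaultdict

def aggregate_events(events):
    """Roll up raw change events per entity — the kind of lightweight
    aggregation a Spark Streaming micro-batch would perform."""
    totals = defaultdict(float)
    for event in events:
        totals[event["entity_id"]] += event["amount"]
    return dict(totals)

def run_pipeline(message_bus, storage):
    """Drain one micro-batch from the message bus (Kafka stand-in),
    aggregate it, and upsert the totals into the storage layer
    (Cassandra stand-in)."""
    batch = message_bus.pop("orders", [])  # 'orders' topic is illustrative
    for entity_id, total in aggregate_events(batch).items():
        storage[entity_id] = storage.get(entity_id, 0.0) + total
    return storage
```

In a real deployment, the stand-ins would be replaced by a Kafka consumer, a Spark Streaming job, and Cassandra writes, but the shape of the work — consume, aggregate, upsert — is the same.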
However, as enterprises adopt operational reporting platforms, they must ensure that bad data feeds and human error do not result in data loss. Apache Cassandra natively replicates data, but replication faithfully propagates bad writes too; it does not provide the capability to go back in time and recover data efficiently at scale. That's where Datos IO RecoverX comes in. RecoverX, scale-out data protection software, is an important element of this next-generation distributed data services architecture.
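To make the "go back in time" requirement concrete, here is a toy Python illustration of point-in-time recovery (this is not RecoverX itself, whose interface the article does not describe; the class and method names are assumptions). The idea is that if every write is retained with its timestamp, a table corrupted by a bad feed can be reconstructed as of any earlier moment:

```python
class VersionedStore:
    """Toy key-value store that retains every timestamped version of a
    row, so state can be reconstructed as of an earlier point in time."""

    def __init__(self):
        # key -> list of (timestamp, value) pairs, in write order
        self._versions = {}

    def put(self, key, value, timestamp):
        self._versions.setdefault(key, []).append((timestamp, value))

    def get_as_of(self, key, timestamp):
        """Return the latest value written at or before `timestamp`,
        or None if the key did not exist yet."""
        candidates = [(ts, v) for ts, v in self._versions.get(key, [])
                      if ts <= timestamp]
        return max(candidates)[1] if candidates else None
```

With plain replication, the bad write at time 2 below would simply be copied to every replica; versioned history is what lets you recover the pre-corruption value.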
As enterprises onboard their critical applications to achieve development agility, scalability, and lower operating costs, they are also investing to protect their next-generation applications from data loss, and rightly so. If you are building your next-generation infrastructure around distributed databases, please get in touch with our experts who can guide you through different techniques to ensure data protection.
Published at DZone with permission of Jeannie Liou, DZone MVB. See the original article here.
Opinions expressed by DZone contributors are their own.