When Big Data is Slow
The key to success in big data initiatives is managing the speed, scale, and structure of data with sub-millisecond latency.
Big Data is a big term. It encompasses concepts about data types, dozens of different technologies to manage those data types, and the ecosystem around all those technologies. And everything in it moves fast!
Big data is quickly evolving. A classic big data solution, the most common big data architecture in use today, relies on importing and exporting data (typically into and out of Hadoop) via batch processes. While this has yielded tremendous business results in the form of better customer insight and predictive analysis, it is not a real-time solution. It is slow.
As technology advances at an ever-increasing rate, so do best practices for big data solutions: a modern big data solution relies on real-time data processing via stream processing. It integrates with technologies such as Elasticsearch and Storm to enable real-time analysis and search while meeting operational requirements. To do so, it requires a scalable, high-performance NoSQL database that fulfills those operational requirements while delivering the performance that real-time analysis and search demand.
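To make the contrast between batch and stream processing concrete, here is a minimal sketch in plain Python. It does not use Hadoop or Storm; the function names and the sample events are hypothetical and exist only to illustrate the idea that a batch job produces one answer after all the data has arrived, while a stream processor has an up-to-date answer after every event.

```python
from collections import Counter
from typing import Iterable, Iterator


def batch_counts(events: Iterable[str]) -> Counter:
    """Batch style: wait for the whole data set, then count it in one pass."""
    return Counter(events)


def stream_counts(events: Iterable[str]) -> Iterator[Counter]:
    """Stream style: update running counts as each event arrives."""
    running: Counter = Counter()
    for event in events:
        running[event] += 1
        # A fresh result is available immediately after every event.
        yield running.copy()


# Made-up sample events, for illustration only.
events = ["click", "view", "click", "purchase"]

final_batch = batch_counts(events)
snapshots = list(stream_counts(events))
```

Both approaches end with the same totals. The difference is when results become available: the streaming version can answer a query after the first event, while the batch version cannot answer until the whole import has finished.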
A modern big data solution is only as fast as its slowest component. That brings us to a recent announcement by MongoDB and Cloudera. While we applaud every effort to help customers understand best practices for big data architecture, we must also address which NoSQL solution is the right piece to enable a truly fast big data architecture. A scalable, high-performance NoSQL database ensures that the operational database will not be the slowest component. A NoSQL database that is difficult to scale and relies on database-wide locks will fail to realize the potential of a modern big data solution. This is the difference between MongoDB and Couchbase Server. Sure, MongoDB can be a part of classic big data solutions: these were not designed for real-time analytics and don't need the speed that a modern big data solution requires. Couchbase Server can be a part of both classic big data solutions and modern big data solutions.
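The "slowest component" argument can be sketched in a few lines of Python. This is only an illustration: the stage names and the operations-per-second figures below are invented for the example and are not measurements of any product. The assumption is a simple serial pipeline, in which sustained throughput cannot exceed that of its slowest stage.

```python
from typing import Dict, Tuple


def pipeline_throughput(stage_ops_per_sec: Dict[str, float]) -> Tuple[str, float]:
    """Return the bottleneck stage and the throughput it imposes on the pipeline.

    Assumes a serial pipeline: every record passes through every stage,
    so the slowest stage caps the sustained rate of the whole system.
    """
    bottleneck = min(stage_ops_per_sec, key=stage_ops_per_sec.get)
    return bottleneck, stage_ops_per_sec[bottleneck]


# Hypothetical figures, chosen only to illustrate the point.
stages = {
    "ingest": 50_000.0,
    "stream_processing": 30_000.0,
    "operational_database": 8_000.0,
}

name, rate = pipeline_throughput(stages)
print(f"Bottleneck: {name} at {rate:.0f} ops/sec")
```

In this made-up scenario, speeding up ingest or stream processing would not help at all; only a faster operational database raises the ceiling.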
A classic big data solution, which we mentioned earlier, is in use at many organizations today. It typically relies on integration with Hadoop. Couchbase Server integrates with Hadoop via a Cloudera certified Sqoop connector (link).
Matt Asay cited a classic big data use case where Hadoop analyzes the crowd and a NoSQL database interacts with the individuals. The individual interactions are fed to Hadoop and the crowd analysis is fed to the NoSQL database. For Couchbase, this isn’t just a use case. It’s a customer reference. AOL leverages Hadoop and Couchbase Server in a classic big data solution to enable intelligent advertising (link).
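The feedback loop in this use case can be sketched as follows. This is a toy model, not AOL's implementation and not the Couchbase or Hadoop API: an in-memory dictionary stands in for the NoSQL database, a list stands in for the data exported to Hadoop, and all names and sample data are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple

# Stand-in for the NoSQL database: one small document per individual.
profile_store: Dict[str, dict] = {}
# Stand-in for the interaction data exported to Hadoop.
interaction_log: List[Tuple[str, str]] = []


def record_interaction(user_id: str, category: str) -> None:
    """Individual path: update the user's document and log the event for batch analysis."""
    interaction_log.append((user_id, category))
    profile = profile_store.setdefault(user_id, {"seen": [], "recommended": None})
    profile["seen"].append(category)


def run_crowd_analysis() -> Optional[str]:
    """Crowd path: find the most popular category and write it back to every profile."""
    if not interaction_log:
        return None
    popular = Counter(category for _, category in interaction_log).most_common(1)[0][0]
    for profile in profile_store.values():
        profile["recommended"] = popular
    return popular


# Made-up interactions, for illustration only.
record_interaction("alice", "sports")
record_interaction("bob", "sports")
record_interaction("alice", "travel")
popular = run_crowd_analysis()
```

The individual-facing writes stay small and fast, while the crowd analysis runs separately over the accumulated log and pushes its result back to where the individual interactions are served.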
LivePerson leverages Hadoop, Storm and Couchbase Server in its modern big data solution (link). The LivePerson architecture leverages both batch-oriented processing and real-time processing. LivePerson considered NoSQL databases from Couchbase, MongoDB, and DataStax. However, only Couchbase Server was able to meet their high throughput requirements.
Published at DZone with permission of Don Pinto, DZone MVB. See the original article here.