
MongoDB + Big Data, Almost There


When it comes to big data, MongoDB is almost there. It's a start, and that's a good thing.



I joined Couchbase in December of 2013. However, I've been passionate about big data technology for years. I started writing about NoSQL in September of 2009 (link), and I wrote about big data while I was a Red Hatter (link).

I thought MongoDB positioned itself as a big data solution. I attended a Gartner presentation on big data and NoSQL. There was a NoSQL database listed on the big data ecosystem slide. It was MongoDB. There is a NoSQL database listed on the Wikipedia big data page. It's MongoDB. Honestly, I'm surprised Matt was "blistered" by people who thought MongoDB and Hadoop were competitors (link).

Couchbase Server is not positioned as a big data solution. However, there is a Cloudera-certified Hadoop connector for it (link), and we have customers leveraging Couchbase Server with Hadoop. We took a thoughtful approach to big data.

  • Hadoop is the foundation of big data solutions.
  • Couchbase Server is a NoSQL database.
  • Couchbase Server is not an alternative to Hadoop.
  • There is a place for NoSQL in big data.

MongoDB + Cloudera

As of April 29th, 2014, I don't think MongoDB is positioning itself as a big data solution. In fact, they're planning proper integration with Hadoop. That's a great thing for the NoSQL / big data community. However, the planned integration represents a first generation big data solution: it relies on importing and exporting data via batch processes. In a first generation big data solution, operational performance and scalability requirements are not a concern, nor are the requirements for real-time analysis.

Matt cited a use case in which Hadoop analyzes the crowd and a NoSQL database interacts with the individuals. The individual interactions are fed to Hadoop, and the crowd analysis is fed back to the NoSQL database. For Couchbase, this isn't just a use case. It's a customer reference. AOL leverages Hadoop and Couchbase Server to enable targeted advertising (link).
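The feedback loop in that use case can be sketched in a few lines. This is only an illustration under stated assumptions: a plain dict stands in for the NoSQL database, a list stands in for the interaction data exported to Hadoop, and every name here (`profile_store`, `event_log`, `record_interaction`, `analyze_crowd`, `publish_crowd_results`) is hypothetical. It is not the Couchbase or Hadoop API, and it is not how AOL built its system.

```python
from collections import Counter, defaultdict

# Hypothetical in-memory stand-ins: a dict plays the operational NoSQL store,
# a list plays the interaction log that a batch job would read.
profile_store = defaultdict(dict)   # user_id -> profile document
event_log = []                      # individual interactions


def record_interaction(user_id, category):
    """Operational path: handle one individual and log the interaction."""
    event_log.append((user_id, category))
    profile_store[user_id]["last_seen_category"] = category


def analyze_crowd(events):
    """Batch path: aggregate over all users and rank categories by count."""
    counts = Counter(category for _, category in events)
    return [category for category, _ in counts.most_common()]


def publish_crowd_results(ranking):
    """Feed the crowd-level ranking back into every profile document."""
    for profile in profile_store.values():
        profile["trending"] = ranking[:2]


record_interaction("alice", "sports")
record_interaction("bob", "sports")
record_interaction("carol", "travel")
record_interaction("alice", "sports")

publish_crowd_results(analyze_crowd(event_log))
```

After the batch step, each profile carries both the individual signal (`last_seen_category`) and the crowd signal (`trending`), so the operational side can answer a request without waiting on the analytical side.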

Big Data Central

Big Data Central went live on April 14th, 2014.

We believe the role of NoSQL is to enable the enterprise to meet both operational and analytical requirements, both offline and in real time. Its role is to enable second generation big data solutions. The Hadoop ecosystem fulfills analytical requirements. NoSQL fulfills operational requirements. A second generation big data solution relies on integration with Elasticsearch, Storm, and more. It enables real-time analysis and search while meeting operational requirements. It requires a scalable, high performance NoSQL database.

LivePerson has integrated Hadoop, Storm, and Couchbase Server to create a second generation big data solution (link). The architecture includes both batch-oriented processing and real-time processing. LivePerson evaluated NoSQL databases from Couchbase, MongoDB, and DataStax. However, only Couchbase Server was able to meet its high throughput requirements.

A NoSQL database that is limited to a single lock per database per node (link), or that is difficult to scale, will fail to enable a second generation big data solution. That's the difference between MongoDB and Couchbase Server. MongoDB is suitable for first generation big data solutions. Couchbase Server is ideal for both first generation and second generation big data solutions.



Published at DZone with permission of Don Pinto, DZone MVB. See the original article here.

Opinions expressed by DZone contributors are their own.
