Open Source NoSQL Databases

For almost a year now, the idea of "NoSQL" has been spreading, driven by demand for alternatives to relational databases. Perhaps the biggest motivation behind NoSQL is scalability. Relational databases don't lend themselves well to the kind of horizontal scalability required for large-scale social networking or cloud applications, and ORMs can abstract away the object-relational impedance mismatch only so far. In other cases, companies simply don't need the complex features and rigid schemas that relational databases provide. Most people are not suggesting that we all ditch the RDBMS; in fact, many companies have no real need to switch. Relational databases will probably remain necessary for many applications for years to come. In essence, NoSQL is a movement that aims to reexamine the way we structure data and to draw attention to innovation, in the hope of finding solutions to the next generation's data persistence problems.

Here are some of the better-known open source data stores labeled as "NoSQL", along with the data model each one uses:

CouchDB - Document Store

  • Maps keys to data
  • Provides a simple RESTful JSON API and is written in Erlang
  • Lets you upload functions that index your data and then call those functions to query it
  • Provides an innovative replication strategy - nodes can reconnect, sync, and reconcile differences after being disconnected for long periods of time  
  • Enables new types of distributed applications and data
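
The idea of uploading a function that indexes documents, then querying the result, can be sketched in a few lines. This is a toy in-memory illustration in Python, not CouchDB's API: CouchDB views are normally written in JavaScript and served over HTTP, and the names `build_view`, `query_view`, and `by_author` below are hypothetical.

```python
from typing import Any, Callable, Dict, Iterable, List, Tuple

Doc = Dict[str, Any]
Row = Tuple[Any, str, Any]  # (emitted key, document id, emitted value)


def build_view(docs: Iterable[Doc], map_fn: Callable[[Doc], Iterable[Tuple[Any, Any]]]) -> List[Row]:
    """Run map_fn over every document and collect rows sorted by emitted key."""
    rows: List[Row] = []
    for doc in docs:
        for key, value in map_fn(doc):
            rows.append((key, doc["_id"], value))
    # Sort by key, then by document id, so lookups return a stable order.
    rows.sort(key=lambda row: (row[0], row[1]))
    return rows


def query_view(rows: List[Row], key: Any) -> List[Tuple[str, Any]]:
    """Return (document id, value) pairs whose emitted key equals `key`."""
    return [(doc_id, value) for k, doc_id, value in rows if k == key]


def by_author(doc: Doc):
    """Hypothetical map function: index documents by their author field."""
    if "author" in doc:
        yield doc["author"], doc["title"]


# Made-up sample documents, for illustration only.
DOCS = [
    {"_id": "a1", "author": "kim", "title": "Graphs"},
    {"_id": "a2", "author": "lee", "title": "Columns"},
    {"_id": "a3", "author": "kim", "title": "Documents"},
    {"_id": "a4", "title": "Untitled"},  # no author, so it is not indexed
]

VIEW = build_view(DOCS, by_author)
```

Calling `query_view(VIEW, "kim")` returns the two documents written by that author. A real document store would persist the index and update it incrementally rather than rebuilding it from scratch.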

MongoDB - Document Store

  • Free-form key-value-like data store with good performance
  • Powerful, expansive query model
  • Usability rivals that of Redis
  • Good for complex data storage needs
  • Production-quality sharding capabilities

Neo4j - GraphDB

  • Disk-based
  • Has a restricted, single-threaded model for graph traversal
  • Has optional layers to expose Neo4j as an RDF store
  • Can handle graphs of several billion nodes, relationships, or properties on a single machine
  • Released under a dual license - free for non-commercial use 

Apache HBase - Wide Column Store/Column Families

  • Built on top of Hadoop, which has functionality similar to Google's GFS and MapReduce systems
  • Hadoop's HDFS provides a mechanism that reliably stores and organizes large amounts of data
  • Random access performance is on par with MySQL
  • Has a high-performance Thrift gateway
  • Offers source and sink modules for Cascading

Redis - Key Value/Tuple Store

  • Provides a rich API and performs its operations in memory, writing to disk only periodically
  • It's extremely fast
  • Lets you append a value to the end of a list that's already stored under a key
  • Has atomic operations, making it a best-of-breed tally server
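
To see why atomic operations matter for a tally server, here is a minimal in-memory sketch. It is not a client for any real server: the class `TinyStore` is hypothetical, its method names only loosely borrow from the Redis commands INCR, RPUSH, and LRANGE, and it uses a lock to get atomicity, which is an assumption made for the illustration.

```python
import threading
from collections import defaultdict


class TinyStore:
    """Toy key-value store with an atomic counter and list append."""

    def __init__(self):
        self._lock = threading.Lock()
        self._counters = defaultdict(int)
        self._lists = defaultdict(list)

    def incr(self, key, amount=1):
        # The read-modify-write happens under one lock, so no update is lost.
        with self._lock:
            self._counters[key] += amount
            return self._counters[key]

    def rpush(self, key, value):
        # Append to the end of the list stored at key; return the new length.
        with self._lock:
            self._lists[key].append(value)
            return len(self._lists[key])

    def lrange(self, key):
        # Return a copy so callers cannot mutate the stored list.
        with self._lock:
            return list(self._lists[key])


def hammer(store, key, times):
    """Increment the same counter repeatedly, as one busy client would."""
    for _ in range(times):
        store.incr(key)


store = TinyStore()
threads = [
    threading.Thread(target=hammer, args=(store, "page:views", 1000))
    for _ in range(8)
]
for t in threads:
    t.start()
for t in threads:
    t.join()

total = store.incr("page:views", 0)  # read the counter without changing it
```

Eight threads each add 1000 to the same key, and because every increment is atomic the final tally is exactly 8000.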

Memcached - Key Value/Tuple Store

  • High-performance, distributed memory object caching
  • Free and open source
  • Generic and agnostic to the objects/strings it caches
  • Keeps all data in memory
  • Simple yet elegant design enables easy development and deployment
  • Language-neutral caching scheme
  • Most of the large properties on the web are using it now, except for Microsoft
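
One common way to make a cache "distributed" is for the client to hash each key and pick a server from a fixed list. The sketch below shows that idea in its simplest form; the function `pick_server` and the server addresses are hypothetical, and real client libraries often use more elaborate schemes such as consistent hashing.

```python
import hashlib
from typing import List


def pick_server(key: str, servers: List[str]) -> str:
    """Map a cache key to one server using a stable hash of the key."""
    if not servers:
        raise ValueError("at least one server is required")
    # Use a stable digest so every client picks the same server for a key.
    digest = hashlib.md5(key.encode("utf-8")).digest()
    index = int.from_bytes(digest[:4], "big") % len(servers)
    return servers[index]


# Made-up addresses, for illustration only.
SERVERS = ["10.0.0.1:11211", "10.0.0.2:11211", "10.0.0.3:11211"]

chosen = pick_server("user:42", SERVERS)
```

The same key always maps to the same server, so any client can find a cached value without the servers talking to each other. The trade-off of plain modulo hashing is that adding or removing a server remaps most keys.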

Project Voldemort - Eventually Consistent Key Value Store

  • Used by LinkedIn
  • Handles server failure transparently
  • Pluggable serialization supports rich keys and values including lists and tuples with named fields
  • Supports common serialization frameworks including Protocol Buffers, Thrift, and Java Serialization
  • Data items are versioned
  • Supports pluggable data placement strategies
  • Memory caching and the storage system are combined
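
Versioning data items lets an eventually consistent store tell whether one copy of a value supersedes another or whether the two were written concurrently and conflict. One common technique for this in such stores is the vector clock. The sketch below assumes that technique for illustration and makes no claim about Voldemort's internals; the functions `increment` and `compare` are hypothetical.

```python
from typing import Dict

Clock = Dict[str, int]  # node name -> number of writes seen from that node


def increment(clock: Clock, node: str) -> Clock:
    """Return a new clock recording one more write coordinated by `node`."""
    updated = dict(clock)
    updated[node] = updated.get(node, 0) + 1
    return updated


def compare(a: Clock, b: Clock) -> str:
    """Classify version a relative to b: before, after, equal, or concurrent."""
    nodes = set(a) | set(b)
    a_ahead = any(a.get(n, 0) > b.get(n, 0) for n in nodes)
    b_ahead = any(b.get(n, 0) > a.get(n, 0) for n in nodes)
    if a_ahead and b_ahead:
        return "concurrent"  # neither version supersedes the other
    if a_ahead:
        return "after"
    if b_ahead:
        return "before"
    return "equal"


v1 = increment({}, "n1")   # first write, handled by node n1
v2 = increment(v1, "n2")   # update of v1 handled by node n2
v3 = increment(v1, "n3")   # a different update of v1 handled by node n3
```

Here `v2` is newer than `v1`, while `v2` and `v3` are concurrent, which is the case an application or the store has to reconcile.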

Tokyo Cabinet and Tokyo Tyrant - Key Value/Tuple Store

  • Supports hash table mode, B-tree mode, and table mode
  • It's fast and straightforward
  • Good for small to medium-sized amounts of data that require rapid updating and can be easily modeled in terms of keys and values

Cassandra - Wide Column Store/Column Families

  • First developed by Facebook
  • SuperColumns can turn a simple key-value architecture into one that handles sorted lists, based on an index specified by the user
  • Can scale from one node to several thousand nodes clustered in different data centers
  • Can be tuned to favor either consistency or availability
  • Replaces a node smoothly if one goes down
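
The trade-off between consistency and availability in a replicated store is often reasoned about with a simple overlap rule: if a write must be acknowledged by W of N replicas and a read must consult R of them, then a read is guaranteed to touch at least one replica holding the latest write whenever R + W > N. The sketch below encodes that general rule; it is not a Cassandra API, and the function names are hypothetical.

```python
def quorum(replicas: int) -> int:
    """Smallest number of replicas that forms a strict majority."""
    return replicas // 2 + 1


def reads_see_latest_write(replicas: int, write_acks: int, read_acks: int) -> bool:
    """True if every read set must overlap every write set (R + W > N)."""
    if not (1 <= write_acks <= replicas and 1 <= read_acks <= replicas):
        raise ValueError("acks must be between 1 and the replica count")
    return write_acks + read_acks > replicas


N = 3
strong = reads_see_latest_write(N, quorum(N), quorum(N))  # quorum writes and reads
fast = reads_see_latest_write(N, 1, 1)                    # single-replica writes and reads
```

With three replicas, quorum reads and writes overlap, so `strong` is true. Writing and reading a single replica is faster and tolerates more failures, but `fast` is false because a read may miss the latest write.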

Some other well-known NoSQL-style data stores that are closed source include Google BigTable and Amazon SimpleDB. GigaSpaces is a popular space-based grid solution that has NoSQL qualities.

Check out this informative post on NoSQL patterns.
