ORM frameworks help object-oriented developers interact with relational databases. There are many mature ORM frameworks for relational databases, such as Hibernate and Apache OpenJPA.
Big data is now on the rise, and more and more people are developing applications that run on it. Different kinds of NoSQL databases, e.g. column stores and document stores, have emerged to store data at that scale.
ORM frameworks solve many problems, yet they still have drawbacks even on the relational database side. The situation is different again for NoSQL databases, because they do not share a common standard.
Apache Gora aims to give users an easy-to-use in-memory data model and persistence for big data frameworks, with data store specific mappings. The overall goal for Apache Gora is to become the standard data representation and persistence framework for big data.
Gora supports persisting to column stores, key-value stores, document stores and RDBMSs, and analyzing the data with extensive Apache Hadoop MapReduce support.
Gora uses Apache Avro and depends on mapping files, which are specific to each data store. Unlike other OTD (Object-to-Datastore) mapping implementations, in Gora the mapping from data bean to data store specific schema is explicit. This has the advantage that, when using data stores such as HBase and Cassandra, you always know how the values are persisted.
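To make the idea of an explicit mapping concrete, here is a minimal, self-contained sketch. It does not use Gora or its mapping file format: the class names (`ExplicitMappingSketch`, `Mapping`, `ColumnMapping`) and the table, family and qualifier values are all hypothetical. It only shows why spelling out where each bean field goes lets you always answer the question "where is this value stored?".

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative sketch only: these types are hypothetical and are not part of Gora.
class ExplicitMappingSketch {

    /** Where one bean field lives in a column store: a family and a qualifier. */
    static final class ColumnMapping {
        final String family;
        final String qualifier;

        ColumnMapping(String family, String qualifier) {
            this.family = family;
            this.qualifier = qualifier;
        }

        @Override
        public String toString() {
            return family + ":" + qualifier;
        }
    }

    /** An explicit bean-field-to-column mapping for a single table. */
    static final class Mapping {
        private final String table;
        private final Map<String, ColumnMapping> fields = new LinkedHashMap<>();

        Mapping(String table) {
            this.table = table;
        }

        /** Declares where a bean field is persisted; returns this for chaining. */
        Mapping field(String beanField, String family, String qualifier) {
            fields.put(beanField, new ColumnMapping(family, qualifier));
            return this;
        }

        /** Returns "table/family:qualifier", or throws if the field was never mapped. */
        String locate(String beanField) {
            ColumnMapping column = fields.get(beanField);
            if (column == null) {
                throw new IllegalArgumentException("No mapping for field: " + beanField);
            }
            return table + "/" + column;
        }
    }

    public static void main(String[] args) {
        Mapping mapping = new Mapping("Employee")
                .field("name", "info", "nm")
                .field("salary", "info", "sl");
        System.out.println(mapping.locate("name")); // prints Employee/info:nm
    }
}
```

Because nothing is inferred, an unmapped field fails loudly instead of being stored somewhere unexpected. In Gora itself this information lives in the data store specific mapping file rather than in code.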
The Roadmap of Apache Gora
• Data Persistence: Persisting objects to column stores such as HBase, Cassandra and Hypertable; key-value stores such as Voldemort and Redis; SQL databases such as MySQL and HSQLDB; and flat files in the local file system or Hadoop HDFS.
• Data Access: An easy-to-use, Java-friendly common API for accessing the data regardless of its location.
• Indexing: Persisting objects to Lucene and Solr indexes, and accessing/querying the data with the Gora API.
• Analysis: Accessing and analyzing the data through adapters for Apache Pig, Apache Hive and Cascading.
• MapReduce support: Out-of-the-box and extensive MapReduce (Apache Hadoop) support for data in the data store.
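The "common API regardless of location" item in the roadmap above can be pictured with a small sketch. Everything here is an assumption made for illustration: `SimpleStore`, `HashBackend` and `SortedBackend` are hypothetical names, the two backends are plain in-memory maps standing in for real data stores, and none of this is Gora's actual interface. The point is only that client code written once against an interface does not change when the backend does.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Illustrative sketch only: these types are hypothetical and are not Gora classes.
class CommonApiSketch {

    /** A backend-neutral contract that client code programs against. */
    interface SimpleStore<K, V> {
        void put(K key, V value);
        V get(K key);            // returns null when the key is absent
        boolean delete(K key);   // returns true if something was removed
    }

    /** Stand-in for one backend, backed by a hash map. */
    static final class HashBackend<K, V> implements SimpleStore<K, V> {
        private final Map<K, V> data = new HashMap<>();
        @Override public void put(K key, V value) { data.put(key, value); }
        @Override public V get(K key) { return data.get(key); }
        @Override public boolean delete(K key) { return data.remove(key) != null; }
    }

    /** Stand-in for a second backend that keeps its keys sorted. */
    static final class SortedBackend<K extends Comparable<K>, V> implements SimpleStore<K, V> {
        private final Map<K, V> data = new TreeMap<>();
        @Override public void put(K key, V value) { data.put(key, value); }
        @Override public V get(K key) { return data.get(key); }
        @Override public boolean delete(K key) { return data.remove(key) != null; }
    }

    /** Client logic written once against the interface, unaware of the backend. */
    static int countPresent(SimpleStore<String, String> store, String... keys) {
        int found = 0;
        for (String key : keys) {
            if (store.get(key) != null) {
                found++;
            }
        }
        return found;
    }

    /** Runs the same sequence of operations against whichever backend is passed in. */
    static int exercise(SimpleStore<String, String> store) {
        store.put("u1", "Alice");
        store.put("u2", "Bob");
        store.delete("u2");
        return countPresent(store, "u1", "u2", "u3");
    }

    public static void main(String[] args) {
        System.out.println(exercise(new HashBackend<>()));   // prints 1
        System.out.println(exercise(new SortedBackend<>())); // prints 1
    }
}
```

Swapping `HashBackend` for `SortedBackend` changes how the data is held but not the calling code, which is the property the roadmap item is after.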
What Are the Differences Between Apache Gora and Current Solutions?
• Gora is focused specifically on NoSQL data stores, but also has limited support for SQL databases.
• The main use case for Gora is to access/analyze big data using Hadoop.
• Gora uses Avro for bean definition, not byte code enhancement or annotations.
• Object-to-data store mappings are backend specific, so that the full data model of each backend can be utilized.
• Gora is simple since it ignores complex SQL mappings.
• Gora will support persistence, indexing and analysis of data, using Pig, Lucene, Hive, etc.
Datastores Supported by Apache Gora:
• Apache Accumulo
• Apache Cassandra
• Amazon DynamoDB
• Apache HBase
• Apache Solr
Apache Spark is a prominent project among big data developers. It provides a faster and more general data processing platform: according to the Spark project, programs can run up to 100x faster in memory, or 10x faster on disk, than with Hadoop MapReduce. Gora does not currently support Spark, so during my GSoC period I am implementing a Spark backend for Apache Gora to fill that gap.