Jumping from MySQL to Cassandra: A Success Story
Today I'm going to share my experience of getting started with
Apache Cassandra. One of the hardest parts of learning any NoSQL
technology is letting go of normalization principles and
relational DB structures. Relational databases are designed to
persist normalized data, without duplication. One of the
main changes here is that you need to design for your queries:
work out what your reports or finder methods need, and build the
persistent structure to match.
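To make "design for your queries" a bit more concrete, here is a minimal in-memory sketch in plain Java. It does not talk to Cassandra at all; it only models a column family as a map of row key to sorted columns. The class name `QueryFirstSketch`, the `errorsByService` structure and the sample data are all made up for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

/** In-memory illustration only: a "column family" modeled as rowKey -> (columnName -> value). */
class QueryFirstSketch {
    // One wide row per service name; columns are kept sorted by name (here, a timestamp).
    private final Map<String, TreeMap<String, String>> errorsByService = new LinkedHashMap<>();

    /** Write path: store everything the report needs in the row it will be read from. */
    void recordError(String serviceName, String timestamp, String message) {
        errorsByService
            .computeIfAbsent(serviceName, k -> new TreeMap<>())
            .put(timestamp, message);
    }

    /** Read path: a single row lookup, no join against a separate "service" table. */
    Map<String, String> errorsFor(String serviceName) {
        return errorsByService.getOrDefault(serviceName, new TreeMap<>());
    }

    public static void main(String[] args) {
        QueryFirstSketch store = new QueryFirstSketch();
        store.recordError("checkout", "2012-01-12T10:05:00", "bad request");
        store.recordError("checkout", "2012-01-12T10:00:00", "timeout");
        System.out.println(store.errorsFor("checkout"));
    }
}
```

The point is that the report "errors for a service" is answered by reading one row, because the row was shaped around that question when the data was written.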
Hundreds of web pages, books and papers explain what Cassandra is, what
Hazelcast is, what Hadoop, MemcacheDB, MongoDB, etc. are. But none of them
explain HOW TO migrate my data from a relational DB to one of them.
We wanted to migrate the persistent data of two of our modules, Turmeric SOA Monitoring and Turmeric SOA Rate Limiting.
In Turmeric we use MySQL as the relational database. After a week of
reading about and analyzing several NoSQL options, we decided on Cassandra.
(I hope to write another post about the whys.) By the way, I highly
recommend this book: Cassandra: The Definitive Guide
From Relational tables to Keyspaces
The big question now is how to migrate them. Well, this is what we did:
Following an Agile best practice, if something is too hard or complex,
just break it into small challenges. After all, we still had a good gap
for an MMF ("Minimal Marketable Feature", see Software by Numbers). So:
Step 1: Move our relational DB tables to Cassandra Column Families
Step 2: Customize the new Column Families so that they hold all the needed data without JOIN-like operators
Step 3: Explode those Column Families as the finder and query methods need. Typically a finder or query method should use a single Column Family
Step 4: Customize the creator and updater methods
according to the previous changes. Don't be scared if you are saving duplicated
data. Keep in mind: "think for your queries, forget the normalization
rules."
Step 5: while (!pleased) -> repeat steps 3 and 4
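Steps 3 and 4 can be sketched with plain Java collections standing in for column families. This is only an illustration under assumptions: the class `PolicyStoreSketch`, its two maps and the "policy" data are hypothetical and are not taken from the Turmeric schema. Each finder reads from exactly one structure, and the creator/updater keeps both in sync, duplicating data on purpose.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** In-memory stand-in for two column families, each serving one finder. */
class PolicyStoreSketch {
    // Finder 1 reads here: policyId -> {serviceName, body}
    private final Map<String, String[]> policyById = new HashMap<>();
    // Finder 2 reads here: serviceName -> policy ids (duplicated data)
    private final Map<String, Set<String>> policyIdsByService = new HashMap<>();

    /** Creator/updater (step 4): every write touches each structure a finder depends on. */
    void save(String policyId, String serviceName, String body) {
        String[] previous = policyById.put(policyId, new String[] {serviceName, body});
        if (previous != null && !previous[0].equals(serviceName)) {
            // The policy moved to another service: drop the stale duplicate.
            policyIdsByService.get(previous[0]).remove(policyId);
        }
        policyIdsByService.computeIfAbsent(serviceName, k -> new TreeSet<>()).add(policyId);
    }

    /** Finder 1: uses only policyById. */
    String findBody(String policyId) {
        String[] row = policyById.get(policyId);
        return row == null ? null : row[1];
    }

    /** Finder 2: uses only policyIdsByService. */
    Set<String> findIdsByService(String serviceName) {
        return policyIdsByService.getOrDefault(serviceName, Collections.emptySet());
    }

    public static void main(String[] args) {
        PolicyStoreSketch store = new PolicyStoreSketch();
        store.save("p1", "checkout", "limit=10");
        store.save("p2", "checkout", "limit=20");
        System.out.println(store.findIdsByService("checkout"));
    }
}
```

The price of the duplication is visible in `save`: the write path has to know about every read structure, which is exactly why step 4 follows step 3.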
A Cassandra DAO
Now, the hardest step is #1. Don't panic: we developed a kind of
generic (in fact, it uses Java Generics) Cassandra DAO for your
migration. Since all this work was needed for the project I'm currently
working on, you will find it as a submodule of TurmericSOA, but
under the Apache License you can use it by adding it to your Maven
dependency file.
<dependency>
    <groupId>org.ebayopensource.turmeric.utils</groupId>
    <artifactId>turmeric-utils-cassandra</artifactId>
    <version>1.2.0.0-SNAPSHOT</version>
    <type>jar</type>
</dependency>
Features
- 100% Java code
- It can run an embedded Cassandra service or just talk to your external Cassandra service
- Uses the Hector library as the Java Cassandra client
- Dynamic [Super] Column Family creation
- Key Types and Data Types defined at runtime with the use of Generics
- Main CRUD methods supported:
boolean containsKey(KeyType key);
void delete(KeyType key);
T find(KeyType key);
Map findItems(final List keys, final Long rangeFrom, final Long rangeTo);
Set findItems(final List keys, final String rangeFrom, final String rangeTo);
Set getKeys();
void save(KeyType key, T model);
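To show how key and data types can be fixed at runtime through generics, here is a small in-memory sketch of the basic CRUD contract listed above. It is not the library's implementation and does not use Hector: `CrudDaoSketch` and `InMemoryDaoSketch` are hypothetical names, the two `findItems` range methods are left out, and a `HashMap` stands in for the column family.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical contract mirroring the basic CRUD methods (findItems omitted). */
interface CrudDaoSketch<K, T> {
    boolean containsKey(K key);
    void delete(K key);
    T find(K key);
    Set<K> getKeys();
    void save(K key, T model);
}

/** In-memory implementation; the Class tokens carry the key and data types at runtime. */
class InMemoryDaoSketch<K, T> implements CrudDaoSketch<K, T> {
    private final Class<K> keyType;
    private final Class<T> modelType;
    private final Map<K, T> rows = new HashMap<>();

    InMemoryDaoSketch(Class<K> keyType, Class<T> modelType) {
        this.keyType = keyType;
        this.modelType = modelType;
    }

    public boolean containsKey(K key) { return rows.containsKey(key); }

    public void delete(K key) { rows.remove(key); }

    public T find(K key) { return rows.get(key); }

    public Set<K> getKeys() { return new HashSet<>(rows.keySet()); }

    public void save(K key, T model) {
        // Generic parameters are erased at runtime; the class tokens allow a real type check.
        rows.put(keyType.cast(key), modelType.cast(model));
    }
}
```

A caller would create it as `new InMemoryDaoSketch<>(String.class, Integer.class)`, which has the same flavor as passing `String.class` to the DAO constructor later in this post.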
Main Classes
This util module contains the following packages and classes:
org.ebayopensource.turmeric.utils.cassandra.service
- CassandraManager: initializes a static EmbeddedCassandraService instance based on a yaml configuration file
org.ebayopensource.turmeric.utils.cassandra.hector
- HectorManager: manages keyspace and column family creation and reading. It uses the Hector API
- HectorHelper: includes some utility methods based on Java Reflection and Java Generics, e.g. retrieving the field names of a POJO, which are used as column names in Cassandra keyspaces
org.ebayopensource.turmeric.utils.cassandra.dao
- AbstractColumnFamilyDao: as its name says, this is the base class that every DAO should extend. It defines and implements the basic DAO operations using the Hector API.
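The HectorHelper entry above mentions reading POJO field names through reflection so they can serve as column names. A minimal sketch of that idea follows, using only the standard `java.lang.reflect` API. It is not HectorHelper's actual code: `FieldNamesSketch`, `columnNames` and the nested `FooPojo` are hypothetical, and sorting the names is my own choice here.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class FieldNamesSketch {
    /** Hypothetical POJO used only for this example. */
    static class FooPojo {
        private String name;
        private long count;
        static final String IGNORED = "constant";
    }

    /** Returns the names of the instance fields declared on the class, sorted alphabetically. */
    static List<String> columnNames(Class<?> pojoClass) {
        List<String> names = new ArrayList<>();
        for (Field field : pojoClass.getDeclaredFields()) {
            // Skip static and compiler-generated fields; they are not part of the row data.
            if (Modifier.isStatic(field.getModifiers()) || field.isSynthetic()) {
                continue;
            }
            names.add(field.getName());
        }
        // getDeclaredFields() makes no ordering guarantee, so sort for a stable result.
        Collections.sort(names);
        return names;
    }

    public static void main(String[] args) {
        System.out.println(columnNames(FooPojo.class)); // prints [count, name]
    }
}
```

Note that `getDeclaredFields()` only sees fields declared directly on the class, so inherited fields would need a walk up the superclass chain.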
Configuration files
- log4j.properties: Log4j properties file
- cassandra.yaml: Storage configuration file. For more info: storage configuration setup.
Here is the directory structure of the configuration files:
META-INF/ security/ config/ cassandra/ cassandra.properties
An example of this property file:
cassandra-cluster-name=TurmericCluster
cassandra-host-ip=127.0.0.1
cassandra-rpc-port=9160
cassandra-my-keyspace=My-keyspace
#column families
cassandra-foo-column-family=foo
cassandra-bar-column-family=bar
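A file in this format can be read with the standard `java.util.Properties` class. The sketch below parses the text from a string so that it runs on its own; how the Turmeric utilities actually load the file is not shown in this post, so `CassandraPropertiesSketch` and `parse` are hypothetical names.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

class CassandraPropertiesSketch {
    /** Parses key=value text; lines starting with '#' are treated as comments. */
    static Properties parse(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    public static void main(String[] args) {
        String text = String.join("\n",
            "cassandra-cluster-name=TurmericCluster",
            "cassandra-host-ip=127.0.0.1",
            "cassandra-rpc-port=9160",
            "#column families",
            "cassandra-foo-column-family=foo");
        Properties props = parse(text);
        // Values are always strings; numeric settings have to be converted explicitly.
        int port = Integer.parseInt(props.getProperty("cassandra-rpc-port"));
        System.out.println(props.getProperty("cassandra-cluster-name") + " @ "
            + props.getProperty("cassandra-host-ip") + ":" + port);
    }
}
```

In a real application the text would more likely come from a classpath resource than from a string literal.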
How to use it
It is very intuitive. Let's suppose we have a Foo table in our relational DB, e.g. MySQL.
So:
Create the BaseDao interface
public interface BaseDao {
    public void delete(String key);
    public Set getKeys();
    public boolean containsKey(String key);
    public void save(String key, FooPojo fooPojo);
    public FooPojo find(String key);
}
Create the FooDao interface
public interface FooDao extends BaseDao { }
Create the FooDao implementation
public class FooDaoImpl extends AbstractColumnFamilyDao implements FooDao {
    public FooDaoImpl(final String clusterName, final String host,
            final String keySpace, final String cf, final Class kTypeClass) {
        super(clusterName, host, keySpace, kTypeClass, FooPojo.class, cf);
    }
}
… in your code
// initiates an embedded Cassandra Service
CassandraManager.initialize();
// creates our Foo Column Family
FooDao fooDao = new FooDaoImpl("myCluster", "127.0.0.1", "myKeyspace", "myColumnFamilyName", String.class);
And voilà, you have your relational table migrated as a Cassandra column family!
Anyway, you can browse the unit test classes to see how they are implemented.
Enjoy it!
Source: http://itsecrets.wordpress.com/2012/01/12/jumping-from-mysql-to-cassandra-a-success-story/