Cassandra Is Lovely, But Sometimes You Need RDBMS (And This AOP Trigger Mechanism Can Help)
Initially we took a batch approach to the problem, relying on Hadoop and MapReduce jobs over column families to bulk-update the external systems. This had obvious drawbacks. Until the batch process completes, the index and the RDBMS are out of sync with Cassandra. Additionally, each run would scan large portions of the column family even though only a small number of records had changed.
To keep the other systems synchronized, we could have complicated the Cassandra clients, embedding the logic to orchestrate updates to all of the relevant systems, but that seemed like a nightmare. Instead, we decided to go for real-time, trigger-like functionality. This takes the burden off the client and allows us to keep other systems in sync in near real time.
Maxim Grinev came to the same conclusion and submitted a patch to Cassandra, which triggered a lengthy discussion (pun intended).
In the end, we decided to implement our own trigger mechanism using Aspect-Oriented Programming (AOP). Our mechanism is roughly based on Jonathan Ellis's Crack-Smoking Commit Log (CSCL). For each column family mutation, we write an entry to a commit log. The log entries are then processed asynchronously by the triggers. When a trigger executes successfully, the log entry is removed. We've released the project on GitHub: https://github.com/hmsonline/cassandra-triggers
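To make the log-then-process flow concrete, here is a minimal in-memory sketch in Java. It is not the code from the cassandra-triggers project: every name in it (TriggerLogSketch, LogEntry, Trigger, logMutation, processLog) is hypothetical, the commit log is a plain map rather than anything stored in Cassandra, and an explicit call to logMutation stands in for the point where an AOP aspect would intercept the mutation.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.UUID;

// Hypothetical sketch of a commit-log-backed trigger mechanism.
class TriggerLogSketch {

    // Hypothetical log entry: one record per column family mutation.
    static final class LogEntry {
        final String id;
        final String columnFamily;
        final String rowKey;

        LogEntry(String id, String columnFamily, String rowKey) {
            this.id = id;
            this.columnFamily = columnFamily;
            this.rowKey = rowKey;
        }
    }

    // Hypothetical trigger callback; throwing signals a failed execution.
    interface Trigger {
        void process(LogEntry entry) throws Exception;
    }

    // In-memory stand-in for the commit log (assumption: insertion order is kept).
    private final Map<String, LogEntry> commitLog = new LinkedHashMap<>();
    private final List<Trigger> triggers = new ArrayList<>();

    synchronized void register(Trigger trigger) {
        triggers.add(trigger);
    }

    // Called where an AOP aspect would intercept a mutation in the real design.
    synchronized void logMutation(String columnFamily, String rowKey) {
        String id = UUID.randomUUID().toString();
        commitLog.put(id, new LogEntry(id, columnFamily, rowKey));
    }

    // One processing pass: run every trigger on every pending entry and
    // remove an entry only if all triggers succeeded. Returns the number removed.
    synchronized int processLog() {
        int removed = 0;
        for (LogEntry entry : new ArrayList<>(commitLog.values())) {
            boolean succeeded = true;
            for (Trigger trigger : triggers) {
                try {
                    trigger.process(entry);
                } catch (Exception e) {
                    succeeded = false; // entry stays in the log for a later pass
                }
            }
            if (succeeded) {
                commitLog.remove(entry.id);
                removed++;
            }
        }
        return removed;
    }

    synchronized int pending() {
        return commitLog.size();
    }

    public static void main(String[] args) {
        TriggerLogSketch log = new TriggerLogSketch();
        // Example trigger: pretend to push the changed row to an external system.
        log.register(e -> System.out.println("sync " + e.columnFamily + "/" + e.rowKey));
        log.logMutation("users", "row1");
        log.processLog(); // in a real system this would run on a background thread
        System.out.println("pending entries: " + log.pending());
    }
}
```

One consequence of this shape: because a failed entry stays in the log and is retried on the next pass, a trigger may see the same entry more than once, so triggers written against a design like this are safest when they are idempotent.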
The design is certainly heavy and the documentation is still a bit rough around the edges, but it's a small amount of code and it is working like a champ. We've set up installation and configuration instructions. Let us know if you have any trouble getting started.
Published at DZone with permission of Brian O'Neill, DZone MVB. See the original article here.