If you have recently come to Cassandra and you are happily using CQL, this blog post will probably only serve to confuse you. You may want to stop here.
If, however, you:
- Are using Cassandra as a BigTable and care about the underlying persistence layer
- Started using CQL, but wonder why it has crazy limitations that aren't found in SQL
- Are accustomed to Java APIs and are trying to reconcile that with the advent of CQL
then read on.
Now, I don't know the full history of all of it, but Cassandra has struggled with client APIs from the beginning. There are remnants of that struggle littered throughout the code base, and I believe there were a couple of false starts along the way. (cassandra$ grep -r "avro" *)
Specifically, if it looks, tastes, and smells like SQL, people will expect SQL-like behavior. People (and systems) expect to be able to construct arbitrary WHERE clauses and JOINs. With those expectations, and without an understanding of the underlying storage model, the features, functions, and even performance might not align well with users' expectations (IMHO, never a good thing). We may find ourselves explaining BigTable concepts anyway, just to explain why JOINs aren't welcome here.
(or we can just point people to Dean Hiller and playORM, so he can explain why they are. =)
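A toy sketch in Python may help show why arbitrary WHERE clauses clash with the storage model. It is an illustration only, not Cassandra's actual storage engine: the table is assumed to be a plain map from row key to columns, and the names `users`, `get_by_key`, and `scan_where` are made up for the example. A lookup by row key touches one entry, while a predicate on any other column has to visit every row.

```python
# Hypothetical toy model: row key -> {column name: value}.
# This is NOT how Cassandra is implemented; it only mimics the access pattern.
users = {
    "user:1": {"name": "alice", "city": "Philadelphia"},
    "user:2": {"name": "bob", "city": "Boston"},
    "user:3": {"name": "carol", "city": "Philadelphia"},
}


def get_by_key(table, row_key):
    """Fetch a row by its key: a single map access."""
    return table.get(row_key)


def scan_where(table, predicate):
    """Evaluate an arbitrary predicate: every row must be visited."""
    visited = 0
    matches = []
    for row_key, columns in table.items():
        visited += 1
        if predicate(columns):
            matches.append(row_key)
    return matches, visited


row = get_by_key(users, "user:2")
matches, visited = scan_where(users, lambda c: c.get("city") == "Philadelphia")
print(row, matches, visited)
```

On three rows the difference is invisible. On a table spread across many nodes, the second function is the one that hurts, which is roughly the conversation we end up having when someone asks for a WHERE clause on a non-key column.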
Also, I think we want to be careful not to hide the "simple" structures of the BigTable. If it becomes cumbersome to interact with the BigTable (the Maps), we'll end up alienating the portion of the community that came to Cassandra for simple dynamic/flexible schemas. That flexibility and simplicity allowed us to accommodate vast Varieties of data, one of the pillars in the 3 V's of BigData. We don't want to lose it.
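For readers who have not worked with "the Maps" directly, here is a minimal Python sketch of the idea: a map of row keys to sorted maps of columns. It assumes a simplified model, and `Row`, `put`, and `slice` are hypothetical names for this example, not a Cassandra API. The point is that each row carries its own set of columns, so new columns need no schema change, and that sorted column names make range slices cheap.

```python
from bisect import bisect_left, bisect_right


class Row:
    """Hypothetical sorted map of column name -> value (illustration only)."""

    def __init__(self):
        self._names = []    # column names, kept in sorted order
        self._values = {}   # column name -> value

    def put(self, name, value):
        # New columns can appear at any time; no schema to alter.
        if name not in self._values:
            self._names.insert(bisect_left(self._names, name), name)
        self._values[name] = value

    def slice(self, start, end):
        # Return columns whose names fall within [start, end], in order.
        lo = bisect_left(self._names, start)
        hi = bisect_right(self._names, end)
        return [(n, self._values[n]) for n in self._names[lo:hi]]


table = {}  # row key -> Row


def put(table, row_key, name, value):
    table.setdefault(row_key, Row()).put(name, value)


# Two rows with entirely different columns living in the same table.
put(table, "sensor-1", "2012-11-02T09:00", 19.0)
put(table, "sensor-1", "2012-11-01T10:00", 20.5)
put(table, "sensor-1", "2012-11-01T11:00", 21.0)
put(table, "user:1", "email", "alice@example.com")

day_one = table["sensor-1"].slice("2012-11-01T00:00", "2012-11-01T23:59")
print(day_one)
```

Using timestamps as column names is just one way to exploit the sort order; the same structure works for any data whose shape varies from row to row.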
For more discussion on this, follow and chime in on:
With those caveats in mind, I'm on board with CQL. I intend to embrace it wholeheartedly, especially given the enhancements coming down the pipe. IMHO, CQL is more than a SQL veneer for Cassandra. It is the foundation for future feature enhancements. And although Thrift will be around for a long, long, long, long time, Thrift RPC will begin to fall behind CQL. There is already evidence of that: CQL is going to provide first-class support for operations on collections in 1.2, with only limited support (via JSON) in Thrift. See:
With that conclusion, we've got our work cut out for us. All of the enhancements we've developed for Cassandra were built on Thrift (either as AOP against the Thrift API, or directly consuming it to enable embedding). This includes: cassandra-indexing, cassandra-triggers, and Virgil. For each, we need to find a path forward that embraces CQL, keeping in mind that CQL is built on an entirely new protocol, listening on an entirely different port. Additionally, I'm looking to develop the Spring Data integration layer on CQL.
If you're looking to get involved, we'd love a hand in migrating Virgil and Cassandra-Triggers forward and creating the initial Spring Data integration!
I'm excited about the future prospects of CQL, but it will take everyone's involvement to ensure that we get the best of all worlds from it, and that we don't lose anything in the transition.