How to Persist Instances in Cassandra Using Hector and Scala
I am going to show you how I went about persisting instances in Cassandra, using Hector, through an insert(instance) function in Scala. To work out how these instances are going to be persisted, I shall use type classes. More than just the code, I will explain and show every step of my design.
Crash course in Cassandra
Cassandra is a schema-free database; to understand it, here are the most important concepts and their loose mapping to their relational database counterparts:
- keyspace – schema; database
- column family – table, with key and rows
- key – primary key
- row – collection of columns; the rows in the column family may have completely different columns
- column – column
When inserting data into Cassandra, we must be able to serialise the data to be inserted. To do that, we must know the key type and the names and types of all columns.
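To make the mapping above concrete, here is a rough sketch of the data model as nested Scala maps. It is only a mental model and not how Cassandra or Hector represent data: the type aliases, the sample keys and the columnNames helper are all made up for illustration, and String stands in for the serialized bytes that Cassandra really stores.

```scala
object CassandraModelSketch {
  // Simplification: Cassandra stores names and values as bytes; Strings keep the sketch readable.
  type Row = Map[String, String]                 // column name -> column value
  type ColumnFamily = Map[String, Row]           // row key -> row
  type KeyspaceModel = Map[String, ColumnFamily] // column family name -> column family

  // Hypothetical sample data: two rows in the same column family with different columns.
  val keyspace: KeyspaceModel = Map(
    "user" -> Map(
      "key-1" -> Map("username" -> "janm", "firstName" -> "Jan"),
      "key-2" -> Map("username" -> "anna", "email" -> "anna@example.com")
    )
  )

  // Column names present in one row; an unknown column family or key yields an empty set.
  def columnNames(columnFamily: String, key: String): Set[String] =
    keyspace.get(columnFamily).flatMap(_.get(key)).map(_.keySet).getOrElse(Set.empty)
}
```

The two rows live in the same column family yet carry different columns, which is why the key type and every column's name and type have to be known before anything can be serialised.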
Back to Scala
Let’s turn back to our insert(instance) function. Intuitively, we would understand what should happen if we were to insert a simple case class:
case class User(username: String, password: String, firstName: String, lastName: String, id: UUID)
Calling insert(User("janm", "yeah right, like I'd tell ya!", "Jan", "Machacek", UUID.randomUUID)), we would expect a new row with key equal to the value of id, with four columns (username -> janm, password -> ..., …) in the column family user.
It seems that I need to be able to do the following (ignoring setting the column values, which I shall leave as an exercise for the readers):
- extract column family, given the instance
- extract the key, given the instance
- obtain the serializer for the extracted key
- set the columns, given the extracted key, column family and Mutator
Let’s turn this into Scala code
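    package object hector {
      import me.prettyprint.hector.api.Serializer
      import me.prettyprint.hector.api.mutation.Mutator

      // supplies the Hector Serializer for keys of type K
      type KeySerializer[K] = () => Serializer[K]
      // extracts the key of type K from an instance of A
      type KeyExtractor[A, K] = A => K
      // extracts the column family name from an instance of A
      type ColumnFamilyExtractor[A] = A => String
      // given an instance of A, returns a function that sets the columns on the Mutator
      type ColumnExtractor[A, K] = A => (K, String, Mutator[K]) => Unit
    }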
Now that I have these types, let’s have them given to the insert function implicitly:
    import me.prettyprint.hector.api.Keyspace
    import me.prettyprint.hector.api.factory.HFactory

    trait Hector {
      def keyspace: Keyspace

      def insert[A, Key](instance: A)
                        (implicit keySerializer: KeySerializer[Key],
                         keyExtractor: KeyExtractor[A, Key],
                         columnFamilyExtractor: ColumnFamilyExtractor[A],
                         columnExtractor: ColumnExtractor[A, Key]): Unit = {
        // create a Mutator for the keyspace, using the key's Serializer
        val mutator = HFactory.createMutator(keyspace, keySerializer())
        val key = keyExtractor(instance)
        val columnFamily = columnFamilyExtractor(instance)
        // queue the column insertions, then send them to Cassandra
        columnExtractor(instance)(key, columnFamily, mutator)
        mutator.execute()
      }
    }
The insert function in the Hector trait uses the Hector Java API for Cassandra; the Keyspace instance is a reference to the keyspace in Cassandra.
In the first line of the insert function, we create a Mutator[K] for the keyspace, supplying the Serializer[K] we obtained from the keySerializer. Next, we use the keyExtractor to extract the value of the key K from the instance; then we extract the name of the column family from the same instance. Finally, we apply the columnExtractor to obtain a function that calls addInsertion on the Mutator[K], and we complete the body by executing the queued insertions.
Simples!
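If you want to watch the implicit parameters being resolved without a running Cassandra, the same shape can be tried out against stand-in types. What follows is only a sketch under loud assumptions: Serializer and RecordingMutator are hypothetical replacements for Hector's Serializer and Mutator (the real addInsertion takes an HColumn, and the real execute returns a MutationResult), and Note with its four instances is an invented example type.

```scala
object InsertSketch {
  // Hypothetical stand-in for Hector's Serializer; not the real API.
  trait Serializer[K] { def toBytes(key: K): Array[Byte] }

  // Hypothetical stand-in for Hector's Mutator: it only records the queued insertions.
  final class RecordingMutator[K](val serializer: Serializer[K]) {
    private var pending = Vector.empty[(K, String, String, String)]
    def addInsertion(key: K, columnFamily: String, name: String, value: String): Unit =
      pending = pending :+ ((key, columnFamily, name, value))
    // Returns what was queued and clears the queue.
    def execute(): Vector[(K, String, String, String)] = {
      val written = pending
      pending = Vector.empty
      written
    }
  }

  // The same four type aliases as in the article, pointed at the stand-ins.
  type KeySerializer[K] = () => Serializer[K]
  type KeyExtractor[A, K] = A => K
  type ColumnFamilyExtractor[A] = A => String
  type ColumnExtractor[A, K] = A => (K, String, RecordingMutator[K]) => Unit

  // Invented example type and its type class instances.
  final case class Note(id: Int, text: String)

  implicit val intKeySerializer: KeySerializer[Int] = () => new Serializer[Int] {
    def toBytes(key: Int): Array[Byte] = java.nio.ByteBuffer.allocate(4).putInt(key).array()
  }
  implicit val noteKeyExtractor: KeyExtractor[Note, Int] = _.id
  implicit val noteColumnFamily: ColumnFamilyExtractor[Note] = _ => "note"
  implicit val noteColumns: ColumnExtractor[Note, Int] =
    note => (key, columnFamily, mutator) => mutator.addInsertion(key, columnFamily, "text", note.text)

  // Same body as the article's insert, except that it returns the recorded insertions.
  def insert[A, K](instance: A)(implicit
      keySerializer: KeySerializer[K],
      keyExtractor: KeyExtractor[A, K],
      columnFamilyExtractor: ColumnFamilyExtractor[A],
      columnExtractor: ColumnExtractor[A, K]): Vector[(K, String, String, String)] = {
    val mutator = new RecordingMutator(keySerializer())
    val key = keyExtractor(instance)
    val columnFamily = columnFamilyExtractor(instance)
    columnExtractor(instance)(key, columnFamily, mutator)
    mutator.execute()
  }

  val written = insert(Note(1, "hello"))
}
```

In this sketch the key serializer is the only implicit of its shape in scope, so the compiler can fix K to Int before it looks for the key extractor and the column extractor.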
The type classes
To use the Hector trait to insert some instances, I need instances of the type classes KeySerializer[K], KeyExtractor[A, K], ColumnFamilyExtractor[A] and ColumnExtractor[A, K] for the types that I am inserting. Eh, what?
    case class User(username: String, password: String, firstName: String, lastName: String, id: UUID)

    object Main extends App with Hector {
      def keyspace: Keyspace = ??? // connect to the keyspace

      insert(User("janm", "yeah right, like I'd tell ya!", "Jan", "Machacek", UUID.randomUUID))
    }
Without the instances of the type classes, the code will not compile: there are no implicit values that are assignable to the implicit parameters of the insert function.
Home for the type classes
Before we jump into implementing the instances of the type classes, we must decide where they “live”. Because we want our code to be flexible, a good place for the type class instances is in traits that you can mix in wherever you use the Hector trait. This is because in one case you are inserting a case class with an id: UUID field; in another case you are inserting a class with a @Id key: String getter. I would like to have the flexibility to write:
object Main extends App with Hector with UUIDKeySerializer with UUIDIdKeyExtractor with ... { }
Or, in the second case:
object Main extends App with Hector with StringKeySerializer with StringAnnotatedKeyExtractor with ... { }
So, let’s implement instances of these type classes.
Instance of the KeySerializer[K]
We shall implement a KeySerializer[K] for UUID as K:
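    import java.util.UUID
    import me.prettyprint.cassandra.serializers.UUIDSerializer

    trait UUIDKeySerializer {
      implicit object UUIDKeySerializer extends KeySerializer[UUID] {
        // Hector ships a ready-made serializer for UUID keys
        def apply() = UUIDSerializer.get()
      }
    }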
So, for a key of type UUID, the compiler can find an implicit value of KeySerializer[UUID] and supply it to the call of the insert function. The next task is to be able to extract the value of the key.
Instance of the KeyExtractor[A, K]
We are calling insert(User(..., UUID.randomUUID)), and the key type is UUID. We shall implement an instance of KeyExtractor[A <: {def id: K}, K], which extracts a key of type K from some type A that contains a getter called id returning K. We then further specialise the type class instance into KeyExtractor[A <: {def id: UUID}, UUID] to match our case class.
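    import java.util.UUID
    // value.id below is a structural (reflective) call; on Scala 2.10 and later,
    // also import scala.language.reflectiveCalls to avoid a feature warning

    trait IdKeyExtractor {
      class IdKeyExtractor[A <: {def id: K}, K] extends KeyExtractor[A, K] {
        def apply(value: A) = value.id
      }
    }

    trait UUIDIdKeyExtractor extends IdKeyExtractor {
      implicit def UUIDIdKeyExtractor[A <: {def id: UUID}] = new IdKeyExtractor[A, UUID]
    }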
Excellent: when I call insert(User(..., UUID.randomUUID)), the compiler will find that the only possible type class instance for the KeyExtractor[A, K] is the IdKeyExtractor[User, UUID] supplied by UUIDIdKeyExtractor (meaning that the type of the K is now UUID), which means that the only applicable type class instance for KeySerializer[K] is the UUIDKeySerializer. Onwards!
Instance of the ColumnFamilyExtractor[A]
Before we can insert rows (with keys and columns), we must know the name of the column family. For simplicity, let’s take an approach similar to JPA and use the [simple] type name of the instances we’re inserting as the column family name. In our case, we’re inserting instances of User, so the column family should be user. The instance of the ColumnFamilyExtractor[A] is therefore:
    trait TypeNameColumnFamilyExtractor {
      implicit object TypeNameColumnFamilyExtractor extends ColumnFamilyExtractor[AnyRef] {
        def apply(v1: AnyRef) = v1.getClass.getSimpleName.toLowerCase
      }
    }
So, the compiler knows how to get its hands on instances of KeyExtractor[A, K], KeySerializer[K] and ColumnFamilyExtractor[A], where A is User and K is UUID. Now we have to set the column values.
Instance of the ColumnExtractor[A, K]
I will simply outline the implementation and leave the details to the curious readers, not because the implementation is difficult, but because this blog post is getting rather too long. Anyway, a skeleton of an instance of ColumnExtractor[A, K] for case classes is:
    trait ProductColumnExtractor {
      implicit def ProductColumnExtractor[K] = new ColumnExtractor[Product, K] {
        def apply(value: Product) = {
          (key: K, columnFamily: String, mutator: Mutator[K]) =>
            // TODO: extract the values and serialize them
            // for-all-fields {
            //   val fieldValue = ///
            //   val fieldName = ///
            //   // as an example for String columns, you could call
            //   mutator.addInsertion(key, columnFamily, HFactory.createStringColumn(fieldName, fieldValue))
            // }
            ()
        }
      }
    }
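For readers who want a head start on the for-all-fields part, here is one possible sketch of pairing the field names of a case class with their values. It is not the implementation from this post: it assumes Scala 2.13 or later (for Product.productElementNames), it renders every value with toString rather than choosing a serializer per type, it assumes non-null fields, and both the columnsOf helper and the convention of skipping the id field are my own assumptions.

```scala
import java.util.UUID

object ProductColumnsSketch {
  final case class User(username: String, password: String, firstName: String, lastName: String, id: UUID)

  // Pairs each field name with its value rendered as a String, skipping the key field.
  // Assumes Scala 2.13+ (productElementNames) and non-null field values.
  def columnsOf(value: Product, keyField: String = "id"): List[(String, String)] =
    value.productElementNames
      .zip(value.productIterator)
      .collect { case (name, fieldValue) if name != keyField => name -> fieldValue.toString }
      .toList
}
```

Each resulting pair could then be handed to mutator.addInsertion together with HFactory.createStringColumn, as the comment in the skeleton suggests for String columns.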
Usage
This completes the instances of the type classes I need to insert the User instances; all I need to do is to mix in the appropriate traits that contain the correct type class instances.
    case class User(username: String, password: String, firstName: String, lastName: String, id: UUID)

    object Main extends App with Hector
      with UUIDIdKeyExtractor
      with UUIDKeySerializer
      with TypeNameColumnFamilyExtractor
      with ProductColumnExtractor {

      def keyspace: Keyspace = ??? // connect to the keyspace

      insert(User("janm", "yeah right, like I'd tell ya!", "Jan", "Machacek", UUID.randomUUID))
    }
What has all this achieved? Well, I have compile-time verification of all types I am inserting; and if I decide to insert a value for which I have no instances of the type classes, I will get a compiler error! This is much better than discovering that something fails at runtime.
Parting gift
Naturally, this code will make its way to my GitHub account at https://github.com/janm399 in the next few days, but I shall give you an example of where I have used this very code in an Akka actor (with the Configuration Akka pattern):
    class UserActor extends Actor with Configured with Hector
      with UUIDIdKeyExtractor
      with UUIDKeySerializer
      with TypeNameColumnFamilyExtractor
      with ProductColumnExtractor {

      def keyspace = configured[Keyspace]

      protected def receive = {
        case Register(user) =>
          // business logic left to readers' imagination!
          insert(user)
      }
    }
Finally, because I have many actors that want the same instances of the type classes, I have a DefaultHector trait, which is:
    trait DefaultHector extends Hector
      with UUIDIdKeyExtractor
      with UUIDKeySerializer
      with TypeNameColumnFamilyExtractor
      with ProductColumnExtractor
And it is the DefaultHector trait that I mix in to my actors… But that’s for another blog post!