Over a million developers have joined DZone.

Cassandra: Intro to Designing Tables Access, Query and so on.

DZone's Guide to

Cassandra: Intro to Designing Tables Access, Query and so on.

· Database Zone ·
Free Resource

RavenDB vs MongoDB: Which is Better? This White Paper compares the two leading NoSQL Document Databases on 9 features to find out which is the best solution for your next project.  


Cassandra is a fast NoSQL data store that has the distributed capabilities of Amazon's DynamoDB and Google's BigTable data structure. 

Cassandra is a scalable, high available and partition tolerant database that had focused  specifically with 'AP' of CAP theorem, however, 'C' can be achieved with higher performance penalty, by getting a quorum of all the nodes participating in the cluster. 

I think that is already some background, so we should really start on the important stuff of this write-up.

Designing Tables

NoSQL practices will tell you that you need to know how will query your records before even design the tables. It means that we need to know what and how we're going to access that data. 

Assuming we have a simple data set:


You would want to be able to access the data by id, color, bodytype, year, model and brand. 

If we distinguish, that this is how I will be accessing the data then we need to decide a couple of things.

1) Do we need access this dataset, using 1 or the attributes or really a combination of them?

2) Do I have a way to get other attributes elsewhere and create a combination keys?

For sake of this writeup assume that we selected 1) above because that is what we really need to do. Then we need to create Cassandra's cluster keys.

NOTE: That I have jumped from creating keyspace and so, as I've assume you've done those before. 

01. CREATE TABLE Vehicle (
02.     id timeuuid,
03.     name text,
04.     brand text,
05.     model text,
06.     body_type text,
07.     trim text,
08.     color text,
09.     year text,
10.     wheelsize int,
11.     date_built timestamp,
12.     PRIMARY KEY ((id), color, body_type, year, model, brand ))
13.     WITH CLUSTERING ORDER BY (color asc, body_type ascyear asc, model asc, brand asc);

Lets describe what these is, we told Cassandra that we are going to have a table that will store vehicle attributes that are group by color, body_type, year, model and brand. We also said to store them ascending. 

So everytime we access Vehicle table we'll say via CQL

1. SELECT * FROM Keyspace.Vehicle where id='er34SRFer' and color='blue'
2.                            and body_type='sedan' and model='Camry' and brand='Toyota';

Now, we can't always provide all the keys, id, can be ommited but the other part of cluster keys must be provided, otherwise cassandra will prevent you from reading the data. Also, cassandra will allow you to do a lookup with only the id as the retriction attribute filter.

1. SELECT * FROM Keyspace.Vehicle where id='er34SRFer' ;

That is because id here is a partition key, so cassandra treat this as unique data row for that record.

If you want to access by specific cluster key attribute that are not restrictive, then you'll need to create a lookup table. Doing so, is designed using inverted index, which is the common technique to solve this problem.

Let's say you want to do a lookup by color, then we need to create the following table.

1. CREATE TABLE Vehicle_by_Color (
2.     color text,
3.     ids set<text>,
4.     PRIMARY KEY (color)
5.  );

With this table, I can keep all the ids of vehicle, and my key will be the 'color' column.

So, it my client code, what I do is.

1. public List<Vehicle> findByColor(String color) {
2.      Vehicle_by_Color ret = Cassandra.executeCQL( " SELECT * FROM Keyspace.VEHICLE_BY_COLOR where color='black'");
4.     //now find all the Vehicles matching this criteria.
5.     List<Vehicle> allVehicleInThisColor = Cassandra.executeCQL("SELECT * FROM Keyspace.VEHICLE where id in ( convertToInsCommaSeparated(ret.getIds()) );
7.    return allVehicleInThisColor;
8. }

Now, I don't have to provide all the cluster keys.


NoSQL design is not similar to relational world. It really is breaking the rules of relational world about normalizing records set and so. Here, we've duplicated data, we have created lookup table(s) to store additional data just to get to the records we want. Also, all data are mostly collapse on a table, because attributes are aggregated to the master record Vehicle. 

I hope this has help you designing your first Cassandra table. We've made few mistakes on desiging Cassandra tables in the past, as we've also added indexes, which is another set of complexity on its own. If you want to know more about that I suggest you read Cassandra's documentation. 

Overall we are very happy to have selected Cassandra as it performs really well with the data and load we've throwed at it. For us we want to push our Cassandra install to next level of adding more data, node and running Spark to run aggregations and calculations. More to come ... 

Get comfortable using NoSQL in a free, self-directed learning course provided by RavenDB. Learn to create fully-functional real-world programs on NoSQL Databases. Register today.


Published at DZone with permission of

Opinions expressed by DZone contributors are their own.

{{ parent.title || parent.header.title}}

{{ parent.tldr }}

{{ parent.urlSource.name }}