Over a million developers have joined DZone.
{{announcement.body}}
{{announcement.title}}

Cassandra: Intro to Designing Tables Access, Query and so on.

DZone's Guide to

Cassandra: Intro to Designing Tables Access, Query and so on.

· Database Zone
Free Resource

Whether you work in SQL Server Management Studio or Visual Studio, Redgate tools integrate with your existing infrastructure, enabling you to align DevOps for your applications with DevOps for your SQL Server databases. Discover true Database DevOps, brought to you in partnership with Redgate.

Background

Cassandra is a fast NoSQL data store that has the distributed capabilities of Amazon's DynamoDB and Google's BigTable data structure. 

Cassandra is a scalable, high available and partition tolerant database that had focused  specifically with 'AP' of CAP theorem, however, 'C' can be achieved with higher performance penalty, by getting a quorum of all the nodes participating in the cluster. 

I think that is already some background, so we should really start on the important stuff of this write-up.


Designing Tables

NoSQL practices will tell you that you need to know how will query your records before even design the tables. It means that we need to know what and how we're going to access that data. 

Assuming we have a simple data set:

Vehicle
 id
Name
Brand
Model
BodyType
Trim
Year
 color
 wheelSize
date_built 

You would want to be able to access the data by id, color, bodytype, year, model and brand. 

If we distinguish, that this is how I will be accessing the data then we need to decide a couple of things.

1) Do we need access this dataset, using 1 or the attributes or really a combination of them?

2) Do I have a way to get other attributes elsewhere and create a combination keys?

For sake of this writeup assume that we selected 1) above because that is what we really need to do. Then we need to create Cassandra's cluster keys.

NOTE: That I have jumped from creating keyspace and so, as I've assume you've done those before. 

01. CREATE TABLE Vehicle (
02.     id timeuuid,
03.     name text,
04.     brand text,
05.     model text,
06.     body_type text,
07.     trim text,
08.     color text,
09.     year text,
10.     wheelsize int,
11.     date_built timestamp,
12.     PRIMARY KEY ((id), color, body_type, year, model, brand ))
13.     WITH CLUSTERING ORDER BY (color asc, body_type ascyear asc, model asc, brand asc);

Lets describe what these is, we told Cassandra that we are going to have a table that will store vehicle attributes that are group by color, body_type, year, model and brand. We also said to store them ascending. 

So everytime we access Vehicle table we'll say via CQL

1. SELECT * FROM Keyspace.Vehicle where id='er34SRFer' and color='blue'
2.                            and body_type='sedan' and model='Camry' and brand='Toyota';

Now, we can't always provide all the keys, id, can be ommited but the other part of cluster keys must be provided, otherwise cassandra will prevent you from reading the data. Also, cassandra will allow you to do a lookup with only the id as the retriction attribute filter.

1. SELECT * FROM Keyspace.Vehicle where id='er34SRFer' ;

That is because id here is a partition key, so cassandra treat this as unique data row for that record.

If you want to access by specific cluster key attribute that are not restrictive, then you'll need to create a lookup table. Doing so, is designed using inverted index, which is the common technique to solve this problem.

Let's say you want to do a lookup by color, then we need to create the following table.

1. CREATE TABLE Vehicle_by_Color (
2.     color text,
3.     ids set<text>,
4.     PRIMARY KEY (color)
5.  );

With this table, I can keep all the ids of vehicle, and my key will be the 'color' column.

So, it my client code, what I do is.

1. public List<Vehicle> findByColor(String color) {
2.      Vehicle_by_Color ret = Cassandra.executeCQL( " SELECT * FROM Keyspace.VEHICLE_BY_COLOR where color='black'");
3.  
4.     //now find all the Vehicles matching this criteria.
5.     List<Vehicle> allVehicleInThisColor = Cassandra.executeCQL("SELECT * FROM Keyspace.VEHICLE where id in ( convertToInsCommaSeparated(ret.getIds()) );
6.     
7.    return allVehicleInThisColor;
8. }

Now, I don't have to provide all the cluster keys.

Summary

NoSQL design is not similar to relational world. It really is breaking the rules of relational world about normalizing records set and so. Here, we've duplicated data, we have created lookup table(s) to store additional data just to get to the records we want. Also, all data are mostly collapse on a table, because attributes are aggregated to the master record Vehicle. 

I hope this has help you designing your first Cassandra table. We've made few mistakes on desiging Cassandra tables in the past, as we've also added indexes, which is another set of complexity on its own. If you want to know more about that I suggest you read Cassandra's documentation. 

Overall we are very happy to have selected Cassandra as it performs really well with the data and load we've throwed at it. For us we want to push our Cassandra install to next level of adding more data, node and running Spark to run aggregations and calculations. More to come ... 

It’s easier than you think to extend DevOps practices to SQL Server with Redgate tools. Discover how to introduce true Database DevOps, brought to you in partnership with Redgate

Topics:

Published at DZone with permission of Melvin Ramos. See the original article here.

Opinions expressed by DZone contributors are their own.

THE DZONE NEWSLETTER

Dev Resources & Solutions Straight to Your Inbox

Thanks for subscribing!

Awesome! Check your inbox to verify your email so you can start receiving the latest in tech news and resources.

X

{{ parent.title || parent.header.title}}

{{ parent.tldr }}

{{ parent.urlSource.name }}