
An Introduction To Cassandra: The Data Model


I'm fairly new to the whole NoSQL game, and one thing I keep hearing is how great Cassandra is. Built by Facebook and open sourced in 2008, Cassandra is probably the most popular NoSQL implementation: "A massively scalable, decentralized, structured data store". Cassandra takes its distribution features from Dynamo and its data model from BigTable.

Before we look at using Cassandra, we first need to understand the data model. For developers coming to Cassandra from a relational database background, the data model can be a bit confusing. Here's a summary of how the Cassandra data model is composed:


A Column is the most basic element in Cassandra: a simple tuple that contains a name, a value and a timestamp. All three are set by the client. That's an important consideration for the timestamp, as it means your clients need synchronized clocks.
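To make the clock-synchronization point concrete, here is a minimal in-memory sketch of a column. It is not the Cassandra client API: the `Column` class and its `reconcile` method are hypothetical names made up for illustration, and the value is a `String` only to keep the example short. It assumes the usual last-write-wins rule, where the version with the higher timestamp survives.

```java
// Simplified in-memory model of a column; not the real Cassandra client API.
final class Column {
    final String name;
    final String value;   // a String keeps the sketch short
    final long timestamp; // set by the client, e.g. System.currentTimeMillis()

    Column(String name, String value, long timestamp) {
        this.name = name;
        this.value = value;
        this.timestamp = timestamp;
    }

    // Last-write-wins: of two versions of the same column, the higher timestamp survives.
    // Ties keep the first argument, which is a choice made for this sketch only.
    static Column reconcile(Column a, Column b) {
        return b.timestamp > a.timestamp ? b : a;
    }
}

final class ColumnDemo {
    public static void main(String[] args) {
        Column fromClientA = new Column("email", "a@example.com", 1_000L);
        Column fromClientB = new Column("email", "b@example.com", 2_000L);
        Column winner = Column.reconcile(fromClientA, fromClientB);
        System.out.println(winner.value); // prints b@example.com
    }
}
```

If client B's clock ran behind client A's, its later write would carry the lower timestamp and lose, which is why the clients' clocks need to agree.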


A SuperColumn is a column that stores an associative array of columns. You could think of it as similar to a HashMap in Java, with an identifying name and, as its value, a collection of columns keyed by column name. The key difference between a Column and a SuperColumn is that the value of a Column is a single value (a string, for example), whereas the value of a SuperColumn is a map of Columns. Note that SuperColumns have no timestamp, just a name and a value.
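The HashMap analogy can be sketched in a few lines of Java. The classes below are hypothetical stand-ins rather than anything from the Cassandra libraries, and a `TreeMap` is used instead of a `HashMap` purely so the sub-columns come back sorted by name.

```java
import java.util.SortedMap;
import java.util.TreeMap;

// Simplified in-memory model; class names are hypothetical, not the Cassandra API.
final class SuperColumnSketch {

    // A plain column: name, value, client-supplied timestamp.
    static final class Column {
        final String name;
        final String value;
        final long timestamp;

        Column(String name, String value, long timestamp) {
            this.name = name;
            this.value = value;
            this.timestamp = timestamp;
        }
    }

    // A super column: a name plus a sorted map of columns, with no timestamp of its own.
    static final class SuperColumn {
        final String name;
        final SortedMap<String, Column> columns = new TreeMap<>();

        SuperColumn(String name) {
            this.name = name;
        }

        SuperColumn put(Column column) {
            columns.put(column.name, column);
            return this;
        }
    }

    // Made-up sample data for the demo.
    static SuperColumn sampleAddress() {
        return new SuperColumn("homeAddress")
                .put(new Column("street", "1 Main St", 1L))
                .put(new Column("city", "Springfield", 1L));
    }

    public static void main(String[] args) {
        SuperColumn address = sampleAddress();
        System.out.println(address.columns.get("city").value); // prints Springfield
        System.out.println(address.columns.firstKey());        // prints city
    }
}
```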


A ColumnFamily holds a number of rows. Each row is identified by a key and is a sorted map that matches column names to column values. A ColumnFamily is similar to the table concept from relational databases, and each row in it holds an ordered list of columns which you can reference by column name.

The ColumnFamily can be of two types, Standard or Super. Standard ColumnFamilies contain rows of normal columns, while Super ColumnFamilies contain rows of SuperColumns.
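One way to picture the two types is as nested sorted maps, with a Super ColumnFamily adding one extra level of nesting. The sketch below uses made-up row keys and column names and leaves timestamps out for brevity. It only illustrates the shape of the data, not how Cassandra stores it.

```java
import java.util.SortedMap;
import java.util.TreeMap;

// Nested-map view of the two column family types (illustrative only).
final class ColumnFamilySketch {

    // Standard: row key -> (column name -> value)
    static SortedMap<String, SortedMap<String, String>> standardFamily() {
        SortedMap<String, String> row = new TreeMap<>();
        row.put("name", "Ada");
        row.put("email", "ada@example.com");

        SortedMap<String, SortedMap<String, String>> users = new TreeMap<>();
        users.put("user-1", row);
        return users;
    }

    // Super: row key -> (super column name -> (column name -> value))
    static SortedMap<String, SortedMap<String, SortedMap<String, String>>> superFamily() {
        SortedMap<String, String> home = new TreeMap<>();
        home.put("city", "London");

        SortedMap<String, SortedMap<String, String>> row = new TreeMap<>();
        row.put("home", home);

        SortedMap<String, SortedMap<String, SortedMap<String, String>>> addresses = new TreeMap<>();
        addresses.put("user-1", row);
        return addresses;
    }

    public static void main(String[] args) {
        // Standard family: two lookups reach a value.
        System.out.println(standardFamily().get("user-1").get("name")); // prints Ada
        // Super family: three lookups reach a value.
        System.out.println(superFamily().get("user-1").get("home").get("city")); // prints London
    }
}
```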


KeySpaces are the largest container, each holding an ordered list of ColumnFamilies, similar to a database in an RDBMS. A KeySpace is normally named after the application.

KeySpaces reside in a cluster: the machines (nodes) that make up a Cassandra instance.
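Putting the pieces together, reading a value means walking from keyspace to column family to row key to column name. The sketch below models only that logical nesting in memory. `ClusterSketch` and its methods are hypothetical names, and a real cluster spreads the data across its nodes rather than holding it in one map.

```java
import java.util.Map;
import java.util.Optional;
import java.util.TreeMap;

// Logical nesting only: keyspace -> column family -> row key -> column name -> value.
final class ClusterSketch {
    private final Map<String, Map<String, Map<String, Map<String, String>>>> keyspaces =
            new TreeMap<>();

    // Creates any missing level on the way down, then stores the value.
    void put(String keyspace, String family, String rowKey, String column, String value) {
        keyspaces.computeIfAbsent(keyspace, k -> new TreeMap<>())
                .computeIfAbsent(family, k -> new TreeMap<>())
                .computeIfAbsent(rowKey, k -> new TreeMap<>())
                .put(column, value);
    }

    // Returns an empty Optional if any level of the path is missing.
    Optional<String> get(String keyspace, String family, String rowKey, String column) {
        return Optional.ofNullable(keyspaces.get(keyspace))
                .map(ks -> ks.get(family))
                .map(cf -> cf.get(rowKey))
                .map(row -> row.get(column));
    }

    public static void main(String[] args) {
        ClusterSketch cluster = new ClusterSketch();
        cluster.put("Blog", "Users", "user-1", "name", "Ada");
        System.out.println(cluster.get("Blog", "Users", "user-1", "name").orElse("(missing)")); // prints Ada
        System.out.println(cluster.get("Shop", "Users", "user-1", "name").orElse("(missing)")); // prints (missing)
    }
}
```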


For another summary of the Cassandra data model, check out the (nicely titled) "WTF is a SuperColumn".

In the next article in this introduction series, we'll move on to the good stuff: using Cassandra in Java.



Opinions expressed by DZone contributors are their own.

