Platinum Partner
architects,bigdata,theory,tips and tricks,tools & methods

Understanding How CQL3 Maps to Cassandra's Internal Data Structure

This is just a quick reference for now. Later I hope to flesh out the details and discuss some of the cooler features present here:

Maps

Defining:

cqlsh:test> CREATE TABLE phonelists (... user text PRIMARY KEY,... phoneNumbers map<text,text>);

Inserting:

cqlsh:test> INSERT INTO phonelists (user, phonenumbers)... VALUES ('john',{'patricia':'555-4326','doug':'555-1579'});
cqlsh:test> INSERT INTO phonelists (user, phonenumbers)... VALUES ('scott',{'bill':'555-7382','patricia':'555-4326','jane':'555-8743'});

Retrieving:

cqlsh:test> SELECT * FROM phonelists;

 user  | phonenumbers
-------+------------------------------------------------------
 scott |{bill:555-7382, jane:555-8743, patricia:555-4326}
  john |{doug:555-1579, patricia:555-4326}

Updating:

cqlsh:test> UPDATE phonelists  
        ... SET phonenumbers = phonenumbers +{'daniel':'555-0453'}... WHERE user='john';

The internal representation:

[default@test] list phonelists;-------------------RowKey: scott
=>(column=, value=, timestamp=1374684062860000)=>(column=phonenumbers:bill, value='555-7382', timestamp=1374684062860000)=>(column=phonenumbers:jane, value='555-8743', timestamp=1374684062860000)=>(column=phonenumbers:patricia, value='555-4326', timestamp=1374684062860000)-------------------RowKey: john
=>(column=, value=, timestamp=1374683971220000)=>(column=phonenumbers:doug, value='555-1579', timestamp=1374683971220000)=>(column=phonenumbers:patricia, value='555-4326', timestamp=1374683971220000)

Note that the above text components, e.g. ‘doug’ and ’555-1579′, are returned by cassandra-cli in their hex encoding. I’ve taken the liberty to decode them so that the print out is more understandable. I will do this throughout the post.

Lists

Defining:

cqlsh:test> CREATE TABLE friendlists (... user text PRIMARY KEY,... friends list <text>...);

Inserting:

cqlsh:test> INSERT INTO friendlists (user, friends)... VALUES ('john',['doug','patricia','scott']);
cqlsh:test> INSERT INTO friendlists (user, friends)... VALUES ('patricia',['john','lucifer']);

Retrieving:

cqlsh:test> SELECT * FROM friendlists;

 user     | friends
----------+-------------------------
     john |[doug, patricia, scott]
 patricia |[john, lucifer]

Updating:

cqlsh:test> UPDATE friendlists  
        ... SET friends = friends +['matt','eric']... WHERE user='john';
cqlsh:test> UPDATE friendlists  
        ... SET friends = friends -['lucifer']... WHERE user='patricia';

The internal representation:

[default@test] list friendlists;Usingdefault limit of 100Usingdefault column limit of 100-------------------RowKey: john
=>(column=, value=, timestamp=1374687324950000)=>(column=friends:26017c10f48711e2801fdf9895e5d0f8, value='doug', timestamp=1374687206993000)=>(column=friends:26017c11f48711e2801fdf9895e5d0f8, value='patricia', timestamp=1374687206993000)=>(column=friends:26017c12f48711e2801fdf9895e5d0f8, value='scott', timestamp=1374687206993000)=>(column=friends:6c504b60f48711e2801fdf9895e5d0f8, value='matt', timestamp=1374687324950000)=>(column=friends:6c504b61f48711e2801fdf9895e5d0f8, value='eric', timestamp=1374687324950000)-------------------RowKey: patricia
=>(column=, value=, timestamp=1374687352290000)=>(column=friends:3b817b80f48711e2801fdf9895e5d0f8, value='john', timestamp=1374687243064000)

Here the internal column name is more complicated because a UUID is appended to the name of the CQL field “friend”. This is used to keep track of the order of items in the list.

To be determined: Does a list item delete take more time than a list insert? I suspect so – I don’t see how Cassandra can delete an element in the list without reading in all the elements in the list and then deleting column that has the value indicated in the delete.

Set

Defining:

cqlsh:test> CREATE TABLE friendsets (... user text PRIMARY KEY,... friends set<text>...);

Inserting:

cqlsh:test> INSERT INTO friendsets (user, friends)... VALUES ('john',{'doug','patricia','scott'});
cqlsh:test> INSERT INTO friendsets (user, friends)... VALUES ('patricia',{'john','lucifer'});

Retrieving:

cqlsh:test> SELECT * FROM friendsets;

 user     | friends
----------+-------------------------
     john |{doug, patricia, scott}
 patricia |{john, lucifer}

Updating:

cqlsh:test> UPDATE friendsets  
        ... SET friends = friends +{'matt','eric'}... WHERE user='john';
cqlsh:test> UPDATE friendsets  
        ... SET friends = friends -{'lucifer'}... WHERE user='patricia';

The internal representation:

[default@test] list friendsets;Usingdefault limit of 100Usingdefault column limit of 100-------------------RowKey: john
=>(column=, value=, timestamp=1374688135443000)=>(column=friends:'doug', value=, timestamp=1374688108307000)=>(column=friends:'eric', value=, timestamp=1374688135443000)=>(column=friends:'matt', value=, timestamp=1374688135443000)=>(column=friends:'patricia', value=, timestamp=1374688108307000)=>(column=friends:'scott', value=, timestamp=1374688108307000)-------------------RowKey: patricia
=>(column=, value=, timestamp=1374688151386000)=>(column=friends:'john', value=, timestamp=1374688116595000)

- See more at: http://www.opensourceconnections.com/2013/07/24/understanding-how-cql3-maps-to-cassandras-internal-data-structure-sets-lists-and-maps/#sthash.6nKcPjFs.dpuf

Published at DZone with permission of {{ articles[0].authors[0].realName }}, DZone MVB. (source)

Opinions expressed by DZone contributors are their own.

{{ tag }}, {{tag}},

{{ parent.title || parent.header.title}}

{{ parent.tldr }}

{{ parent.urlSource.name }}
{{ parent.authors[0].realName || parent.author}}

{{ parent.authors[0].tagline || parent.tagline }}

{{ parent.views }} ViewsClicks
Tweet

{{parent.nComments}}