Composite Keys in Cassandra
Join the DZone community and get the full member experience.
Join For FreeIntroduction
A composite key consists of one or more primary key fields. Each field must be of data type supported by underlying data-store.
In JPA (Java Persistence API), there are two ways of specifying composite keys:
1. Composite Primary Key:
@Entity @IdClass(TimelineId.class) public class Timeline { @Id int userId; @Id long tweetId; //Other non-primary key fields } Class TimelineId { int userId; long tweetId; }
2. Embedded Primary Key:
@Entity public class Timeline { @EmbeddedId TimelineId id; //Other non-primary key fields } @Embeddable Class TimelineId { int userId; long tweetId; }
Above Timeline entity is inspired from famous twissandra example. Starting 1.1 release, Cassandra supports composite keys.
Cassandra Composite Keys in Action
Visit this page in order to understand Cassandra Schema in general. In this section I will give you a feel of how composite keys are stored in Cassandra.
Let's start Cassandra 1.1.x server and run following commands from Cassandra/bin directory:
CQL:
./cqlsh -3 localhost 9160 CREATE KEYSPACE twissandra with strategy_class = 'SimpleStrategy' and strategy_options:replication_factor=1; use twissandra; CREATE TABLE timeline( user_id varchar, tweet_id varchar, tweet_device varchar, author varchar, body varchar, PRIMARY KEY(user_id,tweet_id,tweet_device)); INSERT INTO timeline (user_id, tweet_id, tweet_device, author, body) VALUES ('xamry', 't1', 'web', 'Amresh', 'Here is my first tweet'); INSERT INTO timeline (user_id, tweet_id, tweet_device, author, body) VALUES ('xamry', 't2', 'sms', 'Saurabh', 'Howz life Xamry'); INSERT INTO timeline (user_id, tweet_id, tweet_device, author, body) VALUES ('mevivs', 't1', 'iPad', 'Kuldeep', 'You der?'); INSERT INTO timeline (user_id, tweet_id, tweet_device, author, body) VALUES ('mevivs', 't2', 'mobile', 'Vivek', 'Yep, I suppose'); cqlsh:twissandra> select * from timeline; user_id | tweet_id | author | body ---------+----------+---------+------------------------ xamry | t1 | Amresh | Here is my first tweet xamry | t2 | Saurabh | Howz life Xamry mevivs | t1 | Kuldeep | You der? mevivs | t2 | Vivek | Yep, I suppose cqlsh:twissandra> SELECT * FROM timeline WHERE user_id='xamry'; user_id | tweet_id | tweet_device | author | body ---------+----------+--------------+---------+------------------------ xamry | t1 | web | Amresh | Here is my first tweet xamry | t2 | sms | Saurabh | Howz life Xamry cqlsh:twissandra> select * from timeline where tweet_id = 't1'; user_id | tweet_id | tweet_device | author | body ---------+----------+--------------+---------+------------------------ xamry | t1 | web | Amresh | Here is my first tweet mevivs | t1 | iPad | Kuldeep | You der? cqlsh:twissandra> select * from timeline where user_id = 'xamry' and tweet_id='t1'; user_id | tweet_id | tweet_device | author | body ---------+----------+--------------+--------+------------------------ xamry | t1 | web | Amresh | Here is my first tweet cqlsh:twissandra> select * from timeline where user_id = 'xamry' and author='Amresh'; Bad Request: No indexed columns present in by-columns clause with Equal operator cqlsh:twissandra> select * from timeline where user_id = 'xamry' and tweet_device='web'; Bad Request: PRIMARY KEY part tweet_device cannot be restricted (preceding part tweet_id is either not restricted or by a non-EQ relation) cqlsh:twissandra> select * from timeline where user_id = 'xamry' and tweet_id = 't1' and tweet_device='web'; user_id | tweet_id | tweet_device | author | body ---------+----------+--------------+--------+------------------------ xamry | t1 | web | Amresh | Here is my first tweet
Cassandra-cli:
impadmin@impetus-ubuntu:/usr/local/apache-cassandra-1.1.2/bin$ ./cassandra-cli -h localhost -p 9160 Connected to: "Test Cluster" on localhost/9160 Welcome to Cassandra CLI version 1.1.2 Type 'help;' or '?' for help. Type 'quit;' or 'exit;' to quit. [default@unknown] use twissandra; Authenticated to keyspace: twissandra [default@twissandra] list timeline; <pre>Using default limit of 100 Using default column limit of 100 ------------------- RowKey: xamry => (column=t1:web:author, value=Amresh, timestamp=1343729388951000) => (column=t1:web:body, value=Here is my first tweet, timestamp=1343729388951001) => (column=t2:sms:author, value=Saurabh, timestamp=1343729388973000) => (column=t2:sms:body, value=Howz life Xamry, timestamp=1343729388973001) ------------------- RowKey: mevivs => (column=t1:iPad:author, value=Kuldeep, timestamp=1343729388991000) => (column=t1:iPad:body, value=You der?, timestamp=1343729388991001) => (column=t2:mobile:author, value=Vivek, timestamp=1343729389941000) => (column=t2:mobile:body, value=Yep, I suppose, timestamp=1343729389941001)
Observations
- First part of composite key (user_id) is called "Partition Key", rest (tweet_id, tweet_device) are remaining keys.
- Cassandra stores columns differently when composite keys are used. Partition key becomes row key. Remaining keys are concatenated with each column name (":" as separator) to form column names. Column values remain unchanged.
- Remaining keys (other than partition keys) are ordered, and it's not allowed to search on any random column, you have to start with the first one and then you can move to the second one and so on. This is evident from "Bad Request" error.
Published at DZone with permission of Amresh Singh, DZone MVB. See the original article here.
Opinions expressed by DZone contributors are their own.
Comments