Composite Keys in Cassandra

Amresh Singh

Nov. 14, 12 · Interview

Likes (1)

Comment

Save

20.0K Views

Introduction

A composite key consists of one or more primary key fields. Each field must be of data type supported by underlying data-store.

In JPA (Java Persistence API), there are two ways of specifying composite keys:

1. Composite Primary Key:

@Entity
@IdClass(TimelineId.class)
public class Timeline {
    @Id int userId;
    @Id long tweetId;

    //Other non-primary key fields
}
Class TimelineId {
    int userId;
    long tweetId;
}

2. Embedded Primary Key:

@Entity
public class Timeline {
    @EmbeddedId TimelineId id;

//Other non-primary key fields
}

@Embeddable
Class TimelineId {
   int userId;
   long tweetId;
}

Above Timeline entity is inspired from famous twissandra example. Starting 1.1 release, Cassandra supports composite keys.

Cassandra Composite Keys in Action

Visit this page in order to understand Cassandra Schema in general. In this section I will give you a feel of how composite keys are stored in Cassandra.

Let's start Cassandra 1.1.x server and run following commands from Cassandra/bin directory:

CQL:

./cqlsh -3 localhost 9160

CREATE KEYSPACE twissandra with strategy_class = 'SimpleStrategy' and strategy_options:replication_factor=1;

use twissandra;

CREATE TABLE timeline(
    user_id varchar,
    tweet_id varchar,
    tweet_device varchar,
    author varchar,
    body varchar,
    PRIMARY KEY(user_id,tweet_id,tweet_device));

INSERT INTO timeline (user_id, tweet_id, tweet_device, author, body) VALUES ('xamry', 't1', 'web', 'Amresh', 'Here is my first tweet');
INSERT INTO timeline (user_id, tweet_id, tweet_device, author, body) VALUES ('xamry', 't2', 'sms', 'Saurabh', 'Howz life Xamry');
INSERT INTO timeline (user_id, tweet_id, tweet_device, author, body) VALUES ('mevivs', 't1', 'iPad', 'Kuldeep', 'You der?');
INSERT INTO timeline (user_id, tweet_id, tweet_device, author, body) VALUES ('mevivs', 't2', 'mobile', 'Vivek', 'Yep, I suppose');

cqlsh:twissandra> select * from timeline;
 user_id | tweet_id | author  | body
---------+----------+---------+------------------------
   xamry |       t1 |  Amresh | Here is my first tweet
   xamry |       t2 | Saurabh |        Howz life Xamry
  mevivs |       t1 | Kuldeep |               You der?
  mevivs |       t2 |   Vivek |         Yep, I suppose

cqlsh:twissandra> SELECT * FROM timeline WHERE user_id='xamry';
 user_id | tweet_id | tweet_device | author  | body
---------+----------+--------------+---------+------------------------
   xamry |       t1 |          web |  Amresh | Here is my first tweet
   xamry |       t2 |          sms | Saurabh |        Howz life Xamry

cqlsh:twissandra> select * from timeline where tweet_id = 't1';
 user_id | tweet_id | tweet_device | author  | body
---------+----------+--------------+---------+------------------------
   xamry |       t1 |          web |  Amresh | Here is my first tweet
  mevivs |       t1 |         iPad | Kuldeep |               You der?

cqlsh:twissandra> select * from timeline where user_id = 'xamry' and tweet_id='t1';
 user_id | tweet_id | tweet_device | author | body
---------+----------+--------------+--------+------------------------
   xamry |       t1 |          web | Amresh | Here is my first tweet

cqlsh:twissandra> select * from timeline where user_id = 'xamry' and author='Amresh';
Bad Request: No indexed columns present in by-columns clause with Equal operator

cqlsh:twissandra> select * from timeline where user_id = 'xamry' and tweet_device='web';
Bad Request: PRIMARY KEY part tweet_device cannot be restricted (preceding part tweet_id is either not restricted or by a non-EQ relation)

cqlsh:twissandra> select * from timeline where user_id = 'xamry' and tweet_id = 't1' and tweet_device='web';
 user_id | tweet_id | tweet_device | author | body
---------+----------+--------------+--------+------------------------
   xamry |       t1 |          web | Amresh | Here is my first tweet

Cassandra-cli:

impadmin@impetus-ubuntu:/usr/local/apache-cassandra-1.1.2/bin$ ./cassandra-cli -h localhost -p 9160
Connected to: "Test Cluster" on localhost/9160
Welcome to Cassandra CLI version 1.1.2

Type 'help;' or '?' for help.
Type 'quit;' or 'exit;' to quit.

[default@unknown] use twissandra;
Authenticated to keyspace: twissandra
[default@twissandra] list timeline;
<pre>Using default limit of 100
Using default column limit of 100
-------------------
RowKey: xamry
=> (column=t1:web:author, value=Amresh, timestamp=1343729388951000)
=> (column=t1:web:body, value=Here is my first tweet, timestamp=1343729388951001)
=> (column=t2:sms:author, value=Saurabh, timestamp=1343729388973000)
=> (column=t2:sms:body, value=Howz life Xamry, timestamp=1343729388973001)
-------------------
RowKey: mevivs
=> (column=t1:iPad:author, value=Kuldeep, timestamp=1343729388991000)
=> (column=t1:iPad:body, value=You der?, timestamp=1343729388991001)
=> (column=t2:mobile:author, value=Vivek, timestamp=1343729389941000)
=> (column=t2:mobile:body, value=Yep, I suppose, timestamp=1343729389941001)

Observations

First part of composite key (user_id) is called "Partition Key", rest (tweet_id, tweet_device) are remaining keys.
Cassandra stores columns differently when composite keys are used. Partition key becomes row key. Remaining keys are concatenated with each column name (":" as separator) to form column names. Column values remain unchanged.
Remaining keys (other than partition keys) are ordered, and it's not allowed to search on any random column, you have to start with the first one and then you can move to the second one and so on. This is evident from "Bad Request" error.

Relational database

Published at DZone with permission of Amresh Singh, DZone MVB. See the original article here.

Opinions expressed by DZone contributors are their own.

Related

Trending