Setting Up Cassandra With Priam
See how Priam, a tool created by Netflix, can help bring backup and restore functionality to your Cassandra databases, and see how to avoid hurdles along the way.
I’ve previously explained how to set up Cassandra in AWS. The described setup works, but in some cases, it may not be sufficient. For example, it doesn’t give you an easy way to make and restore backups, and adding new nodes relies on a custom Python script that randomly selects a seed.
So now I’m going to explain how to set up Priam, a Cassandra helper tool by Netflix.
My main reason for setting it up is the backup/restore functionality that it offers. All other ways to do backups are very tedious, and Priam happens to have implemented the important bits – the snapshotting and the incremental backups.
Priam is a bit tricky to get running, though. The setup guide is not very detailed and not easy to find (it’s the last visible item in the wiki). To begin with, Priam has one branch per Cassandra version, so you have to check out the proper branch and build it. I immediately hit an issue there, as their naming doesn’t allow Eclipse to import the Gradle project. Within 24 hours, I had reported three issues, which isn’t ideal. Priam doesn’t support dynamic SimpleDB domain names, and it doesn’t let you override bundled properties via the command line. I hope there aren’t bigger issues. I fixed the ones that I encountered and opened a pull request.
What does the setup look like?
- Append a javaagent to the JVM options
- Run the Priam web application
- It automatically replaces most of cassandra.yaml, including the seed provider (i.e. how the node finds other nodes in the cluster)
- Run Cassandra
- It fetches seed information (which is stored in AWS SimpleDB) and connects to a cluster
I decided to run the WAR file with a standalone Jetty runner, rather than installing Tomcat. In terms of shell scripts, the core bits look like this (in addition to the shell script in the original post that is run upon initialization of the node):
    # Get the Priam war file and jar file
    aws s3 cp s3://$BUCKET_NAME/priam-web-3.12.0-SNAPSHOT.war ~/
    aws s3 cp s3://$BUCKET_NAME/priam-cass-extensions-3.12.0-SNAPSHOT.jar /usr/share/cassandra/lib/priam-cass-extensions.jar

    # Set the Priam agent
    echo "-javaagent:/usr/share/cassandra/lib/priam-cass-extensions.jar" >> /etc/cassandra/conf/jvm.options

    # Download jetty-runner to be able to run the Priam war file from the command line
    wget http://central.maven.org/maven2/org/eclipse/jetty/jetty-runner/9.4.8.v20171121/jetty-runner-9.4.8.v20171121.jar

    nohup java -Dpriam.clustername=LogSentinelCluster -Dpriam.sdb.instanceIdentity.region=$EC2_REGION -Dpriam.s3.bucket=$BACKUP_BUCKET \
        -Dpriam.sdb.instanceidentity.domain=$INSTANCE_IDENTITY_DOMAIN -Dpriam.sdb.properties.domain=$PROPERTIES_DOMAIN \
        -Dpriam.client.sslEnabled=true -Dpriam.internodeEncryption=all -Dpriam.rpc.server.type=sync \
        -Dpriam.partitioner=org.apache.cassandra.dht.Murmur3Partitioner -Dpriam.backup.retention.days=7 \
        -Dpriam.backup.hour=$BACKUP_HOUR -Dpriam.vnodes.numTokens=256 -Dpriam.thrift.enabled=false \
        -jar jetty-runner-9.4.8.v20171121.jar --path /Priam ~/priam-web-3.12.0-SNAPSHOT.war &

    while ! echo exit | nc $BIND_IP 8080; do sleep 10; done
    echo "Started Priam web package"

    service cassandra start
    chkconfig cassandra on
    while ! echo exit | nc $BIND_IP 9042; do sleep 10; done
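The two nc loops in the script wait forever if Priam or Cassandra never comes up, which can leave an initialization script hanging. Below is a minimal sketch of a bounded wait that reuses the same nc probe. The wait_for_port name, the 300-second default timeout, and the 10-second default interval are hypothetical choices for illustration, not part of Priam.

```shell
# Hypothetical helper, not part of Priam or the original script.
# Polls host:port until it accepts a connection or the timeout expires.
# Usage: wait_for_port HOST PORT [TIMEOUT_SECONDS] [INTERVAL_SECONDS]
wait_for_port() {
    wfp_deadline=$(( $(date +%s) + ${3:-300} ))
    # Same probe as in the script: nc exits non-zero while the port is closed.
    until echo exit | nc "$1" "$2" >/dev/null 2>&1; do
        if [ "$(date +%s)" -ge "$wfp_deadline" ]; then
            echo "Timed out waiting for $1:$2" >&2
            return 1
        fi
        sleep "${4:-10}"
    done
}

# Possible use in place of the first loop (600 seconds is an arbitrary limit):
#   wait_for_port "$BIND_IP" 8080 600 || exit 1
#   echo "Started Priam web package"
```

Failing fast like this makes a broken node visible in the initialization logs instead of leaving it stuck in the loop.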
BACKUP_BUCKET, PROPERTIES_DOMAIN, and INSTANCE_IDENTITY_DOMAIN are supplied via a CloudFormation script (as we can’t know the exact names in advance, especially for SimpleDB). Note that these properties won’t work with the main repo; I added them in my pull request.
In order for that to work, you need to have the two SimpleDB domains created (e.g. by CloudFormation). It is possible that you could replace SimpleDB with some other data storage (and not rely on AWS), but that’s out of scope for now.
Once Priam is running, your Cassandra nodes are registered in SimpleDB (you can browse it using this Chrome extension, as AWS doesn’t offer any UI) and, of course, backups are created automatically in the backup S3 bucket.
You can then restore a backup by logging in to each node and executing:
You specify the time range for the restore. It’s still not ideal, as one would hope for a one-click restore, but it’s much better than rolling your own backup and restore infrastructure.
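As a rough sketch of how the time range could be passed, the helper below builds a restore URL for the Priam web application running on the node. The /REST/v1/restore path and the daterange parameter (two yyyyMMddHHmm timestamps) are assumptions based on the Priam wiki and should be checked against the branch you built. The /Priam prefix and port 8080 come from the jetty-runner command above. The function names, the PRIAM_URL variable, and the timestamps are hypothetical.

```shell
# Hypothetical helpers; the endpoint path and the "daterange" parameter are
# ASSUMPTIONS taken from the Priam wiki, so verify them for your branch.

# True only for exactly 12 digits (yyyyMMddHHmm).
is_timestamp() {
    [ "${#1}" -eq 12 ] || return 1
    case "$1" in *[!0-9]*) return 1 ;; esac
    return 0
}

# Usage: priam_restore_url START END
# PRIAM_URL (hypothetical) overrides the base URL of the local Priam web app.
priam_restore_url() {
    if ! is_timestamp "$1" || ! is_timestamp "$2"; then
        echo "timestamps must look like yyyyMMddHHmm" >&2
        return 1
    fi
    if [ "$1" -gt "$2" ]; then
        echo "start must not be after end" >&2
        return 1
    fi
    echo "${PRIAM_URL:-http://localhost:8080/Priam}/REST/v1/restore?daterange=$1,$2"
}

# Possible use on each node (timestamps are examples only):
#   curl "$(priam_restore_url 201803180000 201803191200)"
```

Validating the range before calling the endpoint avoids starting a restore with a mistyped timestamp on one of the nodes.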
One very important note here: vnodes are not supported. My original cluster had the default of 256 vnodes per machine, and now it has just 1, because Priam doesn’t support anything other than 1. That’s a pity, since vnodes are the recommended way to set up Cassandra. Apparently Netflix doesn’t use them, however. There’s a work-in-progress branch for vnode support, but it was abandoned 5 years ago. Fortunately, there’s a fresh pull request with vnode support that can be used in conjunction with my pull request from this branch.
Priam replaces some Cassandra defaults with other values, so you might want to compare your current setup with the newly generated cassandra.yaml. Overall, it doesn’t feel super production-ready, but apparently it is, as Netflix is using it in production.
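One way to do that comparison is sketched below. It assumes you kept a copy of the old cassandra.yaml before starting Priam. The yaml_settings_diff name and the file paths in the usage comment are hypothetical (the paths are guessed from the jvm.options location used in the script). Comments and blank lines are dropped and the remaining lines are sorted, so a reordered file doesn’t show up as a change. The trade-off is that nested keys lose their context in the output.

```shell
# Hypothetical helper: show settings that differ between two cassandra.yaml files.
# Returns 0 if the effective settings are identical, 1 if they differ.
# Usage: yaml_settings_diff OLD_YAML NEW_YAML
yaml_settings_diff() {
    ysd_tmp=$(mktemp -d) || return 2
    # Keep only effective settings: no comments, no blank lines, stable order.
    grep -Ev '^[[:space:]]*(#|$)' "$1" | sort > "$ysd_tmp/old"
    grep -Ev '^[[:space:]]*(#|$)' "$2" | sort > "$ysd_tmp/new"
    ysd_status=0
    # Lines starting with "<" come from the old file, ">" from the new one.
    diff "$ysd_tmp/old" "$ysd_tmp/new" || ysd_status=$?
    rm -rf "$ysd_tmp"
    return "$ysd_status"
}

# Possible use (paths are assumptions; the .orig copy must be made beforehand):
#   yaml_settings_diff /etc/cassandra/conf/cassandra.yaml.orig \
#                      /etc/cassandra/conf/cassandra.yaml
```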
Published at DZone with permission of Bozhidar Bozhanov, DZone MVB. See the original article here.