
CloudFormation on AWS for Cassandra with HPCC

Learn how to implement Cassandra on an existing AWS cluster.




If your primary objective is to set up a simple Cassandra cluster, then you probably want to start here:

http://docs.datastax.com/en/cassandra/2.1/cassandra/install/installAMI.html


However, if you have an existing AWS cluster to which you want to add Cassandra, then read on.


In my case, I wanted to add Cassandra to an existing HPCC cluster.  More specifically, I wanted to be able to spin up an HPCC + Cassandra cluster with a single command.  To accomplish this, I decided to add a bit of Python scripting on top of CloudFormation.


Amazon has a facility called CloudFormation.  CloudFormation reads a JSON template file and creates the instances described in that file. (pretty slick)  Within that JSON, you can execute shell commands that do the heavy lifting.  The template can also define parameters, which the administrator supplies via the management console or via the AWS CLI.

(IMHO, I suggest installing the AWS CLI.)
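To make that concrete, here is a minimal, hypothetical template sketch -- this is not Tim's template; the resource name, AMI ID, and parameter are placeholders -- showing a parameter and a UserData shell command:

```json
{
  "AWSTemplateFormatVersion": "2010-09-09",
  "Description": "Minimal sketch: one parameterized EC2 instance.",
  "Parameters": {
    "KeyPair": {
      "Type": "String",
      "Description": "Name of an existing EC2 KeyPair"
    }
  },
  "Resources": {
    "Node": {
      "Type": "AWS::EC2::Instance",
      "Properties": {
        "ImageId": "ami-xxxxxxxx",
        "InstanceType": "c3.2xlarge",
        "KeyName": { "Ref": "KeyPair" },
        "UserData": { "Fn::Base64": { "Fn::Join": ["", [
          "#!/bin/bash\n",
          "echo 'heavy lifting goes here' >> /var/log/user-data.log\n"
        ]]}}
      }
    }
  }
}
```

The real template does much more (IAM roles, multiple instance types, the HPCC install), but every piece follows this same Parameters/Resources/UserData pattern.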


Running a CloudFormation Stack

First, I started with Tim Humphries' EasyFastHPCCoAWS.  That CloudFormation template is a great basis.  It installs the AWS CLI and copies the contents of an S3 bucket down into /home/ec2-users.  Have a look at the template file.  To get it up and running, it is a simple matter of creating a PlacementGroup, a KeyPair, and an S3 bucket into which you copy the contents of the GitHub repo.  For simplicity, I gave all of those the same name: "realtime-hpcc".


Now I can fire up a low-cost HPCC cluster with a single command:



aws cloudformation create-stack --capabilities CAPABILITY_IAM --stack-name realtime-hpcc --template-body https://s3.amazonaws.com/realtime-hpcc/MyHPCCCloudFormationTemplate.json --parameters \
   ParameterKey=HPCCPlacementGroup,ParameterValue=realtime-hpcc \
   ParameterKey=HPCCPlatform,ParameterValue=HPCC-Platform-5.2.2-1 \
   ParameterKey=KeyPair,ParameterValue=realtime-hpcc \
   ParameterKey=MasterInstanceType,ParameterValue=c3.2xlarge \
   ParameterKey=NumberOfRoxieNodes,ParameterValue=1 \
   ParameterKey=NumberOfSlaveInstances,ParameterValue=1 \
   ParameterKey=NumberOfSlavesPerNode,ParameterValue=2 \
   ParameterKey=RoxieInstanceType,ParameterValue=c3.2xlarge \
   ParameterKey=ScriptsS3BucketFolder,ParameterValue=s3://riptide-hpcc/ \
   ParameterKey=SlaveInstanceType,ParameterValue=c3.2xlarge \
   ParameterKey=UserNameAndPassword,ParameterValue=riptide/HIDDEN



Note that I specified the template via an HTTPS URL.  I also specified a stack name, which is what you'll use when querying AWS for status, which you can do with the following command:



aws cloudformation describe-stacks --stack-name realtime-hpcc



With that, you get nice, clean JSON back that looks something like this:



{
    "Stacks": [
        {
            "StackId": "arn:aws:cloudformation:us-east-1:633162230041:stack/realtime-hpcc/e609e0b0-2595-11e5-97b7-5001b34a4a0a",
            "Description": "Launches instances for fast executing HPCC on AWS. Plus, it sets up and starts HPCC System.",
            "Parameters": [
                {
                    "ParameterValue": "realtime-hpcc",
                    "ParameterKey": "KeyPair"
                }...
            ],
            "Tags": [],
            "CreationTime": "2015-07-08T17:22:24.461Z",
            "Capabilities": [
                "CAPABILITY_IAM"
            ],
            "StackName": "realtime-hpcc",
            "NotificationARNs": [],
            "StackStatus": "CREATE_IN_PROGRESS",
            "DisableRollback": false
        }
    ]
}



The "StackStatus" field is the key property.  You'll want to wait until it says "CREATE_COMPLETE".
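If you'd rather script that wait than poll by hand, a small helper can shell out to the CLI and watch the status field.  This is my own sketch, not part of the repo, and it assumes the AWS CLI is configured on the machine running it:

```python
import json
import subprocess
import time


def stack_status(stacks_json):
    """Pull StackStatus out of `aws cloudformation describe-stacks` output."""
    return json.loads(stacks_json)["Stacks"][0]["StackStatus"]


def wait_for_stack(stack_name, poll_seconds=30):
    """Poll the AWS CLI until the stack leaves CREATE_IN_PROGRESS,
    then return the final status (e.g. CREATE_COMPLETE)."""
    while True:
        out = subprocess.check_output(
            ["aws", "cloudformation", "describe-stacks",
             "--stack-name", stack_name])
        status = stack_status(out.decode("utf-8"))
        if status != "CREATE_IN_PROGRESS":
            return status
        time.sleep(poll_seconds)
```

Calling wait_for_stack("realtime-hpcc") blocks until the stack reaches a terminal state and returns whatever that state is, so you can tell a successful CREATE_COMPLETE from a rollback.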

Once it completes, you can go into the management console and see your EC2 instances.


If something went wrong, you can go have a look in /var/log/user-data.log.  Tim's template conveniently redirects the output of the shell commands to that log file.

Installing Cassandra 

Now, to actually get Cassandra installed on the machines, I simply forked Tim's work and altered the CloudFormation template to include the DataStax repo and a yum install of Cassandra.  The next time I created my cluster: poof, magic voodoo, Cassandra was installed!


Next, I needed to configure the Cassandra instances into a cluster.  At first, I tried to do this with a shell script executed as part of the CloudFormation, but that proved difficult because I wanted the IP addresses of all the nodes, not just the one the script was running on.  So I shifted gears and decided to orchestrate the configuration from Python after the stack had formed.


I wrote a quick little Python script (configure_local_cassandra.py) that takes four parameters: the location of the cassandra.yaml file, the cluster name, the private IPs of the Cassandra nodes, and the IP of the node itself.  The script substitutes those values into the template file to update the Cassandra config.  I added it to the S3 bucket, and CloudFormation took care of deploying the template and the Python script to the machines.  (thanks to Tim's template)
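The repo has the real script; purely as an illustration of the substitution step, a sketch might look like the following.  The keys are standard cassandra.yaml fields, but the function and parameter names here are mine, not necessarily the repo's:

```python
import re


def render_cassandra_yaml(template_text, cluster_name, seed_ips, node_ip):
    """Substitute the cluster name, seed list, and node addresses into a
    cassandra.yaml-style template.  seed_ips is a list of private IPs."""
    text = re.sub(r"^cluster_name:.*$",
                  "cluster_name: '%s'" % cluster_name,
                  template_text, flags=re.M)
    # The seeds line lives (indented) under seed_provider.parameters.
    text = re.sub(r'- seeds:.*$',
                  '- seeds: "%s"' % ",".join(seed_ips),
                  text, flags=re.M)
    text = re.sub(r"^listen_address:.*$",
                  "listen_address: %s" % node_ip, text, flags=re.M)
    text = re.sub(r"^rpc_address:.*$",
                  "rpc_address: %s" % node_ip, text, flags=re.M)
    return text
```

Run once per node with that node's own IP as node_ip and the same seed list everywhere, this produces a consistent cluster configuration.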


Configuring Cassandra 

With that script and the template in place on each machine, the final piece is the script that gathers the IP addresses of the nodes and calls the Python script via ssh.  For this, we use the aws ec2 CLI and fetch the JSON for all of our instances.  The command looks like this:



aws ec2 describe-instances



I wrote a Python script (configure_cassandra_cluster.py) that parses that JSON and runs commands on each of the nodes via ssh.
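As a rough sketch of what such a script does -- the function names are mine and the ssh invocation is simplified, so the actual script in the repo may differ:

```python
import json
import subprocess


def private_ips(describe_instances_json):
    """Extract the private IPs of running instances from
    `aws ec2 describe-instances` output."""
    doc = json.loads(describe_instances_json)
    ips = []
    for reservation in doc.get("Reservations", []):
        for instance in reservation.get("Instances", []):
            if instance.get("State", {}).get("Name") == "running":
                ips.append(instance["PrivateIpAddress"])
    return ips


def run_on_node(host, command, key_file="realtime-hpcc.pem"):
    """Run one command on a node over ssh (assumes the ec2-user account)."""
    subprocess.check_call(
        ["ssh", "-i", key_file, "-o", "StrictHostKeyChecking=no",
         "ec2-user@" + host, command])
```

The orchestrator then loops over the IPs, invoking configure_local_cassandra.py on each node with the full seed list and that node's own address.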


Convenience Scripts

To keep things simple, I also added a bunch of shell scripts that wrap all the command lines (so I don't need to remember all the parameters).  They let you create a cluster, get the status of a cluster, and delete a cluster with a single command line:



create_stack.sh, get_status.sh, delete_stack.sh

(respectively)




Putting it all together...


To summarize, the create_stack.sh script uses aws cloudformation to create the cluster.

Then you can watch the status of the cluster with get_status.sh.

Once the stack has formed, the configure_cassandra_cluster.py script installs, configures, and starts Cassandra.



After that, you should be able to run ECL against Cassandra!



Feel free to take these scripts and apply them to other things.  And kudos to Tim Humphries for the CloudFormation template.




Published at DZone with permission of Brian O'Neill, DZone MVB. See the original article here.
