
Setting Up a Cassandra Cluster With Vagrant

Vagrant works more easily with systemd services than Docker does, so when it comes to provisioning and testing Cassandra clusters, Vagrant is the way to go.

This is part 1 of a Cassandra cluster tutorial series. Part 1 uses Vagrant to set up a local Cassandra cluster and installs Cassandra on the boxes. Later parts of the series will set up Ansible/ssh for DevOps/DBA tasks, use Packer to create EC2 AMIs and instances, and set up a Cassandra cluster in EC2.

The cassandra-image project (on GitHub) creates CentOS 7/Cassandra images for Docker, VirtualBox/Vagrant, and AWS/EC2, using best practices for Cassandra Linux/OS setup and utilities that auto-configure Cassandra based on the ergonomics of the environment.

Vagrant and Docker are both convenient for local development, so we support both. At this time it is hard to develop systemd services in Docker, and since we do a lot of systemd development, we use Vagrant here. Our real target, for the most part, is AWS: EC2, VPCs, and so on.

The cassandra-image project packages systemd utilities, which run as systemd services to monitor:

The cassandra-image project uses the cassandra-cloud project to configure Cassandra running in instances, which helps when setting up the cluster.

With this in mind, let’s set up Vagrant to launch a Cassandra cluster locally.

We are going to set up three nodes with Vagrant, each using our provision scripts to install Cassandra and its utilities:

  • 192.168.50.4 cassandra node0

  • 192.168.50.5 cassandra node1

  • 192.168.50.6 cassandra node2

Cassandra Cluster: Set Up Network of Boxes Using Vagrant


Vagrant.configure("2") do |config|


  # Use CentOS 7
  config.vm.box = "centos/7"

  # Setup 4 cpus and 3096 MB of memory for each instance
  config.vm.provider "virtualbox" do |vb|
       vb.memory = "3096"
       vb.cpus = 4
  end

  # Run the provision install scripts
  config.vm.provision "shell", inline: <<-SHELL
        sudo /vagrant/scripts/000-vagrant-provision.sh
  SHELL


  config.vm.define "node0" do |node0|
    ...
    # Node 0 is 192.168.50.4
    node0.vm.network "private_network", ip: "192.168.50.4"
   ...
  end

  config.vm.define "node1" do |node1|
    ...
    # Node 1 is 192.168.50.5
    node1.vm.network "private_network", ip: "192.168.50.5"
    ...
  end

  config.vm.define "node2" do |node2|
    ...
    # Node 2 is 192.168.50.6
    node2.vm.network "private_network", ip: "192.168.50.6"
  end

end


Notice that we set up three boxes on a private network from the same Vagrantfile.
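The three node definitions differ only in name and address, so the same layout can be driven from a small table. What follows is a minimal sketch rather than the project's actual Vagrantfile: the `cluster_nodes` helper and the `defined?(Vagrant)` guard are assumptions added for illustration, and the elided per-node settings from the listing above are not reproduced.

```ruby
# Hypothetical helper: node name => private-network IP,
# using the addresses from the Vagrantfile above.
def cluster_nodes
  {
    "node0" => "192.168.50.4",
    "node1" => "192.168.50.5",
    "node2" => "192.168.50.6"
  }
end

# The guard lets plain Ruby load this file too (for example, to inspect
# cluster_nodes); when Vagrant loads it, the constant is always defined.
if defined?(Vagrant)
  Vagrant.configure("2") do |config|
    config.vm.box = "centos/7"

    # Define one box per table entry on the private network.
    cluster_nodes.each do |name, ip|
      config.vm.define name do |node|
        node.vm.network "private_network", ip: ip
      end
    end
  end
end
```

With this shape, adding a fourth box is a one-line change to the table instead of another copied block.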

In this example, we will use all three servers as seed nodes. Seed nodes are the Cassandra nodes that other nodes contact first when they join the cluster. It is a good idea to have two or three seed nodes, since a single seed would be a SPOF (single point of failure).

In this example, we will use the utility cassandra-cloud to configure the seed nodes. We will also use cassandra-cloud to tell Cassandra which address to listen on for clustering (storage network), and which address to listen on for client connections.

cassandra-cloud is an open source utility, written in Go, that helps you install and configure Cassandra for cloud environments based on server ergonomics (number of data centers, number of disks, number of cores, type of disk). It works well in Docker, Heroku, Mesos/Marathon, Kubernetes, EC2, and VirtualBox environments (and similar environments). For example, it could be kicked off as a USER_DATA script in Amazon EC2, and if you change the size of the EC2 instance it can adjust the Cassandra settings accordingly. cassandra-cloud usually runs once, when an instance is first launched, and then never again (unless you redeploy on a larger EC2 instance).

Using Cassandra-Cloud From Vagrant for Cassandra Cluster Nodes

# -*- mode: ruby -*-
# vi: set ft=ruby :

Vagrant.configure("2") do |config|


...

  config.vm.define "node0" do |node0|
...
    node0.vm.network "private_network", ip: "192.168.50.4"

    ### Use Cassandra cloud to configure Cassandra before launching it.
    ### Set the cluster name to test, set the client-address and the cluster-address.
    ### Also setup the Cassandra seed nodes.
    node0.vm.provision "shell", inline: <<-SHELL
                sudo /opt/cassandra/bin/cassandra-cloud -cluster-name test \
                -client-address 192.168.50.4 \
                -cluster-address  192.168.50.4 \
                -cluster-seeds 192.168.50.4,192.168.50.5,192.168.50.6


                /opt/cassandra/bin/cassandra -R
    SHELL
  end

  config.vm.define "node1" do |node1|
...
    node1.vm.provision "shell", inline: <<-SHELL
                sudo /opt/cassandra/bin/cassandra-cloud -cluster-name test \
                -client-address 192.168.50.5 \
                -cluster-address  192.168.50.5 \
                -cluster-seeds 192.168.50.4,192.168.50.5,192.168.50.6

                /opt/cassandra/bin/cassandra -R
    SHELL
  end

  config.vm.define "node2" do |node2|
...
    node2.vm.provision "shell", inline: <<-SHELL
                sudo /opt/cassandra/bin/cassandra-cloud -cluster-name test  \
                -client-address 192.168.50.6 \
                -cluster-address  192.168.50.6 \
                -cluster-seeds 192.168.50.4,192.168.50.5,192.168.50.6


                /opt/cassandra/bin/cassandra -R
    SHELL
  end

...

end


Above you can see that cassandra-cloud is invoked from the provision shell of each individual box to configure Cassandra before it is launched. It sets the name of the Cassandra cluster to test (-cluster-name command line argument), the address to bind the Cassandra client transport to (-client-address), the address to bind the cluster transport (storage transport) to (-cluster-address), and the list of seed nodes (-cluster-seeds).
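Because the three provision blocks differ only in the node's own address, the command line can be assembled by a small helper. This is a sketch under stated assumptions: `cassandra_cloud_command` is a hypothetical function, not part of cassandra-cloud or cassandra-image, and it uses only the four flags that appear in the Vagrantfile above.

```ruby
# Hypothetical helper: build the cassandra-cloud invocation for one node.
# Only flags shown in the Vagrantfile above are used.
def cassandra_cloud_command(ip, seeds, cluster_name: "test")
  [
    "sudo /opt/cassandra/bin/cassandra-cloud",
    "-cluster-name #{cluster_name}",
    "-client-address #{ip}",     # address for client connections
    "-cluster-address #{ip}",    # address for the cluster (storage) transport
    "-cluster-seeds #{seeds.join(',')}"
  ].join(" ")
end

seeds = ["192.168.50.4", "192.168.50.5", "192.168.50.6"]
puts cassandra_cloud_command("192.168.50.5", seeds)
```

Inside a Vagrantfile, the returned string could be passed to the shell provisioner's inline: option, followed by the /opt/cassandra/bin/cassandra -R line that starts the node.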

We could start ten more servers without having to change the seed nodes. New servers would learn the topology of the cluster from one or more of the Cassandra seed nodes specified with -cluster-seeds.

The cassandra-cloud utility can read settings from environment variables, so it works well in Mesos, Docker, Heroku, Kubernetes, or any other 12-factor DevOps environment. In later tutorials in this Cassandra tutorial series, we will use cassandra-cloud with AWS/EC2 when we cover AWS Cassandra. cassandra-cloud can also read properties from a config file and from the command line. Environment variables override config file settings, and command line arguments override environment variables. The cassandra-image project creates a cassandra-cloud config file and config templates that can be modified. The cassandra-cloud utility can set up memory, threads, number of workers, etc. for Cassandra. You can set these values explicitly, or they can be derived from the ergonomics of the server.
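The precedence described above (config file, then environment variables, then command line arguments) can be pictured as three layers merged in order. The sketch below only illustrates that ordering; it is not how cassandra-cloud is implemented, and the `resolve_settings` function and the key names are hypothetical.

```ruby
# Illustration of layered settings. Key names are hypothetical and are
# not cassandra-cloud's real property names.
def resolve_settings(config_file:, environment:, command_line:)
  # Hash#merge lets the argument win on duplicate keys, so merging in
  # this order makes the command line override the environment, which
  # in turn overrides the config file.
  config_file.merge(environment).merge(command_line)
end

settings = resolve_settings(
  config_file:  { "cluster_name" => "dev", "client_address" => "localhost" },
  environment:  { "cluster_name" => "test" },
  command_line: { "client_address" => "192.168.50.4" }
)
p settings
```

Here the cluster name comes from the environment layer and the client address from the command line layer, while any key set only in the config file would pass through unchanged.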

Ok. Let’s test our Cassandra cluster. We will use Vagrant to start the cluster, then log into one of the nodes (node0) and run the Cassandra nodetool command to see which servers are connected to the cluster.

Testing Our Cassandra Cluster Setup with nodetool

$ vagrant up
$ vagrant ssh node0
[vagrant@localhost ~]$ ps -ef | grep cassandra
root     12414     1  2 19:16 ?        00:00:26 java -Xloggc:/opt/cassandra/bin/../logs/gc.log
...

$ /opt/cassandra/bin/nodetool describecluster
Cluster Information:
        Name: test
        Snitch: org.apache.cassandra.locator.DynamicEndpointSnitch
        Partitioner: org.apache.cassandra.dht.Murmur3Partitioner
        Schema versions:
                86afa796-d883-3932-aa73-6b017cef0d19: [192.168.50.4, 192.168.50.5, 192.168.50.6]

We can see that three servers make up the Cassandra cluster, namely 192.168.50.4, 192.168.50.5, and 192.168.50.6. You can see the full Vagrantfile on GitHub.
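The same check can be automated by parsing the "Schema versions" section of the nodetool output. The following is a rough sketch that assumes the output format shown above; `schema_versions` and `cluster_agrees?` are hypothetical helper names, and the sample text is copied from the session above.

```ruby
# Parse the "Schema versions" section of `nodetool describecluster`
# output into a hash of schema version => list of node addresses.
def schema_versions(output)
  versions = {}
  output.each_line do |line|
    # Matches lines of the form "<uuid>: [ip, ip, ip]".
    match = line.match(/^\s*([0-9a-f-]{36}):\s*\[(.*)\]\s*$/)
    next unless match
    versions[match[1]] = match[2].split(",").map(&:strip)
  end
  versions
end

# True when there is exactly one schema version and it is reported by
# exactly the expected set of nodes.
def cluster_agrees?(output, expected_nodes)
  versions = schema_versions(output)
  versions.size == 1 && versions.values.first.sort == expected_nodes.sort
end

SAMPLE_OUTPUT = <<~OUTPUT
  Cluster Information:
          Name: test
          Snitch: org.apache.cassandra.locator.DynamicEndpointSnitch
          Partitioner: org.apache.cassandra.dht.Murmur3Partitioner
          Schema versions:
                  86afa796-d883-3932-aa73-6b017cef0d19: [192.168.50.4, 192.168.50.5, 192.168.50.6]
OUTPUT

puts cluster_agrees?(SAMPLE_OUTPUT, %w[192.168.50.4 192.168.50.5 192.168.50.6]) # prints true
```

A node that has not joined, or that reports a different schema version, makes the check return false, because the single version line would no longer list every expected address.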

More to Come from this Cassandra Cluster tutorial

Check back with us at the Cloudurable blog to find out more about cassandra-image and cassandra-cloud. We have a follow-up article where we set up SSL encryption for Cassandra. We set up SSL for the client transport and the cluster transport. Then we set up SSL for cqlsh so you can connect to your remote instance securely.

About Cloudurable Cassandra Support

Cloudurable provides Cassandra support, Cassandra consulting, and Cassandra training, as well as Cassandra examples such as AWS CloudFormation templates, Packer, and Ansible for common Cassandra DBA and DevOps tasks. We also provide monitoring tools and images (AMI/Docker) to support Cassandra running in production in EC2. Our advanced Cassandra courses teach how to develop, support, and deploy Cassandra to production in AWS EC2 and are geared towards DevOps, architects, and DBAs.
