Setting Up a Cassandra Cluster Through Ansible

Creating a cluster manually is a tedious task. Ansible can automate the task and handle the configuration management for us.

In this post, we will use Ansible to set up an Apache Cassandra database cluster, with AWS EC2 instances as the nodes. Creating a cluster manually is a tedious task: each node has to be configured by hand, and every node must be correctly configured before the cluster is started. With Ansible, we can automate the task and let Ansible handle the configuration management for us.

First of all, create a directory for storing the files and folders related to the playbook. This keeps our work organized and avoids the confusion that can arise from mixing relative and absolute path references when passing variables in the playbook. My directory contains the playbook and the roles it uses.

Steps to Follow While Using AWS

  • Create two or three instances of AWS EC2 that will serve as the nodes in a cluster.
  • Create a security group to allow all connections and add the nodes to that security group.
  • Create an inventory that has the IP addresses of the nodes.
  • Reference the inventory file in Ansible's configuration file, e.g. ansible.cfg.
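
The last two steps might look like the following sketch. It uses Ansible's YAML inventory format. The file name inventory.yml and the 203.0.113.x addresses are placeholders assumed for illustration, not taken from the original setup; the group name aws-webservers matches the hosts line of the playbook in this post.

```yaml
# inventory.yml (hypothetical file name)
# Replace the placeholder addresses with the public IPs of your EC2 instances.
all:
  children:
    aws-webservers:
      hosts:
        203.0.113.10:
        203.0.113.11:
        203.0.113.12:
```

In ansible.cfg, the inventory setting under the [defaults] section would then point at this file, for example inventory = ./inventory.yml. Newer Ansible releases may warn about the hyphen in the group name, although the group still resolves.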

Now, we create a playbook to set up the nodes for us. Following is the playbook:

---
- hosts: aws-webservers
  gather_facts: yes
  remote_user: ec2-user
  become: yes
  vars:
    cluster_name: Test_Cluster
    seeds: 13.xxx.xxx.xxx
  roles:
    - installation

Then, we define the role used by the playbook. The installation role performs the following tasks:

  • Installing a JDK.
  • Copying and unpacking the Apache Cassandra tar.
  • Replacing the default cassandra.yaml with a cassandra.yaml carrying our own configuration, whose details are given below.
  • Ensuring Cassandra is started.

The following is the main.yml task file of the role:

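---
# Copy the JDK 8 RPM from the role's files directory to the node.
- name: Copy Java RPM file
  copy:
    src: jdk-8_linux-x64_bin.rpm
    dest: /tmp/jdk-8_linux-x64_bin.rpm

- name: Install JDK via RPM file with yum
  yum:
    name: /tmp/jdk-8_linux-x64_bin.rpm
    state: present

- name: Copy Cassandra tar
  copy:
    src: apache-cassandra-3.11.2-bin.tar.gz
    dest: /tmp/apache-cassandra-3.11.2-bin.tar.gz

# Extract into the remote user's home; "creates" skips the task on reruns.
- name: Extract Cassandra
  command: tar -xvf /tmp/apache-cassandra-3.11.2-bin.tar.gz
  args:
    chdir: /home/ec2-user
    creates: /home/ec2-user/apache-cassandra-3.11.2

# Render the template over the default configuration file.
- name: Override cassandra.yaml file
  template:
    src: cassandra.yaml
    dest: /home/ec2-user/apache-cassandra-3.11.2/conf/cassandra.yaml

# -R allows running as root (the play uses become). The -f flag is left out
# because it keeps Cassandra in the foreground and the task would never return.
- name: Run Cassandra from bin folder
  command: ./cassandra -R
  args:
    chdir: /home/ec2-user/apache-cassandra-3.11.2/bin/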
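
The role's last goal is ensuring that Cassandra is started. One way to check this, sketched here as an assumption rather than as part of the original role, is to append a wait_for task that blocks until the client port accepts connections. Port 9042 is Cassandra's default native transport port, and the host expression assumes that rpc_address was rendered from the node's default IPv4 fact; adjust both if your cassandra.yaml differs.

```yaml
# Hypothetical extra task for the end of the role's main.yml.
- name: Wait until Cassandra accepts client connections
  wait_for:
    host: "{{ ansible_default_ipv4.address }}"  # assumed to match rpc_address
    port: 9042      # Cassandra's default native transport (CQL) port
    delay: 10       # give the JVM a moment before the first check
    timeout: 300    # fail the play if the node is not up after 5 minutes
```

With a check like this, a node that fails to start stops the play instead of passing silently.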

The cassandra.yaml contains most of the Cassandra configuration such as ports used, file locations, and seed node IP addresses. We need to edit this file on each node, so I have created a template for the file. The template cassandra.yaml uses the following variables:

  • cluster_name: '{{ cluster_name }}' is the name of the cluster and can be anything you choose.
  • seeds: "{{ seeds }}" are the IP addresses of the cluster's seed servers. Seed nodes are known places from which cluster information (such as the list of nodes in the cluster) can be obtained.
  • listen_address: {{ aws-webservers }} is the IP address of the node on which Cassandra listens for internal (Cassandra-to-Cassandra) communication.
  • rpc_address: {{ aws-webservers }} is the IP address of the node on which Cassandra listens for client communication.
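
As a rough illustration of how these four settings could appear in the template, here is a minimal excerpt. It is a sketch, not the author's actual file: the seed_provider layout follows the stock cassandra.yaml shipped with Cassandra 3.11, and the use of the ansible_default_ipv4.address fact for the two address settings is an assumption (it relies on gather_facts: yes, and on EC2 it typically resolves to the instance's private address).

```yaml
# templates/cassandra.yaml (excerpt, hypothetical)
cluster_name: '{{ cluster_name }}'

seed_provider:
  - class_name: org.apache.cassandra.locator.SimpleSeedProvider
    parameters:
      # Comma-separated list of seed node addresses, taken from the play's vars.
      - seeds: "{{ seeds }}"

# Assumption: each node advertises the address Ansible discovered for it.
listen_address: "{{ ansible_default_ipv4.address }}"
rpc_address: "{{ ansible_default_ipv4.address }}"
```

Every other setting from the default cassandra.yaml would stay in the template unchanged.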

Now, we can run the playbook and our cluster will be up and running. We can add more nodes by simply adding them to the host list in the inventory; Ansible will ensure that Cassandra is installed and started on them and that they are connected to the cluster.

Points to Remember

  • The host IP in the inventory should be the public IP of the node.

  • Put the Java RPM package and the Cassandra tar file in the files directory of the role.

  • Use Java 8, as the Cassandra version used here (3.11.2) is not supported on higher versions of Java. On a newer JVM, it fails with the following error:

[0.000s][warning][gc] -Xloggc is deprecated. Will use -Xlog:gc:/home/mmatak/monero/apache-cassandra-3.11.1/logs/gc.log instead.
intx ThreadPriorityPolicy=42 is outside the allowed range [ 0 ... 1 ]
Improperly specified VM option 'ThreadPriorityPolicy=42'
Error: Could not create the Java Virtual Machine.
Error: A fatal exception has occurred. Program will exit.
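
To catch a wrong Java version before Cassandra is launched, the role could include a check along these lines. This is a sketch under assumptions: the task names and the registered variable java_version are made up for illustration, and the check relies on Java 8 reporting a version string that starts with 1.8.

```yaml
# Hypothetical pre-flight tasks; place them after the JDK installation task.
- name: Read the active Java version
  command: java -version
  register: java_version
  changed_when: false   # a read-only check never changes the node

- name: Fail early unless Java 8 is active
  assert:
    that:
      - "'1.8.' in java_version.stderr"   # java -version writes to stderr
    msg: "Cassandra 3.11.2 needs Java 8; found: {{ java_version.stderr }}"
```

If another JDK is the system default, the assert stops the play with a readable message instead of the JVM error shown above.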

Thus, Ansible makes it easy to install distributed systems like Cassandra and spares us the tedium of configuring every node by hand. The full source code, including templates and directory structure, is here.

This article was first published on the Knoldus blog.

Topics:
cassandra, ansible, database, tutorial, aws, configuration management
