Provisioning ElasticSearch Clusters on AWS With Infrastructor


Infrastructor is an open source server provisioning and automation framework. This demo shows how to use it to provision ElasticSearch clusters on AWS.


Infrastructor is a server provisioning and automation framework written in Groovy. This article describes how to automate the provisioning of a small ElasticSearch cluster on AWS with Infrastructor. 

Step 1: Install the Required Dependencies

To run the example below, you need to install a couple of dependencies:

  1. Oracle Java Virtual Machine 8

  2. Infrastructor (the latest version at the time of writing is 0.1.3)

Infrastructor comes with a CLI, which we will use later to run a provisioning script. It can be downloaded from the project release page. To install the Infrastructor CLI, unpack the ZIP and add the bin directory to the PATH environment variable. Check the installation by launching the CLI:

infrastructor version

Step 2: Prepare AWS Instances

Before we start, make sure you have access to your AWS account and create three t2.micro or t2.small instances. These instances will be used to deploy an ElasticSearch cluster. In the example below, I use instances based on Ubuntu 16.04. You can use more powerful instances and increase the instance count if you like.

The easiest way to launch three EC2 instances might be to use the AWS web console, but you can also take a look at Terraform or Ansible to do so. Infrastructor also provides basic facilities to launch and manage EC2 instances, but this feature is still in beta and isn't considered to be stable yet.

Here are the basic requirements for the EC2 instances:

1. Instances have public IP addresses and accept SSH connections on port 22.

2. Instances are able to communicate with each other on TCP ports 9200 and 9300. You may also allow communication between your host and the instances on port 9200, so that you can check the cluster health status via the REST API without logging into one of the remote hosts over SSH.

3. In general, t2.micro instances should be good enough to run the example. However, I would recommend choosing t2.small if you can.

4. You also need to have a pair of AWS access keys (aws_access_key_id and aws_secret_access_key) with EC2 read permissions. Infrastructor will use them to retrieve instance information and build an inventory.

5. Give your instances the 'elasticsearch:true' tag so that Infrastructor updates only those instances.
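Before running the provisioning script, it can help to verify that the required ports are actually reachable. Here is a minimal Python sketch (not part of the original tutorial); the self-test below uses a local listener, but for a real check you would point it at your instances' public IPs and ports 22, 9200, and 9300:

```python
import socket

def port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Quick self-test against a local listener; replace the host/port with your
# instances' public IPs and ports 22, 9200, 9300 for a real connectivity check.
server = socket.socket()
server.bind(("127.0.0.1", 0))   # pick a free ephemeral port
server.listen(1)
addr, port = server.getsockname()
print(port_open(addr, port))    # True: something is listening
server.close()
print(port_open(addr, port, timeout=0.5))  # False: listener is gone
```

If a port that should be open reports closed, revisit the security group rules for your instances before moving on.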

Step 3: Automate ElasticSearch Cluster Provisioning

Here is an Infrastructor provisioning script:

def AWS_ACCESS_KEY_ID = input message: 'AWS_ACCESS_KEY_ID: ', secret: true
def AWS_ACCESS_SECRET_KEY = input message: 'AWS_ACCESS_SECRET_KEY: ', secret: true

// Describe AWS Inventory:
// 1. How to connect to AWS to retrieve a list of instances: awsAccessKeyId, awsAccessSecretKey and awsRegion
// 2. Filter instances by tags: tags
// 3. How to connect to an instance: username and keyfile
awsInventory {
  awsAccessKeyId = AWS_ACCESS_KEY_ID
  awsAccessSecretKey = AWS_ACCESS_SECRET_KEY
  awsRegion = "eu-central-1"
  tags = [elasticsearch: true]
  username = "ubuntu"
  keyfile  = "path/to/your_private_keyfile"
}.provision {

    /*
     * Install docker first using the official repository
     * Do it on 3 nodes in parallel
     */
    task name: "install docker", parallel: 3, actions: {
      shell sudo: true, command: """
            apt-key adv --keyserver hkp://p80.pool.sks-keyservers.net:80 --recv-keys 58118E89F3A912897C070ADBF76221572C52609D
            apt-add-repository 'deb https://apt.dockerproject.org/repo ubuntu-xenial main'
            apt-get update
            apt-get install -y docker-engine
            usermod -aG docker ubuntu
            sysctl -w vm.max_map_count=262144
      """
    }

    /*
     * Run ElasticSearch containers on each host
     * Here 'node' is a variable which represents the currently provisioned EC2 instance
     * The 'nodes' variable is a set of all nodes in the inventory
     */
    task name: "run elasticsearch nodes", actions: {
      // an ElasticSearch 5.x image is assumed below; adjust the tag to the version you need
      shell """
        docker rm -f \$(docker ps -aq) || true
        docker run -d -p 9200:9200 -p 9300:9300 \
        -e "node.name=${node.name}" \
        -e "node.master=true" \
        -e "network.publish_host=${node.privateIp}" \
        -e "discovery.zen.ping.unicast.hosts=${nodes*.privateIp.join(',')}" \
        -e "ES_JAVA_OPTS=-Xmx512m -Xms512m" \
        docker.elastic.co/elasticsearch/elasticsearch:5.5.2
      """
    }
}
Save the script as provisioning.groovy. To run the script with the Infrastructor CLI, type:

infrastructor run -f provisioning.groovy

After some time, you should see a message that the execution has completed successfully. Then check the result by calling the _cluster/health endpoint of ElasticSearch:

curl http://elastic:changeme@ES_NODE_PUBLIC_IP_HERE:9200/_cluster/health?pretty

You should see something like this:

    "cluster_name" : "docker-cluster",
    "status" : "green",
    "timed_out" : false,
    "number_of_nodes" : 3,
    "number_of_data_nodes" : 3,
    "active_primary_shards" : 2,
    "active_shards" : 4,
    "relocating_shards" : 0,
    "initializing_shards" : 0,
    "unassigned_shards" : 0,
    "delayed_unassigned_shards" : 0,
    "number_of_pending_tasks" : 0,
    "number_of_in_flight_fetch" : 0,
    "task_max_waiting_in_queue_millis" : 0,
    "active_shards_percent_as_number" : 100.0

This was a small demonstration of how to manage configurations and provision AWS nodes with Infrastructor. I hope you find it simple and neat! By the way, Infrastructor is an open source project and it is looking forward to your contributions, including pull requests, feature requests, and bug reports. Thank you for your interest in Infrastructor!



Opinions expressed by DZone contributors are their own.
