Deploying Elasticsearch 6.x on Azure With Terraform
Terraform is my go-to tool for repeatable and easy infrastructure deployments. I've previously shared how I deploy Elasticsearch on AWS with Terraform and Packer. Since posting that, I have used it to deploy many clusters, and quite a few others have picked it up as well.
Our offerings at BigData Boutique are cloud-agnostic, and as such, we also help projects deployed on other clouds. Today, we will be looking at deploying a full Elasticsearch cluster using best practices end-to-end on Microsoft Azure.
You can find all relevant code and documentation here. This entire Terraforming project supports deploying both Elasticsearch 5.x and 6.x clusters.
Feel free to share your experience, report issues, and request features here.
Creating Immutable Images With Packer
To enable quickly launching machines on the cloud without waiting for lengthy installs on provisioning, and also to avoid snowflake servers, I opted for generating images of the servers we deploy and then just provisioning machines with those images loaded into them. This is a general practice I use and here, it is really easy to see how it makes a difference.
" Packer is a tool for creating machine and container images for multiple platforms from a single source configuration."
In other words, you can easily define the steps to execute on a base image and then run it everywhere to create images that you can later deploy.
In my solution, I created two images. One is an image for an Elasticsearch node that is installed on the latest Ubuntu; the second is an image with Kibana, Grafana, and Cerebro installed that is based on the first image and will be later used as an external and internal gateway to the cluster.
More details and instructions for running this can be found in the README. You need to create those images in order to proceed to the next step.
Deploying an Elasticsearch Cluster With Terraform
Terraform is great at describing complex infrastructure easily and in a repeatable way. I find Terraform so much easier to use for deploying and amending infrastructure — especially on Azure, which, for many people, tends to be more UI-oriented.
Once you have created the machine images with Packer, all that is left for you to do is edit some configuration (e.g. machine sizes, number of nodes, Azure location, SSH keys to use) and you are set to go.
Running terraform plan and then terraform apply will create the cluster for you, using scale sets and load balancing for the client nodes along with the necessary network interfaces. Everything will be set up using best practices, although your mileage may vary and you might want to fork my work and adapt it to your use case.
The recommended configuration is to have exactly three master nodes, at least two data nodes, and at least one client node (and it's easy to add more to ensure 100% uptime). This is supported out-of-the-box. We also support a single-node mode, mostly for experimentation, but it also might be usable for very small deployments.
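The reason for an odd number of master nodes is quorum. In Elasticsearch 5.x and 6.x, the discovery.zen.minimum_master_nodes setting is meant to be a majority of the master-eligible nodes, so that a network partition cannot produce two independent masters. Here is a minimal Python sketch of that arithmetic; the function name is mine and is not part of the Terraform project.

```python
def minimum_master_nodes(master_eligible: int) -> int:
    """Return the majority quorum, floor(n / 2) + 1, for n master-eligible nodes."""
    if master_eligible < 1:
        raise ValueError("need at least one master-eligible node")
    return master_eligible // 2 + 1


if __name__ == "__main__":
    # Compare a few cluster sizes, including the recommended three masters.
    for n in (1, 2, 3, 5):
        print(n, "master-eligible node(s) -> quorum of", minimum_master_nodes(n))
```

With three masters the quorum is two, so the cluster can lose one master and still elect a leader. With two masters the quorum is also two, which gives no tolerance at all.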
Elastic's X-Pack is deployed on the cluster out of the box with monitoring enabled but security disabled — you should enable and set up X-Pack Security for any production deployment.
Full details and instructions are here.
Client Nodes With Kibana, Grafana, and Cerebro
Once deployed, the cluster is fully configured and is accessible via the deployed client nodes. The client nodes also expose Kibana instances and a Cerebro UI on top of the cluster, so everything is fully visible and ready for use. There is also Grafana installed for those who prefer using Grafana dashboards on top of Elasticsearch.
Those client nodes are also the ones your apps need to talk to (internally, of course). They are password-protected (the password is automatically generated and can be retrieved using terraform output). You might prefer to drop the password and public IP access altogether and rely on your vnet and private IPs instead. I discussed security concerns in this article before.
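If you want to pick up the generated password from a script rather than by eye, terraform output -json prints every output as a JSON object with a "value" field. The following is a rough Python sketch of reading it. The output name es_password is a placeholder I made up, so check the project's outputs for the real name.

```python
import json
import subprocess


def parse_outputs(raw_json: str) -> dict:
    """Flatten the JSON printed by `terraform output -json` into {name: value}."""
    outputs = json.loads(raw_json)
    return {name: entry["value"] for name, entry in outputs.items()}


def read_outputs(workdir: str = ".") -> dict:
    """Run `terraform output -json` in workdir (requires terraform on PATH)."""
    result = subprocess.run(
        ["terraform", "output", "-json"],
        cwd=workdir,
        check=True,
        capture_output=True,
        text=True,
    )
    return parse_outputs(result.stdout)


if __name__ == "__main__":
    # Canned sample so the sketch runs without a Terraform state.
    # "es_password" is a hypothetical output name, not taken from the project.
    sample = '{"es_password": {"sensitive": false, "type": "string", "value": "example"}}'
    print(parse_outputs(sample)["es_password"])
```

In a real deployment directory you would call read_outputs() instead of parsing the canned sample.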
Note: The first time Kibana is initialized, it takes about ten minutes to become available while it performs some one-off startup work.
Elastic Discovery on Azure
Unfortunately, the story of cluster discovery on Azure is quite bad. There is an Azure "Classic" discovery plugin that has been deprecated since circa 5.0, and Elastic has yet to release a properly working replacement. There is a PR for an Azure RM discovery plugin, if you want to track it, but it has been open for over a year without any real progress.
A discovery plugin on a public cloud is important because it takes a lot of complexity off your hands and manages the initial cluster nodes discovery using the available cloud APIs.
Having none available, I defaulted to using the vnet and naming conventions. Another viable option is file-based discovery: a file listing your cluster's nodes that you can upload to the images and use as a seed.
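To make the two options concrete, here is a small Python sketch that derives seed host names from a naming convention and renders them in the one-entry-per-line format that file-based discovery reads from unicast_hosts.txt. The "<prefix>-<index>" convention and the es-master prefix are made up for illustration and are not necessarily what the project uses. Where the file lives and how the provider is enabled depends on your Elasticsearch version, so check the documentation for yours.

```python
def seed_hosts(prefix: str, count: int, port: int = 9300) -> list:
    """Build host:port entries from a hypothetical '<prefix>-<index>' naming convention.

    9300 is the default Elasticsearch transport port.
    """
    return [f"{prefix}-{i}:{port}" for i in range(count)]


def render_unicast_hosts(hosts: list) -> str:
    """Render one host[:port] entry per line, ending with a newline."""
    return "\n".join(hosts) + "\n"


if __name__ == "__main__":
    # Three master nodes, matching the recommended configuration.
    print(render_unicast_hosts(seed_hosts("es-master", 3)), end="")
```

The same list could just as well be joined with commas and used for discovery.zen.ping.unicast.hosts in elasticsearch.yml.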
The Azure repository plugin is installed on the cluster and ready to be used for index snapshots and (should you ever need) a restore. Official documentation is available here.
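Before taking snapshots you need to register a repository of type azure through the snapshot API. The Python sketch below only builds that request, so it runs without a cluster. The URL, repository name, and container name are placeholders. It also leaves out authentication, and it assumes the storage account credentials have already been configured on the nodes, which the plugin requires.

```python
import json
import urllib.request


def build_repo_request(es_url: str, repo: str, container: str) -> urllib.request.Request:
    """Build a PUT _snapshot/<repo> request for an Azure-backed repository."""
    body = {"type": "azure", "settings": {"container": container}}
    return urllib.request.Request(
        url=f"{es_url.rstrip('/')}/_snapshot/{repo}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


if __name__ == "__main__":
    # All three arguments are hypothetical values for illustration.
    req = build_repo_request("http://localhost:9200", "azure_backups", "es-snapshots")
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
    # To actually send it against a reachable cluster:
    # urllib.request.urlopen(req)
```

Once the repository is registered, a snapshot is created with a PUT to _snapshot/<repo>/<snapshot_name>.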
Published at DZone with permission of Itamar Syn-hershko, DZone MVB. See the original article here.