Setting Up a ScyllaDB Cluster on AWS Using Terraform
A step-by-step guide to setting up, deploying, managing, and scaling ScyllaDB clusters on AWS using Terraform.
In this article, I present an example of a simple and quick installation of ScyllaDB in the AWS cloud using Terraform.
Initially, I intended to create a ScyllaDB AMI image using HashiCorp Packer. However, I later discovered that official images are available, allowing ScyllaDB to be easily configured during instance initialization via user data.
In fact, user data can define all parameters supported in scylla.yaml. Additional options and examples can be found in the scylla-machine-image GitHub repository.
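For instance, a user data document could enable authentication alongside the options used later in this article. This is a hedged sketch: `authenticator: PasswordAuthenticator` is a standard scylla.yaml option added purely for illustration, not something this setup requires; verify any option you add against the scylla.yaml and scylla-machine-image documentation.

```yaml
# User data in the scylla-machine-image format: keys under scylla_yaml are
# passed through into /etc/scylla/scylla.yaml on first boot.
scylla_yaml:
  cluster_name: test-cluster
  authenticator: PasswordAuthenticator   # illustrative option; confirm it suits your setup
start_scylla_on_first_boot: true
```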
What else should you know? For ScyllaDB to automatically configure and start, a supported instance type must be used. A list of such instance types can be found here: ScyllaDB System Requirements for AWS. In our example, we'll use the i4i.large type, as it is the cheapest among the supported types.
Assumptions
- A single seed node is sufficient for the setup.
- Hosts are publicly accessible with restricted access from a specific IP address (a static public IP is required).
Terraform Configuration Example
Following best practices, the Terraform code is divided into multiple files in a single directory.
Variables File (variables.tf)
variable "scylladb_version" {
  type        = string
  default     = "6.2.1"
  description = "The version of ScyllaDB to install."
}

variable "your_public_network" {
  type        = string
  default     = "0.0.0.0/0"
  description = "Your public static IP address or your provider network."
}

variable "instance_type" {
  type        = string
  default     = "i4i.large"
  description = "The AWS instance type."
}

variable "number_of_regular_hosts" {
  type        = number
  default     = 2
  description = "The number of regular (non-seed) hosts in the cluster."
}

variable "ssh_key_name" {
  type        = string
  default     = "my_ssh_key"
  description = "The name of your public SSH key uploaded to AWS."
}
This file contains the definitions of the variables used in main.tf. We'll discuss them later.
Main Configuration File (main.tf)
terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0"
    }
  }
}

# Configure the AWS Provider
provider "aws" {
  region = "eu-west-1"
}
data "aws_ami" "scylladb_ami" {
  filter {
    name   = "name"
    values = ["ScyllaDB ${var.scylladb_version}"]
  }
}
resource "aws_security_group" "scylladb_all" {
  name        = "scylladb_all"
  description = "Will allow all inbound traffic from your public IP"

  tags = {
    Name = "ScyllaDB"
  }
}
resource "aws_vpc_security_group_ingress_rule" "allow_all_inbound_traffic_ipv4" {
  security_group_id = aws_security_group.scylladb_all.id
  cidr_ipv4         = var.your_public_network
  ip_protocol       = "-1" # semantically equivalent to all ports
}

resource "aws_vpc_security_group_ingress_rule" "allow_all_internal_traffic_ipv4" {
  security_group_id            = aws_security_group.scylladb_all.id
  referenced_security_group_id = aws_security_group.scylladb_all.id
  ip_protocol                  = "-1" # semantically equivalent to all ports
}

resource "aws_vpc_security_group_egress_rule" "allow_all_traffic_ipv4" {
  security_group_id = aws_security_group.scylladb_all.id
  cidr_ipv4         = "0.0.0.0/0"
  ip_protocol       = "-1" # semantically equivalent to all ports
}
resource "aws_instance" "scylladb_seed" {
  ami                    = data.aws_ami.scylladb_ami.id
  instance_type          = var.instance_type
  vpc_security_group_ids = [aws_security_group.scylladb_all.id]
  key_name               = var.ssh_key_name

  user_data = <<EOF
scylla_yaml:
  cluster_name: test-cluster
  experimental: true
start_scylla_on_first_boot: true
EOF

  tags = {
    Name = "ScyllaDB seed"
  }
}
resource "aws_instance" "scylladb_host" {
  ami                    = data.aws_ami.scylladb_ami.id
  instance_type          = var.instance_type
  vpc_security_group_ids = [aws_security_group.scylladb_all.id]
  key_name               = var.ssh_key_name

  user_data = <<EOF
scylla_yaml:
  cluster_name: test-cluster
  experimental: true
  seed_provider:
    - class_name: org.apache.cassandra.locator.SimpleSeedProvider
      parameters:
        - seeds: ${aws_instance.scylladb_seed.private_ip}
start_scylla_on_first_boot: true
EOF

  tags = {
    Name = "ScyllaDB host"
  }

  count = var.number_of_regular_hosts
}
The main.tf file describes the infrastructure resources to be created.
File Describing Outputs (outputs.tf)
output "scylladb_seed_public_ip" {
  value       = aws_instance.scylladb_seed.public_ip
  description = "Public IP address of the ScyllaDB seed host."
}

output "scylladb_host_public_ip" {
  value       = aws_instance.scylladb_host[*].public_ip
  description = "Public IP addresses of ScyllaDB regular hosts."
}
This file specifies the data to be output at the end. In our case, we want to know the IP addresses of the hosts so we can connect to them.
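Once the cluster is up, these outputs can be fed straight into other commands. As a sketch, assuming Terraform 0.15 or later (which provides the -raw flag for unquoted output):

```shell
# Connect to the seed host using the output value directly.
ssh scyllaadm@$(terraform output -raw scylladb_seed_public_ip)
```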
You can also find this code on GitHub: ScyllaDB Terraform Example.
How to Use This Terraform Configuration File
First, you need to install Terraform and the AWS CLI.
Terraform installation differs across operating systems. Details can be found in the official documentation: Terraform Installation Guide.
AWS CLI v1 is a Python package that can be installed via pip in the same way on any operating system where Python is available (AWS CLI v2 ships its own platform-specific installers instead). Detailed instructions are available in the official documentation: AWS CLI on PyPI.
The next step is to set up security credentials for AWS CLI. Security credentials can be created using the IAM service in AWS. We assume that you already have them.
To enable AWS CLI and, consequently, the AWS provider for Terraform to use your credentials, you need to configure them using the following command:
aws configure
There are other ways to pass credentials to Terraform. More details can be found here: AWS Provider Authentication.
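One common alternative to aws configure is supplying credentials through environment variables, which both the AWS CLI and the Terraform AWS provider read. The key values below are the placeholder examples from the AWS documentation; substitute your own IAM access key pair.

```shell
# Placeholder credentials -- replace with your own IAM access key pair.
export AWS_ACCESS_KEY_ID="AKIAIOSFODNN7EXAMPLE"
export AWS_SECRET_ACCESS_KEY="wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY"
# Matches the region set in the provider block above.
export AWS_DEFAULT_REGION="eu-west-1"
```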
Understanding the Variables
Here’s a breakdown of all the variables:
- scylladb_version: The version of ScyllaDB, used in the image name to search for the AMI.
- your_public_network: The external IP address from which access to hosts will be allowed. It should be in CIDR format (e.g., /32 for a single address).
- instance_type: The type of AWS instance. You must use one of the recommended types mentioned above.
- number_of_regular_hosts: The number of hosts in the cluster, excluding the seed host.
- ssh_key_name: The name of the preloaded public SSH key that will be added to the hosts.
Although variables can be overridden directly in the variables.tf file, it's better to use a separate file for this purpose. This can be any file with a .tfvars extension, such as terraform.tfvars, located in the same directory as the Terraform configuration files.
In such a file, variables are written in the format <NAME> = <VALUE>. For example:
ssh_key_name = "KEYNAME"
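A fuller terraform.tfvars might override several variables at once. The IP address below is a placeholder from the documentation range 203.0.113.0/24; use your own static address.

```hcl
scylladb_version        = "6.2.1"
your_public_network     = "203.0.113.10/32"  # placeholder; your static public IP in CIDR form
instance_type           = "i4i.large"
number_of_regular_hosts = 2
ssh_key_name            = "my_ssh_key"
```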
How to Apply a Terraform Configuration
To create the cluster, navigate to the directory containing the code and run the following commands:
Initialize the AWS Provider:
terraform init
Example output:
Initializing the backend...
Initializing provider plugins...
- Finding hashicorp/aws versions matching "~> 5.0"...
- Installing hashicorp/aws v5.82.2...
- Installed hashicorp/aws v5.82.2 (signed by HashiCorp)
Terraform has created a lock file .terraform.lock.hcl to record the provider
selections it made above. Include this file in your version control repository
so that Terraform can guarantee to make the same selections by default when
you run "terraform init" in the future.
Terraform has been successfully initialized!
Apply the Configuration:
terraform apply
The command output will show that some parameters were taken from the configuration provided to Terraform, while others will be added automatically after applying the changes. Confirm the application by typing yes.
After Terraform completes its work, it will output the public IP addresses of the hosts, which you can use to connect to ScyllaDB.
Verifying Cluster Deployment
To verify that the ScyllaDB cluster was successfully deployed, connect to it via SSH using the following command:
ssh scyllaadm@<ip-address>
Once connected, you’ll immediately see the list of hosts in the cluster. Alternatively, you can run the following command:
nodetool status
Example output:
Datacenter: eu-west
===================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns Host ID Rack
UN 172.31.39.205 489.02 KB 256 ? ac814131-bac5-488b-b7f8-b7201a8dbb23 1b
UN 172.31.42.145 466.77 KB 256 ? 0bd8a16f-26d3-4665-878c-74b992b91a70 1b
UN 172.31.46.42 526.42 KB 256 ? 3eb8966e-b42b-48c3-9938-7f24b1a6b097 1b
All hosts must have UN (Up Normal) in the first column.
Adding Hosts to the Cluster
ScyllaDB allows you to easily add hosts to clusters (removing a node requires decommissioning it with nodetool first, which this configuration does not automate). Terraform, in turn, saves the state of the previous run and remembers, for example, the IP address of the seed host. Therefore, you can simply increase the number of hosts in the variable and run Terraform again. The new host will automatically join the cluster.
Add the following line to your variables file:
number_of_regular_hosts = 3
In this example, it will add one more host to the cluster, but you can set the variable to any number greater than 2.
Run terraform apply again. Then, log in to the seed host and verify that the list of hosts has grown.
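If you prefer not to edit the tfvars file, the same override can be passed on the command line with Terraform's -var flag:

```shell
# One-off override of a single variable for this run only.
terraform apply -var="number_of_regular_hosts=3"
```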
Managing Multiple Clusters
You can deploy multiple clusters using a single Terraform configuration by using workspaces.
Create a new workspace:
terraform workspace new cluster_2
cluster_2 is just an example name for a workspace; it can be anything.
Deploy the New Cluster:
terraform apply
The original cluster will remain in the workspace named default.
List workspaces:
terraform workspace list
Switch between workspaces:
terraform workspace select default
Delete a workspace:
terraform workspace delete cluster_2
Destroying the Cluster
To delete a ScyllaDB cluster and all associated entities, use the following command in the desired workspace:
terraform destroy
This will clean up all the resources created by Terraform for that cluster.
Conclusion
With this guide, you can confidently set up, manage, and expand ScyllaDB clusters on AWS using Terraform. The step-by-step instructions provided ensure a seamless deployment experience, allowing you to focus on your application’s performance and scalability.
Additionally, the flexibility of Terraform empowers you to easily adapt and scale your cluster as needed, whether it’s adding new hosts or managing multiple clusters with workspaces. For further details and advanced configurations, consult the official documentation for Terraform and ScyllaDB, which offer a wealth of resources to help you maximize the potential of your infrastructure.