DZone
Thanks for visiting DZone today,
Edit Profile
  • Manage Email Subscriptions
  • How to Post to DZone
  • Article Submission Guidelines
Sign Out View Profile
  • Post an Article
  • Manage My Drafts
Over 2 million developers have joined DZone.
Log In / Join
Refcards Trend Reports Events Over 2 million developers have joined DZone. Join Today! Thanks for visiting DZone today,
Edit Profile Manage Email Subscriptions Moderation Admin Console How to Post to DZone Article Submission Guidelines
View Profile
Sign Out
Refcards
Trend Reports
Events
Zones
Culture and Methodologies Agile Career Development Methodologies Team Management
Data Engineering AI/ML Big Data Data Databases IoT
Software Design and Architecture Cloud Architecture Containers Integration Microservices Performance Security
Coding Frameworks Java JavaScript Languages Tools
Testing, Deployment, and Maintenance Deployment DevOps and CI/CD Maintenance Monitoring and Observability Testing, Tools, and Frameworks
Culture and Methodologies
Agile Career Development Methodologies Team Management
Data Engineering
AI/ML Big Data Data Databases IoT
Software Design and Architecture
Cloud Architecture Containers Integration Microservices Performance Security
Coding
Frameworks Java JavaScript Languages Tools
Testing, Deployment, and Maintenance
Deployment DevOps and CI/CD Maintenance Monitoring and Observability Testing, Tools, and Frameworks
What's in store for DevOps in 2023? Hear from the experts in our "DZone 2023 Preview: DevOps Edition" on Fri, Jan 27!
Save your seat
  1. DZone
  2. Software Design and Architecture
  3. Cloud Architecture
  4. Setting Up RecoverX: Cloud-Native Data Protection

Setting Up RecoverX: Cloud-Native Data Protection

This step-by-step guide walks you through utilizing Datos IO's RecoverX to back up and protect your data — in this case, a Cassandra Database on Google Cloud Platform.

Jeannie Liou user avatar by
Jeannie Liou
·
Oct. 12, 16 · Tutorial
Like (3)
Save
Tweet
Share
2.14K Views

Join the DZone community and get the full member experience.

Join For Free

This tutorial shows you how to set up Datos IO RecoverX — cloud-native data protection software — on Google Cloud Platform. Follow this tutorial to deploy and configure Datos IO RecoverX to protect your Cassandra (Apache or DataStax) database cluster. This tutorial assumes that your Cassandra database is already deployed and is fully operational.

You generally deploy RecoverX in the same project as the Cassandra database that you need to protect. If you are deploying RecoverX in a different project, you need to provide SSH connections from all RecoverX nodes to the compute nodes on which Cassandra is deployed. RecoverX connects to the Cassandra nodes through SSH connections, and uses standard Cassandra APIs to take snapshots, and stream the snapshots in parallel to Google Cloud Storage. After the data is copied, RecoverX processes the data to create a single, golden copy of the database that is cluster consistent and has no replicas. The following diagram shows a representative deployment:

Image title

Objectives

  • Provision infrastructure to deploy RecoverX software.
  • Configure the Cassandra database.
  • Configure the RecoverX compute nodes.
  • Deploy RecoverX and connect from a remote location.

Costs

  • Datos IO RecoverX is scale-out software that is deployed in a clustered configuration on three compute engine instances, type n1-standard-8.
  • In addition, you need to provision a cloud storage bucket for storing the backup data. The capacity of cloud storage required will depend on your database size, change rate, retention time, and other factors.
  • Use the Pricing Calculator to generate a cost estimate based on your projected usage.
  • In addition to the cloud platform infrastructure costs, the RecoverX software is licensed directly through Datos IO, based on the physical size of the database that needs to be protected, in terabytes.
  • Contact info@datos.io for any pricing related questions.

Before You Begin

  1. Select or create a cloud platform console project.
  2. Enable billing for your project.

Creating a Compute Engine Instance and Cloud Storage Bucket

The compute nodes for Cassandra and the compute nodes for RecoverX software need to have R+W permissions to the cloud storage bucket that is used as secondary storage. Follow the steps below to create RecoverX compute instances and provision secondary storage with correct permissions:

  1. Configure an IAM role or ACL for a service account to allow R+W access to the cloud storage bucket, as listed in the Access Control Options. You can use the same service account that was used to create Compute Engine instances for Cassandra nodes. Use the role assignment: Editor.
  2. Create three Compute Engine instances of the type n1-standard-8 using this service account.
  3. Select CentOS 6.
  4. In the Firewall section, select Allow HTTPS traffic.
  5. Create an SSD disk (blank disk) with a capacity of at least 140 GB for each Compute Engine instance.
  6. Select an appropriate network.
  7. Run the following command to format the filesystem and mount the volume. Replace [VOLUME_NAME] and[RECOVERX_NODE_NAME] with the appropriate values:
    gcloud compute ssh [RECOVERX_NODE_NAME] 'sudo mkfs -t ext4 [VOLUME_NAME]; sudo mkdir /home sudo mount [VOLUME_NAME] /home'
  8. Create a cloud storage bucket using this service account.

Configuring the Cassandra Cluster

This is the database cluster that you want to protect using RecoverX.

Creating a Datos IO User on Wach Cassandra Node

  1. Create a Datos IO user account, such as datos_db_user, on each Cassandra node. This account is used for running commands to extract data from the source cluster. This user should have the same group ID (GID) as the Cassandra user. The following command requires that Cassandra user is a part of the Cassandra group, and it addsdatos_db_user to the same group.
    gcloud compute ssh [CASSANDRA_NODE_INSTANCE_NAME] 'sudo useradd -g cassandra -m datos_db_user'
  2. Configure authentication for the user you created by using one of the methods such as:
    • Username and password.
    • Username and SSH key with passphrase.
    • Username and SSH access key.
    You will need this information while adding the data source to RecoverX environment.
  3. Give datos_db_user write permission to its home directory /home/datos_db_user on all Cassandra nodes. Replace [CASSANDRA_NODE_INSTANCE_NAME] with the name of your instance:
    gcloud compute ssh [CASSANDRA_NODE_INSTANCE_NAME] 'sudo chmod -R u+w /home/datos_db_user'
  4. Give read and execute permissions to the $CASSANDRA data directory and its parent directory on all Cassandra nodes. Replace [CASSANDRA_NODE_INSTANCE_NAME] with the name of your instance:
    gcloud compute ssh [CASSANDRA_NODE_INSTANCE_NAME] 'sudo chmod -R g+rx /var/lib/cassandra; sudo chmod -R g+rx /var/lib/cassandra/data'

Configuring Maximum SSH Sessions

For each node, edit the file /etc/ssh/sshd_config to set sshd parameters MaxSessions to 500 and MaxStartupsto "500:1:500". You can verify the values of these parameters by using the commands below:

/usr/sbin/sshd -T | grep -i maxs
maxsessions 500
maxstartups 500:1:500

Setting up the Datos IO RecoverX cluster

Setting Up Network Ports

Open the following ports:

NetworkProtocol:PortPurpose
From External - To RecoverXTCP:9090Access Datos IO UI/API.
RecoverX nodes (network)TCP:2888Internal distributed software communication.
TCP:3888Internal distributed software communication.
TCP:2181Internal distributed software communication.
TCP:15039RecoverX Metadata database.
TCP:5672Internal messaging/communications (RabbitMQ).
SSH:22For RecoverX nodes to communicate with each other.
From RecoverX - To Cassandra nodesTCP:9042Cassandra driver port.
TCP:7199Cassandra JMX port.
SSH:22For RecoverX to communicate with Cassandra database nodes.
Cassandra nodes (network)TCP:7000Cassandra storage port.
TCP:9160Cassandra RPC port.

Creating a Datos IO User on Each RecoverX Node

Create a Datos IO user account, such as datos_user, on each RecoverX node. This user should have the same group ID (GID) as the datos_db_user. For example, if the datos_db_user has GID 1001, datos_user must have GID 1001.

  1. To get the GID, run:
    id datos_db_user
  2. Run the following command using the GID you retrieved:
    sudo groupadd -g [GID] cassandra
  3. Add the user:
    sudo useradd -g cassandra -m datos_user -d /home/datos_user

This user should have:

  • The home directory on the non-root volume that was previously created.
  • Passwordless SSH access to each RecoverX node in the cluster, including itself.

Providing sudo Privileges

The datos_user on the RecoverX nodes must have sudo privileges for the following commands:

/bin/cp
/sbin/chkconfig

To add this privilege:

  1. Sign in as the root user.
  2. Use the visudo command to edit the configuration file for sudo access.
  3. Append the following line to the file:
    datos_user ALL=NOPASSWD: /sbin/chkconfig, /bin/cp

Configuring RecoverX Nodes

Make the following changes in the limits.conf file of all RecoverX nodes.

  1. Edit the nproc and nofiles parameters in /etc/security/limits.conf to match the following:
    • hard nproc unlimited
    • soft nproc unlimited
    • hard nofile 64000
    • soft nofile 64000
  2. Edit the nproc parameter/etc/security/limits.d/90-nproc.conf to match the following:
    • hard nproc unlimited
    • soft nproc unlimited
  3. Verify the changes above by running the following command:
    ulimit -a
  4. Make sure that the /tmp directory has at least 2 GB empty space on each RecoverX node.

Verifying Host Name Entry on RecoverX Nodes

Ensure that the short name and FQDN of the RecoverX node is included in its /etc/hosts file. For example, a RecoverX node with the hostname datosserver.dom.local and an IP address of 192.168.2.4 would have a hosts file listing similar to the following:

cat /etc/hosts
127.0.0.1 localhost.localdomain localhost
192.168.2.4 datosserver datosserver.dom.local
::1 localhost6.localdomain6 localhost6

Installing RecoverX Software

Follow the steps below to install RecoverX software. Be sure to use the datos_user name that you created earlier.

  1. Copy the RecoverX compressed tarball to one of the compute nodes:
    gcloud compute copy-files datos_[VERSION].tar.gz <recoverx_node_name>:~gcloud compute ssh 'sudo mv datos_[VERSION].tar.gz /home/datos_user; sudo chown datos_user /home/datos_user/datos_[VERSION].tar.gz'
  2. Uncompress the tarball on the target node. A top-level directory called datos_[VERSION] should appear:
    tar -zxf datos_[VERSION].tar.gz
  3. Switch to the uncompressed Datos IO directory.
    cd datos_[VERSION]
  4. Create the target installation directory on all nodes where the RecoverX software will operate.
  5. Install the software in the target installation directory. Replace the [IP_ADDRESS#] values with the internal IP addresses of the instances:
    ./install_datos --ip-address [IP_ADDRESS1] [IP_ADDRESS2] [IP_ADDRESS3] --target-dir /home/datos_user/datosinstall
    Note: The first IP address must belong to the local node; additional IP addresses are assumed to belong to remainder RecoverX nodes. The script installs the software on all of the nodes that make up the RecoverX distributed software environment. It generally takes about 10-20 minutes for the installation to finish.
  6. Upon successful installation, a message similar to the following should appear:
    <timestamp> : INFO: Completed installation of datos software (version <version>) to location /home/datos_user/datosinstall
    Note: Ignore the two warnings regarding fusermount.

Accessing RecoverX Software

RecoverX has a consumer-grade graphical user interface accessible through a web-based console. To log into the console, follow these steps:

  1. Use a web browser to connect to the console with the following URL. Replace [IP_ADDRESS] with the IP address of the node where RecoverX is deployed: https://[IP_ADDRESS]:9090/#/dashboard.
  2. Connecting to the UI requires the user to connect to the public IP address of the primary RecoverX node. To identify the primary RecoverX node, run the CLI command datos_status located in the installation folder of any RecoverX node.
  3. At the login screen, enter the default username "admin" and default password "admin." On successful login, the home page should appear.
  4. After logging in for the first time, change the password for the administrator account by clicking the Settings menu and choosing CHANGE PASSWORD.

Configuring RecoverX

After you have logged into the GUI, use the CONFIGURATION panels and follow the instructions on these panels to:

  • Add the configured Cassandra database cluster.
    • This step adds the Cassandra database to Datos IO RecoverX software.
    • You will need to enter internal IP address of one of the Cassandra nodes, CQLSH port (default: 9042) and choose the SSH authentication method.
    • You will need to install GNU awk.
  • Add the provisioned Cloud Storage bucket.
    • This bucket is where Datos IO RecoverX will store Cassandra backups.
  • Create backup policies by using Versioning panel.

Cleaning Up

To avoid incurring charges to your Google Cloud Platform account for the resources used in this tutorial...

Delete the Project

The easiest way to eliminate billing is to delete the project you created for the tutorial.

To delete the project (Note: If you are exploring multiple tutorials and quickstarts, reusing projects instead of deleting them prevents you from exceeding project quota limits.):

  1. In the cloud platform console, go to the Projects page.
  2. Click the trash can icon to the right of the project name.
    Warning: Deleting a project has the following consequences:
    • If you used an existing project, you'll also delete any other work you've done in the project.
    • You can't reuse the project ID of a deleted project. If you created a custom project ID that you plan to use in the future, you should delete the resources inside the project instead. This ensures that URLs that use the project ID, such as an appspot.com URL, remain available.

Delete Your Compute Engine Instances

To delete a compute engine instance:

  1. In the Cloud Platform Console, go to the VM Instances page.
  2. Click the checkbox next to the instance you want to delete.
  3. Click the Delete button at the top of the page to delete the instance.

Delete your Cloud Storage Bucket

To delete a Cloud Storage bucket:

  1. In the Cloud Platform Console, go to the Cloud Storage browser.
  2. Click the checkbox next to the bucket you want to delete.
  3. Click the Delete button at the top of the page to delete the bucket.
Data (computing) Database Cloud storage Cloud Software cluster Directory Command (computing)

Opinions expressed by DZone contributors are their own.

Popular on DZone

  • How to Check Docker Images for Vulnerabilities
  • Iptables Basic Commands for Novice
  • Visual Network Mapping Your K8s Clusters To Assess Performance
  • How to Use MQTT in Java

Comments

Partner Resources

X

ABOUT US

  • About DZone
  • Send feedback
  • Careers
  • Sitemap

ADVERTISE

  • Advertise with DZone

CONTRIBUTE ON DZONE

  • Article Submission Guidelines
  • Become a Contributor
  • Visit the Writers' Zone

LEGAL

  • Terms of Service
  • Privacy Policy

CONTACT US

  • 600 Park Offices Drive
  • Suite 300
  • Durham, NC 27709
  • support@dzone.com
  • +1 (919) 678-0300

Let's be friends: