Over a million developers have joined DZone.
{{announcement.body}}
{{announcement.title}}

Operationalizing MemSQL

DZone's Guide to

Operationalizing MemSQL

In a recent , we shared some tips and tricks on how you can operationalize MemSQL for configuring MemSQL Ops and MemSQL root user passwords; memory settings ...

· Database Zone ·
Free Resource

Compliant Database DevOps and the role of DevSecOps DevOps is becoming the new normal in application development, and DevSecOps is now entering the picture. By balancing the desire to release code faster with the need for the same code to be secure, it addresses increasing demands for data privacy. But what about the database? How can databases be included in both DevOps and DevSecOps? What additional measures should be considered to achieve truly compliant database DevOps? This whitepaper provides a valuable insight. Get the whitepaper

In a recent webcast, we shared some tips and tricks on how you can operationalize MemSQL for configuring MemSQL Ops and MemSQL root user passwords; memory settings and health metrics; and how you can take backups and add nodes securely to your cluster.

Here are the topics we covered:

  • Permissioning a new cluster by adding a super-user to MemSQL Ops and via GRANT statements on the cluster itself.
  • Best practices for configuring memory limits.
  • Best practices for basic cluster monitoring, and how to retrieve the relevant metrics.
  • How to take physical backups of a cluster, and how backups are stored in the filesystem.
  • How to add new nodes to a cluster using the MemSQL Ops Command Line Interface (CLI).

We also received questions from webcast attendees, and here are the answers.

Question: In MemSQL 6.x, do all backup files need to be present on all nodes when attempting a restore?

Any particular backup file needs to be accessible to the node that will restore the data on it. This is easiest to understand with an example.

Consider a cluster with one Master Aggregator, two leaves, and four partitions per leaf, and a database named my_database. The data within my_database will by default consist of eight partitions, along with metadata. When my_database is backed up, it goes to a single metadata backup file of the same name, along with eight partition backup files named my_database_0, my_database_1, my_database_2, etc. Restoring the cluster successfully does not require that each of the partition backup files be accessible on every node, but it does require that each of them be accessible to at least one leaf node and that the metadata backup file be accessible to the master aggregator.

Question: How can a new Master Aggregator be deployed/assigned to a cluster?

Currently, the only way to create a new Master Aggregator is through the promotion of an existing Child Aggregator using the AGGREGATOR SET AS MASTER command.

Question: Since MemSQL backups are distributed across the cluster in the same way as the data they back up, how can they be made robust against hardware failures?

If MemSQL backups are stored to a location that represents a locally mounted filesystem on each host, then there are two approaches to reduce the risk of losing backup data due to hardware failure:

  1. Instead of backing up to a local filesystem, the backups can be written directly to a network file system visible to all of the nodes in the cluster. This segregates the integrity of the backup files from the health of the hardware on which they were generated at the cost of incurring some overhead during the backup process itself due to network latency.
  2. Backups can be written locally, then moved or copied via a scripted process to either a network location or object storage system. This approach avoids the potential performance impact of the previous option but requires adding an additional layer of complexity to the deployment and is asynchronous.

Question: Does the node addition and rebalancing process change when operating a cluster in High Availability mode?

The operational procedure for adding new nodes remains the same in HA mode, and from the user perspective, the rebalancing operation also does not change. An even number of nodes must be used to preserve the HA topology; Ops will distribute the new nodes automatically among the two availability groups as they are added.

Question: What are the architectural requirements for using the Ops CLI to add nodes, as was done in the webcast? When these requirements cannot be met, how can nodes be added?

Using Ops to deploy new MemSQL nodes throughout the cluster requires it be able to SSH between nodes, but (as was done in the webcast) this is not necessary if the new MemSQL node is simply deployed from a colocated agent downloaded onto the new host. This process does still require that Ops be able to access the MemSQL download site to download the MemSQL engine binaries.

If, for security or other concerns, the MemSQL engine binaries cannot be downloaded directly by Ops, the binaries can be transferred to the host manually, and Ops can complete the rest of the installation process using them. In this scenario, the binaries should be unzipped manually into the location specified in Ops' settings.conf file.

If Ops cannot be used at all, MemSQL nodes can still be installed and added manually via the ADD LEAF command. In this case, we recommend engaging with a MemSQL representative to ensure best practices are followed during the node addition.

Question: How does the monitoring picture change if I’m mostly using columnstore?

The fundamental monitoring picture does not change. Even in a purely columnstore use case, MemSQL still leverages the rowstore during ingest and allocates memory aggressively for query execution performance, so it is still important to monitor memory utilization as described in the webcast. Columnstore data is typically compressed enough on disk that a properly sized cluster is unlikely to be at risk of encountering an out-of-disk condition. However, it is still more likely to be the case than with the rowstore, so system-level disk space monitoring is more important.

Question: What is a secure way to clear buffers in memory?

A best practice for MemSQL deployments precludes running other memory-intensive processes on the same hosts, and buffered memory should not cause an issue on a properly-sized and configured cluster.

If scenarios arise during operational use of MemSQL that appear to require flushing the memory buffer, we recommend reaching out to MemSQL support.

Did you miss the webcast? You can get a copy of the recording to watch on demand here.

Compliant Database DevOps and the role of DevSecOps DevOps is becoming the new normal in application development, and DevSecOps is now entering the picture. By balancing the desire to release code faster with the need for the same code to be secure, it addresses increasing demands for data privacy. But what about the database? How can databases be included in both DevOps and DevSecOps? What additional measures should be considered to achieve truly compliant database DevOps? This whitepaper provides a valuable insight. Get the whitepaper

Topics:
memsql ,database ,tutorial ,operationalization

Published at DZone with permission of

Opinions expressed by DZone contributors are their own.

{{ parent.title || parent.header.title}}

{{ parent.tldr }}

{{ parent.urlSource.name }}