Which AWS Storage Solution Is Right for Your Elasticsearch Cluster?
There are way too many storage options on AWS. Let's see which one is best for your Elasticsearch cluster.
Amazon Web Services (AWS) is one of the most capable cloud service providers around right now. It offers a number of different kinds of storage, many of which combine low cost with high durability and high availability.
This article will help you understand the different storage services and features available in the AWS Cloud and how to select the right storage type for your ELK stack.
AWS Storage Options for Your ELK Stack
Simple Storage Service (S3)
Amazon S3 provides access to reliable and inexpensive object storage. Its simple hosting model makes it a great option for static websites and their associated assets. It is also a strong fit for disaster recovery solutions that aid in business continuity. It is not ideal for dynamic web hosting or rapidly changing data.
Amazon Elasticsearch Service supports an integrated storage tier named UltraWarm. It complements the hot tier by providing a low-cost, read-only tier for older and less frequently accessed data. UltraWarm uses Amazon S3 for storage, which is designed for 99.999999999 percent durability.
UltraWarm is a fully managed, low-cost warm storage tier for Amazon Elasticsearch Service and is compatible with Elasticsearch and Kibana. It suits the so-called 'hot-warm-cold' architecture, in which you gradually move and reshape your indexes, mostly based on time or size conditions, to accommodate continuously changing needs.
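The time and size conditions behind a hot-warm-cold layout can be sketched as a small decision function. This is only an illustration: the `IndexInfo` type, the `choose_tier` function, and every threshold below are assumptions made up for the example, not values or APIs from Amazon Elasticsearch Service.

```python
from dataclasses import dataclass

# Hypothetical thresholds, chosen only for illustration.
HOT_MAX_AGE_DAYS = 7
WARM_MAX_AGE_DAYS = 30
HOT_MAX_SIZE_GB = 50.0


@dataclass
class IndexInfo:
    """Minimal description of an index (hypothetical helper type)."""
    name: str
    age_days: int
    size_gb: float


def choose_tier(index: IndexInfo) -> str:
    """Return 'hot', 'warm', or 'cold' from simple age/size conditions."""
    # Young and small indexes stay on the fastest storage.
    if index.age_days <= HOT_MAX_AGE_DAYS and index.size_gb <= HOT_MAX_SIZE_GB:
        return "hot"
    # Older or oversized indexes move to the cheaper, read-mostly tier.
    if index.age_days <= WARM_MAX_AGE_DAYS:
        return "warm"
    # Everything else is treated as archive material.
    return "cold"


if __name__ == "__main__":
    for idx in (
        IndexInfo("logs-today", 1, 10.0),
        IndexInfo("logs-last-month", 20, 40.0),
        IndexInfo("logs-last-year", 300, 40.0),
    ):
        print(idx.name, "->", choose_tier(idx))
```

In practice you would tune the thresholds from your own retention and query patterns rather than reuse these numbers.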
Elastic Block Store (EBS)
EBS provides persistent block storage volumes for use with Amazon EC2 instances. Block storage splits data into fixed-size blocks and stores them on volumes that an instance mounts like a local disk. EBS offers improved performance over regular file storage. It is well-suited for use as primary storage for a database or file system. It is not ideal for temporary storage or for data that needs very high durability.
EBS can work well with Elasticsearch, but Elasticsearch is sensitive to I/O performance, and EBS has Input/Output Operations Per Second (IOPS) limits not only per volume but also per EC2 instance. No matter how many IOPS are provisioned for the attached volumes, there is a limit to how many IOPS an EC2 instance can perform. Going over either limit can cause very significant slowdowns that are difficult to diagnose.
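The interaction between per-volume and per-instance limits comes down to taking the lower of two numbers. The sketch below shows that arithmetic. The function names and the IOPS figures are assumptions for illustration only; check the limits for your actual volume and instance types.

```python
from typing import Sequence


def effective_iops(volume_iops: Sequence[int], instance_iops_limit: int) -> int:
    """IOPS the instance can actually drive across its attached volumes.

    The total is capped by whichever is lower: the sum of what the
    volumes are provisioned for, or what the instance itself can perform.
    """
    return min(sum(volume_iops), instance_iops_limit)


def is_instance_bound(volume_iops: Sequence[int], instance_iops_limit: int) -> bool:
    """True when provisioning more volume IOPS would not help."""
    return sum(volume_iops) > instance_iops_limit


if __name__ == "__main__":
    # Hypothetical numbers: two volumes provisioned at 16,000 IOPS each,
    # attached to an instance assumed to be limited to 18,750 IOPS.
    volumes = [16_000, 16_000]
    limit = 18_750
    print("effective IOPS:", effective_iops(volumes, limit))
    print("instance-bound:", is_instance_bound(volumes, limit))
```

If the second check is true, paying for more provisioned IOPS on the volumes will not speed up the node; a larger instance or more nodes would.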
Elastic File System (EFS) - Dangerous!
Data in EFS is organized into directories. EFS does offer durability, shared storage, and the ability to grow and shrink file systems dynamically, but you can achieve the same benefits using Elasticsearch directly.
Elasticsearch already replicates data across nodes. This greatly mitigates the need for the features that EFS offers.
Managing storage directly from your ELK stack also gives you control over how index shards move across the different tiers of your nodes and how Elasticsearch maintains the data structures and files that it uses.
Amazon EC2 Instance Store
An instance store is disk storage that is physically attached to the virtualization host. It is the lowest-latency storage available to your instance other than RAM. Instance stores are especially valuable when you need massive IOPS at low latency. When a workload needs high-performance storage that sustains significant indexing IOPS and significant random-read IOPS, instance storage is ideal.
When you run larger clusters with replicas, the short-lived nature of instance store is acceptable for Elasticsearch, since the cluster can tolerate the loss of shards.
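Why replicas make short-lived disks tolerable can be shown with a little arithmetic. Elasticsearch does not place two copies of the same shard on one node, so an index with `number_of_replicas` set to r has up to r + 1 copies of each shard on distinct nodes. The sketch below is a deliberately simplified worst-case check; the function names are hypothetical, and it assumes the nodes are lost at the same time, before the cluster has re-replicated anything.

```python
def copies_allocated(number_of_replicas: int, data_nodes: int) -> int:
    """Copies of each shard that can actually be allocated.

    One primary plus the replicas, but never more than one copy per node.
    """
    return min(number_of_replicas + 1, data_nodes)


def survives_node_loss(number_of_replicas: int, data_nodes: int, nodes_lost: int) -> bool:
    """Worst case: every lost node held a copy of the same shard.

    The shard survives if at least one copy remains on another node.
    """
    return copies_allocated(number_of_replicas, data_nodes) > nodes_lost


if __name__ == "__main__":
    # With one replica on three nodes, losing a single node is safe...
    print(survives_node_loss(number_of_replicas=1, data_nodes=3, nodes_lost=1))
    # ...but with no replicas, losing the node means losing the shard.
    print(survives_node_loss(number_of_replicas=0, data_nodes=3, nodes_lost=1))
```

The takeaway is that instance store without replicas puts data at risk, so the replica count deserves as much attention as the disk type.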
Amazon Glacier
Amazon Glacier is an easy one to discount. It is not ideal for rapidly changing data or real-time access. Because data retrieval takes a long time, Glacier should only be used for data that is accessed very infrequently, such as compliance logs.
For these reasons, data stored in Glacier should be static. Storing active or frequently changing data in Glacier can increase your costs.
The most common use case for Glacier is log data. The relevance of log data decreases over time, so old logs are almost entirely archive material. The other use case is long-term data storage for enterprises, where requirements and recommendations call for collected data to be stored or archived for years, even though it is not used.
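One way to act on the "old logs become archive material" idea is an S3 lifecycle rule that moves objects to the Glacier storage class after a set number of days. The helper below only builds the rule as a plain dictionary, in the shape that the S3 lifecycle configuration API expects (for example, the `LifecycleConfiguration` argument of boto3's `put_bucket_lifecycle_configuration`). The function name, the `logs/` prefix, and the 90-day threshold are assumptions for illustration.

```python
def glacier_lifecycle_config(prefix: str, days_until_archive: int) -> dict:
    """Build an S3 lifecycle configuration that archives old objects.

    Objects under `prefix` transition to the GLACIER storage class once
    they are `days_until_archive` days old. Nothing is sent to AWS here.
    """
    if days_until_archive < 0:
        raise ValueError("days_until_archive must be non-negative")
    rule_name = prefix.strip("/") or "all-objects"
    return {
        "Rules": [
            {
                "ID": f"archive-{rule_name}-to-glacier",
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                "Transitions": [
                    {"Days": days_until_archive, "StorageClass": "GLACIER"}
                ],
            }
        ]
    }


if __name__ == "__main__":
    import json

    # Hypothetical policy: archive anything under logs/ after 90 days.
    print(json.dumps(glacier_lifecycle_config("logs/", 90), indent=2))
```

Keeping the rule as data makes it easy to review the retention period before applying it to a real bucket.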
AWS Storage Gateway
AWS Storage Gateway is Amazon’s solution for joining on-premises infrastructure with cloud-based storage. It is great for corporate file sharing, data mirroring to cloud resources, and disaster recovery; however, it is not ideal for database storage.
The storage gateway service relies on Amazon S3. It allows you to take advantage of the scale, durability, and cost benefits of cloud storage from your existing environment. Storage gateway also integrates with lots of other AWS systems, such as Glacier, for archiving.
Amazon FSx for Lustre
Amazon FSx for Lustre is all about speed. This makes it ideal for demanding workloads, such as machine learning. It uses Solid-State Drive (SSD) storage to provide fast performance with low latency.
The high performance of Amazon FSx opens up some architectural options for you. For example, you can store frequently accessed logs in your FSx instance, while delegating less active logs to a slower but less expensive service, such as Amazon S3.
The platform integrates well with Amazon S3, so you can link S3 data sets with your high-performance Lustre file systems to run compute-intensive workloads. FSx for Lustre automatically copies data from S3 and writes results back to S3 once workloads have run.
This array of Amazon Cloud Storage options can be confusing when it comes to selecting the right option for your ELK stack.
Our best advice? First, experiment with the different storage options and take a data-driven approach. Measure the overall query performance of your Elasticsearch cluster when deployed onto different storage options. Second, don’t box yourself into one storage option. Combining rapid-access storage, such as Amazon FSx, with a more scalable option, such as Amazon S3, will allow you to optimize your cluster to accommodate unforeseen load.
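A data-driven comparison can start with something as small as the timing harness below. It is a minimal sketch: `measure_latency` is a hypothetical helper, and `run_query` stands in for whatever call issues a representative search against your cluster. Run the same queries against each storage option and compare the numbers.

```python
import math
import statistics
import time
from typing import Callable, Dict


def measure_latency(run_query: Callable[[], object], repeats: int = 20) -> Dict[str, float]:
    """Time `run_query` several times and summarize latency in milliseconds."""
    if repeats < 1:
        raise ValueError("repeats must be at least 1")
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_query()
        samples.append((time.perf_counter() - start) * 1000.0)
    samples.sort()
    # Nearest-rank 95th percentile on the sorted samples.
    p95_index = max(0, math.ceil(0.95 * len(samples)) - 1)
    return {
        "median_ms": statistics.median(samples),
        "p95_ms": samples[p95_index],
        "max_ms": samples[-1],
    }


if __name__ == "__main__":
    # Placeholder workload; replace with a real query against your cluster.
    def fake_query() -> int:
        return sum(range(100_000))

    print(measure_latency(fake_query, repeats=10))
```

Looking at the 95th percentile as well as the median matters here, because storage-related slowdowns often show up in the slowest requests first.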