Amazon Elasticsearch Service Revised
The Amazon Elasticsearch Service has its draws, but be ready for challenges if you plan to use a VPC or IAM roles, or to restore data from a snapshot.
"AWS first!" is one of our consulting principles. Using a managed service provided by AWS usually offers the most bang for the buck. But there are pitfalls and downsides hidden behind the shiny marketing promises of these managed services as well.
Amazon Elasticsearch Service offers the popular open-source search and analytics engine as a service. According to AWS’s marketing page, the service has many advantages:
- Easy to use
- Highly available and secure
- Easily scalable
But that’s only one part of the story. There are downsides to using the Amazon Elasticsearch Service as well. In this article, I will highlight common challenges that come up when using the managed service in real-world scenarios and illustrate workarounds.
Placing a database into a VPC is possible with the Relational Database Service (RDS, SQL databases) and ElastiCache (in-memory databases). The Amazon Elasticsearch Service does not support the Virtual Private Cloud (VPC). Accessing the service is only possible via the Internet.
All traffic from EC2 instances running in a private subnet to the Elasticsearch database running outside the VPC flows through a NAT gateway. The additional traffic incurs costs and consumes a limited resource.
In some scenarios, compliance requirements also rule out using a service outside the VPC.
Snapshot and Restore
Being able to restore your data is an important aspect of a managed service. Amazon Elasticsearch Service creates snapshots of your data daily. Unfortunately, you cannot restore a snapshot yourself; only the AWS support team can. You therefore need to create a support ticket to restore your data, which increases the RTO (recovery time objective) and reduces your control over the restore process.
There is a workaround for this problem. Elasticsearch has built-in support for snapshots stored on S3. As illustrated in the following figure, you can use AWS Lambda to trigger snapshots on a schedule and to clean up old snapshots automatically.
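To make the workaround more concrete, here is a minimal sketch of the three requests a scheduled Lambda function could send to the Elasticsearch snapshot API: register an S3 repository, create a dated snapshot, and delete snapshots past a retention period. The repository name, the `snapshot-YYYY-MM-DD` naming scheme, and the helper functions are assumptions made for illustration. The sketch only builds the requests; sending them, including the request signing that IAM requires, is left out.

```python
import json
from datetime import date, timedelta

REPOSITORY = "s3-snapshots"  # hypothetical repository name


def register_repository_request(bucket, region, role_arn):
    """Build the request that registers an S3 bucket as a snapshot repository.

    role_arn is assumed to be an IAM role the service may assume to write
    to the bucket.
    """
    body = {
        "type": "s3",
        "settings": {"bucket": bucket, "region": region, "role_arn": role_arn},
    }
    return ("PUT", f"/_snapshot/{REPOSITORY}", json.dumps(body))


def create_snapshot_request(day):
    """Build the request that creates a snapshot named after the given date."""
    return ("PUT", f"/_snapshot/{REPOSITORY}/snapshot-{day.isoformat()}", "")


def cleanup_requests(snapshot_names, today, keep_days):
    """Build delete requests for snapshots older than keep_days."""
    cutoff = today - timedelta(days=keep_days)
    requests = []
    for name in snapshot_names:
        # Names follow the assumed snapshot-YYYY-MM-DD scheme.
        taken = date.fromisoformat(name[len("snapshot-"):])
        if taken < cutoff:
            requests.append(("DELETE", f"/_snapshot/{REPOSITORY}/{name}", ""))
    return requests
```

A Lambda handler triggered by a schedule would call `create_snapshot_request(date.today())`, list the existing snapshots, and pass their names to `cleanup_requests`. Because these snapshots live in your own bucket, you can restore them yourself instead of opening a support ticket.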
The Identity and Access Management (IAM) service is used to authenticate and authorize requests to the Amazon Elasticsearch Service. Almost every part of AWS offers integrations with IAM. Because you can reuse IAM users, groups, and roles, it is easy to control access to the data stored within your Elasticsearch database.
But there are some downsides caused by IAM as well. It is not possible to use the standard security mechanism provided by X-Pack (formerly Shield) with Amazon Elasticsearch Service.
A lot of tools from the Elasticsearch ecosystem do not support authentication with IAM out of the box. If you want to use Logstash to ingest log data into your Elasticsearch database, an additional plugin is needed to handle the IAM-based authentication.
Amazon Elasticsearch Service offers Kibana out of the box. But it is not possible to access Kibana when using IAM for authentication, because your browser is not able to sign requests for IAM-based authentication. Using an IAM proxy as shown in the following figure allows you to work around this limitation.
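As an illustration of what such access control can look like, the following sketch builds a resource-based access policy that allows a single IAM role to read from and write to a domain. The role ARN, the domain ARN, and the helper name are placeholders I made up for the example; the selection of `es:ESHttp*` actions is an assumption about what a typical client needs.

```python
import json


def domain_access_policy(role_arn, domain_arn):
    """Build an access policy that lets one IAM role call the domain's HTTP API."""
    statement = {
        "Effect": "Allow",
        "Principal": {"AWS": role_arn},
        # Read and write access via HTTP; narrow this down where possible.
        "Action": ["es:ESHttpGet", "es:ESHttpPost", "es:ESHttpPut"],
        # The trailing /* covers all indices and API paths of the domain.
        "Resource": f"{domain_arn}/*",
    }
    return {"Version": "2012-10-17", "Statement": [statement]}


policy = domain_access_policy(
    "arn:aws:iam::123456789012:role/example-app",          # placeholder role
    "arn:aws:es:us-east-1:123456789012:domain/example",    # placeholder domain
)
print(json.dumps(policy, indent=2))
```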
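To show what such a plugin or proxy has to do for every request, here is a minimal sketch of AWS Signature Version 4 signing for a GET request without a body or query string, using only the Python standard library. It is a simplified illustration, not a replacement for an AWS SDK: it assumes long-term credentials (temporary credentials additionally need an `X-Amz-Security-Token` header), a path that needs no URL encoding, and a UTC timestamp. The function names are my own.

```python
import hashlib
import hmac
from datetime import datetime, timezone


def _hmac(key, msg):
    return hmac.new(key, msg.encode("utf-8"), hashlib.sha256).digest()


def signing_key(secret_key, date_stamp, region, service="es"):
    """Derive the signing key by chaining HMACs over date, region, and service."""
    k_date = _hmac(("AWS4" + secret_key).encode("utf-8"), date_stamp)
    k_region = _hmac(k_date, region)
    k_service = _hmac(k_region, service)
    return _hmac(k_service, "aws4_request")


def sign_get(host, path, region, access_key, secret_key, now):
    """Return the headers for a signed GET request (empty body, no query string)."""
    amz_date = now.strftime("%Y%m%dT%H%M%SZ")
    date_stamp = now.strftime("%Y%m%d")
    payload_hash = hashlib.sha256(b"").hexdigest()

    # Canonical request: method, path, query string, headers, signed headers, payload hash.
    canonical_headers = f"host:{host}\nx-amz-date:{amz_date}\n"
    signed_headers = "host;x-amz-date"
    canonical_request = "\n".join(
        ["GET", path, "", canonical_headers, signed_headers, payload_hash]
    )

    scope = f"{date_stamp}/{region}/es/aws4_request"
    string_to_sign = "\n".join([
        "AWS4-HMAC-SHA256",
        amz_date,
        scope,
        hashlib.sha256(canonical_request.encode("utf-8")).hexdigest(),
    ])
    key = signing_key(secret_key, date_stamp, region)
    signature = hmac.new(key, string_to_sign.encode("utf-8"), hashlib.sha256).hexdigest()

    authorization = (
        f"AWS4-HMAC-SHA256 Credential={access_key}/{scope}, "
        f"SignedHeaders={signed_headers}, Signature={signature}"
    )
    return {"Host": host, "X-Amz-Date": amz_date, "Authorization": authorization}
```

Calling `sign_get(host, "/_cluster/health", "us-east-1", access_key, secret_key, datetime.now(timezone.utc))` yields headers that can be attached to the HTTP request. A tool that cannot produce these headers on its own cannot talk to a domain protected by IAM.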
AWS offers encryption integrated with KMS (Key Management Service) for almost all services storing data (RDS, EBS, S3, …). In contrast, the Amazon Elasticsearch Service does not provide server-side encryption for data at rest. Storing data unencrypted could be a no-go if regulations require you to encrypt all data.
The Amazon Elasticsearch Service stores data on EBS volumes. It is not possible to use instance storage, which provides cheaper storage with lower latency and higher throughput. For example, I3 instances would be a perfect fit for a distributed database such as Elasticsearch but are not an option when using the Amazon Elasticsearch Service.
Some Elasticsearch operations are not available when using the Amazon Elasticsearch Service. For example, it is not possible to open and close indices, which is required to make changes to analyzers.
I’d recommend using a managed service provided by AWS whenever possible. AWS does a great job of offering services that fulfill the needs of the majority of its customers. The Amazon Elasticsearch Service takes away the burden of managing a distributed system. But if you are planning to use Amazon Elasticsearch Service, you should also consider the downsides that I have experienced in real-world projects and documented in this article.
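For context, this is the sequence of requests that changing an analyzer takes on a self-managed Elasticsearch cluster, sketched as a small helper that only builds the requests. The index name and the `folding` analyzer are made-up examples. The first and last step are the ones you cannot run on the managed service.

```python
import json


def analyzer_update_requests(index, analysis):
    """Build the close / update settings / open sequence for an analyzer change."""
    return [
        ("POST", f"/{index}/_close", ""),  # analysis settings need a closed index
        ("PUT", f"/{index}/_settings", json.dumps({"analysis": analysis})),
        ("POST", f"/{index}/_open", ""),
    ]


# Hypothetical custom analyzer that lowercases and folds accented characters.
analysis = {
    "analyzer": {
        "folding": {
            "type": "custom",
            "tokenizer": "standard",
            "filter": ["lowercase", "asciifolding"],
        }
    }
}

for method, path, body in analyzer_update_requests("articles", analysis):
    print(method, path, body)
```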
Published at DZone with permission of Andreas Wittig. See the original article here.