
Amazon Elasticsearch Service Revised

The Amazon Elasticsearch Service has its draws, but be ready for challenges if you plan to use a VPC or IAM roles, or to restore data from a snapshot.

AWS first! is one of our consulting principles. Using a managed service provided by AWS usually offers the most bang for the buck. But there are pitfalls and downsides hidden behind the shiny marketing promises of these managed services as well.

Amazon Elasticsearch Service offers the popular open-source search and analytics engine as a service. As described on AWS’s marketing page, the service comes with many advantages:

  • Easy to use
  • Highly available and secure
  • Easily scalable

But that’s only one part of the story. There are downsides to using the Amazon Elasticsearch Service as well. In this article, I will highlight common challenges of using the managed service in real-world scenarios and illustrate workarounds.

VPC

Placing a database into a Virtual Private Cloud (VPC) is possible with the Relational Database Service (RDS, SQL databases) and ElastiCache (in-memory databases). The Amazon Elasticsearch Service, however, does not support VPCs. The service is only accessible via the Internet.

Elasticsearch running outside VPC

All traffic from EC2 instances running in a private subnet to the Elasticsearch database running outside the VPC flows through a NAT gateway. This additional traffic incurs costs and consumes the limited capacity of the NAT gateway.

In some scenarios, compliance requirements also rule out using a service outside the VPC.

Snapshot and Restore

Being able to restore your data is an important aspect of a managed service. Amazon Elasticsearch Service creates daily snapshots of your data. Unfortunately, you are not able to restore these snapshots yourself; only the AWS support team can. Therefore, you need to create a support ticket if you want to restore your data, which increases the RTO (recovery time objective) and reduces your control over the restore process.

There is a workaround for this problem. Elasticsearch has built-in support for snapshots stored on S3. As illustrated in the following figure, you can use AWS Lambda to trigger snapshots on a schedule and to clean up old snapshots automatically.

Triggering Elasticsearch snapshots with Lambda
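
The following Python sketch shows the requests such a scheduled Lambda function could send. It assumes a hypothetical repository name (`s3-manual-snapshots`), a `daily-` naming scheme, and a retention period of your choosing, and it only builds the HTTP requests: sending them to the domain still requires signing them with IAM credentials, which is not shown. The `role_arn` setting reflects what Amazon Elasticsearch Service expects when registering an S3 repository; the bucket, region, and role values are placeholders.

```python
import json
from datetime import datetime, timedelta, timezone

# Hypothetical names; replace with your own repository and naming scheme.
REPOSITORY = "s3-manual-snapshots"
PREFIX = "daily-"
DATE_FORMAT = "%Y-%m-%d"


def register_repository(bucket, region, role_arn):
    """One-time request that registers an S3 bucket as a snapshot repository."""
    body = {
        "type": "s3",
        # The role the service assumes to write snapshots into the bucket.
        "settings": {"bucket": bucket, "region": region, "role_arn": role_arn},
    }
    return ("PUT", f"/_snapshot/{REPOSITORY}", json.dumps(body))


def snapshot_name(now):
    """Name a snapshot after the day it was taken, e.g. daily-2017-06-10."""
    return PREFIX + now.strftime(DATE_FORMAT)


def expired_snapshots(existing, now, keep_days):
    """Return this job's snapshots older than keep_days, oldest first."""
    cutoff = now.date() - timedelta(days=keep_days)
    expired = []
    for name in existing:
        if not name.startswith(PREFIX):
            continue  # leave snapshots created by other tools alone
        try:
            taken = datetime.strptime(name[len(PREFIX):], DATE_FORMAT).date()
        except ValueError:
            continue  # prefix matches, but the rest is not a date
        if taken < cutoff:
            expired.append(name)
    return sorted(expired)  # ISO dates sort chronologically


def plan_scheduled_run(existing, now, keep_days=14):
    """Requests for one scheduled run: take a snapshot, then clean up old ones."""
    requests = [("PUT", f"/_snapshot/{REPOSITORY}/{snapshot_name(now)}", None)]
    for name in expired_snapshots(existing, now, keep_days):
        requests.append(("DELETE", f"/_snapshot/{REPOSITORY}/{name}", None))
    return requests


def restore_request(name):
    """Restore one of your own snapshots without opening a support ticket."""
    return ("POST", f"/_snapshot/{REPOSITORY}/{name}/_restore", None)


if __name__ == "__main__":
    now = datetime(2017, 6, 20, tzinfo=timezone.utc)
    existing = ["daily-2017-06-01", "daily-2017-06-19"]
    for method, path, body in plan_scheduled_run(existing, now):
        print(method, path)
```

Because these snapshots live in a bucket you control, the restore request is something you can send yourself, which is the point of the workaround.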

IAM

The Identity and Access Management (IAM) service is used to authenticate and authorize requests to the Amazon Elasticsearch Service. Almost every part of AWS integrates with IAM. Because you can reuse IAM users, groups, and roles, it is very easy to control access to the data stored in your Elasticsearch database.
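
Reusing existing IAM identities typically happens through the domain's resource-based access policy. The sketch below builds such a policy as a Python dictionary; the region, account ID, domain name, and role ARN are placeholders, and the selection of `es:ESHttp*` actions is an assumption you would tailor to your own use case.

```python
import json


def domain_access_policy(region, account_id, domain, principal_arns,
                         actions=("es:ESHttpGet", "es:ESHttpPost", "es:ESHttpPut")):
    """Policy that lets existing IAM principals call the domain's HTTP API."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"AWS": list(principal_arns)},
                "Action": list(actions),
                # The trailing /* covers every index and API path in the domain.
                "Resource": f"arn:aws:es:{region}:{account_id}:domain/{domain}/*",
            }
        ],
    }


if __name__ == "__main__":
    # Hypothetical domain "logs" and role "log-ingest".
    policy = domain_access_policy(
        "eu-west-1", "123456789012", "logs",
        ["arn:aws:iam::123456789012:role/log-ingest"],
    )
    print(json.dumps(policy, indent=2))
```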

But there are some downsides caused by IAM as well. It is not possible to use the standard security mechanism provided by X-Pack (formerly Shield) with the Amazon Elasticsearch Service.

Many tools from the Elasticsearch ecosystem do not support IAM authentication out of the box. If you want to use Logstash to ingest log data into your Elasticsearch database, you need an additional plugin that handles the IAM-based authentication.

Amazon Elasticsearch Service offers Kibana out of the box. But you cannot access Kibana when using IAM for authentication, because your browser is not able to sign its requests with IAM credentials. Using an IAM proxy, as shown in the following figure, allows you to work around this limitation.

IAM Proxy for Elasticsearch
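
IAM-based authentication means that every HTTP request to the domain carries an AWS Signature Version 4 (SigV4) signature computed from IAM credentials. That is the step a browser or an unmodified tool cannot perform, and what a Logstash plugin or an IAM proxy in front of Kibana adds. The following minimal sketch, using only the Python standard library, shows what such a signer computes; the function names, host name, and credentials are hypothetical, and in real code you would rather rely on an AWS SDK signer such as botocore's `SigV4Auth`.

```python
import hashlib
import hmac
from datetime import datetime, timezone
from urllib.parse import quote


def _hmac(key, msg):
    return hmac.new(key, msg.encode("utf-8"), hashlib.sha256).digest()


def signing_key(secret_key, date_stamp, region, service):
    """Derive the SigV4 signing key: secret -> date -> region -> service."""
    k_date = _hmac(("AWS4" + secret_key).encode("utf-8"), date_stamp)
    k_region = _hmac(k_date, region)
    k_service = _hmac(k_region, service)
    return _hmac(k_service, "aws4_request")


def sign_es_request(method, host, path, query, body, region,
                    access_key, secret_key, now, session_token=None):
    """Return the headers for a SigV4-signed request to an Elasticsearch domain.

    Assumptions: `now` is in UTC, and `path` contains only unreserved
    characters and '/', so it is used as the canonical URI unencoded.
    """
    service = "es"
    amz_date = now.strftime("%Y%m%dT%H%M%SZ")
    date_stamp = now.strftime("%Y%m%d")
    payload_hash = hashlib.sha256(body).hexdigest()

    headers = {"host": host, "x-amz-date": amz_date}
    if session_token:  # temporary credentials, e.g. from an IAM role
        headers["x-amz-security-token"] = session_token

    canonical_query = "&".join(
        f"{quote(k, safe='-_.~')}={quote(v, safe='-_.~')}"
        for k, v in sorted(query.items())
    )
    signed_headers = ";".join(sorted(headers))
    canonical_headers = "".join(f"{k}:{headers[k].strip()}\n" for k in sorted(headers))
    canonical_request = "\n".join([
        method, path or "/", canonical_query,
        canonical_headers, signed_headers, payload_hash,
    ])

    scope = f"{date_stamp}/{region}/{service}/aws4_request"
    string_to_sign = "\n".join([
        "AWS4-HMAC-SHA256", amz_date, scope,
        hashlib.sha256(canonical_request.encode("utf-8")).hexdigest(),
    ])
    key = signing_key(secret_key, date_stamp, region, service)
    signature = hmac.new(key, string_to_sign.encode("utf-8"), hashlib.sha256).hexdigest()

    headers["authorization"] = (
        f"AWS4-HMAC-SHA256 Credential={access_key}/{scope}, "
        f"SignedHeaders={signed_headers}, Signature={signature}"
    )
    return headers


if __name__ == "__main__":
    headers = sign_es_request(
        "GET", "search-logs-abc123.eu-west-1.es.amazonaws.com", "/_cluster/health",
        {}, b"", "eu-west-1", "AKIDEXAMPLE", "not-a-real-secret",
        datetime.now(timezone.utc),
    )
    print(headers["authorization"])
```

An IAM proxy essentially applies this signing step to every request the browser sends to Kibana before forwarding it to the domain.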

Encryption

AWS offers encryption integrated with KMS (Key Management Service) for almost all services that store data (RDS, EBS, S3, …). In contrast, the Amazon Elasticsearch Service does not provide server-side encryption for data at rest. Storing data unencrypted could be a no-go if regulations require you to encrypt all data.

Instance Storage

The Amazon Elasticsearch Service stores data on EBS volumes. It is not possible to use instance storage, which provides cheaper storage with lower latency and higher throughput. For example, I3 instances would be a perfect fit for a distributed database such as Elasticsearch, but they are not an option when using the Amazon Elasticsearch Service.

Limited Operations

Some Elasticsearch operations are not available when using the Amazon Elasticsearch Service. For example, it is not possible to open and close indices, which is required to make changes to analyzers.
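
On a self-managed cluster, defining a new analyzer on an existing index is typically done with a three-step sequence: close the index, update its analysis settings, and reopen it. The sketch below only builds those requests; the index name and the analyzer definition are hypothetical. On the Amazon Elasticsearch Service, the first and last steps are exactly the operations that are not available.

```python
import json


def update_analyzer_requests(index, analysis):
    """Requests that change analysis settings of an existing, self-managed index."""
    return [
        # New analyzers can only be defined on a closed index.
        ("POST", f"/{index}/_close", None),
        ("PUT", f"/{index}/_settings", json.dumps({"analysis": analysis})),
        # Reopen the index so it serves reads and writes again.
        ("POST", f"/{index}/_open", None),
    ]


if __name__ == "__main__":
    # Hypothetical analyzer that splits text on whitespace only.
    analysis = {"analyzer": {"content": {"type": "custom", "tokenizer": "whitespace"}}}
    for method, path, body in update_analyzer_requests("articles", analysis):
        print(method, path, body or "")
```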

Summary

I’d recommend using a managed service provided by AWS whenever possible. AWS does a great job of offering services that fulfill the needs of the majority of its customers. The Amazon Elasticsearch Service takes away the burden of managing a distributed system. But if you are planning to use the Amazon Elasticsearch Service, you should also consider the downsides I have experienced in real-world projects and documented in this article.

Published at DZone with permission of Andreas Wittig, DZone MVB. See the original article here.