
Shipping Data to AWS Elasticsearch With Logagent


Data scientists need data. Learn how to ship data to your AWS Elasticsearch service using this open source solution. Let's go!


Elasticsearch is already quite popular, and its popularity just keeps growing; a Google Trends chart for the last five years shows this nicely.

There are a number of reasons why Elasticsearch is popular: it is very easy to get started with, it's user-friendly, and it has great APIs, among other things. Its growing popularity benefits not only Elasticsearch itself but also the whole community, as the ecosystem around it is growing rapidly as well. There are tools developed by Elastic (the company, not the software), like Logstash or Kibana, and there are tools provided by third-party companies and developers, like Logagent, Search Guard (you can find a blog post about it on our blog: Securing Elasticsearch and Kibana with Search Guard), Grafana, and many, many more.

In addition to Elasticsearch and the ecosystem built around it, there are also commercial offerings. These range from hosted ELK as a service for logs, which is part of Sematext Cloud, to hosted Elasticsearch services like Amazon Elasticsearch Service, which lets you run your own managed clusters.

Amazon Elasticsearch Service or Sematext Logsene

Before we talk about shipping data to an AWS Elasticsearch service, let's do a quick check: is an AWS Elasticsearch service really what you want? To help with that decision, consider the following questions:

  • What is my use case for Elasticsearch? Do I have use cases other than centralized logging?
  • Do I have enough knowledge and experience to support my own Elasticsearch cluster?
  • Do I want to take the full responsibility for Elasticsearch maintenance and scaling to support a growing volume of data and/or queries?
  • Are there enough people on my team to share the burden or is it all going to fall on my plate?

If some or most of the answers are no, then you may want to stop reading here, check out Logsene, and save yourself both time and money. If, however, your use cases potentially include, but are not limited to, logging, then keep reading!

Here is a diagram that helps people figure out whether they should use AWS Elasticsearch, run their own Elasticsearch, or use a service like Logsene.

If most of the answers to the questions above were yes, then you are likely considering an Amazon Elasticsearch service, and going through the above flow diagram should confirm that. We compared self-hosted Elasticsearch and the Amazon Elasticsearch service some time ago - you can read about it in the AWS Elasticsearch Service vs. Elasticsearch on EC2 blog post. The gist is that the Amazon Elasticsearch service provides:

  • Automatic failed node replacement.
  • Node adding/removal via an API.
  • Rights management via IAM.
  • Daily S3 snapshots.
  • Basic CloudWatch metrics for Elasticsearch.

But the downsides are:

  • Increased costs compared to traditional EC2 instances.
  • Fewer instance types available.
  • Limited cluster-wide changes possible.
  • Unavailability of Elasticsearch logs.
  • Limited debugging possibilities because of API restrictions.

If the pros outweigh the cons for you and you would like to ship your logs to Amazon Elasticsearch Service, let's see how to actually do that with Logagent, an open source, Node.js-based log shipper.

Logagent to Amazon Elasticsearch Service

When using Amazon Elasticsearch Service, you gain security as an out-of-the-box feature, but you are also left hanging a bit, as the official Elasticsearch client library doesn't support it. You either have to disable security and allow communication from certain hosts without authentication, or choose not to rely on the official Elasticsearch client libraries. However, if your use case is log/event shipping, things are not that bad - you can use Logstash with an additional plugin or the newest version of Logagent, which has lower overhead and minimal impact on the system. See Top 5 Logstash Alternatives for more details.
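
If you decide to go with Logagent, install it first. Here is a minimal sketch, assuming Node.js and npm are already on the machine; the package name below follows the current Sematext documentation, so double-check it against the docs for your version:

# install Logagent globally from npm
sudo npm i -g @sematext/logagent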

Configuring Logagent to ship data to the Amazon Elasticsearch Service is really quite simple. For the purpose of this blog post, I will simply send the contents of a file to Elasticsearch. Let's assume my file is called app.log and that it lives in the /var/log/myapp/ directory. The input part of the Logagent configuration looks as follows:

input:
  stdin: true
  files:
    - /var/log/myapp/app.log
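
Because stdin is enabled as well, you can also pipe data into Logagent for a quick test once the full configuration, including the output described below, is in place. A hypothetical invocation, assuming the configuration file is saved as logagent-aws-es.yml and your Logagent version supports the --config flag:

# pipe a test line through Logagent
echo 'test log line' | logagent --config logagent-aws-es.yml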

Now the data that is read from the input needs to be sent to an output - in our case, our AWS Elasticsearch Service instance available at https://search-blog-elasticsearch-service-u3isln3erq3ocb2vkv3v2tmt24.eu-west-1.es.amazonaws.com/ (no, it's not available anymore - get your own!). To do that, we will use the output module called output-aws-elasticsearch. The configuration looks as follows:

output:
  aws-es:
    module: output-aws-elasticsearch
    url: https://search-blog-elasticsearch-service-u3isln3erq3ocb2vkv3v2tmt24.eu-west-1.es.amazonaws.com/
    index: myapp_logs
    type: myapp
    awsConfigFile: ./aws-config.json
    log:
      - type: 'stdio'
        levels: []

In the above output definition, the awsConfigFile option is the important one. Logagent uses the AWS SDK libraries and supports the authentication methods provided by the AWS API, including signed HTTP requests. Logagent assumes you will provide the credentials needed to access your AWS environment in the JSON file pointed to by the awsConfigFile option. The content of aws-config.json looks as follows:

{
  "accessKeyId": "<YOUR_ACCESS_KEY_ID>",
  "secretAccessKey": "<YOUR_SECRET_ACCESS_KEY>",
  "region": "eu-west-1"
}

So we need to provide the AWS access key, the AWS secret key, and the region where our AWS Elasticsearch Service instance was created (eu-west-1 in our example, matching the endpoint above), and we are good to go. You can get all of this via the AWS Console.
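
With both files in place, we can start shipping and verify that documents arrive. Here is a hedged sketch, assuming the Logagent configuration above is saved as logagent-aws-es.yml and that your domain's access policy allows unsigned search requests from your IP (otherwise the query itself needs to be signed):

# start Logagent with our configuration
logagent --config logagent-aws-es.yml

# in another terminal, append a test line to the watched file
echo 'hello from logagent' >> /var/log/myapp/app.log

# after a few seconds, check that the document arrived
curl 'https://search-blog-elasticsearch-service-u3isln3erq3ocb2vkv3v2tmt24.eu-west-1.es.amazonaws.com/myapp_logs/_search?q=hello'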

And this is really all you need to do. Logagent makes data shipping easy: it's Apache-licensed and open-sourced on GitHub, completely pluggable, featuring a number of input, output, and processor plugins, and it's very easy to add your own. For more information, check out http://sematext.com/docs/logagent. Enjoy!

