Clean and Optimize the ElasticSearch Indexes of Logstash


The ElasticSearch indexes created by Logstash grow quickly, and one of the most common questions about them is how to optimize them and how to clean them up by getting rid of old records you are no longer interested in. An easy way to accomplish both tasks is to use the following two scripts:

  • logstash_index_optimize.py
  • logstash_index_cleaner.py

The first optimizes the indexes newer than the specified number of days, while the second removes the indexes older than the specified number of days. The complete synopsis of either command is printed by the -h option.
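
The selection rule described above can be sketched in a few lines of Python. This is not the scripts' actual code: it assumes Logstash's default daily index names (logstash-YYYY.MM.DD), the helpers index_date and split_indexes are hypothetical, and the sketch only decides on which side of the cutoff each index falls; it does not contact ElasticSearch.

```python
from datetime import datetime, timedelta

PREFIX = "logstash-"
DATE_FORMAT = "%Y.%m.%d"  # assumption: Logstash's default daily index suffix


def index_date(name, prefix=PREFIX):
    """Return the date encoded in a daily index name, or None if it does not match."""
    if not name.startswith(prefix):
        return None
    try:
        return datetime.strptime(name[len(prefix):], DATE_FORMAT).date()
    except ValueError:
        return None  # right prefix, but the suffix is not a valid date


def split_indexes(names, days, today):
    """Split daily indexes into (older, newer) around the date `days` days before `today`.

    Older indexes are cleaning candidates; newer ones are optimization candidates.
    """
    cutoff = today - timedelta(days=days)
    older, newer = [], []
    for name in names:
        day = index_date(name)
        if day is None:
            continue  # not a Logstash daily index: leave it alone
        (older if day < cutoff else newer).append(name)
    return older, newer
```

For example, with today set to 1 February 2014 and a 7-day window, logstash-2014.01.01 lands in the older list and logstash-2014.01.30 in the newer one, while a name that does not follow the pattern, such as kibana-int, is skipped entirely.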

Installing the Dependencies

These scripts depend on the following components:
  • The Python runtime (at least version 2).
  • The pyes package.

The pyes package can be installed using pip:

 # pip install pyes

Beware that the latest pyes release (0.90.x) requires ElasticSearch 0.90 and does not support the ElasticSearch instance bundled with Logstash. If you are using the bundled instance, you must install pyes 0.20.1 instead:

 # pip install pyes==0.20.1
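
Before choosing a pyes release, you can check which ElasticSearch version you are running: a GET request on the server's root URL (by default on port 9200, for example http://es-host:9200/) returns a JSON document whose version.number field holds the version. The hypothetical helper below turns that response into a pip requirement. It is a sketch that encodes only the rule stated above, under the assumption that servers older than 0.90 are the ones needing pyes 0.20.1; newer versions are rejected rather than guessed.

```python
import json


def es_version(root_response):
    """Return (major, minor) from the JSON body of a GET on the ElasticSearch root URL."""
    number = json.loads(root_response)["version"]["number"]  # e.g. "0.90.5"
    major, minor = number.split(".")[:2]
    return int(major), int(minor)


def pyes_requirement(root_response):
    """Return the pip requirement matching the server that produced root_response."""
    version = es_version(root_response)
    if version == (0, 90):
        # The 0.90.x pyes releases target ElasticSearch 0.90.
        return "pyes>=0.90,<0.91"
    if version < (0, 90):
        # Assumption: older servers, such as the Logstash-bundled one, need 0.20.1.
        return "pyes==0.20.1"
    raise ValueError("ElasticSearch %d.%d is not covered by this note" % version)
```

For instance, a response reporting version 0.20.6 yields pyes==0.20.1, which you would then pass to pip install.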

Installation on FreeBSD

The FreeBSD ports collection ships all the required dependencies as binary packages. The Python runtime can be installed with the following command:

 # pkg install python

pip can be installed with the following command (assuming the installed runtime is Python 2.7, as in FreeBSD 9.2 and 10.0):

 # pkg install py27-pip

Once pip is installed, it can be used to install pyes in a platform-independent way, as explained in the previous section.

Running the Scripts

The simplest way to run either script is to pass:

  • the --host option, to specify the ElasticSearch server to connect to;
  • the -d option, to specify the desired number of days.

For example, to remove the indexes older than 30 days:

 $ python /path/to/logstash_index_cleaner.py \
  --host es-host \
  -d 30

Given the periodic nature of these tasks, I usually schedule them as cron jobs in a crontab file.
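
A crontab entry can call each script directly, or it can call a small wrapper that runs both in sequence. The wrapper below is a hypothetical sketch: the file name run_logstash_maintenance.py, the /path/to directory and the retention periods (30 days for cleaning, 2 days for optimizing) are placeholders to adapt, and it relies only on the --host and -d options described above.

```python
import os
import subprocess
import sys

# Placeholder: the directory where the two scripts were saved.
SCRIPTS_DIR = "/path/to"


def build_commands(host, clean_days, optimize_days, scripts_dir=SCRIPTS_DIR):
    """Return the command lines for the cleaner and the optimizer, in that order."""
    return [
        ["python", os.path.join(scripts_dir, "logstash_index_cleaner.py"),
         "--host", host, "-d", str(clean_days)],
        ["python", os.path.join(scripts_dir, "logstash_index_optimize.py"),
         "--host", host, "-d", str(optimize_days)],
    ]


def run_maintenance(host, clean_days=30, optimize_days=2):
    """Run the cleaner first, then the optimizer."""
    for command in build_commands(host, clean_days, optimize_days):
        # check_call raises CalledProcessError on a non-zero exit status,
        # so the run stops at the first failing script and exits with an error.
        subprocess.check_call(command)


if __name__ == "__main__":
    # The ElasticSearch host is the only expected argument, e.g. "es-host".
    if len(sys.argv) == 2:
        run_maintenance(sys.argv[1])
    else:
        sys.stderr.write("usage: python run_logstash_maintenance.py ES_HOST\n")
```

A crontab line such as 0 3 * * * python /path/to/run_logstash_maintenance.py es-host would then run both tasks every night at 03:00. Running the cleaner first avoids spending time optimizing indexes that are about to be removed.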

Published at DZone with permission of Enrico Maria Crisostomo, DZone MVB. See the original article here.
