
Time-Based Indexing in Elasticsearch Using Java

Want to learn more about using time-based indexing in Elasticsearch?

By Abhinav Sinha · Updated Nov. 13, 2019 · Tutorial · 26.9K Views


Anybody who uses Elasticsearch for indexing time-based data, such as application logs, is accustomed to the index-per-day pattern: use an index name derived from the timestamp of the logging event, rounded to the nearest day (e.g. myapp_logs_index_07_11_2019, myapp_logs_index_08_11_2019), and new indices pop into existence as soon as they are required. It's a classic use case.
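As a sketch in Java, the daily index name can be derived from the event timestamp like this (the prefix and the dd_MM_yyyy layout simply mirror the example names above and are assumptions, not anything Elasticsearch mandates):

```java
import java.time.Instant;
import java.time.LocalDate;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

// Derive an index name from a logging event's timestamp, rounded down to the
// day, matching names like myapp_logs_index_07_11_2019 from the text.
class DailyIndexName {

    private static final DateTimeFormatter DAY = DateTimeFormatter.ofPattern("dd_MM_yyyy");

    static String indexFor(long epochMillis) {
        // Rounding to the day happens implicitly by keeping only the date part.
        LocalDate day = Instant.ofEpochMilli(epochMillis).atZone(ZoneOffset.UTC).toLocalDate();
        return "myapp_logs_index_" + DAY.format(day);
    }
}
```

Any event logged on the same (UTC) day maps to the same index, which is what makes new indices "pop into existence" exactly once per day.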

Need for Time-Based Indexing

Most traditional use cases for search engines involve a relatively static collection of documents that grows slowly. Searches look for the most relevant documents, regardless of when they were created.

With application logs, the number of documents in the index grows rapidly, often accelerating with time. Documents are almost never updated (with logs, never), and searches mostly target the most recent documents. As documents age, they lose value.

If we were to have one big index for documents of this type, we would soon run out of space. Logging events just keep on coming without pause or interruption. We could delete the old events with a scroll query and bulk delete, but this approach is very inefficient. When you delete a document, it is only marked as deleted. It won’t be physically deleted until the segment containing it is merged away.

Purging old data with time-based indexing is easy — just delete old indices.
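Under the index-per-day scheme, purging can be sketched as picking the indices whose date suffix falls outside a retention window; the prefix, date layout, and retention policy below are illustrative assumptions:

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.List;
import java.util.stream.Collectors;

// Sketch of the purge step: with one index per day, old data is dropped by
// deleting whole indices whose date suffix is older than the retention window.
class IndexPurge {

    private static final DateTimeFormatter DAY = DateTimeFormatter.ofPattern("dd_MM_yyyy");
    private static final String PREFIX = "myapp_logs_index_";

    // Return the indices that should be deleted, given "today" and a retention in days.
    static List<String> expired(List<String> indices, LocalDate today, int retentionDays) {
        LocalDate cutoff = today.minusDays(retentionDays);
        return indices.stream()
                .filter(name -> LocalDate.parse(name.substring(PREFIX.length()), DAY)
                        .isBefore(cutoff))
                .collect(Collectors.toList());
    }
}
```

Each name returned would then be dropped with a single delete-index call, which removes the data immediately instead of leaving tombstoned documents behind.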

Rollover API

Elasticsearch provides support for time-based indexing through its Rollover API, which is available in two forms:

  1. REST-based APIs
  2. Java APIs

For testing and playing around with how rollover actually works, the REST endpoint is the easiest to set up and run. We will cover both approaches in this post.

The Rollover API follows the rollover pattern, which essentially works as follows:

  • There is one alias used for indexing that points to the active index.
  • Another alias points to active and inactive indices and is used for searching.
  • The active index can have as many shards as you have hot nodes to take advantage of the indexing resources of all your expensive hardware.
  • When the active index is too full or too old, it is rolled over, a new index is created, and the indexing alias switches atomically from the old index to the new.
  • The old index is moved to a cold node and is shrunk down to one shard, which can also be force-merged and compressed. However, this will not be covered in this blog.
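The naming behaviour behind the rollover step can be sketched as follows (an illustration of the convention, not Elasticsearch's actual code):

```java
// Sketch of the rolled-over name derivation: the trailing number in a name
// like logs-000001 is parsed, incremented, and zero-padded to the same width.
class RolloverName {

    static String next(String index) {
        int dash = index.lastIndexOf('-');
        String digits = index.substring(dash + 1);
        long n = Long.parseLong(digits) + 1;
        return index.substring(0, dash + 1)
                + String.format("%0" + digits.length() + "d", n);
    }
}
```

This is why the write alias can switch atomically: the new index name is fully determined by the old one, so the next index can be created and aliased in one step.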

REST-Based Method

We're going to create two aliases: logs-search for searches and logs-write for indexing.

1. First, we create a new index template with a search alias. From now on, we will refer to the indices through this alias for searches.

PUT localhost:9200/_template/logs
{
  "template": "logs-*",
  "settings": {
    "number_of_shards": 5,
    "number_of_replicas": 1
  },
  "aliases": {
    "logs-search": {}
  }
}


2. Next, we create the first index in the series and attach the write alias to it. (Rollover conditions are not part of an alias definition; they are supplied to the _rollover call in step 4.)

PUT localhost:9200/logs-000001
{
  "aliases": {
    "logs-write": {}
  }
}


3. We index some data using the write alias — note that this is an alias, not the actual index name.

POST localhost:9200/logs-write/_doc/861233345
{
  "user": "kimchy",
  "post_date": "2009-11-15T14:12:12",
  "message": "trying out Elasticsearch"
}

You may see a response similar to this:

{
  "_index": "logs-000001",
  "_type": "_doc",
  "_id": "861233345",
  "_version": 3,
  "result": "updated",
  "_shards": {
    "total": 2,
    "successful": 1,
    "failed": 0
  },
  "_seq_no": 2,
  "_primary_term": 1
}


4. You need to keep hitting the rollover endpoint; the rollover only happens when any of the specified conditions is met at the time of the call. When that is the case, a new index is created with the next name in the series, viz. logs-000002, and the write alias switches to this new active index. The Rollover API is smart enough to detect naming patterns with numbers and dates and increment to the next value.

POST localhost:9200/logs-write/_rollover
{
  "conditions": {
    "max_age": "5s",
    "max_docs": 5,
    "max_size": "5mb"
  }
}
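The decision in step 4 is an any-of check over the configured conditions; the method and parameter names below are illustrative assumptions, not Elasticsearch code:

```java
// Rollover fires when ANY configured condition is met, mirroring the
// semantics of max_age / max_docs / max_size in the _rollover body.
class RolloverCheck {

    // age in seconds, size in bytes; thresholds correspond to the JSON conditions
    static boolean shouldRollover(long ageSeconds, long docCount, long sizeBytes,
                                  long maxAgeSeconds, long maxDocs, long maxSizeBytes) {
        return ageSeconds >= maxAgeSeconds
                || docCount >= maxDocs
                || sizeBytes >= maxSizeBytes;
    }
}
```

So an index that is only 3 seconds old still rolls over if it has already accumulated more documents than max_docs allows.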


5.  To verify that the rollover did, indeed, happen, try writing some new data to the index (again using the alias):

POST localhost:9200/logs-write/_doc/1233

In the response, you can see that the document was written to logs-000002, which is the rolled-over index:

{
  "_index": "logs-000002",
  "_type": "_doc",
  "_id": "861233345",
  "_version": 3,
  "result": "updated",
  "_shards": {
    "total": 2,
    "successful": 1,
    "failed": 0
  },
  "_seq_no": 2,
  "_primary_term": 1
}


6. For searches, however, you would use the search alias, which keeps pointing to all the logs-* indices because of the index template we defined in step 1. If we were to use the logs-write alias for searching, it would point only to the current rolled-over index, and we would miss the documents in the previous indices.

GET localhost:9200/logs-search/_search

{
  "took": 17,
  "timed_out": false,
  "_shards": {
    "total": 20,
    "successful": 20,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 6,
    "max_score": 1,
    "hits": [
      {
        "_index": "logs-000001",
        "_type": "_doc",
        "_id": "8611234677862",
        "_score": 1,
        "_source": {
          "user": "kimchy",
          "post_date": "2009-11-15T14:12:12",
          "message": "trying out Elasticsearch"
        }
      },
      {
        "_index": "logs-000002",
        "_type": "_doc",
        "_id": "861123467jahd",
        "_score": 1,
        "_source": {
          "user": "kimchy",
          "post_date": "2009-11-15T14:12:12",
          "message": "trying out Elasticsearch"
        }
      },
      {
        "_index": "logs-000003",
        "_type": "_doc",
        "_id": "861123467jahd",
        "_score": 1,
        "_source": {
          "user": "kimchy",
          "post_date": "2009-11-15T14:12:12",
          "message": "trying out Elasticsearch"
        }
      },
      {
        "_index": "logs-000004",
        "_type": "_doc",
        "_id": "861123467jahd",
        "_score": 1,
        "_source": {
          "user": "kimchy",
          "post_date": "2009-11-15T14:12:12",
          "message": "trying out Elasticsearch"
        }
      },
      {
        "_index": "logs-000001",
        "_type": "_doc",
        "_id": "8611234677",
        "_score": 1,
        "_source": {
          "user": "kimchy",
          "post_date": "2009-11-15T14:12:12",
          "message": "trying out Elasticsearch"
        }
      },
      {
        "_index": "logs-000002",
        "_type": "_doc",
        "_id": "8611234677",
        "_score": 1,
        "_source": {
          "user": "kimchy",
          "post_date": "2009-11-15T14:12:12",
          "message": "trying out Elasticsearch"
        }
      }
    ]
  }
}


As you can see, the search result contains data from several indices (logs-000001 through logs-000004).

7. Fetching from multiple indices was possible because the logs-search alias points to multiple indices. To verify this, list the aliases with the cat API (GET localhost:9200/_cat/aliases?v):

alias        index        filter  routing.index  routing.search
logs-search  logs-000002  -       -              -
logs-write   logs-000002  -       -              -
logs-search  logs-000001  -       -              -


Also, notice that logs-write points to only one index at a time, which is what we want.

Rollover Java API

For the Java API, refer to the code here.
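In case that code is not handy, here is a minimal sketch of the same rollover call using the high-level REST client (the elasticsearch-rest-high-level-client 7.x dependency); the host, alias name, and condition thresholds are assumptions mirroring the REST examples above:

```java
import java.io.IOException;
import java.util.concurrent.TimeUnit;

import org.apache.http.HttpHost;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.client.indices.RolloverRequest;
import org.elasticsearch.client.indices.RolloverResponse;
import org.elasticsearch.common.unit.ByteSizeUnit;
import org.elasticsearch.common.unit.ByteSizeValue;
import org.elasticsearch.common.unit.TimeValue;

public class RolloverExample {

    public static void main(String[] args) throws IOException {
        try (RestHighLevelClient client = new RestHighLevelClient(
                RestClient.builder(new HttpHost("localhost", 9200, "http")))) {

            // Roll over the index behind the logs-write alias, mirroring the
            // REST call in step 4; passing null lets Elasticsearch pick the
            // next name in the series (e.g. logs-000002).
            RolloverRequest request = new RolloverRequest("logs-write", null);
            request.addMaxIndexAgeCondition(new TimeValue(5, TimeUnit.SECONDS));
            request.addMaxIndexDocsCondition(5);
            request.addMaxIndexSizeCondition(new ByteSizeValue(5, ByteSizeUnit.MB));

            RolloverResponse response = client.indices().rollover(request, RequestOptions.DEFAULT);
            System.out.println("rolled over: " + response.isRolledOver()
                    + ", new index: " + response.getNewIndex());
        }
    }
}
```

Run against a local cluster, isRolledOver() tells you whether any condition was met, and getNewIndex() returns the index name the write alias now points to.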

Feel free to leave a comment below if you have any questions!
