
How to Migrate ElasticSearch Data Using Logstash

Learn how to migrate a data cluster in ElasticSearch with a new method for purposes like data backup during a system upgrade.

By Leona Zhang · Jun. 08, 18 · Tutorial · 12.0K Views


Engineers often find themselves in a position where they need to migrate data in ElasticSearch, for example to back up data or to upgrade a system. There are just as many methods as there are reasons to perform the migration: you can use elasticsearch-dump, a snapshot, or the reindex API. In this article, we will introduce a new method to quickly migrate an ElasticSearch cluster using Logstash.

I hope that with this explanation, you will be able to understand the theory behind using Logstash to migrate data. In essence, the operation consists of using Logstash to read data from the source ElasticSearch cluster and then write it into the target ElasticSearch cluster. The exact steps are outlined in the following section.

Steps to Migrate ElasticSearch Using Logstash

Step 1: Create a data sync conf file in the Logstash directory

vim ./logstash-5.5.3/es-es.conf

Step 2: Ensure Identical Names: When configuring the conf file, make sure the index names are identical in the source and target clusters. Refer to the configuration below.

input {
    elasticsearch {
        hosts => ["********your host**********"]
        user => "*******"
        password => "*********"
        index => "logstash-2017.11.07"
        size => 1000
        scroll => "1m"
    }
}
# an optional filter section can be defined here
filter {
}
output {
    elasticsearch {
        hosts => ["***********your host**************"]
        user => "********"
        password => "**********"
        index => "logstash-2017.11.07"
    }
}

Step 3: Running Logstash: Once you have configured the conf file, run Logstash:

bin/logstash -f es-es.conf

Sometimes running this command generates the following error message:

[FATAL][logstash.runner] Logstash could not be started because there is already another instance using the configured data directory. If you wish to run multiple instances, you must change the "path.data" setting.

This is because the current version of Logstash does not allow multiple instances to share the same path.data. Therefore, when starting additional instances, include "--path.data PATH" in the command to give each instance its own data directory.

bin/logstash -f es-es.conf --path.data ./logs/
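
For example, if you wanted to run two migration pipelines side by side, you might start them like this (the second conf file name is purely hypothetical):

# hypothetical example: two pipelines, each with its own data directory
bin/logstash -f es-es.conf --path.data ./logstash-data-1/
bin/logstash -f es-es-2.conf --path.data ./logstash-data-2/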

If all goes as intended, you can use the following command to view the corresponding index in the target ElasticSearch cluster:

curl -u username:password host:port/_cat/indices
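
The _cat/indices API returns one row per index. The values below are purely illustrative, but after a successful migration you should see the source index name with the expected document count:

# illustrative output only; your index names, counts, and sizes will differ
health status index               uuid                   pri rep docs.count docs.deleted store.size pri.store.size
green  open   logstash-2017.11.07 xxxxxxxxxxxxxxxxxxxxxx   5   1    1000000            0      1.2gb          600mb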

Let us now look at a sample use case.

Migrating ElasticSearch Data Using Logstash Sample Use Case

A lot of clients running their own home-built ElasticSearch deployments have been paying close attention to the Alibaba Cloud ElasticSearch products. They want to use it but have difficulties migrating their data from their own ElasticSearch to Alibaba Cloud ElasticSearch. The following is an explanation of how to use Logstash to quickly migrate home-built ElasticSearch index data to the cloud.

The logic behind this solution is quite simple: you could configure a separate es-to-es conf file for each index, but that quickly becomes a cumbersome process. Logstash can make this much easier. Before I explain how, let me introduce three core concepts of Logstash.

  1. Metadata: The concept of metadata was introduced in Logstash 1.5. It describes an event and can be modified at any point, but it is not written to the output and does not affect the event's contents. The @metadata field carries this information and survives throughout the entire lifecycle of the input, filter, and output plug-ins. See the Logstash documentation for more details on metadata.
  2. Docinfo: A parameter of the ElasticSearch input plug-in that is set to false by default. The description on the official website is, "If set, include Elasticsearch document information such as index, type, and the id in the event." Once this option is enabled, the metadata records the index, type, and id of each document, which means you can use those values at any point in the entire lifecycle of the event (see the sample event after this list).
  3. *: The index parameter of the ElasticSearch input plug-in supports the wildcard character "*" to match all indices.
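
To make the docinfo behavior concrete, here is a sketch of what a single event might look like when printed with the rubydebug codec and metadata enabled; the field values are placeholders, not real output:

# illustrative event only; all values below are placeholders
{
    "@timestamp" => 2017-11-07T08:00:00.000Z,
      "@version" => "1",
       "message" => "an example log line",
     "@metadata" => {
        "_index" => "logstash-2017.11.07",
         "_type" => "logs",
           "_id" => "AV_example_document_id"
    }
}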

Because of the way metadata works, you can "inherit" the index and type information from the input to the output. Additionally, you can create the index, type, and id information in the target cluster that is identical to that in the source cluster.
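
As a sketch of that idea (not part of the article's original configuration), the output plug-in could also map the document type and id from the metadata; document_type and document_id are standard options of the ElasticSearch output plug-in, and the host and credentials below are placeholders:

output {
    elasticsearch {
        hosts => ["yourhost"]                       # placeholder host
        user => "********"
        password => "********"
        index => "%{[@metadata][_index]}"           # keep the source index name
        document_type => "%{[@metadata][_type]}"    # keep the source type
        document_id => "%{[@metadata][_id]}"        # keep the source document id
    }
}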

If at any point in the process you want to see and debug the metadata information, you need to add the following setting to the output:

stdout { codec => rubydebug { metadata => true } }

Use the following configuration code:

input {
    elasticsearch {
        hosts => ["yourhost"]
        user => "**********"
        password => "*********"
        index => "*"    # the wildcard tells the plugin to read every index
        size => 1000
        scroll => "1m"
        codec => "json"
        docinfo => true
    }
}
# an optional filter section can be defined here
filter {
}

output {
    elasticsearch {
        hosts => ["yourhost"]
        user => "********"
        password => "********"
        index => "%{[@metadata][_index]}"
    }
    stdout { codec => rubydebug { metadata => true } }
}
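
Save this configuration and start Logstash the same way as before (assuming the wildcard configuration replaces the earlier contents of es-es.conf):

bin/logstash -f es-es.conf --path.data ./logs/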

After running the command, Logstash will copy all of the indexes in the source cluster to the target cluster, carrying with it the mapping information. Next, it will begin gradually migrating the data inside the indexes.

During an actual migration run, you will still have this setting in the output:

stdout { codec => rubydebug { metadata => true } }

I recommend deleting this setting to prevent your screen from being filled with metadata information.

Conclusion

I hope this article helped you understand how to migrate ElasticSearch data using Logstash. I have also described the core concepts of Logstash, which you should be aware of before you start the migration process.


Published at DZone with permission of Leona Zhang. See the original article here.

Opinions expressed by DZone contributors are their own.
