Introduction to Elasticsearch Snapshot and Restore Module

When working with large amounts of data, being able to back up and, if necessary, restore that data is an important requirement. Elasticsearch has a snapshot and restore module that addresses this need.

The snapshot and restore module is designed to back up the indices of a cluster into a repository and to restore them from it later.
 

Snapshot Repository

Before indices can be backed up or restored, a snapshot repository must be registered in Elasticsearch.

The following command registers a shared file system repository named kodcucom_backup, located at "/Users/hakdogan/data":

                              (1)          (2) 
curl -XPUT localhost:9200/_snapshot/kodcucom_backup -d '{
    "type": "fs", (3)
    "settings": {
        "location": "/Users/hakdogan/data", (4)
        "compress": true, (5)
        "chunk_size": "10m" (6)
    }
}'

Command details

1) REST endpoint for snapshot operations. The first path parameter is the repository name.

2) Unique repository name.

3) Defines the repository type, which determines where snapshot files will be stored. The value fs selects a shared file system repository.

4) Mandatory. The specified path must point to the same location in the shared file system and be accessible from all data and master nodes.

Elasticsearch initially supported only shared file system repositories. It now also supports cloud services such as AWS and Azure, as well as HDFS, through officially supported repository plugins.

5) Turns on compression of the snapshot files. Defaults to true.

6) Big files can be broken down into chunks during snapshotting if desired. The chunk size can be specified in bytes or by using size value notation (e.g. 10m for 10-megabyte chunks). Defaults to null, which means unlimited chunk size.
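The registration request described above can be wrapped in a small helper so that the location, compression flag, and chunk size become arguments. This is only a sketch: the function names fs_repo_body and register_fs_repo and the ES_HOST variable are hypothetical and not part of Elasticsearch. Only the endpoint and the body fields come from the command shown earlier.

```shell
#!/bin/sh
# Hypothetical helper: prints the JSON body for a shared file system repository.
#   $1 = location, $2 = compress (true or false), $3 = chunk size (e.g. 10m)
fs_repo_body() {
    printf '{"type":"fs","settings":{"location":"%s","compress":%s,"chunk_size":"%s"}}' "$1" "$2" "$3"
}

# Hypothetical helper: registers the repository named in $1.
# ES_HOST is an assumed variable; it falls back to the host used in this article.
register_fs_repo() {
    curl -XPUT "${ES_HOST:-localhost:9200}/_snapshot/$1" -d "$(fs_repo_body "$2" "$3" "$4")"
}

# Example call (needs a running cluster, so it is left commented out):
# register_fs_repo kodcucom_backup /Users/hakdogan/data true 10m
```

Keeping the body builder separate from the curl call makes it possible to inspect the JSON before anything is sent to the cluster.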


Snapshot

Now that we’ve registered a repository, we can create our first snapshot.

                                           (1)          (2)
curl -XPUT localhost:9200/_snapshot/kodcucom_backup/snapshot_1

1) REST endpoint for snapshot operations. The second path parameter is the snapshot name.

2) Unique snapshot name.

The above command creates a snapshot of all open and started indices in the cluster, because the request has no body and no configuration parameters.

This behavior can be changed by specifying the list of indices in the body of the request.

curl -XPUT localhost:9200/_snapshot/kodcucom_backup/snapshot_1 -d '{
    "indices": "kodcucom", (1)
    "ignore_unavailable": "true" (2)
}'

1) Defines which indices are included in the snapshot. When the asterisk ( * ) character is used, the snapshot covers all open and started indices in the cluster.

2) Setting this to true causes indices that do not exist to be ignored during snapshot creation. By default the option is not set, and the snapshot request fails if a listed index is missing.

Elasticsearch executes the snapshotting process in a non-blocking fashion. This means that indexing and search operations can continue to run against an index while it is being snapshotted.
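Besides not blocking index operations, the create request itself returns before the snapshot is finished unless the wait_for_completion query parameter is set to true. The sketch below builds the snapshot URL with or without that parameter. The snapshot_url function and the ES_HOST variable are hypothetical names introduced for illustration.

```shell
#!/bin/sh
# Hypothetical helper: prints the snapshot URL for repository $1 and snapshot $2.
# Pass "wait" as the third argument to append wait_for_completion=true.
# ES_HOST is an assumed variable; it falls back to the host used in this article.
snapshot_url() {
    url="${ES_HOST:-localhost:9200}/_snapshot/$1/$2"
    if [ "${3:-}" = "wait" ]; then
        url="$url?wait_for_completion=true"
    fi
    printf '%s' "$url"
}

# Example call (needs a running cluster, so it is left commented out):
# curl -XPUT "$(snapshot_url kodcucom_backup snapshot_1 wait)"
```

Waiting for completion is convenient in scripts that need the final result in the response, while the default behavior suits large snapshots that are checked on later.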

After you create a snapshot, its information can be obtained as shown below.

curl -XGET localhost:9200/_snapshot/kodcucom_backup/snapshot_1
{
   "snapshots": [
      {
         "snapshot": "snapshot_1",
         "indices": [
            "kodcucom"
         ],
         "state": "SUCCESS",
         "start_time": "2014-10-01T06:55:12.413Z",
         "start_time_in_millis": 1412146512413,
         "end_time": "2014-10-01T06:55:23.371Z",
         "end_time_in_millis": 1412146523371,
         "duration_in_millis": 10958,
         "failures": [],
         "shards": {
            "total": 5,
            "failed": 0,
            "successful": 5
         }
      }
   ]
}
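The state field in a response like the one above is what a backup script would typically check. As a rough sketch, the helpers below pull that field out of the JSON with sed, so no extra tools are needed. The function names snapshot_state and snapshot_succeeded are hypothetical, and the pattern assumes the response contains a single snapshot entry.

```shell
#!/bin/sh
# Hypothetical helper: reads snapshot info JSON on stdin and prints the
# value of the first "state" field it finds.
snapshot_state() {
    sed -n 's/.*"state"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' | head -n 1
}

# Hypothetical helper: succeeds only if the state read from stdin is SUCCESS.
snapshot_succeeded() {
    [ "$(snapshot_state)" = "SUCCESS" ]
}

# Example call (needs a running cluster, so it is left commented out):
# curl -s -XGET localhost:9200/_snapshot/kodcucom_backup/snapshot_1 | snapshot_state
```

A text pattern like this is fragile compared with a real JSON parser, so it is best treated as a quick check rather than a robust solution.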

To obtain information about all snapshots in the repository, use the _all parameter.

curl -XGET localhost:9200/_snapshot/kodcucom_backup/_all

A snapshot can be deleted using the following command.

curl -XDELETE localhost:9200/_snapshot/kodcucom_backup/snapshot_1


Restore

A snapshot can be restored using the following command.

curl -XPOST localhost:9200/_snapshot/kodcucom_backup/snapshot_1/_restore (1)

1) The _restore endpoint restores the snapshot named in the preceding path parameter.

The above command restores all indices in the specified snapshot, because the request has no body and no configuration parameters.

This behavior can be changed by specifying the list of indices in the body of the request.

curl -XPOST localhost:9200/_snapshot/kodcucom_backup/snapshot_1/_restore -d '{
    "indices": "kodcucom", (1)
    "ignore_unavailable": "true" (2)
}'

1) Defines which indices are restored. When the asterisk ( * ) character is used, all indices backed up in the specified snapshot are restored.

2) Setting this to true causes indices that do not exist in the snapshot to be ignored. By default the option is not set, and the request fails if a listed index is missing.

The restore operation can be performed on a functioning cluster, but an existing index can only be restored if it is closed. The restore operation automatically opens restored indices if they were closed, and it creates new indices if they did not exist in the cluster.
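Because an existing index has to be closed before it can be restored, a restore script usually closes the index first with the _close endpoint of the index API. The sketch below chains the two requests. The function names close_url, restore_url, and restore_index and the ES_HOST variable are hypothetical. The sketch also assumes the index already exists, since closing a missing index returns an error response.

```shell
#!/bin/sh
# ES_HOST is an assumed variable; it falls back to the host used in this article.

# Hypothetical helper: prints the URL that closes index $1.
close_url() {
    printf '%s/%s/_close' "${ES_HOST:-localhost:9200}" "$1"
}

# Hypothetical helper: prints the restore URL for repository $1 and snapshot $2.
restore_url() {
    printf '%s/_snapshot/%s/%s/_restore' "${ES_HOST:-localhost:9200}" "$1" "$2"
}

# Hypothetical helper: closes index $3, then restores it from snapshot $2
# in repository $1. The second request runs only if the first curl exits with 0.
restore_index() {
    curl -XPOST "$(close_url "$3")" &&
    curl -XPOST "$(restore_url "$1" "$2")" -d "{\"indices\": \"$3\"}"
}

# Example call (needs a running cluster, so it is left commented out):
# restore_index kodcucom_backup snapshot_1 kodcucom
```

Note that curl exits with 0 even when the server returns an HTTP error unless an option such as -f is used, so the responses should still be inspected.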


Published at DZone with permission of Hüseyin Akdoğan. See the original article here.

Opinions expressed by DZone contributors are their own.
