Introduction to Elasticsearch Snapshot and Restore Module

When working with large amounts of data, the ability to back up and, if necessary, restore that data is an important requirement. Elasticsearch has a snapshot and restore module that addresses this need.

The module is designed to back up and restore a user's indices.
 

Snapshot Repository

Before indices can be backed up or restored, a snapshot repository must be registered in Elasticsearch.

The following command registers a shared file system repository named kodcucom_backup at the location "/Users/hakdogan/data":

                              (1)          (2) 
curl -XPUT localhost:9200/_snapshot/kodcucom_backup -d '{
    "type": "fs", (3)
    "settings": {
        "location": "/Users/hakdogan/data", (4)
        "compress": true, (5)
        "chunk_size": "10m" (6)
    }
}'

Command details

1) REST endpoint for snapshot operations. Its first parameter is the repository name.

2) Unique repository name.

3) Defines the repository type, which determines where snapshot files are stored. The value fs selects a shared file system repository.

4) Mandatory. The path must point to the same location on the shared file system and be accessible from all data and master nodes.

Elasticsearch initially supported only the shared file system repository. It now also supports cloud services such as AWS and Azure, as well as HDFS, through officially supported repository plugins.

5) Turns on compression of the snapshot files. Compression applies only to metadata files (index mappings and settings); data files are not compressed. Defaults to true.

6) Big files can be broken down into chunks during snapshotting if desired. The chunk size can be specified in bytes or by using size value notation (e.g. 10m for 10-megabyte chunks). Defaults to null, which means unlimited chunk size.
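
The registration request above can also be assembled programmatically. The following is a minimal sketch in Python using only the standard library; the helper name build_register_repo_request and the default host are assumptions made for illustration, and the Content-Type header is an addition that newer Elasticsearch versions expect. The request is only built here, not sent.

```python
import json
from urllib.request import Request


def build_register_repo_request(repo_name, location,
                                host="http://localhost:9200",
                                compress=True, chunk_size=None):
    """Build (but do not send) the PUT request that registers an fs repository.

    Hypothetical helper: the name and the default host are illustrative.
    """
    settings = {"location": location, "compress": compress}
    if chunk_size is not None:
        # Omitting chunk_size keeps the default (null, unlimited chunk size).
        settings["chunk_size"] = chunk_size
    body = {"type": "fs", "settings": settings}
    return Request(
        url=f"{host}/_snapshot/{repo_name}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


req = build_register_repo_request(
    "kodcucom_backup", "/Users/hakdogan/data", chunk_size="10m"
)
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
```

Against a running cluster, the request could be sent with urllib.request.urlopen(req).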


Snapshot

Now that we've registered a repository, we can create our first snapshot.

                                           (1)          (2)
curl -XPUT localhost:9200/_snapshot/kodcucom_backup/snapshot_1

1) REST endpoint for snapshot operations. Its second parameter is the snapshot name.

2) Unique snapshot name.

The above command creates a snapshot of all open and started indices in the cluster because the request has no body and no configuration parameters.

This behavior can be changed by specifying the list of indices in the body of the request.

curl -XPUT localhost:9200/_snapshot/kodcucom_backup/snapshot_1 -d '{
    "indices": "kodcucom", (1)
    "ignore_unavailable": "true" (2)
}'

1) Specifies the indices to include in the snapshot. When the asterisk ( * ) character is used, the snapshot includes all open and started indices in the cluster.

2) When set to true, indices that do not exist are ignored during snapshot creation. If the option is not set, the snapshot request fails when a listed index is missing.
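
As a rough illustration of the two request shapes described above (no body for all indices, a body for a selected list), here is a small Python sketch. The function name snapshot_body is hypothetical, and the sketch sends ignore_unavailable as a JSON boolean rather than the quoted string used in the curl example.

```python
import json


def snapshot_body(indices=None, ignore_unavailable=False):
    """Return the JSON body for a create-snapshot request.

    Returns None when no indices are given, which corresponds to a request
    without a body (snapshot all open and started indices).
    Hypothetical helper for illustration only.
    """
    if not indices:
        return None
    return json.dumps({
        # Multiple index names are passed as one comma-separated string.
        "indices": ",".join(indices),
        "ignore_unavailable": ignore_unavailable,
    })


print(snapshot_body())
print(snapshot_body(["kodcucom"], ignore_unavailable=True))
```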

The snapshotting process is executed in a non-blocking fashion by Elasticsearch. This means all operations can continue to be executed against the index during snapshotting.

After you create a snapshot, its information can be obtained as shown below.

curl -XGET localhost:9200/_snapshot/kodcucom_backup/snapshot_1
{
   "snapshots": [
      {
         "snapshot": "snapshot_1",
         "indices": [
            "kodcucom"
         ],
         "state": "SUCCESS",
         "start_time": "2014-10-01T06:55:12.413Z",
         "start_time_in_millis": 1412146512413,
         "end_time": "2014-10-01T06:55:23.371Z",
         "end_time_in_millis": 1412146523371,
         "duration_in_millis": 10958,
         "failures": [],
         "shards": {
            "total": 5,
            "failed": 0,
            "successful": 5
         }
      }
   ]
}
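
A script that triggers backups will usually want to check this response rather than read it by eye. The sketch below, assuming the response shape shown above, parses a trimmed copy of it and reports whether the snapshot succeeded on every shard; summarize_snapshot is a hypothetical helper name.

```python
import json

# Trimmed copy of the response shown in the article.
RESPONSE = """
{
   "snapshots": [
      {
         "snapshot": "snapshot_1",
         "indices": ["kodcucom"],
         "state": "SUCCESS",
         "start_time_in_millis": 1412146512413,
         "end_time_in_millis": 1412146523371,
         "duration_in_millis": 10958,
         "failures": [],
         "shards": {"total": 5, "failed": 0, "successful": 5}
      }
   ]
}
"""


def summarize_snapshot(info):
    """Condense one entry of the "snapshots" array into a small dict."""
    shards = info["shards"]
    return {
        "name": info["snapshot"],
        # Treat the snapshot as healthy only if every shard succeeded.
        "ok": (info["state"] == "SUCCESS"
               and shards["failed"] == 0
               and shards["successful"] == shards["total"]),
        "seconds": info["duration_in_millis"] / 1000.0,
    }


summaries = [summarize_snapshot(s) for s in json.loads(RESPONSE)["snapshots"]]
for summary in summaries:
    print(summary)
```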

To obtain information about all snapshots in the repository, use the _all parameter.

curl -XGET localhost:9200/_snapshot/kodcucom_backup/_all

A snapshot can be deleted using the following command.

curl -XDELETE localhost:9200/_snapshot/kodcucom_backup/snapshot_1


Restore

A snapshot can be restored using the following command.

curl -XPOST localhost:9200/_snapshot/kodcucom_backup/snapshot_1/_restore (1)

1) The _restore endpoint restores the snapshot named in the preceding path parameter.

The above command restores all indices in the specified snapshot because the request has no body and no configuration parameters.

This behavior can be changed by specifying the list of indices in the body of the request.

curl -XPOST localhost:9200/_snapshot/kodcucom_backup/snapshot_1/_restore -d '{
    "indices": "kodcucom", (1)
    "ignore_unavailable": "true" (2)
}'

1) Specifies the indices to restore. When the asterisk ( * ) character is used, all indices in the specified snapshot are restored.

2) When set to true, indices that do not exist in the snapshot are ignored. If the option is not set, the restore request fails when a listed index is missing.

The restore operation can be performed on a functioning cluster, but an existing index can be restored only if it is closed. The restore operation automatically opens restored indices that were closed and creates indices that do not yet exist in the cluster.
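
Because an existing index has to be closed before it is restored, a restore script typically issues a close request first. The following Python sketch only lists the requests in order and sends nothing; restore_plan and HOST are hypothetical names, and the close step uses the index close endpoint (POST /<index>/_close), which is needed only for indices that already exist in the cluster.

```python
import json

HOST = "http://localhost:9200"  # assumption: same local node as the curl examples


def restore_plan(repo, snapshot, indices, host=HOST):
    """Return the ordered (method, url, body) steps for restoring indices.

    Hypothetical helper: it describes the requests without sending them.
    """
    # Step 1: close each existing index that is about to be restored.
    steps = [("POST", f"{host}/{name}/_close", None) for name in indices]
    # Step 2: restore only the listed indices from the snapshot.
    body = json.dumps({"indices": ",".join(indices)})
    steps.append(
        ("POST", f"{host}/_snapshot/{repo}/{snapshot}/_restore", body)
    )
    return steps


plan = restore_plan("kodcucom_backup", "snapshot_1", ["kodcucom"])
for method, url, body in plan:
    print(method, url, body or "")
```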

Published at DZone with permission of

Opinions expressed by DZone contributors are their own.
