How to Keep Your Files Safe in S3 with Versioning

Warning! Your important files are at risk of being accidentally or maliciously overwritten or deleted and will be lost forever! The solution? Enable versioning!

A new S3 bucket has versioning disabled by default. Once you enable versioning, S3 will keep an unlimited number of historical versions of your objects. Uploading an object will not overwrite the existing one but will instead create a new version. Deleting an object will not remove it but will instead create a delete marker, which is a placeholder that lets S3 keep track of deleted objects without actually removing them.

Delete marker

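This means that your files are safe: a normal delete only hides an object, and previous versions can be permanently deleted only by the bucket owner or by users who have been granted the relevant permissions.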

What’s the downside, you ask? Cost!

AWS charges for every stored version, which can add up quickly depending on how frequently you are replacing and deleting existing objects.

Every time you delete an object, the object isn’t actually removed, so you will continue to pay storage costs for it.

So how can you get the benefits of versioning while still keeping your costs down? The answer is lifecycle rules.

Using lifecycle rules, you can automatically delete older versions after some time (more on this later).

How to Enable Versioning for Your Bucket?

You can only enable versioning on the bucket level.

You can turn on versioning by accessing the S3 section in the AWS Console and performing the steps below:

  • Select the bucket from the list.

Selecting bucket

  • Click properties and then versioning.

Versioning


  • Click enable versioning and then save.


Versioning
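The same setting can also be changed outside the console. Below is a minimal sketch that assumes the boto3 SDK is installed and that an S3 client created with boto3.client("s3") is passed in as s3. The function name and the "Unversioned" label are made up for illustration.

```python
def enable_versioning(s3, bucket):
    """Turn on versioning for a bucket and return the status S3 reports.

    `s3` is assumed to be a boto3 S3 client, e.g. boto3.client("s3").
    """
    s3.put_bucket_versioning(
        Bucket=bucket,
        VersioningConfiguration={"Status": "Enabled"},
    )
    # A bucket that has never been versioned returns no "Status" key,
    # so fall back to a label of our own for that case.
    return s3.get_bucket_versioning(Bucket=bucket).get("Status", "Unversioned")
```

Calling enable_versioning(boto3.client("s3"), "my-bucket") should return "Enabled" if the caller has permission to change the bucket's versioning state.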


Once versioning is enabled, you will see a new option on the overview screen to hide/show versions. This option will show even if you later suspend versioning.

Hide versions


Tip: Enabling versioning is a prerequisite for other S3 capabilities such as cross-region replication and Object Lock.

Disclaimer: You Only Need to Read This if You Already Have Lifecycle Rules Set Up

If you have an existing lifecycle rule set up to delete objects after a specified period, its behavior will change. Before versioning, the lifecycle rule simply removes the object. After versioning, the rule only adds a delete marker, the object becomes a previous version, and you will still pay storage costs for it. To maintain the past behavior, you would need to modify the rule so that it also applies to previous versions. You can perform the following steps to accomplish this:

Lifecycle rule


The above lifecycle rule translates to:

a) Create a delete marker for the current version of an object if it was created more than a day ago.

b) Permanently delete any previous version that stopped being the current version more than a day ago.

This means that there is a waiting period of at least two days before an active object is fully cleaned up: one day to turn the current version into a previous one, and one more day to remove that previous version.
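The rule above can also be written down as a configuration document. The sketch below builds the dictionary that boto3's put_bucket_lifecycle_configuration call accepts. The rule ID and the helper names are hypothetical, and the empty prefix is assumed to mean "apply to every object".

```python
def expire_everything_rule(days_current=1, days_noncurrent=1):
    """Build a lifecycle configuration that expires current and previous versions."""
    return {
        "Rules": [
            {
                "ID": "expire-current-then-previous",  # hypothetical rule name
                "Status": "Enabled",
                "Filter": {"Prefix": ""},  # empty prefix: every object in the bucket
                # (a) add a delete marker once the current version is this old
                "Expiration": {"Days": days_current},
                # (b) permanently delete versions that have been noncurrent this long
                "NoncurrentVersionExpiration": {"NoncurrentDays": days_noncurrent},
            }
        ]
    }


def minimum_days_until_gone(config):
    """Earliest point at which an active object can be fully removed by the rule."""
    rule = config["Rules"][0]
    return (
        rule["Expiration"]["Days"]
        + rule["NoncurrentVersionExpiration"]["NoncurrentDays"]
    )
```

With the defaults, minimum_days_until_gone(expire_everything_rule()) gives the two-day waiting period described above. To apply the rule, the dictionary would be passed as LifecycleConfiguration to put_bucket_lifecycle_configuration together with the bucket name.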

The Details: How Does Versioning Actually Work?

When retrieving an object, S3 will always give you the latest version. If the newest version is a delete marker, the object will appear deleted. You can ask S3 for a specific version of an object by specifying its version id in the request. Alternatively, you can tell S3 to list all the versions of an object and then choose the one you would like to retrieve.

A version is considered a regular S3 object and, therefore, can have its own permissions and encryption settings.

Creating a Version

Uploading any object will automatically create a new version, which is given a version id. Any object stored before versioning was enabled will have a version id of null.

After and before versioning

When you upload an object via the REST API, the response will contain an “x-amz-version-id” header with the version id.
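When uploading through an SDK rather than raw REST calls, the version id is usually exposed on the response object. A minimal sketch, assuming a boto3 S3 client is passed in as s3 (the function name is made up):

```python
def upload_and_get_version(s3, bucket, key, data):
    """Upload `data` under `key` and return the version id S3 assigned.

    `s3` is assumed to be a boto3 S3 client, e.g. boto3.client("s3").
    """
    response = s3.put_object(Bucket=bucket, Key=key, Body=data)
    # boto3 surfaces the x-amz-version-id response header as "VersionId".
    # The key may be missing if the bucket has never had versioning enabled.
    return response.get("VersionId")
```

Storing the returned id alongside your own records makes it possible to fetch exactly that version later.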

Response

Retrieving the Latest Version of an Object

When requesting an object, S3 will always give you the latest version. If the newest version is a delete marker, you will receive a 404 error and a response header of “x-amz-delete-marker:true”.

The URL to access an object consists of the bucket name, the region, and filename, for example:

GET https://bucketname.s3.region.amazonaws.com/image.jpg

404 not found


Retrieving a Specific Version of an Object

You can retrieve a specific version of an object by adding a query string parameter called “versionId,” as shown below:

GET: https://bucket.s3.region.amazonaws.com/image.jpg?versionId=f5jZYxWqfe.WlmF73GctmFHqVYfdrf8.

Alternatively, you can make a HEAD request just to get the metadata of the object without the actual contents, as shown below:

HEAD: https://test-bucket-2-dg.s3.amazonaws.com/image.jpg?versionId=f5jZYxWqfe.WlmF73GctmFHqVYfdrf8.
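The URL patterns in the last two sections can be sketched with a couple of small helpers that use only the Python standard library. The function names are made up, and an unsigned URL like this is assumed to work only for objects that are publicly readable; anything else needs a signed request.

```python
from urllib.parse import quote, urlencode


def object_url(bucket, region, key, version_id=None):
    """Build the URL for an object, optionally pinned to one version."""
    url = f"https://{bucket}.s3.{region}.amazonaws.com/{quote(key)}"
    if version_id is not None:
        # The query string parameter is spelled versionId, in camel case.
        url += "?" + urlencode({"versionId": version_id})
    return url


def looks_like_delete_marker(status, headers):
    """True for the 404 plus x-amz-delete-marker response described earlier."""
    normalized = {name.lower(): value for name, value in headers.items()}
    return status == 404 and normalized.get("x-amz-delete-marker") == "true"
```

Checking looks_like_delete_marker on a failed GET or HEAD helps tell "this object was deleted" apart from "this object never existed".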

Listing the Files in a Bucket

Listing the files in a bucket will return only the current version of each object and will exclude any object whose current version is a delete marker.

An example of listing the files in a bucket is below:

GET https://bucket.s3.region.amazonaws.com/

The response…

GET response


Listing the Object Versions in a Bucket

You can list all versions of all files in a bucket by calling:
GET https://bucket.s3.region.amazonaws.com/?versions

The response…

Response

You can add the prefix query string parameter to the above request to limit the response to a specific object. For example:

GET https://test-bucket-2-dg.s3.us-east-1.amazonaws.com/?versions&prefix=image.jpg
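The versions listing comes back as XML. The sketch below shows one way to pull the interesting fields out of it with the standard library, assuming the usual ListVersionsResult layout in which each version is a Version element and each delete marker is a DeleteMarker element. The function name and the keys of the returned dictionaries are made up.

```python
import xml.etree.ElementTree as ET

# XML namespace used by S3 REST responses
S3_NS = {"s3": "http://s3.amazonaws.com/doc/2006-03-01/"}


def parse_versions(xml_text):
    """Return one dict per version or delete marker found in the listing.

    Regular versions come first, followed by delete markers.
    """
    root = ET.fromstring(xml_text)
    entries = []
    for tag in ("Version", "DeleteMarker"):
        for node in root.findall(f"s3:{tag}", S3_NS):
            entries.append({
                "key": node.findtext("s3:Key", namespaces=S3_NS),
                "version_id": node.findtext("s3:VersionId", namespaces=S3_NS),
                "is_latest": node.findtext("s3:IsLatest", namespaces=S3_NS) == "true",
                "is_delete_marker": tag == "DeleteMarker",
            })
    return entries
```

Filtering the result on key and is_latest gives a quick view of which objects are currently hidden behind a delete marker.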

Deleting an Object

You can delete an object by hiding versions (1), selecting the object to remove (2), clicking actions (3), and then delete (4).

Actions

When deleting an object in this way, the object is not deleted; instead, a delete marker is created as a version.

Delete marker

This means that anyone trying to retrieve the object will see that it’s deleted, but the previous versions will still exist, and you can either restore from an older version or remove the delete marker to restore the file.

Here’s how you would remove the delete marker to restore the file:

Removing delete marker

Below is how it looks once the delete marker is removed and the object is restored.

Delete marker removed
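The same restore can be scripted. Here is a minimal sketch, assuming a boto3 S3 client is passed in as s3; the function name is made up, and only the first page of the listing is inspected, which is assumed to be enough for a single key.

```python
def undelete(s3, bucket, key):
    """Remove the current delete marker for `key`, if there is one.

    Returns the version id of the removed marker, or None when the object
    is not hidden behind a delete marker.
    """
    listing = s3.list_object_versions(Bucket=bucket, Prefix=key)
    for marker in listing.get("DeleteMarkers", []):
        # Prefix matching can return other keys, so compare the key exactly.
        if marker["Key"] == key and marker["IsLatest"]:
            # Deleting a delete marker by its version id removes the marker
            # itself, which makes the previous version current again.
            s3.delete_object(Bucket=bucket, Key=key, VersionId=marker["VersionId"])
            return marker["VersionId"]
    return None
```

Older delete markers for the same key are left alone; only the one that currently hides the object is removed.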

Delete markers aren’t full objects but do take up storage. Here is a quote from the AWS documentation as to how much storage they take up:
“Delete markers accrue a nominal charge for storage in Amazon S3. The storage size of a delete marker is equal to the size of the key name of the delete marker. The UTF-8 encoding adds from 1 to 4 bytes of storage to your bucket for each character in the name.”

Deleting a Version

When deleting a version, the version will be permanently erased. Only the bucket owner or those with the correct permissions can delete a version.

Additionally, you can require multi-factor authentication (MFA) to delete a version. This will require a multi-factor token to be used to perform the deletion or change any versioning settings for the bucket. MFA delete cannot be enabled through the AWS console at this time, and only the root account can activate it.

When MFA delete is enabled, every API request that deletes a version will need the header “x-amz-mfa”, whose value is your authentication device’s serial number followed by a space and the authentication code displayed on it. An example is below:

DELETE /image.jpg?versionId=3HL4kqCxf3vjVBH40Nrjfkd
x-amz-mfa: 20899872 301749
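With an SDK, the header is usually supplied as a parameter instead of being set by hand. A minimal sketch, assuming a boto3 S3 client is passed in as s3; the function name and argument names are made up.

```python
def delete_version(s3, bucket, key, version_id, mfa_serial=None, mfa_code=None):
    """Permanently delete one version of an object.

    `s3` is assumed to be a boto3 S3 client. The MFA arguments are only
    needed when MFA delete is enabled on the bucket.
    """
    params = {"Bucket": bucket, "Key": key, "VersionId": version_id}
    if mfa_serial and mfa_code:
        # Same value as the x-amz-mfa header: the device serial number and
        # the current code, separated by a single space.
        params["MFA"] = f"{mfa_serial} {mfa_code}"
    return s3.delete_object(**params)
```

Because this removes the version for good, it is worth keeping behind a confirmation step in any tooling built around it.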

Restoring a Version

Restoring a previous version of an object can be accomplished in one of two ways:

  1. The first is to delete the current version of the object, which will cause S3 to promote the most recent previous version to be the current one. You would use this technique if you are not concerned about losing version history, as it requires you to delete every version between the current one and the version you want to restore.
  2. An alternative approach is to download the version you want to restore and re-upload as the current version. You can also issue a copy API call to avoid downloading the file. You would use this technique if you didn’t want to lose any version history.
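The copy-based approach from the second option might look like the sketch below, assuming a boto3 S3 client is passed in as s3. The function name is made up.

```python
def restore_version(s3, bucket, key, version_id):
    """Make an old version current again by copying it onto the same key.

    Nothing is deleted: the copy becomes a brand-new current version and
    the existing history stays intact. Returns the new version id.
    """
    response = s3.copy_object(
        Bucket=bucket,
        Key=key,
        # Naming a VersionId in the copy source selects the old version.
        CopySource={"Bucket": bucket, "Key": key, "VersionId": version_id},
    )
    return response.get("VersionId")
```

Since the copy happens inside S3, the object's contents never have to be downloaded to the machine running the script.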

Lifecycle Rules

You can use lifecycle rules to clean up older versions by:

  • Removing previous versions after a specific period
  • Adding a delete marker to current versions after a particular period
  • Cleaning up old expired delete markers

To set up a lifecycle rule, perform the following steps:

Click management, then Lifecycle then Add Lifecycle Rule:

Add lifecycle rule


Decide whether the rule should apply to every object or just specific objects based on either a tag or prefix.

Apply to all objects


Leave this next screen blank.

Leave blank


Decide whether to insert a delete marker on the current version (1) or have this rule permanently delete previous versions after a specified period (2). Additionally, you can have the rule clean up expired delete markers (3), which serve no purpose and can degrade the performance of list operations.

Lifecycle rule


Save the rule:

Save rule

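Note: When this article was written, there was no built-in way to keep only a specified number of versions; keeping just the last two versions, for example, had to be accomplished in other ways. Lifecycle rules have since gained an option to retain a number of newer noncurrent versions, so check the current AWS documentation before building a workaround.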

Cleaning Up Expired Delete Markers

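S3 doesn’t automatically clean up expired delete markers. AWS uses this term for a delete marker that is the only version left of an object once all of its other versions have been removed. Likewise, if an object is deleted more than once, S3 keeps both the current delete marker, which indicates that the object is currently deleted, and the older one, which serves no purpose.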

Cleaning up expired markers

AWS says that you do not pay storage costs on old delete markers, but they could impact performance, especially on LIST operations.
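As a sketch of how the cleanup could be expressed in a lifecycle configuration, the helper below builds a rule that removes old previous versions and then the expired delete markers left behind. The rule ID, the helper name, and the 30-day default are made up for illustration; the dictionary follows the shape that boto3's put_bucket_lifecycle_configuration call accepts.

```python
def cleanup_rule(noncurrent_days=30):
    """Lifecycle configuration that prunes old versions and expired delete markers."""
    return {
        "Rules": [
            {
                "ID": "cleanup-old-versions-and-markers",  # hypothetical rule name
                "Status": "Enabled",
                "Filter": {"Prefix": ""},  # empty prefix: every object in the bucket
                # Permanently delete versions that have been noncurrent this long.
                "NoncurrentVersionExpiration": {"NoncurrentDays": noncurrent_days},
                # Remove delete markers that no longer have any versions behind
                # them. This flag is not combined with "Days" in the same rule.
                "Expiration": {"ExpiredObjectDeleteMarker": True},
            }
        ]
    }
```

The current versions of objects are untouched by this rule, so it only trims history.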

Not Happy with Versioning! Let’s Disable It!

Let’s say you decide versioning isn’t for you! Let’s go ahead and disable it. Or can we? The truth is that you can only suspend versioning, which prevents new versions from being created but does not automatically clean up old versions. You will continue to be charged for those versions until they are removed from your bucket. Additionally, lifecycle rules will continue to run for previous versions even though versioning is suspended.

Only the bucket owner or those with the relevant permissions can suspend versioning.

To truly disable versioning, you would need to create a new bucket and copy all your objects into that bucket.

When versioning is suspended, your old versions are safe and can’t be accidentally overwritten, except for any object that has a version id of null. Every version id must be unique, so only one version can have an id of null. This applies to both full objects as well as delete markers.

Suppose you uploaded an object with versioning disabled, and that object was given a version id of null. After enabling versioning, you uploaded several more versions of that file, as shown below.

Disabled versioning

If you then suspend versioning, any newly uploaded object will replace the existing version that has an id of null.

Deleting an object with versioning suspended will create a delete marker with a version id of null. If there is already an existing version with an id of null, it will be replaced.

Cleaning up Previous Versions

To clean up older versions, you can either delete those versions manually or set up a temporary lifecycle policy to handle it for you.
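For the manual route, here is a minimal sketch that deletes every previous version under a prefix, assuming a boto3 S3 client is passed in as s3. The function name is made up. The deletions are permanent, so this is best tried against a test bucket first.

```python
def delete_noncurrent_versions(s3, bucket, prefix=""):
    """Permanently delete every non-current version under `prefix`.

    Current versions and delete markers are left untouched.
    Returns the number of versions submitted for deletion.
    """
    deleted = 0
    paginator = s3.get_paginator("list_object_versions")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        stale = [
            {"Key": version["Key"], "VersionId": version["VersionId"]}
            for version in page.get("Versions", [])
            if not version["IsLatest"]
        ]
        if stale:
            # A listing page holds at most 1,000 entries, which also fits
            # within the per-request limit of delete_objects.
            s3.delete_objects(Bucket=bucket, Delete={"Objects": stale, "Quiet": True})
            deleted += len(stale)
    return deleted
```

For large buckets a temporary lifecycle rule is usually the less laborious choice, since S3 does the work without any client-side listing.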

I hope you enjoyed this article. Feel free to leave any comments below.

Opinions expressed by DZone contributors are their own.
