Five Things You Probably Didn't Know About Amazon S3

Test your knowledge or learn something new in this list of (probably) little-known features about Amazon's S3 cloud service.

By Agraj Mangal · May 18, 2018 · Analysis

Here are five things that are not well known among AWS developers, even though S3 is one of the most widely used AWS services.

Data Consistency

At the time of writing, S3 provides read-after-write consistency for PUTs of new objects. To understand what that means, keep in mind that S3 achieves high availability (HA) by replicating data across multiple servers, which can span multiple data centers. Until you get back a 200 OK response to the PUT call, you cannot be sure that the new object was created successfully, and an immediate GET or HEAD request for the same object, or a listing of the keys in the bucket, might not show it. Once the PUT has returned 200 OK, any subsequent GET for the new object is guaranteed to return it, because 200 OK signifies that the data is stored safely in S3.

S3 provides eventual consistency for overwrite PUTs and DELETEs of existing objects. This follows from the premise above: until the change (PUT or DELETE) has propagated to all copies of the data in S3, anyone requesting the same object can still get the previous data or the deleted object.
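One way a client can cope with eventually consistent reads is to retry until the data it gets back looks current. The following is a minimal sketch of that idea, not anything S3 itself provides: `read_until_fresh`, `fetch`, and `is_fresh` are hypothetical names, and `fetch` stands in for whatever call reads the object (for example, a GET through an SDK).

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def read_until_fresh(
    fetch: Callable[[], Optional[T]],
    is_fresh: Callable[[T], bool],
    attempts: int = 5,
    delay_seconds: float = 0.2,
) -> Optional[T]:
    """Retry fetch() until is_fresh(value) is true or attempts run out.

    fetch is a hypothetical callable that returns the object's current
    content, or None if the object is not visible yet. is_fresh decides
    whether the returned content is the version the caller expects.
    """
    for attempt in range(attempts):
        value = fetch()
        if value is not None and is_fresh(value):
            return value
        if attempt < attempts - 1:
            # Exponential backoff between retries.
            time.sleep(delay_seconds * (2 ** attempt))
    return None  # still stale (or missing) after all attempts
```

A caller that has just overwritten an object could pass an `is_fresh` check that compares the content (or a checksum of it) with what was written, and treat a `None` result as "the change has not propagated yet".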

For a more detailed understanding of eventual consistency, read this wonderful blog post by Werner Vogels.

S3 Select

With S3 Select, Amazon lets you query data in place on the large datasets you might have stored in S3, without having to download, decompress, and process an entire object and then filter out the data you need for further analysis. You retrieve only the data you are interested in, which can also reduce cost substantially in some cases. There are some limitations, though: the data in S3 must be in either CSV or JSON format, and only a subset of SQL is supported. For more involved datasets you can always use Amazon Athena, but in many cases S3 Select can be used directly.
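As a rough illustration, here is how such a query might be issued from Python with boto3's `select_object_content` call. This is a sketch under assumptions: the helper names are hypothetical, the object is assumed to be a CSV file with a header row, and the bucket, key, and SQL text are placeholders supplied by the caller.

```python
from typing import Any, Dict, Iterable


def build_select_request(bucket: str, key: str, sql: str) -> Dict[str, Any]:
    """Keyword arguments for the S3 client's select_object_content call."""
    return {
        "Bucket": bucket,
        "Key": key,
        "ExpressionType": "SQL",
        "Expression": sql,
        # Assumes CSV input whose first line is a header row, so the
        # query can refer to columns by name.
        "InputSerialization": {"CSV": {"FileHeaderInfo": "USE"}},
        "OutputSerialization": {"CSV": {}},
    }


def collect_records(events: Iterable[Dict[str, Any]]) -> str:
    """Join the byte payloads of the Records events in a response stream."""
    chunks = [event["Records"]["Payload"] for event in events if "Records" in event]
    return b"".join(chunks).decode("utf-8")


def run_select(bucket: str, key: str, sql: str) -> str:
    """Run the query against S3 (requires boto3 and AWS credentials)."""
    import boto3  # imported lazily so the helpers above work without it

    response = boto3.client("s3").select_object_content(
        **build_select_request(bucket, key, sql)
    )
    return collect_records(response["Payload"])
```

For example, `run_select("my-bucket", "users.csv", "SELECT s.name FROM S3Object s")` would return only the `name` column, with the filtering done on the S3 side rather than after downloading the whole file.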

Transfer Acceleration

Because an S3 bucket lives in a single Region but can be reached from anywhere, your users might end up uploading large amounts of data to a bucket located in Sydney from different parts of the world. Some will get a good upload speed, depending on the distance, while others will not. To rectify this, you can enable Transfer Acceleration on your S3 bucket. The end user then uploads to a nearby CloudFront edge location instead, and the data is carried to the S3 bucket over a network-optimized path, completely transparently to the end user. The end user just needs to use a common endpoint (bucketname.s3-accelerate.amazonaws.com). Just make sure that the bucket name is DNS compliant and does not contain periods (.).
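The naming rule and the endpoint format above can be sketched in a few lines of Python. The helper names are hypothetical and the name check is a simplified approximation of the DNS-compliance rules; the boto3 call `put_bucket_accelerate_configuration` is the SDK operation that turns the feature on.

```python
import re

# Simplified check: 3 to 63 characters, lowercase letters, digits, and
# hyphens only, starting and ending with a letter or digit. Periods are
# deliberately not allowed.
_ACCELERATE_NAME = re.compile(r"[a-z0-9][a-z0-9-]{1,61}[a-z0-9]")


def accelerate_endpoint(bucket: str) -> str:
    """Return the Transfer Acceleration hostname for a bucket."""
    if not _ACCELERATE_NAME.fullmatch(bucket):
        raise ValueError(
            f"bucket name {bucket!r} must be DNS compliant and contain no periods"
        )
    return f"{bucket}.s3-accelerate.amazonaws.com"


def enable_acceleration(bucket: str) -> None:
    """Enable Transfer Acceleration (requires boto3 and AWS credentials)."""
    import boto3  # imported lazily so accelerate_endpoint works without it

    boto3.client("s3").put_bucket_accelerate_configuration(
        Bucket=bucket,
        AccelerateConfiguration={"Status": "Enabled"},
    )
```

Here `accelerate_endpoint("my-bucket")` returns `my-bucket.s3-accelerate.amazonaws.com`, while a name such as `my.bucket` is rejected with a `ValueError`.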

Cross-Region Replication

Cross-region replication (CRR) is a bucket-level feature that enables automatic, asynchronous copying of objects across buckets in different AWS Regions. Both the source and destination buckets must have versioning enabled before you can use CRR. You can either replicate all objects from source to destination or specify a key name prefix to replicate only the objects that have that prefix (folder-level replication). You can also choose a different storage class for the replicas: if you are replicating to create a backup that will not be accessed frequently, it's beneficial to use the S3-IA storage class instead of the default one. The source and destination buckets can also belong to different AWS accounts. If you enable replication on a bucket that already contains objects, those existing objects are not replicated to the destination bucket; only new objects are. To help customers monitor the replication status of their Amazon S3 objects more proactively, AWS offers the Cross-Region Replication Monitor (CRR Monitor) solution.
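A replication setup along these lines might look as follows with boto3. Treat it as a sketch: the function names, the rule ID, and the defaults are illustrative assumptions, the IAM role that S3 assumes for replication must already exist, and versioning must be enabled on the destination bucket separately (it is in another Region, so it needs its own client).

```python
from typing import Any, Dict


def build_replication_config(
    role_arn: str,
    destination_bucket: str,
    prefix: str = "",
    storage_class: str = "STANDARD_IA",
    rule_id: str = "crr-rule",
) -> Dict[str, Any]:
    """Build a replication configuration with a single rule.

    An empty prefix replicates every new object; a prefix such as
    "logs/" limits replication to keys under that "folder".
    """
    return {
        "Role": role_arn,  # IAM role S3 assumes to copy objects
        "Rules": [
            {
                "ID": rule_id,
                "Prefix": prefix,
                "Status": "Enabled",
                "Destination": {
                    "Bucket": f"arn:aws:s3:::{destination_bucket}",
                    # Store replicas in a cheaper class for backups.
                    "StorageClass": storage_class,
                },
            }
        ],
    }


def apply_replication(source_bucket: str, config: Dict[str, Any]) -> None:
    """Enable versioning on the source and attach the replication rules.

    Requires boto3 and AWS credentials.
    """
    import boto3  # imported lazily so the builder works without it

    s3 = boto3.client("s3")
    s3.put_bucket_versioning(
        Bucket=source_bucket,
        VersioningConfiguration={"Status": "Enabled"},
    )
    s3.put_bucket_replication(
        Bucket=source_bucket,
        ReplicationConfiguration=config,
    )
```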

Lifecycle Rules

S3 provides different tiers of storage for storing data. The default one provides 4 9's of availability and 11 9's of durability and can sustain the concurrent loss of 2 data center facilities, making it highly durable. If you want to store infrequently accessed data that should still be readily available when needed, use S3-IA (Infrequent Access) storage, which has a lower storage cost but a higher retrieval cost than the default. There is also Reduced Redundancy Storage, which provides 4 9's of availability and 4 9's of durability, making it less durable but also cheaper than default storage; it is typically used for data that can easily be generated again. Another option is Glacier, typically used for data archival, as it takes 3-5 hours to restore data from Glacier.

Lifecycle rules let you manage how an object moves between storage tiers over its lifetime by defining transition and expiration actions. For example, you might choose to transition objects to the S3-IA storage class 30 days after you created them, or archive objects to the Glacier storage class one year after creating them. For more details on how to define these rules, refer to the AWS documentation.
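To put the durability figures in perspective, the short calculation below converts "N 9's" into an expected number of lost objects. It rests on a simplifying assumption that is not stated in the figures themselves: each object is treated as having an independent 10^-N chance of being lost in a given year. The function names are hypothetical.

```python
from fractions import Fraction


def expected_annual_loss(object_count: int, durability_nines: int) -> Fraction:
    """Average number of objects lost per year at a given durability.

    Assumes "N 9's of durability" means each object independently has a
    10**-N chance of being lost in a year (a simplification).
    """
    return object_count * Fraction(1, 10 ** durability_nines)


def years_per_lost_object(object_count: int, durability_nines: int) -> Fraction:
    """Average number of years between single-object losses."""
    return 1 / expected_annual_loss(object_count, durability_nines)
```

Under that assumption, ten million objects at 11 9's of durability work out to one lost object every 10,000 years on average, whereas the same ten million objects at 4 9's work out to about 1,000 lost objects per year.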
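The example in the previous paragraph (S3-IA after 30 days, Glacier after one year) could be expressed with boto3 roughly as follows. This is a sketch: the function names, the rule ID, and the optional expiration parameter are illustrative assumptions, and the rule applies to every key under the given prefix.

```python
from typing import Any, Dict, Optional


def build_lifecycle_config(
    prefix: str = "",
    ia_after_days: int = 30,
    glacier_after_days: int = 365,
    expire_after_days: Optional[int] = None,
) -> Dict[str, Any]:
    """Build a lifecycle configuration with a single tiering rule."""
    if glacier_after_days <= ia_after_days:
        raise ValueError("Glacier transition must come after the S3-IA transition")
    rule: Dict[str, Any] = {
        "ID": "tiering-rule",
        "Filter": {"Prefix": prefix},  # empty prefix matches all objects
        "Status": "Enabled",
        "Transitions": [
            {"Days": ia_after_days, "StorageClass": "STANDARD_IA"},
            {"Days": glacier_after_days, "StorageClass": "GLACIER"},
        ],
    }
    if expire_after_days is not None:
        # Expiration action: delete the object after this many days.
        rule["Expiration"] = {"Days": expire_after_days}
    return {"Rules": [rule]}


def apply_lifecycle(bucket: str, config: Dict[str, Any]) -> None:
    """Attach the rules to a bucket (requires boto3 and AWS credentials)."""
    import boto3  # imported lazily so the builder works without it

    boto3.client("s3").put_bucket_lifecycle_configuration(
        Bucket=bucket,
        LifecycleConfiguration=config,
    )
```

Calling `build_lifecycle_config()` with no arguments reproduces the 30-day and one-year transitions described above; passing `expire_after_days` adds an expiration action to the same rule.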

These are some of the lesser known features of S3. What do you think? How many did you already know?

Published at DZone with permission of Agraj Mangal, DZone MVB. See the original article here.
