
5 Benefits of S3 Select and How it Protects Data Under GDPR Compliance

AWS S3 was built to help teams use their data better. Learn how it can also be leveraged to increase data security and GDPR compliance.

By Jayashree Hegde Adkoli · Jun. 05, 18 · Opinion

Before the release of S3 Select, if you needed only a specific set of records from an object in an S3 bucket, you had to download the entire object, decompress it, and then search for the required records. Amazon Athena helps to some extent, but it is aimed at analyzing whole datasets (such as Big Data workloads) residing in Amazon S3. AWS S3 Select has a different use case: it filters the contents of a single object on the S3 side and returns only the requested subset of the data, not the entire object.

Ever since AWS announced Amazon S3 Select, several introductory articles have made the rounds explaining how fast it is compared to retrieving whole objects from S3. Very few have spoken about its key merits. This post discusses a few benefits of S3 Select and how it can help with data protection under GDPR compliance.

Before that, for those who are unfamiliar with Amazon S3 Select, here is a short intro. It is an S3 capability that retrieves only the required data from an object in an S3 bucket, without retrieving the entire object.

You pull the data you need from an object with a standard SQL expression, sent through the API or an SDK. Say, for example, there is 1 TB of CSV data in a GZIP-compressed file in an S3 bucket, and you want only a specific set of records from it. You can run the SQL expression from the AWS CLI and get the required data within minutes.
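The request described above can be sketched in Python with boto3's `select_object_content` call. This is a minimal sketch, not the author's code: the helper name `select_rows` is hypothetical, and it assumes a GZIP-compressed CSV object whose first line is a header row.

```python
def select_rows(s3_client, bucket, key, expression):
    """Run an SQL expression against one GZIP-compressed CSV object.

    `s3_client` is expected to be a boto3 S3 client. Returns the matching
    rows as CSV text.
    """
    response = s3_client.select_object_content(
        Bucket=bucket,
        Key=key,
        ExpressionType="SQL",
        Expression=expression,
        # Assumption: the object is GZIP-compressed CSV with a header line,
        # so columns can be referenced by name in the SQL expression.
        InputSerialization={
            "CSV": {"FileHeaderInfo": "USE"},
            "CompressionType": "GZIP",
        },
        OutputSerialization={"CSV": {}},
    )

    # The response payload is a stream of events. Only "Records" events
    # carry data, and a row may be split across two events, so the bytes
    # are joined before decoding.
    chunks = []
    for event in response["Payload"]:
        if "Records" in event:
            chunks.append(event["Records"]["Payload"])
    return b"".join(chunks).decode("utf-8")
```

With a real client this could be called as `select_rows(boto3.client("s3"), "my-bucket", "customers.csv.gz", "SELECT s.name FROM S3Object s WHERE s.country = 'DE'")`, where the bucket, key, and column names are placeholders.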

One of the most lauded features of S3 Select is its ability to simplify scanning and filtering object content into a smaller, targeted dataset, and to improve the performance of that work by up to 400%.

Coming back to benefits, AWS S3 Select:

Integrates With Other AWS Services

It is interoperable with other AWS services, such as AWS Lambda, which makes it easier to pull the necessary data. For example, you can invoke a simple Lambda function that runs S3 Select API calls against a set of values to get the selected data from a file in S3.
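A Lambda function along those lines might look like the sketch below. The event shape (`bucket`, `key`, `ids`), the `id` column, and the helper names are all assumptions made for illustration; the object is assumed to be an uncompressed CSV file with a header row.

```python
import json

_s3_client = None  # created lazily so the module imports without AWS access


def get_client():
    global _s3_client
    if _s3_client is None:
        import boto3  # bundled with the AWS Lambda Python runtime
        _s3_client = boto3.client("s3")
    return _s3_client


def build_expression(ids):
    """Build an SQL filter for a set of ids (hypothetical `id` column)."""
    if not ids:
        raise ValueError("at least one id is required")
    for value in ids:
        # Accept only plain alphanumeric ids so that nothing unexpected
        # ends up inside the SQL text.
        if not str(value).isalnum():
            raise ValueError(f"unexpected id: {value!r}")
    quoted = ", ".join(f"'{value}'" for value in ids)
    return f"SELECT * FROM S3Object s WHERE s.id IN ({quoted})"


def handler(event, context, s3_client=None):
    """Lambda entry point; `s3_client` exists only to ease local testing."""
    client = s3_client or get_client()
    response = client.select_object_content(
        Bucket=event["bucket"],
        Key=event["key"],
        ExpressionType="SQL",
        Expression=build_expression(event["ids"]),
        InputSerialization={"CSV": {"FileHeaderInfo": "USE"},
                            "CompressionType": "NONE"},
        OutputSerialization={"JSON": {}},
    )
    # Join all record bytes first, since one JSON line may span two events.
    body = b"".join(e["Records"]["Payload"]
                    for e in response["Payload"] if "Records" in e)
    rows = [json.loads(line) for line in body.decode("utf-8").splitlines() if line]
    return {"count": len(rows), "rows": rows}
```

Validating the ids before building the expression is a design choice of this sketch; it keeps caller-supplied values from altering the query.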

Supports CSV or JSON Files “With or Without” GZIP Compression

This means that you can selectively query a specific set of CSV/JSON data from both GZIP-compressed and uncompressed files, making the service more flexible.
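In the API, the format and compression are declared through the `InputSerialization` parameter of the request. The helper below is a hypothetical sketch that picks the settings from the file extension; it assumes CSV files carry a header row and JSON files hold one record per line.

```python
def input_serialization(key):
    """Return an InputSerialization dict for an S3 Select request.

    The mapping from file extension to format is a convention assumed by
    this sketch, not something S3 enforces.
    """
    compression = "GZIP" if key.endswith(".gz") else "NONE"
    base = key[:-len(".gz")] if compression == "GZIP" else key

    if base.endswith(".csv"):
        fmt = {"CSV": {"FileHeaderInfo": "USE"}}   # assumes a header row
    elif base.endswith(".json"):
        fmt = {"JSON": {"Type": "LINES"}}          # assumes one record per line
    else:
        raise ValueError(f"unsupported file type: {key}")

    return {**fmt, "CompressionType": compression}
```

The returned dictionary can be passed as the `InputSerialization` argument of boto3's `select_object_content`.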

Eliminates the Need for Compute Resources

With Amazon S3 Select, your applications no longer have to spend their own compute resources on scanning and filtering the data in an object. This is one of the primary reasons for the increase in query performance of up to 400%. As per AWS, your application needs to issue a SELECT request instead of a GET to take advantage of S3 Select.

Accelerates Big Data Querying by 5X

S3 Select pulls only the required data and uses 1/40th of the CPU compared to retrieving the whole object from S3. This makes it an ideal service if you make extensive use of Big Data frameworks like Presto, Apache Hive, and Apache Spark and want to offload the heavy lifting of filtering out unwanted data. Here’s the link to the AWS SlideShare, if you want to know more.

Is Available on AWS SDK for Ruby, Not Only on AWS SDK for Java and Python or AWS CLI

With the recent announcement from AWS, you can now process selected record events asynchronously with the AWS SDK for Ruby as well, with multiple usage patterns. Get more details about this announcement here.

Availability of AWS S3 Select

S3 Select is available to all AWS customers. You can use the service from the AWS SDK for Java, AWS SDK for Python, AWS CLI and now AWS SDK for Ruby. Its pricing is based on the data scanned and the data returned.
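Because pricing depends on the data scanned and the data returned, it can help to read those figures from the `Stats` event that S3 Select includes in its response stream. The sketch below is illustrative only: the function names are hypothetical, and the per-GB rates are parameters you would fill in from the current AWS price list, not values taken from this article.

```python
def scan_stats(event_stream):
    """Return (bytes_scanned, bytes_returned) from an S3 Select event stream.

    `event_stream` is the "Payload" of a select_object_content response.
    The stream can be read only once, so in practice collect the records
    in the same loop. Returns None if no Stats event is present.
    """
    for event in event_stream:
        if "Stats" in event:
            details = event["Stats"]["Details"]
            return details["BytesScanned"], details["BytesReturned"]
    return None


def estimate_cost(bytes_scanned, bytes_returned,
                  usd_per_gb_scanned, usd_per_gb_returned):
    """Estimate the S3 Select charge for one request.

    Both rates are caller-supplied assumptions; request charges and data
    transfer are not included.
    """
    gb = 1024 ** 3
    return (bytes_scanned / gb * usd_per_gb_scanned
            + bytes_returned / gb * usd_per_gb_returned)
```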

A Pro Use Case: Protecting Data Under GDPR Compliance Using Amazon S3 Select and Macie

With everyone catching up on GDPR compliance lately, we thought of sharing a simple three-step exercise for automatically protecting data under GDPR using Amazon Macie and S3 Select.


  1. Store your data, including sensitive customer or PCI DSS data, in S3 buckets in an EU region.

  2. Create an S3 Select pipeline from the S3 bucket that queries only the required, non-sensitive data, as and when other AWS services in the same region or in other regions need it.

  3. Before the data is queried through S3 Select, use the Amazon Macie security service to validate the data leaving S3 through S3 Select. By doing so, you can ensure that the outgoing data is not sensitive.

This way, the data pulled by services in other regions will be GDPR compliant.
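Step 2 of the exercise above can be sketched as an allow-list of columns that are cleared to leave the bucket. This is a simplified stand-in, not the Macie validation from step 3: the column names and the `build_expression` helper are hypothetical, and deciding which columns count as non-sensitive remains your own classification work.

```python
# Hypothetical set of columns that have been classified as non-sensitive.
NON_SENSITIVE_COLUMNS = frozenset({"order_id", "country", "order_total"})


def build_expression(requested_columns, allowed=NON_SENSITIVE_COLUMNS):
    """Build an S3 Select SQL expression limited to cleared columns.

    Raises PermissionError if any requested column is not on the allow-list,
    so a sensitive column is never named in the query at all.
    """
    if not requested_columns:
        raise ValueError("at least one column is required")
    blocked = [col for col in requested_columns if col not in allowed]
    if blocked:
        raise PermissionError(f"columns not cleared for export: {blocked}")
    columns = ", ".join(f"s.{col}" for col in requested_columns)
    return f"SELECT {columns} FROM S3Object s"
```

The resulting string can be used as the `Expression` of an S3 Select request against a CSV object with a header row, so that only the cleared columns are returned to the calling service.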


Published at DZone with permission of Jayashree Hegde Adkoli. See the original article here.

Opinions expressed by DZone contributors are their own.
