Data Classification With AWS Macie: Step by Step

In this article, I explain how to use Amazon Macie to automatically classify sensitive data in S3, with a quick tutorial for beginners.

By Gilad David Maayan · Dec. 15, 21 · Tutorial

What Is Amazon Macie?

Amazon Macie is a fully managed data classification service that helps you monitor your data. It uses machine learning to continuously analyze and classify content in Amazon Simple Storage Service (S3) buckets.

After you activate Macie, the service starts scanning the contents of your S3 buckets. The initial scan helps establish a baseline of the data, including details on who accesses the data and with which protocols. Next, Macie inspects any request to access the data and provides visualizations on the dashboard. 

Macie currently supports Amazon S3. However, Amazon Web Services (AWS) plans to extend coverage to other data storage services, including Amazon Elastic Block Store (Amazon EBS) and Amazon S3 Glacier.

Amazon Macie Pricing

Amazon charges you based on two factors: the number of S3 buckets that Macie continually evaluates for bucket-level security and access controls, and the amount of data that Macie processes during sensitive data discovery. For more background on Amazon storage and services pricing, see this excellent post on the AWS pricing model.

  • Number of S3 Buckets Evaluated 

Macie collects data on all your Amazon S3 buckets. This includes details such as the names and sizes of buckets, the number of objects in each bucket, any access controls or resource tags that apply, and encryption status. Macie then evaluates the buckets for security and access control, notifying you of any publicly accessible, unencrypted, or externally shared buckets. After the free trial period of 30 days, you pay according to the total number of S3 buckets associated with your account. Amazon prorates charges daily.

  • Amount of Data Processed 

When you enable the service, you can configure buckets and submit them for sensitive data discovery. You select the buckets you want Macie to scan, configure a one-time or periodic discovery job to identify sensitive data, and submit the job to Macie. You only pay for data that Macie processes in supported object types. When you run a Macie sensitive data discovery job, you also incur charges for LIST and GET requests, according to the standard Amazon S3 pricing scheme.

  • Free Tier for Sensitive Data Discovery

Amazon doesn’t charge you for the first GB of data processed each month for sensitive data discovery jobs. All additional data processed in your account incurs charges based on the AWS Region and quantity bracket (i.e., the first 50,000 GB/month, the next 450,000 GB/month, and over 500,000 GB/month).
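The tiered arithmetic can be sketched as follows. The tier sizes and the free first GB come from the description above; the per-GB rates are hypothetical placeholders (real rates depend on the AWS Region), and the sketch assumes the free GB is subtracted before the tiers are applied.

```python
# Tier sizes follow the article. The per-GB rates are HYPOTHETICAL
# placeholders, not real AWS prices (actual rates vary by Region).
FREE_GB = 1
TIERS = [
    (50_000, 1.00),        # first 50,000 GB/month
    (450_000, 0.50),       # next 450,000 GB/month
    (float("inf"), 0.25),  # over 500,000 GB/month
]


def discovery_cost(gb_processed: float) -> float:
    """Return the monthly sensitive data discovery charge for gb_processed."""
    # Assumption: the free allowance is removed before any tier is charged.
    remaining = max(gb_processed - FREE_GB, 0)
    cost = 0.0
    for tier_size, rate in TIERS:
        in_tier = min(remaining, tier_size)  # portion billed at this tier's rate
        cost += in_tier * rate
        remaining -= in_tier
        if remaining <= 0:
            break
    return cost
```

With these placeholder rates, processing 101 GB in a month would be billed for 100 GB at the first-tier rate. Substitute the rates published for your Region to get a real estimate.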

Discovering Sensitive Data With Amazon Macie

Amazon Macie lets you set up and execute sensitive data discovery jobs, analyzing objects in Amazon S3 buckets to identify sensitive data. You can specify which buckets you want to analyze when you create a job.

Macie provides reports detailing the data found. It can also provide alerts on anomalies and data security issues, and feed relevant security data to an integrated security information and event management (SIEM) system, or other security tools.

Discovery jobs use data identifiers to analyze objects. You can use the managed data identifiers that Macie provides, define your own custom identifiers, or combine the two. Managed data identifiers contain criteria for detecting specific types of sensitive data, such as access keys or payment card details. Custom identifiers detect data based on criteria that you define, which is useful for company-specific data categories, such as customer accounts and employee ID numbers. These criteria include regular expressions to define text patterns and proximity rules to refine results.
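To make the combination of a regular expression and a proximity rule concrete, here is a minimal local sketch in plain Python. It only illustrates the idea and is not how Macie evaluates custom identifiers internally; the function name, the employee ID pattern, and the distance measure (characters between the end of a keyword and the end of the match) are assumptions chosen for the example.

```python
import re


def find_matches(text, pattern, keywords, max_distance=50):
    """Return regex matches that have a keyword ending shortly before them.

    A match counts only if some keyword occurs before the end of the match
    and at most max_distance characters separate the end of that keyword
    from the end of the matched text.
    """
    hits = []
    lowered = text.lower()
    for match in re.finditer(pattern, text):
        for keyword in keywords:
            kw = keyword.lower()
            # Closest occurrence of the keyword that ends before the match ends.
            pos = lowered.rfind(kw, 0, match.end())
            if pos != -1 and match.end() - (pos + len(kw)) <= max_distance:
                hits.append(match.group())
                break
    return hits


# Hypothetical employee ID format: "E-" followed by six digits.
EMPLOYEE_ID = r"E-\d{6}"
print(find_matches("employee id: E-123456", EMPLOYEE_ID, ["employee id"]))
print(find_matches("ticket E-654321 closed", EMPLOYEE_ID, ["employee id"]))
```

The first call reports the ID because the keyword appears right before it; the second reports nothing because no keyword is nearby. This is the kind of false-positive reduction a proximity rule is meant to provide.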

Macie can analyze objects that use a supported file or storage format and, if they are encrypted, a key that Macie is able to use. Macie can access objects stored in supported Amazon S3 storage classes, including S3 Standard, S3 Standard-IA, S3 One Zone-IA, and S3 Intelligent-Tiering, but not S3 Glacier. Buckets with restrictive policies must grant Macie permission to access them. If you want to use Macie for sensitive data discovery in other storage systems, you'll need to move your data to S3, either permanently or temporarily.

You can configure discovery jobs to run once (for an on-demand analysis) or repeatedly (for periodic analyses). You can also select the scope of a job from several options, including custom criteria related to object properties like tags. Each job produces a record of any sensitive data discovered (called a data discovery result) and a report (called a sensitive data finding) detailing any insights. These records and reports help you comply with data privacy and security obligations.

Getting Started With Amazon Macie

Use the following steps to set up and run Amazon Macie.

1. Enable Macie

First, you need to set up the necessary permissions. Go to https://console.aws.amazon.com/macie/ and open the Amazon Macie console. 

Next, use the AWS Region selector at the top-right corner of the page to choose the Region where you wish to enable Macie. Select Get started, then enable Macie.

Macie uses a service-linked role, which grants the permissions that allow Macie to call other AWS services on your behalf.

Once enabled, Macie quickly compiles an inventory of all your buckets located in the selected Region. Macie monitors the buckets to maintain security and access control. You can review the bucket inventory by selecting S3 buckets in the console's navigation pane. Select a bucket's name to display its details, such as statistics and security information.
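If you prefer scripting to the console, the same step can be sketched with the AWS SDK for Python. The helper below is a hypothetical wrapper that takes a Macie client as an argument; it assumes the boto3 `macie2` client's `enable_macie` and `describe_buckets` operations, and that Macie is not already enabled in the Region (the enable call fails in that case).

```python
def enable_and_list_buckets(macie):
    """Enable Macie, then return the names of the buckets in its inventory.

    `macie` is expected to be a boto3 "macie2" client (or a test double
    exposing the same two methods).
    """
    # Raises an error if Macie is already enabled for this account and Region.
    macie.enable_macie(status="ENABLED")

    names = []
    kwargs = {}
    while True:
        page = macie.describe_buckets(**kwargs)
        names.extend(bucket["bucketName"] for bucket in page.get("buckets", []))
        token = page.get("nextToken")
        if not token:
            return names
        # Ask for the next page of the inventory.
        kwargs = {"nextToken": token}
```

With boto3 installed and credentials configured, you could call it as `enable_and_list_buckets(boto3.client("macie2", region_name="us-east-1"))`, where the Region is only an example. The inventory may still be filling in right after Macie is enabled, so an early call can return a partial list.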

2. Configure a Results Repository 

Macie produces a discovery result for every object analyzed and a sensitive data finding for any object containing sensitive data. Results provide more information than findings, as they include objects that don’t contain sensitive data or that Macie cannot access, in addition to the information from relevant findings. 

Macie stores discovery results for 90 days by default, so if you want to keep them longer, you need to configure an S3 bucket to store your results. If you set this up within 30 days of enabling Macie, you can use the S3 bucket as a long-term discovery result repository.
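As a rough programmatic equivalent, the repository can be set through the boto3 `macie2` client's `put_classification_export_configuration` operation. The helper names below are hypothetical, and note one detail this article does not cover: the API expects a KMS key ARN alongside the bucket name, so the sketch assumes you already have a key and a bucket policy that allow Macie to write encrypted results.

```python
def export_configuration(bucket_name, kms_key_arn, key_prefix=None):
    """Build the configuration payload for the results repository."""
    destination = {"bucketName": bucket_name, "kmsKeyArn": kms_key_arn}
    if key_prefix:
        # Optional path prefix for the result objects inside the bucket.
        destination["keyPrefix"] = key_prefix
    return {"s3Destination": destination}


def configure_results_repository(macie, bucket_name, kms_key_arn, key_prefix=None):
    """Point Macie's discovery results at the given S3 bucket.

    `macie` is expected to be a boto3 "macie2" client.
    """
    return macie.put_classification_export_configuration(
        configuration=export_configuration(bucket_name, kms_key_arn, key_prefix)
    )
```

Keeping the payload builder separate from the call makes it easy to inspect or log the configuration before sending it.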

3. Create a Sensitive Data Discovery Job

To create a one-time job using default settings, go to https://console.aws.amazon.com/macie/ and open the Amazon Macie console. Select Jobs in the navigation pane, then select Create job and continue to the bucket selection step.

You can select specific S3 buckets from an inventory of all buckets in the current Region, checking the box next to each bucket you want to analyze. Next, review your S3 bucket selections and refine the scope of the job. For this tutorial, select a one-time job.

Next, select all managed data identifiers to use the default settings. Select Next to skip the custom data identifier selection step. You can then enter a job name and an optional description. Finally, review your configuration settings and create the job. Once you select Submit, Macie launches the job.
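The console steps above map roughly onto a single API call. The sketch below builds the request for the boto3 `macie2` client's `create_classification_job` operation; the function names are hypothetical, and the account ID, bucket names, and job name are values you would supply.

```python
import uuid


def one_time_job_request(account_id, bucket_names, name, description=""):
    """Build the request for a one-time job using all managed data identifiers."""
    request = {
        "clientToken": str(uuid.uuid4()),  # idempotency token for the request
        "jobType": "ONE_TIME",
        "name": name,
        "managedDataIdentifierSelector": "ALL",
        "s3JobDefinition": {
            "bucketDefinitions": [
                {"accountId": account_id, "buckets": list(bucket_names)}
            ]
        },
    }
    if description:
        request["description"] = description
    return request


def create_one_time_job(macie, account_id, bucket_names, name, description=""):
    """Submit the job and return its ID. `macie` is a boto3 "macie2" client."""
    response = macie.create_classification_job(
        **one_time_job_request(account_id, bucket_names, name, description)
    )
    return response["jobId"]
```

For a periodic job, the same operation accepts a `jobType` of `SCHEDULED` together with a `scheduleFrequency`; that variant is left out here to match the one-time job in the tutorial.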

4. Review Sensitive Data Discovery Findings

Macie uses automatic monitoring to ensure security and access control over your buckets, and it generates data policy findings that report any suspected policy violations. If you create and execute a sensitive data discovery job, Macie creates findings that report any sensitive data discovered.

To review the findings of a discovery job, go to https://console.aws.amazon.com/macie/ and open the Amazon Macie console. Select Findings in the navigation pane. You can filter findings according to the criteria you enter in the filter bar at the top of the table. Select any field except for the checkbox to view the details of a finding.
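Findings can also be pulled programmatically, for example to forward them to another tool. The sketch below assumes the boto3 `macie2` client's `list_findings` and `get_findings` operations and filters on the `CLASSIFICATION` category, which is how the API labels sensitive data findings as opposed to policy findings. The generator name and page size are choices made for this example.

```python
def sensitive_data_findings(macie, page_size=25):
    """Yield the details of every sensitive data finding, page by page.

    `macie` is expected to be a boto3 "macie2" client.
    """
    criteria = {"criterion": {"category": {"eq": ["CLASSIFICATION"]}}}
    kwargs = {"findingCriteria": criteria, "maxResults": page_size}
    while True:
        page = macie.list_findings(**kwargs)  # returns finding IDs only
        ids = page.get("findingIds", [])
        if ids:
            # Fetch the full records for this page of IDs.
            yield from macie.get_findings(findingIds=ids)["findings"]
        token = page.get("nextToken")
        if not token:
            return
        kwargs["nextToken"] = token
```

Each yielded item is a dictionary describing one finding, so a loop such as `for finding in sensitive_data_findings(client): print(finding["id"])` would list the finding IDs.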

Conclusion

In this article, I explained how you can use Amazon Macie to automatically classify sensitive data in S3, and walked through a quick tutorial on using Amazon Macie for the first time:

  1. Enable Amazon Macie in your AWS account
  2. Configure a results repository to keep discovery results beyond the default 90 days
  3. Create a sensitive data discovery job, which can be a one-time or recurring job
  4. Review discovery findings in the Macie console, or in integrated security tools

I hope this will be useful as you improve your treatment and protection of sensitive data in Amazon S3.


Opinions expressed by DZone contributors are their own.
