Data Classification With AWS Macie: Step by Step
In this article, I explain how to use Amazon Macie to automatically classify sensitive data in S3, with a quick tutorial for beginners.
What Is Amazon Macie?
Amazon Macie is a fully managed data classification service that helps you monitor your data. It uses machine learning to continuously analyze and classify content in Amazon Simple Storage Service (S3) buckets.
After you activate Macie, the service starts scanning the contents of your S3 buckets. The initial scan helps establish a baseline of the data, including details on who accesses the data and with which protocols. Next, Macie inspects any request to access the data and provides visualizations on the dashboard.
Macie currently supports Amazon S3. However, Amazon Web Services (AWS) plans to extend coverage to other data storage services, including Amazon Elastic Block Store (Amazon EBS) and Amazon S3 Glacier.
Amazon Macie Pricing
Amazon charges you along two dimensions: the number of S3 buckets that Macie continually evaluates for bucket-level security and access controls, and the amount of data that Macie processes during sensitive data discovery. For more background on Amazon storage and services pricing, see this excellent post on the AWS pricing model.
Number of S3 Buckets Evaluated
Macie collects data on all your Amazon S3 buckets, including the names and sizes of buckets, the number of objects in each bucket, any access controls or resource tags that apply, and encryption status. Macie then evaluates each bucket's security and access controls and notifies you of any publicly accessible, unencrypted, or externally shared buckets. After the 30-day free trial, you pay according to the total number of S3 buckets associated with your account. Amazon prorates the charges daily.
Amount of Data Processed
When you enable the service, you can configure buckets and submit them for sensitive data discovery. You select the buckets you want Macie to scan and configure a discovery job to identify sensitive data (this could be one-time or periodic), which you submit to Macie. You only pay for data that Macie processes in supported object types. When you run a Macie sensitive data discovery job, you also incur charges for LIST and GET requests, according to the standard Amazon S3 pricing scheme.
Free Tier for Sensitive Data Discovery
Amazon doesn’t charge you for the first GB of data processed each month for sensitive data discovery jobs. All additional data processed in your account incurs charges based on the AWS Region and quantity bracket (i.e., the first 50,000 GB/month, the next 450,000 GB/month, and over 500,000 GB/month).
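The bracketed billing described above can be sketched as a small calculation. The per-GB rates below are placeholders, not real Macie prices (actual rates depend on the AWS Region), and the sketch assumes that the free first GB is deducted before the brackets are applied. It also leaves out the S3 LIST and GET request charges.

```python
# Placeholder per-GB rates (NOT real Macie prices; actual rates vary by Region).
FREE_GB = 1
TIERS = [
    (50_000, 1.00),         # first 50,000 GB/month
    (450_000, 0.50),        # next 450,000 GB/month
    (float("inf"), 0.25),   # everything over 500,000 GB/month
]


def discovery_cost(gb_processed, tiers=TIERS, free_gb=FREE_GB):
    """Estimate the monthly sensitive data discovery charge for gb_processed GB."""
    # Assumption: the free allowance comes off before any bracket is applied.
    remaining = max(gb_processed - free_gb, 0)
    cost = 0.0
    for tier_size, rate in tiers:
        billed = min(remaining, tier_size)  # GB that fall into this bracket
        cost += billed * rate
        remaining -= billed
        if remaining <= 0:
            break
    return cost
```

To get a usable estimate, replace the placeholder rates in TIERS with the published rates for your Region.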
Discovering Sensitive Data With Amazon Macie
Amazon Macie lets you set up and execute sensitive data discovery jobs, analyzing objects in Amazon S3 buckets to identify sensitive data. You can specify which buckets you want to analyze when you create a job.
Macie provides reports detailing the data found. It can also provide alerts on anomalies and data security issues, and feed relevant security data to an integrated security information and event management (SIEM) system, or other security tools.
Discovery jobs use data identifiers to analyze objects. You can use the managed data identifiers that Macie provides, create your own custom data identifiers, or combine the two. Managed data identifiers contain criteria for detecting specific types of sensitive data, such as access keys or payment card details. Custom data identifiers detect data based on criteria that you define, which is useful for company-specific data categories such as customer accounts and employee ID numbers. These criteria include regular expressions to define text patterns and proximity rules to refine results.
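To get a feel for how a regular expression and a proximity rule work together, here is a rough local approximation in plain Python. It is not Macie's matcher: the function name find_sensitive, the employee ID format, and the exact distance rule (a keyword must end before the match starts, within max_distance characters of the match's end) are assumptions made for illustration.

```python
import re


def find_sensitive(text, pattern, keywords, max_distance=50):
    """Return regex matches that are preceded by a keyword ending within
    max_distance characters of the end of the match."""
    if not keywords:
        # No proximity rule: every regex match counts.
        return [m.group() for m in re.finditer(pattern, text)]
    # End positions of every keyword occurrence (case-insensitive, an assumption).
    keyword_ends = [
        hit.end()
        for kw in keywords
        for hit in re.finditer(re.escape(kw), text, re.IGNORECASE)
    ]
    matches = []
    for m in re.finditer(pattern, text):
        near_keyword = any(
            end <= m.start() and m.end() - end <= max_distance
            for end in keyword_ends
        )
        if near_keyword:
            matches.append(m.group())
    return matches


# Hypothetical employee ID format: "EMP-" followed by six digits.
EMPLOYEE_ID = r"EMP-\d{6}"
```

For example, find_sensitive("Employee ID: EMP-123456 joined.", EMPLOYEE_ID, ["employee id"], 20) keeps the match because the keyword sits just before it, while the same ID in a sentence with no nearby keyword is dropped. That is the kind of false-positive filtering a proximity rule is meant to provide.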
Macie can analyze objects that use a supported file or storage format and, if they are encrypted, a key that Macie is able to use. Macie can access objects stored in supported Amazon S3 storage classes, including S3 Standard, S3 Standard-IA, S3 One Zone-IA, and S3 Intelligent-Tiering, but not S3 Glacier. Buckets with restrictive policies must grant Macie permission to access them. If you want to use Macie for sensitive data discovery in other storage systems, you’ll need to move your data to S3, either permanently or temporarily.
You can configure discovery jobs to run once (i.e., for an on-demand analysis) or repeatedly (i.e., for periodic analyses). You can also select the scope of a job from several options, including custom criteria related to object properties like tags. Each job produces a record of any sensitive data discovered (called a data discovery result) and a report (called a sensitive data finding) detailing any insights. These records and reports help you comply with data privacy and security obligations.
Getting Started With Amazon Macie
Use the following steps to set up and run Amazon Macie.
1. Enable Macie
First, you need to set up the necessary permissions. Go to https://console.aws.amazon.com/macie/ and open the Amazon Macie console.
Next, use the AWS Region selector at the page’s top-right corner to choose the Region where you wish to enable Macie. Select Get started, then Enable Macie.
Macie offers an option to create a service-linked role, which grants the permissions that allow Macie to call other AWS services on your behalf.
Once enabled, Macie quickly compiles an inventory of all your buckets located in the selected Region and monitors them for security and access control. You can review the bucket inventory by selecting S3 buckets in the console’s navigation pane. Select a bucket's name to display its details, such as statistics and security information.
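The same step can be scripted. The following is a minimal sketch, assuming the boto3 SDK is installed and AWS credentials are configured; the helper names enable_and_list_buckets and flag_risky_buckets are my own. Note that the inventory may take a little while to populate right after Macie is enabled.

```python
def enable_and_list_buckets(region):
    """Enable Macie in one Region and return its bucket inventory."""
    import boto3  # imported lazily so flag_risky_buckets works without the AWS SDK

    client = boto3.client("macie2", region_name=region)
    # Raises an error if Macie is already enabled for the account in this Region.
    client.enable_macie(status="ENABLED")
    buckets = []
    for page in client.get_paginator("describe_buckets").paginate():
        buckets.extend(page["buckets"])
    return buckets


def flag_risky_buckets(buckets):
    """Return names of buckets that Macie reports as public or externally shared."""
    risky = []
    for bucket in buckets:
        public = bucket.get("publicAccess", {}).get("effectivePermission") == "PUBLIC"
        shared = bucket.get("sharedAccess") == "EXTERNAL"
        if public or shared:
            risky.append(bucket["bucketName"])
    return risky
```

Calling flag_risky_buckets(enable_and_list_buckets("us-east-1")) would list the buckets that deserve a closer look first; the Region name here is only an example.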
2. Configure a Results Repository
Macie produces a discovery result for every object analyzed and a sensitive data finding for any object containing sensitive data. Results provide more information than findings, as they include objects that don’t contain sensitive data or that Macie cannot access, in addition to the information from relevant findings.
Macie stores discovery results for a 90-day period by default, so if you want to store them for a longer period, you need to configure S3 bucket storage for your results. If you set this configuration within the first 30 days since you enabled Macie, you can use the S3 bucket as a long-term discovery result repository.
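If you prefer to script this step, the sketch below shows one way to do it with boto3. The bucket name, key prefix, and KMS key ARN are values you supply, and the helper names are hypothetical. As far as I know, the bucket policy and the KMS key policy must also allow Macie to write to the bucket, which this sketch does not set up.

```python
def export_configuration(bucket_name, kms_key_arn, key_prefix=""):
    """Build the configuration that points discovery results at an S3 bucket."""
    s3_destination = {"bucketName": bucket_name, "kmsKeyArn": kms_key_arn}
    if key_prefix:
        s3_destination["keyPrefix"] = key_prefix
    return {"s3Destination": s3_destination}


def configure_results_repository(region, bucket_name, kms_key_arn, key_prefix=""):
    """Store discovery results in the given bucket (requires boto3 and credentials)."""
    import boto3  # imported lazily so export_configuration works without the AWS SDK

    client = boto3.client("macie2", region_name=region)
    return client.put_classification_export_configuration(
        configuration=export_configuration(bucket_name, kms_key_arn, key_prefix)
    )
```

Keeping the dictionary builder separate from the API call makes it easy to inspect or log the configuration before anything is sent to AWS.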
3. Create a Sensitive Data Discovery Job
To create a one-time job using default settings, go to https://console.aws.amazon.com/macie/ and open the Amazon Macie console. Select jobs in the navigation pane, then select create job and continue to the bucket selection step.
You can select specific S3 buckets from an inventory of all buckets in the current Region by checking the box next to each bucket you want to analyze. Next, review your S3 bucket selections and refine the scope of the job. For this tutorial, select a one-time job.
Next, select all managed data identifiers to configure default settings. You can select next to skip the custom data identifier selection step. You can then enter a job name and optional description. Finally, you can review your configuration settings and create the job. Once you select submit, Macie launches the job.
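The console steps above map onto a single API request. Here is a hedged sketch using boto3: job_request and create_job are hypothetical helper names, the account ID and bucket names are values you supply, and the daily flag is my own shorthand for a recurring job that runs every day.

```python
def job_request(name, account_id, bucket_names, description="", daily=False):
    """Build the request for a discovery job that uses all managed data identifiers."""
    request = {
        "name": name,
        "jobType": "SCHEDULED" if daily else "ONE_TIME",
        "managedDataIdentifierSelector": "ALL",
        "s3JobDefinition": {
            "bucketDefinitions": [
                {"accountId": account_id, "buckets": list(bucket_names)}
            ]
        },
    }
    if description:
        request["description"] = description
    if daily:
        request["scheduleFrequency"] = {"dailySchedule": {}}
    return request


def create_job(region, **job_options):
    """Submit the job to Macie and return its ID (requires boto3 and credentials)."""
    import boto3  # imported lazily so job_request works without the AWS SDK

    client = boto3.client("macie2", region_name=region)
    response = client.create_classification_job(**job_request(**job_options))
    return response["jobId"]
```

A one-time job over two buckets would look like create_job("us-east-1", name="first-scan", account_id="111122223333", bucket_names=["bucket-a", "bucket-b"]), where every value is an example.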
4. Review Sensitive Data Discovery Findings
Macie uses automatic monitoring to ensure security and access control over your buckets, and it generates data policy findings that report any suspected policy violations. If you create and execute a sensitive data discovery job, Macie creates findings that report any sensitive data discovered.
To review the findings of a discovery job, go to https://console.aws.amazon.com/macie/ and open the Amazon Macie console. Select findings in the navigation pane. You can filter findings according to the criteria you enter in the filter bar at the top of the table. Select any field except for the checkbox to view the details of a finding.
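Findings can also be pulled programmatically, which is handy when you want to filter them the way the console's filter bar does. The sketch below assumes boto3 and configured credentials; the helper names are hypothetical, and the filter shown (high-severity sensitive data findings, optionally for one bucket) is only one example of possible criteria.

```python
from collections import Counter


def high_severity_criteria(bucket_name=None):
    """Build filter criteria for high-severity sensitive data findings."""
    criterion = {
        "category": {"eq": ["CLASSIFICATION"]},
        "severity.description": {"eq": ["High"]},
    }
    if bucket_name:
        criterion["resourcesAffected.s3Bucket.name"] = {"eq": [bucket_name]}
    return {"criterion": criterion}


def fetch_findings(region, criteria):
    """Return up to 50 findings that match the criteria (requires boto3)."""
    import boto3  # imported lazily so the other helpers work without the AWS SDK

    client = boto3.client("macie2", region_name=region)
    listing = client.list_findings(findingCriteria=criteria, maxResults=50)
    finding_ids = listing["findingIds"]
    if not finding_ids:
        return []
    return client.get_findings(findingIds=finding_ids)["findings"]


def count_by_bucket(findings):
    """Count findings per affected bucket."""
    return Counter(f["resourcesAffected"]["s3Bucket"]["name"] for f in findings)
```

Passing the output of fetch_findings to count_by_bucket gives a quick view of which buckets hold the most flagged objects.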
In this article, I explained how you can use Amazon Macie to automatically classify sensitive data in S3, and walked through a quick tutorial for using Amazon Macie for the first time:
- Enable Amazon Macie in your AWS account
- Configure a results repository to keep discovery results beyond the default 90 days
- Create a sensitive data discovery job, which can be a one-time or recurring job
- Review discovery findings in the Macie console, or in integrated security tools
I hope this will be useful as you improve your treatment and protection of sensitive data in Amazon S3.