Attribute-Level Governance Using Apache Iceberg Tables

This article explains how data filters in Lake Formation can be used to manage fine-grained access on Apache Iceberg tables.

By Ankur Srivastava · Mar. 17, 25 · Analysis
Large organizations with many users accessing critical data face significant challenges in managing fine-grained access.

AWS services such as IAM, Lake Formation, and S3 ACLs can help with fine-grained access control. But in some scenarios, a single entity containing global data must be accessed by multiple user groups across the system, each with restricted access. Organizations with a global presence may also work in different environments and with different toolsets, which makes data movement and cataloging tedious.

For example, a user wants to access sales data from a table for analytics but should be restricted to sales data for the Australia region only. No other data should be visible to that user. The user also wants to access the data from a different cloud platform for multiple DML operations, which means bringing the data over and transforming it into the tool’s native format before processing, and that causes delays.

Scenarios like this require data control at the attribute level, plus data that is available across environments in a form the native toolsets can read directly for faster access.

To address these challenges, we delivered a cloud transformation solution that uses Lake Formation for data governance on Apache Iceberg tables, which can be queried and cataloged in Amazon S3 itself and accessed across platforms and clouds.

Using the data filter option in Lake Formation, we can ensure column-level security, row-level security, and cell-level security.
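
To make the three levels concrete, here is a minimal local sketch of what a data filter does to a result set. It is plain Python over in-memory rows, not Lake Formation itself, and the table columns, sample values, and the `apply_data_filter` helper are all assumptions made for illustration. Row-level security is the predicate, column-level security is the visible column list, and cell-level security is the two applied together.

```python
from typing import Callable

Row = dict[str, object]


def apply_data_filter(
    rows: list[Row],
    row_predicate: Callable[[Row], bool],
    visible_columns: list[str],
) -> list[Row]:
    """Return only the rows that pass the predicate, reduced to the visible columns."""
    return [
        {col: row[col] for col in visible_columns if col in row}
        for row in rows
        if row_predicate(row)
    ]


# Hypothetical sales rows; none of these values come from the article.
sales = [
    {"order_id": 1, "region": "Australia", "amount": 120.0, "customer_email": "a@example.com"},
    {"order_id": 2, "region": "India", "amount": 75.5, "customer_email": "b@example.com"},
    {"order_id": 3, "region": "Australia", "amount": 42.0, "customer_email": "c@example.com"},
]

# Row filter (region = Australia) plus column filter (no customer_email).
australia_view = apply_data_filter(
    sales,
    row_predicate=lambda row: row["region"] == "Australia",
    visible_columns=["order_id", "region", "amount"],
)

for row in australia_view:
    print(row)
```

In this toy model, the analyst from the earlier example would see only the two Australia orders, and the email column would never be returned.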

What Is the Iceberg Table Format?

Iceberg is an open-source table format with the following benefits:

  • Iceberg supports flexible SQL commands to update, merge, and delete data. It can rewrite data files to improve read performance and use delete deltas to speed up updates.
  • Iceberg supports full schema evolution. Schema updates in Iceberg tables change only the metadata, leaving the data files themselves unaffected. Schema evolution changes include adds, drops, renaming, reordering, and type promotions.
  • Data stored in a data lake or data mesh architecture is available to multiple independent applications across an organization simultaneously.
  • Iceberg is designed for use with huge analytical data sets. It offers multiple features designed to increase querying speed and efficiency, including fast scan planning, pruning metadata files that aren’t needed, and the ability to filter out data files that don’t contain matching data.
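
The last point, skipping data files that cannot contain matching data, can be illustrated with a toy model. The sketch below is not Iceberg's implementation or API. It assumes each data file carries a minimum and maximum value for one column, which is the general idea behind pruning with file-level statistics. The `DataFileStats` class, file paths, and region values are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataFileStats:
    """Toy stand-in for per-file column statistics (lower and upper bound)."""
    path: str
    min_region: str
    max_region: str


def files_to_scan(files: list[DataFileStats], region: str) -> list[str]:
    """Keep only files whose [min, max] range could contain the requested value."""
    return [f.path for f in files if f.min_region <= region <= f.max_region]


# Hypothetical files and bounds, for illustration only.
files = [
    DataFileStats("data/file-a.parquet", "Australia", "Brazil"),
    DataFileStats("data/file-b.parquet", "Canada", "India"),
    DataFileStats("data/file-c.parquet", "Japan", "Zambia"),
]

australia_files = files_to_scan(files, "Australia")
france_files = files_to_scan(files, "France")
print(australia_files)
print(france_files)
```

A query for one region only has to open the files whose bounds include that region, so the planner can discard the rest without reading them.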

Solution Overview

The proposed solution uses the Lake Formation service to create data filters, on which we can grant access permissions to users. The heart of the solution is the Iceberg table format: tables are cataloged, and filter conditions are then added to govern access.

Solution overview

Data Flow

  1. DMS or Glue fetches data from the source system repositories and stores it in a designated S3 bucket.
  2. In the event-based architecture, an S3 push event calls the respective Lambda function to start the ETL process.
  3. Data is stored in Iceberg table format and cataloged.
  4. Data can be processed and transformed using Glue, leveraging ready-made GenAI models.
  5. Processed data is stored in Redshift for consumption.
  6. A tag column is added to the cataloged Iceberg tables (each tag value is mapped to a user group).
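
Step 2 of the flow can be sketched as a small Lambda handler. This is a minimal sketch under stated assumptions, not the authors' implementation: it assumes the ETL step is a Glue job, and the job name `iceberg-etl-job` and the `--source_bucket` / `--source_key` arguments are hypothetical. The event parsing follows the standard S3 notification layout, where object keys arrive URL-encoded.

```python
import urllib.parse

# Hypothetical Glue job name; replace with the job that writes the Iceberg table.
GLUE_JOB_NAME = "iceberg-etl-job"


def extract_s3_objects(event: dict) -> list[tuple[str, str]]:
    """Pull (bucket, key) pairs out of an S3 event notification payload."""
    objects = []
    for record in event.get("Records", []):
        s3 = record.get("s3", {})
        bucket = s3.get("bucket", {}).get("name")
        key = s3.get("object", {}).get("key")
        if bucket and key:
            # Keys in S3 notifications are URL-encoded ("+" stands for a space).
            objects.append((bucket, urllib.parse.unquote_plus(key)))
    return objects


def lambda_handler(event: dict, context: object) -> dict:
    """Start one Glue job run per newly arrived object."""
    import boto3  # imported lazily so the parsing helper works without AWS access

    glue = boto3.client("glue")
    run_ids = []
    for bucket, key in extract_s3_objects(event):
        response = glue.start_job_run(
            JobName=GLUE_JOB_NAME,
            # Argument names are assumptions; the Glue job must read the same names.
            Arguments={"--source_bucket": bucket, "--source_key": key},
        )
        run_ids.append(response["JobRunId"])
    return {"started_job_runs": run_ids}


# Local check of the parsing step with a trimmed-down sample event.
sample_event = {
    "Records": [
        {"s3": {"bucket": {"name": "raw-bucket"}, "object": {"key": "sales/2025/file+one.csv"}}}
    ]
}
parsed = extract_s3_objects(sample_event)
print(parsed)
```

Keeping the event parsing separate from the AWS call makes the handler easy to test locally before it is wired to the bucket notification.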

The image below shows a sample data filter. Data filters can also limit which columns are visible.

A sample data filter
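
A filter like the one pictured can also be defined programmatically. The sketch below builds a request for the Lake Formation `create_data_cells_filter` call in boto3. The account ID, database, table, filter name, and column names are placeholders chosen for illustration, and the row expression simply mirrors the Australia example from earlier in the article.

```python
def build_data_cells_filter(
    catalog_id: str,
    database: str,
    table: str,
    name: str,
    row_expression: str,
    columns: list[str],
) -> dict:
    """Assemble the TableData payload for a Lake Formation data cells filter."""
    return {
        "TableData": {
            "TableCatalogId": catalog_id,
            "DatabaseName": database,
            "TableName": table,
            "Name": name,
            # Row-level restriction, written as a SQL-style predicate.
            "RowFilter": {"FilterExpression": row_expression},
            # Column-level restriction: only these columns are exposed.
            "ColumnNames": list(columns),
        }
    }


def create_filter(request: dict) -> None:
    """Send the request to Lake Formation (needs AWS credentials and permissions)."""
    import boto3  # imported lazily so the builder can be used without AWS access

    boto3.client("lakeformation").create_data_cells_filter(**request)


# All identifiers below are hypothetical placeholders.
filter_request = build_data_cells_filter(
    catalog_id="111122223333",
    database="sales_db",
    table="sales_iceberg",
    name="australia_sales_only",
    row_expression="region = 'Australia'",
    columns=["order_id", "region", "amount"],
)
print(filter_request)
```

Calling `create_filter(filter_request)` would register the filter. The call is left out of the sketch so it can be inspected without touching an AWS account.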


Once the filter is created, we can use the grant permissions option to give access to users, roles, groups, and accounts. Users can then query the data with Athena.
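
The grant step can be scripted as well. This is a hedged sketch using the boto3 Lake Formation `grant_permissions` call with a data cells filter as the resource. The role ARN, account ID, database, table, and filter name are hypothetical, and `SELECT` is assumed to be the permission an Athena analyst needs.

```python
def build_grant_request(
    principal_arn: str,
    catalog_id: str,
    database: str,
    table: str,
    filter_name: str,
) -> dict:
    """Assemble a grant of SELECT on an existing data cells filter."""
    return {
        "Principal": {"DataLakePrincipalIdentifier": principal_arn},
        "Resource": {
            "DataCellsFilter": {
                "TableCatalogId": catalog_id,
                "DatabaseName": database,
                "TableName": table,
                "Name": filter_name,
            }
        },
        "Permissions": ["SELECT"],
    }


def grant(request: dict) -> None:
    """Send the grant to Lake Formation (needs AWS credentials and permissions)."""
    import boto3  # imported lazily so the builder can be used without AWS access

    boto3.client("lakeformation").grant_permissions(**request)


# All identifiers below are hypothetical placeholders.
grant_request = build_grant_request(
    principal_arn="arn:aws:iam::111122223333:role/AustraliaSalesAnalyst",
    catalog_id="111122223333",
    database="sales_db",
    table="sales_iceberg",
    filter_name="australia_sales_only",
)
print(grant_request)
```

Because the grant targets the filter rather than the table, the same filter can be granted to several principals, which matches the reusability point in the capability list.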

The various capabilities of our solution are:

  • Effective management of fine-grained access control over the data.
  • Reusability of the data filters for multiple user groups.
  • Column-level, row-level, and cell-level security.
  • Effective use of Apache Iceberg table format features for seamless control over the data and its access.
  • Efficiency and effectiveness in data preparation.
  • Centralized access management and governance using Lake Formation.
  • Less manual intervention, thanks to the fully integrated solution.
  • End-to-end data delivery using a cloud-agnostic solution and serverless components to provide scalability and cost-effectiveness.

Benefits

  • Operational efficiency. Serverless components reduce the operational and maintenance overhead of managing the solution.
  • Effort optimization. Up to 20-30% reduction in effort by using GenAI models to generate standardized and efficient ETL scripts.
  • Governance and compliance benefits. Attribute-based control in Lake Formation helps organizations comply with standard regulations and provides audit and logging capabilities.

Industrial Usage

Attribute-level governance using Apache Iceberg tables fits well in the financial sector, such as a bank or insurance company, where customers must have restricted access to data to protect its authenticity and security. The healthcare sector can use it to generate and share a patient's electronic health record quickly while protecting sensitive data, which can lead to timely treatment and medication.

Conclusion

Overall, the solution delivers attribute-level governance at scale, with fast data preparation, using the Apache Iceberg table format, a combination most organizations need. Implementing it with AWS cloud services offers quick wins, optimal cost, and scalability.
