Attribute-Level Governance Using Apache Iceberg Tables

This article explains how data filters in AWS Lake Formation can be used to manage fine-grained access to data stored in Apache Iceberg tables.

By Ankur Srivastava · Mar. 17, 2025 · Analysis

Large organizations, where many users access critical data, face significant challenges in managing fine-grained access.

Several AWS services, such as IAM, Lake Formation, and S3 ACLs, can help with fine-grained access control. But there are scenarios where a single entity holding global data must be accessed by multiple user groups across the system, each with restricted access. In addition, organizations with a global presence may work across different environments and toolsets, so data movement and cataloging become tedious.

For example, a user wants to query sales data from a table for analytics but should only be able to see sales data for the Australia region; no other rows should be visible. The same user may also want to access the data from a different cloud platform for DML operations, which forces them to copy the data and transform it into that tool's native format for processing, causing delays.

Scenarios like this require access control at the attribute level, along with data that is available across environments in formats supported by the native toolsets for fast access.

To address these challenges, we delivered a cloud transformation solution that uses Lake Formation for data governance on Apache Iceberg tables, which can be cataloged and queried directly in Amazon S3 and accessed across platforms and clouds.

Using the data filter feature in Lake Formation, we can enforce column-level, row-level, and cell-level security.
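As a minimal sketch of how such a filter could be defined programmatically (the account ID, database, table, and column names below are hypothetical placeholders, not part of the original solution), the Lake Formation API can be called via boto3:

```python
import boto3

lf = boto3.client("lakeformation")

# Create a data cells filter that exposes only Australia rows and a
# subset of columns of a hypothetical sales_db.sales Iceberg table.
lf.create_data_cells_filter(
    TableData={
        "TableCatalogId": "123456789012",  # account that owns the Glue catalog
        "DatabaseName": "sales_db",
        "TableName": "sales",
        "Name": "australia_sales_filter",
        "RowFilter": {"FilterExpression": "region = 'Australia'"},
        "ColumnNames": ["order_id", "region", "amount", "sale_date"],
    }
)
```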

What Is the Iceberg Table Format?

 Iceberg is an open-source table format with the following benefits:

  • Iceberg fully supports flexible SQL commands, making it possible to update, merge, and delete data (a short Athena sketch follows this list). Iceberg can also rewrite data files to improve read performance and use delete deltas to speed up updates.
  • Iceberg supports full schema evolution. Schema updates in Iceberg tables change only the metadata, leaving the data files themselves untouched. Schema evolution changes include column adds, drops, renames, reorders, and type promotions.
  • Data stored in a data lake or data mesh architecture is available to multiple independent applications across an organization simultaneously.
  • Iceberg is designed for use with huge analytical data sets. It offers multiple features designed to increase querying speed and efficiency, including fast scan planning, pruning metadata files that aren’t needed, and the ability to filter out data files that don’t contain matching data.
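As a rough sketch of these capabilities with the table cataloged in Glue and queried through Athena (the database, table, column, and bucket names are hypothetical, and Athena engine v3 is assumed):

```python
import time
import boto3

athena = boto3.client("athena")

DATABASE = "sales_db"                                    # hypothetical Glue database
OUTPUT = "s3://example-datalake-bucket/athena-results/"  # hypothetical results location

def run(sql: str) -> None:
    """Submit one statement to Athena and wait until it finishes."""
    qid = athena.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": DATABASE},
        ResultConfiguration={"OutputLocation": OUTPUT},
    )["QueryExecutionId"]
    while True:
        state = athena.get_query_execution(QueryExecutionId=qid)["QueryExecution"]["Status"]["State"]
        if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
            return
        time.sleep(2)

# Create an Iceberg table cataloged in Glue and stored in S3.
run("""
    CREATE TABLE sales_db.sales (
        order_id  string,
        region    string,
        amount    double,
        sale_date date)
    PARTITIONED BY (region)
    LOCATION 's3://example-datalake-bucket/warehouse/sales/'
    TBLPROPERTIES ('table_type' = 'ICEBERG')
""")

# Schema evolution: a metadata-only change, no data files are rewritten.
run("ALTER TABLE sales_db.sales ADD COLUMNS (sales_channel string)")

# Row-level DML is supported directly on the Iceberg table.
run("UPDATE sales_db.sales SET sales_channel = 'online' WHERE sales_channel IS NULL")
```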

Solution Overview

The proposed solution uses the Lake Formation service to create data filters on which user permissions can be granted. The heart of the solution is the Iceberg table format: the tables are cataloged, and filter conditions are then attached to govern access.

[Image: Solution overview]

Data Flow

  1. AWS DMS or Glue fetches data from the source system repositories and stores it in a designated S3 bucket.
  2. In this event-based architecture, the S3 object-created event triggers the corresponding Lambda function, which starts the ETL process (see the Lambda sketch after this list).
  3. Data is stored in the Iceberg table format and cataloged.
  4. Data can be processed and transformed using Glue, leveraging ready-made GenAI models.
  5. Processed data is stored in Redshift for consumption.
  6. A tag column is added to the cataloged Iceberg tables (the tag value is mapped to the user group).
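A minimal sketch of step 2, assuming the bucket's object-created notification is wired to this Lambda function (the Glue job name and arguments are hypothetical):

```python
import boto3

glue = boto3.client("glue")

def lambda_handler(event, context):
    """Triggered by an S3 object-created event; starts the Glue ETL job
    that writes the landed file into the Iceberg table."""
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        key = record["s3"]["object"]["key"]

        glue.start_job_run(
            JobName="iceberg-etl-job",  # hypothetical Glue job name
            Arguments={
                "--source_bucket": bucket,
                "--source_key": key,
            },
        )
    return {"status": "started"}
```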

The image below shows what a sample data filter looks like. Data filters can also limit the columns a user can see.

[Image: A sample data filter]

Once the filter is created, we can use the grant permissions option to give access to users, roles, groups, and accounts. The user can then query the data with Athena.
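Continuing the hypothetical filter from earlier, granting access and querying could look like the sketch below (the role ARN, account ID, and names are placeholders):

```python
import boto3

lf = boto3.client("lakeformation")
athena = boto3.client("athena")

# Grant SELECT on the data cells filter to an analyst role; the principal
# only ever sees the rows and columns the filter exposes.
lf.grant_permissions(
    Principal={"DataLakePrincipalIdentifier": "arn:aws:iam::123456789012:role/analyst-australia"},
    Resource={
        "DataCellsFilter": {
            "TableCatalogId": "123456789012",
            "DatabaseName": "sales_db",
            "TableName": "sales",
            "Name": "australia_sales_filter",
        }
    },
    Permissions=["SELECT"],
)

# When the analyst runs this query, Lake Formation applies the filter,
# so only Australia rows (and the permitted columns) are returned.
athena.start_query_execution(
    QueryString="SELECT order_id, amount, sale_date FROM sales_db.sales",
    QueryExecutionContext={"Database": "sales_db"},
    ResultConfiguration={"OutputLocation": "s3://example-datalake-bucket/athena-results/"},
)
```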

The various capabilities of our solution are:

  • Effective management of fine-grained access control to the data.
  • Reusability of the data filters for multiple user groups.
  • Column-level, row-level, and cell-level security.
  • Effective use of Apache Iceberg table format features for seamless control over the data and its access.
  • Efficiency and effectiveness in data preparation.
  • Centralized access management and governance using Lake Formation.
  • Less manual intervention in the fully integrated solution.
  • End-to-end data delivery using a cloud-agnostic design and serverless components for scalability and cost-effectiveness.

Benefits

  • Operational efficiency. Serverless components reduce the operational and maintenance overhead of managing the solution.
  • Effort optimization. Up to a 20-30% reduction in effort by using GenAI models to generate standardized, efficient ETL scripts.
  • Governance and compliance. Attribute-based control in Lake Formation helps meet standard regulations and provides audit and logging capabilities.

Industrial Usage

Attribute-level governance using Apache Iceberg tables can be implemented seamlessly in the financial sector, for example at a bank or insurance company, where customers need restricted access to data and the data's authenticity and security must be assured. The healthcare sector can use it to generate and share a patient's electronic health record quickly while protecting sensitive data, enabling timely treatment and medication.

Conclusion

The overall solution delivers attribute-level governance at scale, with fast data preparation, using the Apache Iceberg table format that most organizations need. Implementing it with AWS services offers quick wins, optimal cost, and virtually unlimited scalability.


Opinions expressed by DZone contributors are their own.
