Challenges With Traditional Data Sharing and Emergence of Delta Sharing to the Rescue

This article provides insight into Delta Sharing, how it reduces the complexity of ELT, and where it stands alongside other data-sharing solutions.

by Sandip Roy · Mar. 14, 23 · Review

A growing number of organizations champion data as a strategic asset and create financial value by sharing it, yet sharing data remains a challenge. The use cases are endless: data monetization strategies in enterprises, data as a service for anything from fleet management to drug discovery, real-time public feeds of environmental data such as climate change or water resources, and many others.

And yet, sharing data across different platforms, companies, and clouds is no easy task. Almost all existing data-sharing solutions fall short of today’s expectations for open formats, multi-cloud support, and performance.

Databricks Delta Sharing addresses most of these problems. It is presented as the industry’s first open protocol, an open standard for sharing data securely. Users can access shared data securely both within an organization and between organizations.

It also opens the floodgates to sharing and consuming data from external sources, which enables collaboration with customers, new partnerships, and therefore new revenue streams.

Where Do Current Data-Sharing Solutions Leave Us?

Commercial DBs/DWHs 

Commercial DB and DWH vendors let you share data across their systems by installing (and licensing) a new instance of their product. With this approach, you are locked into that vendor’s solution, its limits on scale, its availability on specific cloud platforms, and its pricing.

sFTP  

Putting data on an (s)FTP server for sharing is vendor-agnostic, open source, and works across clouds, but it clearly lacks scalability.

Object Storage URLs 

All CSPs allow you to share objects with a URL, and you benefit from the availability and durability guarantees of object storage. Still, this is low-level storage that deals in files and objects, whereas your data scientists and data engineers want to work with tables and run CRUD operations on them.

What Does Delta Sharing Bring to the Table for Customers?

Sharing of Real-Time/Batch Data Without Replication  

With data physically hosted on cloud storage, Delta Sharing lets you share data from your Lakehouse or data lake without physically copying it outside your environment. Unlike some cloud DWH solutions, this saves substantial egress costs.

Highly Secured, Tracked, and Governed 

It allows you to grant, track, and audit access to shared data from a centralized place called Unity Catalog. You can also define how long the recipient can access the data, in terms of hours, days, months, and so on. After that period, access is revoked automatically.

Scalability  

You can share data at any scale by leveraging the underlying cloud storage systems in a more economical and efficient manner.

Support for a Diverse Set of Recipients 

The recipient platform is neutral: there is no obligation to use a specific computing platform. A recipient can be another Databricks account in a different region or on a different cloud provider, or it can be a simple client using the APIs for pandas or Apache Spark, a BI tool, a data science notebook such as Google Colab or Amazon SageMaker, or many other systems.
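To make the "simple client" case more concrete, here is a minimal sketch of a pandas recipient. It assumes the open-source `delta-sharing` Python connector is installed and that the provider has supplied a profile file. The profile path and the share, schema, and table names used in the usage note are hypothetical placeholders, not values from this article.

```python
def table_url(profile_path: str, share: str, schema: str, table: str) -> str:
    """Build the '<profile>#<share>.<schema>.<table>' address used by the connector."""
    return f"{profile_path}#{share}.{schema}.{table}"


def load_shared_table(profile_path: str, share: str, schema: str, table: str):
    """Read a shared table into a pandas DataFrame.

    Assumes `pip install delta-sharing`. The import is done lazily so that
    table_url() stays usable even where the connector is not installed.
    """
    import delta_sharing  # third-party connector, assumed to be installed

    return delta_sharing.load_as_pandas(
        table_url(profile_path, share, schema, table)
    )
```

With a real profile file in place, a call such as `load_shared_table("config.share", "sales_share", "retail", "orders")` would return a DataFrame that can be used like any other. An Apache Spark recipient would follow the same pattern with the connector's Spark loader instead.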

Comparative view of different sharing solutions. 

How Does It Work?

Delta Sharing is essentially a REST protocol that follows a lake-first approach, so your data stays on the cloud object store. Its two main constructs are the Provider and the Recipient.

Data Provider and Data Recipient

 

The Data Provider decides what data to share, runs a sharing server that implements the Delta Sharing protocol, and manages access for Data Recipients. Recipients, in turn, consume the share using Delta Sharing clients.

Once the recipient makes a request, the sharing server validates it using the token issued by the provider before the query on the table is executed.
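The request step can be sketched with nothing but the standard library. The example below builds, but does not send, a table query request that carries the recipient's bearer token. The URL layout and the `limitHint` field follow my reading of the published Delta Sharing protocol. The endpoint, token, and share, schema, and table names in the usage note are hypothetical.

```python
import json
import urllib.request
from typing import Optional
from urllib.parse import quote


def build_query_request(endpoint: str, bearer_token: str, share: str,
                        schema: str, table: str,
                        limit_hint: Optional[int] = None) -> urllib.request.Request:
    """Prepare a POST request asking the sharing server for a table's files."""
    url = "{}/shares/{}/schemas/{}/tables/{}/query".format(
        endpoint.rstrip("/"),
        quote(share, safe=""),
        quote(schema, safe=""),
        quote(table, safe=""),
    )
    body = {}
    if limit_hint is not None:
        # Optional hint about how many rows the client intends to read.
        body["limitHint"] = limit_hint
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            # The server validates this token before serving the query.
            "Authorization": f"Bearer {bearer_token}",
            "Content-Type": "application/json; charset=utf-8",
        },
        method="POST",
    )
```

For example, `build_query_request("https://sharing.example.com/delta-sharing", "secret-token", "sales_share", "retail", "orders")` yields a request object that could then be passed to `urllib.request.urlopen`. A request with a missing or invalid token would be expected to be rejected by the server.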

After validation is complete, the Delta Sharing server creates short-lived URLs for the data recipient. With these URLs, the recipient reads the live data it has access to directly from the Delta table, in parallel and at any scale, with a consistent tabular view.
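The read step described above can be sketched as follows. This is a simplified illustration, not the connector's actual implementation. It assumes the server answers with newline-delimited JSON in which each `file` entry carries a short-lived `url`, which matches my understanding of the protocol, and it fetches those URLs in parallel with a thread pool. The function names are hypothetical.

```python
import json
import urllib.request
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence


def file_urls(ndjson_body: str) -> List[str]:
    """Extract the short-lived file URLs from a newline-delimited JSON response."""
    urls = []
    for line in ndjson_body.splitlines():
        if not line.strip():
            continue
        entry = json.loads(line)
        if "file" in entry:  # other lines describe the protocol and table metadata
            urls.append(entry["file"]["url"])
    return urls


def download(url: str) -> bytes:
    """Fetch one data file straight from cloud object storage."""
    with urllib.request.urlopen(url) as response:
        return response.read()


def fetch_all(urls: Sequence[str],
              fetch: Callable[[str], bytes] = download,
              max_workers: int = 8) -> List[bytes]:
    """Download all files in parallel; results keep the order of `urls`."""
    if not urls:
        return []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fetch, urls))
```

The downloaded bytes are the table's underlying data files, which a real client would then decode into rows. Because the URLs point at object storage rather than at the sharing server, the parallelism here scales with the storage layer, which is the point the paragraph above makes.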

Summary

This article provided insight into Delta Sharing, how it reduces the complexity of ELT, and where it stands alongside other data-sharing solutions. Together, the secure and live data-sharing capabilities of Delta Sharing promote a scalable and tightly coupled interaction between data providers and consumers within the Lakehouse paradigm.
