Snowflake Data Sharing and Data Marketplace
Snowflake data sharing and data marketplace can support modern data sharing techniques and eliminate the need for data movement.
Traditional data sharing is cumbersome. It relies on making a separate copy of the data for each consumer and distributing those copies over file transfer protocols (SFTP/FTPS/FTP), email, or, more recently, APIs or cloud storage. The consumer side is not easy either: the consumer has to download the files and run an ETL process to load the data into the target database or data warehouse. By the time that process finishes, the data may already be obsolete and need to be refreshed or augmented with the latest set of records. In addition, many use cases require data security both in transit and at rest, so the provider has to encrypt the data and the consumer has to decrypt it.
Modern data sharing can eliminate the need for data movement, which significantly improves this process. It can also serve a virtually unlimited number of consumers from the same data source.
That is what the Snowflake Data Sharing feature makes possible.
Modern Data Sharing Approach in Snowflake
The data-sharing architecture in Snowflake looks as follows:
In Snowflake, there is no need to extract the data from the provider database and send it to the consumers through a secure data transfer mechanism. Data sharing is built into Snowflake's SQL dialect, so databases can be shared with ordinary SQL commands. On top of that, the provider can update the data in real time, so all consumers see a consistent, up-to-date view of the shared data sets.
How Data Sharing Works
Snowflake can share regular tables, external tables, secure views, and secure materialized views. Sharing is organized around the concept of shares.
The provider creates a share (a native Snowflake SQL construct, created with a CREATE DDL statement) and then grants access to the consumers:
In essence, shares are named objects that contain the privileges that grant access to the database, the schema, and the required objects for the defined accounts.
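A minimal sketch of the provider side might look like the following. The database, schema, table, share, and account names (sales_db, public, orders, sales_share, consumer_org.consumer_account) are hypothetical placeholders, not names from the original article.

```sql
-- Provider side. All object and account names below are illustrative assumptions.

-- 1. Create the share: an empty, named container for privileges.
CREATE SHARE sales_share;

-- 2. Grant the share access to the database, the schema, and the objects to expose.
GRANT USAGE ON DATABASE sales_db TO SHARE sales_share;
GRANT USAGE ON SCHEMA sales_db.public TO SHARE sales_share;
GRANT SELECT ON TABLE sales_db.public.orders TO SHARE sales_share;

-- 3. Make the share visible to one or more consumer accounts.
ALTER SHARE sales_share ADD ACCOUNTS = consumer_org.consumer_account;

-- Optional: review what the share currently exposes.
SHOW GRANTS TO SHARE sales_share;
```

No data is copied by any of these statements; they only record which objects the listed accounts are allowed to read.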
Data sharing is only supported between Snowflake accounts. However, Snowflake has introduced the concept of a reader account, which makes it possible to share data with a consumer who is not a Snowflake customer. Reader accounts are special Snowflake-managed accounts that can be created from the web user interface or with a DDL command:
A reader account can only consume data from the provider account that created it. In this case, all charges incurred by the reader account are the responsibility of the data provider.
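As a sketch, a reader account could be created with CREATE MANAGED ACCOUNT. The account name, admin user name, and password placeholder below are assumptions made for illustration.

```sql
-- Provider side. reader_acct1 and reader_admin are hypothetical names;
-- replace the password placeholder with a real, strong password.
CREATE MANAGED ACCOUNT reader_acct1
  ADMIN_NAME = reader_admin,
  ADMIN_PASSWORD = '<choose-a-strong-password>',
  TYPE = READER;

-- List the managed (reader) accounts owned by this provider account.
SHOW MANAGED ACCOUNTS;
```

The new reader account still has to be added to a share, in the same way as any other consumer account, before it can see any data.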
From a security standpoint, Snowflake highly recommends using secure views instead of granting access directly to the entire table.
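One way to follow this recommendation is to expose only selected columns through a secure view and grant the share access to the view rather than to the base table. The view, table, column, and share names below are hypothetical.

```sql
-- Provider side. Names and columns are illustrative assumptions.

-- Expose only the columns the consumers need; the view definition
-- is hidden from consumers because the view is SECURE.
CREATE SECURE VIEW sales_db.public.orders_shared_v AS
  SELECT order_id, order_date, amount
  FROM sales_db.public.orders;

-- Share the view instead of the underlying table.
GRANT SELECT ON VIEW sales_db.public.orders_shared_v TO SHARE sales_share;
```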
From the consumer standpoint, making the share accessible as a database in Snowflake is as simple as executing a CREATE DATABASE SQL statement:
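A sketch of the consumer side, assuming a provider account identifier of provider_account, a share called sales_share, and a local role called analyst (all hypothetical):

```sql
-- Consumer side. Account, share, database, and role names are assumptions.

-- See which shares have been made available to this account.
SHOW SHARES;

-- Create a read-only database backed by the provider's share.
CREATE DATABASE shared_sales FROM SHARE provider_account.sales_share;

-- Let a local role query the shared objects.
GRANT IMPORTED PRIVILEGES ON DATABASE shared_sales TO ROLE analyst;
```

The resulting database is read-only for the consumer and reflects the provider's current data.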
Snowflake Data Marketplace
Snowflake has launched its Data Marketplace, which uses this secure data sharing technology to connect third-party providers with consumers. Data vendors from a wide variety of industries, such as finance, health, energy, marketing, and media, can share their data via the Snowflake Data Marketplace. Snowflake admins can explore the Data Marketplace from within the web user interface by clicking on the Data Marketplace icon.
Exploring the Data Marketplace is easy using filters or free-text search.
To demonstrate the Data Marketplace functionality, we are going to explore the SEC Reporting Analytics Demo. The data can be made accessible using the Get Data button.
The database can be created from the share via the user interface.
The schema objects (tables, views, stages, UDFs, etc.) can then be checked from the user interface.
Users can then query the shared tables in the database that was created from the share, just like any other regular database tables or views:
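For illustration, querying a shared database could look like this. The database, schema, and table names are hypothetical placeholders and do not reflect the actual objects in any particular Marketplace listing.

```sql
-- Consumer side. shared_sales, public, and orders are assumed names.

-- Discover which tables the share exposes.
SHOW TABLES IN DATABASE shared_sales;

-- Query a shared table like any local table (read-only).
SELECT *
FROM shared_sales.public.orders
LIMIT 10;
```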
Snowflake data sharing and the Data Marketplace support modern data sharing techniques and eliminate the need for data movement, that is, extracting data on the provider side and transferring it to the consumers. They also provide real-time access to the latest version of the data. This helps providers monetize their data while offering consumers accurate, up-to-date information.