Integrating Snowflake With Trino
Trino simplifies accessing Snowflake, highlighting its versatility in providing robust data management and analytics solutions for modern enterprises.
In this article, we delve into accessing Snowflake via the Trino project, walking through the integration of Trino with Snowflake step by step and discussing its benefits and implications.
Previous Articles
Previous articles on Snowflake and Trino:
- SaaS Galore: Integrating CockroachDB with Confluent Kafka, Fivetran, and Snowflake
- Data federation with CockroachDB and Presto
Motivation
A common query among potential adopters of Snowflake pertains to its compatibility with on-premise data and cloud platforms like Azure. In this article, we address this question head-on, exploring the feasibility of accessing Snowflake alongside on-premise data through the Trino project. Let's unravel the possibilities together.
High-Level Steps
- Deploy Trino in Docker
- Get a trial Snowflake account
- Connect the dots
- Conclusion
Step-By-Step Instructions
Navigating the landscape of data integration can be daunting, especially when considering the compatibility of Snowflake with on-premise environments. In this tutorial, we aim to simplify the process by utilizing a Docker environment to simulate on-premise conditions. Our approach prioritizes simplicity, leveraging standard Snowflake configurations and a basic Trino Docker setup. It's essential to consult your documentation for specific scenarios, but let's begin with the fundamentals.
Deploy Trino in Docker
I have a compose file called compose-trino.yaml with the following contents:
services:
  trino:
    container_name: trino
    hostname: trino
    build: trino/.
    ports:
      - "8080:8080"
    environment:
      - _JAVA_OPTIONS=-Dfile.encoding=UTF-8
    volumes:
      - ./trino/catalog:/etc/trino/catalog
      - ./trino/etc:/etc/trino
In the current directory, I have a folder called trino. Within that folder, I have a Dockerfile with the following contents:
FROM trinodb/trino:442
LABEL version="1.0"
LABEL description="trino container"
ENV REFRESHED_AT 2024_03_15
I also have two more folders called etc and catalog.
Within the catalog directory, I've set up a snowflake.properties file with the following contents:
connector.name=snowflake
connection-url=jdbc:snowflake://<account>.snowflakecomputing.com
connection-user=root
connection-password=secret
snowflake.account=account
snowflake.database=database
snowflake.role=role
snowflake.warehouse=warehouse
If you encounter any hurdles along the way, don't hesitate to refer to the comprehensive Trino documentation available here. Let's dive in!
Once you set up the Snowflake environment, you can adjust these properties with your values.
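This catalog mechanism is also how Trino reaches on-premise sources alongside Snowflake, which is the scenario raised in the motivation above. As a minimal sketch, assuming a hypothetical on-premise PostgreSQL instance reachable from the container at onprem-host with a database named inventory, an additional file such as trino/catalog/postgresql.properties could look like this:

connector.name=postgresql
connection-url=jdbc:postgresql://onprem-host:5432/inventory
connection-user=postgres
connection-password=secret

Each file dropped into the catalog directory becomes its own catalog the next time the container starts.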
Within the etc directory, I have a jvm.config with the following contents:
--add-opens=java.base/java.nio=ALL-UNNAMED
-Djdk.module.illegalAccess=permit
These particular JDK flags are Snowflake-specific.
I also have config.properties with the following contents:
coordinator=true
node-scheduler.include-coordinator=true
http-server.http.port=8080
discovery.uri=http://example.net:8080
And finally, node.properties with the following contents:
node.environment=production
node.id=ffffffff-ffff-ffff-ffff-ffffffffffff
node.data-dir=/tmp/trino/data
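Putting it all together, the project layout should look roughly like this (the FROM trinodb/trino:442 snippet above is the Dockerfile picked up by the build: directive):

.
├── compose-trino.yaml
└── trino
    ├── Dockerfile
    ├── catalog
    │   └── snowflake.properties
    └── etc
        ├── config.properties
        ├── jvm.config
        └── node.properties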
With everything in place, you're now ready to start the Compose environment. Execute the following command:

docker compose -f compose-trino.yaml up -d
Upon successful configuration, you should observe a running container named trino. You can confirm this by executing the command docker ps:
f426506aa443 snowflake-docker-trino "/usr/lib/trino/bin/…" 53 minutes ago Up 47 minutes (healthy) 0.0.0.0:8080->8080/tcp trino
If you encounter any issues, you can further troubleshoot by examining the Trino logs using the following command:

docker logs trino
You can access the Trino container with the following command:
docker exec -it trino trino
Once logged in, you can verify the correct configuration of the Snowflake catalog by executing the following command:
trino> show catalogs;
 Catalog
-----------
 snowflake
 system
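As an optional check (a small aside, not part of the original setup), Trino's built-in system catalog exposes the same list through a regular query:

SELECT * FROM system.metadata.catalogs;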
For the next phase of this tutorial, kindly proceed to sign up for a Snowflake Trial Account. Opt for the standard edition since we won't be utilizing enterprise features. During the signup process, I selected the Azure eastus2 region for my Snowflake deployment.
Upon completing the signup, you'll receive a verification email. Once verified, you'll gain access to your Snowflake environment. Retrieve the necessary details from the email sent by Snowflake, particularly the credentials, and populate the snowflake.properties file located in the trino/catalog directory.
Connect the Dots
Snowflake provides a variety of demo tutorials, including the Tasty Bytes series. For this tutorial, we'll focus on the "Load sample data with SQL from S3 bucket" worksheet. Alternatively, feel free to select a dataset of your preference.
---> set the Role
USE ROLE accountadmin;

---> set the Warehouse
USE WAREHOUSE compute_wh;

---> create the Tasty Bytes Database
CREATE OR REPLACE DATABASE tasty_bytes_sample_data;

---> create the Raw POS (Point-of-Sale) Schema
CREATE OR REPLACE SCHEMA tasty_bytes_sample_data.raw_pos;

---> create the Raw Menu Table
CREATE OR REPLACE TABLE tasty_bytes_sample_data.raw_pos.menu
(
    menu_id NUMBER(19,0),
    menu_type_id NUMBER(38,0),
    menu_type VARCHAR(16777216),
    truck_brand_name VARCHAR(16777216),
    menu_item_id NUMBER(38,0),
    menu_item_name VARCHAR(16777216),
    item_category VARCHAR(16777216),
    item_subcategory VARCHAR(16777216),
    cost_of_goods_usd NUMBER(38,4),
    sale_price_usd NUMBER(38,4),
    menu_item_health_metrics_obj VARIANT
);

---> confirm the empty Menu table exists
SELECT * FROM tasty_bytes_sample_data.raw_pos.menu;

---> create the Stage referencing the Blob location and CSV File Format
CREATE OR REPLACE STAGE tasty_bytes_sample_data.public.blob_stage
url = 's3://sfquickstarts/tastybytes/'
file_format = (type = csv);

---> query the Stage to find the Menu CSV file
LIST @tasty_bytes_sample_data.public.blob_stage/raw_pos/menu/;

---> copy the Menu file into the Menu table
COPY INTO tasty_bytes_sample_data.raw_pos.menu
FROM @tasty_bytes_sample_data.public.blob_stage/raw_pos/menu/;

---> how many rows are in the table?
SELECT COUNT(*) AS row_count FROM tasty_bytes_sample_data.raw_pos.menu;

---> what do the top 10 rows look like?
SELECT TOP 10 * FROM tasty_bytes_sample_data.raw_pos.menu;

---> what menu items does the Freezing Point brand sell?
SELECT menu_item_name
FROM tasty_bytes_sample_data.raw_pos.menu
WHERE truck_brand_name = 'Freezing Point';

---> what is the profit on Mango Sticky Rice?
SELECT
    menu_item_name,
    (sale_price_usd - cost_of_goods_usd) AS profit_usd
FROM tasty_bytes_sample_data.raw_pos.menu
WHERE 1=1
AND truck_brand_name = 'Freezing Point'
AND menu_item_name = 'Mango Sticky Rice';

---> to finish, let's extract the Mango Sticky Rice ingredients from the semi-structured column
SELECT
    m.menu_item_name,
    obj.value:"ingredients"::ARRAY AS ingredients
FROM tasty_bytes_sample_data.raw_pos.menu m,
    LATERAL FLATTEN (input => m.menu_item_health_metrics_obj:menu_item_health_metrics) obj
WHERE 1=1
AND truck_brand_name = 'Freezing Point'
AND menu_item_name = 'Mango Sticky Rice';
We have a dataset in Snowflake, so let's now pivot back to Trino and access the Snowflake data from there.
If your Compose environment is currently active but lacks essential configurations such as snowflake.database, snowflake.warehouse, or other pertinent Snowflake properties, it's crucial to halt the environment. Before proceeding, ensure these properties are appropriately configured. Once adjusted, you can restart the Compose environment and continue with the integration process.
docker compose -f compose-trino.yaml down
Back in the snowflake.properties file, change the properties to:
connection-user=snowflakeuser
connection-password=snowflakepassword
snowflake.database=tasty_bytes_sample_data
snowflake.role=accountadmin
snowflake.warehouse=compute_wh
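For reference, a complete file might then look like the sketch below; the account identifier xy12345.east-us-2.azure and the credentials are made-up placeholders, so substitute the details from your Snowflake signup email:

connector.name=snowflake
connection-url=jdbc:snowflake://xy12345.east-us-2.azure.snowflakecomputing.com
connection-user=snowflakeuser
connection-password=snowflakepassword
snowflake.account=xy12345.east-us-2.azure
snowflake.database=tasty_bytes_sample_data
snowflake.role=accountadmin
snowflake.warehouse=compute_wh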
Restart the environment and access the Trino shell.
Within the Trino shell, type:
use snowflake.raw_pos;
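Before running the worksheet queries, a quick sanity check is to list what Trino can actually see in the catalog; assuming the credentials are correct, the results should include the raw_pos schema and the menu table:

trino> SHOW SCHEMAS FROM snowflake;
trino> SHOW TABLES FROM snowflake.raw_pos;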
Since the Snowflake catalog already points at the tasty_bytes_sample_data database, we can omit the database name from the fully qualified table name and reference the schema and table directly. Pick any of the queries from the Snowflake worksheet above and try running them in the Trino container.
trino:raw_pos> SELECT COUNT(*) AS row_count FROM raw_pos.menu;
 row_count
-----------
       100
(1 row)

Query 20240315_185131_00013_45g27, FINISHED, 1 node
Splits: 1 total, 1 done (100.00%)
0.84 [1 rows, 0B] [1 rows/s, 0B/s]
trino:raw_pos> SELECT
            ->     menu_item_name
            -> FROM raw_pos.menu
            -> WHERE truck_brand_name = 'Freezing Point';
   menu_item_name
--------------------
 Lemonade
 Sugar Cone
 Waffle Cone
 Two Scoop Bowl
 Bottled Water
 Bottled Soda
 Ice Tea
 Ice Cream Sandwich
 Mango Sticky Rice
 Popsicle
(10 rows)

Query 20240315_185212_00015_45g27, FINISHED, 1 node
Splits: 1 total, 1 done (100.00%)
1.23 [10 rows, 0B] [8 rows/s, 0B/s]
Indeed, accessing Snowflake datasets using Trino from our local environment demonstrates the flexibility and interoperability of these tools. This integration allows us to seamlessly work with data across different platforms, enhancing our analytical capabilities and workflow efficiency.
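This is also where federation starts to pay off. As a hedged sketch, assuming the hypothetical postgresql catalog from earlier and a made-up orders table living in that on-premise database, a single Trino query can join local data with the Snowflake menu table:

SELECT o.order_id, m.menu_item_name, m.sale_price_usd
FROM postgresql.public.orders o
JOIN snowflake.raw_pos.menu m
  ON o.menu_item_id = m.menu_item_id;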
Additionally, you can access the Trino UI via http://localhost:8080. With the default configuration, no password is required, and the username is set to admin. By navigating to the "finished queries" section, you can review the queries you've executed, providing valuable insight into your workflow and facilitating debugging if needed.
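If you'd rather not use docker exec, and assuming you have the Trino CLI installed on your host machine, you can also point it at the same endpoint (an optional alternative, not a requirement for this tutorial):

trino --server http://localhost:8080 --catalog snowflake --schema raw_pos --user admin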
Conclusion
Trino and its commercial version, Starburst, are potent tools for federating data across different sources. This article showcases how easily Snowflake can be accessed using local tools with Trino. The synergy between Snowflake and Trino offers a robust solution for data management and analytics, empowering organizations to leverage cloud data warehousing and distributed query processing for enhanced insights.
Published at DZone with permission of Artem Ervits. See the original article here.