Snowflake and dbt Integration To Enable Advanced Analytics
This guide describes how to leverage dbt and Snowflake to build high-quality data pipelines, focusing on data and analytics rather than on infrastructure.
Data strategies today must be designed to help the modern business stay flexible, grow, and run efficiently. Snowflake, powered by the Data Cloud, lets you build data-intensive applications without thinking about the infrastructure that manages them, so the focus can stay on data and analytics.
dbt is a workflow tool to help teams quickly and collaboratively write analytics code using some of the best practices from software engineering, such as modularity, portability, continuous integration, continuous deployment, and documentation. Anyone who knows SQL can use dbt to build high-quality data pipelines without needing to manage infrastructure.
This guide will walk you through using dbt with Snowflake step by step, showing how the two run together as a powerful combination.
Setting Up Snowflake
- Create a Snowflake trial account: Sign up for a trial account on the Snowflake website and select a cloud provider. After creating the account, you will receive an email to activate it. Once it's activated, log in to your account.
- Accessing the dataset: Navigate to the Worksheets tab in your Snowflake instance, then create a new worksheet.
- If the TPC-H dataset isn't available, feel free to restore it with the following query:
create or replace database snowflake_sample_data from share sfc_samples.sample_data;
- Querying dataset: Test that access has been appropriately granted for the dataset by running:
select * from snowflake_sample_data.tpch_sf1.orders limit 100;
Launching the dbt Cloud With Partner Connect
- Set up dbt Cloud: Utilize Snowflake Partner Connect to set up your dbt Cloud account. It will include the management of your Snowflake connection, repository, environments, and credentials.
- From the Snowflake UI, click Admin in the left-hand sidebar, then select Partner Connect:
- Click the dbt Cloud icon to start setup.
- Follow the instructions to sign up for dbt Cloud. Upon authenticating, you will be forwarded to the dbt Cloud setup page. Fill in the details and connect your Snowflake account.
- With your dbt Cloud account created, you will get a confirmation screen saying that the setup of your connection is complete.
IDE Walkthrough
1. Opening the IDE
Now that your dbt Cloud account is configured, let's open up the Integrated Development Environment, or IDE, and familiarize ourselves with some key features.
- In the upper left-hand corner of the dbt Cloud interface, click the Develop button.
2. Initializing the dbt Project
- When the IDE loads, click the green Initialize dbt project button in the upper left-hand corner of the screen.
- This creates a dbt project in the file tree on the left-hand side of the screen, with all the core dbt files and folders you need.
3. Committing the Initial Project
- Once initialization is complete, a new folder structure will appear in the left-hand sidebar. Click commit and push to commit the new files and folders.
- In the pop-up window, enter a commit message such as initialize project and click commit changes.
4. Exploring the IDE
- The dbt Cloud IDE combines a text editor, a SQL runner, and a command line with Git version control in a single interface. You can edit your SQL files, see query results in the SQL runner, compile Jinja, and build models from the command line without hopping between applications.
- Let's write our first dbt models. Two example models are included in your dbt project in the models/example folder. At the bottom left of your screen, type dbt run into the command line and hit Enter.
- The run results let you see the code that dbt compiles and sends to Snowflake for execution. Click one of the model tabs, then click details to view the logs for this run.
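To make "compiles" concrete: the starter project's second example model references the first with ref, and dbt resolves that reference to a fully qualified object name before sending the query to Snowflake. A rough sketch (the database and schema names are illustrative; yours will match your development schema):

-- what you write in the model file
select * from {{ ref('my_first_dbt_model') }}

-- roughly what dbt compiles and sends to Snowflake
select * from pc_dbt_db.dbt_yourname.my_first_dbt_model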
5. Checking Objects in Snowflake
- Toggle over to Snowflake to verify that the objects were created: refresh the database objects, then expand the PC_DBT_DB database. You should now see your development schema and the models created inside it.
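If you prefer to verify from SQL instead of the UI, a quick check in a worksheet lists everything dbt created (pc_dbt_db is the database Partner Connect set up):

show objects in database pc_dbt_db;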
6. Creating a New Warehouse
- Use your worksheet to run the commands below, which create a larger warehouse for your dbt runs.
use role accountadmin;
create warehouse pc_dbt_wh_large with warehouse_size = large;
grant all on warehouse pc_dbt_wh_large to role pc_dbt_role;
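Because a Large warehouse burns credits faster, you may optionally want it to pause when idle. This uses standard Snowflake warehouse parameters:

-- suspend after 60 seconds of inactivity; resume automatically on the next query
alter warehouse pc_dbt_wh_large set auto_suspend = 60 auto_resume = true;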
Configuring Your dbt Project
- Update the dbt_project.yml file with the paths, materializations, and warehouse settings that are needed by your project.
name: 'snowflake_workshop'
version: '1.0.0'
config-version: 2

profile: 'default'

model-paths: ["models"]
analysis-paths: ["analysis"]
test-paths: ["tests"]
seed-paths: ["seeds"]
macro-paths: ["macros"]
snapshot-paths: ["snapshots"]

target-path: "target"
clean-targets:
  - "target"
  - "dbt_packages"

models:
  snowflake_workshop:
    staging:
      materialized: view
      snowflake_warehouse: pc_dbt_wh
    marts:
      materialized: table
      snowflake_warehouse: pc_dbt_wh_large
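These project-level defaults can also be overridden per model with a config block at the top of the model file. A minimal sketch, assuming a hypothetical model that should always build as a table on the larger warehouse:

{{ config(
    materialized = 'table',
    snowflake_warehouse = 'pc_dbt_wh_large'
) }}

select * from {{ ref('stg_tpch_orders') }}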
- Folder structure: Organize your models into a consistent, predictable file tree, such as the layout sketched below.
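One common convention, which the models in the next section follow, is to separate staging models from marts (the file names here are illustrative):

models
├── staging
│   └── tpch
│       ├── tpch_sources.yml
│       ├── stg_tpch_orders.sql
│       └── stg_tpch_line_items.sql
└── marts
    └── core
        └── fct_orders.sql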
Developing Your Models
Create Sources
Define where the raw data resides in your dbt project. Apply tests and document them.
version: 2

sources:
  - name: tpch
    description: source tpch data
    database: snowflake_sample_data
    schema: tpch_sf1
    tables:
      - name: orders
        description: main order tracking table
        columns:
          - name: o_orderkey
            description: SF*1500000 are sparsely populated
            tests:
              - unique
              - not_null
      - name: lineitem
        description: main lineitem table
        columns:
          - name: l_orderkey
            description: Foreign key to O_ORDERKEY
            tests:
              - relationships:
                  to: source('tpch', 'orders')
                  field: o_orderkey
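With the source defined, you can exercise these tests before writing any models. dbt's node selection syntax lets you target just this source:

dbt test --select source:tpch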
Create Staging Models
Build staging models with one-to-one relationships with source tables, doing simple transformations like renaming columns.
with source as (

    select * from {{ source('tpch', 'orders') }}

),

renamed as (

    select
        o_orderkey as order_key,
        o_custkey as customer_key,
        o_orderstatus as status_code,
        o_totalprice as total_price,
        o_orderdate as order_date,
        o_orderpriority as priority_code,
        o_clerk as clerk_name,
        o_shippriority as ship_priority,
        o_comment as comment
    from source

)

select * from renamed
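A marts model then builds on staging rather than on raw sources. As a minimal sketch, assuming the staging model above is saved as stg_tpch_orders.sql, a simple fact model could aggregate orders by day:

with orders as (
    select * from {{ ref('stg_tpch_orders') }}
)

select
    order_date,
    count(*) as order_count,
    sum(total_price) as gross_revenue
from orders
group by order_date

Per the dbt_project.yml settings above, a model placed in the marts folder would be materialized as a table and built on the pc_dbt_wh_large warehouse.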
Running and Testing Models
- Run dbt models: Use the dbt run command to build the models in your development schema.
- Testing: Run dbt test to check data quality and build trust with your stakeholders.
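Both commands accept dbt's node selection syntax when you only want to build or test part of the project, for example:

dbt run --select staging            # build only the models under models/staging
dbt test --select stg_tpch_orders   # test a single model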
Deploying to Production
- Review code: Merge your feature-branch code into the main branch.
- Deploy: Run dbt in production to build your models for real and run any tests.
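In dbt Cloud, deployment typically means a scheduled job in a production environment that runs a short command sequence along these lines (dbt build is a common single-command alternative that runs and tests in dependency order):

dbt run
dbt test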
Conclusion
In this guide, you learned how to build scalable data transformation pipelines for analytics using dbt and Snowflake. With a firm grasp of the basics in place, you can apply them to your own data projects and dig deeper into both dbt Cloud and Snowflake.