Snowflake and dbt Integration To Enable Advanced Analytics
This guide describes how to leverage dbt and Snowflake to build high-quality data pipelines, focusing on data and analytics rather than on infrastructure.
Data strategies today must be designed to help the modern business stay flexible, grow, and run efficiently. Snowflake, powered by the Data Cloud, lets you build data-intensive applications without thinking about the infrastructure that manages them, so the focus can stay on data and analytics.
dbt is a workflow tool to help teams quickly and collaboratively write analytics code using some of the best practices from software engineering, such as modularity, portability, continuous integration, continuous deployment, and documentation. Anyone who knows SQL can use dbt to build high-quality data pipelines without needing to manage infrastructure.
This guide will walk you through using dbt with Snowflake step by step, showing how the two run together as a powerful combination.
Setting Up Snowflake
- Create a Snowflake trial account: Sign up for a trial account on the Snowflake website and select a cloud provider. After creating the account, you will receive an email to activate it. Once it's activated, log in to your account.
- Accessing the dataset: Navigate to the Worksheets tab in your Snowflake instance, then create a new worksheet.
- If the TPC-H dataset isn't available, feel free to restore it with the following query:
create or replace database snowflake_sample_data from share sfc_samples.sample_data;
- Querying dataset: Test that access has been appropriately granted for the dataset by running:
select * from snowflake_sample_data.tpch_sf1.orders limit 100;
Launching the dbt Cloud With Partner Connect
- Set up dbt Cloud: Utilize Snowflake Partner Connect to set up your dbt Cloud account. It will include the management of your Snowflake connection, repository, environments, and credentials.
- From the Snowflake UI, click Admin in the left-hand sidebar, then select Partner Connect:
- Click the dbt Cloud icon to start setup.
- Follow the instructions to sign up for dbt Cloud. Upon authenticating, you will be forwarded to the dbt Cloud setup page. Fill in the details and connect your Snowflake account.
- With your dbt Cloud account created, you will get a confirmation screen saying that the setup of your connection is complete.
IDE Walkthrough
1. Opening the IDE
Now that your dbt Cloud account is configured, let's open up the Integrated Development Environment, or IDE, and familiarize ourselves with some key features.
- In the upper left-hand corner of the dbt Cloud interface, click the Develop button.
2. Initializing the dbt Project
- When the IDE loads, click the green Initialize dbt project button in the upper left-hand corner of the screen.
- This creates a dbt project in the file tree on the left-hand side of the screen, with all the core dbt files and folders you need.
3. Committing the Initial Project
- Once initialization is complete, a new folder structure will appear in the left-hand sidebar. Click commit and push to commit the new files and folders.
- In the pop-up window, enter a commit message such as initialize project and click commit changes.
4. Exploring the IDE
- The dbt Cloud IDE combines a text editor, a SQL runner, and a command line with Git version control in a single interface. You can edit your SQL files, see query results in the SQL runner, compile Jinja, and build models from the command line without hopping between applications.
- Let's write our first dbt models. Two example models are included in your dbt project in the models/example folder. At the bottom left of your screen, type dbt run into the command line and hit Enter.
- The run results let you see the code that dbt compiles and sends to Snowflake for execution. Click one of the model tabs, then click details to view the logs for this run.
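To make "compiles" concrete: the starter project's second example model references the first with ref, and dbt resolves that reference to a fully qualified object name before sending the query to Snowflake. A rough sketch (the database and schema names are illustrative; yours will match your development schema):

-- what you write in the model file
select * from {{ ref('my_first_dbt_model') }}

-- roughly what dbt compiles and sends to Snowflake
select * from pc_dbt_db.dbt_yourname.my_first_dbt_model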
5. Checking Objects in Snowflake
- Toggle over to Snowflake to verify that the objects were created: refresh the database objects, then expand the PC_DBT_DB database. You should now see your development schema and the models created inside it.
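If you prefer to verify from SQL instead of the UI, a quick check in a worksheet lists everything dbt created (pc_dbt_db is the database Partner Connect set up):

show objects in database pc_dbt_db;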
6. Creating a New Warehouse
- Use your worksheet to run the commands below, which create a larger warehouse for your dbt runs.
use role accountadmin;
create warehouse pc_dbt_wh_large with warehouse_size = large;
grant all on warehouse pc_dbt_wh_large to role pc_dbt_role;
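Because a Large warehouse burns credits faster, you may optionally want it to pause when idle. This uses standard Snowflake warehouse parameters:

-- suspend after 60 seconds of inactivity; resume automatically on the next query
alter warehouse pc_dbt_wh_large set auto_suspend = 60 auto_resume = true;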
Configuring Your dbt Project
- Update the dbt_project.yml file with the paths, materializations, and warehouse settings that are needed by your project.
name: 'snowflake_workshop'
version: '1.0.0'
config-version: 2

profile: 'default'

model-paths: ["models"]
analysis-paths: ["analysis"]
test-paths: ["tests"]
seed-paths: ["seeds"]
macro-paths: ["macros"]
snapshot-paths: ["snapshots"]

target-path: "target"
clean-targets:
  - "target"
  - "dbt_packages"

models:
  snowflake_workshop:
    staging:
      materialized: view
      snowflake_warehouse: pc_dbt_wh
    marts:
      materialized: table
      snowflake_warehouse: pc_dbt_wh_large
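These project-level defaults can also be overridden per model with a config block at the top of the model file. A minimal sketch, assuming a hypothetical model that should always build as a table on the larger warehouse:

{{ config(
    materialized = 'table',
    snowflake_warehouse = 'pc_dbt_wh_large'
) }}

select * from {{ ref('stg_tpch_orders') }}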
- Folder structure: Organize your models into a consistent, predictable file tree, such as the layout sketched below.
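One common convention, which the models in the next section follow, is to separate staging models from marts (the file names here are illustrative):

models
├── staging
│   └── tpch
│       ├── tpch_sources.yml
│       ├── stg_tpch_orders.sql
│       └── stg_tpch_line_items.sql
└── marts
    └── core
        └── fct_orders.sql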
Developing Your Models
Create Sources
Define where the raw data resides in your dbt project. Apply tests and document them.
version: 2

sources:
  - name: tpch
    description: source tpch data
    database: snowflake_sample_data
    schema: tpch_sf1
    tables:
      - name: orders
        description: main order tracking table
        columns:
          - name: o_orderkey
            description: SF*1500000 are sparsely populated
            tests:
              - unique
              - not_null
      - name: lineitem
        description: main lineitem table
        columns:
          - name: l_orderkey
            description: Foreign key to O_ORDERKEY
            tests:
              - relationships:
                  to: source('tpch', 'orders')
                  field: o_orderkey
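With the source defined, you can exercise these tests before writing any models. dbt's node selection syntax lets you target just this source:

dbt test --select source:tpch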
Create Staging Models
Build staging models with one-to-one relationships with source tables, doing simple transformations like renaming columns.
with source as (

    select * from {{ source('tpch', 'orders') }}

),

renamed as (

    select
        o_orderkey as order_key,
        o_custkey as customer_key,
        o_orderstatus as status_code,
        o_totalprice as total_price,
        o_orderdate as order_date,
        o_orderpriority as priority_code,
        o_clerk as clerk_name,
        o_shippriority as ship_priority,
        o_comment as comment
    from source

)

select * from renamed
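A marts model then builds on staging rather than on raw sources. As a minimal sketch, assuming the staging model above is saved as stg_tpch_orders.sql, a simple fact model could aggregate orders by day:

with orders as (
    select * from {{ ref('stg_tpch_orders') }}
)

select
    order_date,
    count(*) as order_count,
    sum(total_price) as gross_revenue
from orders
group by order_date

Per the dbt_project.yml settings above, a model placed in the marts folder would be materialized as a table and built on the pc_dbt_wh_large warehouse.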
Running and Testing Models
- Run dbt models: Use the dbt run command to build the models in your development schema.
- Testing: Run dbt test to check data quality and build trust with your stakeholders.
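Both commands accept dbt's node selection syntax when you only want to build or test part of the project, for example:

dbt run --select staging            # build only the models under models/staging
dbt test --select stg_tpch_orders   # test a single model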
Deploying to Production
- Review code: Merge your feature-branch code into the main branch.
- Deploy: Run dbt in production to build your models for real and run any tests.
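In dbt Cloud, deployment typically means a scheduled job in a production environment that runs a short command sequence along these lines (dbt build is a common single-command alternative that runs and tests in dependency order):

dbt run
dbt test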
Conclusion
In this guide, you learned how to build scalable data transformation pipelines for analytics using dbt and Snowflake. With a firm grasp of the basics in place, you can apply them to your own data projects and dig deeper into both dbt Cloud and Snowflake.