Snowflake and dbt Integration to Enable Advanced Analytics

This guide describes how to leverage dbt and Snowflake to build high-quality data pipelines, focusing on data and analytics rather than on infrastructure.

By Rajesh Remala · Aug. 20, 24 · Tutorial

Modern data strategies have to be designed for flexibility, growth, and efficient operations. Snowflake, powered by the Data Cloud, lets you build data-intensive applications without worrying about the infrastructure that runs them, so your focus can stay on data and analytics.

dbt is a workflow tool to help teams quickly and collaboratively write analytics code using some of the best practices from software engineering, such as modularity, portability, continuous integration, continuous deployment, and documentation. Anyone who knows SQL can use dbt to build high-quality data pipelines without needing to manage infrastructure.

This guide walks you through, step by step, how to use dbt with Snowflake, a powerful combination for building high-quality data pipelines.

Setting Up Snowflake

  • Create a Snowflake trial account: Sign up for a trial account on the Snowflake website and select a cloud provider. After creating the account, you will get an email to activate it. Once it's activated, log in to your account.
  • Accessing dataset: First, navigate to the worksheets tab in your Snowflake instance. Next, create a new worksheet.
  •  If the TPC-H dataset isn't available, feel free to restore it with the following query:
SQL
 
create or replace database snowflake_sample_data from share sfc_samples.sample_data;


  • Querying dataset: Test that access has been appropriately granted for the dataset by running:
SQL
 
select * from snowflake_sample_data.tpch_sf1.orders limit 100;
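
If the role that Partner Connect creates later cannot see the shared sample database, you can grant it access from an ACCOUNTADMIN worksheet. The statement below is a sketch that assumes the default Partner Connect role name, pc_dbt_role; adjust it if your role is named differently.

SQL
 
use role accountadmin;
grant imported privileges on database snowflake_sample_data to role pc_dbt_role;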


Launching the dbt Cloud With Partner Connect

  • Set up dbt Cloud: Use Snowflake Partner Connect to set up your dbt Cloud account. Partner Connect manages your Snowflake connection, repository, environments, and credentials for you.
  • From the Snowflake UI, click Admin in the left-hand sidebar, then Partner Connect:
    • Click the dbt Cloud icon to start setup.
    • Follow the instructions to sign up for dbt Cloud. Upon authenticating, you will be forwarded to the dbt Cloud setup page. Fill in the details and connect your Snowflake account.
    • With your dbt Cloud account created, you will get a confirmation screen saying that the setup of your connection is complete.

IDE Walkthrough

1. Opening the IDE

Now that your dbt Cloud account is configured, let's open up the Integrated Development Environment, or IDE, and familiarize ourselves with some key features.

  • In the upper left-hand corner of the dbt Cloud interface, click the Develop button.

2. Initialize the dbt Project

  • When the IDE loads, click the green Initialize dbt project button in the upper left-hand corner of the screen.
  • This creates a dbt project in the file tree on the left-hand side of the screen, with all the core dbt files and folders you need.

3. Project Commit Initialization

  • Once initialization is complete, a new folder structure appears in the left-hand sidebar. Click commit and push to commit the new files and folders.
  • In the pop-up window, give a commit message like initialize project and click commit changes.

4. Exploring IDE

  • The dbt Cloud IDE combines a text editor, a SQL runner, a command line, and Git version control in a single package. You can edit SQL files, preview results with the SQL runner, compile Jinja, and build models from the command line without switching between applications.
  • Write your first dbt models: Two example models are included in your dbt project in the models/examples folder (a sketch of one is shown after this list). At the bottom left of your screen, type dbt run into the command line and hit Enter.
  • The run results will enable you to see the code that dbt compiles and sends to Snowflake for execution. Click one of the model tabs, then click details to see the logs for this run.
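
For reference, the starter models that dbt generates are plain SQL plus a little Jinja. The first example model looks roughly like the following (file names and exact contents can vary by dbt version, so treat this as a sketch):

SQL
 
-- Example starter model: builds a tiny two-row table.
{{ config(materialized='table') }}

with source_data as (
    select 1 as id
    union all
    select null as id
)

select *
from source_data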

5. Check Objects in Snowflake

  • Toggle over to Snowflake to verify that the objects were created: Refresh Database Objects, and then expand the PC_DBT_DB database. Now, you should be able to see your created models in addition to your development schema. 
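
You can also verify the build from a worksheet. The queries below are a sketch that assumes the default starter models were built; replace dbt_your_user with the name of your development schema:

SQL
 
show objects in database pc_dbt_db;
-- Replace dbt_your_user with your development schema name.
select * from pc_dbt_db.dbt_your_user.my_second_dbt_model limit 10;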

6. Creating a New Warehouse

  • Use your worksheet to run the commands below to create a new warehouse for dbt.
SQL
 
use role accountadmin;
create warehouse pc_dbt_wh_large with warehouse_size = large;
grant all on warehouse pc_dbt_wh_large to role pc_dbt_role;


Configuring Your dbt Project

  • Update the dbt_project.yml file with the paths, materializations, and warehouse settings that are needed by your project.
YAML
 
name: 'snowflake_workshop'
version: '1.0.0'
config-version: 2

profile: 'default'

model-paths: ["models"]
analysis-paths: ["analysis"]
test-paths: ["tests"]
seed-paths: ["seeds"]
macro-paths: ["macros"]
snapshot-paths: ["snapshots"]

target-path: "target"  
clean-targets:         
    - "target"
    - "dbt_modules"

models:
  snowflake_workshop:
    staging:
      materialized: view
      snowflake_warehouse: pc_dbt_wh
    marts:
      materialized: table
      snowflake_warehouse: pc_dbt_wh_large


  • Folder structure: Organize your models into a consistent, predictable file tree, for example:
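
One layout that matches the staging and marts configuration above (the folder and file names here are just one reasonable convention, not a requirement):

Plain Text
 
models/
  staging/
    tpch/
      tpch_sources.yml
      stg_tpch_orders.sql
      stg_tpch_line_items.sql
  marts/
    core/
      fct_customer_orders.sql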

Developing Your Models

Create Sources

Define where the raw data resides in your dbt project. Apply tests and document them. 

YAML
 
version: 2
sources:
  - name: tpch
    description: source tpch data
    database: snowflake_sample_data
    schema: tpch_sf1
    tables:
      - name: orders
        description: main order tracking table
        
        columns:
          - name: o_orderkey
            description: SF*1500000 are sparsely populated
            tests: 
              - unique
              - not_null

      - name: lineitem
        description: main lineitem table
        columns:
          - name: l_orderkey
            description: Foreign Key to O_ORDERKEY
            tests:
              - relationships:
                  to: source('tpch', 'orders')
                  field: o_orderkey


Create Staging Models

Build staging models that map one-to-one to source tables and apply simple transformations, such as renaming columns.

SQL
 
with source as (
    select * from {{ source('tpch', 'orders') }}
),

renamed as (
    select
        o_orderkey as order_key,
        o_custkey as customer_key,
        o_orderstatus as status_code,
        o_totalprice as total_price,
        o_orderdate as order_date,
        o_orderpriority as priority_code,
        o_clerk as clerk_name,
        o_shippriority as ship_priority,
        o_comment as comment
    from source
)

select * from renamed
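
Downstream marts models then select from staging models with ref() rather than from the raw source. As a sketch, assuming the staging model above is saved as models/staging/tpch/stg_tpch_orders.sql, a simple marts model (say, models/marts/core/fct_customer_orders.sql) that aggregates orders per customer could look like this:

SQL
 
with orders as (
    select * from {{ ref('stg_tpch_orders') }}
),

final as (
    select
        customer_key,
        count(order_key) as order_count,
        sum(total_price) as total_order_value
    from orders
    group by customer_key
)

select * from final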


Running and Testing Models

  • Run dbt models: Use the command dbt run to build models in your development schema.
  • Testing: Run dbt test to validate data quality and build trust with your stakeholders.
  • Deploy to production:
    • Review code: Merge your feature branch into the main branch.
    • Deploy: Run dbt in production to build your models for real and run your tests, typically with the commands shown below.
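
In practice, a dbt Cloud production job runs the same commands you used during development, pointed at your production environment. A minimal sketch:

Shell
 
dbt run
dbt test

In recent dbt versions, dbt build combines both steps (and also runs seeds and snapshots).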

Conclusion

In this guide, you learned how to build scalable data transformation pipelines for analytics using dbt and Snowflake. With a firm grasp of the basics in place, you can apply them to your own data projects and dig deeper into both dbt Cloud and Snowflake.


Opinions expressed by DZone contributors are their own.
