How to Do a Snowflake Query Pushdown in Talend

In this article, see how Talend leverages Snowflake query pushdown via ELT.

By Rekha Sree · Nov. 27, 19 · Tutorial

In a typical, traditional data warehouse solution, the data is read into ETL memory and processed or transformed there before being loaded into the target database. As data volumes grow, the cost of compute grows with them, so it becomes vital to look for an alternative design.

Welcome to pushdown query processing. The basic idea of pushdown is that certain parts of a SQL query, or of the transformation logic, can be "pushed" to where the data resides, in the form of generated SQL statements. Instead of bringing the data to the processing logic, we take the logic to where the data resides. This is very important for performance.

Snowflake supports query pushdown with v2.1 and later. This pushdown can help you transition from a traditional ETL process to a more flexible and powerful ELT model. In this blog, I will showcase how Talend leverages Snowflake query pushdown via ELT.

ETL vs ELT

Before we get into the advanced details, let's revisit the basics. With traditional ETL (Extract, Transform, Load), the data is first extracted, then transformed, and then loaded into a target like Snowflake. Here, most of the data transformation, such as filtering, sorting, and aggregation, takes place in the ETL tool's memory before the data is loaded into the target.

With ELT (Extract, Load, Transform), the data is first extracted, then loaded, and only then are the transformations performed. With the ELT model, all the data is loaded into Snowflake, and the data transformations are performed directly in Snowflake.

Snowflake offers powerful SQL capabilities via query pushdown, thereby enabling data transformation to shift to a more effective ELT model. During development with ELT, it is possible to view the code exactly as Snowflake will execute it.
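
To make the contrast concrete, here is a minimal sketch of the ELT pattern in plain Snowflake SQL. The SALES_SUMMARY table and its columns are illustrative assumptions, not objects from this article's example:

```sql
-- ETL would pull the raw rows into the ETL tool's memory, aggregate
-- them there, and write the result back. With ELT, the same
-- transformation runs entirely on Snowflake compute:
INSERT INTO SALES_SUMMARY (REGION, TOTAL_REVENUE)   -- hypothetical target table
SELECT REGION, SUM(TOTAL_REVENUE)
FROM SALES                                          -- data already loaded into Snowflake
GROUP BY REGION;
```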

Talend ELT Job Design

In Talend, there are native components to configure pushdown optimization. These components convert the transformation logic into a SQL query and send that query to the Snowflake database, which then runs it. In Talend, query pushdown can be leveraged using the ELT components tELTInput, tELTMap, and tELTOutput. These components are available under ELT -> Map -> DB JDBC.

Let's take a quick look at these components:

  • tELTInput: This component adds input tables for the SQL statement. There is no restriction on the number of input tables; you can add as many as the SQL statement to be executed requires.
  • tELTMap: This is the mapping component where the transformation is defined. It uses the table(s) provided as input to feed the parameters of the SQL statement being built, converting the transformation into SQL statements.
  • tELTOutput: This component carries out the specified action on the table, along with the action on the data, according to the output schema defined.

Now, let's build a job that uses these components to take advantage of Snowflake query pushdown. I will explain it with an example. Assume that I have two tables in Snowflake named SALES and CITY. The SALES table contains sales details: the item sold, units sold, sales channel (online or offline), unit cost, total revenue, and total profit, by region and country. The CITY table is a dimension table holding the country code and the population of each country. The metric I need to calculate is the total profit for online sales for each item at the region and country level. The result must be inserted into the ONLINE_AGG table.
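
The article does not give the table definitions, so the following DDL is an assumed sketch, with column names and types inferred from the description above:

```sql
-- Assumed table shapes; inferred from the article, not copied from it.
CREATE TABLE PUBLIC.SALES (
    REGION        VARCHAR,
    COUNTRY       VARCHAR,
    ITEM_TYPE     VARCHAR,
    SALES_CHANNEL VARCHAR,        -- 'Online' or 'Offline'
    UNITS_SOLD    NUMBER,
    UNIT_COST     NUMBER(12,2),
    TOTAL_REVENUE NUMBER(14,2),
    TOTAL_PROFIT  NUMBER(14,2)
);

CREATE TABLE PUBLIC.CITY (
    COUNTRY       VARCHAR,
    COUNTRY_CODE  VARCHAR,
    POPULATION    NUMBER
);

CREATE TABLE PUBLIC.ONLINE_AGG (
    REGION        VARCHAR,
    COUNTRY       VARCHAR,
    ITEM_TYPE     VARCHAR,
    SALES_CHANNEL VARCHAR,
    TOTAL_PROFIT  NUMBER(14,2)
);
```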

Now, implementing this logic in ELT form, my job would look as shown below:

Let's look at this job in more detail. As a best practice, I have used tPrejob to open the Snowflake connection and tPostjob to close it. I have also used tDie to handle exceptions at various components. The next few sections explain in detail the areas marked in the image above (A, B, C, and D).

After adding the input tables, I perform an INNER JOIN on the SALES and CITY tables, as shown in the image below. This editor can also be used to provide an additional WHERE clause, GROUP BY clause, and ORDER BY clause. In the example, I have performed the following transformations:

  • A WHERE condition: PUBLIC.SALES.SALES_CHANNEL = 'Online'
  • An aggregation: sum(PUBLIC.SALES.TOTAL_PROFIT)
  • Since I am aggregating TOTAL_PROFIT, a GROUP BY on the PUBLIC.SALES.REGION, SALES.COUNTRY, PUBLIC.SALES.ITEM_TYPE, and PUBLIC.SALES.SALES_CHANNEL columns
  • Null handling for the ITEM_TYPE column

These transformations are highlighted in the image below.

Now, the beauty of this component is that the SQL is generated as you write the transformation. Click on 'Generated SQL Select query for table2 output' to see the generated SQL.

To validate the results, I copied this SQL and ran it in the Snowflake worksheet.
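
The generated statement appears in the article only as a screenshot, so the following is a hedged reconstruction of roughly what it would look like, assembled from the transformations listed above. Talend's actual aliasing and formatting will differ, and the join key and null-replacement value are assumptions:

```sql
-- Approximate shape of the SQL that tELTMap generates for this job.
INSERT INTO PUBLIC.ONLINE_AGG (REGION, COUNTRY, ITEM_TYPE, SALES_CHANNEL, TOTAL_PROFIT)
SELECT
    PUBLIC.SALES.REGION,
    PUBLIC.SALES.COUNTRY,
    IFNULL(PUBLIC.SALES.ITEM_TYPE, 'Unknown'),      -- null handling; replacement value assumed
    PUBLIC.SALES.SALES_CHANNEL,
    SUM(PUBLIC.SALES.TOTAL_PROFIT)
FROM PUBLIC.SALES
INNER JOIN PUBLIC.CITY
    ON PUBLIC.CITY.COUNTRY = PUBLIC.SALES.COUNTRY   -- assumed join key
WHERE PUBLIC.SALES.SALES_CHANNEL = 'Online'
GROUP BY
    PUBLIC.SALES.REGION,
    PUBLIC.SALES.COUNTRY,
    PUBLIC.SALES.ITEM_TYPE,
    PUBLIC.SALES.SALES_CHANNEL;
```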

Now that the job design is complete, let's run the job. At runtime, you will see that the records are not brought into Talend memory.

Instead, the query is executed at Snowflake.

To confirm the execution, let's query the query history in Snowflake.
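
The article shows this step as a screenshot; one way to run the same check yourself is Snowflake's INFORMATION_SCHEMA.QUERY_HISTORY table function (the filter on the target table name below is illustrative):

```sql
-- Look for the pushed-down INSERT ... SELECT among recent queries.
SELECT query_id, query_text, execution_status, total_elapsed_time
FROM TABLE(INFORMATION_SCHEMA.QUERY_HISTORY(RESULT_LIMIT => 20))
WHERE query_text ILIKE '%ONLINE_AGG%'
ORDER BY start_time DESC;
```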

Expanded view of the query executed

Conclusion

In this article, we saw how we can leverage the power of query pushdown with Talend while working with Snowflake. This job design method enables high utilization of Snowflake clusters for processing data. Well, that's all for now. Keep watching this space for more blogs, and until then, happy reading!

Further Reading

Database Administration on Snowflake

Published at DZone with permission of Rekha Sree, DZone MVB. See the original article here.

Opinions expressed by DZone contributors are their own.
