DZone
Thanks for visiting DZone today,
Edit Profile
  • Manage Email Subscriptions
  • How to Post to DZone
  • Article Submission Guidelines
Sign Out View Profile
  • Post an Article
  • Manage My Drafts
Over 2 million developers have joined DZone.
Log In / Join
Please enter at least three characters to search
Refcards Trend Reports
Events Video Library
Refcards
Trend Reports

Events

View Events Video Library

Zones

Culture and Methodologies Agile Career Development Methodologies Team Management
Data Engineering AI/ML Big Data Data Databases IoT
Software Design and Architecture Cloud Architecture Containers Integration Microservices Performance Security
Coding Frameworks Java JavaScript Languages Tools
Testing, Deployment, and Maintenance Deployment DevOps and CI/CD Maintenance Monitoring and Observability Testing, Tools, and Frameworks
Culture and Methodologies
Agile Career Development Methodologies Team Management
Data Engineering
AI/ML Big Data Data Databases IoT
Software Design and Architecture
Cloud Architecture Containers Integration Microservices Performance Security
Coding
Frameworks Java JavaScript Languages Tools
Testing, Deployment, and Maintenance
Deployment DevOps and CI/CD Maintenance Monitoring and Observability Testing, Tools, and Frameworks

Modernize your data layer. Learn how to design cloud-native database architectures to meet the evolving demands of AI and GenAI workloads.

Avoid machine learning mistakes and boost model performance! Discover key ML patterns, anti-patterns, data strategies, and more.

Related

  • The Magic of Apache Spark in Java
  • How Trustworthy Is Big Data?
  • Enhancing Avro With Semantic Metadata Using Logical Types
  • A Deep Dive into Apache Doris Indexes

Trending

  • A Simple, Convenience Package for the Azure Cosmos DB Go SDK
  • Detection and Mitigation of Lateral Movement in Cloud Networks
  • FIPS 140-3: The Security Standard That Protects Our Federal Data
  • Beyond Code Coverage: A Risk-Driven Revolution in Software Testing With Machine Learning
  1. DZone
  2. Data Engineering
  3. Data
  4. Improving Query Performance in Snowflake and Its Related Costs

Improving Query Performance in Snowflake and Its Related Costs

In the fourth article of the Snowflake blog series, let's take a look at the different optimization techniques to help improve query performance.

By 
Kedhar Natekar user avatar
Kedhar Natekar
·
Jun. 12, 23 · Analysis
Likes (1)
Comment
Save
Tweet
Share
3.8K Views

Join the DZone community and get the full member experience.

Join For Free

In the previous article, we understood how to Optimally Use Snowflake Warehouse and Tables.

So let’s continue this blog series, where we will now focus on improving the performance in Snowflake and its associated costs.

As we know, Snowflake tables are micro-partitioned, which significantly improves query performance. However, over time, you may experience slowness in queries due to increased DML operations on large datasets. 

In such cases, you can cluster the table or use the search optimization service to improve performance. Before deciding which option to choose, you should consider the specific use cases in which they are most beneficial and the associated cost impacts. Otherwise, you may encounter fewer optimizations and higher maintenance costs.

Clustering in Snowflake

Clustering in Snowflake shouldn’t be confused with partitioning in big data terms.

Clustering in Snowflake is a process where it still manages the micro partitions but now in the order of the clustering key defined rather than the order of data inserts. This enhances the performance. There will be less need for a micro-partition scan for a defined clustered key value.

When choosing clustering over a table, make sure to consider the following:

  • When the table contains multiple terabytes (TB) of data, the table performance degrades over time due to too many updates and deletes over it.
  • Using a maximum of 3-4 clustering columns is recommended for a clustering key. Beyond that, there wouldn’t be a significant impact.
  • Order the columns in clustering with most used filters, then by most actively used join predicates, and then used ORDER BY, GROUP BY clauses.
  • Use a clustering key column that has the following: 
    • Large enough number of distinct values to enable effective pruning on the table.
    • Small enough number of distinct values to allow Snowflake to effectively group rows in the same micro-partitions.
  • Order the columns from lowest cardinality to highest cardinality.
  • If a high cardinality column needs to be still used for clustering, then define the key as an expression on the column rather than on the column directly to reduce the number of distinct values.

Costs Associated With Reclustering

When an existing table is clustered:

  • Original micro-partitions are marked deleted but retained in the system to enable Time Travel and Fail-safe. They are purged only after Time Travel and Fail-safe.
  • New micro-partitions will be created of the same size as older partitions. Hence, maintaining old and new partitions is an additional cost.

Eg: Before reclustering, say the table size is 100 GB, and if 2 GB is added daily and 1 GB is deleted, then the original table would now be of 101 GB, and the Time Travel size would be 2 GB for that day. But after reclustering, the original will still hold 101 GB of active partitions according to the cluster key, and Time Travel will have 100 GB of data, too, according to old micro-partitions. This 100 GB of Time Travel data will stay until it’s moved to the Fail-safe zone.

  • Clustering always comes up with automatic reclustering cost in the background in Snowflake’s self-managed warehouse, and the associated cost should also be considered. 

Search Optimization Queries Services

Clustering does not guarantee improved performance on non-clustered columns. 

If you have frequent queries on non-clustered columns and performance is the key irrespective of cost, then opt for a search optimization service over the entire table of specific columns.

It’s similar to enabling indexing on RDBMS databases like Oracle on specific columns.

Cost Impact of Using Search Optimization Queries

Storage Resources

The search optimization service creates a search access path data structure that requires space for each table on which search optimization is enabled.

The storage cost of the search access path depends upon multiple factors:

  • The number of distinct values in the table.
  • In the extreme case where all columns have data types that use the search access path, and all data values in each column are unique, the required storage can be as much as the original table’s size.
  • Typically, however, the size is approximately 1/4 of the original table’s size.

Compute Resources

  • Resource consumption is higher when there is high churn.
  • These costs are roughly proportional to the amount of data ingested (added or changed) and distinct values on the table.
  • Deletes also have some cost.
  • Automatic clustering, while improving the latency of queries in tables with search optimization, can further increase the maintenance costs of search optimization.
  • Use the SYSTEM$ESTIMATE_SEARCH_OPTIMIZATION_COSTS function to estimate the cost of adding search optimization to a table.

To Reduce the Cost While Using Search Optimization

  • DELETE less frequently.
  • INSERT, UPDATE, and MERGE: Batching these types of DML statements on the table can reduce the maintenance cost by the search optimization service.
  • If you recluster the entire table, consider dropping the SEARCH OPTIMIZATION property for that table before reclustering, and then add the SEARCH OPTIMIZATION property back to the table after reclustering.
Big data Database clustering optimization

Published at DZone with permission of Kedhar Natekar. See the original article here.

Opinions expressed by DZone contributors are their own.

Related

  • The Magic of Apache Spark in Java
  • How Trustworthy Is Big Data?
  • Enhancing Avro With Semantic Metadata Using Logical Types
  • A Deep Dive into Apache Doris Indexes

Partner Resources

×

Comments
Oops! Something Went Wrong

The likes didn't load as expected. Please refresh the page and try again.

ABOUT US

  • About DZone
  • Support and feedback
  • Community research
  • Sitemap

ADVERTISE

  • Advertise with DZone

CONTRIBUTE ON DZONE

  • Article Submission Guidelines
  • Become a Contributor
  • Core Program
  • Visit the Writers' Zone

LEGAL

  • Terms of Service
  • Privacy Policy

CONTACT US

  • 3343 Perimeter Hill Drive
  • Suite 100
  • Nashville, TN 37211
  • support@dzone.com

Let's be friends:

Likes
There are no likes...yet! 👀
Be the first to like this post!
It looks like you're not logged in.
Sign in to see who liked this post!