Snowflake vs. Redshift: Which Cloud Data Warehouse Is Right for You?

Analysis of Snowflake and Redshift's scalability, performance, support, security, and more to help determine which one is the best fit for your business.

By Ben Putano · Jul. 07, 21 · Analysis

If data drives your business, then choosing the right cloud data warehouse (CDW) is absolutely critical.

Data warehouses are the foundation of your data analytics program. Your choice will impact the cost of computing, speed to insight, end-user experience, and much more.

One of the biggest debates is between Snowflake and Amazon Redshift. Which is the better CDW?

Snowflake is the hottest thing in big data since Kubernetes. It has simplified cloud data management the way Squarespace simplified website development. The company recently went public and is rapidly expanding its capabilities.

Meanwhile, Redshift is the most comprehensive and reliable data warehouse in the world. It has set the standard for CDWs and paved the way for upstarts like Snowflake.

So which cloud data warehouse is right for you? It truly depends. In this article, we won’t make an argument for one CDW or the other. Instead, we’ll look at the most crucial deciding factors.

First, we’ll discuss capabilities, features, and pricing:

  • Cloud platform support
  • Scalability
  • Performance
  • Security and Encryption
  • Ecosystem and Third-Party Integrations
  • Maintenance
  • Pricing

Then we’ll look at the business cases of Snowflake and Redshift:

  • Use cases: Analytics vs. Machine Learning
  • End-User Experience
  • Company size and available resources

Let’s get started!

TL;DR: An Overview


| Factor | Snowflake | Redshift |
|---|---|---|
| Cloud platform support | Cloud-agnostic | AWS only |
| Scalability | Virtually infinite and instant | Traditional nodes limited in scale. New RA3 nodes match Snowflake’s scalability |
| Performance | Near equal, but more automated optimization | Standard bearer. Improving automated optimization features |
| Security and Encryption | Based on pricing tier | A la carte security and encryption |
| Ecosystem and Third-Party Integrations | Fast-growing ecosystem, few native integrations | Largest ecosystem and most third-party integrations |
| Maintenance | Virtually zero maintenance | Manual maintenance required. Improving automated features |
| Pricing | Predefined packages; cryptic credit system; pay for scalability | A la carte, discounts on longer contracts, transparent pricing, mix-and-match nodes |
| Analytics vs. Machine Learning | Lacks native machine learning toolset | Native integrations with AWS machine learning suite |
| End-User Experience | Easier for non-technical users | More customization and larger feature set for technical users |
| Resources Available | Better for limited technical resources, but can get expensive quickly | Cost-savings for companies with technical resources |

Cloud Platform Support

The first thing to consider is whether Snowflake or Redshift works with your cloud platform of choice.

Redshift — being the Amazon product it is — operates only on AWS. This isn’t a surprise, and it’s not all that limiting, either.

AWS is still far and away the top cloud platform provider. It’s also the largest, which means it has the richest ecosystem of tech and tools around it. If you had to stick to one cloud provider, AWS is a solid choice.

However, many companies prefer the flexibility of being cloud-agnostic. If you run on Google Cloud, Microsoft Azure, or a combination of the Big 3 platforms, then Snowflake is for you.

Scalability

Scalability is one of the main battlegrounds in the debate between Snowflake and Redshift.

Snowflake was built with a fundamentally different architecture than Redshift. It de-coupled storage and compute functions to provide near-instantaneous scalability. This provided a major advantage for Snowflake and made it a viable Redshift competitor.

Traditional Redshift nodes — the storage-heavy DS2 and compute-intensive DC2 nodes — have coupled storage and compute. That makes scaling your clusters more time- and resource-intensive. Every time you need to scale your compute, you also need to add storage capacity and vice versa.
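
To make the coupling concrete, here is a minimal Python sketch. The per-node figures (`NODE_VCPUS`, `NODE_STORAGE_TB`) and the function names are hypothetical placeholders, not real DS2 or DC2 specifications. The only point is that with coupled nodes, the node count is driven by whichever resource you need more of, so the other one ends up over-provisioned.

```python
import math
from typing import Tuple

# Hypothetical per-node capacity, for illustration only (not real DS2/DC2 specs).
NODE_VCPUS = 4
NODE_STORAGE_TB = 2.0


def coupled_nodes_needed(vcpus_required: int, storage_tb_required: float) -> int:
    """Nodes needed when every node bundles a fixed amount of compute and storage."""
    by_compute = math.ceil(vcpus_required / NODE_VCPUS)
    by_storage = math.ceil(storage_tb_required / NODE_STORAGE_TB)
    # The larger requirement wins; the other resource comes along for the ride.
    return max(by_compute, by_storage)


def idle_capacity(vcpus_required: int, storage_tb_required: float) -> Tuple[int, float]:
    """Return (unused vCPUs, unused TB) that coupled scaling forces you to buy."""
    nodes = coupled_nodes_needed(vcpus_required, storage_tb_required)
    unused_vcpus = nodes * NODE_VCPUS - vcpus_required
    unused_storage_tb = nodes * NODE_STORAGE_TB - storage_tb_required
    return unused_vcpus, unused_storage_tb
```

Under these made-up numbers, a storage-heavy workload that needs 8 vCPUs and 20 TB would require 10 nodes, leaving most of the purchased compute idle. De-coupled storage and compute avoid that by letting each side scale on its own.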

In response to Snowflake, Redshift introduced de-coupled nodes — the RA3 — in December 2019. Like Snowflake, RA3 nodes provide virtually instant and infinite scalability. Users can now pay for compute by the hour, which enables limited-time spikes in usage. RA3 nodes also allow for instant storage upgrades without adding compute resources.

Redshift was able to level the scalability playing field with RA3 nodes.

The added benefit of Redshift, in terms of scalability, is that users can seamlessly pair de-coupled RA3 nodes with the less-expensive DS2 and DC2 nodes. You can mix and match resources to create the optimal data warehouse cluster.

Performance

Scalability is important, but it’s sort of a vanity metric. The reality is, most companies don’t need all that storage and computing power. More important is a CDW’s ability to compute smaller, more complex queries fast.

In other words, the complexity of data is a bigger challenge for most companies than the size of data.

Whether you’re running business analytics or machine learning tasks, you are probably pulling in different types of data from several different sources. This requires you to JOIN and normalize data to prepare it for analysis.

The question is, which CDW is faster at handling complex queries: Redshift or Snowflake?

It’s hard to say. There are very few objective, apples-to-apples benchmarks comparing CDWs. The most reliable benchmark is this one, conducted back in 2018.

According to the study, Snowflake and Redshift were virtually identical in terms of performance on complex queries.

This shouldn’t surprise you. Snowflake wouldn’t be a top 3 (or arguably top 2) data warehouse if its performance weren’t on par with Redshift’s.

However, Snowflake has one advantage in the performance category, and that’s automated performance optimization. Snowflake handles much of the workload management required for optimal performance. Redshift requires a more manual optimization, but this offers more customization, too.

The good news is that whether you choose Snowflake or Redshift, you can expect fast, reliable performance on complex data.

Security and Encryption

Data security is a topic that keeps engineers and their CTOs up at night. It’s an extremely in-depth topic, so we won’t get into it here.

Just know this: CDW security is mostly dependent on your cloud platform provider. The big three clouds are relatively equal in this category.

As for encryption, Snowflake and Redshift handle things a little differently.

Snowflake offers enterprise-grade encryption for data in transit and at rest for all customers. However, higher-level security measures — like annual rekeying and customer-managed keys — are only available for enterprise and business-critical customers.

On Redshift, customers have the option to encrypt data at rest at no extra cost. However, most Redshift customers use AWS Key Management Service (KMS) for higher-level security at an affordable rate. KMS allows you to create customer-managed keys starting at just $1/month.

Redshift also has extensive encryption features around data in transit.

Ecosystem and Third-Party Integrations

When it comes to cloud data ecosystems, there’s still no beating AWS. It’s by far the largest and most comprehensive ecosystem, which means it has the widest variety of third-party integrations.

However, Snowflake’s ecosystem is growing rapidly, especially now that it’s a publicly traded company. Their partner network is growing as well.

Still, Redshift will always have an advantage when it comes to integrating with other AWS products, and the size of their partner network will be hard to beat.

Maintenance

Maintenance is one major advantage Snowflake has over Redshift.

Snowflake has mastered automated CDW maintenance, which includes cleaning out unused tables and re-sizing your cluster. This can save your team a lot of time.

Redshift is quickly adding more automated maintenance tasks, such as automated VACUUM and ANALYZE functions to clean up old tables and rows. However, Redshift still requires manual maintenance, especially when it comes to resizing your cluster.
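
For teams that still schedule this housekeeping themselves, the following is a minimal Python sketch that only builds the `VACUUM` and `ANALYZE` statements for a list of tables. The function name and the table names are hypothetical, and actually running the statements against a cluster (with whatever database driver you use) is left out.

```python
import re
from typing import List

# Accept plain "table" or "schema.table" names only, to keep the SQL safe to build.
_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*(\.[A-Za-z_][A-Za-z0-9_]*)?$")


def maintenance_statements(tables: List[str]) -> List[str]:
    """Build one VACUUM and one ANALYZE statement per table (hypothetical helper)."""
    statements = []
    for table in tables:
        if not _IDENTIFIER.match(table):
            raise ValueError(f"unexpected table name: {table!r}")
        # VACUUM reclaims space and re-sorts rows; ANALYZE refreshes planner statistics.
        statements.append(f"VACUUM {table};")
        statements.append(f"ANALYZE {table};")
    return statements


if __name__ == "__main__":
    # Table names here are made up for the example.
    for statement in maintenance_statements(["sales.orders", "sales.customers"]):
        print(statement)
```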

Pricing

Let’s talk dollars and cents.

Redshift and Snowflake both price based on usage per hour, but that’s where the similarities end.

Redshift pricing is on-demand and a la carte. Nodes are simply priced by usage per hour. Other cloud services, including Amazon Managed Storage, are priced separately.

[Figure: Redshift pricing chart]

The benefit of this pricing model is that you can build the perfect CDW that fits your company’s needs. The downside is it requires more effort to put together.

Redshift also offers Reserved Instance pricing for steady workload scenarios. Companies can get steep discounts compared to on-demand pricing, but you have to commit to a 1- or 3-year contract.
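
Whether a reserved commitment pays off depends on how many hours the cluster actually runs. The sketch below is a rough Python illustration of that break-even reasoning. The rates in the usage note are made-up numbers, not AWS prices, and the model assumes a reserved node is billed for every hour of the year while an on-demand node is billed only for the hours it is used.

```python
HOURS_PER_YEAR = 24 * 365


def on_demand_cost(hourly_rate: float, hours_used: float) -> float:
    """Yearly on-demand cost: you pay only for the hours the node runs."""
    return hourly_rate * hours_used


def reserved_cost(effective_hourly_rate: float) -> float:
    """Yearly reserved cost: the commitment is billed for every hour of the year."""
    return effective_hourly_rate * HOURS_PER_YEAR


def breakeven_hours(on_demand_rate: float, reserved_rate: float) -> float:
    """Hours of use per year above which the reserved commitment is cheaper."""
    return reserved_cost(reserved_rate) / on_demand_rate
```

With a hypothetical on-demand rate of 1.00 per hour and a reserved rate of 0.50 per hour, `breakeven_hours(1.0, 0.5)` works out to half the hours in a year. A cluster that runs less than that would be cheaper on demand.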

Snowflake’s pricing model is completely different.

First, Snowflake has created pre-defined packages: Standard, Enterprise, Business Critical, and Virtual Private Snowflake (VPS). The packages vary in the types of cloud services provided, including encryption levels, materialized views, compliance, and more.

[Figure: Snowflake pre-defined packages]

Second, Snowflake uses a credit system instead of straightforward usage-per-hour pricing.

This is where Snowflake pricing gets tricky. Depending on the size of your warehouse, your cluster could consume anywhere from 1 to 128 credits per hour.

[Figure: Snowflake credit system]

How much is a credit? According to Snowflake documentation, one credit is equal to one server. Different-sized warehouses use different numbers of servers. For example, the XS warehouse has just one server, which equals one credit per hour.

Here’s a complete breakdown of warehouse sizes, servers, and credits/hour:

[Figure: Breakdown of warehouse sizes, servers, and credits/hour]
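
The credit arithmetic can be sketched in a few lines of Python. This assumes each warehouse size doubles the credits of the one below it, which is consistent with the range of 1 to 128 credits per hour quoted above. The abbreviated size labels and the `price_per_credit` value are placeholders, since the actual price per credit depends on your package and region.

```python
# Warehouse sizes from smallest to largest (abbreviated labels, used here for brevity).
WAREHOUSE_SIZES = ["XS", "S", "M", "L", "XL", "2XL", "3XL", "4XL"]

# Assumption: XS uses 1 credit per hour and each size up doubles it (1, 2, 4, ... 128).
CREDITS_PER_HOUR = {size: 2 ** i for i, size in enumerate(WAREHOUSE_SIZES)}


def hourly_cost(size: str, price_per_credit: float) -> float:
    """Cost of running one warehouse of the given size for one hour."""
    return CREDITS_PER_HOUR[size] * price_per_credit


def monthly_cost(size: str, price_per_credit: float, hours_per_day: float, days: int = 30) -> float:
    """Rough monthly cost for a warehouse that runs a fixed number of hours per day."""
    return hourly_cost(size, price_per_credit) * hours_per_day * days
```

For example, at a hypothetical price of 2.00 per credit, a medium warehouse consumes 4 credits per hour, so `hourly_cost("M", 2.0)` is 8.0. Running it 8 hours a day for 30 days multiplies that by 240.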

This leaves just one more question: what is the size of a server, anyway?

Snowflake doesn’t actually share this information, but thanks to a keen investigator on Stack Overflow, we finally have an answer:

[Figure: Stack Overflow thread on determining the size of a Snowflake server]

Snowflake is built using Amazon Elastic Compute Cloud (EC2). One Snowflake server is equal to one c5d.2xlarge EC2 instance.

Phew. That’s a lot to unpack.

In summary, Snowflake packages are easier to get up and running. Redshift offers more transparent and customizable pricing.

So, what happens if we compare Snowflake and Redshift pricing, apples-to-apples? That’s also challenging because costs vary widely depending on the size of your cluster, additional cloud services, and your region.

One thing is for sure: Redshift is much more cost-effective if you stick to storage/compute-coupled nodes (DC2 and DS2).

If you can live without the limitless and instant scalability of Snowflake or Redshift RA3 nodes, then Redshift is the better choice in terms of pricing.

The Business Case for Snowflake vs. Redshift

Hopefully, the feature breakdown above was helpful in determining the better CDW for your organization.

But let’s look at one more dimension of the debate: How do Snowflake and Redshift compare on business terms?

Use Cases: Analytics vs. Machine Learning

Both Redshift and Snowflake are designed for analytical processing. Both are OLAP databases with columnar architecture, which means they are designed for analyzing large volumes of data quickly. In this use case, Snowflake and Redshift are relative equals.

However, Redshift has an advantage in machine learning processing. Redshift integrates seamlessly with Amazon’s machine learning and AI toolkit, including Amazon SageMaker.

Snowflake also integrates with SageMaker and other machine learning tools like Dataiku, but you lose some of the native functionality between the tools.

End-User Experience

Who will be the primary user of your CDW? Some companies have dedicated DevOps teams, while others rely on their data scientists or business analysts.

From a pure user experience standpoint, Snowflake beats Redshift, hands down. As we mentioned in the introduction, Snowflake has streamlined the user experience the way Squarespace streamlined web design. It’s extremely simple to use.

But simplicity isn’t the most important factor for technical users. For DevOps teams and system architects, customization and feature sets are more highly valued. That’s where Redshift shines.

Company Size and Available Resources

Amazon Redshift is typically preferred by companies with dedicated DevOps teams or highly technical end-users.

Redshift is a more powerful and feature-rich data warehouse, and it comes with cost advantages. However, it’s a more specialized tool that requires technical expertise to run and maintain.

Snowflake is designed as a turnkey solution, so even the most minimally technical users can use it. It’s the preferred choice for companies with limited technical resources. However, costs can add up quickly on Snowflake.

The Final Verdict: Snowflake or Amazon Redshift

From technical capabilities to end-user experience, we have looked at this debate from nearly every angle.

So what’s it going to be, Snowflake or Redshift?

The bottom line is that both Snowflake and Redshift are world-class CDWs. The final choice, my friends, is up to you.

Published at DZone with permission of Ben Putano, DZone MVB. See the original article here.

Opinions expressed by DZone contributors are their own.
