Snowflake vs. Redshift: Which Cloud Data Warehouse Is Right for You?
Analysis of Snowflake and Redshift's scalability, performance, support, security, and more to help determine which one is the best fit for your business.
If data drives your business, then choosing the right cloud data warehouse (CDW) is absolutely critical.
Data warehouses are the foundation of your data analytics program. Your choice will impact the cost of computing, speed to insight, end-user experience, and much more.
One of the biggest debates is between Snowflake and Amazon Redshift. Which is the better CDW?
Snowflake is the hottest thing in big data since Kubernetes. It has simplified cloud data management the way Squarespace simplified website development. The company recently went public and is rapidly expanding its capabilities.
Meanwhile, Redshift is the most comprehensive and reliable data warehouse in the world. It has set the standard for CDWs and paved the way for upstarts like Snowflake.
So which cloud data warehouse is right for you? It truly depends. In this article, we won’t make an argument for one CDW or the other. Instead, we’ll look at the most crucial deciding factors.
First, we’ll discuss capabilities, features, and pricing:
- Cloud platform support
- Scalability
- Performance
- Security and Encryption
- Ecosystem and Third-Party Integrations
- Maintenance
- Pricing
Then we’ll look at the business cases of Snowflake and Redshift:
- Use cases: Analytics vs. Machine Learning
- End-User Experience
- Company size and available resources
Let’s get started!
TL;DR: An Overview
|Feature||Snowflake||Redshift|
|Cloud platform support||Cloud-agnostic||AWS only|
|Scalability||Virtually infinite and instant||Traditional nodes limited in scale. New RA3 nodes match Snowflake’s scalability|
|Performance||Near equal, but more automated optimization.||Standard bearer. Improving automated optimization features.|
|Security and Encryption||Based on pricing tier||A la carte security and encryption|
|Ecosystem and Third-Party Integrations||Fast-growing ecosystem, few native integrations||Largest ecosystem and most third-party integrations.|
|Maintenance||Virtually zero maintenance||Manual maintenance required. Improving automated features.|
|Pricing||Predefined packages; cryptic credit system; pay for scalability||A la carte, discounts on longer contracts, transparent pricing, mix-and-match nodes|
|Analytics vs. Machine Learning||Lacks native machine learning toolset||Native integrations with AWS machine learning suite|
|End-User Experience||Easier for non-technical users||More customization and larger feature set for technical users|
|Resources Available||Better for limited technical resources, but can get expensive quickly||Cost-savings for companies with technical resources|
Cloud Platform Support
The first thing to consider is whether Snowflake or Redshift works with your cloud platform of choice.
Redshift — being the Amazon product it is — operates only on AWS. This isn’t a surprise, and it’s not all that limiting, either.
AWS is still far and away the top cloud platform provider. It’s also the largest, which means it has the richest ecosystem of tech and tools around it. If you had to stick to one cloud provider, AWS is a solid choice.
However, many companies prefer the flexibility of being cloud-agnostic. If you run on Google Cloud, Microsoft Azure, or a combination of the Big 3 platforms, then Snowflake is for you.
Scalability
Scalability is one of the main battlegrounds in the debate between Snowflake and Redshift.
Snowflake was built with a fundamentally different architecture than Redshift. It de-coupled storage and compute functions to provide near-instantaneous scalability. This provided a major advantage for Snowflake and made it a viable Redshift competitor.
Traditional Redshift nodes — the storage-heavy DS2 and compute-intensive DC2 nodes — have coupled storage and compute. That makes scaling your clusters more time- and resource-intensive. Every time you need to scale your compute, you also need to add storage capacity and vice versa.
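To make the coupling concrete, here is a minimal sketch. The per-node figures are made-up placeholders, not actual DS2 or DC2 specifications, and the function name is hypothetical. The point is only that with coupled nodes, the larger of the two requirements sets the node count, and the other resource ends up over-provisioned.

```python
import math


def coupled_nodes_needed(storage_tb, compute_units, tb_per_node, units_per_node):
    """Nodes required when every node bundles a fixed amount of storage and compute."""
    by_storage = math.ceil(storage_tb / tb_per_node)
    by_compute = math.ceil(compute_units / units_per_node)
    # Whichever requirement is larger dictates the cluster size.
    return max(by_storage, by_compute)


# Hypothetical node: 2 TB of storage and 4 compute units each.
# The workload is storage-heavy: 20 TB of data, but only 8 compute units.
nodes = coupled_nodes_needed(storage_tb=20, compute_units=8,
                             tb_per_node=2, units_per_node=4)
idle_compute = nodes * 4 - 8  # compute you pay for but do not need
print(nodes, idle_compute)
```

With de-coupled storage and compute, the two numbers would be sized independently instead of being tied together by `max`.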
In response to Snowflake, Redshift introduced de-coupled nodes — the RA3 — in December 2019. Like Snowflake, RA3 nodes provide virtually instant and infinite scalability. Users can now pay for compute by the hour, which enables limited-time spikes in usage. RA3 nodes also allow for instant storage upgrades without adding compute resources.
Redshift was able to level the scalability playing field with RA3 nodes.
The added benefit of Redshift, in terms of scalability, is choice: you can run de-coupled RA3 nodes where you need elasticity and the less-expensive DS2 and DC2 nodes elsewhere. Note that a single cluster uses one node type, so the mixing and matching happens across clusters rather than within one.
Performance
Scalability is important, but it’s sort of a vanity metric. The reality is, most companies don’t need all that storage and computing power. More important is a CDW’s ability to compute smaller, more complex queries fast.
In other words, the complexity of data is a bigger challenge for most companies than the size of data.
Whether you’re running business analytics or machine learning tasks, you are probably pulling in different types of data from several different sources. This requires you to JOIN and normalize data to prepare it for analysis.
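As a small illustration of that kind of work, the sketch below joins two hypothetical sources (a CRM export and a billing export) and aggregates the result. It uses Python's built-in `sqlite3` module as a local stand-in for a warehouse, and the table and column names are invented for the example. The same JOIN pattern applies in Snowflake or Redshift SQL.

```python
import sqlite3

# In-memory database standing in for a cloud data warehouse.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE crm_customers (customer_id INTEGER, region TEXT);
    CREATE TABLE billing_invoices (customer_id INTEGER, amount_cents INTEGER);
    INSERT INTO crm_customers VALUES (1, 'EMEA'), (2, 'APAC');
    INSERT INTO billing_invoices VALUES (1, 1000), (1, 2500), (2, 400);
""")

# Combine the two sources on their shared key, then aggregate per region.
rows = conn.execute("""
    SELECT c.region, SUM(i.amount_cents) AS revenue_cents
    FROM crm_customers AS c
    JOIN billing_invoices AS i ON i.customer_id = c.customer_id
    GROUP BY c.region
    ORDER BY c.region
""").fetchall()
conn.close()

print(rows)
```

Real workloads chain many such joins across far larger tables, which is where query performance starts to matter more than raw capacity.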
The question is, which CDW is faster at handling complex queries: Redshift or Snowflake?
It’s hard to say. There are very few objective, apples-to-apples benchmarks comparing CDWs. The most reliable benchmark is this one, conducted back in 2018.
According to the study, Snowflake and Redshift were virtually identical in terms of performance on complex queries.
This shouldn’t surprise you. Snowflake wouldn’t be a top 3 (or arguably top 2) data warehouse if its performance weren’t on par with Redshift.
However, Snowflake has one advantage in the performance category, and that’s automated performance optimization. Snowflake handles much of the workload management required for optimal performance. Redshift requires more manual optimization, but that also allows more customization.
The good news is that whether you choose Snowflake or Redshift, you can expect fast, reliable performance on complex data.
Security and Encryption
Data security is a topic that keeps engineers and their CTOs up at night. It’s an extremely in-depth topic, so we won’t get into it here.
Just know this: CDW security is mostly dependent on your cloud platform provider. The big three clouds are relatively equal in this category.
As for encryption, Snowflake and Redshift handle things a little differently.
Snowflake offers enterprise-grade encryption for data in transit and at rest for all customers. However, higher-level security measures, like annual rekeying and customer-managed keys, are only available on the Enterprise and Business Critical tiers.
On Redshift, customers have the option to encrypt data at rest at no extra cost. However, most Redshift customers use AWS Key Management Service (KMS) for higher-level security at an affordable rate. KMS allows you to create customer-managed keys starting at just $1/month.
Redshift also has extensive encryption features around data in transit.
Ecosystem and Third-Party Integrations
When it comes to cloud data ecosystems, there’s still no beating AWS. It’s by far the largest and most comprehensive ecosystem, which means it has the widest variety of third-party integrations.
However, Snowflake’s ecosystem is growing rapidly, especially now that it’s a publicly traded company. Their partner network is growing as well.
Still, Redshift will always have an advantage when it comes to integrating with other AWS products, and the size of their partner network will be hard to beat.
Maintenance
Maintenance is one major advantage Snowflake has over Redshift.
Snowflake has mastered automated CDW maintenance, which includes cleaning out unused tables and re-sizing your cluster. This can save your team a lot of time.
Redshift is quickly adding more automated maintenance tasks, such as automated VACUUM and ANALYZE functions to clean up old tables and rows. However, Redshift still requires manual maintenance, especially when it comes to resizing your cluster.
Pricing
Let’s talk dollars and cents.
Redshift and Snowflake both price based on usage per hour, but that’s where the similarities end.
Redshift pricing is on-demand and a la carte. Nodes are simply priced by usage per hour. Other cloud services, including Redshift Managed Storage, are priced separately.
The benefit of this pricing model is that you can build the perfect CDW that fits your company’s needs. The downside is it requires more effort to put together.
Redshift also offers Reserved Instance pricing for steady workload scenarios. Companies can get steep discounts compared to on-demand pricing, but you have to commit to a 1- or 3-year contract.
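A rough way to see when a reservation pays off is to compare yearly costs at different usage levels. The sketch below is illustrative only: the $1.00 hourly rate and the 40% discount are placeholder assumptions, not actual Redshift prices, and the function name is hypothetical.

```python
HOURS_PER_YEAR = 24 * 365


def yearly_costs(hourly_rate, hours_used, discount):
    """Return (on_demand, reserved) yearly cost for one node.

    On-demand bills only for the hours used. A reservation is modeled as
    paying the discounted rate for every hour of the year, used or not.
    """
    on_demand = hourly_rate * hours_used
    reserved = hourly_rate * (1 - discount) * HOURS_PER_YEAR
    return on_demand, reserved


# Steady workload: the node runs all year, so the reservation wins.
steady = yearly_costs(hourly_rate=1.00, hours_used=HOURS_PER_YEAR, discount=0.40)

# Spiky workload: the node runs about 2,000 hours a year, so on-demand wins.
spiky = yearly_costs(hourly_rate=1.00, hours_used=2000, discount=0.40)

print(steady, spiky)
```

Under these assumptions, the break-even point is the share of the year equal to one minus the discount: with a 40% discount, a node has to run more than 60% of the time for the reservation to come out ahead.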
Snowflake’s pricing model is completely different.
First, Snowflake has created pre-defined packages: Standard, Enterprise, Business Critical, and Virtual Private Snowflake (VPS). The packages vary in the types of cloud services provided, including encryption levels, materialized views, compliance, and more.
Second, Snowflake uses a credit system instead of straightforward usage-per-hour pricing.
This is where Snowflake pricing gets tricky. Depending on the size of your warehouse, your cluster could consume anywhere from 1 to 128 credits per hour.
How much is a credit? According to Snowflake documentation, one credit is equal to one server. Different-sized warehouses use different numbers of servers. For example, the XS warehouse has just one server, which equals one credit per hour.
Each step up in warehouse size doubles the number of servers, and therefore the credits per hour: XS uses 1, S uses 2, M uses 4, and so on up to 128 for 4XL.
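A short sketch can reproduce the size-to-credit breakdown, assuming one credit per server per hour and a doubling of servers with each size. The size labels are abbreviations, the function names are hypothetical, and the price per credit is a placeholder you would replace with the rate for your own package and region.

```python
# Warehouse sizes in ascending order; each step doubles the server count.
WAREHOUSE_SIZES = ["XS", "S", "M", "L", "XL", "2XL", "3XL", "4XL"]


def credits_per_hour(size):
    """Credits consumed per hour, assuming one credit per server per hour."""
    return 2 ** WAREHOUSE_SIZES.index(size)


def hourly_cost(size, price_per_credit):
    """Hourly cost for a warehouse, given a hypothetical price per credit."""
    return credits_per_hour(size) * price_per_credit


for size in WAREHOUSE_SIZES:
    print(size, credits_per_hour(size))
```

For example, `hourly_cost("M", 3)` multiplies the 4 credits of a medium warehouse by an assumed $3 per credit.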
This leaves just one more question: what is the size of a server, anyway?
Snowflake doesn’t actually share this information, but thanks to a keen investigator on Stack Overflow, we finally have an answer:
Snowflake is built using Amazon Elastic Compute Cloud (EC2). One Snowflake server is equal to one c5d.2xlarge EC2 instance.
Phew. That’s a lot to unpack.
In summary, Snowflake packages are easier to get up and running. Redshift offers more transparent and customizable pricing.
So, what happens if we compare Snowflake and Redshift pricing, apples-to-apples? That’s also challenging because costs vary widely depending on the size of your cluster, additional cloud services, and your region.
One thing is for sure: Redshift is much more cost effective if you stick to storage/compute-coupled nodes (DC2 and DS2).
If you can live without the limitless and instant scalability of Snowflake or Redshift RA3 nodes, then Redshift is the better choice in terms of pricing.
The Business Case for Snowflake vs. Redshift
Hopefully, the feature breakdown above was helpful in determining the better CDW for your organization.
But let’s look at one more dimension of the debate: How do Snowflake and Redshift compare on business terms?
Use Cases: Analytics vs. Machine Learning
Both Redshift and Snowflake are designed for analytical processing. Both are OLAP databases with columnar architecture, which means they are designed for analyzing large volumes of data quickly. In this use case, Snowflake and Redshift are relative equals.
However, Redshift has an advantage in machine learning processing. Redshift integrates seamlessly with Amazon’s machine learning and AI toolkit, including Amazon SageMaker.
Snowflake also integrates with SageMaker and other machine learning tools like Dataiku, but you lose some of the native functionality between the tools.
End-User Experience
Who will be the primary user of your CDW? Some companies have dedicated DevOps teams, while others rely on their data scientists or business analysts.
From a pure user experience standpoint, Snowflake beats Redshift, hands down. As we mentioned in the introduction, Snowflake has streamlined the user experience the way Squarespace streamlined web design. It’s extremely simple to use.
But simplicity isn’t the most important factor for technical users. For DevOps teams and system architects, customization and feature sets are more highly valued. That’s where Redshift shines.
Company size and available resources
Amazon Redshift is typically preferred by companies with dedicated DevOps teams or highly technical end-users.
Redshift is a more powerful and feature-rich data warehouse, and it comes with cost advantages. However, it’s a more specialized tool that requires technical expertise to run and maintain.
Snowflake is designed as a turnkey solution, so even the most minimally technical users can use it. It’s the preferred choice for companies with limited technical resources. However, costs can add up quickly on Snowflake.
The Final Verdict: Snowflake or Amazon Redshift
From technical capabilities to end-user experience, we have looked at this debate from nearly every angle.
So what’s it going to be, Snowflake or Redshift?
The bottom line is that both Snowflake and Redshift are world-class CDWs. The final choice, my friends, is up to you. Only you can deliver the final verdict.
Published at DZone with permission of Ben Putano, DZone MVB. See the original article here.