Amazon Redshift Spectrum vs. Amazon Athena

When it comes to Amazon Redshift Spectrum and Amazon Athena, which serverless cloud database is right for your use case? Here are four questions to ask that will help you decide.

Over the past year, AWS announced two serverless database technologies: Amazon Redshift Spectrum and Amazon Athena. With both services claiming to run queries on unstructured data stored on Amazon S3 without having to load or transform it, and both offering similar pricing, it wasn't very clear how they differ or which to choose.

How is Amazon Redshift Spectrum different from Amazon Athena? Most of the discussion of this question is centered on the technical differences. Rather than looking at this question from a technical perspective, I thought exploring it as a buying question might be useful.

So how do you decide if using Amazon Redshift Spectrum or Amazon Athena makes sense? Here are four questions you can ask yourself to help get a sense of which is best for your case.

1. Am I an Existing Redshift Customer? Yes!

If you are already a Redshift customer, then using Redshift Spectrum can help you balance the need for adding capacity to the system. This can save you money, since you can lifecycle data out of Redshift to S3. For example, say you have a 100 GB transactional table of infrequently accessed data. Why pay to store that in Redshift when moving it to S3 and querying it with Spectrum is an option? Be advised that you are still paying “per query” via Spectrum, the same as you would be charged in Athena. The benefit of this approach is offloading data so you can be more efficient with local storage in Redshift.

As an existing Redshift user, I would be less inclined to use Athena because of my existing investment in Redshift and any ancillary data operations that process data into it.

2. Am I an Existing Redshift Customer? No!

If you are not a Redshift customer, then it becomes more interesting. Assuming you have objects on S3 that Athena can consume, then you might start with Athena, rather than spinning up Redshift. Remember that access to Spectrum requires an active, running Redshift instance: Redshift Spectrum is not an option without Redshift. Access to the “Redshift+Redshift Spectrum” tandem has costs that might not be worthwhile (right now). Athena might make more sense given that fact.

3. Do My Analytic Tools Support Amazon Athena?

It might be the case that your analytic tool of choice does not support Athena, but does support Redshift. For example, Tableau 10.3 officially released support for Athena. Looker also released support for Athena. However, there are many tools that don’t support Athena. The flip side is they also don’t support Spectrum. So this gets back to the first point (1) around what your current stack includes. If you go down the Athena path, your tool choices are currently more limited than with Redshift. This could be a deal breaker for some. The Redshift path gives you more analytics options at the moment.

4. Can I Use Both Amazon Athena and Redshift Spectrum? Yes!

If Athena or Spectrum are candidates for your workflows, then you are likely structuring your data in a manner that could support either tool. Why? Athena and Spectrum can both access the same object on S3. I can query a 1 TB Parquet file on S3 in Athena the same as in Spectrum. If your data is optimized on S3 in the Apache Parquet format, then you are well positioned for Athena AND Spectrum, and switching contexts between the two becomes somewhat trivial. What becomes critical is cost and performance considerations related to the file format you employ. Uncompressed text (e.g., CSV) is perfect if you have cash to burn. The use cases for using both might be limited, but they are not mutually exclusive choices if the need arises.
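To make the file-format point concrete, here is a rough back-of-the-envelope sketch of how much a single query might cost against CSV versus Parquet. Everything numeric here is an assumption for illustration: the price of 5 USD per terabyte scanned, the binary definition of a terabyte, the column counts, and the compression ratio are not from this article, so check current AWS pricing and measure your own data. The function names are hypothetical.

```python
# Assumed price per terabyte scanned (USD). Verify against current AWS pricing.
PRICE_PER_TB_USD = 5.0
# Assumed definition of a terabyte for billing purposes.
BYTES_PER_TB = 1024 ** 4


def scan_cost_usd(bytes_scanned, price_per_tb=PRICE_PER_TB_USD):
    """Cost of one query that scans the given number of bytes."""
    return bytes_scanned / BYTES_PER_TB * price_per_tb


def parquet_bytes_scanned(raw_bytes, columns_read, total_columns, compression_ratio):
    """Very rough estimate of bytes scanned from a columnar, compressed file.

    Assumes all columns are the same size and compress equally well,
    which real tables rarely do. Treat the result as an order of magnitude.
    """
    column_fraction = columns_read / total_columns
    return raw_bytes * column_fraction / compression_ratio


if __name__ == "__main__":
    raw = BYTES_PER_TB  # 1 TB of uncompressed text
    csv_cost = scan_cost_usd(raw)  # a CSV query reads every byte
    # Hypothetical query touching 4 of 40 columns, with 3x compression.
    parquet_cost = scan_cost_usd(parquet_bytes_scanned(raw, 4, 40, 3.0))
    print(f"CSV: ${csv_cost:.2f} per query, Parquet: ${parquet_cost:.2f} per query")
```

Because both services charge by data scanned, the same estimate applies whichever of the two you query with.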

This post goes into the Apache Parquet topic in more detail: How to Be a Hero with Powerful Parquet, Google, and Amazon

Athena Now Has a REST API

Originally, Athena was JDBC-centric. However, Amazon recently released a REST API for Athena: Amazon Athena adds API/CLI, AWS SDK support, and audit logging with AWS CloudTrail.

This might open some interesting new use cases for Athena. I can see people building custom UI wrappers with D3.js and similar frameworks.
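As a minimal sketch of what calling Athena through the AWS SDK might look like, the function below submits a query, polls until it finishes, and returns the first page of results. It assumes a boto3 Athena client is passed in; the function name, database name, and S3 output location are hypothetical placeholders, and error handling and result pagination are left out.

```python
import time


def run_athena_query(client, sql, database, output_location, poll_seconds=1.0):
    """Run one query through an Athena client and return rows as lists of strings.

    `client` is expected to behave like boto3.client("athena").
    Only the first page of results is returned.
    """
    started = client.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_location},
    )
    query_id = started["QueryExecutionId"]

    # Athena runs queries asynchronously, so poll until a terminal state.
    while True:
        execution = client.get_query_execution(QueryExecutionId=query_id)
        state = execution["QueryExecution"]["Status"]["State"]
        if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
            break
        time.sleep(poll_seconds)

    if state != "SUCCEEDED":
        raise RuntimeError(f"Athena query {query_id} ended in state {state}")

    results = client.get_query_results(QueryExecutionId=query_id)
    rows = results["ResultSet"]["Rows"]
    # Each cell is a dict; empty cells may lack "VarCharValue".
    return [[cell.get("VarCharValue") for cell in row["Data"]] for row in rows]
```

With boto3 installed and AWS credentials configured, usage might look like `run_athena_query(boto3.client("athena"), "SELECT 1", "default", "s3://your-bucket/athena-results/")`, where the bucket path is a placeholder. For SELECT statements the first returned row typically holds the column names. A custom UI wrapper would sit on top of a call like this.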

Summary

Amazon is continuing to scale its product offerings at an accelerated rate. It can be difficult to keep up from product, technical, and business value perspectives. Amazon Redshift Spectrum and Amazon Athena are evolutions of the AWS solution stack. Having the capability to leverage this type of query service provides new flexibility to teams to tailor their data workflows to fit their needs.

Topics:
aws ,redshift spectrum ,athena ,serverless ,cloud

Published at DZone with permission of

Opinions expressed by DZone contributors are their own.
