DZone
Thanks for visiting DZone today,
Edit Profile
  • Manage Email Subscriptions
  • How to Post to DZone
  • Article Submission Guidelines
Sign Out View Profile
  • Post an Article
  • Manage My Drafts
Over 2 million developers have joined DZone.
Log In / Join
Refcards Trend Reports Events Over 2 million developers have joined DZone. Join Today! Thanks for visiting DZone today,
Edit Profile Manage Email Subscriptions Moderation Admin Console How to Post to DZone Article Submission Guidelines
View Profile
Sign Out
Refcards
Trend Reports
Events
Zones
Culture and Methodologies Agile Career Development Methodologies Team Management
Data Engineering AI/ML Big Data Data Databases IoT
Software Design and Architecture Cloud Architecture Containers Integration Microservices Performance Security
Coding Frameworks Java JavaScript Languages Tools
Testing, Deployment, and Maintenance Deployment DevOps and CI/CD Maintenance Monitoring and Observability Testing, Tools, and Frameworks
Culture and Methodologies
Agile Career Development Methodologies Team Management
Data Engineering
AI/ML Big Data Data Databases IoT
Software Design and Architecture
Cloud Architecture Containers Integration Microservices Performance Security
Coding
Frameworks Java JavaScript Languages Tools
Testing, Deployment, and Maintenance
Deployment DevOps and CI/CD Maintenance Monitoring and Observability Testing, Tools, and Frameworks
  1. DZone
  2. Data Engineering
  3. Databases
  4. BigQuery vs. Athena: User Experience, Cost, and Performance

BigQuery vs. Athena: User Experience, Cost, and Performance

Are you wondering whether to utilize Google BigQuery or AWS Athena services? Read this article to get a head start using these services, identify their differences and pick the best for your use case.

Asaf Yigal user avatar by
Asaf Yigal
·
Jan. 27, 17 · Opinion
Like (0)
Save
Tweet
Share
5.64K Views

Join the DZone community and get the full member experience.

Join For Free

the trend of moving to serverless is going strong, and both google bigquery and aws athena are proof of that. both platforms aim to solve many of the same challenges such as managing and querying large data repositories.

announced in 2012 , google describes bigquery as a “fully managed, petabyte scale, low-cost analytics data warehouse.” you can load your data from google cloud storage or google cloud datastore or stream it from outside the cloud and use bigquery to run real-time analysis of your data.

in comparison, amazon athena , which was released only recently at the 2016 aws re:invent conference, is described as an “interactive querying service” that makes it easy to analyze data stored on amazon s3 using standard sql-run ad-hoc queries.

in this post, we will show you how you can use both tools and discuss the differences between them such as cost and performance.

user experience

executing queries using bigquery

bigquery can be used via its web ui or sdk . in this section, we will briefly explain the terms used in the bigquery service and discuss how you can quickly load data and execute queries.

before you begin, you will need to go on the google cloud platform to create the project.

project page on google cloud

after creating a new project (bigquery api is enabled by default for new projects), you can go to the bigquery page page.

bigquery allows querying tables that are native (in google cloud) or external (outside) as well as logical views . users can load data into bigquery storage using batch loads or via stream and define the jobs to load, export, query, or copy data. the data formats that can be loaded into bigquery are csv, json, avro, and cloud datastore backups.

bigquery welcome page

in our demo, we used a simple public dataset and general data that can be used by anyone such as that of major league baseball (nice!).

we can click the compose query red button to enter the query we want to execute against the desired table. in our case, we looked to display 5,000 rows from the wikipedia public dataset bigquery-public-data:samples .

we ran the query:

select * from [bigquery-public-data:samples.wikipedia] limit 5000;

and got the results in the table as shown below.

note that bigquery returned the results in 2.3 seconds, scanning over 35.7 gb of data. different types of aggregations can be executed, for example, to sum the number of characters to return the lengths of articles.

the list of 5,000 rows, from the wikipedia dataset

the aggregation shown below was completed in 2 seconds, scanning over 2.34 gb of data.

the sum of the number of characters column in the wikipedia table

in addition to the web ui and command line, users can also connect to bigquery using odbc and jdbc drivers to enable it to be used locally with popular sql tools such as sql workbench.

executing queries using amazon athena

aws athena is based on the hive metastore and presto , where the athena syntax is comprised of ansi sql for queries and relational operations such as select and join as well as hive ql dll statements for altering the metadata such as create or alter .

like bigquery, athena supports access using jdbc drivers, where tools like sql workbench can be used to query amazon s3. the data formats that can be loaded in s3 and used by athena are csv, tsv, parquet serde, orc, json, apache web server logs, and customer delimiters. compressed formats like snappy, zlib, and gzip can also be loaded.

amazon athena’s web ui is similar to bigquery when it comes to defining the dataset and tables. through the getting started with athena page, you can start using sample data and learn how the interactive querying tool works.

as shown below, you can access athena using the aws management console. in our case, we chose to query elb logs:

aws athena home page

let’s try a few queries to see how quickly the results are returned. the queries and results are displayed below the query editor window.

the results from 5,000 rows of the elb logs – 4.73 seconds

the sum by request processing time column – 8.78 seconds

athena vs. bigquery

in the following sections, we will provide an in-depth comparison of these two tools.

data sources

as mentioned above, bigquery supports native tables. these are optimized for reading data because they are backed by bigquery storage , which automatically structures, compresses, encrypts, and protects the data. in addition, bigquery can also run on external storage. in comparison, athena only supports amazon s3, which means that a query can be executed only on files stored in an s3 bucket.

price

the price models for both solutions are the same. users pay for the s3 storage and the queries that are executed using athena. aws athena is paid per query, where $5 is invoiced for every tb of data that is scanned. check amazon’s athena pricing page to learn more and see several examples.

google also charges by the amount of data scanned, and the price is the same as for athena. the storage is $0.02 per gb, which is more or less the same in aws (the price tiers depends on the overall amount stored). all other operations such as loading data, export, copy or metadata are free.

performance

when it comes to speed, the native tables in bigquery show good performance. below, we examined another public data set called bigquery-public-data.github_repos.licenses . we loaded the same data set to an s3 bucket and executed the following sql statement, counting the number of licenses in the table (grouped by the license number):

select license, 
count(*) as licenses 
from [bigquery-public-data.github_repos.licenses] 
group by license 
order by licenses 
desc

bigquery result for counting the licenses – 1.7 seconds

we checked the same results in athena. the results are shown below:

athena result for counting licenses – 7.05 seconds

we see that using bigquery shows better performance than aws athena, but obviously that will not always be the case.

it’s important to note that amazon athena supports data partition by any key (unlike bigquery, which supports date only). with athena, you can also restrict the amount of scanned data by each query — which leads to improved performance and reduced costs.

udf support

the next way in which bigquery and athena differ are in the user defined functions (udf). in bigquery, this is a javascript function that can be called as part of a query, an action that provides powerful functionality where mixing sql and the code is possible. one example of this is implementing custom converters that don’t exist, such as the url decode function .

for data engineers, udf is a powerful tool that athena currently does not support, and the only way to add it is to contact their team at at athena-feedback (at) amazon.com. but as we know from amazon’s release cadence, udf will be introduced soon.

summary

so, what can we expect from athena and bigquery going forward? at this stage, athena offers a high-level service for querying data already stored in s3. if you’re already an aws services user and you have data to analyze, just create a table point on s3 and you’re ready to query. what would be nice to have from athena is additional operations in a table such as appending results of a query to an existing table.

overall, athena as a new product has potential, and it’s worth waiting to see what it will offer in the near future. as far as bigquery, although some features are missing such as partitioning with any column in a table, the solution is mature and feature-rich, and offers users a good and robust data warehouse solution.

Relational database Database Data (computing) User experience

Published at DZone with permission of Asaf Yigal, DZone MVB. See the original article here.

Opinions expressed by DZone contributors are their own.

Popular on DZone

  • Required Knowledge To Pass AWS Certified Data Analytics Specialty Exam
  • Explainer: Building High Performing Data Product Platform
  • Simulating and Troubleshooting StackOverflowError in Kotlin
  • How to Perform Accessibility Testing of Websites and Web Apps

Comments

Partner Resources

X

ABOUT US

  • About DZone
  • Send feedback
  • Careers
  • Sitemap

ADVERTISE

  • Advertise with DZone

CONTRIBUTE ON DZONE

  • Article Submission Guidelines
  • Become a Contributor
  • Visit the Writers' Zone

LEGAL

  • Terms of Service
  • Privacy Policy

CONTACT US

  • 600 Park Offices Drive
  • Suite 300
  • Durham, NC 27709
  • support@dzone.com
  • +1 (919) 678-0300

Let's be friends: