SQL and Hadoop Query Performance Smackdown

See which SQL engine won this smackdown where MapReduce, Spark, LLAP, Tez, and Presto competed to see who performed fastest with SQL queries!

By Roni Fontaine · Updated Jul. 31, 17 · Analysis

LLAP Wins the Fastest Execution Among the SQL Engines!

Comcast is one of the nation’s leading providers of communications, entertainment, and cable products and services. Headquartered in Philadelphia, PA, the company has more than 100,000 employees nationwide whose goal is to deliver the highest level of service and improve the customer experience. Comcast decided to run what it calls its “Hadoop Query Performance Smackdown” for SQL engines.

The Comcast Big Data team uses an enterprise data lake with more than 1,000 daily active users, running on 70 racks, with petabytes of usable enterprise data available via Hive tables. Their use cases include ad hoc jobs, batch and streaming data, and reporting. They wanted to pick the SQL engine that would give them the best performance for the most practical use cases, so they ran tests against MapReduce, Spark, LLAP, Tez, and Presto. The goal was to pick a SQL engine to recommend to the CTO and the community.

They used a test methodology based on the TPC-DS queries defined in the Hive benchmark, run against each of the SQL engines. Each query was run one at a time so that it could use all of the cluster's resources. The team took care to tune and configure the engines. Furthermore, each query was run three times to make sure there were no anomalies, and the team calculated an average run time from the three rounds. Note how the tests run against LLAP have much faster execution times than the other engines. LLAP had been described as being best optimized only for ORC, but the Comcast team found that it achieved much better performance across the board. LLAP was by far the fastest engine, ahead of Tez and Presto. SparkSQL did not manage to complete the benchmark successfully.

[Figure: SQL and Hadoop Query Performance Smackdown]
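
The averaging step in the methodology above (three runs per query, one mean per engine) can be sketched roughly as follows. This is only an illustration, not Comcast's tooling: the function names, the engine and query labels, and the run times are all hypothetical placeholders, and treating a query with a missing run as "did not complete" is an assumption made for the sketch.

```python
from statistics import mean


def average_runtimes(runs):
    """Return the mean run time per engine and query.

    `runs` maps engine name -> query name -> list of run times in seconds.
    Assumption: a run that failed is recorded as None, and a query with
    any failed run is reported as None (did not complete).
    """
    averages = {}
    for engine, queries in runs.items():
        averages[engine] = {}
        for query, times in queries.items():
            if any(t is None for t in times):
                averages[engine][query] = None
            else:
                averages[engine][query] = mean(times)
    return averages


def fastest_engine(averages, query):
    """Return the engine with the lowest average for `query`, skipping failures."""
    completed = {
        engine: per_query[query]
        for engine, per_query in averages.items()
        if per_query.get(query) is not None
    }
    return min(completed, key=completed.get) if completed else None


# Placeholder numbers for illustration only; these are NOT Comcast's measurements.
sample_runs = {
    "engine_a": {"q1": [10.0, 12.0, 11.0]},
    "engine_b": {"q1": [30.0, 33.0, 30.0]},
    "engine_c": {"q1": [20.0, None, 21.0]},  # one run failed
}

averages = average_runtimes(sample_runs)
winner = fastest_engine(averages, "q1")
```

Averaging over three rounds smooths out one-off anomalies such as a cold cache or a noisy node, which is why a single run per query would be a weaker basis for comparing engines.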

To learn more, read the Datanami article.

In addition, watch the Comcast YouTube video of the session from the Hortonworks DataWorks Summit on June 14, 2017, in San Jose, CA, to learn how you can use these results to help guide your company’s big data initiative toward supporting interactive queries.


Published at DZone with permission of Roni Fontaine, DZone MVB. See the original article here.

Opinions expressed by DZone contributors are their own.
