Top 5 Reasons Presto Is the Foundation of the Data Analytics Stack

by Ashish Tadose · Oct. 26, 20 · Analysis

The need for data engineers and analysts to run interactive, ad hoc analytics on large amounts of data continues to grow rapidly. Data platform teams are increasingly using PrestoDB, a federated SQL query engine, to run such analytics in place across a wide range of data lakes and databases, without having to move the data. PrestoDB is hosted by the Presto Foundation under the Linux Foundation and is the same project that runs at massive scale at Facebook, Uber, and Twitter.

Let’s look at some important characteristics of Presto that account for its growing adoption.  

Easier Integration With the Existing Ecosystem

Presto was designed to integrate seamlessly with an existing data ecosystem without requiring any changes to the systems already running. It’s like turbocharging your existing stack with an additional, faster data access interface.

Presto provides an additional compute layer for faster analytics. It doesn’t store the data, which gives it the major advantage of being able to scale query resources up and down based on demand.

This separation of compute and storage makes the Presto query engine extremely well suited to cloud environments. Most cloud deployments already keep their data in object storage, which is decoupled from the compute layer, and auto-scale compute to optimize resource costs.
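
To make the compute/storage split concrete, here is a minimal Python sketch. It assumes a toy in-memory dict (`OBJECT_STORE`) standing in for cloud object storage and a thread pool standing in for query workers; every name is hypothetical and this is not how Presto is implemented. It only shows that stateless workers can be added or removed without touching the data or changing the answer.

```python
"""Toy model of compute/storage separation (hypothetical, not Presto's code).

Data lives only in a shared object store; workers are stateless and just
scan the splits they are assigned. Adding or removing workers changes how
the work is divided, never what data exists.
"""
from concurrent.futures import ThreadPoolExecutor

# Hypothetical object store: object key -> list of (region, amount) rows.
OBJECT_STORE = {
    "sales/part-0": [("eu", 10), ("us", 5)],
    "sales/part-1": [("us", 7), ("apac", 3)],
    "sales/part-2": [("eu", 2), ("apac", 8)],
}


def scan_split(key):
    """Stateless worker task: read one object, return a partial sum per region."""
    partial = {}
    for region, amount in OBJECT_STORE[key]:
        partial[region] = partial.get(region, 0) + amount
    return partial


def run_query(num_workers):
    """SUM(amount) GROUP BY region, spread over an arbitrary number of workers."""
    totals = {}
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        # Each split is scanned by whichever worker picks it up; results are merged here.
        for partial in pool.map(scan_split, sorted(OBJECT_STORE)):
            for region, amount in partial.items():
                totals[region] = totals.get(region, 0) + amount
    return totals


if __name__ == "__main__":
    # Scaling compute from 1 to 4 workers changes nothing about the data or the answer.
    print(run_query(num_workers=1))
    print(run_query(num_workers=4))
```

Both calls return `{'eu': 12, 'us': 12, 'apac': 11}`; the worker count only affects how the splits are shared out.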

Unified SQL Interface 

SQL is one of the oldest and by far the most widely used languages for data analysis. Analysts, data engineers, and data scientists use SQL to explore data, build dashboards, and test hypotheses, whether in notebooks such as Jupyter and Zeppelin or in BI tools such as Tableau, Power BI, and Looker.

Presto brings that familiar SQL interface to many kinds of data. As a federated query engine, it can query data not only in distributed file systems but also in other sources, such as NoSQL stores like Cassandra, search engines like Elasticsearch, relational databases (RDBMSs), and even message queues like Kafka.

Performance

The Facebook team developed Presto because Apache Hive was not suitable for interactive queries. Hive’s underlying architecture, which runs a query as a series of MapReduce or Tez jobs, works very well for large, complex jobs but is not designed for low-latency queries. The Hive project has since introduced in-memory caching with Hive LLAP (Live Long and Process); it works well for certain kinds of queries, but it also makes Hive more resource-intensive.

Similarly, Apache Spark works very well for large, complex jobs using in-memory computation. However, it is not as efficient as Presto for interactive BI queries.

Presto is built for high performance, with several key features and optimizations such as code generation, in-memory processing, and pipelined execution. Presto queries share a long-lived Java Virtual Machine (JVM) process on each worker node, which avoids the overhead of spawning a new JVM container for each unit of work.
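
Pipelined execution is the easiest of these optimizations to illustrate outside the engine. The following Python sketch is a toy, not Presto’s Java implementation: the hypothetical `staged` function materializes each step in full before the next begins, roughly like chained batch jobs, while the hypothetical `pipelined` function streams each row through every step as soon as it is read.

```python
"""Contrast stage-at-a-time execution with pipelined execution (illustrative only).

Stage-at-a-time: each operator finishes over the whole input and materializes
its output before the next operator starts.
Pipelined: rows stream through all operators, so the first results can be
produced after reading only a few input rows.
"""
from itertools import islice


def make_source(rows, counter):
    """Yield rows while counting how many have been read from 'storage'."""
    for row in rows:
        counter["read"] += 1
        yield row


def staged(rows, counter):
    # Each stage fully materializes a list before the next stage begins.
    scanned = list(make_source(rows, counter))
    filtered = [r for r in scanned if r % 2 == 0]
    projected = [r * 10 for r in filtered]
    return iter(projected)


def pipelined(rows, counter):
    # Generator expressions: each row flows through filter and projection immediately.
    scanned = make_source(rows, counter)
    filtered = (r for r in scanned if r % 2 == 0)
    return (r * 10 for r in filtered)


if __name__ == "__main__":
    data = range(100_000)
    for plan in (staged, pipelined):
        counter = {"read": 0}
        # Like a LIMIT 3 query: ask each plan for its first three results only.
        first_three = list(islice(plan(data, counter), 3))
        print(plan.__name__, first_three, "rows read:", counter["read"])
```

Both plans return `[0, 20, 40]`, but the staged plan reads all 100,000 input rows first, whereas the pipelined plan reads only 5 before its first three results are ready.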

Query Federation

Presto provides a single, unified SQL dialect that abstracts all supported data sources. This powerful feature means users do not need to know the connection details or SQL dialects of the underlying systems.
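
Presto addresses every table with a three-part name, `catalog.schema.table`, where each catalog is backed by a connector. The following Python sketch imitates that idea with two hypothetical in-memory connectors (`DictConnector` and `ListConnector`, standing in for, say, a relational database and a data lake) and joins rows across them through one name-resolution function. The catalog names, schemas, and data are invented for illustration; this is not Presto’s connector SPI.

```python
"""Toy federated query over two hypothetical connectors (not Presto's connector SPI)."""


class DictConnector:
    """Hypothetical connector backed by nested dicts, e.g. standing in for an RDBMS."""

    def __init__(self, schemas):
        self._schemas = schemas  # {schema: {table: [row_dict, ...]}}

    def scan(self, schema, table):
        yield from self._schemas[schema][table]


class ListConnector:
    """Hypothetical connector backed by column names plus tuples, e.g. files in a lake."""

    def __init__(self, tables):
        self._tables = tables  # {(schema, table): (columns, [tuple, ...])}

    def scan(self, schema, table):
        columns, rows = self._tables[(schema, table)]
        for values in rows:
            yield dict(zip(columns, values))  # Normalize to one row format.


# Hypothetical catalog registry: catalog name -> connector instance.
CATALOGS = {
    "mysql": DictConnector({"crm": {"customers": [
        {"id": 1, "name": "Ada"},
        {"id": 2, "name": "Linus"},
    ]}}),
    "hive": ListConnector({("web", "orders"): (
        ["customer_id", "total"],
        [(1, 30), (2, 12), (1, 8)],
    )}),
}


def scan(qualified_name):
    """Resolve 'catalog.schema.table' to a connector and stream its rows."""
    catalog, schema, table = qualified_name.split(".")
    return CATALOGS[catalog].scan(schema, table)


def join_sum(left, right, left_key, right_key, name_col, value_col):
    """Hash join plus SUM(value) GROUP BY name, wherever each side is stored."""
    build = {row[left_key]: row[name_col] for row in scan(left)}
    totals = {}
    for row in scan(right):
        name = build.get(row[right_key])
        if name is not None:
            totals[name] = totals.get(name, 0) + row[value_col]
    return totals
```

Calling `join_sum("mysql.crm.customers", "hive.web.orders", "id", "customer_id", "name", "total")` returns `{'Ada': 38, 'Linus': 12}`: the caller names both tables the same way and never deals with how either backend stores its rows.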

Design Suitable for Cloud 

Presto’s fundamental design of separating storage from compute makes it extremely convenient to operate in cloud environments. Because a Presto cluster doesn’t store any data, it can be auto-scaled with the load without any risk of data loss.
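
Because workers hold no data, an autoscaling decision only has to match compute to load. The following sketch shows one possible sizing rule; the `ClusterLoad` type, the `queries_per_worker` capacity figure, and the min/max bounds are all assumptions for illustration, not Presto settings or a built-in Presto feature.

```python
"""Hypothetical autoscaling rule for a stateless query cluster (not a Presto feature)."""
import math
from dataclasses import dataclass


@dataclass
class ClusterLoad:
    queued_queries: int
    running_queries: int


def desired_workers(load, queries_per_worker=4, min_workers=1, max_workers=20):
    """Size the cluster to the current load, clamped to [min_workers, max_workers].

    queries_per_worker is an assumed capacity figure, not a Presto setting.
    No data placement is considered: workers are stateless, so only load matters.
    """
    demand = load.queued_queries + load.running_queries
    needed = math.ceil(demand / queries_per_worker)
    return max(min_workers, min(max_workers, needed))
```

With 10 queued and 6 running queries, the rule asks for 4 workers; an idle cluster stays at the floor of 1, and a burst of 150 queries is capped at 20. In practice a worker should still be drained of running work before it is removed, but no data has to be moved.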

As you can see, Presto offers numerous advantages for interactive, ad hoc queries. No wonder data platform teams are increasingly using Presto as the de facto SQL query engine for running analytics across data sources in place, without moving data.

Published at DZone with permission of Ashish Tadose. See the original article here.
