Hands-On Presto Tutorial: Presto 101

This tutorial will guide you on installing and configuring Presto locally.

By Praburam Upendran and Dipti Borkar · Jul. 13, 21 · Tutorial

In this blog we'll show you how to get started with Presto, the open source SQL query engine for the data lake. By the end you'll be able to run Presto locally on your machine.

Presto Installation

Presto can be installed manually or from Docker images, on either of the following:

  • Single node: the coordinator and the worker run on the same machine.
  • Multiple nodes: the coordinator and workers run on separate machines, depending on the workload requirements.

Manually Installing Presto

Download the Presto server tarball, presto-server-0.235.1.tar.gz, and unpack it. The tarball contains a single top-level directory, presto-server-0.235.1, which we will call the installation directory.

Run the commands below to download the official presto-server tarball and the presto-cli executable JAR. The URLs point to Maven Central:

[root@prestodb_c01 ~]# curl -O https://repo1.maven.org/maven2/com/facebook/presto/presto-server/0.235.1/presto-server-0.235.1.tar.gz
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  721M  100  721M    0     0  72.9M      0  0:00:09  0:00:09 --:--:--  111M
[root@prestodb_c01 ~]# curl -O https://repo1.maven.org/maven2/com/facebook/presto/presto-cli/0.235.1/presto-cli-0.235.1-executable.jar
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 12.7M  100 12.7M    0     0  21.9M      0 --:--:-- --:--:-- --:--:-- 21.9M
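The downloads still have to be unpacked before the configuration steps can be applied. The following is a minimal sketch of that step, not an official installer: the function name install_presto_files and its arguments are hypothetical, and it assumes both downloaded files sit in the directory passed as the first argument.

```shell
# Hypothetical helper: unpack the server tarball and prepare the CLI in a
# directory that already holds both downloaded files.
install_presto_files() {
  dir=$1
  version=${2:-0.235.1}   # assumed default: the version used in the curl commands
  (
    cd "$dir" || exit 1
    # Unpacking creates the installation directory presto-server-<version>/
    tar -xzf "presto-server-${version}.tar.gz" || exit 1
    # The CLI is a self-executing JAR: copy it to a short name and mark it executable
    cp "presto-cli-${version}-executable.jar" presto || exit 1
    chmod +x presto
  )
}
```

For example, install_presto_files "$HOME" would leave $HOME/presto-server-0.235.1 and an executable $HOME/presto, provided both files were downloaded to the home directory.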

Data Directory

Presto needs a data directory for storing logs, etc. We recommend creating a data directory outside of the installation directory, which allows it to be easily preserved when upgrading Presto.

[root@prestodb_c01 ~]# mkdir -p /var/presto/data

Configuration Settings

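Create an etc directory inside the installation directory. Run this and the following commands from the installation directory so that relative paths such as etc/ resolve correctly. The etc directory will hold the following configuration: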

  • Node Properties: environmental configuration specific to each node
  • JVM Config: command-line options for the Java Virtual Machine
  • Config Properties: configuration for the Presto server
  • Catalog Properties: configuration for Connectors (data sources)

[root@prestodb_c01 ~]# mkdir etc

Node Properties

The node properties file, etc/node.properties contains configuration specific to each node. A node is a single installed instance of Presto on a machine. This file is typically created by the deployment system when Presto is first installed. The following is a minimal etc/node.properties:

[root@prestodb_c01 ~]# cat etc/node.properties
node.environment=production
node.id=ffffffff-ffff-ffff-ffff-ffffffffffff
node.data-dir=/var/presto/data

The above properties are described below:

  • node.environment: The name of the environment. All Presto nodes in a cluster must have the same environment name.
  • node.id: The unique identifier for this installation of Presto. This must be unique for every node. This identifier should remain consistent across reboots or upgrades of Presto. If running multiple installations of Presto on a single machine (i.e. multiple nodes on the same machine), each installation must have a unique identifier.
  • node.data-dir: The location (filesystem path) of the data directory. Presto will store logs and other data here.
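Because node.id must be unique per node yet stay the same across restarts, one option is to generate it once and never overwrite it. The sketch below illustrates that idea under stated assumptions: the function name write_node_properties is hypothetical, and it relies on either the uuidgen command or the Linux /proc/sys/kernel/random/uuid file being available.

```shell
# Hypothetical helper: write etc/node.properties under an installation
# directory, generating node.id only when the file does not exist yet.
write_node_properties() {
  install_dir=$1
  data_dir=${2:-/var/presto/data}
  file="$install_dir/etc/node.properties"

  # Keep an existing file so node.id stays stable across restarts
  if [ -f "$file" ]; then
    return 0
  fi

  # uuidgen is common but not universal; fall back to the Linux kernel's generator
  if command -v uuidgen >/dev/null 2>&1; then
    node_id=$(uuidgen)
  else
    node_id=$(cat /proc/sys/kernel/random/uuid)
  fi

  mkdir -p "$install_dir/etc"
  cat > "$file" <<EOF
node.environment=production
node.id=$node_id
node.data-dir=$data_dir
EOF
}
```

Calling write_node_properties with the installation directory a second time leaves the file untouched, which is the point of the early return.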

JVM configuration

The JVM config file, etc/jvm.config, contains a list of command-line options used for launching the Java Virtual Machine. The format of the file is a list of options, one per line. These options are not interpreted by the shell, so options containing spaces or other special characters should not be quoted.

The following provides a good starting point for creating etc/jvm.config:

[root@prestodb_c01 ~]# cat etc/jvm.config
-server
-Xmx16G
-XX:+UseG1GC
-XX:G1HeapRegionSize=32M
-XX:+UseGCOverheadLimit
-XX:+ExplicitGCInvokesConcurrent
-XX:+HeapDumpOnOutOfMemoryError
-XX:+ExitOnOutOfMemoryError

Because an OutOfMemoryError will typically leave the JVM in an inconsistent state, we write a heap dump (for debugging) and forcibly terminate the process when this occurs.

Config Properties

The config properties file, etc/config.properties, contains the configuration for the Presto server. Every Presto server can function as both a coordinator and a worker, but dedicating a single machine to only perform coordination work provides the best performance on larger clusters.

To set up a single machine for testing that functions as both a coordinator and a worker, set coordinator and node-scheduler.include-coordinator to true in etc/config.properties:

[root@singlenode01 ~]# cat etc/config.properties
coordinator=true
node-scheduler.include-coordinator=true
http-server.http.port=8080
query.max-memory=50GB
query.max-memory-per-node=1GB
query.max-total-memory-per-node=2GB
discovery-server.enabled=true
discovery.uri=http://example.net:8080

These properties are described below:

  • coordinator: Allow this Presto instance to function as a coordinator (accept queries from clients and manage query execution).
  • node-scheduler.include-coordinator: Allow scheduling work on the coordinator. 
  • http-server.http.port: Specifies the port for the HTTP server. Presto uses HTTP for all communication, internal and external.
  • query.max-memory: The maximum amount of distributed memory that a query may use.
  • query.max-memory-per-node: The maximum amount of user memory that a query may use on any one machine.
  • query.max-total-memory-per-node: The maximum amount of user and system memory that a query may use on any one machine, where system memory is the memory used during execution by readers, writers, and network buffers, etc.
  • discovery-server.enabled: Presto uses the Discovery service to find all the nodes in the cluster. Every Presto instance will register itself with the Discovery service on startup. In order to simplify deployment and avoid running an additional service, the Presto coordinator can run an embedded version of the Discovery service. It shares the HTTP server with Presto and thus uses the same port.
  • discovery.uri: The URI to the Discovery server. Because we have enabled the embedded version of Discovery in the Presto coordinator, this should be the URI of the Presto coordinator. Replace example.net:8080 to match the host and port of the Presto coordinator. This URI must not end in a slash.
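For a local single-node setup, the HTTP port and discovery.uri have to agree, and the URI must not end in a slash. The following sketch builds both from the same host and port so they cannot drift apart. The function name write_single_node_config and the localhost default are assumptions for illustration; the memory limits are simply the values from the listing above and may need lowering on a small machine.

```shell
# Hypothetical helper: write a single-node etc/config.properties.
write_single_node_config() {
  install_dir=$1
  host=${2:-localhost}   # assumed default for a local test setup
  port=${3:-8080}

  mkdir -p "$install_dir/etc"
  # discovery.uri is built from host and port, so it matches the HTTP port
  # and never ends in a slash
  cat > "$install_dir/etc/config.properties" <<EOF
coordinator=true
node-scheduler.include-coordinator=true
http-server.http.port=$port
query.max-memory=50GB
query.max-memory-per-node=1GB
query.max-total-memory-per-node=2GB
discovery-server.enabled=true
discovery.uri=http://$host:$port
EOF
}
```

Run from the installation directory, write_single_node_config . would produce a configuration that points the embedded Discovery service at http://localhost:8080.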

You may also wish to set the following properties:

  • jmx.rmiregistry.port: Specifies the port for the JMX RMI registry. JMX clients should connect to this port.
  • jmx.rmiserver.port: Specifies the port for the JMX RMI server. Presto exports many metrics that are useful for monitoring via JMX.

Log Levels

The optional log levels file, etc/log.properties allows setting the minimum log level for named logger hierarchies. Every logger has a name, which is typically the fully qualified name of the class that uses the logger. 

[root@coordinator01 ~]# cat  etc/log.properties
com.facebook.presto=INFO

There are four levels: DEBUG, INFO, WARN and ERROR.

Catalog Properties

Presto accesses data via connectors, which are mounted in catalogs. The connector provides all of the schemas and tables inside of the catalog. 

Catalogs are registered by creating a catalog properties file in the etc/catalog directory. For example, create etc/catalog/jmx.properties with the following contents to mount the jmx connector as the jmx catalog:

[root@coordinator01 ~]# mkdir etc/catalog
[root@coordinator01 ~]# echo "connector.name=jmx" >> etc/catalog/jmx.properties
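Note that >> appends, so running the echo command twice leaves two connector.name lines in the file. A small sketch of a repeatable alternative follows. The function name add_catalog is hypothetical, and because it overwrites the file it only suits connectors, such as jmx here, that need no properties beyond connector.name.

```shell
# Hypothetical helper: register a catalog by writing etc/catalog/<name>.properties.
add_catalog() {
  install_dir=$1
  catalog_name=$2
  connector_name=$3

  mkdir -p "$install_dir/etc/catalog"
  # '>' overwrites, so re-running does not append a duplicate connector.name line
  printf 'connector.name=%s\n' "$connector_name" \
    > "$install_dir/etc/catalog/$catalog_name.properties"
}
```

From the installation directory, add_catalog . jmx jmx writes the same single line as the echo command above.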

Running Presto

The installation directory contains the launcher script in bin/launcher. Presto can be started as a daemon by running the following:

[root@hsrhvm01 presto-server-0.235.1]# bin/launcher start
Started as 23378

After launching, you can find the log files in var/log:

  • launcher.log: This log is created by the launcher and is connected to the stdout and stderr streams of the server. It will contain a few log messages that occur while the server logging is being initialized and any errors or diagnostics produced by the JVM.
  • server.log: This is the main log file used by Presto. It will typically contain the relevant information if the server fails during initialization. It is automatically rotated and compressed.
  • http-request.log: This is the HTTP request log which contains every HTTP request received by the server. It is automatically rotated and compressed.
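
The launcher returns as soon as the daemon is started, before the server is ready to accept queries. The sketch below polls the server over HTTP until it answers. It is an illustration under assumptions: the function name wait_for_presto is hypothetical, it requires curl, and it treats any successful response from the /v1/info endpoint as "up" without inspecting the response body.

```shell
# Hypothetical helper: poll the coordinator until it answers HTTP requests.
wait_for_presto() {
  base_uri=${1:-http://localhost:8080}
  max_attempts=${2:-30}
  delay_seconds=${3:-2}

  attempt=1
  while [ "$attempt" -le "$max_attempts" ]; do
    # -f makes curl fail on HTTP error statuses; -s silences the progress meter
    if curl -sf --max-time 5 "$base_uri/v1/info" >/dev/null 2>&1; then
      return 0
    fi
    attempt=$((attempt + 1))
    sleep "$delay_seconds"
  done

  echo "Presto did not respond at $base_uri; check var/log/server.log" >&2
  return 1
}
```

A possible sequence from the installation directory is bin/launcher start followed by wait_for_presto. Once it succeeds, the CLI can be pointed at the server, for example with --server localhost:8080 --catalog jmx --schema current, assuming the jmx catalog was configured as described earlier.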

Published at DZone with permission of Praburam Upendran. See the original article here.

Opinions expressed by DZone contributors are their own.
