Installing and Running Presto
Learn how to configure and run Presto, an open-source distributed SQL query engine that helps with running interactive analytic queries.
In my previous blog, I talked about getting introduced to Presto. In today's blog, I'll be talking about installing and running Presto.
The basic prerequisites for setting up Presto are:
- Linux or Mac OS X.
- Java 8, 64-bit.
- Python 2.4+.
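Before continuing, you may want to confirm that the Java and Python prerequisites are in place. A minimal check, assuming java and python are already on your PATH:
java -version
python --version
The first command should report a 64-bit Java 1.8 runtime, and the second should report Python 2.4 or later.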
Installation
- Download the Presto server tarball from the Presto website (the download and unpack steps are sketched after this list).
- Unpack the tarball.
- After unpacking, you will see a directory presto-server-0.175, which we will call the installation directory.
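On a Linux shell, these steps might look roughly like the following. The download URL is a placeholder; substitute the actual link for the presto-server-0.175 tarball:
wget <presto-server-0.175 tarball URL>
tar -xzf presto-server-0.175.tar.gz
cd presto-server-0.175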
Configuring
Inside the installation directory, create a directory called etc. This directory will hold the following configurations (a sketch of the finished layout follows this list):
- Node properties: Environmental configuration specific to each node.
- JVM config: Command line options for the Java Virtual Machine.
- Config properties: Configuration for the Presto server.
- Catalog properties: Configuration for connectors (data sources).
- Log properties: Configuring the log levels.
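By the end of this section, the etc directory should look roughly like the sketch below (jmx.properties is just the example catalog used later in this post):
etc/
├── node.properties
├── jvm.config
├── config.properties
├── log.properties
└── catalog/
    └── jmx.properties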
Now, we will set up the above properties one by one.
1. Setting Up Node Properties
Create a file called node.properties inside the etc folder. This file will contain the configuration specific to each node. Given below is a description of the properties we need to set in this file:
- node.environment: The name of the Presto environment. All the nodes in the cluster must have an identical environment name.
- node.id: The unique identifier for every node.
- node.data-dir: The path of the data directory.
Note: Presto will store the logs and other data at the location specified in node.data-dir. It is recommended to create a data directory external to the installation directory, as this allows easy preservation during upgrades.
You can put the following default content:
node.environment=production
node.id=ffffffff-ffff-ffff-ffff-ffffffffffff
node.data-dir=/var/presto/data
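Because node.data-dir in the example above points outside the installation directory, that directory must exist before the server starts. A minimal sketch, assuming Presto will run as a (hypothetical) presto user:
sudo mkdir -p /var/presto/data
sudo chown -R presto:presto /var/presto/data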
2. Setting Up JVM Config
Create a file named jvm.config inside the etc folder. In this file, we specify the command line options used to launch the JVM.
You can put the following default content:
-server
-Xmx16G
-XX:+UseG1GC
-XX:G1HeapRegionSize=32M
-XX:+UseGCOverheadLimit
-XX:+ExplicitGCInvokesConcurrent
-XX:+HeapDumpOnOutOfMemoryError
-XX:+ExitOnOutOfMemoryError
Note: Keep in mind that the file format requires exactly one option per line.
3. Setting Up Config Properties
Create a file named config.properties in the etc folder. This file contains the configuration related to the server. A Presto server can double up as both coordinator and worker simultaneously. Before setting up the config file, let's discuss the properties in brief:
- coordinator: If set to true, the node acts as a coordinator, accepting queries from clients and managing query execution. On worker-only nodes, this value is set to false.
- node-scheduler.include-coordinator: Enables scheduling work on the coordinator. Can be set to true/false.
- http-server.http.port: Specifies the port on which the Presto server listens.
- query.max-memory: Specifies the maximum amount of memory that a query will be allowed to use across the cluster.
- query.max-memory-per-node: Specifies the maximum amount of memory that a query will be allowed to use on a single node.
- discovery-server.enabled: Can be set to true/false. The discovery service is used to find all the nodes in the cluster. If set to true, the coordinator runs an embedded version of the discovery service.
- discovery.uri: The URI of the discovery server.
- query.queue-config-file: The path of the file from which queue configurations are read.
Now, let's set the properties in config.properties.
If the node is a coordinator, you can use the following as default content:
coordinator=true
node-scheduler.include-coordinator=false
http-server.http.port=8080
query.max-memory=50GB
query.max-memory-per-node=1GB
discovery-server.enabled=true
discovery.uri=http://example.net:8080
If the node is a worker, you can use the following as default content:
coordinator=false
http-server.http.port=8080
query.max-memory=50GB
query.max-memory-per-node=1GB
discovery.uri=http://example.net:8080
For a single node doubling up as both worker and coordinator, you can use the following as default content:
coordinator=true
node-scheduler.include-coordinator=true
http-server.http.port=8080
query.max-memory=5GB
query.max-memory-per-node=1GB
discovery-server.enabled=true
discovery.uri=http://example.net:8080
4. Setting Up Log Level
Create a file called log.properties in the etc folder. It will be used to set the minimum log level. The only property you need to set in this file is com.facebook.presto=INFO. This property can have the following values: DEBUG, INFO, WARN, and ERROR.
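Following the pattern of the other configuration files, the entire log.properties file would therefore contain just this single line:
com.facebook.presto=INFO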
5. Setting Up the Catalog
Presto accesses data via connectors, which are specified by means of catalogs. Catalogs are registered by creating a catalog properties file for each connector. Create a directory called catalog in etc. Inside the etc/catalog directory, create a catalog. For instance, create a catalog for JMX.
Create jmx.properties in etc/catalog/ and set the name of the connector, like connector.name=jmx.
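For example, a minimal etc/catalog/jmx.properties contains only the connector name:
connector.name=jmx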
Once you have completed these steps, we can begin with running Presto.
Running Presto
Inside the Presto installation directory, we have a launcher script. Presto can be run either as a daemon or as a foreground process. The main difference between the two is that in foreground mode, the server's logs and other output are written to stdout/stderr.
To run Presto as a daemon, use bin/launcher start. To run it in the foreground, use bin/launcher run.
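A typical daemon session from the installation directory might look like the sketch below; the launcher also supports status and stop subcommands for managing the daemon:
bin/launcher start
bin/launcher status
bin/launcher stop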
Once you run either of the above start commands, you will be able to see the Presto server running on localhost:8080 (the default port), or on whichever port you configured via http-server.http.port.
That's all you need to do to start running Presto! In my next blog, I will discuss how to use the Presto CLI and set up the Presto server programmatically for applications.