Installing and Running Presto

Learn how to configure and run Presto, an open-source distributed SQL query engine that helps with running interactive analytic queries.

In my previous blog, I introduced Presto. In today's blog, I'll be talking about installing and running Presto.

The basic prerequisites for setting up Presto are:

  • Linux or Mac OS X.
  • Java 8, 64-bit.
  • Python 2.4+.

Installation

  1. Download the Presto server tarball, presto-server-0.175.tar.gz.
  2. Unpack the tarball. After unpacking, you will see a directory named presto-server-0.175, which we will call the installation directory.
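If you prefer to script the unpacking step, the sketch below uses Python's standard tarfile module and returns the resulting installation directory. It assumes the tarball has already been downloaded and contains a single top-level directory; unpack_presto is a hypothetical helper name, not part of Presto.

```python
import tarfile
from pathlib import Path


def unpack_presto(tarball: str, dest: str = ".") -> Path:
    """Extract a Presto server tarball and return the installation directory.

    Hypothetical helper: assumes the archive contains a single top-level
    directory (for example presto-server-0.175).
    """
    with tarfile.open(tarball, "r:gz") as archive:
        # Collect the first path component of every member in the archive.
        top_level = {member.name.split("/")[0] for member in archive.getmembers()}
        if len(top_level) != 1:
            raise ValueError(f"expected one top-level directory, found {sorted(top_level)}")
        archive.extractall(dest)
    return Path(dest) / top_level.pop()
```

For example, unpack_presto("presto-server-0.175.tar.gz") run in the download directory would return Path("presto-server-0.175").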

Configuring

Inside the installation directory, create a directory called etc. This directory will hold the following configuration:

  1. Node properties: Environment configuration specific to each node.
  2. JVM config: Command-line options for the Java Virtual Machine.
  3. Config properties: Configuration for the Presto server.
  4. Log properties: Log level configuration.
  5. Catalog properties: Configuration for connectors (data sources).
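As a minimal sketch of this layout, the snippet below creates etc and etc/catalog under the installation directory and returns the paths that the following steps fill in. The helper name create_etc_layout and the CONFIG_FILES tuple are hypothetical; only the file and directory names come from this guide.

```python
from pathlib import Path

# Configuration files described in the list above (catalog files are added later).
CONFIG_FILES = ("node.properties", "jvm.config", "config.properties", "log.properties")


def create_etc_layout(install_dir: str) -> dict:
    """Create etc/ and etc/catalog/ and return the paths the next steps will fill in.

    Hypothetical helper: it only creates the directories; the files themselves
    are written in the following steps.
    """
    etc = Path(install_dir) / "etc"
    catalog = etc / "catalog"
    catalog.mkdir(parents=True, exist_ok=True)  # also creates etc/
    paths = {name: etc / name for name in CONFIG_FILES}
    paths["catalog"] = catalog
    return paths
```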

Now, we will set up these properties one by one.

1. Setting Up Node Properties

Create a file called node.properties inside the etc folder. This file contains the configuration specific to each node. It uses the following properties:

  • node.environment: The name of the Presto environment. All nodes in a cluster must have the same environment name.
  • node.id: The unique identifier for this node. Every node must have a different ID.
  • node.data-dir: The location (filesystem path) of the data directory.

Note: Presto stores logs and other data in the location specified by node.data-dir. It is recommended to create the data directory outside the installation directory, so that the data is easily preserved when upgrading Presto.

You can use the following default content (give each node its own node.id):

node.environment=production
node.id=ffffffff-ffff-ffff-ffff-ffffffffffff
node.data-dir=/var/presto/data
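Because node.id must differ on every node, a script can generate it instead of copying the placeholder above. The sketch below writes node.properties with a random UUID unless an ID is passed in; the helper name write_node_properties is hypothetical. Generate the ID once per node and reuse it, since the node's identity is expected to stay stable across restarts and upgrades.

```python
import uuid
from pathlib import Path


def write_node_properties(etc_dir, environment="production",
                          data_dir="/var/presto/data", node_id=None):
    """Write etc/node.properties and return its path.

    Hypothetical helper. If node_id is None, a random UUID is generated;
    store it and pass it back in on later runs so the node keeps its identity.
    """
    node_id = node_id or str(uuid.uuid4())
    lines = [
        f"node.environment={environment}",  # identical on every node in the cluster
        f"node.id={node_id}",               # unique per node
        f"node.data-dir={data_dir}",        # ideally outside the installation directory
    ]
    path = Path(etc_dir) / "node.properties"
    path.write_text("\n".join(lines) + "\n")
    return path
```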

2. Setting Up JVM Config

Create a file named jvm.config inside the etc folder. This file contains the command-line options used to launch the Java Virtual Machine.

You can use the following default content:

-server
-Xmx16G
-XX:+UseG1GC
-XX:G1HeapRegionSize=32M
-XX:+UseGCOverheadLimit
-XX:+ExplicitGCInvokesConcurrent
-XX:+HeapDumpOnOutOfMemoryError
-XX:+ExitOnOutOfMemoryError

Note: The file must contain exactly one option per line.
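A small sketch of the one-option-per-line rule: the hypothetical helper below writes the options from the example above to jvm.config and rejects any entry that is empty or contains a newline, since such an entry would not map to exactly one line.

```python
from pathlib import Path

# The JVM options from the example above.
DEFAULT_JVM_OPTIONS = (
    "-server",
    "-Xmx16G",
    "-XX:+UseG1GC",
    "-XX:G1HeapRegionSize=32M",
    "-XX:+UseGCOverheadLimit",
    "-XX:+ExplicitGCInvokesConcurrent",
    "-XX:+HeapDumpOnOutOfMemoryError",
    "-XX:+ExitOnOutOfMemoryError",
)


def write_jvm_config(etc_dir, options=DEFAULT_JVM_OPTIONS):
    """Write etc/jvm.config with exactly one JVM option per line (hypothetical helper)."""
    for option in options:
        # An empty entry or an embedded newline would break the one-option-per-line format.
        if not option.strip() or "\n" in option:
            raise ValueError(f"invalid JVM option: {option!r}")
    path = Path(etc_dir) / "jvm.config"
    path.write_text("\n".join(options) + "\n")
    return path
```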

3. Setting Up Config Properties

Create a file named config.properties in the etc folder. This file contains the configuration for the Presto server. A Presto server can act as a coordinator, a worker, or both at the same time. Before writing the config file, let's briefly discuss its properties:

  • coordinator: If set to true, the node acts as a coordinator, accepting queries from clients and managing query execution. On worker-only nodes, set it to false.
  • node-scheduler.include-coordinator: If set to true, work can also be scheduled on the coordinator; if false, only the workers execute queries.
  • http-server.http.port: The port on which the Presto server listens for HTTP.
  • query.max-memory: The maximum amount of distributed memory a query may use across the cluster.
  • query.max-memory-per-node: The maximum amount of memory a query may use on any single node.
  • discovery-server.enabled: Can be set to true or false. Presto uses the Discovery service to find all the nodes in the cluster; if set to true, the coordinator runs an embedded version of the Discovery service.
  • discovery.uri: The URI of the Discovery server. When the coordinator runs the embedded Discovery service, this is the coordinator's URI.
  • query.queue-config-file: The file to read the query queue configuration from.
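To see how these properties relate, here is a sketch that parses a config.properties file and runs a few sanity checks: coordinator must be true or false, discovery.uri must be present without a trailing slash, and the per-node memory limit should not exceed the cluster-wide one. These checks are assumptions made for illustration, not validation that Presto itself is guaranteed to perform, and the parser is simplified (it ignores the escapes, ':' separators, and line continuations of the full Java properties format). All function names are hypothetical.

```python
import re

# Binary multipliers for data-size suffixes such as 1GB or 50GB
# (simplified; matched case-insensitively).
_UNITS = {"B": 1, "KB": 1024, "MB": 1024**2, "GB": 1024**3, "TB": 1024**4, "PB": 1024**5}


def parse_properties(text):
    """Parse simple key=value lines, skipping blank lines and # or ! comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, sep, value = line.partition("=")  # split on the first '=' only
        if not sep:
            raise ValueError(f"not a key=value line: {raw!r}")
        props[key.strip()] = value.strip()
    return props


def parse_data_size(value):
    """Convert a size such as '50GB' to bytes."""
    match = re.fullmatch(r"(\d+(?:\.\d+)?)\s*([A-Za-z]+)", value.strip())
    if not match or match.group(2).upper() not in _UNITS:
        raise ValueError(f"unrecognized data size: {value!r}")
    return int(float(match.group(1)) * _UNITS[match.group(2).upper()])


def check_config(props):
    """Return a list of problems found by the assumed sanity checks."""
    problems = []
    if props.get("coordinator") not in ("true", "false"):
        problems.append("coordinator must be true or false")
    uri = props.get("discovery.uri")
    if uri is None:
        problems.append("discovery.uri is missing")
    elif uri.endswith("/"):
        problems.append("discovery.uri should not end in a slash")
    total = props.get("query.max-memory")
    per_node = props.get("query.max-memory-per-node")
    if total and per_node and parse_data_size(per_node) > parse_data_size(total):
        problems.append("query.max-memory-per-node is larger than query.max-memory")
    return problems
```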

Now, let's set the properties in config.properties.

If the node is a coordinator, you can use the following as default content:

coordinator=true
node-scheduler.include-coordinator=false
http-server.http.port=8080
query.max-memory=50GB
query.max-memory-per-node=1GB
discovery-server.enabled=true
discovery.uri=http://example.net:8080

If the node is a worker, you can use the following as default content:

coordinator=false
http-server.http.port=8080
query.max-memory=50GB
query.max-memory-per-node=1GB
discovery.uri=http://example.net:8080

For a single node acting as both coordinator and worker, you can use the following configuration as default content:

coordinator=true
node-scheduler.include-coordinator=true
http-server.http.port=8080
query.max-memory=5GB
query.max-memory-per-node=1GB
discovery-server.enabled=true
discovery.uri=http://example.net:8080
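The three configurations above differ only in a handful of properties, so they can be generated from one table. The sketch below renders config.properties for a role; the role names "coordinator", "worker", and "single-node" and the function render_config are hypothetical, while the property values are copied from the examples above.

```python
# Property sets taken from the three example configurations above.
_ROLE_PROPERTIES = {
    "coordinator": {
        "coordinator": "true",
        "node-scheduler.include-coordinator": "false",
        "query.max-memory": "50GB",
        "query.max-memory-per-node": "1GB",
        "discovery-server.enabled": "true",
    },
    "worker": {
        "coordinator": "false",
        "query.max-memory": "50GB",
        "query.max-memory-per-node": "1GB",
    },
    "single-node": {
        "coordinator": "true",
        "node-scheduler.include-coordinator": "true",
        "query.max-memory": "5GB",
        "query.max-memory-per-node": "1GB",
        "discovery-server.enabled": "true",
    },
}


def render_config(role, discovery_uri="http://example.net:8080", port=8080):
    """Return config.properties text for one of the hypothetical role names."""
    if role not in _ROLE_PROPERTIES:
        raise ValueError(f"unknown role {role!r}; expected one of {sorted(_ROLE_PROPERTIES)}")
    props = dict(_ROLE_PROPERTIES[role])
    props["http-server.http.port"] = str(port)
    props["discovery.uri"] = discovery_uri  # the coordinator's URI when discovery is embedded
    return "".join(f"{key}={value}\n" for key, value in props.items())
```

Property order does not matter in the file, so the port and discovery URI are simply appended after the role-specific settings.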

4. Setting Up Log Level

Create a file called log.properties in the etc folder. It sets the minimum log level. The only property you need to set in this file is:

com.facebook.presto=INFO

This property can take the following values: DEBUG, INFO, WARN, and ERROR.
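A minimal sketch for writing this file while rejecting unsupported levels; the helper name write_log_properties is hypothetical, and the allowed levels are the four listed above.

```python
from pathlib import Path

LOG_LEVELS = ("DEBUG", "INFO", "WARN", "ERROR")


def write_log_properties(etc_dir, level="INFO", logger="com.facebook.presto"):
    """Write etc/log.properties setting the minimum level for one logger hierarchy."""
    level = level.upper()
    if level not in LOG_LEVELS:
        raise ValueError(f"level must be one of {LOG_LEVELS}, got {level!r}")
    path = Path(etc_dir) / "log.properties"
    path.write_text(f"{logger}={level}\n")
    return path
```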

5. Setting Up the Catalog

Presto accesses data via connectors, which are mounted in catalogs. Catalogs are registered by creating a catalog properties file for each connector. Create a directory called catalog inside etc, and add one properties file per catalog. For instance, to create a catalog for JMX, create jmx.properties in etc/catalog/ and set the name of the connector:

connector.name=jmx
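The sketch below writes one catalog file per connector. The catalog takes its name from the file name, so etc/catalog/jmx.properties registers a catalog called jmx. The helper write_catalog and its extra parameter for connector-specific settings are hypothetical; the JMX example above needs only connector.name.

```python
from pathlib import Path


def write_catalog(etc_dir, catalog_name, connector_name, extra=None):
    """Write etc/catalog/<catalog_name>.properties and return its path (hypothetical helper)."""
    catalog_dir = Path(etc_dir) / "catalog"
    catalog_dir.mkdir(parents=True, exist_ok=True)
    lines = [f"connector.name={connector_name}"]
    # Optional connector-specific settings (the JMX example above uses none).
    lines += [f"{key}={value}" for key, value in (extra or {}).items()]
    path = catalog_dir / f"{catalog_name}.properties"
    path.write_text("\n".join(lines) + "\n")
    return path
```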

Once you have completed these steps, you can start Presto.

Running Presto

The Presto installation directory contains a launcher script, bin/launcher. Presto can be run either as a daemon or as a foreground process. The main difference between the two is that in foreground mode, the server's logs and other output are written to stdout/stderr.

To run as a daemon, use bin/launcher start. To run in the foreground, use bin/launcher run.
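If Presto is started from a script, the launcher invocation can be wrapped as in the sketch below. It builds the command for either mode (start for a daemon, run for the foreground) and runs it with subprocess; launcher_command and start_presto are hypothetical names, and the install path in any call is whatever your installation directory is.

```python
import subprocess
from pathlib import Path


def launcher_command(install_dir, foreground=False):
    """Build the launcher invocation: 'run' stays in the foreground, 'start' daemonizes."""
    return [str(Path(install_dir) / "bin" / "launcher"), "run" if foreground else "start"]


def start_presto(install_dir, foreground=False):
    """Invoke the launcher; in foreground mode this blocks until the server stops."""
    return subprocess.run(launcher_command(install_dir, foreground), check=True)
```

With check=True, a non-zero exit status from the launcher raises subprocess.CalledProcessError instead of failing silently.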

Once Presto has started, you should be able to reach the server at http://localhost:8080, or at whichever port you set in http-server.http.port.
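To check from a script whether the server is up, the sketch below requests http://HOST:PORT/v1/info and returns the parsed JSON, or None if the server cannot be reached. It assumes the Presto server exposes a /v1/info endpoint that returns JSON; verify this against your Presto version. The function name presto_info is hypothetical.

```python
import json
import urllib.request


def presto_info(host="localhost", port=8080, timeout=5.0):
    """Return the parsed /v1/info response, or None if the server is unreachable.

    Assumes the server answers GET /v1/info with a JSON document.
    """
    url = f"http://{host}:{port}/v1/info"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return json.load(response)
    except (OSError, ValueError):
        # OSError covers refused connections, timeouts and HTTP errors;
        # ValueError covers a response body that is not valid JSON.
        return None
```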

That's all you need to do to start running Presto! In my next blog, I will discuss how to use the Presto CLI and how to set up the Presto server programmatically for applications.

Published at DZone with permission of Pallavi Singh, DZone MVB. See the original article here.
