Installing and Running Presto

Learn how to configure and run Presto, an open-source distributed SQL query engine that helps with running interactive analytic queries.

By Pallavi Singh · May 16, 2017 · Tutorial

In my previous blog, I talked about getting introduced to Presto. In today's blog, I'll be talking about installing and running Presto.

The basic prerequisites for setting up Presto are:

  • Linux or Mac OS X.
  • Java 8, 64-bit.
  • Python 2.4+.

Installation

  1. Download the Presto tarball from here.
  2. Unpack the tarball. After unpacking, you will see a directory named presto-server-0.175, which we will call the installation directory.
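
As a rough sketch, the unpacking step could look like this in the shell. The tarball name presto-server-0.175.tar.gz and the unpack_presto helper are assumptions made for illustration; adjust them to match the file you actually downloaded.

```shell
# Assumption: the tarball was saved under this name in the current directory.
TARBALL="${TARBALL:-presto-server-0.175.tar.gz}"

# Hypothetical helper: unpack the tarball ($1) into a target directory ($2).
unpack_presto() {
  mkdir -p "$2" && tar -xzf "$1" -C "$2"
}

if [ -f "$TARBALL" ]; then
  unpack_presto "$TARBALL" "$PWD"
  # Show the resulting installation directory.
  ls -d "$PWD"/presto-server-*
else
  echo "Tarball $TARBALL not found; download it first." >&2
fi
```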

Configuring

Inside the installation directory, create a directory called etc. This directory will hold the following configurations:

  1. Node properties: Environmental configuration specific to each node.
  2. JVM config: Command line options for the Java Virtual Machine.
  3. Config properties: Configuration for the Presto server.
  4. Catalog properties: Configuration for connectors (data sources).
  5. Log properties: Configuration for the log levels.
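
To make the layout concrete, here is a minimal sketch that creates the etc directory and empty placeholders for these files. PRESTO_HOME is a hypothetical variable standing in for the installation directory, and the catalog subdirectory is the one used for the catalog properties in step 5.

```shell
# Assumption: PRESTO_HOME points at the unpacked presto-server-0.175 directory.
PRESTO_HOME="${PRESTO_HOME:-$PWD/presto-server-0.175}"

# etc/ holds the configuration files; etc/catalog/ holds one file per connector.
mkdir -p "$PRESTO_HOME/etc/catalog"

# Create placeholders for the files that the following steps fill in.
# touch leaves the contents of any file that already exists untouched.
for f in node.properties jvm.config config.properties log.properties; do
  touch "$PRESTO_HOME/etc/$f"
done

ls -R "$PRESTO_HOME/etc"
```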

Now, we will set up the above properties one by one.

1. Setting Up Node Properties

Create a file called node.properties inside the etc folder. This file will contain the configuration specific to each node. Given below is a description of the properties we need to set in this file.

  • node.environment: The name of the Presto environment. All the nodes in the cluster must have an identical environment name.
  • node.id: A unique identifier for this node. It should stay the same across restarts and upgrades of Presto.
  • node.data-dir: The path of the data directory.

Note: Presto stores logs and other data at the location specified in node.data-dir. It is recommended to create the data directory outside the installation directory so that it is easily preserved when upgrading Presto.

You can put the following default content:

node.environment=production
node.id=ffffffff-ffff-ffff-ffff-ffffffffffff
node.data-dir=/var/presto/data
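
Since every node needs its own node.id, you may prefer to generate one instead of copying the placeholder value. The following is only a sketch: PRESTO_HOME and NODE_FILE are hypothetical names, and it assumes either uuidgen or the Linux /proc/sys/kernel/random/uuid file is available. It writes the file only when it is missing or empty, so an existing node.id is not replaced.

```shell
# Assumption: PRESTO_HOME points at the installation directory.
PRESTO_HOME="${PRESTO_HOME:-$PWD/presto-server-0.175}"
NODE_FILE="$PRESTO_HOME/etc/node.properties"
mkdir -p "$PRESTO_HOME/etc"

# Only write the file if it is missing or empty, so the ID stays stable.
if [ ! -s "$NODE_FILE" ]; then
  # Prefer uuidgen; fall back to the Linux kernel's UUID source.
  NODE_ID="$(uuidgen 2>/dev/null || cat /proc/sys/kernel/random/uuid)"
  cat > "$NODE_FILE" <<EOF
node.environment=production
node.id=$NODE_ID
node.data-dir=/var/presto/data
EOF
fi

cat "$NODE_FILE"
```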

2. Setting Up JVM Config

Create a file named jvm.config inside the etc folder. In this file, we will specify the command-line options used to launch the JVM.

You can put the following default content:

-server
-Xmx16G
-XX:+UseG1GC
-XX:G1HeapRegionSize=32M
-XX:+UseGCOverheadLimit
-XX:+ExplicitGCInvokesConcurrent
-XX:+HeapDumpOnOutOfMemoryError
-XX:+ExitOnOutOfMemoryError

Note: The file must contain exactly one option per line.
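
A quick way to catch formatting slips is to write the file and then scan it for lines that do not look like a single option. The check below is a rough heuristic of my own, not Presto's validation: it flags any line that does not start with a dash or that contains whitespace, so an option whose value legitimately contains spaces would also be flagged. JVM_CONFIG is a hypothetical variable name.

```shell
# Assumption: the installation directory is ./presto-server-0.175 unless PRESTO_HOME is set.
JVM_CONFIG="${PRESTO_HOME:-$PWD/presto-server-0.175}/etc/jvm.config"
mkdir -p "$(dirname "$JVM_CONFIG")"

# Write the options from the article, one per line.
cat > "$JVM_CONFIG" <<'EOF'
-server
-Xmx16G
-XX:+UseG1GC
-XX:G1HeapRegionSize=32M
-XX:+UseGCOverheadLimit
-XX:+ExplicitGCInvokesConcurrent
-XX:+HeapDumpOnOutOfMemoryError
-XX:+ExitOnOutOfMemoryError
EOF

# Heuristic: print (with line numbers) every line that is not a single dash-prefixed token.
if grep -nEv '^-[^[:space:]]+$' "$JVM_CONFIG"; then
  echo "jvm.config: the lines above do not look like one option per line" >&2
else
  echo "jvm.config looks well-formed"
fi
```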

3. Setting Up Config Properties

Create a file named config.properties in the etc folder. This file contains the configuration for the Presto server. A Presto server can act as a coordinator, a worker, or both at the same time. Before setting up the config file, let's discuss the properties in brief:

  • coordinator: If set to true, the node acts as a coordinator, accepting queries from clients and managing query execution. On worker-only nodes, set this to false.
  • node-scheduler.include-coordinator: Enables scheduling work on the coordinator. Can be set to true or false.
  • http-server.http.port: Specifies the port for the Presto HTTP server.
  • query.max-memory: The maximum amount of memory a single query may use across the cluster.
  • query.max-memory-per-node: The maximum amount of memory a single query may use on any one node.
  • discovery-server.enabled: Can be set to true or false. Presto uses the discovery service to find all the nodes in the cluster. If true, the coordinator runs an embedded version of the discovery service.
  • discovery.uri: The URI of the discovery server.
  • query.queue-config-file: The file from which the queue configuration is read.

Now, let's set the properties in config.properties.

If the node is a coordinator, you can use the following as default content:

coordinator=true
node-scheduler.include-coordinator=false
http-server.http.port=8080
query.max-memory=50GB
query.max-memory-per-node=1GB
discovery-server.enabled=true
discovery.uri=http://example.net:8080

If the node is a worker, you can use the following as default content:

coordinator=false
http-server.http.port=8080
query.max-memory=50GB
query.max-memory-per-node=1GB
discovery.uri=http://example.net:8080

For a single node doubling up as worker and coordinator, you can use the following configuration as default content:

coordinator=true
node-scheduler.include-coordinator=true
http-server.http.port=8080
query.max-memory=5GB
query.max-memory-per-node=1GB
discovery-server.enabled=true
discovery.uri=http://example.net:8080
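
The three variants differ in only a few lines, so one way to keep them in sync is to generate config.properties from a role name. This is a sketch under my own assumptions: write_presto_config, ROLE, and DISCOVERY_URI are hypothetical names, and the values are the ones shown above.

```shell
# Hypothetical helper: write config.properties for a role ($1) to a file ($2).
write_presto_config() {
  role="$1"; config="$2"
  case "$role" in
    coordinator) is_coord=true;  include=false; max_mem=50GB ;;
    worker)      is_coord=false; include=false; max_mem=50GB ;;
    single)      is_coord=true;  include=true;  max_mem=5GB ;;
    *) echo "unknown role: $role (use coordinator, worker, or single)" >&2; return 1 ;;
  esac
  {
    echo "coordinator=$is_coord"
    # Only coordinators carry the scheduling and embedded discovery settings.
    if [ "$is_coord" = true ]; then
      echo "node-scheduler.include-coordinator=$include"
    fi
    echo "http-server.http.port=8080"
    echo "query.max-memory=$max_mem"
    echo "query.max-memory-per-node=1GB"
    if [ "$is_coord" = true ]; then
      echo "discovery-server.enabled=true"
    fi
    echo "discovery.uri=${DISCOVERY_URI:-http://example.net:8080}"
  } > "$config"
}

# Assumption: PRESTO_HOME points at the installation directory.
PRESTO_HOME="${PRESTO_HOME:-$PWD/presto-server-0.175}"
mkdir -p "$PRESTO_HOME/etc"
write_presto_config "${ROLE:-single}" "$PRESTO_HOME/etc/config.properties"
cat "$PRESTO_HOME/etc/config.properties"
```

Running it with ROLE=worker or ROLE=coordinator produces the other two variants.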

4. Setting Up Log Level

Create a file called log.properties in the etc folder. It will be used to set the minimum log level. The only property you need to set in this file is com.facebook.presto=INFO.

This property can have the following values: DEBUG, INFO, WARN, and ERROR.
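
As a small sketch, the file can be written from the shell while guarding against a mistyped level. LOG_LEVEL and PRESTO_HOME are hypothetical variable names used only for this example.

```shell
# Assumption: PRESTO_HOME points at the installation directory.
PRESTO_HOME="${PRESTO_HOME:-$PWD/presto-server-0.175}"
LOG_LEVEL="${LOG_LEVEL:-INFO}"

case "$LOG_LEVEL" in
  DEBUG|INFO|WARN|ERROR)
    # Accept only the four levels listed above.
    mkdir -p "$PRESTO_HOME/etc"
    echo "com.facebook.presto=$LOG_LEVEL" > "$PRESTO_HOME/etc/log.properties"
    ;;
  *)
    echo "Unsupported log level: $LOG_LEVEL" >&2
    ;;
esac
```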

5. Setting Up the Catalog

Presto accesses data via connectors, which are made available through catalogs. Catalogs are registered by creating a catalog properties file for each connector. Create a directory called catalog in etc. Inside the etc/catalog directory, create one properties file per catalog. For instance, let's create a catalog for JMX.

Create jmx.properties in etc/catalog/ and set the connector name with connector.name=jmx.
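
Here is a minimal sketch of that step, followed by a loop that lists what is registered. The catalog name comes from the file name, so jmx.properties yields a catalog called jmx. CATALOG_DIR is a hypothetical variable name.

```shell
# Assumption: the installation directory is ./presto-server-0.175 unless PRESTO_HOME is set.
CATALOG_DIR="${PRESTO_HOME:-$PWD/presto-server-0.175}/etc/catalog"
mkdir -p "$CATALOG_DIR"

# Register the JMX catalog.
echo "connector.name=jmx" > "$CATALOG_DIR/jmx.properties"

# List each catalog (file name without .properties) and the connector it uses.
for f in "$CATALOG_DIR"/*.properties; do
  echo "catalog: $(basename "$f" .properties) -> $(grep '^connector.name=' "$f")"
done
```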

Once you have completed these steps, you can start running Presto.

Running Presto

Inside the Presto installation directory, there is a launcher script at bin/launcher. Presto can be run either as a daemon or as a foreground process. The main difference between the two is that in foreground mode, the server's logs and other output are written to stdout/stderr.

To run as a daemon, use bin/launcher start. To run in the foreground, use bin/launcher run.
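
The sketch below wraps the two modes in a small script: start runs Presto as a daemon, and run keeps it in the foreground. MODE and PRESTO_HOME are hypothetical variables for this example, and the launcher is only invoked if it actually exists at the expected path.

```shell
# Assumption: PRESTO_HOME points at the installation directory.
PRESTO_HOME="${PRESTO_HOME:-$PWD/presto-server-0.175}"
# Hypothetical switch for this sketch: "daemon" (default) or "foreground".
MODE="${MODE:-daemon}"

case "$MODE" in
  daemon)     SUBCOMMAND=start ;;  # runs in the background
  foreground) SUBCOMMAND=run ;;    # stays attached; logs go to stdout/stderr
  *) echo "MODE must be daemon or foreground" >&2; SUBCOMMAND="" ;;
esac

if [ -n "$SUBCOMMAND" ] && [ -x "$PRESTO_HOME/bin/launcher" ]; then
  "$PRESTO_HOME/bin/launcher" "$SUBCOMMAND"
else
  echo "Not starting Presto: check MODE and that $PRESTO_HOME/bin/launcher exists." >&2
fi
```

When running as a daemon, bin/launcher status and bin/launcher stop can be used to check on the server and shut it down.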

Once you run one of the above commands, you will be able to see the Presto server running on localhost:8080, or on whichever port you set in http-server.http.port.
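
If you would rather check from the terminal than the browser, a simple HTTP request is enough. This sketch assumes curl is installed and that the server listens on localhost:8080 as configured earlier; presto_is_up and PRESTO_URL are hypothetical names.

```shell
# Assumption: the server listens on the port set in http-server.http.port.
PRESTO_URL="${PRESTO_URL:-http://localhost:8080}"

# Hypothetical helper: succeeds if the URL answers without an HTTP error status.
presto_is_up() {
  curl --silent --fail --max-time 5 --output /dev/null "$1"
}

if presto_is_up "$PRESTO_URL"; then
  echo "Presto is answering at $PRESTO_URL"
else
  echo "No response from $PRESTO_URL yet; the server may still be starting." >&2
fi
```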

That's all you need to do to start running Presto! In my next blog, I will discuss how to use the Presto CLI and set up the Presto server programmatically for applications.

Published at DZone with permission of Pallavi Singh, DZone MVB. See the original article here.