
Hands-On Presto Tutorial: Presto 103

Learn how to run a Presto cluster on Google Cloud Platform using VM instances and GKE containers in this third tutorial in the hands-on Presto tutorial series.

By Praburam Upendran · Sep. 23, 2021 · Big Data Zone · Tutorial

Introduction

This tutorial is Part III of our getting started with Presto series. As a reminder, PrestoDB is an open source distributed SQL query engine. In tutorial 101, we showed you how to install and configure PrestoDB locally, and in tutorial 102, we covered how to run a three-node PrestoDB cluster on a laptop. In this tutorial, we’ll show you how to run a PrestoDB cluster in a GCP environment using VM instances and GKE containers.

Environment

This guide was developed on GCP VM instances and GKE containers.

Implementation Steps for PrestoDB on VM Instances

Step 1: 

Create a GCP VM instance using the CREATE INSTANCE tab and name it presto-coordinator. Then create three more VM instances named presto-worker1, presto-worker2, and presto-worker3.
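
If you prefer the gcloud CLI to the console, a sketch like the following creates equivalent instances; the zone, machine type, and image family here are assumptions, so pick whatever fits your project:

gcloud compute instances create presto-coordinator \
    --zone=us-central1-c --machine-type=e2-standard-4 \
    --image-family=debian-11 --image-project=debian-cloud

gcloud compute instances create presto-worker1 presto-worker2 presto-worker3 \
    --zone=us-central1-c --machine-type=e2-standard-4 \
    --image-family=debian-11 --image-project=debian-cloud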


Step 2:

By default, GCP blocks all network ports, so PrestoDB needs ports 8080-8083 opened. Use the Firewall rules tab to enable them.
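
If you would rather script it, a firewall rule along these lines opens the Presto ports; the rule name, network, and 0.0.0.0/0 source range are assumptions, and you should restrict the source range for anything beyond a test cluster:

gcloud compute firewall-rules create allow-presto \
    --network=default --direction=INGRESS --action=ALLOW \
    --rules=tcp:8080-8083 --source-ranges=0.0.0.0/0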


Step 3: 

Install Java and Python.
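
On a Debian- or Ubuntu-based image this typically looks like the sketch below; Presto 0.235 requires Java 8, and the package names vary by distribution (newer releases ship python3 rather than python, for example):

sudo apt-get update
sudo apt-get install -y openjdk-8-jdk python
java -version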

Step 4:

Download the Presto server tarball, presto-server-0.235.1.tar.gz, and unpack it. The tarball contains a single top-level directory, presto-server-0.235.1, which we will call the installation directory.

Run the commands below to download the official presto-server tarball and presto-cli executable JAR linked from prestodb.io.

user@presto-coordinator-1:~$ curl -O https://repo1.maven.org/maven2/com/facebook/presto/presto-server/0.235.1/presto-server-0.235.1.tar.gz

 % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current

                                Dload  Upload   Total   Spent    Left  Speed

100  721M  100  721M    0     0   245M      0  0:00:02  0:00:02 --:--:--  245M

user@presto-coordinator-1:~$ curl -O https://repo1.maven.org/maven2/com/facebook/presto/presto-cli/0.235.1/presto-cli-0.235.1-executable.jar

 % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current

                                Dload  Upload   Total   Spent    Left  Speed

100 12.7M  100 12.7M    0     0  15.2M      0 --:--:-- --:--:-- --:--:-- 15.1M

user@presto-coordinator-1:~$


Step 5:

Use gunzip and tar to unpack the presto-server tarball.

user@presto-coordinator-1:~$ gunzip presto-server-0.235.1.tar.gz; tar -xf presto-server-0.235.1.tar


Step 6: (optional)

Rename the directory to drop the version number.

user@presto-coordinator-1:~$ mv presto-server-0.235.1 presto-server


Step 7:  

Create the etc, etc/catalog, and data directories.

user@presto-coordinator-1:~/presto-server$ mkdir etc etc/catalog data


Step 8:

Define the etc/node.properties, etc/config.properties, etc/jvm.config, etc/log.properties, and etc/catalog/jmx.properties files as shown below for the Presto coordinator server.

user@presto-coordinator-1:~/presto-server$ cat etc/node.properties

node.environment=production

node.id=ffffffff-ffff-ffff-ffff-ffffffffffff

node.data-dir=/home/user/presto-server/data


user@presto-coordinator-1:~/presto-server$ cat etc/config.properties

coordinator=true

node-scheduler.include-coordinator=false

http-server.http.port=8080

query.max-memory=50GB

query.max-memory-per-node=1GB

query.max-total-memory-per-node=2GB

discovery-server.enabled=true

discovery.uri=http://localhost:8080


user@presto-coordinator-1:~/presto-server$ cat etc/jvm.config

-server

-Xmx16G

-XX:+UseG1GC

-XX:G1HeapRegionSize=32M

-XX:+UseGCOverheadLimit

-XX:+ExplicitGCInvokesConcurrent

-XX:+HeapDumpOnOutOfMemoryError

-XX:+ExitOnOutOfMemoryError

-Djdk.attach.allowAttachSelf=true


user@presto-coordinator-1:~/presto-server$ cat etc/log.properties

com.facebook.presto=INFO


user@presto-coordinator-1:~/presto-server$ cat etc/catalog/jmx.properties

connector.name=jmx
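
The steps above configure the coordinator but do not show starting it; with the stock tarball, the bundled launcher script does that from the installation directory (use bin/launcher run instead to stay in the foreground while debugging):

user@presto-coordinator-1:~/presto-server$ bin/launcher start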


Step 9:

Check the cluster UI status. It should show the active worker count as 0 since only the coordinator is enabled.
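
The UI is served by the coordinator on port 8080, so you can open http://<coordinator-external-ip>:8080 in a browser; alternatively, a quick check from the VM itself is:

user@presto-coordinator-1:~/presto-server$ curl http://localhost:8080/v1/info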

Step 10: 

Repeat steps 1 to 8 on the remaining three VM instances, which will act as worker nodes.

On the configuration step for worker nodes, set coordinator to false and http-server.http.port to 8081, 8082 and 8083 for worker1, worker2 and worker3 respectively.

Also make sure node.id and http-server.http.port are different for each worker node.

user@presto-worker1:~/presto-server$ cat etc/node.properties

node.environment=production

node.id=ffffffff-ffff-ffff-ffff-ffffffffffffd

node.data-dir=/home/user/presto-server/data


user@presto-worker1:~/presto-server$ cat etc/config.properties

coordinator=false

http-server.http.port=8081

query.max-memory=50GB

query.max-memory-per-node=1GB

query.max-total-memory-per-node=2GB

discovery.uri=http://presto-coordinator-1:8080





user@presto-worker1:~/presto-server$ cat etc/jvm.config

-server

-Xmx16G

-XX:+UseG1GC

-XX:G1HeapRegionSize=32M

-XX:+UseGCOverheadLimit

-XX:+ExplicitGCInvokesConcurrent

-XX:+HeapDumpOnOutOfMemoryError

-XX:+ExitOnOutOfMemoryError

-Djdk.attach.allowAttachSelf=true


user@presto-worker1:~/presto-server$ cat etc/log.properties

com.facebook.presto=INFO


user@presto-worker1:~/presto-server$ cat etc/catalog/jmx.properties

connector.name=jmx


Step 11: 

Check the cluster status; it should now reflect the three worker nodes as part of the PrestoDB cluster.
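
Besides the UI, the coordinator exposes a REST endpoint that lists the nodes it has discovered, which is a quick way to confirm all three workers registered:

user@presto-coordinator-1:~/presto-server$ curl http://localhost:8080/v1/node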

Step 12:

Verify the PrestoDB environment by running the PrestoDB CLI with a simple JMX query.
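
The CLI was downloaded in step 4 as an executable JAR; before the ./presto-cli command below will work, it needs to be renamed and marked executable, roughly like this (assuming the JAR is still in the home directory where it was downloaded):

user@presto-coordinator-1:~/presto-server$ mv ~/presto-cli-0.235.1-executable.jar ./presto-cli
user@presto-coordinator-1:~/presto-server$ chmod +x ./presto-cli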

user@presto-coordinator-1:~/presto-server$ ./presto-cli

presto> SHOW TABLES FROM jmx.current;

                                                              Table                                                              

-----------------------------------------------------------------------------------------------------------------------------------

com.facebook.airlift.discovery.client:name=announcer                                                                             

com.facebook.airlift.discovery.client:name=serviceinventory                                                                      

com.facebook.airlift.discovery.store:name=dynamic,type=distributedstore                                                          

com.facebook.airlift.discovery.store:name=dynamic,type=httpremotestore                                                           

com.facebook.airlift.discovery.store:name=dynamic,type=replicator


Implementation Steps for PrestoDB on GKE Containers

Step 1:

Go to the Google Cloud Console and activate the Cloud Shell window.

Step 2:

Create an Artifact Registry repository using the command below, replacing REGION with the region where you want the repository created.

gcloud artifacts repositories create ahana-prestodb \

   --repository-format=docker \

   --location=REGION \

   --description="Docker repository"



Step 3:

Create the container cluster by using the gcloud command: 

user@cloudshell:~ (weighty-list-324021)$ gcloud config set compute/zone us-central1-c

Updated property [compute/zone].


user@cloudshell:~ (weighty-list-324021)$ gcloud container clusters create prestodb-cluster01


Creating cluster prestodb-cluster01 in us-central1-c...done.

Created 

.

.

.


kubeconfig entry generated for prestodb-cluster01.

NAME                LOCATION       MASTER_VERSION   MASTER_IP     MACHINE_TYPE  NODE_VERSION     NUM_NODES  STATUS

prestodb-cluster01  us-central1-c  1.20.8-gke.2100  34.72.76.205  e2-medium     1.20.8-gke.2100  3          RUNNING

user@cloudshell:~ (weighty-list-324021)$



Step 4:

After container cluster creation, run the following command to see the cluster’s three nodes

user@cloudshell:~ (weighty-list-324021)$ kubectl get nodes

NAME                                                STATUS   ROLES    AGE     VERSION

gke-prestodb-cluster01-default-pool-34d21367-25cw   Ready    <none>   7m54s   v1.20.8-gke.2100

gke-prestodb-cluster01-default-pool-34d21367-7w90   Ready    <none>   7m54s   v1.20.8-gke.2100

gke-prestodb-cluster01-default-pool-34d21367-mwrn   Ready    <none>   7m53s   v1.20.8-gke.2100

user@cloudshell:~ (weighty-list-324021)$


Step 5:

Pull the PrestoDB Docker image.

user@cloudshell:~ (weighty-list-324021)$ docker pull ahanaio/prestodb-sandbox


Step 6:

Run ahanaio/prestodb-sandbox locally in the shell as a container named coordinator; it will later be committed to an image and deployed on the container cluster.

user@cloudshell:~ (weighty-list-324021)$ docker run -d -p 8080:8080 -it --name coordinator ahanaio/prestodb-sandbox

391aa2201e4602105f319a2be7d34f98ed4a562467e83231913897a14c873fd0



Step 7:

Edit the etc/config.properties file inside the container and set the node-scheduler.include-coordinator property to false. Then restart the coordinator.

user@cloudshell:~ (weighty-list-324021)$ docker exec -i -t coordinator bash                                                                                                                       

bash-4.2# vi etc/config.properties

bash-4.2# cat etc/config.properties

coordinator=true

node-scheduler.include-coordinator=false

http-server.http.port=8080

discovery-server.enabled=true

discovery.uri=http://localhost:8080

bash-4.2# exit

exit

user@cloudshell:~ (weighty-list-324021)$ docker restart coordinator

coordinator


Step 8:

Now run docker commit and tag the resulting image ID as coordinator; this creates a new local image called coordinator.

user@cloudshell:~ (weighty-list-324021)$ docker commit coordinator

sha256:46ab5129fe8a430f7c6f42e43db5e56ccdf775b48df9228440ba2a0b9a68174c


user@cloudshell:~ (weighty-list-324021)$ docker images

REPOSITORY                 TAG       IMAGE ID       CREATED          SIZE

<none>                     <none>    46ab5129fe8a   15 seconds ago   1.81GB

ahanaio/prestodb-sandbox   latest    76919cf0f33a   34 hours ago     1.81GB


user@cloudshell:~ (weighty-list-324021)$ docker tag 46ab5129fe8a coordinator

user@cloudshell:~ (weighty-list-324021)$ docker images

REPOSITORY                 TAG       IMAGE ID       CREATED              SIZE

coordinator                latest    46ab5129fe8a   About a minute ago   1.81GB

ahanaio/prestodb-sandbox   latest    76919cf0f33a   34 hours ago         1.81GB


Step 9:

Tag the image with the Artifact Registry path and push it to the repository.

user@cloudshell:~ docker tag coordinator:latest us-central1-docker.pkg.dev/weighty-list-324021/prestodb-ahana/coord:v1


user@cloudshell:~ docker push us-central1-docker.pkg.dev/weighty-list-324021/prestodb-ahana/coord:v1


Step 10:

Deploy the coordinator into the GKE cluster using the kubectl commands below, referencing the image pushed to Artifact Registry in step 9.

user@cloudshell:~ (weighty-list-324021)$ kubectl create deployment coordinator --image=us-central1-docker.pkg.dev/weighty-list-324021/prestodb-ahana/coord:v1

deployment.apps/coordinator created


user@cloudshell:~ (weighty-list-324021)$ kubectl expose deployment coordinator --name=presto-coordinator --type=LoadBalancer --port 8080 --target-port 8080

service/presto-coordinator exposed


user@cloudshell:~ (weighty-list-324021)$ kubectl get service

NAME                 TYPE           CLUSTER-IP    EXTERNAL-IP     PORT(S)          AGE

kubernetes           ClusterIP      10.7.240.1    <none>          443/TCP          41m

presto-coordinator   LoadBalancer   10.7.248.10   35.239.88.127   8080:30096/TCP   92s


Step 11:

Paste the external IP into a browser and check the coordinator status.
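
Equivalently, you can probe the service from Cloud Shell with curl against the external IP reported by kubectl get service (the IP below is taken from the output above; yours will differ):

user@cloudshell:~ (weighty-list-324021)$ curl http://35.239.88.127:8080/v1/info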


Step 12:

Now, to deploy worker1 into the GKE cluster, again start a local container named worker1 using the docker run command.

user@cloudshell:~ docker run -d -p 8080:8080 -it --name worker1 coordinator

1d30cf4094eba477ab40d84ae64729e14de992ac1fa1e5a66e35ae553964b44b

user@cloudshell:~



Step 13:

Edit etc/config.properties inside the worker1 container to set coordinator to false and http-server.http.port to 8081. Also, discovery.uri should point to the coordinator service running inside the GKE cluster.

user@cloudshell:~ (weighty-list-324021)$ docker exec -it worker1  bash                                                                                                                             

bash-4.2# vi etc/config.properties

bash-4.2# vi etc/config.properties

bash-4.2# cat etc/config.properties

coordinator=false

http-server.http.port=8081

discovery.uri=http://presto-coordinator01:8080



Step 14:

Stop the local worker1 container, commit it as an image, and tag it as worker1.

user@cloudshell:~ (weighty-list-324021)$ docker stop worker1

worker1

user@cloudshell:~ (weighty-list-324021)$ docker commit worker1

sha256:cf62091eb03702af9bc05860dc2c58644fce49ceb6a929eb6c558cfe3e7d9abf

user@cloudshell:~ (weighty-list-324021)$ docker images

REPOSITORY                                                            TAG       IMAGE ID       CREATED         SIZE

<none>                                                                <none>    cf62091eb037   6 seconds ago   1.81GB


user@cloudshell:~ (weighty-list-324021)$ docker tag cf62091eb037 worker1:latest

user@cloudshell:~ (weighty-list-324021)$ docker images

REPOSITORY                                                            TAG       IMAGE ID       CREATED         SIZE

worker1                                                               latest    cf62091eb037   2 minutes ago   1.81GB



Step 15:

Push the worker1 image to the Artifact Registry location.

user@cloudshell:~ (weighty-list-324021)$ docker tag worker1:latest us-central1-docker.pkg.dev/weighty-list-324021/prestodb-ahana/worker1:v1


user@cloudshell:~ (weighty-list-324021)$ docker push us-central1-docker.pkg.dev/weighty-list-324021/prestodb-ahana/worker1:v1

The push refers to repository [us-central1-docker.pkg.dev/weighty-list-324021/prestodb-ahana/worker1]

b12c3306c4a9: Pushed

.

.


v1: digest: sha256:fe7db4aa7c9ee04634e079667828577ec4d2681d5ac0febef3ab60984eaff3e0 size: 2201



Step 16:

Deploy and expose worker1 from the Artifact Registry location into the GKE cluster using these kubectl commands.

user@cloudshell:~ (weighty-list-324021)$ kubectl create deployment presto-worker01  --image=us-central1-docker.pkg.dev/weighty-list-324021/prestodb-ahana/worker1:v1                               

deployment.apps/presto-worker01 created


user@cloudshell:~ (weighty-list-324021)$ kubectl expose deployment presto-worker01 --name=presto-worker01 --type=LoadBalancer --port 8081 --target-port 8081                                       

service/presto-worker01 exposed



Step 17:

Check the Presto UI to confirm that worker1 was deployed successfully.

Step 18:

Repeat steps 12 to 17 to deploy worker2 inside the GKE cluster:

  • Deploy a local instance using Docker and name it worker2.

  • Edit etc/config.properties inside the worker2 container to set coordinator to false, the port to 8082, and discovery.uri to the coordinator service name.

  • Stop the instance, commit it, and create a Docker image named worker2.

  • Push the worker2 image to the Artifact Registry location.

  • Use kubectl commands to deploy and expose the worker2 instance inside the GKE cluster.

  • Check the PrestoDB UI to confirm the second worker is active.

user@cloudshell:~ (weighty-list-324021)$ docker run -d -p 8080:8080 -it --name worker2 worker1                                                                                                     

32ace8d22688901c9fa7b406fe94dc409eaf3abfd97229ab3df69ffaac00185d

user@cloudshell:~ (weighty-list-324021)$ docker exec -it worker2 bash

bash-4.2# vi etc/config.properties

bash-4.2# cat etc/config.properties

coordinator=false

http-server.http.port=8082

discovery.uri=http://presto-coordinator01:8080

bash-4.2# exit

exit

user@cloudshell:~ (weighty-list-324021)$ docker commit worker2

sha256:08c0322959537c74f91a6ccbdf78d0876f66df21872ff7b82217693dc3d4ca1e

user@cloudshell:~ (weighty-list-324021)$ docker images

REPOSITORY                                                              TAG       IMAGE ID       CREATED          SIZE

<none>                                                                  <none>    08c032295953   11 seconds ago   1.81GB


user@cloudshell:~ (weighty-list-324021)$ docker tag 08c032295953 worker2:latest


user@cloudshell:~ (weighty-list-324021)$ docker commit worker2

sha256:b1272b5e824fdebcfd7d434fab7580bb8660cbe29aec8912c24d3e900fa5da11


user@cloudshell:~ (weighty-list-324021)$ docker tag worker2:latest us-central1-docker.pkg.dev/weighty-list-324021/prestodb-ahana/worker2:v1


user@cloudshell:~ (weighty-list-324021)$ docker push us-central1-docker.pkg.dev/weighty-list-324021/prestodb-ahana/worker2:v1

The push refers to repository [us-central1-docker.pkg.dev/weighty-list-324021/prestodb-ahana/worker2]

aae10636ecc3: Pushed

.

.

v1: digest: sha256:103c3fb05004d2ae46e9f6feee87644cb681a23e7cb1cbcf067616fb1c50cf9e size: 2410


user@cloudshell:~ (weighty-list-324021)$ kubectl create deployment presto-worker02  --image=us-central1-docker.pkg.dev/weighty-list-324021/prestodb-ahana/worker2:v1

deployment.apps/presto-worker02 created


user@cloudshell:~ (weighty-list-324021)$ kubectl expose deployment presto-worker02 --name=presto-worker02 --type=LoadBalancer --port 8082 --target-port 8082

service/presto-worker02 exposed


user@cloudshell:~ (weighty-list-324021)$ kubectl get service

NAME                   TYPE           CLUSTER-IP     EXTERNAL-IP      PORT(S)          AGE

kubernetes             ClusterIP      10.7.240.1     <none>           443/TCP          3h35m

presto-coordinator01   LoadBalancer   10.7.241.37    130.211.208.47   8080:32413/TCP   49m

presto-worker01        LoadBalancer   10.7.255.27    34.132.29.202    8081:31224/TCP   9m15s

presto-worker02        LoadBalancer   10.7.254.137   35.239.88.127    8082:31020/TCP   39s


Step 19:

Repeat steps 12 to 17 to provision worker3 inside the GKE cluster.


user@cloudshell:~ (weighty-list-324021)$ docker run -d -p 8080:8080 -it --name worker3 worker1

6d78e9db0c72f2a112049a677d426b7fa8640e8c1d3aa408a17321bb9353c545


user@cloudshell:~ (weighty-list-324021)$ docker exec -it worker3 bash                                                                                                                              

bash-4.2# vi etc/config.properties

bash-4.2# cat etc/config.properties

coordinator=false

http-server.http.port=8083

discovery.uri=http://presto-coordinator01:8080

bash-4.2# exit

exit


user@cloudshell:~ (weighty-list-324021)$ docker commit worker3

sha256:689f39b35b03426efde0d53c16909083a2649c7722db3dabb57ff0c854334c06

user@cloudshell:~ (weighty-list-324021)$ docker images

REPOSITORY                                                              TAG       IMAGE ID       CREATED          SIZE

<none>                                                                  <none>    689f39b35b03   25 seconds ago   1.81GB

ahanaio/prestodb-sandbox                                                latest    76919cf0f33a   37 hours ago     1.81GB


user@cloudshell:~ (weighty-list-324021)$ docker tag 689f39b35b03 worker3:latest


user@cloudshell:~ (weighty-list-324021)$ docker tag worker3:latest us-central1-docker.pkg.dev/weighty-list-324021/prestodb-ahana/worker3:v1


user@cloudshell:~ (weighty-list-324021)$ docker push us-central1-docker.pkg.dev/weighty-list-324021/prestodb-ahana/worker3:v1

The push refers to repository [us-central1-docker.pkg.dev/weighty-list-324021/prestodb-ahana/worker3]

b887f13ace4e: Pushed

.

.

v1: digest: sha256:056a379b00b0d43a0a5877ccf49f690d5f945c0512ca51e61222bd537336491b size: 2410


user@cloudshell:~ (weighty-list-324021)$ kubectl create deployment presto-worker03  --image=us-central1-docker.pkg.dev/weighty-list-324021/prestodb-ahana/worker3:v1

deployment.apps/presto-worker03 created


user@cloudshell:~ (weighty-list-324021)$ kubectl expose deployment presto-worker03 --name=presto-worker03 --type=LoadBalancer --port 8083 --target-port 8083

service/presto-worker03 exposed


Step 20:

Verify the PrestoDB environment by running the PrestoDB CLI with a simple JMX query.
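
In the GKE setup, the CLI has to reach the coordinator service rather than a server on localhost; one way to do that from Cloud Shell, assuming the presto-cli binary from the VM section is available there, is to port-forward the presto-coordinator service created in step 10 and point the CLI at it:

user@cloudshell:~ (weighty-list-324021)$ kubectl port-forward svc/presto-coordinator 8080:8080 &
user@cloudshell:~ (weighty-list-324021)$ ./presto-cli --server localhost:8080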

user@presto-coordinator-1:~/presto-server$ ./presto-cli

presto> SHOW TABLES FROM jmx.current;

                                                              Table                                                              

-----------------------------------------------------------------------------------------------------------------------------------

com.facebook.airlift.discovery.client:name=announcer                                                                             

com.facebook.airlift.discovery.client:name=serviceinventory                                                                      

com.facebook.airlift.discovery.store:name=dynamic,type=distributedstore                                                          

com.facebook.airlift.discovery.store:name=dynamic,type=httpremotestore                                                           

com.facebook.airlift.discovery.store:name=dynamic,type=replicator




Summary

In this tutorial, you learned how to provision and run PrestoDB on Google VM instances and on GKE containers. You should now be able to validate the functional aspects of PrestoDB.

If you want to run production Presto workloads at scale and performance, check out Ahana which provides a managed service for Presto.

