Tutorial: Presto + Alluxio + Hive Metastore on Your laptop in 10 Minutes
Get up and running with Presto, Alluxio, and Hive in minutes!
Join the DZone community and get the full member experience.
Join For FreeThis tutorial guides users to set up a stack of Presto, Alluxio, and Hive Metastore on their local server. Additionally, it demonstrates how to use Alluxio as the caching layer for Presto queries. In this stack, Presto is the SQL Engine to plan and execute queries, Alluxio is the data orchestration layer to cache hot tables for Presto, and Hive Metastore is the catalog service for Presto to read table schema and location information.
Step1: Download and Launch Alluxio
Go to Alluxio download page, select the latest version (e.g., 2.0.1 when this blog is written), download, and unpack the tarball.
$ tar -xzf alluxio-2.0.1-bin.tar.gz
$ cd alluxio-2.0.1
Start a new deployment with a local master and local worker:
$ bin/alluxio-start.sh local -f
Mount an S3 bucket which has example input data for this tutorial to Alluxio file system. Note that this S3 bucket is a public bucket accessible to all AWS users.
$ bin/alluxio fs mount --readonly /example \
s3://apc999/presto-tutorial/example-reason/
You may also like: Four Ways to Write to Alluxio.
Step 2: Start Hive Metastore
If it is the first time for you to launch Hive Metastore, initialize a new Metastore
$ ${HIVE_HOME}/bin/schematool -dbType derby -initSchema
Edit ${HIVE_HOME}/conf/hive-env.sh
to include Alluxio client jar on the Hive classpath, so Hive Metastore can access files stored on Alluxio.
export HIVE_AUX_JARS_PATH=/path/to/alluxio/client/alluxio-2.0.1-client.jar
Start a Hive Metastore which will run in the background and listen on port 9083 (by default).
$ ${HIVE_HOME}/hcatalog/sbin/hcat_server.sh start
One can check the Hive Metastore logs at ${HIVE_HOME}/hcatalog/var/log/
Step 3: Launch Presto Server
Goto Presto server installation and make a symbol link to make Alluxio client available for Presto Hive Connector.
$ cd ${PRESTO_HOME}
$ ln -s /path/to/alluxio/client/alluxio-2.0.1-client.jar \
plugin/hive-hadoop2/alluxio-client.jar
Edit etc/config.properties
file to include
coordinator=true
node-scheduler.include-coordinator=true
http-server.http.port=8080
discovery-server.enabled=true
discovery.uri=http://localhost:8080
Also edit etc/catalog/hive.properties
to point to the Hive Metastore service we just started.
connector.name=hive-hadoop2
hive.metastore.uri=thrift://localhost:9083
Run the Presto server:
$ ./bin/launcher run
Step 4: Launch a Presto CLI and Run Queries
Start a Prest CLI connection to the server started in previous step.
$ ./presto --server localhost:8080 --catalog hive --debug
presto> USE default;
USE
Create a new table based on the file mounted in Alluxio:
presto:default> DROP TABLE IF EXISTS reason;
DROP TABLE
presto:default> CREATE TABLE reason (
r_reason_sk bigint,
r_reason_id varchar,
r_reason_desc varchar
) WITH (
external_location = 'alluxio://localhost:19998/example',
format = 'PARQUET'
);
CREATE TABLE
Scan the newly create table:
presto:default> SELECT * FROM reason;
r_reason_sk | r_reason_id | r_reason_desc
-------------+------------------+---------------------------------------------
1 | AAAAAAAABAAAAAAA | Package was damaged
4 | AAAAAAAAEAAAAAAA | Not the product that was ordred
5 | AAAAAAAAFAAAAAAA | Parts missing
6 | AAAAAAAAGAAAAAAA | Does not work with a product that I have
7 | AAAAAAAAHAAAAAAA | Gift exchange
16 | AAAAAAAAABAAAAAA | Did not fit
17 | AAAAAAAABBAAAAAA | Wrong size
18 | AAAAAAAACBAAAAAA | Lost my job
19 | AAAAAAAADBAAAAAA | unauthoized purchase
20 | AAAAAAAAEBAAAAAA | duplicate purchase
21 | AAAAAAAAFBAAAAAA | its is a boy
22 | AAAAAAAAGBAAAAAA | it is a girl
23 | AAAAAAAAHBAAAAAA | reason 23
24 | AAAAAAAAIBAAAAAA | reason 24
25 | AAAAAAAAJBAAAAAA | reason 25
26 | AAAAAAAAKBAAAAAA | reason 26
27 | AAAAAAAALBAAAAAA | reason 27
28 | AAAAAAAAMBAAAAAA | reason 28
29 | AAAAAAAANBAAAAAA | reason 29
30 | AAAAAAAAOBAAAAAA | reason 31
31 | AAAAAAAAPBAAAAAA | reason 31
32 | AAAAAAAAACAAAAAA | reason 32
33 | AAAAAAAABCAAAAAA | reason 33
34 | AAAAAAAACCAAAAAA | reason 34
35 | AAAAAAAADCAAAAAA | reason 35
2 | AAAAAAAACAAAAAAA | Stopped working
3 | AAAAAAAADAAAAAAA | Did not get it on time
8 | AAAAAAAAIAAAAAAA | Did not like the color
9 | AAAAAAAAJAAAAAAA | Did not like the model
10 | AAAAAAAAKAAAAAAA | Did not like the make
11 | AAAAAAAALAAAAAAA | Did not like the warranty
12 | AAAAAAAAMAAAAAAA | No service location in my area
13 | AAAAAAAANAAAAAAA | Found a better price in a store
14 | AAAAAAAAOAAAAAAA | Found a better extended warranty in a store
15 | AAAAAAAAPAAAAAAA | Not working any more
(35 rows)
After a few seconds, you will see the following output:
Query 20191012_010912_00006_9jx7r, FINISHED, 1 node
http://localhost:8080/ui/query.html?20191012_010912_00006_9jx7r
Splits: 17 total, 17 done (100.00%)
CPU Time: 0.4s total, 83 rows/s, 2.35KB/s, 62% active
Per Node: 0.4 parallelism, 37 rows/s, 1.04KB/s
Parallelism: 0.4
Peak Memory: 0B
0:01 [35 rows, 1002B] [37 rows/s, 1.04KB/s]
Summary
In this tutorial, we demonstrated how to run setup Presto, Hive Metastore and Alluxio to run SQL queries. Feel free to ask questions at our Alluxio community slack channel.
Further Reading
Opinions expressed by DZone contributors are their own.
Trending
-
How To Manage Vulnerabilities in Modern Cloud-Native Applications
-
DevOps Midwest: A Community Event Full of DevSecOps Best Practices
-
How Web3 Is Driving Social and Financial Empowerment
-
Top Six React Development Tools
Comments