Building a Lightweight Trino Distribution

Steps to a stable and high-performing Trino configuration that is 391 MB.

Rob Dickinson

Jun. 23, 21 · Tutorial

Too many data frameworks built for large scale have unacceptable complexity at small scale. But with a few tweaks, Trino scales down to run nicely on small single-container configurations.

Trino (formerly known as PrestoSQL) is an open source distributed SQL query engine.

Official Docker Image Is Large

The Docker image provided by the Trino team (trinodb/trino) is 1.32 GB when extracted. It includes a full CentOS distribution, which is a safe and comfortable choice, but that is a lot of weight when Trino is embedded into another application.

Picking a Smaller Base Image

Much of the weight of the official Trino container comes from its CentOS base image.

    Dockerfile

    FROM azul/zulu-openjdk-centos:11

Switching to an Alpine-based distribution like adoptopenjdk cuts the download size dramatically.

    Dockerfile

    FROM adoptopenjdk/openjdk11:jdk-11.0.10_9-alpine-slim

Pick your Alpine distribution carefully! We've seen significant performance degradations for Java applications when using Alpine distributions that don't include glibc. The adoptopenjdk containers have good performance while still being relatively small.
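
To see what a base-image change buys you, it helps to measure image sizes locally. The following is a minimal sketch, not part of the original build: `bytes_to_mb` and `image_size_mb` are hypothetical helper names, and it assumes a local Docker daemon and that `docker image inspect` reports the size in bytes.

```shell
# Hypothetical helpers for comparing image sizes (names are illustrative).

# Convert a byte count to whole decimal megabytes (1 MB = 1,000,000 bytes).
bytes_to_mb() {
    echo $(( $1 / 1000000 ))
}

# Print the size of a local image in MB.
# Assumes Docker is installed and the image has already been pulled.
image_size_mb() {
    bytes_to_mb "$(docker image inspect --format '{{.Size}}' "$1")"
}
```

For example, `image_size_mb trinodb/trino` and `image_size_mb` run against your own build give two numbers that can be compared directly.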

Reducing the Number of Connectors

The next step is optional, but has a big impact on container size. Trino ships with many pre-installed connectors, each of which requires supporting libraries.

However, these connectors aren't all strictly required. For our single-container distributions, we strip out all the optional connectors except for our own Resurface connector.

    Shell

    rm -rf /opt/trino/plugin/accumulo && \
    rm -rf /opt/trino/plugin/atop && \
    rm -rf /opt/trino/plugin/bigquery && \
    rm -rf /opt/trino/plugin/blackhole && \
    rm -rf /opt/trino/plugin/cassandra && \
    rm -rf /opt/trino/plugin/clickhouse && \
    rm -rf /opt/trino/plugin/druid && \
    rm -rf /opt/trino/plugin/elasticsearch && \
    rm -rf /opt/trino/plugin/example-http && \
    rm -rf /opt/trino/plugin/geospatial && \
    rm -rf /opt/trino/plugin/google-sheets && \
    rm -rf /opt/trino/plugin/hive-hadoop2 && \
    rm -rf /opt/trino/plugin/iceberg && \
    rm -rf /opt/trino/plugin/jmx && \
    rm -rf /opt/trino/plugin/kafka && \
    rm -rf /opt/trino/plugin/kinesis && \
    rm -rf /opt/trino/plugin/kudu && \
    rm -rf /opt/trino/plugin/local-file && \
    rm -rf /opt/trino/plugin/memsql && \
    rm -rf /opt/trino/plugin/ml && \
    rm -rf /opt/trino/plugin/mongodb && \
    rm -rf /opt/trino/plugin/mysql && \
    rm -rf /opt/trino/plugin/oracle && \
    rm -rf /opt/trino/plugin/phoenix && \
    rm -rf /opt/trino/plugin/phoenix5 && \
    rm -rf /opt/trino/plugin/pinot && \
    rm -rf /opt/trino/plugin/postgresql && \
    rm -rf /opt/trino/plugin/prometheus && \
    rm -rf /opt/trino/plugin/raptor-legacy && \
    rm -rf /opt/trino/plugin/redis && \
    rm -rf /opt/trino/plugin/redshift && \
    rm -rf /opt/trino/plugin/sqlserver && \
    rm -rf /opt/trino/plugin/teradata-functions && \
    rm -rf /opt/trino/plugin/thrift && \
    rm -rf /opt/trino/plugin/tpcds && \
    rm -rf /opt/trino/plugin/tpch
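
An explicit list like the one above has to be updated whenever a Trino release adds a connector. An alternative, sketched here as an assumption rather than as the approach the Resurface Dockerfile takes, is to invert the logic and delete everything that is not on a keep list. `prune_plugins` is a hypothetical helper name.

```shell
# Sketch: remove every plugin directory except those named in a keep list.
# Usage: prune_plugins PLUGIN_DIR KEEP_NAME...
prune_plugins() {
    plugin_dir="$1"
    shift
    for path in "$plugin_dir"/*/; do
        # If the glob matched nothing, $path is the literal pattern; skip it.
        [ -d "$path" ] || continue
        name="$(basename "$path")"
        keep=false
        for wanted in "$@"; do
            if [ "$name" = "$wanted" ]; then
                keep=true
            fi
        done
        if [ "$keep" = false ]; then
            rm -rf "$path"
        fi
    done
}
```

A call might look like `prune_plugins /opt/trino/plugin resurface`, where `resurface` is a placeholder for whatever directory name your own connector uses. The tradeoff is that a keep list silently removes anything new, so it is worth checking that the server still starts after an upgrade.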
  

Tuning Memory Parameters

Trino is very tunable when it comes to memory usage. But beyond that, the Trino team doesn't discourage small configurations. When I had the chance to ask Martin Traverso about this, his reaction was that they expect Trino to pass all tests when running on a small laptop-sized configuration, just the same as on a large configuration. The fact that Martin reacted this way gave us renewed confidence to experiment with smaller configurations.

For our smallest containers, we limit Trino to 1 GB of memory using these standard parameters.

    Properties

    query.max-length=1000000
    query.max-memory=1000MB
    query.max-memory-per-node=1000MB
    query.max-total-memory=1000MB
    query.max-total-memory-per-node=1000MB
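
Since all four memory limits share one value, they can be generated from a single number when building containers of different sizes. This is a minimal sketch under that assumption: `memory_properties` is a hypothetical helper, and the property names are the ones listed above, so check them against the documentation for your Trino version before relying on them.

```shell
# Sketch: print the memory-related properties for a given limit in megabytes.
# Usage: memory_properties LIMIT_MB
memory_properties() {
    limit_mb="$1"
    cat <<EOF
query.max-memory=${limit_mb}MB
query.max-memory-per-node=${limit_mb}MB
query.max-total-memory=${limit_mb}MB
query.max-total-memory-per-node=${limit_mb}MB
EOF
}
```

The output can be appended to the server configuration at build time, for example `memory_properties 1000 >> /etc/trino/config.properties`, where the file path is an assumption about your image layout.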
  

If you're still seeing out-of-memory conditions, you may also want to reduce the memory used to retain query history. This is especially important if your SQL statements are large, or if your transaction rates are high enough that a lot of query history is being held in memory.

    Properties

    query.max-history=20
    query.min-expire-age=1s

Final Results

Following these steps yields a stable and high-performing Trino configuration that is 391 MB. That's about 30% of the download size of the standard Trino container! This doesn't come without tradeoffs, but it's great to have this range of flexibility.
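
The percentage is easy to check, or to recompute for your own build. The sketch below uses the two sizes quoted in this article and assumes decimal units (1 GB = 1000 MB); `size_pct` is a hypothetical helper name.

```shell
# Sketch: express one image size as a percentage of another.
# Both arguments must be in the same unit.
size_pct() {
    awk -v small="$1" -v large="$2" \
        'BEGIN { printf "%.1f\n", small / large * 100 }'
}

# 391 MB against 1.32 GB (taken as 1320 MB)
size_pct 391 1320   # prints 29.6
```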

If you're looking for a minimal Trino container image, you can use ours as a base. (The version tag corresponds to the Trino version.)

    Dockerfile

    FROM resurfaceio/trino-minimal:358

Or you can inspect this Dockerfile for ideas on how to build your own lightweight Trino image.

https://github.com/resurfaceio/containers/blob/master/trino/trino-minimal.dockerfile

Published at DZone with permission of Rob Dickinson. See the original article here.
