
How to Run SQL Queries With Presto on Google BigQuery

In this hands-on tutorial, you'll learn how to run SQL queries with Presto, the open source SQL query engine for data lakes, on Google BigQuery.

By Rohan Pednekar and George Wang · Aug. 02, 21 · Tutorial


Presto has evolved into a unified SQL engine on top of cloud data lakes for both interactive queries and batch workloads across multiple data sources. This tutorial will show you how to run SQL queries with Presto (running on Kubernetes) on Google BigQuery.

Presto’s BigQuery connector allows querying data stored in BigQuery and can be used to join that data with data in other systems, such as Hive. The connector uses the BigQuery Storage API to read data from the tables.

Step 1: Set up a Presto Cluster With Kubernetes 

Set up your own Presto cluster on Kubernetes using these instructions, or use Ahana’s managed service for Presto.

Step 2: Set up a Google BigQuery Project With Google Cloud Platform

Create a Google BigQuery project from the Google Cloud console and make sure it’s up and running with a dataset and tables, as described here.
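If you prefer the command line, the dataset can also be created with the bq tool that ships with the Google Cloud SDK; the dataset name rohan88 below is only an illustration that matches the queries used later in this tutorial:

bq mk --dataset <your_project_id>:rohan88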

The screen below shows the Google BigQuery project with the table “Flights.”


Step 3: Set up a Key and Download Google BigQuery Credential JSON File

To authenticate the BigQuery connector to access the BigQuery tables, create a credential key and download it in JSON format. 

Use a service account JSON key and GOOGLE_APPLICATION_CREDENTIALS as described here. 
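As a quick sanity check outside of Presto, most Google client tools and libraries can pick up the key through the GOOGLE_APPLICATION_CREDENTIALS environment variable; the path below is a placeholder:

export GOOGLE_APPLICATION_CREDENTIALS=/path/to/bigquery-credential.json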


A sample credential file should look like this:

JSON

{
  "type": "service_account",
  "project_id": "poised-journey-315406",
  "private_key_id": "5e66dd1787bb1werwerd5ddf9a75908b7dfaf84c",
  "private_key": "-----BEGIN PRIVATE KEY-----\nMIIEvgIBADANBgkqhkiG9w0BAQEFAASCBKgwgKozSEK84b\ntNDXrwaTGbP8ZEddTSzMZQxcX7j3t4LQK98OO53i8Qgk/fEy2qaFuU2yM8NVxdSr\n/qRpsTL/TtDi8pTER0fPzdgYnbfXeR1Ybkft7+SgEiE95jzJCD/1+We1ew++JzAf\nZBNvwr4J35t15KjQHQSa5P1daG/JufsxytY82fW02JjTa/dtrTMULAFOSK2OVoyg\nZ4feVdxA2TdM9E36Er3fGZBQHc1rzAys4MEGjrNMfyJuHobmAsx9F/N5s4Cs5Q/1\neR7KWhac6BzegPtTw2dF9bpccuZRXl/mKie8EUcFD1xbXjum3NqMp4Gf7wxYgwkx\n0P+90aE7AgMBAAECggEAImgvy5tm9JYdmNVzbMYacOGWwjILAl1K88n02s/x09j6\nktHJygUeGmp2hnY6e11leuhiVcQ3XpesCwcQNjrbRpf1ajUOTFwSb7vfj7nrDZvl\n4jfVl1b6+yMQxAFw4MtDLD6l6ljKSQwhgCjY/Gc8yQY2qSd+Pu08zRc64x+IhQMn\nne1x0DZ2I8JNIoVqfgZd0LBZ6OTAuyQwLQtD3KqtX9IdddXVfGR6/vIvdT4Jo3en\nBVHLENq5b8Ex7YxnT49NEXfVPwlCZpAKUwlYBr0lvP2WsZakNCKnwMgtUKooIaoC\nSBxXrkmwQoLA0DuLO2B7Bhqkv/7zxeJnkFtKVWyckQKBgQC4GBIlbe0IVpquP/7a\njvnZUmEuvevvqs92KNSzCjrO5wxEgK5Tqx2koYBHhlTPvu7tkA9yBVyj1iuG+joe\n5WOKc0A7dWlPxLUxQ6DsYzNW0GTWHLzW0/YWaTY+GWzyoZIhVgL0OjRLbn5T7UNR\n25opELheTHvC/uSkwA6zM92zywKBgQC3PWZTY6q7caNeMg83nIr59+oYNKnhVnFa\nlzT9Yrl9tOI1qWAKW1/kFucIL2/sAfNtQ1td+EKb7YRby4WbowY3kALlqyqkR6Gt\nr2dPIc1wfL/l+L76IP0fJO4g8SIy+C3Ig2m5IktZIQMU780s0LAQ6Vzc7jEV1LSb\nxPXRWVd6UQKBgQCqrlaUsVhktLbw+5B0Xr8zSHel+Jw5NyrmKHEcFk3z6q+rC4uV\nMz9mlf3zUo5rlmC7jSdk1afQlw8ANBuS7abehIB3ICKlvIEpzcPzpv3AbbIv+bDz\nlM3CdYW/CZ/DTR3JHo/ak+RMU4N4mLAjwvEpRcFKXKsaXWzres2mRF43BQKBgQCY\nEf+60usdVqjjAp54Y5U+8E05u3MEzI2URgq3Ati4B4b4S9GlpsGE9LDVrTCwZ8oS\n8qR/7wmwiEShPd1rFbeSIxUUb6Ia5ku6behJ1t69LPrBK1erE/edgjOR6SydqjOs\nxcrW1yw7EteQ55aaS7LixhjITXE1Eeq1n5b2H7QmkQKBgBaZuraIt/yGxduCovpD\nevXZpe0M2yyc1hvv/sEHh0nUm5vScvV6u+oiuRnACaAySboIN3wcvDCIJhFkL3Wy\nbCsOWDtqaaH3XOquMJtmrpHkXYwo2HsuM3+g2gAeKECM5knzt4/I2AX7odH/e1dS\n0jlJKzpFpvpt4vh2aSLOxxmv\n-----END PRIVATE KEY-----\n",
  "client_email": "bigquery@poised-journey-678678.iam.gserviceaccount.com",
  "client_id": "11488612345677453667",
  "auth_uri": "https://accounts.google.com/o/oauth2/auth",
  "token_uri": "https://oauth2.googleapis.com/token",
  "auth_provider_x509_cert_url": "https://www.googleapis.com/oauth2/v1/certs",
  "client_x509_cert_url": "https://www.googleapis.com/robot/v1/metadata/x509/bigquery%40poised-journey-315406.iam.gserviceaccount.com"
}


Pro tip: Before you move to the next step, try using your downloaded credential JSON file with a third-party SQL tool like DBeaver to access your BigQuery table. This confirms that your credentials have valid access rights and helps isolate any issue with the credentials themselves.

Step 4: Configure Presto Catalog for Google BigQuery Connector

To configure the BigQuery connector, create a catalog properties file in etc/catalog named, for example, bigquery.properties, to mount the BigQuery connector as the bigquery catalog. Create the file with the following contents, replacing the connection properties as appropriate for your setup. On Kubernetes, do this by editing the catalog configmap so that the change is reflected in the deployment:

kubectl edit configmap presto-catalog -n <cluster_name> -o yaml 

The following are the catalog properties that need to be added:

connector.name=bigquery 
bigquery.project-id=<your Google Cloud Platform project id> 
bigquery.credentials-file=path/to/bigquery-credentials.json

The following are sample entries for the catalog YAML file:

bigquery.properties:  | 
  connector.name=bigquery 
  bigquery.project-id=poised-journey-317806 
  bigquery.credentials-file=/opt/presto-server/etc/bigquery-credential.json
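After editing the configmap, the Presto pods need to pick up the new catalog file. Assuming the deployment names used in the next step (presto-coordinator and presto-worker), one way to do this is a rolling restart:

kubectl rollout restart deployment presto-coordinator -n <cluster_name>
kubectl rollout restart deployment presto-worker -n <cluster_name>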

Step 5: Configure Presto Coordinator and Workers With Google BigQuery Credential File

To configure the BigQuery connector:

  1. Load the content of the credential file as bigquery-credential.json in the Presto coordinator’s configmap: 

kubectl edit configmap presto-coordinator-etc -n <cluster_name> -o yaml


  2. Add a new section of volumeMounts for the credential file in the coordinator’s deployment file: 

kubectl edit deployment presto-coordinator -n <cluster_name> 

The following is a sample configuration that you can append at the end of the volumeMounts section in your coordinator’s deployment file:

volumeMounts: 
- mountPath: /opt/presto-server/etc/bigquery-credential.json 
  name: presto-coordinator-etc-vol 
  subPath: bigquery-credential.json
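If your deployment file does not already define the presto-coordinator-etc-vol volume, a matching entry in the volumes section (assuming the configmap is named presto-coordinator-etc, as in the kubectl edit command above) would look roughly like this:

volumes: 
- name: presto-coordinator-etc-vol 
  configMap: 
    name: presto-coordinator-etc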

  3. Load the content of the credential file as bigquery-credential.json in the Presto worker’s configmap: 

kubectl edit configmap presto-worker-etc -n <cluster_name>  -o yaml


  4. Add a new section of volumeMounts for the credential file in the worker’s deployment file: 

kubectl edit deployment presto-worker -n <cluster_name> 

The following is a sample configuration that you can append at the end of the volumeMounts section in your worker’s deployment file:

volumeMounts: 
- mountPath: /opt/presto-server/etc/bigquery-credential.json 
  name: presto-worker-etc-vol 
  subPath: bigquery-credential.json
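To verify that the credential file is mounted where the catalog properties expect it, you can list it inside a pod; the pod name below is a placeholder:

kubectl exec -n <cluster_name> <presto-worker-pod> -- ls -l /opt/presto-server/etc/bigquery-credential.json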

Step 6: Set up Database Connection With Apache Superset

Create your own database connection URL to query from Superset with the following syntax:

presto://<username>:<password>@bq.rohan1.dev.app:443/<catalog_name>
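For the catalog configured above, a filled-in URL (with placeholder credentials and the bigquery catalog name from Step 4) would look like this:

presto://presto_user:presto_password@bq.rohan1.dev.app:443/bigquery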

Step 7: Check for Available Datasets, Schemas and Tables, etc.

After successfully connecting Superset to the database, run the following queries to make sure that the bigquery catalog gets picked up, then use show schemas and show tables to see the available data.

show catalogs;
show schemas from bigquery;
show tables from bigquery.rohan88;
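To inspect a table’s columns before querying it, Presto’s describe statement works against the BigQuery catalog as well; the table below is the one used in the next step:

describe bigquery.rohan88.flights;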


Step 8: Run SQL Query From Apache Superset to Access BigQuery Table

Once you access your database schema, you can run SQL queries against the tables as shown below. 

select * from catalog.schema.table;
select * from bigquery.rohan88.flights LIMIT 1;
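Beyond simple selects, the usual SQL aggregations work too. The following is just a sketch, assuming the Flights table has a column named origin (adjust to your actual schema):

select origin, count(*) as total_flights
from bigquery.rohan88.flights
group by origin
order by total_flights desc
LIMIT 10;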


You can perform similar queries from the Presto CLI as well. Here is another example of running SQL queries on a different BigQuery dataset from the Presto CLI.

$ ./presto-cli.jar --server https://<presto.cluster.url> \
  --catalog bigquery --schema <schema_name> \
  --user <presto_username> --password

The following example shows how you can join a Google BigQuery table with a Hive table from S3 and run SQL queries across both.
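Since the original screenshot is not reproduced here, the query below is only a sketch of such a cross-catalog join; the hive catalog, schema, table, and column names are hypothetical:

select f.origin, a.airport_name, count(*) as departures
from bigquery.rohan88.flights f
join hive.default.airports a on f.origin = a.iata_code
group by f.origin, a.airport_name
order by departures desc
LIMIT 10;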


And now you should be able to query BigQuery using Presto!


Published at DZone with permission of Rohan Pednekar. See the original article here.

