DZone
Thanks for visiting DZone today,
Edit Profile
  • Manage Email Subscriptions
  • How to Post to DZone
  • Article Submission Guidelines
Sign Out View Profile
  • Post an Article
  • Manage My Drafts
Over 2 million developers have joined DZone.
Log In / Join
Please enter at least three characters to search
Refcards Trend Reports
Events Video Library
Refcards
Trend Reports

Events

View Events Video Library

Zones

Culture and Methodologies Agile Career Development Methodologies Team Management
Data Engineering AI/ML Big Data Data Databases IoT
Software Design and Architecture Cloud Architecture Containers Integration Microservices Performance Security
Coding Frameworks Java JavaScript Languages Tools
Testing, Deployment, and Maintenance Deployment DevOps and CI/CD Maintenance Monitoring and Observability Testing, Tools, and Frameworks
Culture and Methodologies
Agile Career Development Methodologies Team Management
Data Engineering
AI/ML Big Data Data Databases IoT
Software Design and Architecture
Cloud Architecture Containers Integration Microservices Performance Security
Coding
Frameworks Java JavaScript Languages Tools
Testing, Deployment, and Maintenance
Deployment DevOps and CI/CD Maintenance Monitoring and Observability Testing, Tools, and Frameworks

The software you build is only as secure as the code that powers it. Learn how malicious code creeps into your software supply chain.

Apache Cassandra combines the benefits of major NoSQL databases to support data management needs not covered by traditional RDBMS vendors.

Generative AI has transformed nearly every industry. How can you leverage GenAI to improve your productivity and efficiency?

Modernize your data layer. Learn how to design cloud-native database architectures to meet the evolving demands of AI and GenAI workloads.

Related

  • AI Summarization: Extractive and Abstractive Techniques
  • Applying ML and AI for Effective Spamhaus Email Etiquette
  • AI-Driven Intent-Based Networking: The Future of Network Management Using AI
  • How Machine Learning and AI are Transforming Healthcare Diagnostics in Mobile Apps

Trending

  • Rust, WASM, and Edge: Next-Level Performance
  • Endpoint Security Controls: Designing a Secure Endpoint Architecture, Part 2
  • IoT and Cybersecurity: Addressing Data Privacy and Security Challenges
  • System Coexistence: Bridging Legacy and Modern Architecture
  1. DZone
  2. Data Engineering
  3. AI/ML
  4. Getting Started With GenAI on BigQuery: A Step-by-Step Guide

Getting Started With GenAI on BigQuery: A Step-by-Step Guide

This article shows how to use Google Cloud's BigQuery ML and Gemini to analyze and summarize text data directly within BigQuery using SQL and ML.GENERATE_TEXT function.

By 
Nivedita Kumari user avatar
Nivedita Kumari
·
May. 08, 25 · Tutorial
Likes (0)
Comment
Save
Tweet
Share
2.2K Views

Join the DZone community and get the full member experience.

Join For Free

Overview

Companies are continuously looking for methods to extract important information from their continuously growing databases. Machine learning has become a game-changer in this mission, allowing us to predict user behavior, discover emerging trends, and automate complex decision-making.

Google Cloud's BigQuery ML has completely redefined the way we deal with ML by bringing model training directly to the data warehouse. This means we can come up with sophisticated ML models by writing simple SQL queries and deploying them, which removes the need to move complex data or integrate with other separate ML platforms.

What if you could elevate your data analysis even more? Picture being able to easily analyze and grasp the nuances of textual data, revealing a treasure of insights buried in customer feedback, social media comments, or product reviews. This is where Gemini, Google's advanced family of large language models, steps in. Gemini provides natural language processing (NLP) features, allowing us to carry out tasks such as sentiment analysis, topic extraction, and text classification with unmatched precision.

This comprehensive guide unveils the power of generative AI within the BigQuery environment, offering a practical, hands-on approach to harnessing its potential. A key highlight of this post is that it will walk you step by step on how you can use Gemini model to summarize Gemini product release notes, which will provide us with the most up-to-date insights into this groundbreaking technology. And, for this we will be using a BigQuery public dataset which stores information on product releases. 

Setup Instructions

  1. Before starting, choose your GCP project, link a billing account, and enable the necessary API; full instructions here.
  2. Create a BigQuery dataset that will store our model, following the steps here.
  3. Create a cloud resource connection and get the connection's service account; the full guide is here.
  4. Grant access to the service account by following the steps here. 
  5. Let’s take a look at the dataset that we would be using:
SQL
 
SELECT * FROM `bigquery-public-data.google_cloud_release_notes.release_notes`
LIMIT 10;


Product Release Notes Summary

Let’s walk through an example of summarizing the release notes for Gemini.

1. Create a Model

Create a remote model in BigQuery that utilizes a Vertex AI foundation model.

Syntax:

SQL
 
CREATE OR REPLACE MODEL
`PROJECT_ID.DATASET_ID.MODEL_NAME`
REMOTE WITH CONNECTION `PROJECT_ID.REGION.CONNECTION_ID`
OPTIONS (ENDPOINT = 'ENDPOINT');


Code example:

  1. Replace '[PROJECT_ID.DATASET_ID.MODEL_NAME]' with your project_id, dataset_id, and model name.
  2. Replace '[PROJECT_ID.REGION.CONNECTION_ID]' with your project_id, region, and connection_id.
SQL
 
CREATE OR REPLACE MODEL
`[PROJECT_ID.DATASET_ID.MODEL_NAME]`
REMOTE WITH CONNECTION `[PROJECT_ID.REGION.CONNECTION_ID]`
OPTIONS (ENDPOINT = 'gemini-pro');


2. Let’s Create a Sample of the Dataset

For this example, we would only work with a subset of the data. We will focus on release_note_type for the product “Gemini” starting from the year 2023. Hence, we would create a table for this subset of the dataset. 

We are also creating a new column called 'combined_description', where we are aggregating the value of the column description based on release_note_type.

Replace '[PROJECT_ID.DATASET_ID.TABLE_NAME]' with your project_id, dataset_id, and table_name:

SQL
 
CREATE or REPLACE TABLE `[PROJECT_ID.DATASET_ID.TABLE_NAME]` AS
SELECT
 release_note_type,
 ARRAY_TO_STRING(ARRAY_AGG(description), "; ") AS combined_description
FROM
`bigquery-public-data.google_cloud_release_notes.release_notes`
WHERE published_at > '2023-01-01'
AND product_name = "Gemini"
GROUP BY
 release_note_type;
 
 SELECT * FROM `[PROJECT_ID.DATASET_ID.TABLE_NAME]`;


3. Generate Text

With just a few lines of SQL, we can analyze text or visual content in our BigQuery table using that model and the ML.GENERATE_TEXT function.

The ML.GENERATE_TEXT syntax differs depending on the Vertex AI model that your remote model targets. Read the documentation to understand all the parameters of the ML.GENERATE_TEXT function.

Syntax:

SQL
 
ML.GENERATE_TEXT(
MODEL project_id.dataset.model,
{ TABLE project_id.dataset.table | (query_statement) },
STRUCT(
  [max_output_tokens AS max_output_tokens]
  [, top_k AS top_k]
  [, top_p AS top_p]
  [, temperature AS temperature]
  [, flatten_json_output AS flatten_json_output]
  [, stop_sequences AS stop_sequences])
)


Code example:

  1. Replace '[PROJECT_ID.DATASET_ID.TABLE_NAME]' with your project_id and dataset_id.
  2. Replace '[PROJECT_ID.DATASET_ID.MODEL_NAME]' with your project_id, dataset_id, and model name.
SQL
 

CREATE OR REPLACE TABLE `[PROJECT_ID.DATASET_ID.TABLE_NAME]` AS

WITH 
PROMPT AS (
 SELECT release_note_type, combined_description, CONCAT('You are a helpful AI assistant summarizing Google Cloud release notes.  Your task is to provide a concise summary of the  release notes ' , combined_description)
 AS prompt
   FROM 
   `[PROJECT_ID.DATASET_ID.TABLE_NAME]`
),
SUMMARY_GENERATION AS (
 SELECT *
 FROM
 ML.GENERATE_TEXT(
   MODEL `[PROJECT_ID.DATASET_ID.MODEL_NAME]`,
   (SELECT * FROM PROMPT),
   STRUCT(150 AS max_output_tokens, 
          0.25 AS temperature,
          35 AS top_k, 
          1.0 AS top_p, 
          TRUE AS flatten_json_output))
)

SELECT release_note_type, combined_description, ml_generate_text_llm_result as review_summary, FROM SUMMARY_GENERATION;

SELECT * FROM `[PROJECT_ID.DATASET_ID.TABLE_NAME]`


4. Result

We instructed the model to act as an "AI assistant summarizing Google Cloud release notes". The model then processed these notes and generated a concise summary.

Result

The ML.GENERATE_TEXT function returns the input table plus the following columns:

  1. ml_generate_text_result: This is the JSON response, and the generated text is in the text element.
  2. ml_generate_text_llm_result: A STRING value that contains the generated text. This column is returned when flatten_json_output is TRUE.
  3. ml_generate_text_status: A STRING value that contains the API response status for the corresponding row. This value is empty if the operation was successful. 
  4. ml_generate_text_grounding_result: A STRING value that contains a list of the grounding sources that the model used to gather additional information.

Conclusion

Overall, we can use Gemini to do prompt engineering and analyze our BigQuery data row by row at scale. BigQuery's integration with the Gemini models through Vertex AI allows us to interact with Gemini using SQL and take advantage of the scale of BigQuery. It helps us analyze structured, semi-structured, and unstructured data to gain new insights and enable novel, analytical applications.

We can use Gemini for several tasks, including text summarization, questioning and answering, feature extraction, and sentiment analysis. It can also help generate content and enhance data. Furthermore, the Gemini family of models, with its advanced multimodality feature, can take input like a combination of text, images, or video. This opens doors for numerous use cases like object recognition, captioning, description, and digital content understanding.

Check out the ML.GENERATE_TEXT function to learn more about available models and how to generate text using the ML.GENERATE_TEXT function here.

AI Machine learning NLP

Opinions expressed by DZone contributors are their own.

Related

  • AI Summarization: Extractive and Abstractive Techniques
  • Applying ML and AI for Effective Spamhaus Email Etiquette
  • AI-Driven Intent-Based Networking: The Future of Network Management Using AI
  • How Machine Learning and AI are Transforming Healthcare Diagnostics in Mobile Apps

Partner Resources

×

Comments
Oops! Something Went Wrong

The likes didn't load as expected. Please refresh the page and try again.

ABOUT US

  • About DZone
  • Support and feedback
  • Community research
  • Sitemap

ADVERTISE

  • Advertise with DZone

CONTRIBUTE ON DZONE

  • Article Submission Guidelines
  • Become a Contributor
  • Core Program
  • Visit the Writers' Zone

LEGAL

  • Terms of Service
  • Privacy Policy

CONTACT US

  • 3343 Perimeter Hill Drive
  • Suite 100
  • Nashville, TN 37211
  • support@dzone.com

Let's be friends:

Likes
There are no likes...yet! 👀
Be the first to like this post!
It looks like you're not logged in.
Sign in to see who liked this post!