Building a Retrieval-Augmented Generation (RAG) System in Java With Spring AI, Vertex AI, and BigQuery

Build a Java RAG application using Spring Boot, Vertex AI embeddings, BigQuery vector search, and a web UI for interactive PDF-based question answering.

Nov. 24, 25 · Tutorial

Retrieval-augmented generation (RAG) is quickly becoming one of the most powerful design patterns for AI applications. It bridges the gap between general-purpose large language models (LLMs) and your specific enterprise data. In this article, we’ll walk through how to build a complete RAG pipeline in Java using Spring Boot, Vertex AI’s Gemini embeddings, Apache PDFBox, and BigQuery Vector Search.

You will see how to do the following, wrapped in a Spring Boot app with a simple web UI:

Upload a PDF
Generate embeddings using Vertex AI
Store them in BigQuery
Ask natural-language questions against your document

What Is RAG?

A retrieval-augmented generation (RAG) system enhances an LLM’s output by combining retrieval (search) and generation (LLM response). It works by fetching the most relevant document chunks before generating an answer, ensuring contextually accurate, up-to-date, and source-grounded responses.

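Here’s the conceptual flow: documents are parsed and split into chunks, each chunk is embedded and stored alongside its text, and at query time the question is embedded the same way so that the closest chunks can be retrieved and used as context for the answer.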

Introducing Spring AI

While frameworks like LangChain dominate Python-based GenAI development, Java developers now have a native, production-ready alternative: Spring AI. Built and maintained by the Spring team, Spring AI extends the familiar Spring Boot ecosystem to the world of large-language-model applications.

What Spring AI Does

Spring AI provides a simple, consistent abstraction layer for calling AI models (text generation, embeddings, or chat) without dealing with raw REST endpoints or authentication boilerplate. It manages:

Model configuration through application.properties
Prompt orchestration and message handling
Credential resolution using Google Cloud, OpenAI, or other providers
Seamless integration with the rest of your Spring Boot application stack

In this project, Spring AI handles communication with Vertex AI’s gemini-embedding-001 model, simplifying API calls for embedding generation while remaining fully compatible with Spring WebFlux and dependency injection.

Why It’s Useful Here

Integrating Spring AI lets us:

Use the same Spring idioms (Beans, Controllers, Configuration) to build GenAI apps
Easily switch between embedding and chat models from different providers
Keep the application lightweight and cloud-ready for deployment on Cloud Run
Maintain testability and observability consistent with enterprise Spring Boot services

How It Works

1. PDF Upload and Parsing

When a user uploads a PDF through the web interface, it’s processed by Apache PDFBox — a reliable library for extracting text from PDF documents.

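    Java

import org.apache.pdfbox.text.PDFTextStripper;
// "document" is the PDDocument opened from the uploaded file
PDFTextStripper stripper = new PDFTextStripper();
String text = stripper.getText(document);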

The text is then split into manageable chunks (e.g., 500 characters with 100-character overlap) to make retrieval more precise.

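    Java

private List<String> chunkText(String text, int chunkSize, int overlap) {
    if (overlap < 0 || overlap >= chunkSize) {
        // A non-positive step would loop forever
        throw new IllegalArgumentException("overlap must be >= 0 and < chunkSize");
    }
    List<String> chunks = new ArrayList<>();
    int step = chunkSize - overlap;
    for (int i = 0; i < text.length(); i += step) {
        int end = Math.min(text.length(), i + chunkSize);
        chunks.add(text.substring(i, end));
        if (end == text.length()) break; // text fully covered; skip a tail chunk that only repeats the overlap
    }
    return chunks;
}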
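
To make the window arithmetic concrete, the sketch below lists the character ranges that a sliding window covers for a given text length. The class name `ChunkWindows` and the `Window` record are hypothetical names chosen for this illustration, and stopping once the window reaches the end of the text is an assumption about the desired behavior rather than something the original project is known to do.

```java
import java.util.ArrayList;
import java.util.List;

public class ChunkWindows {

    /** Half-open [start, end) character range of one chunk. */
    record Window(int start, int end) {}

    /**
     * Lists the ranges covered by a sliding window of chunkSize characters
     * that advances by (chunkSize - overlap) on every step.
     */
    static List<Window> windows(int textLength, int chunkSize, int overlap) {
        if (chunkSize <= 0 || overlap < 0 || overlap >= chunkSize) {
            throw new IllegalArgumentException("need 0 <= overlap < chunkSize");
        }
        List<Window> result = new ArrayList<>();
        int step = chunkSize - overlap;
        for (int start = 0; start < textLength; start += step) {
            int end = Math.min(textLength, start + chunkSize);
            result.add(new Window(start, end));
            if (end == textLength) {
                break; // the text is fully covered
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // 1,200 characters, 500-character chunks, 100-character overlap
        for (Window w : windows(1200, 500, 100)) {
            System.out.println(w.start() + ".." + w.end());
        }
        // prints 0..500, 400..900, and 800..1200 on separate lines
    }
}
```

Each chunk shares its first 100 characters with the end of the previous one, so a sentence that straddles a boundary still appears whole in at least one chunk as long as it is shorter than the overlap.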
  

2. Generating Embeddings With Vertex AI

Each chunk is sent to Vertex AI’s gemini-embedding-001 model to get a 3072-dimensional embedding vector representing its semantic meaning.

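    Java

String url = String.format(
    "/v1/projects/%s/locations/%s/publishers/google/models/gemini-embedding-001:predict",
    projectId, location
);
// Pass a Map so the JSON encoder escapes quotes and newlines in the chunk text;
// concatenating the text into a JSON string by hand produces invalid JSON for such input
Map<String, Object> body = Map.of("instances", List.of(Map.of("content", text)));
String response = webClient.post()
    .uri(url)
    .contentType(MediaType.APPLICATION_JSON)
    .bodyValue(body)
    .retrieve()
    .bodyToMono(String.class)
    .block();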
  

The resulting embedding vectors are stored in BigQuery as an ARRAY<FLOAT64> column.
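
Before the vector can be stored, it has to be pulled out of the response string. The following is a minimal, dependency-free sketch of that step. It assumes the response nests the vector under a `"values"` array (as in `predictions[0].embeddings.values`); that shape, the class name `EmbeddingResponseParser`, and the sample payload are assumptions made for illustration. In a real Spring Boot application, a JSON library such as Jackson would be the more robust choice than a regular expression.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EmbeddingResponseParser {

    // Matches the first "values": [ ... ] array in the response body.
    private static final Pattern VALUES =
        Pattern.compile("\"values\"\\s*:\\s*\\[([^\\]]*)\\]");

    /** Extracts the embedding vector from a predict response (assumed shape). */
    static List<Double> parseValues(String responseJson) {
        Matcher m = VALUES.matcher(responseJson);
        if (!m.find()) {
            throw new IllegalArgumentException("no \"values\" array in response");
        }
        List<Double> vector = new ArrayList<>();
        for (String token : m.group(1).split(",")) {
            if (!token.isBlank()) {
                vector.add(Double.parseDouble(token.trim()));
            }
        }
        return vector;
    }

    public static void main(String[] args) {
        // Hypothetical, heavily shortened response; a real vector has 3072 entries.
        String sample = """
            {"predictions":[{"embeddings":{"values":[0.12, 0.45, -0.23]}}]}
            """;
        System.out.println(parseValues(sample)); // prints [0.12, 0.45, -0.23]
    }
}
```

A `List<Double>` maps directly onto the `ARRAY<FLOAT64>` column when the row is inserted.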

3. Storing and Searching in BigQuery

Each embedding, along with its text chunk and metadata, is inserted into a BigQuery table:

    SQL

CREATE TABLE rag_dataset.doc_embeddings (
  doc_id STRING,
  chunk_id STRING,
  content STRING,
  embedding ARRAY<FLOAT64>
);
  

The app uses the BigQuery Java SDK to insert rows:

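    Java

TableId tableId = TableId.of("rag_dataset", "doc_embeddings");
InsertAllRequest insertRequest = InsertAllRequest.newBuilder(tableId)
    .addRow(Map.of(
        "doc_id", docId,
        "chunk_id", chunkId,
        "content", content,
        "embedding", embedding
    ))
    .build();
// insertAll reports row-level failures in the response instead of throwing
InsertAllResponse insertResponse = bigQuery.insertAll(insertRequest);
if (insertResponse.hasErrors()) {
    throw new IllegalStateException("BigQuery insert failed: " + insertResponse.getInsertErrors());
}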
  

When a user asks a question, the app embeds it the same way and runs a vector similarity search in BigQuery using the VECTOR_SEARCH function:

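    SQL

-- VECTOR_SEARCH returns each matched row as a struct named base, plus a distance column
SELECT base.content, distance
FROM VECTOR_SEARCH(
  TABLE rag_dataset.doc_embeddings,
  'embedding',
  (SELECT [0.12, 0.45, -0.23, ...] AS embedding),
  top_k => 3,
  distance_type => 'COSINE'
)
ORDER BY distance;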
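
To show what a cosine-distance top-k search computes, here is a small in-memory sketch in Java. It is a brute-force illustration of the idea, not BigQuery's implementation (BigQuery can also answer the query approximately when a vector index exists). The class `CosineTopK`, the `Row` record, and the two-dimensional sample vectors are hypothetical.

```java
import java.util.Comparator;
import java.util.List;

public class CosineTopK {

    /** A stored chunk and its embedding; stands in for one table row. */
    record Row(String content, double[] embedding) {}

    /** Cosine distance = 1 - cosine similarity; smaller means more similar. */
    static double cosineDistance(double[] a, double[] b) {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return 1.0 - dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    /** Brute-force top-k: sort every row by distance to the query, keep the first k. */
    static List<String> topK(List<Row> rows, double[] query, int k) {
        return rows.stream()
            .sorted(Comparator.comparingDouble((Row r) -> cosineDistance(r.embedding(), query)))
            .limit(k)
            .map(Row::content)
            .toList();
    }

    public static void main(String[] args) {
        List<Row> rows = List.of(
            new Row("refund policy", new double[] {1.0, 0.0}),
            new Row("shipping times", new double[] {0.0, 1.0}),
            new Row("return window", new double[] {0.9, 0.1}));
        System.out.println(topK(rows, new double[] {1.0, 0.0}, 2));
        // prints [refund policy, return window]
    }
}
```

Because cosine distance depends only on the angle between vectors, the magnitude of an embedding does not affect the ranking.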
  

4. Presenting Answers via the Web UI

The application returns the most semantically relevant chunks to the web interface, giving users an immediate, context-rich response.

The simple Thymeleaf-based frontend lets you:

Upload PDFs
Ask questions
View results in real time

    HTML

<form action="/api/upload" method="post" enctype="multipart/form-data">
  <input type="file" name="file" />
  <button type="submit">Upload</button>
</form>

<form id="askForm">
  <input type="text" id="question" name="question" placeholder="Ask your question" />
  <button type="submit">Ask</button>
</form>
  

Building and Running the Application

Prerequisites

Java 17+
Maven 3.8+
Google Cloud SDK with Vertex AI and BigQuery APIs enabled
Application Default Credentials (gcloud auth application-default login)

Build and Run

    Shell

mvn clean install
mvn spring-boot:run

Then open http://localhost:8080 to access the UI.

Dataset and Table Setup

Use the BigQuery console or CLI to create your dataset and table:

    Shell

bq mk rag_dataset
bq query --use_legacy_sql=false \
'CREATE TABLE rag_dataset.doc_embeddings (
  doc_id STRING,
  chunk_id STRING,
  content STRING,
  embedding ARRAY<FLOAT64>
);'
  

Conclusion

With just a few hundred lines of Java and Spring Boot code, you can stand up a working RAG pipeline on Google Cloud. This architecture cleanly separates ingestion, embedding, and retrieval, which makes it a solid starting point for enterprise AI applications.

View the full application on GitHub here.

