
Use pgvector With PostgreSQL To Improve LLM Accuracy and Performance

This article shows how to improve LLM accuracy and performance with pgvector and PostgreSQL. It explores embeddings and why the role of pgvector is crucial.

By Merlin Walter · Jan. 04, 24 · Tutorial

If you’re not yet familiar with the open-source pgvector extension for PostgreSQL, now’s the time to do so. The tool is extremely helpful for searching text data fast without needing a specialized database to store embeddings.

Embeddings represent word similarity and are stored as vectors (a list of numbers). For example, the words “tree” and “bush” are more closely related than “tree” and “automobile.” The open-source pgvector tool makes it possible to search for closely related vectors and find text with the same semantic meaning. This is a major advance for text-based data, and an especially valuable tool for building Large Language Models (LLMs)... and who isn’t right now?
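
As a quick illustration of that intuition, here is a minimal sketch that compares those embeddings directly. It assumes the LangChain OpenAIEmbeddings wrapper (used later in this article), numpy, and an OpenAI API key:

Python
 
# A minimal sketch: compare embedding similarity for "tree", "bush", and
# "automobile". Assumes langchain, numpy, and an OpenAI API key are available.
import numpy as np
from langchain.embeddings import OpenAIEmbeddings

embeddings = OpenAIEmbeddings(model="text-embedding-ada-002")

def cosine_similarity(a, b):
    # Dot product of the vectors divided by the product of their magnitudes
    a, b = np.array(a), np.array(b)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

tree = embeddings.embed_query("tree")
bush = embeddings.embed_query("bush")
automobile = embeddings.embed_query("automobile")

# Expect the tree/bush score to be higher than tree/automobile
print(cosine_similarity(tree, bush))
print(cosine_similarity(tree, automobile))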

By turning PostgreSQL into a high-performance vector store with distance-based embedding search capabilities, pgvector allows users to explore vast textual data easily. It supports both exact nearest neighbor search and approximate nearest neighbor search using L2 (or Euclidean) distance, inner product, and cosine distance. OpenAI recommends cosine distance for capturing semantic similarities efficiently.
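
To make those distance options concrete, here is a minimal sketch of pgvector's three query operators, run from Python against a hypothetical items table (the table and connection string are assumptions; the operators themselves come from pgvector):

Python
 
# A minimal sketch of pgvector's three distance operators, run from Python.
# The "items" table and the connection string are hypothetical.
import psycopg2

conn = psycopg2.connect("dbname=vectordb user=postgres")
cursor = conn.cursor()

query_vector = "[0.1,0.2,0.3]"  # pgvector accepts bracketed strings as vectors

# <-> is L2 (Euclidean) distance
cursor.execute("SELECT id FROM items ORDER BY embedding <-> %s::vector LIMIT 5",
               (query_vector,))

# <#> is negative inner product, so ascending order still ranks best matches first
cursor.execute("SELECT id FROM items ORDER BY embedding <#> %s::vector LIMIT 5",
               (query_vector,))

# <=> is cosine distance, the metric OpenAI recommends for its embeddings
cursor.execute("SELECT id FROM items ORDER BY embedding <=> %s::vector LIMIT 5",
               (query_vector,))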

Using Embeddings in Retrieval Augmented Generation (RAG) and LLMs

Embeddings can play a valuable role in the Retrieval Augmented Generation (RAG) process, which supplements an LLM with knowledge it wasn't trained on. The process includes retrieving relevant information from an external source, transforming it into a format the LLM can digest, and then feeding it to the LLM to generate text output.

Let’s make this concrete with an example. Searching documentation for answers to technical problems is something I’d bet anyone here has wasted countless hours on. In the example below, using documentation as the source, you generate embeddings and store them in PostgreSQL. When a user queries that documentation, the embeddings make it possible to represent the words in the query as vectors, perform a similarity search, and retrieve relevant pieces of the documentation from the database. The user’s query and the retrieved documentation are both passed to the LLM, which accurately delivers relevant documentation and sources that answer the query.

We tested out pgvector and embeddings using our own documentation at Instaclustr. Here are some example user search phrases to demonstrate how embeddings will plot them relative to one another: 

  • “Configure hard drive failure setting in Apache Cassandra” 
  • “Change storage settings in Redis” 
  • “Enterprise pricing for a 2-year commitment” 
  • “Raise a support ticket” 
  • “Connect to PostgreSQL using WebSockets” 

Embeddings plot the first two phrases nearest each other, even though they include none of the same words.


The LLM Context Window

Each LLM has a context window: the number of tokens it can process at once. This presents a trade-off. Models with a limited context window can falter on large inputs, while models trained with large context windows (100,000 tokens, or enough to fit a full book in a prompt) suffer from latency and must hold that full context in memory. The goal is to use the smallest context window that still generates useful answers. Embeddings help by making it possible to provide the LLM with only the data recognized as relevant, so that even an LLM with a tight context window isn’t overwhelmed.
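
As a rough illustration, you can count tokens before sending a prompt to see how much of the window it would consume. This sketch assumes OpenAI's tiktoken library and a hypothetical 4,096-token model:

Python
 
# A rough sketch of checking prompt size against a context window, assuming
# OpenAI's tiktoken tokenizer; the 4,096-token window is a hypothetical example.
import tiktoken

CONTEXT_WINDOW = 4096

encoding = tiktoken.encoding_for_model("gpt-3.5-turbo")
prompt = "..."  # the user question plus retrieved documentation chunks
token_count = len(encoding.encode(prompt))

if token_count > CONTEXT_WINDOW:
    print(f"Prompt is {token_count} tokens; trim the retrieved context.")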

Feeding the Embedding Model With LangChain

The model that generates embeddings, OpenAI’s text-embedding-ada-002, has a context window of its own. That makes it essential to break documentation into chunks that the embedding model can digest more easily.

The LangChain Python framework offers a solution. An LLM able to answer documentation queries needs these tasks completed first:

  • Document loading: LangChain makes it simple to scrape documentation pages, with the ability to load diverse document formats from a range of locations. 
  • Document transformation: Segmenting large documents into smaller digestible chunks enables retrieval of pertinent document sections. 
  • Embedding generation: Calculate embeddings for the chunked documentation using OpenAI’s embedding model.  
  • Data storing: Store embeddings and original content in PostgreSQL. 

This process yields the semantic index of documentation we’re after. 

An Example User Query Workflow

Now consider this sample workflow for a user query (sticking with our documentation as the example tested). First, a user submits the question: “How do I create a Redis cluster using Terraform?” OpenAI’s embeddings API calculates the question’s embeddings. The system then queries the semantic index in PostgreSQL using cosine similarity, asking for the original content closest to the embeddings of the user’s question. Finally, the system grabs the original content returned in the vector search, concatenates it together, and includes it in a specially crafted prompt with the user’s original question. 
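
Stripped of any framework, that workflow might look roughly like the following sketch. The prompt wording, connection string, and direct query against the insta_documentation table are illustrative assumptions; our actual implementation, shown below, goes through LangChain:

Python
 
# An illustrative, framework-free version of the query workflow. The prompt
# wording, connection string, and direct table query are assumptions; the
# actual implementation below uses LangChain.
import psycopg2
from langchain.embeddings import OpenAIEmbeddings

question = "How do I create a Redis cluster using Terraform?"

# 1. Calculate the question's embedding
embedding = OpenAIEmbeddings(model="text-embedding-ada-002").embed_query(question)

# 2. Query the semantic index with cosine distance (<=>), closest chunks first
conn = psycopg2.connect("dbname=docs user=postgres")  # hypothetical connection
cursor = conn.cursor()
cursor.execute(
    "SELECT content FROM insta_documentation "
    "ORDER BY embedding <=> %s::vector LIMIT 3",
    ("[" + ",".join(map(str, embedding)) + "]",),
)
context = "\n\n".join(row[0] for row in cursor.fetchall())

# 3. Include the retrieved content and the original question in one prompt
prompt = (
    "Answer the question using the documentation below.\n\n"
    f"{context}\n\nQuestion: {question}"
)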

Implementing pgvector and a User Interface 

Now let’s see how we put pgvector into action. First, we enabled the pgvector extension in our PostgreSQL database, and created a table for storing all documents and their embeddings: 

SQL
 
CREATE EXTENSION vector;

-- text-embedding-ada-002 produces 1536-dimensional vectors
CREATE TABLE insta_documentation (
    id bigserial PRIMARY KEY,
    title text,
    content text,
    url text,
    embedding vector(1536)
);
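
For small datasets, a sequential scan over the embeddings is fine. As an optional addition (not part of our original setup), pgvector can also build an approximate nearest neighbor index to speed up searches over large tables; here is a minimal sketch using an IVFFlat index on cosine distance:

Python
 
# Optional: an approximate nearest neighbor index speeds up searches over
# large tables. This is a sketch, not part of our original setup;
# "lists = 100" is just an illustrative starting point.
import psycopg2
import streamlit as st

conn = psycopg2.connect(**st.secrets["postgres"])
cursor = conn.cursor()
cursor.execute(
    "CREATE INDEX ON insta_documentation "
    "USING ivfflat (embedding vector_cosine_ops) WITH (lists = 100)"
)
conn.commit()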


The following Python code scrapes the documentation, uses Beautiful Soup to extract main text parts such as title and content, and stores them and the URL in the PostgreSQL table:

Python
 
import psycopg2
import streamlit as st
from urllib.request import Request, urlopen
from bs4 import BeautifulSoup

urls = [...]

def init_connection():
    return psycopg2.connect(**st.secrets["postgres"])

def extract_info(url):
    # Fetch the page with a browser-like User-Agent to avoid being blocked
    hdr = {'User-Agent': 'Mozilla/5.0'}
    req = Request(url, headers=hdr)
    response = urlopen(req)
    soup = BeautifulSoup(response, 'html.parser')

    title = soup.find('title').text
    middle_section = soup.find('div', class_='documentation-middle').contents

    # The middle section consists of header, content, the Instaclustr banner,
    # and back-and-forth links - we want only the first two
    content = str(middle_section[0]) + str(middle_section[1])

    return title, content, url

conn = init_connection()
cursor = conn.cursor()

for url in urls:
    page_content = extract_info(url)
    postgres_insert_query = """INSERT INTO insta_documentation (title, content, url) VALUES (%s, %s, %s)"""
    cursor.execute(postgres_insert_query, page_content)
    conn.commit()

if conn:
    cursor.close()
    conn.close()


Next, we loaded the documentation pages from the database, divided them into chunks, and created and stored the crucial embeddings.

Python
 
import pandas as pd
import psycopg2
import streamlit as st
from langchain.embeddings import OpenAIEmbeddings
from langchain.text_splitter import Language, RecursiveCharacterTextSplitter
from langchain.vectorstores.pgvector import DistanceStrategy, PGVector

def init_connection():
    return psycopg2.connect(**st.secrets["postgres"])

conn = init_connection()
cursor = conn.cursor()

# Query the insta_documentation table, limiting to 10 results for testing
# (creating embeddings through the OpenAI API can get costly when dealing
# with a huge amount of data)
postgres_query = """SELECT title, content, url FROM insta_documentation LIMIT 10"""
cursor.execute(postgres_query)
results = cursor.fetchall()

# Load results into a pandas DataFrame for easier manipulation
df = pd.DataFrame(results, columns=['title', 'content', 'url'])

# Break down content that exceeds the max input token limit into smaller chunks
html_splitter = RecursiveCharacterTextSplitter.from_language(
    language=Language.HTML, chunk_size=1000, chunk_overlap=100
)

# Initialize the embeddings model
embeddings = OpenAIEmbeddings(model="text-embedding-ada-002")

docs = []
for i in range(len(df.index)):
    # Create a document with metadata for each content chunk
    docs = docs + html_splitter.create_documents(
        [df['content'][i]],
        metadatas=[{"title": df['title'][i], "url": df['url'][i]}],
    )

# Create the pgvector store, computing and saving embeddings for every chunk
db = PGVector.from_documents(
    embedding=embeddings,
    documents=docs,
    collection_name=COLLECTION_NAME,
    connection_string=CONNECTION_STRING,
    distance_strategy=DistanceStrategy.COSINE,
)


Lastly, the retriever found the correct information to answer a given query. In our test example, we searched our documentation to learn how to sign up for an account:

Python
 
import streamlit as st
from langchain.chains import RetrievalQA
from langchain.llms import OpenAI

query = st.text_input('Your question', placeholder='How can I sign up for an Instaclustr console account?')

# "store" is the PGVector store created in the previous step (the "db" object)
retriever = store.as_retriever(search_kwargs={"k": 3})

# The "stuff" chain type inserts the retrieved chunks directly into the prompt
qa = RetrievalQA.from_chain_type(
    llm=OpenAI(),
    chain_type="stuff",
    retriever=retriever,
    return_source_documents=True,
    verbose=True,
)

result = qa({"query": query})

source_documents = result["source_documents"]
document_page_content = [document.page_content for document in source_documents]
document_metadata = [document.metadata for document in source_documents]


Using Streamlit, a powerful tool for building interactive Python interfaces, we built a simple interface to test the system and view the query results.

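Our exact interface code isn't reproduced here, but a minimal Streamlit front end over the chain above might look like this sketch (the layout and labels are illustrative assumptions):

Python
 
# A minimal, illustrative Streamlit front end for the RetrievalQA chain
# built above; the layout and labels are assumptions, not our exact code.
import streamlit as st

st.title('Documentation search')
query = st.text_input('Your question', placeholder='How can I sign up for an Instaclustr console account?')

if query:
    result = qa({"query": query})  # the RetrievalQA chain from the previous step
    st.write(result["result"])
    st.subheader('Sources')
    for document in result["source_documents"]:
        st.write(document.metadata["url"])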

Data Retrieval With Transformative Efficiency

Harnessing PostgreSQL and the open-source pgvector project empowers users to answer questions with natural language queries immediately, with no need to comb through irrelevant data. The result: more accurate, performant, and efficient LLM applications, powerful new textual capabilities, and meaningful time saved!
