DZone
Thanks for visiting DZone today,
Edit Profile
  • Manage Email Subscriptions
  • How to Post to DZone
  • Article Submission Guidelines
Sign Out View Profile
  • Post an Article
  • Manage My Drafts
Over 2 million developers have joined DZone.
Log In / Join
Refcards Trend Reports
Events Video Library
Refcards
Trend Reports

Events

View Events Video Library

Related

  • LLMops: The Future of AI Model Management
  • Pivoting Database Systems Practices to AI: Create Efficient Development and Maintenance Practices With Generative AI
  • How To Select the Right Vector Database for Your Enterprise GENERATIVE-AI Stack
  • Machine Learning Algorithms and GAN

Trending

  • Agentic Testing: Moving Quality From Checkpoint to Control Layer
  • The ORM Is Over: AI-Written SQL Is the New Data Access Layer
  • Ujorm3: A New Lightweight ORM for JavaBeans and Records
  • Smart Deployment Strategies for Modern Applications
  1. DZone
  2. Data Engineering
  3. Databases
  4. How I Converted Regular RDBMS Into Vector Database To Store Embeddings

How I Converted Regular RDBMS Into Vector Database To Store Embeddings

In this article, I'll walk you through how to convert a regular RDBMS into a fully featured Vector database to store Embeddings for developing GenerativeAI applications.

By 
Suvoraj Biswas user avatar
Suvoraj Biswas
·
Jul. 05, 23 · Tutorial
Likes (3)
Comment
Save
Tweet
Share
15.4K Views

Join the DZone community and get the full member experience.

Join For Free

In today's Generative AI world, Vector database has become one of the integral parts while designing LLM-based applications. Whether you are planning to build an application using OpenAI or Google's Generative AI or you are thinking to solve use cases like designing a recommendation engine or building a computer vision (CV) or Vector database, would be an important component to consider.

What Is Vector Database and Why Are They Different Than the Traditional Database?

In the machine learning world, Vector or Embeddings represent the numerical or mathematical representation of data, which can be text, images, or media contents (Audio or Video). LLM from OpenAI or others can transform the regular data into Vector Embeddings with high-level multi-dimensions and store them in the vector space. These numerical forms help determine the semantic meaning among data or identify patterns or clustering, or draw relationships. Regular columnar-based RDBMS or NoSQL databases are not equipped to store Vector Embeddings data with multi-dimensions and efficiently scaling if needed. This is where we need a Vector database, which is a special kind of database that is designed to handle and store this kind of Embeddings data and, at the same time, offers high performance and scalability.

During data retrieval or index search, the traditional database returns results that exactly match the query, whereas the Vector database uses algorithm such as Kth-Nearest Neighbor (K-NN) or Approximate Nearest Neighbor (A-NN) to find similar vectors in the same dimensions or having shortest distance by applying cosine algorithm and return results having similar results. This helps solve use cases such as finding similar images among sets of photos taken, building a recommendation engine based on certain usage, or identifying patterns among a pool of huge datasets. 

As you can see Vector database comes with the capability to efficiently store and search vector data which is essential to design and build AI applications using the Large Language Models (LLM). We have many Vector database(s) in the form of On-Premise usages like Redis Enterprise or Milvus or SAAS offerings like Pinecone. In this article, we will explore the most popular RDBMS, Postgres, and how we can convert it into a full fledge Vector database capable of performing with other Enterprise grade popular Vector database(s).

How To Convert the Postgres Into a Vector Database?

Postgres is one of the popular RDBMS, which is open source but has a performance similar to many Enterprise grade RDBMS. It has been in the market for a long time, dominating with its performance, ease to use, and robustness. 

The open-source community has built an extension called pgvector which, when installed and activated, can turn regular Postgres installation to have the compatibility to support Generative AI application development by storing and indexing Embeddings generated by any LLMs having any dimensions. The best part is not only the Embeddings data, but the regular data also can be stored and indexed in the same database. Pgvector uses the exact and approximate nearest neighbors algorithms while querying the data, so sometimes it outperforms other databases.

Below is the Docker compose file I used to spin off a Postgress docker version already packaged with pgvector extension built into it. In the Docker compose, I also added Pgadmin as the DB client so that you can access your database.

YAML
 
version: '3.8'
services:
    db:
        container_name: pgVectorDB
        image: ankane/pgvector:latest
        restart: always
        environment:
            POSTGRES_DB: dzone
            POSTGRES_USER: pguser
            POSTGRES_PASSWORD: secret_password
        ports:
            - "5432:5432"
        volumes:
            - postgres-data:/var/lib/postgresql/data
    
    pgadmin:
        container_name: pgadmin4_container
        image: dpage/pgadmin4
        restart: always
        environment:
            PGADMIN_DEFAULT_EMAIL: [email protected]
            PGADMIN_DEFAULT_PASSWORD: root
        ports:
            - "5050:80"
volumes:
    postgres-data:


Fig 1: Docker compose command to run the Postgres with pgvector.

Fig 1: Docker compose command to run the Postgres with pgvector.

Fig 2: PGAdmin client (accessible through port 5050)

Fig 2: PGAdmin client (accessible through port 5050)

Once the Postgres running server is added, use the following SQL command to enable the vector extension:

SQL
 
CREATE EXTENSION IF NOT EXISTS vector;


Conclusion

In this article, we explored how we can use the power of the open-source community to launch a scalable yet robust, high-performing vector database based on a traditional RDBMS system. If you are a data science engineer or software engineer or just designing or exploring solutions for your next AI-based project, then Postgres with pgvector would definitely help to solve some use cases such as similarity search, recommendation engine, and anomaly detection. We also demonstrated how Postgres with pgvector extension can seamlessly be installed and configured using simple tools like Docker compose and integrate with the existing microservice framework.

Data structure Database AI Machine learning

Opinions expressed by DZone contributors are their own.

Related

  • LLMops: The Future of AI Model Management
  • Pivoting Database Systems Practices to AI: Create Efficient Development and Maintenance Practices With Generative AI
  • How To Select the Right Vector Database for Your Enterprise GENERATIVE-AI Stack
  • Machine Learning Algorithms and GAN

Partner Resources

×

Comments

The likes didn't load as expected. Please refresh the page and try again.

  • RSS
  • X
  • Facebook

ABOUT US

  • About DZone
  • Support and feedback
  • Community research

ADVERTISE

  • Advertise with DZone

CONTRIBUTE ON DZONE

  • Article Submission Guidelines
  • Become a Contributor
  • Core Program
  • Visit the Writers' Zone

LEGAL

  • Terms of Service
  • Privacy Policy

CONTACT US

  • 3343 Perimeter Hill Drive
  • Suite 215
  • Nashville, TN 37211
  • [email protected]

Let's be friends:

  • RSS
  • X
  • Facebook