Retrieval-Augmented Generation (RAG) With Milvus and LlamaIndex

Learn to build a RAG application with Milvus and LlamaIndex, two tools that can quickly handle big data and retrieve relevant information, especially when adopted together.

By Ifeanyi Benny Iheagwara · Dec. 09, 24 · Tutorial

Retrieval-augmented generation (RAG) applications integrate private data with public data and improve large language models' (LLMs) output, but building one is challenging as private data can be unstructured and siloed. You'll also need a reliable and efficient way to retrieve relevant information from the knowledge base. This might seem like an uphill battle, but it's doable with tools like Milvus and LlamaIndex, which can quickly handle big data and retrieve relevant information, especially when adopted together. 

What Are Milvus and LlamaIndex?

To build a RAG application that optimizes query efficiency, you need a scalable, flexible vector database and an indexing algorithm. Before showing you how to build one, we'll quickly discuss Milvus and LlamaIndex. 

What Is Milvus?

Milvus is an open-source vector database for storing, processing, indexing, and retrieving vector embeddings across various environments. The platform is popular among generative AI developers because of its high scalability and its fast similarity search over massive datasets of high-dimensional vectors. Beyond scalability and performance, developers use it to power machine learning (ML) applications, build recommendation systems, and mitigate hallucinations in LLMs. 

Milvus offers three deployment options: 

  1. Milvus Lite is a Python library and ultra-lightweight version of Milvus that works great for small-scale local experiments.
  2. Milvus Standalone is a single-node deployment that uses a client-server model, analogous to running a standalone MySQL server. 
  3. Milvus Distributed is Milvus's distributed mode, which adopts a cloud-native architecture and is great for building large-scale vector database systems.
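
In practice, the deployment mode you choose surfaces in code mainly as the connection URI you hand to the client. Here is a short sketch of the three shapes; the cloud endpoint is a placeholder, and 19530 is Milvus's default port:

```python
# Illustrative connection URIs for Milvus's three deployment modes.

# Milvus Lite embeds the database in-process when given a local file path.
MILVUS_LITE_URI = "./milvus_demo.db"

# Milvus Standalone runs as a single server on Milvus's default port, 19530.
MILVUS_STANDALONE_URI = "http://localhost:19530"

# Milvus Distributed (or Zilliz Cloud) exposes a remote endpoint, usually with a token.
MILVUS_CLOUD_URI = "https://<your-endpoint>.zillizcloud.com"  # placeholder endpoint

for name, uri in [("Lite", MILVUS_LITE_URI), ("Standalone", MILVUS_STANDALONE_URI)]:
    print(f"{name}: {uri}")
```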

What Is LlamaIndex?

LlamaIndex is an orchestration framework that simplifies building LLM applications by integrating private, domain-specific, and public data. It achieves this by ingesting external data and storing it as vectors in a vector database, where it can be used for knowledge generation, complex search operations, and reasoning. Besides data ingestion and storage, LlamaIndex comes in handy for indexing and querying data. 

The enterprise version comprises LlamaCloud and LlamaParse. There's also an open-source offering that includes LlamaHub (their data connectors) along with Python and TypeScript packages. 

What Is RAG?

Retrieval-augmented generation (RAG) is an AI technique that combines the strengths of generative LLMs with traditional information retrieval systems to enhance accuracy and reliability. This matters because it exposes your LLM to external, real-time, vector-based information beyond its training data, addressing issues of missing context, inaccuracy, and hallucination. 
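
The retrieve-then-generate loop can be sketched without any external services. Below is a minimal, dependency-free toy: the word-overlap scoring is an illustrative stand-in for the vector search Milvus performs, and the prompt format is invented for illustration:

```python
# Toy RAG loop: retrieve the most relevant document, then build an augmented prompt.
# A real system uses vector similarity (Milvus) for retrieval and an LLM for generation.

def retrieve(query: str, docs: list[str]) -> str:
    """Pick the document sharing the most words with the query (stand-in for vector search)."""
    query_terms = set(query.lower().split())
    return max(docs, key=lambda d: len(query_terms & set(d.lower().split())))

def build_prompt(query: str, context: str) -> str:
    """Augment the user's question with the retrieved context before sending it to an LLM."""
    return f"Context: {context}\nQuestion: {query}\nAnswer using only the context."

docs = [
    "Oliver Twist was raised in a workhouse.",
    "The House of Usher collapsed into the tarn.",
]
context = retrieve("How was Oliver Twist raised?", docs)
print(build_prompt("How was Oliver Twist raised?", context))
```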

Building a RAG System Using LlamaIndex and Milvus

We'll show you how to build a retrieval-augmented generation system using LlamaIndex and Milvus. First, we'll use data from the LitBank repository. Then, we'll index the data using the llama_index library and Milvus Lite. Next, we'll process the documents into vector representations using the OpenAI API, and finally, we'll query the data and filter it by metadata. 

Prerequisites and Dependencies

To follow along with this tutorial, you will need the following: 

  • Python 3.9 or higher.
  • Any IDE or code editor. We recommend Google Colab, but you can also use Jupyter Notebook. 
  • An OpenAI developer account so you can access your OpenAI API key.

Setup and Installation

Before building the RAG application, you'll need to install all your dependencies.  

Python
 
%pip install "pymilvus>=2.4.2"
%pip install llama-index-vector-stores-milvus
%pip install llama-index

These commands install and upgrade the following: 

  • pymilvus — the Milvus Python SDK
  • llama-index-vector-stores-milvus — the integration between LlamaIndex and the Milvus vector store
  • llama-index — the data framework for indexing and querying LLMs

Next, you need to set up your OpenAI API key to access OpenAI's advanced language models, which are trained for various natural language processing (NLP) and image generation tasks. Before you can use the OpenAI API, you must create an OpenAI developer account.  

  1. Visit the API keys section of your OpenAI developer dashboard.
  2. Click on “Create a new secret key” to generate an API key.
  3. Copy the key.

Then, head over to your Google Colab notebook. 

Python
 
import openai

# Replace with your actual OpenAI API key
openai.api_key = "OpenAI-API-Key"


Generating Data

For your dataset, you can use LitBank, a repository of annotated datasets of a hundred works of English-language fiction. For this project, we'll use "The Fall of the House of Usher" by Edgar Allan Poe and "Oliver Twist" by Charles Dickens. To achieve this, create a directory to retrieve and save your data. 

Python
 
! mkdir -p 'data/'
! wget 'https://raw.githubusercontent.com/dbamman/litbank/refs/heads/master/original/730_oliver_twist.txt' -O 'data/730_oliver_twist.txt'
! wget 'https://raw.githubusercontent.com/dbamman/litbank/refs/heads/master/original/932_the_fall_of_the_house_of_usher.txt' -O 'data/932_the_fall_of_the_house_of_usher.txt'


Then, generate a document from the novel using the SimpleDirectoryReader class from the llama_index library. 

Python
 
from llama_index.core import SimpleDirectoryReader

# load documents
documents = SimpleDirectoryReader(
    input_files=["data/730_oliver_twist.txt"]
).load_data()

print("Document ID:", documents[0].doc_id)


Indexing Data

Next, index your document to reduce search latency and enable semantic similarity search, letting you quickly retrieve relevant passages based on meaning and context. 
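
Why does indexing embeddings enable this? Semantic search is a nearest-neighbor lookup over vectors: texts with similar meaning map to nearby points. Here is a minimal, dependency-free sketch of the core operation; the tiny vectors and sentence labels are made up for illustration, while real embeddings have far more dimensions (1,536 in this tutorial):

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Tiny made-up "embeddings" for three sentences; real ones come from an embedding model.
embeddings = {
    "Oliver asks for more gruel": [0.9, 0.1, 0.0],
    "The house of Usher falls":   [0.1, 0.9, 0.1],
    "A boy requests extra food":  [0.8, 0.2, 0.1],
}
query = [0.8, 0.2, 0.1]  # pretend embedding of "child wants more to eat"

# Nearest neighbor by cosine similarity -- what Milvus does at scale with ANN indexes.
best = max(embeddings, key=lambda s: cosine_similarity(query, embeddings[s]))
print("Nearest:", best)
```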

You can do this using the llama_index library: specify the file path and storage configuration, and set your vector embedding dimensionality (1,536 for the OpenAI embeddings used here). You'll also set Milvus Lite's URI to a local file. Alternatively, you can run Milvus via Docker, Kubernetes, or Zilliz Cloud, Milvus's fully managed cloud solution. These alternatives are better suited to large projects. 

Python
 
# Create an index over the documents
from llama_index.core import VectorStoreIndex, StorageContext
from llama_index.vector_stores.milvus import MilvusVectorStore


vector_store = MilvusVectorStore(uri="./milvus_demo.db", dim=1536, overwrite=True)
storage_context = StorageContext.from_defaults(vector_store=vector_store)
index = VectorStoreIndex.from_documents(documents, storage_context=storage_context)


Querying Data

You'll need to leverage the indexed documents as a knowledge base for asking questions. This will allow your RAG to have conversational AI capabilities, quickly retrieve relevant answers, and have a contextual understanding of conversations. 

Python
 
query_engine = index.as_query_engine()
res = query_engine.query("How did Oliver Twist grow up?")
print(res)


Try asking more questions about the novel. 

Python
 
res = query_engine.query("What motivates Oliver to ask for more food in the workhouse?")

print(res)


You can run more tests, like overwriting previously indexed information. 

Python
 
from llama_index.core import Document

vector_store = MilvusVectorStore(uri="./milvus_demo.db", dim=1536, overwrite=True)
storage_context = StorageContext.from_defaults(vector_store=vector_store)
index = VectorStoreIndex.from_documents(
    [Document(text="The number that is being searched for is ten.")],
    storage_context,
)
query_engine = index.as_query_engine()
res = query_engine.query("How did Oliver Twist grow up?")
print(res)


Let’s try one more test: adding additional data to an existing index. 

Python
 
del index, vector_store, storage_context, query_engine

vector_store = MilvusVectorStore(uri="./milvus_demo.db", overwrite=False)
storage_context = StorageContext.from_defaults(vector_store=vector_store)
index = VectorStoreIndex.from_documents(documents, storage_context=storage_context)
query_engine = index.as_query_engine()
res = query_engine.query("What is the number?")
print(res)
res = query_engine.query("How did Oliver Twist grow up?")
print(res)

  

Filtering Metadata

Metadata filtering lets you narrow search results to those matching specific criteria based on metadata. This way, you can search for documents by fields such as author, date, and tag. It's particularly useful when you have a large dataset and need to find documents with certain attributes. You can load both documents using the code snippet below. 

Python
 
from llama_index.core.vector_stores import ExactMatchFilter, MetadataFilters

# Load the two documents downloaded earlier
documents_all = SimpleDirectoryReader("./data/").load_data()

vector_store = MilvusVectorStore(uri="./milvus_demo.db", dim=1536, overwrite=True)
storage_context = StorageContext.from_defaults(vector_store=vector_store)
index = VectorStoreIndex.from_documents(documents_all, storage_context)


If you only want to retrieve documents from The Fall of the House of Usher, use the following script: 

Python
 
filters = MetadataFilters(
    filters=[ExactMatchFilter(key="file_name", value="932_the_fall_of_the_house_of_usher.txt")]
)
query_engine = index.as_query_engine(filters=filters)
res = query_engine.query("What distinctive physical feature does Roderick Usher exhibit in The Fall of the House of Usher?")

print(res)


If you only want to use Oliver Twist, you can use this script:  

Python
 
filters = MetadataFilters(
    filters=[ExactMatchFilter(key="file_name", value="730_oliver_twist.txt")]
)
query_engine = index.as_query_engine(filters=filters)
res = query_engine.query("What challenges did Oliver face?")

print(res)
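
Under the hood, an exact-match filter like the ones above simply restricts the candidate set to chunks whose stored metadata equals the requested value. Here is a dependency-free sketch of that behavior; the chunk texts are invented for illustration:

```python
# Each indexed chunk carries metadata; an exact-match filter keeps only matching chunks.
chunks = [
    {"text": "Oliver asked for more.",  "metadata": {"file_name": "730_oliver_twist.txt"}},
    {"text": "The fissure widened.",    "metadata": {"file_name": "932_the_fall_of_the_house_of_usher.txt"}},
    {"text": "Fagin trained the boys.", "metadata": {"file_name": "730_oliver_twist.txt"}},
]

def exact_match_filter(chunks, key, value):
    """Keep only chunks whose metadata[key] equals value, mimicking ExactMatchFilter."""
    return [c for c in chunks if c["metadata"].get(key) == value]

usher_only = exact_match_filter(chunks, "file_name", "932_the_fall_of_the_house_of_usher.txt")
print([c["text"] for c in usher_only])
```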

   

You can explore the full project code on GitHub along with the interactive Google Colab notebook. 

Conclusion

In this post, you learned how to build a RAG application with LlamaIndex and Milvus. Milvus offers capabilities such as image search, and since Milvus Lite is an open-source project, you can make your own contributions as well.
