Building an Agentic RAG System from Scratch
Agentic RAG combines retrieval-augmented generation with AI agents to enhance LLMs, enabling autonomous decisions, efficient context retrieval, and flexible tool use.
In this post, we’ll explore the concept of Agentic RAG, its architecture, and why this powerful combination is reshaping the future of AI systems. Plus, we’ll walk through implementing a basic version of an Agentic RAG system from scratch!
What Is RAG and Agentic RAG?
To start, let's clarify what RAG is. Retrieval-augmented generation (RAG) is a technique that enhances LLMs by connecting them to external data sources, enabling more accurate and reliable responses. With RAG, the system first retrieves relevant information from a database and then uses it to generate an answer.
Agentic RAG takes this concept a step further by integrating AI agents into the process. An AI agent typically consists of an LLM as its "brain," memory, and a set of tools. These agents can independently perform specific tasks, make decisions, and take actions in an automated manner.
In other words, an Agentic RAG system involves an intelligent agent that decides when to retrieve data, when to use external tools (e.g., search), and when to rely on the LLM for generating responses.
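To make this concrete, here is a minimal sketch of the agent abstraction described above. The class and field names are illustrative, not taken from any particular framework:

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class Agent:
    """A bare-bones agent: an LLM 'brain', memory, and a toolbox."""
    llm: Callable[[str], str]                   # maps a prompt to a text response
    memory: list = field(default_factory=list)  # past queries, contexts, results
    tools: dict = field(default_factory=dict)   # tool name -> callable

    def act(self, query: str) -> str:
        # The agent decides: answer directly, or call a tool first.
        decision = self.llm(f"Can you answer '{query}' directly? Reply YES or NO.")
        if decision.strip().upper().startswith("NO") and "search" in self.tools:
            context = self.tools["search"](query)
            self.memory.append(context)
            return self.llm(f"Answer '{query}' using this context:\n{context}")
        return self.llm(query)
```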
Basic Architecture of Agentic RAG
Here’s the basic flow of how Agentic RAG works:
- User query. The user asks a question.
- Retrieval decision. The system checks whether it can answer the query directly, either through retrieval from a vector database or from the LLM itself.
- Context validation. If retrieval is successful, the agent checks whether the retrieved context is sufficient to answer the question (see the sketch after this list).
- Tool activation. If the context isn't sufficient, an external tool (such as an online search) is called, and the agent processes the results.
- Response generation. The LLM generates a response based on the retrieved context or search results.
This architecture ensures that if the agent cannot answer the question from its internal resources, it can autonomously search the web or call other tools for additional information.
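The context-validation step is easy to overlook. One common pattern is to ask the LLM itself whether the retrieved chunks are sufficient before answering. Below is a minimal sketch of that check, assuming an `ask_llm(prompt)` helper that returns the model's text response (a hypothetical wrapper, not a library function):

```python
def is_context_sufficient(ask_llm, query, context):
    """Ask the LLM to judge whether the retrieved context can answer the query."""
    verdict = ask_llm(
        f"Context:\n{context}\n\nQuestion: {query}\n\n"
        "Can the question be fully answered from this context alone? "
        "Reply with exactly YES or NO."
    )
    return verdict.strip().upper().startswith("YES")

# Usage inside the agent loop: fall back to a web-search tool when validation fails.
# if not is_context_sufficient(ask_llm, query, context):
#     context = online_search_tool(query)
```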
Why AI Agents Are Essential
You might wonder whether AI agents are really necessary, given that modern LLMs already perform a lot of reasoning on their own.
However, while LLMs can generate answers, they often require external tools to perform tasks like searching the web, doing calculations, or summarizing documents. AI agents help orchestrate this by managing when and how these tools are used, making the whole process more structured and autonomous.
Building Agentic RAG from Scratch
While frameworks like LangChain and LlamaIndex are great for quick prototyping, it’s valuable to understand how you can build a custom Agentic RAG system without relying on these dependencies. By minimizing the use of third-party libraries, you get more control over the behavior of your system.
For instance:
- Retrieving content. We can use tools like BeautifulSoup or the Jina Reader API to retrieve clean content from web pages (a sketch of the Jina Reader approach follows this list).
- Chunking and embedding. After retrieving content, we split the text into smaller chunks and embed them into a vector database (such as Qdrant) using embeddings from models like OpenAI’s API.
- Tool selection and action. The agent evaluates if the retrieved context can answer the question, or if it needs to perform an online search using an external tool.
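As an example of the first point, the Jina Reader API returns an LLM-friendly text rendering of a page when you prefix the target URL with `https://r.jina.ai/`. A minimal sketch (basic usage works without an API key, subject to rate limits):

```python
import requests

def get_clean_content(url: str) -> str:
    """Fetch an LLM-friendly text rendering of a webpage via Jina Reader."""
    # Prefixing the target URL with r.jina.ai returns cleaned, readable text.
    response = requests.get(f"https://r.jina.ai/{url}")
    response.raise_for_status()
    return response.text

# Example: get_clean_content("https://example.com/llama3-article")
```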
Implementation Walkthrough
Here’s a brief example of how an Agentic RAG system works in practice:
```python
import requests
from bs4 import BeautifulSoup
from openai import OpenAI
from sentence_transformers import SentenceTransformer
import qdrant_client
from qdrant_client.models import Distance, PointStruct, VectorParams

# Initialize the embedding model, the vector database client, and the LLM client
embedding_model = SentenceTransformer("all-MiniLM-L6-v2")
client = qdrant_client.QdrantClient("http://localhost:6333")
llm = OpenAI()  # reads OPENAI_API_KEY from the environment

# Create the collection up front; all-MiniLM-L6-v2 produces 384-dimensional vectors
client.recreate_collection(
    collection_name="documents",
    vectors_config=VectorParams(size=384, distance=Distance.COSINE),
)

# Retrieve the raw text content of a webpage
def get_webpage_content(url):
    response = requests.get(url)
    soup = BeautifulSoup(response.text, "html.parser")
    return soup.get_text()

# Split text into chunks of roughly chunk_size words
def split_text_into_chunks(text, chunk_size=150):
    words = text.split()
    chunks = [words[i:i + chunk_size] for i in range(0, len(words), chunk_size)]
    return [" ".join(chunk) for chunk in chunks]

# Convert chunks to embeddings
def generate_embeddings(chunks):
    return embedding_model.encode(chunks)

# Store embeddings in the vector database
def store_embeddings_in_db(embeddings, chunks):
    points = [
        PointStruct(id=i, vector=embedding.tolist(), payload={"text": chunk})
        for i, (embedding, chunk) in enumerate(zip(embeddings, chunks))
    ]
    client.upsert(collection_name="documents", points=points)

# Perform a similarity search in the vector database
def search_vector_database(query, top_k=3):
    query_embedding = embedding_model.encode([query])[0]
    return client.search(
        collection_name="documents",
        query_vector=query_embedding.tolist(),
        limit=top_k,
    )

# Process search results and generate a response
def generate_response_from_context(query, context):
    prompt = f"Answer the following question using the context: {query}\n\nContext: {context}"
    # Any chat-completion model works here; the originally used
    # text-davinci-003 completion endpoint has since been retired.
    response = llm.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
        max_tokens=150,
    )
    return response.choices[0].message.content.strip()

# Function to perform an online search (here, the Google Custom Search API,
# which requires both an API key and a search-engine ID)
def perform_online_search(query):
    params = {"q": query, "key": "YOUR_API_KEY", "cx": "YOUR_SEARCH_ENGINE_ID"}
    search_results = requests.get(
        "https://www.googleapis.com/customsearch/v1", params=params
    ).json()
    return " ".join(item["snippet"] for item in search_results.get("items", []))

# Main function to perform the full Agentic RAG process
def agentic_rag_system(query, url):
    webpage_content = get_webpage_content(url)
    chunks = split_text_into_chunks(webpage_content)
    embeddings = generate_embeddings(chunks)
    store_embeddings_in_db(embeddings, chunks)
    search_results = search_vector_database(query)
    if search_results:
        context = "\n".join(result.payload["text"] for result in search_results)
        return generate_response_from_context(query, context)
    # If no relevant context is found, fall back to an online search
    online_search_result = perform_online_search(query)
    return generate_response_from_context(query, online_search_result)

# Example usage
query = "What is Llama 3?"
url = "https://example.com/llama3-article"
print(agentic_rag_system(query, url))
```
While LangChain and other frameworks simplify the process, they come with trade-offs in terms of flexibility and customization. In production environments, minimizing dependencies can be advantageous, allowing for easier maintenance and greater control over your system. By building an Agentic RAG system from scratch, you can design your agents to meet your specific needs, without being tied to external frameworks.
Conclusion
Agentic RAG represents an exciting and powerful approach to making AI systems more efficient and autonomous. By combining retrieval-augmented generation with AI agents, you can create systems that not only generate accurate responses but also adapt and reason through external tools and dynamic decision-making.