
Building an Agentic RAG System from Scratch

Agentic RAG combines retrieval-augmented generation with AI agents to enhance LLMs, enabling autonomous decisions, efficient context retrieval, and flexible tool use.

By Swati Tyagi · Anuj Tyagi · Mar. 19, 25 · Tutorial
2.6K Views


In this post, we’ll explore the concept of Agentic RAG, its architecture, and why this powerful combination is reshaping the future of AI systems. Plus, we’ll walk through implementing a basic version of an Agentic RAG system from scratch!

What Is RAG and Agentic RAG?

To start, let's clarify what RAG is. Retrieval-augmented generation (RAG) is a technique that enhances LLMs by connecting them to external data sources, enabling more accurate and reliable responses. With RAG, the system first retrieves relevant information from a database and then uses it to generate an answer.

Agentic RAG takes this concept a step further by integrating AI agents into the process. An AI agent typically consists of an LLM as its "brain," memory, and a set of tools. These agents can independently perform specific tasks, make decisions, and take actions in an automated manner. 

In other words, an Agentic RAG system involves an intelligent agent that decides when to retrieve data, when to use external tools (e.g., search), and when to rely on the LLM for generating responses.

Basic Architecture of Agentic RAG

Here’s the basic flow of how Agentic RAG works:

  1. User query. The user asks a question.
  2. Retrieval decision. The system checks whether it can answer the query directly, either by retrieving from a vector database or by relying on the LLM itself.
  3. Context validation. If retrieval is successful, the agent checks whether the context is enough to answer the question.
  4. Tool activation. If the context isn't sufficient, an external tool (such as an online search) is called, and the agent processes the results.
  5. Response generation. The LLM generates a response based on the retrieved context or search results.

This architecture ensures that if the agent cannot answer the question from its internal resources, it can autonomously search the web or call other tools for additional information.
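As a sketch, the five steps above can be expressed as a simple decision loop. The helper names below (`retrieve_context`, `is_sufficient`, `web_search`, `generate_answer`) are hypothetical stubs for illustration, not a fixed API:

```python
def retrieve_context(query):
    # Stub: pretend the vector DB only holds documents about RAG
    return "RAG retrieves documents before generating." if "RAG" in query else ""

def is_sufficient(query, context):
    # Stub: treat any non-empty context as sufficient (step 3)
    return bool(context)

def web_search(query):
    # Stub: stand-in for an external search tool (step 4)
    return f"web results for: {query}"

def generate_answer(query, context):
    # Stub: stand-in for the LLM's generation step (step 5)
    return f"Answer to '{query}' using: {context}"

def answer_query(query):
    context = retrieve_context(query)              # Step 2: retrieval
    if context and is_sufficient(query, context):  # Step 3: validation
        return generate_answer(query, context)     # Step 5: generate from context
    tool_results = web_search(query)               # Step 4: tool fallback
    return generate_answer(query, tool_results)    # Step 5: generate from tool output
```

The key property is the fallback: the agent only reaches for an external tool when its internal retrieval fails or comes up short.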

Why AI Agents Are Essential

You might wonder whether AI agents are really necessary, given that modern LLMs already perform a great deal of reasoning on their own.

However, while LLMs can generate answers, they often require external tools to perform tasks like searching the web, doing calculations, or summarizing documents. AI agents help orchestrate this by managing when and how these tools are used, making the whole process more structured and autonomous.
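A minimal sketch of that orchestration, assuming a toy tool registry in which a keyword-based router stands in for the LLM's tool-selection step:

```python
def calculator_tool(query: str) -> str:
    # Toy calculator: handles "a + b" expressions only
    a, b = query.split("+")
    return str(int(a.strip()) + int(b.strip()))

def search_tool(query: str) -> str:
    # Stand-in for a real web-search call
    return f"search results for: {query}"

TOOLS = {"calculator": calculator_tool, "search": search_tool}

def select_tool(query: str) -> str:
    # In a real agent, the LLM decides which tool to invoke; this keyword
    # rule is only a stand-in for illustration.
    return "calculator" if "+" in query else "search"

def run_agent(query: str) -> str:
    # Dispatch the query to whichever tool the router selected
    return TOOLS[select_tool(query)](query)
```

In a full system, `select_tool` would be a prompt asking the LLM to choose from the registry, but the dispatch structure stays the same.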

Building Agentic RAG from Scratch

While frameworks like LangChain and LlamaIndex are great for quick prototyping, it’s valuable to understand how you can build a custom Agentic RAG system without relying on these dependencies. By minimizing the use of third-party libraries, you get more control over the behavior of your system.

For instance:

  • Retrieving content. We can use tools like BeautifulSoup or the Jina Reader API to retrieve clean content from web pages.
  • Chunking and embedding. After retrieving content, we split the text into smaller chunks and embed them into a vector database (such as Qdrant) using embeddings from models like OpenAI’s API.
  • Tool selection and action. The agent evaluates if the retrieved context can answer the question, or if it needs to perform an online search using an external tool.

Implementation Walkthrough

Here’s a brief example of how an Agentic RAG system works in practice:

Python
 
import openai
import requests
from bs4 import BeautifulSoup
from sentence_transformers import SentenceTransformer
import qdrant_client
from qdrant_client.models import PointStruct, VectorParams, Distance

# Initialize embedding model and vector database client
embedding_model = SentenceTransformer("all-MiniLM-L6-v2")
client = qdrant_client.QdrantClient("http://localhost:6333")

# Create the collection if needed (all-MiniLM-L6-v2 produces 384-dimensional vectors)
if not client.collection_exists("documents"):
    client.create_collection(
        collection_name="documents",
        vectors_config=VectorParams(size=384, distance=Distance.COSINE),
    )

# Retrieve content from a webpage
def get_webpage_content(url):
    response = requests.get(url)
    soup = BeautifulSoup(response.text, 'html.parser')
    return soup.get_text()

# Split text into chunks
def split_text_into_chunks(text, chunk_size=150):
    words = text.split()
    chunks = [words[i:i+chunk_size] for i in range(0, len(words), chunk_size)]
    return [" ".join(chunk) for chunk in chunks]

# Convert chunks to embeddings
def generate_embeddings(chunks):
    return embedding_model.encode(chunks)

# Store embeddings in the vector database
def store_embeddings_in_db(embeddings, chunks):
    points = [
        PointStruct(id=i, vector=embedding.tolist(), payload={"text": chunk})
        for i, (embedding, chunk) in enumerate(zip(embeddings, chunks))
    ]
    client.upsert(collection_name="documents", points=points)

# Perform a search in the vector database
def search_vector_database(query, top_k=3):
    query_embedding = embedding_model.encode([query])[0]
    result = client.search(
        collection_name="documents",
        query_vector=query_embedding.tolist(),
        limit=top_k,
    )
    return result

# Process search results and generate a response
def generate_response_from_context(query, context):
    prompt = f"Answer the following question using the context: {query}\n\nContext: {context}"
    # Uses the OpenAI v1 chat completions API (text-davinci-003 and the
    # legacy Completion endpoint have been retired)
    response = openai.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
        max_tokens=150,
    )
    return response.choices[0].message.content.strip()

# Main function to perform the full Agentic RAG process
def agentic_rag_system(query, url):
    webpage_content = get_webpage_content(url)
    chunks = split_text_into_chunks(webpage_content)
    embeddings = generate_embeddings(chunks)
    store_embeddings_in_db(embeddings, chunks)

    search_results = search_vector_database(query)
    if search_results:
        context = "\n".join([result.payload["text"] for result in search_results])
        return generate_response_from_context(query, context)
    else:
        # If no relevant context is found, perform online search
        online_search_result = perform_online_search(query)
        return generate_response_from_context(query, online_search_result)

# Function to perform an online search (could use a search API)
def perform_online_search(query):
    # Google Custom Search requires both an API key and a search engine ID (cx)
    search_url = (
        "https://www.googleapis.com/customsearch/v1"
        f"?q={query}&key=YOUR_API_KEY&cx=YOUR_SEARCH_ENGINE_ID"
    )
    search_results = requests.get(search_url).json()
    return " ".join([item["snippet"] for item in search_results.get("items", [])])

# Example usage
query = "What is Llama 3?"
url = "https://example.com/llama3-article"
response = agentic_rag_system(query, url)
print(response)
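
The walkthrough above generates a response whenever search results exist, but step 3 of the architecture also calls for validating the retrieved context. One lightweight heuristic, sketched below under the assumption that simple term overlap is a usable proxy, is to check how many query terms appear in the retrieved chunks before committing to generation; a production system might instead ask the LLM itself to judge sufficiency:

```python
def context_is_sufficient(query: str, context: str, min_overlap: float = 0.5) -> bool:
    # Fraction of (lowercased) query terms that also appear in the context
    query_terms = set(query.lower().split())
    if not query_terms:
        return False
    context_words = set(context.lower().split())
    overlap = len(query_terms & context_words) / len(query_terms)
    return overlap >= min_overlap
```

If the check fails, the agent falls through to `perform_online_search` instead of generating from weak context, which is exactly the tool-activation step described earlier.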


While LangChain and other frameworks simplify the process, they come with trade-offs in terms of flexibility and customization. In production environments, minimizing dependencies can be advantageous, allowing for easier maintenance and greater control over your system. By building an Agentic RAG system from scratch, you can design your agents to meet your specific needs, without being tied to external frameworks.

Conclusion

Agentic RAG represents an exciting and powerful approach to making AI systems more efficient and autonomous. Combining retrieval-augmented generation with AI agents lets you create systems that not only generate accurate responses but also adapt and reason through external tools and dynamic decision-making.
