Knowledge Graphs and RAG: A Guide to AI Knowledge Retrieval

Enter knowledge graphs, the secret weapon for superior RAG applications. This guide has everything you need to begin leveraging RAG for intelligent AI knowledge retrieval.

By Pavan Vemuri, Prince Bose, and Tharakarama Reddy Yernapalli Sreenivasulu
Updated Nov. 21, 24 · Tutorial

Retrieval-Augmented Generation (RAG) has emerged as a powerful paradigm, blending the strengths of information retrieval and natural language generation. By leveraging large datasets to retrieve relevant information and generate coherent and contextually appropriate responses, RAG systems have the potential to revolutionize applications ranging from customer support to content creation. 




How Does RAG Work?

Let us look at how RAG works. In a traditional setup, the user's prompt is sent directly to the large language model (LLM), and the LLM returns a completion:


A diagram of the traditional setup, without RAG.


The problem with this setup is that the LLM's knowledge has a training cutoff date, and the LLM has no insight into business-specific data.

Importance of RAG for Accurate Information Retrieval

RAG helps alleviate both of these drawbacks by giving the LLM access to a knowledge base. Because the LLM now has relevant context, its completions are more accurate and can include business-specific data. The diagram below illustrates the value RAG adds to content retrieval:

A diagram of the value RAG adds to content retrieval.


As the diagram shows, the business-specific data, which the LLM would otherwise not have access to, is vectorized. Instead of sending only the prompt to the LLM, you send the prompt together with the relevant context retrieved from that data, which enables the LLM to provide more effective completions.
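The retrieve-then-augment flow described above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: a toy bag-of-words vector stands in for a real embedding model, the knowledge base is the set of player descriptions used later in this article, and the actual LLM call is left out. The embed, cosine, retrieve, and build_augmented_prompt helpers are hypothetical names, not part of any library.

```python
import math
import re
from collections import Counter

# Assumption: a toy knowledge base made of the player descriptions used later in this article.
DOCUMENTS = [
    "Patrick Mahomes is a quarterback for the Kansas City Chiefs, known for his strong arm and playmaking ability.",
    "Derrick Henry is a running back for the Tennessee Titans, famous for his power running and consistency.",
    "Davante Adams is a wide receiver for the Las Vegas Raiders, recognized for his excellent route running and catching ability.",
    "Tom Brady is a veteran quarterback known for his leadership and game management.",
    "Alvin Kamara is a running back for the New Orleans Saints, known for his agility and pass-catching ability.",
]


def embed(text: str) -> Counter:
    """Hypothetical stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, docs: list, k: int = 2) -> list:
    """Return the k documents most similar to the query."""
    query_vector = embed(query)
    ranked = sorted(docs, key=lambda doc: cosine(query_vector, embed(doc)), reverse=True)
    return ranked[:k]


def build_augmented_prompt(query: str, context: list) -> str:
    """Combine the retrieved context and the user prompt into one LLM input."""
    context_block = "\n".join(f"- {chunk}" for chunk in context)
    return f"Answer using only the context below.\n\nContext:\n{context_block}\n\nQuestion: {query}"


query = "Which running back is known for pass-catching?"
context = retrieve(query, DOCUMENTS)
prompt = build_augmented_prompt(query, context)
# In a real RAG system, `prompt` would now be sent to the LLM (not shown here).
print(prompt)
```

For this query, the top-ranked chunk is Alvin Kamara's description, followed by Derrick Henry's, and both are placed in the prompt as context.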

Challenges With RAG

However, as powerful as RAG systems are, they face challenges, particularly in maintaining contextual accuracy and efficiently managing vast amounts of data.

Other challenges include:

  1. RAG systems often struggle to capture complex relationships between pieces of information that are distributed across many documents.
  2. RAG solutions have limited ability to reason over the retrieved data.
  3. RAG solutions tend to hallucinate when they cannot retrieve the information they need.

Knowledge Graphs to the Rescue

Knowledge graphs are sophisticated data structures that represent information in a graph format, where entities are nodes and relationships are edges. This structure plays a crucial role in overcoming the challenges faced by RAG systems, as it allows for a highly interconnected and semantically rich representation of data, enabling more effective organization and retrieval of information.
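To make the node-and-edge structure concrete, the following minimal sketch stores a few facts from this article's football example as (subject, predicate, object) triples, the same shape RDF uses, and answers questions by pattern matching over them. The TRIPLES set and the match helper are illustrative assumptions; a production system would use an RDF library or a graph database rather than a Python set.

```python
# Assumption: illustrative facts from this article's football example, stored as
# (subject, predicate, object) triples. Subjects and objects are nodes; each
# predicate is a labeled edge between them.
TRIPLES = {
    ("AlvinKamara", "team", "New Orleans Saints"),
    ("AlvinKamara", "position", "Running Back"),
    ("DerrickHenry", "team", "Tennessee Titans"),
    ("DerrickHenry", "position", "Running Back"),
    ("PatrickMahomes", "team", "Kansas City Chiefs"),
    ("PatrickMahomes", "position", "Quarterback"),
}


def match(triples, s=None, p=None, o=None):
    """Return the triples matching a pattern, where None acts as a wildcard."""
    return sorted(
        (subj, pred, obj)
        for subj, pred, obj in triples
        if (s is None or subj == s) and (p is None or pred == p) and (o is None or obj == o)
    )


# Follow "position" edges into the "Running Back" node to find every running back.
running_backs = [subj for subj, _, _ in match(TRIPLES, p="position", o="Running Back")]
print(running_backs)  # ['AlvinKamara', 'DerrickHenry']

# Follow every outgoing edge from one node to describe that entity.
print(match(TRIPLES, s="AlvinKamara"))
```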

Benefits of Using Knowledge Graphs for RAG

Below are some key advantages of leveraging knowledge graphs:

  1. Knowledge graphs help RAG grasp complex information by providing rich context through an interconnected representation of information.
  2. By traversing the relationships in a knowledge graph, RAG solutions can reason more effectively over the retrieved information.
  3. By linking retrieved information to specific nodes and relationships in the graph, knowledge graphs help increase factual accuracy.
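The relationship traversal described above can be sketched as a breadth-first search over triples. Starting from the entity in a question, it gathers every fact within a fixed number of hops, so facts that a text retriever might return as separate chunks (Alvin Kamara's position, and which other players share it) arrive together as connected context. The triples, the k_hop_facts helper, and the two-hop limit are illustrative assumptions, not part of the article's code.

```python
from collections import deque

# Assumption: illustrative triples drawn from this article's football example.
TRIPLES = [
    ("AlvinKamara", "team", "New Orleans Saints"),
    ("AlvinKamara", "position", "Running Back"),
    ("DerrickHenry", "team", "Tennessee Titans"),
    ("DerrickHenry", "position", "Running Back"),
    ("PatrickMahomes", "position", "Quarterback"),
]


def k_hop_facts(triples, start, max_hops=2):
    """Breadth-first traversal that collects every triple within max_hops of start.

    Edges are followed in both directions, so a shared object node such as
    "Running Back" connects the players that point to it.
    """
    seen = {start}
    facts = set()
    frontier = deque([(start, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if depth == max_hops:
            continue  # do not expand nodes beyond the hop limit
        for subj, pred, obj in triples:
            if node not in (subj, obj):
                continue
            facts.add((subj, pred, obj))
            neighbor = obj if subj == node else subj
            if neighbor not in seen:
                seen.add(neighbor)
                frontier.append((neighbor, depth + 1))
    return sorted(facts)


# Turn the connected facts into context lines that could be added to an LLM prompt.
facts = k_hop_facts(TRIPLES, "AlvinKamara")
for subj, pred, obj in facts:
    print(f"{subj} {pred} {obj}")
```

Raising max_hops pulls in more distant facts (at three hops, Derrick Henry's team is included) at the cost of a larger prompt.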

Impact of Knowledge Graphs on RAG

Knowledge graphs fundamentally enhance RAG systems by providing a robust framework for understanding and navigating complex data relationships. They enable the AI not only to retrieve information based on keywords but also to understand the context and interconnections between different pieces of information. This leads to more accurate, relevant, and contextually aware responses, improving the performance of RAG applications.

Now let us look at how knowledge graphs enhance a RAG application through a coding example. We will use the example of retrieving a player recommendation for an NFL fantasy football draft: we will ask a RAG application the same question with and without a knowledge graph and compare the outputs.

RAG Without Knowledge Graphs

The following code implements a basic RAG solution that retrieves a football player matching the description given in a prompt. As the output shows, it does not retrieve the right player for our prompt.

Python
 
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
import numpy as np

# Sample player descriptions
players = [
   "Patrick Mahomes is a quarterback for the Kansas City Chiefs, known for his strong arm and playmaking ability.",
   "Derrick Henry is a running back for the Tennessee Titans, famous for his power running and consistency.",
   "Davante Adams is a wide receiver for the Las Vegas Raiders, recognized for his excellent route running and catching ability.",
   "Tom Brady is a veteran quarterback known for his leadership and game management.",
   "Alvin Kamara is a running back for the New Orleans Saints, known for his agility and pass-catching ability."
]

# Vectorize player descriptions
vectorizer = TfidfVectorizer()
player_vectors = vectorizer.fit_transform(players)

# Function to retrieve the most relevant player
def retrieve_player(query, player_vectors, players):
    # Project the query into the TF-IDF space fitted on the player descriptions
    query_vector = vectorizer.transform([query])
    # Compare the query with every description and keep the closest match
    similarities = cosine_similarity(query_vector, player_vectors).flatten()
    most_similar_player_index = np.argmax(similarities)
    return players[most_similar_player_index]

# Function to generate a recommendation (a fixed template; no LLM is called)
def generate_recommendation(query, retrieved_player):
    response = f"Query: {query}\n\nRecommended Player: {retrieved_player}\n\nRecommendation: Based on the query, the recommended player is a good fit for your team."
    return response

# Example query
query = "I need a versatile player."
retrieved_player = retrieve_player(query, player_vectors, players)
response = generate_recommendation(query, retrieved_player)

print(response)


We have deliberately simplified the RAG case for ease of understanding: retrieval uses plain TF-IDF, and the "generation" step is a fixed template rather than an LLM call. The code above:

  • Imports the necessary libraries: TfidfVectorizer and cosine_similarity from sklearn, and numpy
  • Defines sample player descriptions with details about their positions and notable skills
  • Vectorizes the player descriptions with TF-IDF, converting the text into numerical vectors that can be compared for similarity
  • Defines a function retrieve_player that finds the most relevant player for a query by calculating the cosine similarity between the query vector and each player vector
  • Defines a function generate_recommendation that creates a recommendation message from the query and the retrieved player's description
  • Runs the example query "I need a versatile player.", retrieves the most relevant player, generates a recommendation, and prints the recommendation message

Now let's look at the output:

PowerShell
 
python ragwithoutknowledgegraph.py
Query: I need a versatile player.

Recommended Player: Patrick Mahomes is a quarterback for the Kansas City Chiefs, known for his strong arm and playmaking ability.

Recommendation: Based on the query, the recommended player is a good fit for your team.


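As you can see, when we asked for a versatile player, the recommendation was Patrick Mahomes, even though nothing in his description suggests versatility. None of the query's words appear in any player description, so every similarity score is 0 and np.argmax falls back to the first player in the list.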
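To confirm the zero overlap without installing scikit-learn, the sketch below approximates TfidfVectorizer's default tokenization (lowercasing, then the default token_pattern r"(?u)\b\w\w+\b", which keeps tokens of two or more word characters) and intersects the query's tokens with the vocabulary fitted on the descriptions. It assumes the vectorizer's defaults are in use, as in the code above. An empty intersection means the transformed query vector is all zeros.

```python
import re

# Assumption: TfidfVectorizer defaults (lowercase=True, default token_pattern),
# approximated here with the standard library only.
TOKEN_PATTERN = re.compile(r"(?u)\b\w\w+\b")

players = [
    "Patrick Mahomes is a quarterback for the Kansas City Chiefs, known for his strong arm and playmaking ability.",
    "Derrick Henry is a running back for the Tennessee Titans, famous for his power running and consistency.",
    "Davante Adams is a wide receiver for the Las Vegas Raiders, recognized for his excellent route running and catching ability.",
    "Tom Brady is a veteran quarterback known for his leadership and game management.",
    "Alvin Kamara is a running back for the New Orleans Saints, known for his agility and pass-catching ability.",
]


def tokens(text):
    """Lowercase the text and return its set of 2+ character word tokens."""
    return set(TOKEN_PATTERN.findall(text.lower()))


# The vocabulary learned by fit_transform is the union of tokens across descriptions.
vocabulary = set().union(*(tokens(p) for p in players))

query = "I need a versatile player."
query_tokens = tokens(query)
print(sorted(query_tokens))       # ['need', 'player', 'versatile']
print(query_tokens & vocabulary)  # set()
```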

RAG With Knowledge Graphs

Now let us look at how a knowledge graph can enhance RAG and give a better recommendation. As the output below shows, this version recommends the player whose skills match the prompt.

Python
 
from rdflib import Graph, Literal, RDF, URIRef, Namespace

# Initialize the graph
g = Graph()
ex = Namespace("http://example.org/")

# Define players as subjects
patrick_mahomes = URIRef(ex.PatrickMahomes)
derrick_henry = URIRef(ex.DerrickHenry)
davante_adams = URIRef(ex.DavanteAdams)
tom_brady = URIRef(ex.TomBrady)
alvin_kamara = URIRef(ex.AlvinKamara)

# Add player attributes to the graph
g.add((patrick_mahomes, RDF.type, ex.Player))
g.add((patrick_mahomes, ex.team, Literal("Kansas City Chiefs")))
g.add((patrick_mahomes, ex.position, Literal("Quarterback")))
g.add((patrick_mahomes, ex.skills, Literal("strong arm, playmaking")))

g.add((derrick_henry, RDF.type, ex.Player))
g.add((derrick_henry, ex.team, Literal("Tennessee Titans")))
g.add((derrick_henry, ex.position, Literal("Running Back")))
g.add((derrick_henry, ex.skills, Literal("power running, consistency")))

g.add((davante_adams, RDF.type, ex.Player))
g.add((davante_adams, ex.team, Literal("Las Vegas Raiders")))
g.add((davante_adams, ex.position, Literal("Wide Receiver")))
g.add((davante_adams, ex.skills, Literal("route running, catching ability")))

g.add((tom_brady, RDF.type, ex.Player))
g.add((tom_brady, ex.team, Literal("Retired")))
g.add((tom_brady, ex.position, Literal("Quarterback")))
g.add((tom_brady, ex.skills, Literal("leadership, game management")))

g.add((alvin_kamara, RDF.type, ex.Player))
g.add((alvin_kamara, ex.team, Literal("New Orleans Saints")))
g.add((alvin_kamara, ex.position, Literal("Running Back")))
g.add((alvin_kamara, ex.skills, Literal("versatility, agility, pass-catching")))

# Function to retrieve the most relevant player using the knowledge graph
def retrieve_player_kg(query, graph):
    # Map each key skill that may appear in a query to the terms used in the graph
    synonyms = {
        "versatile": ["versatile", "versatility"],
        "agility": ["agility"],
        "pass-catching": ["pass-catching"],
        "strong arm": ["strong arm"],
        "playmaking": ["playmaking"],
        "leadership": ["leadership"],
        "game management": ["game management"]
    }

    # Extract key terms from the query by matching them against the synonyms
    key_terms = []
    for term, syns in synonyms.items():
        if any(syn in query.lower() for syn in syns):
            key_terms.extend(syns)

    # Without key terms the FILTER below would be empty, which is invalid SPARQL
    if not key_terms:
        return "No relevant player found."

    # Keep only players whose skills contain at least one key term
    filters = " || ".join(f"contains(lcase(str(?skills)), '{term}')" for term in key_terms)

    query_string = f"""
    PREFIX ex: <http://example.org/>
    SELECT ?player ?team ?skills WHERE {{
        ?player ex:skills ?skills .
        ?player ex:team ?team .
        FILTER ({filters})
    }}
    """
    qres = graph.query(query_string)

    # Score each candidate by how many key terms appear in its skill list
    best_match = None
    best_score = -1
    for row in qres:
        skill_set = str(row.skills).lower().split(", ")
        score = sum(term in skill_set for term in key_terms)
        if score > best_score:
            best_score = score
            best_match = row

    if best_match is not None:
        return f"Player: {best_match.player.split('/')[-1]}, Team: {best_match.team}, Skills: {best_match.skills}"

    return "No relevant player found."

# Function to generate a recommendation (a fixed template; no LLM is called)
def generate_recommendation_kg(query, retrieved_player):
    response = f"Query: {query}\n\nRecommended Player: {retrieved_player}\n\nRecommendation: Based on the query, the recommended player is a good fit for your team."
    return response

# Example query
query = "I need a versatile player."
retrieved_player = retrieve_player_kg(query, g)
response = generate_recommendation_kg(query, retrieved_player)

print(response)


Let us look at what the above code does. The code:

  • Imports Graph, Literal, RDF, URIRef, and Namespace from rdflib
  • Initializes an RDF graph and a custom namespace ex for defining URIs
  • Defines players as subjects using URIs within the custom namespace
  • Adds player attributes (team, position, skills) to the graph using triples
  • Defines a function retrieve_player_kg to find the most relevant player based on a query by matching key terms with skills in the knowledge graph
  • Uses SPARQL to query the graph, applying filters based on synonyms of key skills extracted from the query
  • Evaluates query results to find the best match based on the number of matching skills
  • Defines a function generate_recommendation_kg to create a recommendation message incorporating the query and the retrieved player's information
  • Provides an example query "I need a versatile player.", retrieves the most relevant player, generates a recommendation, and prints the recommendation message

Now let us look at the output:

PowerShell
 
python ragwithknowledgegraph.py
Query: I need a versatile player.

Recommended Player: Player: AlvinKamara, Team: New Orleans Saints, Skills: versatility, agility, pass-catching

Recommendation: Based on the query, the recommended player is a good fit for your team.


Conclusion: Leveraging Knowledge Graphs for Enhanced RAG

Incorporating knowledge graphs into RAG applications results in more accurate, relevant, and context-aware recommendations, showcasing their importance in improving AI capabilities.

Here are a few key takeaways:

  • ragwithoutknowledgegraph.py uses TF-IDF and cosine similarity for text-based retrieval, relying on keyword overlap between the query and the player descriptions.
  • ragwithknowledgegraph.py uses a knowledge graph, querying RDF triples with SPARQL and matching query terms, through a small synonym map, against each player's skills attribute.
  • Knowledge graphs improve retrieval accuracy by making the relationships and context between data entities explicit.
  • They support more complex and flexible queries, improving the quality of recommendations.
  • Knowledge graphs provide a structured and interconnected data representation, leading to better insights.
  • The example demonstrates the limitations of traditional keyword-based text retrieval.
  • It highlights the better relevance of knowledge-graph-based retrieval in RAG applications.
  • The integration of knowledge graphs can significantly enhance AI-driven recommendation systems.

Additional Resources

Below are some resources for learning about knowledge graphs and their impact on RAG solutions.

Courses and Reading on RAG and Knowledge Graphs

  • https://learn.deeplearning.ai/courses/knowledge-graphs-rag/lesson/1/introduction
  • https://ieeexplore.ieee.org/document/10698122

Open-Source Tools and Applications

  • https://neo4j.com/generativeai/