Knowledge Graphs: The Secret Weapon for Superior RAG Applications

Integrating knowledge graphs into RAG applications can improve recommendation accuracy and context-awareness by providing structured, interconnected data.

Pavan Vemuri

Prince Bose

Tharakarama Reddy Yernapalli Sreenivasulu

Aug. 19, 24 · Tutorial

RAG and Its Importance

Retrieval-Augmented Generation (RAG) has emerged as a powerful paradigm that blends information retrieval with natural language generation. A RAG system first retrieves information relevant to a query from a large collection of data and then generates a coherent, contextually appropriate response grounded in what it retrieved. This makes RAG useful for applications ranging from customer support to content creation.
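The retrieve-then-generate loop can be sketched in a few lines. This is a minimal illustration under stated assumptions: retrieval is a toy word-overlap ranking, and `call_language_model` is a hypothetical placeholder for whatever model a real system would call. None of these names come from a particular library.

```python
import re

def tokenize(text: str) -> set[str]:
    """Lowercase the text and split it into word tokens."""
    return set(re.findall(r"\w+", text.lower()))

def retrieve(query: str, documents: list[str], k: int = 2) -> list[str]:
    """Toy retriever: rank documents by how many words they share with the query."""
    q = tokenize(query)
    ranked = sorted(documents, key=lambda d: len(q & tokenize(d)), reverse=True)
    return ranked[:k]

def build_prompt(query: str, context: list[str]) -> str:
    """Augment the user's question with the retrieved passages."""
    passages = "\n".join(f"- {c}" for c in context)
    return f"Answer using only the context below.\n\nContext:\n{passages}\n\nQuestion: {query}"

def call_language_model(prompt: str) -> str:
    """Hypothetical placeholder: a real system would send the prompt to an LLM here."""
    return f"[model output for a prompt of {len(prompt)} characters]"

documents = [
    "Derrick Henry is a running back known for power running.",
    "Alvin Kamara is a running back known for agility and pass-catching.",
    "Tom Brady is a veteran quarterback known for leadership.",
]
query = "Which running back is known for pass-catching?"
context = retrieve(query, documents)
prompt = build_prompt(query, context)
print(prompt)
print(call_language_model(prompt))
```

Swapping the toy retriever for TF-IDF, embeddings, or a knowledge-graph query changes only `retrieve`; the augmentation step stays the same.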

Challenges With RAG

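However, as powerful as RAG systems are, they face challenges, particularly in maintaining contextual accuracy and in managing large volumes of data efficiently. Retrieval based purely on text similarity matches the words in a query rather than the relationships between the entities those words describe, so relevant information can be missed when the query and the source text use different terms.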

Knowledge Graphs to the Rescue

Knowledge graphs are data structures that represent information as a graph, where entities are nodes and relationships are edges. Because this representation is explicitly interconnected and labels each relationship with its meaning, it can help overcome the challenges RAG systems face by making information easier to organize and retrieve.
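To make "entities are nodes and relationships are edges" concrete, here is a small stdlib sketch that stores a few facts from this article's player data as (subject, predicate, object) triples and indexes them as a directed graph. The dictionaries are a hypothetical stand-in for illustration; the later example uses rdflib's `Graph` instead.

```python
from collections import defaultdict

# Each fact is a (subject, predicate, object) triple: subject and object are nodes,
# and the predicate labels the directed edge between them.
triples = [
    ("AlvinKamara", "team", "New Orleans Saints"),
    ("AlvinKamara", "position", "Running Back"),
    ("DerrickHenry", "team", "Tennessee Titans"),
    ("DerrickHenry", "position", "Running Back"),
    ("PatrickMahomes", "team", "Kansas City Chiefs"),
    ("PatrickMahomes", "position", "Quarterback"),
]

# Adjacency lists in both directions, so edges can be walked either way
outgoing = defaultdict(list)
incoming = defaultdict(list)
for s, p, o in triples:
    outgoing[s].append((p, o))
    incoming[o].append((p, s))

print(outgoing["AlvinKamara"])    # edges leaving a player node
print(incoming["Running Back"])   # players connected to the "Running Back" node
```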

What Knowledge Graphs Bring to the Table

Knowledge graphs enhance RAG systems by providing a framework for representing and navigating complex relationships in data. They let the AI retrieve information not only by matching keywords but also by following the connections between different pieces of information. This can lead to more accurate, relevant, and context-aware responses, improving the performance of RAG applications.
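The difference between keyword lookup and following connections shows up in questions that no single text passage answers. The sketch below (stdlib only; the triples and the helpers `objects`, `subjects`, and `same_position_as` are hypothetical) answers "who plays the same position as Derrick Henry?" by walking two edges: from the player to his position node, then back to every other player linked to that node.

```python
# Toy triples: a hypothetical subset of the article's player data
triples = [
    ("DerrickHenry", "position", "Running Back"),
    ("AlvinKamara", "position", "Running Back"),
    ("PatrickMahomes", "position", "Quarterback"),
    ("TomBrady", "position", "Quarterback"),
    ("AlvinKamara", "skill", "pass-catching"),
    ("DerrickHenry", "skill", "power running"),
]

def objects(subject, predicate):
    """Follow edges labelled `predicate` out of `subject`."""
    return [o for s, p, o in triples if s == subject and p == predicate]

def subjects(predicate, obj):
    """Follow edges labelled `predicate` backwards into `obj`."""
    return [s for s, p, o in triples if p == predicate and o == obj]

def same_position_as(player):
    """Two hops: player -> position node -> other players at that position."""
    peers = []
    for position in objects(player, "position"):
        peers.extend(s for s in subjects("position", position) if s != player)
    return peers

print(same_position_as("DerrickHenry"))
print({p: objects(p, "skill") for p in same_position_as("DerrickHenry")})
```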

Let us now look at how knowledge graphs enhance a RAG application through a coding example.

As an example, we will retrieve a player recommendation for a fantasy football draft. We will ask the RAG application the same question with and without a knowledge graph and compare the outputs.

RAG Without Knowledge Graphs

Python

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
import numpy as np

# Sample player descriptions
players = [
    "Patrick Mahomes is a quarterback for the Kansas City Chiefs, known for his strong arm and playmaking ability.",
    "Derrick Henry is a running back for the Tennessee Titans, famous for his power running and consistency.",
    "Davante Adams is a wide receiver for the Las Vegas Raiders, recognized for his excellent route running and catching ability.",
    "Tom Brady is a veteran quarterback known for his leadership and game management.",
    "Alvin Kamara is a running back for the New Orleans Saints, known for his agility and pass-catching ability."
]

# Vectorize player descriptions
vectorizer = TfidfVectorizer()
player_vectors = vectorizer.fit_transform(players)

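# Function to retrieve the most relevant player
def retrieve_player(query, player_vectors, players):
    # Map the query into the same TF-IDF space (terms unseen during fit are ignored)
    query_vector = vectorizer.transform([query])
    similarities = cosine_similarity(query_vector, player_vectors).flatten()
    # np.argmax returns the first index on ties, so if every similarity is 0 it picks players[0]
    most_similar_player_index = np.argmax(similarities)
    return players[most_similar_player_index]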

# Function to generate a recommendation
def generate_recommendation(query, retrieved_player):
    response = f"Query: {query}\n\nRecommended Player: {retrieved_player}\n\nRecommendation: Based on the query, the recommended player is a good fit for your team."
    return response

# Example query
query = "I need a versatile player."
retrieved_player = retrieve_player(query, player_vectors, players)
response = generate_recommendation(query, retrieved_player)

print(response)

  

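I have deliberately simplified the RAG pipeline for ease of understanding: retrieval uses TF-IDF and cosine similarity, and the "generation" step is a fixed template string rather than a language model. Below is what the above code does: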

Imports necessary libraries: TfidfVectorizer from sklearn, cosine_similarity from sklearn, and numpy
Defines sample player descriptions with details about their positions and notable skills
Vectorizes the player descriptions with TF-IDF, converting the text into numerical vectors that can be compared for similarity
Defines a function retrieve_player to find the most relevant player based on a query by calculating cosine similarity between the query vector and player vectors
Defines a function generate_recommendation to create a recommendation message incorporating the query and the retrieved player's description
Provides an example query, "I need a versatile player.", retrieves the most relevant player, generates a recommendation, and prints the recommendation message
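For reference, this is the weighting TfidfVectorizer applies with scikit-learn's defaults (smooth_idf=True, sublinear_tf=False, norm="l2"), where n is the number of documents, df(t) is the number of documents containing term t, and tf(t, d) is the raw count of t in document d. cosine_similarity then compares the query vector q with each document vector d:

```latex
\mathrm{idf}(t) = \ln\frac{1 + n}{1 + \mathrm{df}(t)} + 1,
\qquad
w_{t,d} = \mathrm{tf}(t,d)\,\mathrm{idf}(t),
\qquad
\hat{\mathbf{w}}_d = \frac{\mathbf{w}_d}{\lVert \mathbf{w}_d \rVert_2}

\cos(\mathbf{q}, \mathbf{d}) = \frac{\mathbf{q} \cdot \mathbf{d}}{\lVert \mathbf{q} \rVert_2 \, \lVert \mathbf{d} \rVert_2}
```

Because the default TF-IDF rows are already L2-normalized, the cosine similarity here reduces to a dot product of the normalized vectors.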

Now let's look at the output:

PowerShell

python ragwithoutknowledgegraph.py
Query: I need a versatile player.

Recommended Player: Patrick Mahomes is a quarterback for the Kansas City Chiefs, known for his strong arm and playmaking ability.

Recommendation: Based on the query, the recommended player is a good fit for your team.

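As you can see, when asked for a versatile player, the application recommended Patrick Mahomes. This is not a meaningful match: none of the query's terms ("need", "versatile", "player"; the default tokenizer drops one-letter words such as "I" and "a") appear in any player description, so the query's TF-IDF vector is all zeros, every cosine similarity is 0, and np.argmax simply returns the first player in the list.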
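The tie is easy to reproduce without scikit-learn. The stdlib sketch below reuses the default TfidfVectorizer token pattern but, as a simplifying assumption, scores with raw term counts rather than TF-IDF weights; since the query shares no terms with any description, the weighting cannot change the result.

```python
import math
import re
from collections import Counter

# Default token pattern of scikit-learn's TfidfVectorizer, applied after lowercasing;
# it keeps only tokens of two or more word characters
TOKEN_PATTERN = re.compile(r"(?u)\b\w\w+\b")

def term_counts(text):
    return Counter(TOKEN_PATTERN.findall(text.lower()))

def cosine(a, b):
    # Raw term counts instead of TF-IDF weights: simpler, same zero/non-zero pattern
    dot = sum(count * b[term] for term, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

players = [
    "Patrick Mahomes is a quarterback for the Kansas City Chiefs, known for his strong arm and playmaking ability.",
    "Derrick Henry is a running back for the Tennessee Titans, famous for his power running and consistency.",
    "Davante Adams is a wide receiver for the Las Vegas Raiders, recognized for his excellent route running and catching ability.",
    "Tom Brady is a veteran quarterback known for his leadership and game management.",
    "Alvin Kamara is a running back for the New Orleans Saints, known for his agility and pass-catching ability.",
]

q = term_counts("I need a versatile player.")
vocabulary = set().union(*(term_counts(p) for p in players))
similarities = [cosine(q, term_counts(p)) for p in players]
best = similarities.index(max(similarities))  # first maximum, as np.argmax returns

print(sorted(q))             # ['need', 'player', 'versatile']
print(set(q) & vocabulary)   # set()
print(similarities)          # [0.0, 0.0, 0.0, 0.0, 0.0]
print(players[best])         # the Patrick Mahomes description, index 0
```

Adding the word "versatile" to Alvin Kamara's description would give his row the only non-zero similarity; the knowledge graph version bridges this vocabulary gap with its synonym map.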

RAG With Knowledge Graphs

Now let us look at how knowledge graphs can help enhance RAG and give a better recommendation.

Python

import rdflib
from rdflib import Graph, Literal, RDF, URIRef, Namespace

# Initialize the graph
g = Graph()
ex = Namespace("http://example.org/")

# Define players as subjects
patrick_mahomes = URIRef(ex.PatrickMahomes)
derrick_henry = URIRef(ex.DerrickHenry)
davante_adams = URIRef(ex.DavanteAdams)
tom_brady = URIRef(ex.TomBrady)
alvin_kamara = URIRef(ex.AlvinKamara)

# Add player attributes to the graph
g.add((patrick_mahomes, RDF.type, ex.Player))
g.add((patrick_mahomes, ex.team, Literal("Kansas City Chiefs")))
g.add((patrick_mahomes, ex.position, Literal("Quarterback")))
g.add((patrick_mahomes, ex.skills, Literal("strong arm, playmaking")))

g.add((derrick_henry, RDF.type, ex.Player))
g.add((derrick_henry, ex.team, Literal("Tennessee Titans")))
g.add((derrick_henry, ex.position, Literal("Running Back")))
g.add((derrick_henry, ex.skills, Literal("power running, consistency")))

g.add((davante_adams, RDF.type, ex.Player))
g.add((davante_adams, ex.team, Literal("Las Vegas Raiders")))
g.add((davante_adams, ex.position, Literal("Wide Receiver")))
g.add((davante_adams, ex.skills, Literal("route running, catching ability")))

g.add((tom_brady, RDF.type, ex.Player))
g.add((tom_brady, ex.team, Literal("Retired")))
g.add((tom_brady, ex.position, Literal("Quarterback")))
g.add((tom_brady, ex.skills, Literal("leadership, game management")))

g.add((alvin_kamara, RDF.type, ex.Player))
g.add((alvin_kamara, ex.team, Literal("New Orleans Saints")))
g.add((alvin_kamara, ex.position, Literal("Running Back")))
g.add((alvin_kamara, ex.skills, Literal("versatility, agility, pass-catching")))

# Function to retrieve the most relevant player using the knowledge graph
def retrieve_player_kg(query, graph):
    # Define synonyms for key skills
    synonyms = {
        "versatile": ["versatile", "versatility"],
        "agility": ["agility"],
        "pass-catching": ["pass-catching"],
        "strong arm": ["strong arm"],
        "playmaking": ["playmaking"],
        "leadership": ["leadership"],
        "game management": ["game management"]
    }
    
    # Extract key terms from the query and match with synonyms
    key_terms = []
    for term, syns in synonyms.items():
        if any(syn in query.lower() for syn in syns):
            key_terms.extend(syns)
    
    # With no recognized terms the FILTER clause would be empty, which is invalid SPARQL
    if not key_terms:
        return "No relevant player found."
    
    # Keep only players whose skills string contains at least one key term
    filters = " || ".join([f"contains(lcase(str(?skills)), '{term}')" for term in key_terms])
    
    query_string = f"""
    PREFIX ex: <http://example.org/>
    SELECT ?player ?team ?skills WHERE {{
        ?player ex:skills ?skills .
        ?player ex:team ?team .
        FILTER ({filters})
    }}
    """
    
    qres = graph.query(query_string)
    
    # Score each candidate by how many key terms appear exactly in its skill list
    best_match = None
    best_score = -1
    
    for row in qres:
        skill_set = row.skills.lower().split(', ')
        score = sum(term in skill_set for term in key_terms)
        if score > best_score:
            best_score = score
            best_match = row
    
    if best_match:
        return f"Player: {best_match.player.split('/')[-1]}, Team: {best_match.team}, Skills: {best_match.skills}"
    
    return "No relevant player found."

# Function to generate a recommendation
def generate_recommendation_kg(query, retrieved_player):
    response = f"Query: {query}\n\nRecommended Player: {retrieved_player}\n\nRecommendation: Based on the query, the recommended player is a good fit for your team."
    return response

# Example query
query = "I need a versatile player."
retrieved_player = retrieve_player_kg(query, g)
response = generate_recommendation_kg(query, retrieved_player)

print(response)

  

Let us look at what the above code does:

Imports necessary libraries: rdflib, Graph, Literal, RDF, URIRef, and Namespace
Initializes an RDF graph and a custom namespace ex for defining URIs
Defines players as subjects using URIs within the custom namespace
Adds player attributes (team, position, skills) to the graph using triples
Defines a function retrieve_player_kg to find the most relevant player based on a query by matching key terms with skills in the knowledge graph
Uses SPARQL to query the graph, applying filters based on synonyms of key skills extracted from the query
Evaluates query results to find the best match based on the number of matching skills
Defines a function generate_recommendation_kg to create a recommendation message incorporating the query and the retrieved player's information
Provides an example query "I need a versatile player.", retrieves the most relevant player, generates a recommendation, and prints the recommendation message
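The filtering and scoring steps can be traced without rdflib. The stdlib sketch below replaces the graph with plain tuples (a hypothetical stand-in, not rdflib's API), uses a subset of the synonym map, and returns None instead of a message when nothing matches. It also makes one detail explicit: "versatile" is not a substring of "versatility", so the match on Alvin Kamara depends on the synonym map listing both forms.

```python
# Hypothetical stand-in for the rdflib graph: plain (subject, predicate, object) tuples
triples = [
    ("PatrickMahomes", "skills", "strong arm, playmaking"),
    ("DerrickHenry", "skills", "power running, consistency"),
    ("DavanteAdams", "skills", "route running, catching ability"),
    ("TomBrady", "skills", "leadership, game management"),
    ("AlvinKamara", "skills", "versatility, agility, pass-catching"),
]

# Subset of the article's synonym map
SYNONYMS = {"versatile": ["versatile", "versatility"], "agility": ["agility"]}

def retrieve(query):
    # Expand recognized query words into key terms, as retrieve_player_kg does
    key_terms = []
    for syns in SYNONYMS.values():
        if any(syn in query.lower() for syn in syns):
            key_terms.extend(syns)
    if not key_terms:
        return None

    best, best_score = None, -1
    for player, predicate, skills in triples:
        if predicate != "skills":
            continue
        skills = skills.lower()
        # Equivalent of the SPARQL FILTER: substring match on the skills string
        if not any(term in skills for term in key_terms):
            continue
        # Same scoring as the article: exact matches against the comma-separated list
        score = sum(term in skills.split(", ") for term in key_terms)
        if score > best_score:
            best, best_score = player, score
    return best

print(retrieve("I need a versatile player."))                # AlvinKamara
print(retrieve("I need a fast player."))                     # None
print("versatile" in "versatility, agility, pass-catching")  # False
```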

Now let us look at the output:

PowerShell

python ragwithknowledgegraph.py
Query: I need a versatile player.

Recommended Player: Player: AlvinKamara, Team: New Orleans Saints, Skills: versatility, agility, pass-catching

Recommendation: Based on the query, the recommended player is a good fit for your team.

Conclusion

ragwithoutknowledgegraph.py uses TF-IDF and cosine similarity for text-based retrieval, relying on keyword matching for player recommendations.
ragwithknowledgegraph.py uses a knowledge graph, storing player attributes as RDF triples and matching them with SPARQL queries whose filters are expanded through a small synonym map.
Knowledge graphs can improve retrieval accuracy by making the relationships and context between data entities explicit.
They support more complex and flexible queries, improving the quality of recommendations.
Knowledge graphs provide a structured, interconnected representation of data that is easier to query and reason over.
The example illustrates the limitations of traditional text-based retrieval methods.
It highlights the more relevant results obtained by using knowledge graphs in RAG applications.
Integrating knowledge graphs can substantially improve AI-driven recommendation systems.

Key Takeaway

Incorporating knowledge graphs into RAG applications results in more accurate, relevant, and context-aware recommendations, showcasing their importance in improving AI capabilities.


Opinions expressed by DZone contributors are their own.
