Graphs and Language
This blog post is a written transcription of Louis Guitton's talk at a recent conference, where he shared his insights on Knowledge Graphs, Graph RAG, and LLMs.
A rising tide lifts all boats, and the recent advances in LLMs are no exception. In this blog post, we will explore how Knowledge Graphs can benefit from LLMs, and vice versa.
In particular, Knowledge Graphs can ground LLMs with facts using Graph RAG, which can be cheaper than Vector RAG. We'll look at a 10-line code example in LlamaIndex and see how easy it is to start. LLMs can help build automated KGs, which have been a bottleneck in the past. Graphs can provide your Domain Experts with an interface to supervise your AI systems.
A Trip Down Memory Lane at spaCy IRL 2019
I've been working with Natural Language Processing for a few years now, and I've seen the rise of Large Language Models. The start of my NLP and Graphs work dates back to 2018, applied to the Sports Media domain when I worked as a Machine Learning Engineer at OneFootball, a football media company from Berlin, Germany.
As a practitioner, I remember that time well because it was a time of great change in the NLP field. We were moving from the era of rule-based systems and word embeddings to the era of deep learning, moving from LSTMs to a slew of models like ELMo or ULMFiT, and then to the transformer architecture. I was one of the lucky few who could attend the spaCy IRL 2019 conference in Berlin. There were corporate training workshops followed by talks about Transformers, conversational AI assistants, and applied NLP in finance and media.
In his keynote, The missing elements in NLP (spaCy IRL 2019), Yoav Goldberg predicted that the next big development would be to enable non-experts to use NLP. He was right. He thought we would get there by humans writing rules aided by Deep Learning, resulting in transparent and debuggable models. He was wrong. We got there with chat, and we now have less transparent and less debuggable models. We moved further right and down on his chart (see below), to a place deeper than Deep Learning. The jury is still out on whether we can move towards more transparent models that work for non-experts and with little data.
In the context of my employer at the time, OneFootball, a football media company publishing in 12 languages with 10 million monthly active users, we used NLP to assist our newsroom and unlock new product features. I built systems to extract entities and relations from football articles, tag the news, and recommend articles to users. I shared some of that work in a previous talk at a Berlin NLP meetup. We had medium-sized data, not a lot. We had partial labels in the form of "retags." We also could not pay for much compute. So we had to be creative. It was the realm of Applied NLP.
That's where I stumbled upon the beautiful world of Graphs, specifically the great work from my now friend Paco Nathan with his library pytextrank. Graphs (along with rule-based matchers, weak supervision, and other NLP tricks I applied over the years) helped me work with little annotated data and incorporate declarative knowledge from domain experts while building a system that could be used and maintained by non-experts, with some level of human+machine collaboration. We shipped a much better tagging system and a new recommendation system, and I was hooked.
Today with the rise of LLMs, I see a lot of potential to combine the two worlds of Graphs and LLMs, and I want to share that with you.
1. Fact Grounding With Graph RAG
1.1 Fine-Tuning vs. Retrieval-Augmented Generation
The first place where Graphs and LLMs meet is in the area of fact grounding. LLMs suffer from a few issues like hallucination, knowledge cut-off, bias, and lack of control. To circumvent those issues, people have turned to their available domain data. In particular, two approaches emerged: Fine Tuning and Retrieval-Augmented Generation (RAG).
In his talk LLMs in Production at the AI Conference 3 months ago, Dr. Waleed Kadous, Chief Scientist at AnyScale, sheds some light on navigating the trade-offs between the two approaches. "Fine-tuning is for form, not facts," he says. "RAG is for facts".
Fine-tuning will get easier and cheaper. Open-source libraries like OpenAccess-AI-Collective/axolotl and huggingface/trl already make this process easier. But it's still resource-intensive and requires more NLP maturity as a business. RAG, on the other hand, is more accessible.
According to this Hacker News thread from 2 months ago, Ask HN: How do I train a custom LLM/ChatGPT on my documents in Dec 2023?, the vast majority of practitioners are indeed using RAG rather than fine-tuning.
1.2 Vector RAG vs. Graph RAG
When people say RAG, they usually mean Vector RAG, which is a retrieval system based on a Vector Database. In their blog post and accompanying notebook tutorial, NebulaGraph introduces an alternative that they call Graph RAG, which is a retrieval system based on a Graph Database (disclaimer: they are a Graph database vendor). They show that the facts retrieved by the RAG system will vary based on the chosen architecture.
They also show in a separate tutorial part of the LlamaIndex docs that Graph RAG is more concise and hence cheaper in terms of tokens than Vector RAG.
1.3 RAG Zoo
To make sense of the different RAG architectures, consider the following diagrams I created:
In all cases, we ask a question in natural language Q_NL and we get an answer in natural language A_NL. In all cases, there is some kind of Encoding model that extracts structure from the question, coupled with some kind of Generator model ("Answer Gen") that generates the answer.
Vector RAG embeds the query (usually with a smaller model than the LLM; something like FlagEmbedding or any of the small models at the top of the Hugging Face Embeddings Leaderboard) into a vector embedding v_Q. It then retrieves from the Vector DB the top-k document chunks closest to v_Q and returns those as vectors and chunks (v_j, C_j). Those are passed along with Q_NL as context to the LLM, which generates the answer A_NL.
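For concreteness, here is a minimal Vector RAG sketch using the same pre-0.10 LlamaIndex API as the Graph RAG snippet later in this post; it assumes the same local Ollama server and uses a Wikipedia page as the document source.
from llama_index.llms import Ollama
from llama_index import VectorStoreIndex, ServiceContext, download_loader
# Same local LLM and embedding model as in the Graph RAG example below
llm = Ollama(model='mistral', base_url="http://localhost:11434")
service_context = ServiceContext.from_defaults(llm=llm, embed_model="local:BAAI/bge-small-en")
# Chunk and embed the documents, then retrieve the top-k closest chunks at query time
loader = download_loader("WikipediaReader")()
documents = loader.load_data(pages=['Guardians of the Galaxy Vol. 3'], auto_suggest=False)
vector_index = VectorStoreIndex.from_documents(documents, service_context=service_context)
vector_rag_query_engine = vector_index.as_query_engine(similarity_top_k=3)
response_vector_rag = vector_rag_query_engine.query("Tell me about Peter Quill.")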
Graph RAG extracts the keywords k_i from the query and retrieves triples from the graph that match the keywords. It then passes the triples (s_j, p_j, o_j) along with Q_NL to the LLM, which generates the answer A_NL.
Structured RAG uses a Generator model (an LLM or a smaller fine-tuned model) to generate a query in the database's query language. It could generate a SQL query for an RDBMS or a Cypher query for a Graph DB. For example, imagine we query an RDBMS: the model generates Q_SQL, which is then run against the database to retrieve the answer. We note the answer A_SQL, but those are really the data records that result from running Q_SQL in the database. The answer A_SQL as well as Q_NL are passed to the LLM to generate A_NL.
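As an illustration only, here is a hand-rolled Structured RAG sketch against a toy SQLite database; the table, the prompts, and the one-shot query generation are assumptions for the example, and a real system would validate Q_SQL before executing it.
import sqlite3
from llama_index.llms import Ollama
# Toy in-memory database standing in for your RDBMS
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE movies (title TEXT, director TEXT, year INTEGER)")
conn.execute("INSERT INTO movies VALUES ('Guardians of the Galaxy Vol. 3', 'James Gunn', 2023)")
llm = Ollama(model='mistral', base_url="http://localhost:11434")
question = "Who directed Guardians of the Galaxy Vol. 3?"
# Query Gen: the LLM turns Q_NL into Q_SQL, given the schema
q_sql = llm.complete(
    "Schema: movies(title, director, year)\n"
    f"Write one SQLite SELECT query that answers: {question}\n"
    "Return only the SQL."
).text.strip()
# Run Q_SQL against the database; the records returned are A_SQL
a_sql = conn.execute(q_sql).fetchall()
# Answer Gen: the LLM turns Q_NL and A_SQL into A_NL
a_nl = llm.complete(f"Question: {question}\nSQL result: {a_sql}\nAnswer in one sentence:")
print(a_nl)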
In the case of Hybrid RAG, the system uses a combination of the above. There are multiple hybridization techniques, which go beyond the scope of this blog post. The simple idea is that you pass more context to the LLM for Answer Gen and let it use its summarisation strength to generate the answer.
1.4 Graph RAG Implementation in LlamaIndex
And now for the code: with current frameworks, we can build a Graph RAG system in about 10 lines of Python.
from llama_index.llms import Ollama
from llama_index import ServiceContext, KnowledgeGraphIndex
from llama_index.retrievers import KGTableRetriever
from llama_index.graph_stores import NebulaGraphStore
from llama_index.storage.storage_context import StorageContext
from llama_index.query_engine import RetrieverQueryEngine
from llama_index.data_structs.data_structs import KG
from IPython.display import Markdown, display
# Local LLM served by Ollama, plus a local embedding model
llm = Ollama(model='mistral', base_url="http://localhost:11434")
service_context = ServiceContext.from_defaults(llm=llm, embed_model="local:BAAI/bge-small-en")
# Connect to the existing Knowledge Graph stored in NebulaGraph
graph_store = NebulaGraphStore(space_name="wikipedia", edge_types=["relationship"], rel_prop_names=["relationship"], tags=["entity"])
storage_context = StorageContext.from_defaults(graph_store=graph_store)
kg_index = KnowledgeGraphIndex(index_struct=KG(index_id="vector"), service_context=service_context, storage_context=storage_context)
# Keyword-based retriever over the KG, wrapped in a query engine
graph_rag_retriever = KGTableRetriever(index=kg_index, retriever_mode="keyword")
kg_rag_query_engine = RetrieverQueryEngine.from_args(retriever=graph_rag_retriever, service_context=service_context)
response_graph_rag = kg_rag_query_engine.query("Tell me about Peter Quill.")
display(Markdown(f"<b>{response_graph_rag}</b>"))
This snippet assumes you have Ollama serving the mistral model and a NebulaGraph instance running locally. It also assumes you have a Knowledge Graph loaded in your Nebula database; if you don't, we'll cover how to build one in the next section.
Details on Getting Started With Nebula
You can start a Nebula instance with the Docker desktop Nebula Extension.
Once you have Nebula running, you need to do a first-time setup:
ADD HOSTS "storaged0":9779,"storaged1":9779,"storaged2":9779
Then you need to create the index before using it:
CREATE SPACE wikipedia(vid_type=FIXED_STRING(256), partition_num=1, replica_factor=1);
and:
USE wikipedia;
CREATE TAG entity(name string);
CREATE EDGE relationship(relationship string);
CREATE TAG INDEX entity_index ON entity(name(256));
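Back on the Python side, NebulaGraphStore reads its connection settings from environment variables; a typical local configuration looks like the sketch below (these are the Docker defaults used in the NebulaGraph tutorials, so adjust the credentials and address to your own instance).
import os
# Connection settings picked up by NebulaGraphStore (local Docker defaults)
os.environ["NEBULA_USER"] = "root"
os.environ["NEBULA_PASSWORD"] = "nebula"
os.environ["NEBULA_ADDRESS"] = "127.0.0.1:9669"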
2. KG Construction
2.1 Building a Knowledge Graph
Before conducting inference, you need to index your data either in a Vector DB or a Graph DB.
Indexing architectures for RAG
The equivalent of chunking and embedding documents for Vector RAG is extracting triples for Graph RAG. Triples are of the form (s, p, o), where s is the subject, p is the predicate, and o is the object. Subjects and objects are entities, and predicates are relationships.
There are a few ways to extract triples from text, but the most common way is to use a combination of a Named Entity Recogniser (NER) and a Relation Extractor (RE). NER will extract entities like "Peter Quill" and "Guardians of the Galaxy Vol. 3", and RE will extract relationships like "plays role in" and "directed by".
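As a small illustration of the NER half of that pipeline, here is a minimal spaCy sketch (assuming the en_core_web_sm model has been downloaded); the RE half, whether rule-based, a fine-tuned model like REBEL, or an LLM prompt, would then link those entities into triples.
import spacy
# Minimal NER pass; run `python -m spacy download en_core_web_sm` first
nlp = spacy.load("en_core_web_sm")
doc = nlp("Peter Quill plays a role in Guardians of the Galaxy Vol. 3, directed by James Gunn.")
for ent in doc.ents:
    print(ent.text, ent.label_)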
There are fine-tuned models specialised in RE, like REBEL, but people have started using LLMs to extract triples. Here is the default LlamaIndex prompt for triplet extraction:
Some text is provided below. Given the text, extract up to {max_knowledge_triplets} knowledge triplets in the form of (subject, predicate, object). Avoid stopwords.
---------------------
Example:
Text: Alice is Bob's mother.
Triplets: (Alice, is mother of, Bob)
Text: Philz is a coffee shop founded in Berkeley in 1982.
Triplets:
(Philz, is, coffee shop)
(Philz, founded in, Berkeley)
(Philz, founded in, 1982)
---------------------
Text: {text}
Triplets:
The issue with this approach is that, first, you have to parse the LLM output with regexes, and second, you have no control over the quality of the entities or relationships extracted.
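To make the first issue concrete, here is a hypothetical parser for the "(subject, predicate, object)" lines the prompt asks for; any drift in the LLM's output format silently breaks it, which is exactly the fragility at stake.
import re
# Matches one "(subject, predicate, object)" triplet per parenthesised group
TRIPLET_RE = re.compile(r"\(([^,]+),([^,]+),([^)]+)\)")
def parse_triplets(llm_output: str) -> list[tuple[str, str, str]]:
    return [tuple(part.strip() for part in m.groups()) for m in TRIPLET_RE.finditer(llm_output)]
print(parse_triplets("Triplets:\n(Philz, is, coffee shop)\n(Philz, founded in, Berkeley)"))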
2.2 KG Construction Implementation in LlamaIndex
With LlamaIndex, however, you can build a KG in about 10 lines of Python using the following code snippet:
from llama_index.llms import Ollama
from llama_index import ServiceContext, KnowledgeGraphIndex
from llama_index.graph_stores import NebulaGraphStore
from llama_index.storage.storage_context import StorageContext
from llama_index import download_loader
# Local LLM served by Ollama, plus a local embedding model
llm = Ollama(model='mistral', base_url="http://localhost:11434")
service_context = ServiceContext.from_defaults(llm=llm, embed_model="local:BAAI/bge-small-en")
# Store the extracted triples in the NebulaGraph space created earlier
graph_store = NebulaGraphStore(space_name="wikipedia", edge_types=["relationship"], rel_prop_names=["relationship"], tags=["entity"])
storage_context = StorageContext.from_defaults(graph_store=graph_store)
# Load the Wikipedia page for the movie as the source document
loader = download_loader("WikipediaReader")()
documents = loader.load_data(pages=['Guardians of the Galaxy Vol. 3'], auto_suggest=False)
# Extract triples with the LLM (default prompt shown above) and index them
kg_index = KnowledgeGraphIndex.from_documents(
documents,
storage_context=storage_context,
service_context=service_context,
max_triplets_per_chunk=5,  # cap on triples extracted per text chunk
include_embeddings=True,
kg_triplet_extract_fn=None,  # hook for a custom extraction function
kg_triple_extract_template=None,  # hook for a custom extraction prompt
space_name="wikipedia",
edge_types=["relationship"],
rel_prop_names=["relationship"],
tags=["entity"],
)
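To eyeball what was extracted (and to spot the failure modes discussed next), you can export the index as a NetworkX graph and render it; a small sketch, assuming pyvis is installed:
from pyvis.network import Network
# Export the extracted triples as a NetworkX graph and render them to HTML for inspection
g = kg_index.get_networkx_graph()
net = Network(notebook=True, directed=True)
net.from_nx(g)
net.show("guardians_kg.html")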
2.3 Example Failure Modes of LLM-Based KG Construction
However, if we have a look at the resulting KG for the movie "Guardians of the Galaxy Vol. 3", we can note a few issues.
Here is a table overview of those issues:
| # | Observed | Expected | Comment |
| --- | --- | --- | --- |
| 1 | "Peter Quill / star-lord" vs "Quill", or "Guardians of the Galaxy" vs "Vol. 3", are separate entities | Different synonyms should still disambiguate to the same entity | Entity Linking systems are used to disambiguate entities via collected "surface forms" |
| 2 | "plays role in" and "is part of the cast in" are different relationships that mean the same thing | Relationships should be consistent or, even better, match a provided controlled vocabulary | Relation Extraction systems are used to extract standardised relationships |
| 3 | The triples (Quill, speaks uncensored language in, Guardians of the Galaxy) and (James Gunn, could not imagine, Guardians of the Galaxy) are imprecise | If a triple is found, it should resolve to the most important information, in this case (Quill, is present in, Guardians of the Galaxy) or (James Gunn, directed, Guardians of the Galaxy) | Could be mitigated by using a controlled vocabulary for relationships |
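Issues 2 and 3 could be partially mitigated with the kg_triplet_extract_fn hook that was left as None in the snippet above. Here is a hypothetical sketch that keeps only triples whose predicate maps onto a small controlled vocabulary; my_raw_extractor stands in for whatever extractor you trust (an LLM prompt, REBEL, or rules) and is not part of LlamaIndex.
# Controlled vocabulary: normalise predicate variants to a canonical relation
RELATION_VOCAB = {
    "plays role in": "is present in",
    "is part of the cast in": "is present in",
    "directed": "directed",
    "directed by": "directed",
}
def controlled_extract_fn(text: str) -> list[tuple[str, str, str]]:
    raw_triplets = my_raw_extractor(text)  # hypothetical helper returning (s, p, o) tuples
    return [
        (s, RELATION_VOCAB[p.lower()], o)
        for (s, p, o) in raw_triplets
        if p.lower() in RELATION_VOCAB
    ]
# Then pass it to the index builder:
# kg_index = KnowledgeGraphIndex.from_documents(documents, kg_triplet_extract_fn=controlled_extract_fn, ...)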
The LLM-extracted graph above is to be compared with the Wikidata graph labeled by humans, which looks like this:
2.4 Towards Better KG Construction
So where do we go from here? KGs are difficult to construct and, by nature, keep evolving, which challenges existing methods for generating new facts and representing unseen knowledge. The paper Unifying Large Language Models and Knowledge Graphs: A Roadmap provides a good overview of the current state of the art and the challenges ahead.
Knowledge graph construction involves creating a structured representation of knowledge within a specific domain. This includes identifying entities and their relationships with each other. The process typically involves multiple stages: 1) entity discovery, 2) coreference resolution, and 3) relation extraction. Figure 19 of the paper presents the general framework for applying LLMs to each stage of KG construction. More recent approaches explore 4) end-to-end knowledge graph construction, which builds a complete knowledge graph in one step, or 5) distilling knowledge graphs directly from LLMs.
This is summarised in the following figure from the paper:
I've seen only a few projects that have tried to tackle this problem: DerwenAI/textgraphs and IBM/zshot.
3. Unlock Experts
3.1 Human vs. AI
The final place where Graphs and LLMs meet is Human+Machine collaboration. Who doesn't love a "Human vs AI" story? News headlines about "AGI" or "ChatGPT passing the bar exam" are everywhere.
I would encourage the reader to have a look at this answer from the AI Snake Oil newsletter. They make a good point that models like ChatGPT memorize the solutions rather than reason about them, which makes exams a bad way to compare humans with machines.
Going beyond memorisation, there is a whole area of research around what's called Generalisation, Reasoning, Planning, and Representation Learning, and graphs can help with that.
3.2 Human + Machine: Visualization
Rather than against each other, I'm interested in ways Humans and Machines can work together. In particular, how do humans understand and debug black-box models?
One key project that, in my opinion, moved the needle there was the whatlies paper from Vincent Warmerdam in 2020. He used UMAP on embeddings to reveal quality issues in pre-trained language models, and built a framework for others to audit their embeddings rather than blindly trust them.
Similarly, Graph Databases come with a lot of visualization tools out of the box; for example, they add context with color, metadata, and different layout algorithms (force-directed, Sankey).
3.3 Human + Machine: Human in the Loop
Finally, how do we address the lack of control of Deep Learning models, and how do we incorporate declarative knowledge from domain experts?
I like to refer to the phrase "the proof is in the pudding", and by that, I mean that the value of a piece of tech must be judged based on its results in production. And when we look at production systems, we see that LLMs or Deep Learning models are not used in isolation, but rather within Human-in-the-Loop systems.
In a project and paper from two weeks ago, Google has started using language models to help find and fix bugs in its C/C++, Java, and Go code. The results have been encouraging: it has recently started using an LLM based on its Gemini model to “successfully fix 15% of sanitizer bugs discovered during unit tests, resulting in hundreds of bugs patched”. Though the 15% acceptance rate sounds relatively small, it has a big effect at Google's scale. The bug pipeline yields better-than-human fixes: “approximately 95% of the commits sent to code owners were accepted without discussion,” Google writes. “This was a higher acceptance rate than human-generated code changes, which often provoke questions and comments.”
The key takeaway here for me has to do with their architecture:
They built it with an LLM, but they also combined LLMs with smaller, more specific AI models and, more importantly, with a double human filter on top, thus having humans work with machines.
Conclusion
I remember those 2019 days vividly, moving from LSTMs to Transformers, and we thought that was Deep Learning. Now, with LLMs, we've reached what I would describe as Abysmal Learning. And I like this image because it can mean both "extremely deep" as well as "profoundly bad."
More than ever, we need more control, more transparency, and ways for humans to work with machines. In this blog post, we've seen a few ways in which Graphs and LLMs can work together to help with that, and I'm excited to see what the future holds.
Published at DZone with permission of Louis Guitton. See the original article here.