Reducing Hallucinations Using Prompt Engineering and RAG
Using prompt engineering, alongside retrieval-augmented generation, as a tool to reduce hallucinations in LLMs. These are among the methodologies I used to get an LLM to output the desired information.
Overview
Large language models (LLMs) are a powerful tool for generating content. The generative capabilities of these models come with various pros and cons. One of the major issues we often encounter is the factual correctness of the generated content. The models have a strong tendency to hallucinate and sometimes generate non-existent or incorrect content. The generated output can be so fluent and convincing that it appears factually correct and viable. As developers, it is our responsibility to ensure the system works reliably and generates accurate, concise content.
In this article, I will delve into two of the major methodologies I employed to reduce hallucinations in applications built with Amazon Bedrock and other AWS tools and technologies.
Prompt Engineering
System Prompt
- Set up the role: Using the system prompt, you can set up a role for the LLM. This instructs the model to assume the provided role and generate content within a confined scope.
- Set up boundaries: Boundaries instruct the LLM to generate content only within the given space. This adds clarity and precision, helping the model break down the instructions and act accordingly.
- Enhance security: Security is one of the most important aspects of any software application. The system prompt helps strengthen the security of an LLM application by adding an extra layer of protection between user input and the LLM.
A clear system prompt helps the LLM break down instructions into steps and make decisions accordingly, making the overall system clearer, more concise, and more reliable. To design the system prompt, we first need to:
- Identify the use case: A generic system is prone to error and can assume any role it wants. To minimize the risk of hallucinations, we first need to identify the use case and assign a role to the LLM. This role helps the LLM work within the given space. E.g., “You are working as a research assistant: break down the input user queries, use the input data, validate it, and generate content,” or “As a marketing assistant, use the inputs to generate the required output without assuming any information. If you require more information, ask the user.”
- Identify the constraints and boundaries: It is essential for such systems to understand the constraints and boundaries beyond which they should not generate any data. These can be supplied in the system prompt. E.g., “If you don’t know the answer, respond with ‘I can not help with this’ instead of making up an answer,” or “Return the response in strict JSON format. Before returning, validate the JSON and fix any JSON errors.”
- Identify presentation requirements: Formatting is another requirement to check before designing the system prompt. Formatting instructions such as delimiters, output formats, etc., help the presentation layer render the generated content in the required format. E.g., “Create bullet points for the list of items,” “Generate the output in JSON format,” etc. A minimal sketch combining these elements follows this list.
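To illustrate, here is a minimal sketch of a role-scoped system prompt sent through the Amazon Bedrock Converse API with boto3. The model ID, region, and prompt text are illustrative assumptions, not the exact prompt from my application.

```python
import boto3

# Minimal sketch: a role-scoped system prompt with explicit boundaries and
# formatting rules, sent through the Bedrock Converse API. The model ID and
# prompt text are illustrative placeholders.
bedrock_runtime = boto3.client("bedrock-runtime", region_name="us-east-1")

SYSTEM_PROMPT = (
    "You are a research assistant. Answer only from the context provided by the user. "
    "If you do not know the answer, reply exactly with 'I can not help with this'. "
    "Return the response in strict JSON format with the keys 'answer' and 'sources'."
)

response = bedrock_runtime.converse(
    modelId="anthropic.claude-3-sonnet-20240229-v1:0",  # illustrative model choice
    system=[{"text": SYSTEM_PROMPT}],
    messages=[{"role": "user", "content": [{"text": "Summarize the attached product notes."}]}],
    inferenceConfig={"temperature": 0.2, "maxTokens": 512},  # low temperature discourages made-up details
)

print(response["output"]["message"]["content"][0]["text"])
```

Keeping the temperature low and the instructions explicit is what confines the model to the intended role and output format.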
Retrieval-Augmented Generation (RAG)
1. Knowledge Base (KB) Data Sync
For this, I have leveraged Amazon OpenSearch to store the generated embeddings. The source data is stored in an S3 bucket and synced regularly to ensure the latest information is available to the KB. This S3 bucket acts as the data source for the KB; its contents are chunked using the chosen chunking strategy and stored in the OpenSearch vector store.
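As a rough sketch, the sync can be triggered as an ingestion job through the Bedrock Agent API; the knowledge base and data source IDs below are placeholders.

```python
import boto3

# Sketch: trigger a sync (ingestion job) so the KB picks up newly uploaded
# S3 documents. The knowledge base and data source IDs are placeholders.
bedrock_agent = boto3.client("bedrock-agent", region_name="us-east-1")

job = bedrock_agent.start_ingestion_job(
    knowledgeBaseId="KB1234567890",   # hypothetical KB ID
    dataSourceId="DS1234567890",      # hypothetical S3 data source ID
    description="Scheduled sync of the S3 source bucket",
)
print(job["ingestionJob"]["status"])  # e.g., STARTING
```

In my setup this kind of sync runs on a schedule so that newly added documents become searchable without manual intervention.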
2. Embedding Model
The embedding model is used to create vector embeddings of the provided source data. For this, I have leveraged the Amazon Titan Embeddings model. Titan Embeddings is a text-to-vector model; a vector is a mathematical representation of the given information (text).
Vectors represent a multidimensional view of the data, which allows for efficient search, indexing, and other operations, and can be used to calculate the similarity or distance between different data points. This is useful for clustering, finding nearest neighbors, and other tasks that require identifying similar objects.
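The snippet below is a small illustration, assuming the Titan Text Embeddings V2 model ID, of how two texts can be embedded via the Bedrock runtime and compared with cosine similarity.

```python
import json
import math
import boto3

# Sketch: embed two pieces of text with Amazon Titan Embeddings and compare
# them with cosine similarity. The model ID assumes Titan Text Embeddings V2.
bedrock_runtime = boto3.client("bedrock-runtime", region_name="us-east-1")

def embed(text: str) -> list[float]:
    response = bedrock_runtime.invoke_model(
        modelId="amazon.titan-embed-text-v2:0",
        body=json.dumps({"inputText": text}),
    )
    return json.loads(response["body"].read())["embedding"]

def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

v1 = embed("How do I reset my password?")
v2 = embed("Steps to recover a forgotten password")
print(round(cosine_similarity(v1, v2), 3))  # closer to 1.0 means more similar
```

Semantically related texts produce vectors that sit close together, which is exactly what the vector store exploits during similarity search.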
3. Knowledge Bases Creation
The next step is to create the KB using the Amazon Titan Embeddings model and a chunking strategy that ensures efficient data chunking and retrieval. The data from S3 is used as the source, chunked, and stored in the OpenSearch vector database. OpenSearch Serverless provides various out-of-the-box capabilities for scaling, efficient retrieval, querying, filtering, etc.
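A sketch of what attaching the S3 source with a fixed-size chunking strategy might look like is shown below; the KB ID, data source name, bucket ARN, and chunk sizes are illustrative assumptions.

```python
import boto3

# Sketch: attach an S3 data source to an existing knowledge base with a
# fixed-size chunking strategy. All IDs, names, and ARNs are placeholders.
bedrock_agent = boto3.client("bedrock-agent", region_name="us-east-1")

data_source = bedrock_agent.create_data_source(
    knowledgeBaseId="KB1234567890",               # hypothetical existing KB
    name="product-docs-source",
    dataSourceConfiguration={
        "type": "S3",
        "s3Configuration": {"bucketArn": "arn:aws:s3:::my-kb-source-bucket"},
    },
    vectorIngestionConfiguration={
        "chunkingConfiguration": {
            "chunkingStrategy": "FIXED_SIZE",
            "fixedSizeChunkingConfiguration": {
                "maxTokens": 300,          # size of each chunk
                "overlapPercentage": 20,   # overlap preserves context across chunk boundaries
            },
        }
    },
)
print(data_source["dataSource"]["dataSourceId"])
```

The chunk size and overlap are tuning knobs: smaller chunks give more precise matches, while some overlap prevents answers from being split across chunk boundaries.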
4. RAG Library
A RAG library is required to efficiently perform all RAG operations across the various data sources. Upon receiving a user query, this library queries the KB and retrieves the relevant chunks using similarity search. The retrieved chunks are then used to enrich the prompt. This provides the LLM with the context and details it needs to answer the given input query, as shown in the sketch below.
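The following sketch shows one way such a library could retrieve the top chunks from the KB and enrich the prompt; the knowledge base ID and the `build_enriched_prompt` helper are illustrative, not the exact code from my application.

```python
import boto3

# Sketch: retrieve the top-matching chunks for a query and build an enriched
# prompt from them. The knowledge base ID is a placeholder.
agent_runtime = boto3.client("bedrock-agent-runtime", region_name="us-east-1")

def build_enriched_prompt(query: str, kb_id: str = "KB1234567890", top_k: int = 4) -> str:
    result = agent_runtime.retrieve(
        knowledgeBaseId=kb_id,
        retrievalQuery={"text": query},
        retrievalConfiguration={"vectorSearchConfiguration": {"numberOfResults": top_k}},
    )
    # Concatenate the retrieved chunk texts into a single context block.
    context = "\n\n".join(
        item["content"]["text"] for item in result["retrievalResults"]
    )
    return (
        "Answer the question using only the context below. "
        "If the context does not contain the answer, say 'I can not help with this'.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}"
    )

prompt = build_enriched_prompt("What is the refund policy for annual plans?")
```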
5. Output Generation
The LLM, upon receiving the enriched prompt with the relevant information, generates the output within its confined role and using only that information. This helps ensure the model does not inject non-existent or made-up data into the output.
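A minimal sketch of this final generation step is shown below; the system prompt, model ID, and example context are placeholders, and in practice the context would come from the retrieval step above.

```python
import boto3

# Sketch: generate the final answer from the enriched prompt. The system
# prompt, model ID, and example context are illustrative placeholders; in a
# real flow the context comes from the KB retrieval step.
bedrock_runtime = boto3.client("bedrock-runtime", region_name="us-east-1")

system_prompt = (
    "You are a support assistant. Answer only from the provided context. "
    "If the context does not contain the answer, reply 'I can not help with this'."
)

enriched_prompt = (
    "Context:\n<retrieved chunks would be inserted here>\n\n"
    "Question: What is the refund policy for annual plans?"
)

response = bedrock_runtime.converse(
    modelId="anthropic.claude-3-sonnet-20240229-v1:0",  # illustrative model choice
    system=[{"text": system_prompt}],
    messages=[{"role": "user", "content": [{"text": enriched_prompt}]}],
    inferenceConfig={"temperature": 0.0, "maxTokens": 1024},  # deterministic, grounded output
)

print(response["output"]["message"]["content"][0]["text"])
```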
Conclusion
This process has enabled me to curb hallucinations and generate factually correct information along with citations (links to the documents referenced). Apart from the above approach, I have also experimented with using an LLM as a judge to evaluate the generated content against a gold dataset, as a measure to ensure the fairness of the generated content.