Building an AI-Powered Insurance Q and A Assistant With RAG and Snowflake Cortex
This project shows how to create an AI-powered insurance Q&A assistant using Retrieval-Augmented Generation and Snowflake Cortex Search for accurate answers.
In the insurance industry, vast amounts of data are stored in documents such as policies, claim details, and FAQs. Answering customers' queries quickly and accurately is crucial for satisfaction and efficiency. The objective of this project is to develop an AI-powered Q&A assistant using Retrieval-Augmented Generation (RAG) and Snowflake Cortex Search.
RAG (Retrieval-Augmented Generation) integrates large language models with external information retrieval. When a user asks a question, the system retrieves candidate documents from a knowledge base. These documents serve as context for the LLM, which uses them to generate an accurate and informative response.
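To make the retrieve-then-generate flow concrete, here is a minimal sketch in plain Python. It is not the Snowflake implementation: the retriever is a toy word-overlap ranker standing in for semantic search, and `retrieve`, `build_prompt`, `answer`, and the `llm` callable are hypothetical names introduced only for illustration.

```python
from typing import Callable, List


def retrieve(question: str, documents: List[str], top_k: int = 2) -> List[str]:
    """Toy retriever: rank documents by how many words they share with the question."""
    question_words = set(question.lower().split())
    scored = [
        (len(question_words & set(doc.lower().split())), doc)
        for doc in documents
    ]
    # Keep only documents with at least one shared word, highest overlap first.
    ranked = sorted(
        (item for item in scored if item[0] > 0),
        key=lambda item: item[0],
        reverse=True,
    )
    return [doc for _, doc in ranked[:top_k]]


def build_prompt(question: str, context_docs: List[str]) -> str:
    """Place the retrieved documents in front of the question."""
    context = "\n\n".join(context_docs)
    return f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"


def answer(question: str, documents: List[str], llm: Callable[[str], str]) -> str:
    """Retrieve context first, then hand the augmented prompt to the LLM."""
    context_docs = retrieve(question, documents)
    if not context_docs:
        return "No relevant information found in the documents."
    return llm(build_prompt(question, context_docs))
```

Any function that takes a prompt string and returns text can be passed as `llm`, which keeps the retrieval step independent of the model provider.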
This project demonstrates how to build a robust and effective insurance Q&A assistant by combining the strengths of Retrieval Augmented Generation (RAG) with Snowflake Cortex Search. Using Snowflake's semantic search capability, we can quickly retrieve contextually relevant information from an insurance document knowledge base. The retrieved context is then used as input to a Large Language Model (LLM) to generate accurate and informative answers to user queries.
Snowflake Cortex Search allows us to perform semantic search directly within Snowflake, enabling efficient retrieval of relevant documents based on their meaning, not just keywords.
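Searching by meaning is commonly done by comparing embedding vectors, for example with cosine similarity. The sketch below illustrates the idea only: the three-dimensional vectors are made-up values, not real embeddings, and nothing here reflects how Cortex Search is implemented internally.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical toy "embeddings": related phrases point in similar directions.
toy_embeddings = {
    "water damage claim": [0.9, 0.1, 0.0],
    "flooded basement": [0.8, 0.2, 0.1],
    "beneficiary change": [0.0, 0.1, 0.9],
}

query_vector = toy_embeddings["flooded basement"]
candidates = ["water damage claim", "beneficiary change"]

# Rank candidates by similarity to the query, most similar first.
ranked = sorted(
    candidates,
    key=lambda name: cosine_similarity(query_vector, toy_embeddings[name]),
    reverse=True,
)
```

With these toy values, "flooded basement" ranks closer to "water damage claim" than to "beneficiary change" even though the phrases share no keywords.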
This approach offers a significant benefit over traditional keyword-based search and over using LLMs without retrieved context.
Here is the step-by-step setup:
1. Set up the environment:
pip install streamlit snowflake-connector-python python-dotenv
2. Create an .env file:
SNOWFLAKE_ACCOUNT=your_snowflake_account
SNOWFLAKE_USER=your_snowflake_user
SNOWFLAKE_PASSWORD=your_snowflake_password
SNOWFLAKE_WAREHOUSE=your_snowflake_warehouse
SNOWFLAKE_DATABASE=your_snowflake_database
SNOWFLAKE_SCHEMA=your_snowflake_schema
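Because a missing entry in the .env file only surfaces later as a connection error, it can help to check the settings up front. The following is a small sketch under that assumption; `REQUIRED_SETTINGS` and `missing_settings` are hypothetical names, not part of any library.

```python
import os

# The six variables defined in the .env file above.
REQUIRED_SETTINGS = [
    "SNOWFLAKE_ACCOUNT",
    "SNOWFLAKE_USER",
    "SNOWFLAKE_PASSWORD",
    "SNOWFLAKE_WAREHOUSE",
    "SNOWFLAKE_DATABASE",
    "SNOWFLAKE_SCHEMA",
]


def missing_settings(env=None):
    """Return the names of required settings that are absent or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_SETTINGS if not env.get(name)]
```

Calling `missing_settings()` after `load_dotenv()` and stopping early when the list is non-empty gives a clearer message than a failed login.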
3. Create the INSURANCE_DOCUMENTS table in Snowflake:
CREATE TABLE INSURANCE_DOCUMENTS (
DOCUMENT_CONTENT STRING
);
4. Populate the table with sample data:
INSERT INTO INSURANCE_DOCUMENTS (DOCUMENT_CONTENT) VALUES
('Auto insurance policy coverage: This policy covers damages caused by collisions, theft, and natural disasters. Liability coverage is also included.'),
('Claim filing guide for water damage: To file a water damage claim, provide photos, a description of the damage, and any repair estimates.'),
('Health insurance plan limitations: This plan does not cover cosmetic surgery or experimental treatments. Pre-existing conditions may have waiting periods.'),
('Life insurance beneficiary change process: To change your beneficiary, complete a beneficiary change form and submit it to the insurance company.'),
('Documents needed for car accident claims: You will need a police report, photos of the accident scene, and contact information for all parties involved.');
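If the documents live in a Python list rather than a SQL script, the same table could be populated from code. This is a sketch only: `load_documents` is a hypothetical helper, and it assumes a cursor from snowflake-connector-python using its default pyformat (`%s`) parameter style.

```python
def load_documents(cursor, documents):
    """Insert each document as one row, passing values as bound parameters."""
    rows = [(doc,) for doc in documents]
    cursor.executemany(
        "INSERT INTO INSURANCE_DOCUMENTS (DOCUMENT_CONTENT) VALUES (%s)",
        rows,
    )
    return len(rows)
```

Binding the text as a parameter avoids having to escape quotes inside the document content by hand.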
5. Create the Streamlit application (insurance_qa.py):
import streamlit as st
import snowflake.connector
import os
from dotenv import load_dotenv
load_dotenv()
# Snowflake Cortex Search Configuration
SNOWFLAKE_ACCOUNT = os.environ.get("SNOWFLAKE_ACCOUNT")
SNOWFLAKE_USER = os.environ.get("SNOWFLAKE_USER")
SNOWFLAKE_PASSWORD = os.environ.get("SNOWFLAKE_PASSWORD")
SNOWFLAKE_WAREHOUSE = os.environ.get("SNOWFLAKE_WAREHOUSE")
SNOWFLAKE_DATABASE = os.environ.get("SNOWFLAKE_DATABASE")
SNOWFLAKE_SCHEMA = os.environ.get("SNOWFLAKE_SCHEMA")
SNOWFLAKE_TABLE = "INSURANCE_DOCUMENTS"
def connect_to_snowflake():
    """Open a Snowflake connection, or return None and show the error in the UI."""
    try:
        conn = snowflake.connector.connect(
            account=SNOWFLAKE_ACCOUNT,
            user=SNOWFLAKE_USER,
            password=SNOWFLAKE_PASSWORD,
            warehouse=SNOWFLAKE_WAREHOUSE,
            database=SNOWFLAKE_DATABASE,
            schema=SNOWFLAKE_SCHEMA,
        )
        return conn
    except Exception as e:
        st.error(f"Error connecting to Snowflake: {e}")
        return None
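def snowflake_cortex_search(conn, query):
    """Return up to three documents ranked by similarity to the query."""
    try:
        cursor = conn.cursor()
        # SEMANTIC_SIMILARITY stands in for the similarity function available in
        # your account; substitute the function you actually use.
        # The user's text is passed as a bound parameter (%s), never interpolated.
        sql = f"""
            SELECT DOCUMENT_CONTENT
            FROM {SNOWFLAKE_TABLE}
            WHERE SEMANTIC_SIMILARITY(DOCUMENT_CONTENT, %s) > 0.7  -- Adjust similarity threshold
            ORDER BY SEMANTIC_SIMILARITY(DOCUMENT_CONTENT, %s) DESC
            LIMIT 3  -- Limit the number of retrieved documents
        """
        cursor.execute(sql, (query, query))
        results = cursor.fetchall()
        return [row[0] for row in results]
    except Exception as e:
        st.error(f"Error performing semantic search: {e}")
        return []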
def generate_response(query, context):
    """Build the augmented prompt and return the (placeholder) LLM response."""
    if not context:
        return "No relevant information found in the documents."
    prompt = f"""
    You are an insurance assistant. Use the following context to answer the user's question.
    Context:
    {context}
    Question: {query}
    Answer:
    """
    # Replace with your actual LLM API call (e.g., OpenAI, Vertex AI)
    # Example using a placeholder:
    response = f"Based on the provided information: [Simulated LLM Response] {prompt}"
    return response
# Streamlit App
st.title("Insurance Q&A Assistant")
query = st.text_input("Enter your insurance question:")
if st.button("Get Answer"):
    if query:
        conn = connect_to_snowflake()
        if conn:
            context_docs = snowflake_cortex_search(conn, query)
            conn.close()
            context = "\n\n".join(context_docs)
            response = generate_response(query, context)
            st.write(response)
        else:
            st.error("Could not connect to Snowflake.")
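One possible way to fill in the placeholder inside generate_response without leaving Snowflake is the SNOWFLAKE.CORTEX.COMPLETE SQL function. The sketch below assumes that function is enabled for your account and region and that the chosen model is available there; `complete_with_cortex` is a hypothetical helper name and "mistral-large" is only an example model.

```python
def complete_with_cortex(cursor, prompt, model="mistral-large"):
    """Send the prompt to a Cortex-hosted model and return the generated text.

    Assumes `cursor` comes from snowflake-connector-python (pyformat binding)
    and that the named model is available in the account's region.
    """
    cursor.execute("SELECT SNOWFLAKE.CORTEX.COMPLETE(%s, %s)", (model, prompt))
    row = cursor.fetchone()
    return row[0] if row else ""
```

Using this would require keeping the connection open until the response is generated, since the app above closes it right after the search step.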
- Replace the placeholder LLM call in the generate_response function with a real LLM API integration (e.g., OpenAI or Google Vertex AI). You will need to install the Python library for your chosen LLM and modify generate_response to use that library.
- Adjust the SEMANTIC_SIMILARITY threshold in the Snowflake query as needed.
- Adjust the limit on the number of rows returned from Snowflake as needed.
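To make the threshold and row limit easy to tune, they can be passed as parameters rather than hard-coded. The sketch below also shows one alternative way to express the similarity using Snowflake's VECTOR_COSINE_SIMILARITY and SNOWFLAKE.CORTEX.EMBED_TEXT_768 functions. This is a substitute technique, not the query used above: it assumes those functions and the "snowflake-arctic-embed-m" model are available in your account, and `build_search_query` is a hypothetical helper. A suitable threshold depends on the embedding model, so 0.7 is only a starting point.

```python
EMBED_MODEL = "snowflake-arctic-embed-m"  # assumption: available in your region


def build_search_query(table, question, threshold=0.7, limit=3):
    """Return (sql, params) for a similarity search with tunable threshold and limit."""
    # Table names cannot be bound as parameters, so validate before interpolating.
    if not table.isidentifier():
        raise ValueError(f"Unexpected table name: {table!r}")
    sql = f"""
        WITH scored AS (
            SELECT DOCUMENT_CONTENT,
                   VECTOR_COSINE_SIMILARITY(
                       SNOWFLAKE.CORTEX.EMBED_TEXT_768('{EMBED_MODEL}', DOCUMENT_CONTENT),
                       SNOWFLAKE.CORTEX.EMBED_TEXT_768('{EMBED_MODEL}', %s)
                   ) AS score
            FROM {table}
        )
        SELECT DOCUMENT_CONTENT
        FROM scored
        WHERE score > %s
        ORDER BY score DESC
        LIMIT %s
    """
    return sql, (question, threshold, limit)
```

The returned pair can be passed to `cursor.execute(sql, params)`. Embedding every row at query time is wasteful for larger tables, so storing precomputed embeddings in a vector column would likely be preferable beyond a demo.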
6. Run the Streamlit application:
streamlit run insurance_qa.py
- Open your terminal or command prompt.
- Navigate to the directory where you saved insurance_qa.py.
- Run the command: streamlit run insurance_qa.py.
Interact With the Application
- Streamlit will open a web browser window displaying the insurance Q&A assistant.
- Enter your insurance-related questions into the text input field.
- Click the "Get Answer" button to retrieve the answer.
- Observe the response provided by the LLM, augmented by the context from Snowflake.
This walkthrough demonstrates how an insurance Q&A assistant can be developed using Retrieval-Augmented Generation (RAG) and Snowflake Cortex Search. By integrating Snowflake's semantic search capability with a Large Language Model (LLM), we've created a system that grounds its answers to user queries in relevant retrieved documents. Compared with keyword-based search or an LLM used without retrieved context, this approach is intended to improve accuracy, efficiency, and scalability.
Being able to access and understand large volumes of insurance data in natural language can enable automated customer service, streamline business processes, and improve the overall experience across the insurance sector. While this process flow is a good foundation, building a stable, production-grade application still requires further work on LLM integration, data pre-processing, and deployment strategy. Finally, combining RAG with Snowflake Cortex Search is a meaningful step toward using AI to transform how information is accessed and used in the insurance sector.
Opinions expressed by DZone contributors are their own.