DZone
Thanks for visiting DZone today,
Edit Profile
  • Manage Email Subscriptions
  • How to Post to DZone
  • Article Submission Guidelines
Sign Out View Profile
  • Post an Article
  • Manage My Drafts
Over 2 million developers have joined DZone.
Log In / Join
Please enter at least three characters to search
Refcards Trend Reports
Events Video Library
Refcards
Trend Reports

Events

View Events Video Library

Zones

Culture and Methodologies Agile Career Development Methodologies Team Management
Data Engineering AI/ML Big Data Data Databases IoT
Software Design and Architecture Cloud Architecture Containers Integration Microservices Performance Security
Coding Frameworks Java JavaScript Languages Tools
Testing, Deployment, and Maintenance Deployment DevOps and CI/CD Maintenance Monitoring and Observability Testing, Tools, and Frameworks
Culture and Methodologies
Agile Career Development Methodologies Team Management
Data Engineering
AI/ML Big Data Data Databases IoT
Software Design and Architecture
Cloud Architecture Containers Integration Microservices Performance Security
Coding
Frameworks Java JavaScript Languages Tools
Testing, Deployment, and Maintenance
Deployment DevOps and CI/CD Maintenance Monitoring and Observability Testing, Tools, and Frameworks

Last call! Secure your stack and shape the future! Help dev teams across the globe navigate their software supply chain security challenges.

Modernize your data layer. Learn how to design cloud-native database architectures to meet the evolving demands of AI and GenAI workloads.

Releasing software shouldn't be stressful or risky. Learn how to leverage progressive delivery techniques to ensure safer deployments.

Avoid machine learning mistakes and boost model performance! Discover key ML patterns, anti-patterns, data strategies, and more.

Related

  • A Deep Dive into Apache Doris Indexes
  • Python Packages for Validating Database Migration Projects
  • Why Text2SQL Alone Isn’t Enough: Embracing TAG
  • The Future of Data Lakehouses: Apache Iceberg Explained

Trending

  • How to Submit a Post to DZone
  • DZone's Article Submission Guidelines
  • Medallion Architecture: Why You Need It and How To Implement It With ClickHouse
  • Event-Driven Architectures: Designing Scalable and Resilient Cloud Solutions
  1. DZone
  2. Data Engineering
  3. Databases
  4. Building RAG Apps With Apache Cassandra, Python, and Ollama

Building RAG Apps With Apache Cassandra, Python, and Ollama

A brief introduction to Apache Cassandra for retrieval-augmented generation using Python and Ollama for developing applications free of cost locally or on a server.

By 
Varun Setia user avatar
Varun Setia
·
Feb. 03, 25 · Tutorial
Likes (0)
Comment
Save
Tweet
Share
2.9K Views

Join the DZone community and get the full member experience.

Join For Free

Retrieval-augmented generation (RAG) is the most popular approach for obtaining real-time data or updated data from a data source based on text input by users. Thus empowering all our search applications with state-of-the-art neural search. 

In RAG search systems, each user request is converted into a vector representation by embedding model, and this vector comparison is performed using various algorithms such as cosine similarity, longest common sub-sequence, etc., with existing vector representations stored in our vector-supporting database.

The existing vectors stored in the vector database are also generated or updated asynchronously by a separate background process.

This diagram provides a conceptual overview of vector comparison This diagram provides a conceptual overview of vector comparison 


To use RAG, we need at least an embedding model and a vector storage database to be used by the application. Contributions from community and open-source projects provide us with an amazing set of tools that help us build effective and efficient RAG applications.

In this article, we will implement the usage of a vector database and embedding generation model in a Python application. If you are reading this concept for the first time or nth time, you only need tools to work, and no subscription is needed for any tool. You can simply download tools and get started.

Our tech stack consists of the following open-source and free-to-use tools:

  • Operating system – Ubuntu Linux
  • Vector database – Apache Cassandra
  • Embedding model – nomic-embed-text
  • Programming language – Python

Key Benefits of this Stack

  • Open-source
  • Isolated data to meet data compliance standards

This diagram provides a high-level dependency architecture of the system

This diagram provides a high-level dependency architecture of the system

Implementation Walkthrough

You may implement and follow along if prerequisites are fulfilled; otherwise, read to the end to understand the concepts.

Prerequisites

  • Linux (In my case, it's Ubuntu 24.04.1 LTS)
  • Java Setup (OpenJDK 17.0.2)
  • Python (3.11.11)
  • Ollama

Ollama Model Setup

Ollama is an open-source middleware server that acts as an abstraction between generative AI and applications by installing all the necessary tools to make generative AI models available to consume as CLI and API in a machine. It has most of the openly available models like llama, phi, mistral, snowflake-arctic-embed, etc. It is cross-platform and can be easily configured in OS.

In Ollama, we will pull the nomic-embed-text model to generate embeddings.

Run in command line:

Plain Text
 
ollama pull nomic-embed-text


This model generates embeddings of size 768 vectors.

Apache Cassandra Setup and Scripts

Cassandra is an open-source NoSQL database designed to work with a high amount of workloads that require high scaling as per industry needs. Recently, it has added support for Vector search in version 5.0 that will facilitate our RAG use case.

Note: Cassandra requires Linux OS to work; it can also be installed as a docker image.

Installation

Download Apache Cassandra from https://cassandra.apache.org/_/download.html.

Configure Cassandra in your PATH.

Start the server by running the following command in the command line:

Plain Text
 
cassandra


Table

Open a new Linux terminal and write cqlsh; this will open the shell for Cassandra Query Language. Now, execute the below scripts to create the embeddings keyspace, document_vectors table, and necessary index edv_ann_index to perform a vector search. 

SQL
 
CREATE KEYSPACE IF NOT EXISTS embeddings
   WITH REPLICATION = { 'class' : 'SimpleStrategy', 'replication_factor' : '1' };

USE embeddings;

CREATE TABLE IF NOT EXISTS embeddings.document_vectors (
  record_id timeuuid,
  id uuid,
  content_chunk text,
  content_vector VECTOR <FLOAT, 768>,
  created_at timestamp,
  PRIMARY KEY (id, created_at)
)
WITH CLUSTERING ORDER BY (created_at DESC);

CREATE INDEX IF NOT EXISTS edv_ann_index
  ON embeddings.document_vectors(content_vector) USING 'sai';


Note: content_vector VECTOR <FLOAT, 768> is responsible for storing vectors of 768 length that are generated by the model.  

Milestone 1: We are ready with database setup to store vectors.

Python Code

This programming language certainly needs no introduction; it is easy to use and loved by the industry with strong community support.

Virtual Environment

Set up virtual environment:

Plain Text
 
sudo apt install python3-virtualenv && python3 -m venv myvenv


Activate virtual environment:

Plain Text
 
source /media/setia/Data/Tutorials/cassandra-ollama-app/myvenv/bin/activate


Packages

Download Datastax Cassandra package:

Plain Text
 
pip install cassandra-driver


Download requests package:

Plain Text
 
pip install requests


File

Create a file named app.py.

Now, write the code below to insert sample documents in Cassandra. This is the first step always to insert data in the database; it can be done by a separate process asynchronously. For demo purposes, I have written a method that will insert documents first in the database. Later on, we can comment on this method once the insertion of documents is successful.

Python
 
from cassandra.cluster import Cluster
from cassandra.query import PreparedStatement, BoundStatement
import uuid
import datetime
import requests

cluster = Cluster(['127.0.0.1'],port=9042)
session = cluster.connect()

def generate_embedding(text):
    embedding_url = 'http://localhost:11434/api/embed'
    body = {
        "model": "nomic-embed-text",
        "input": text
    }

    response = requests.post(embedding_url, json = body)
    return response.json()['embeddings'][0] 

def insert_chunk(content, vector):
    id = uuid.uuid4()
    content_chunk = content
    content_vector = vector
    created_at = datetime.datetime.now()
    insert_query = """
        INSERT INTO embeddings.document_vectors (record_id, id, content_chunk, content_vector, created_at)
        VALUES (now(), ?, ?, ?, ?)
        """
    prepared_stmt = session.prepare(insert_query)
    session.execute(prepared_stmt, [
        id, 
        content_chunk, 
        content_vector, 
        created_at
    ])

def insert_sample_data_in_cassandra():
  sentences = [
    "The aroma of freshly baked bread wafted through the quaint bakery nestled in the cobblestone streets of Paris, making Varun feel like time stood still.",
    "Sipping a spicy masala chai in a bustling tea stall in Mumbai, Varun felt he was tasting the very soul of the city.",
    "The sushi in a small Tokyo diner was so fresh, it felt like Varun was on a culinary journey to the sea itself.",
    "Under the starry desert sky in Morocco, Varun enjoyed a lamb tagine that tasted like a dream cooked slowly over a fire.",
    "The cozy Italian trattoria served the creamiest risotto, perfectly capturing the heart of Tuscany on a plate, which Varun savored with delight.",
    "Enjoying fish tacos on a sunny beach in Mexico, with the waves crashing nearby, made the flavors unforgettable for Varun.",
    "The crispy waffles drizzled with syrup at a Belgian café were worth every minute of waiting, as Varun indulged in the decadent treat.",
    "A bowl of warm pho in a roadside eatery in Hanoi felt like comfort wrapped in a broth of herbs and spices, giving Varun a sense of warmth.",
    "Sampling chocolate truffles in a Swiss chocolate shop, Varun found himself in a moment of pure bliss amidst snow-capped mountains.",
    "The street food stalls in Bangkok served fiery pad Thai that left Varun with a tangy memory of the city’s vibrant energy."
    ]
  for sentence in sentences:
     vector = generate_embedding(sentence)
     insert_chunk(sentence, vector)

insert_sample_data_in_cassandra()



Now, run this file using the commandline in the virtual environment:

Plain Text
 
python app.py


Once the file is executed and documents are inserted, this can be verified by querying the Cassandra database from the cqlsh console. For this, open cqlsh and execute:

SQL
 
SELECT content_chunk FROM embeddings.document_vectors;


This will return 10 documents inserted in the database, as seen in the screenshot below.

10 documents inserted in the database

Milestone 2: We are done with data setup in our vector database.

Now, we will write code to query documents based on cosine similarity. Cosine similarity is the dot product of two vector values. Its formula is A.B / |A||B|. This cosine similarity is internally supported by Apache Cassandra, helping us to compute everything in the database and handle large data efficiently.

The code below is self-explanatory; it fetches the top three results based on cosine similarity using ORDER BY <column name> ANN OF <text_vector> and also returns cosine similarity values. To execute this code, we need to ensure that indexing is applied to this vector column.

Python
 
def query_rag(text):
     text_embeddings = generate_embedding(text)
     select_query = """
        SELECT content_chunk,similarity_cosine(content_vector, ?) FROM embeddings.document_vectors
        ORDER BY content_vector ANN OF ?
        LIMIT 3
        """
     prepared_stmt = session.prepare(select_query)
     result_rows = session.execute(prepared_stmt, [
            text_embeddings,
            text_embeddings 
        ])
     for row in result_rows:
         print(row[0], row[1])


query_rag('Tell about my Bangkok experiences')


Remember to comment insertion code:

Python
 
#insert_sample_data_in_cassandra()


Now, execute the Python code by using python app.py.

We will get the output below:

Plain Text
 
(myvenv) setia@setia-Lenovo-IdeaPad-S340-15IIL:/media/setia/Data/Tutorials/cassandra-ollama-app$ python app.py                                                                               
The street food stalls in Bangkok served fiery pad Thai that left Varun with a tangy memory of the city’s vibrant energy. 0.8205469250679016                                                 
Sipping a spicy masala chai in a bustling tea stall in Mumbai, Varun felt he was tasting the very soul of the city. 0.7719690799713135                                                       
A bowl of warm pho in a roadside eatery in Hanoi felt like comfort wrapped in a broth of herbs and spices, giving Varun a sense of warmth. 0.7495554089546204  


You can see the cosine similarity of "The street food stalls in Bangkok served fiery pad Thai that left Varun with a tangy memory of the city’s vibrant energy." is 0.8205469250679016, which is the closest match.

Final Milestone: We have implemented the RAG search. 

Enterprise Applications

Apache Cassandra

For enterprises, we can use Apache Cassandra 5.0 from popular cloud vendors such as Microsoft Azure, AWS, GCP, etc.

Ollama

This middleware requires a VM compatible with Nvidia-powered GPU for running high-performance models, but we don't need high-end VMs for models used for generating vectors. Depending upon traffic requirements, multiple VMs can be used, or any generative AI service like Open AI, Anthropy, etc, whichever Total Cost of Ownership is lower for scaling needs or Data Governance needs.   

Linux VM

Apache Cassandra and Ollama can be combined and hosted in a single Linux VM if the use case doesn't require high usage to lower the Total Cost of Ownership or to address Data Governance needs.

Conclusion

We can easily build RAG applications by using Linux OS, Apache Cassandra, embedding models (nomic-embed-text) used via Ollama, and Python with good performance without needing any additional cloud subscription or services in the comfort of our machines/servers. 

However, hosting a VM in server(s) or opt for a cloud subscription for scaling as an enterprise application compliant with scalable architectures is recommended. In this Apache, Cassandra is a key component to do the heavy lifting of our vector storage and vector comparison and Ollama server for generating vector embeddings.

That's it! Thanks for reading 'til the end.

Apache Cassandra Database Python (language) Apache RAG

Opinions expressed by DZone contributors are their own.

Related

  • A Deep Dive into Apache Doris Indexes
  • Python Packages for Validating Database Migration Projects
  • Why Text2SQL Alone Isn’t Enough: Embracing TAG
  • The Future of Data Lakehouses: Apache Iceberg Explained

Partner Resources

×

Comments
Oops! Something Went Wrong

The likes didn't load as expected. Please refresh the page and try again.

ABOUT US

  • About DZone
  • Support and feedback
  • Community research
  • Sitemap

ADVERTISE

  • Advertise with DZone

CONTRIBUTE ON DZONE

  • Article Submission Guidelines
  • Become a Contributor
  • Core Program
  • Visit the Writers' Zone

LEGAL

  • Terms of Service
  • Privacy Policy

CONTACT US

  • 3343 Perimeter Hill Drive
  • Suite 100
  • Nashville, TN 37211
  • support@dzone.com

Let's be friends:

Likes
There are no likes...yet! 👀
Be the first to like this post!
It looks like you're not logged in.
Sign in to see who liked this post!