
Ollama + SingleStore - LangChain = :-(

Previously, we saw how LangChain provided an efficient and compact solution for integrating Ollama with SingleStore. But what if we were to remove LangChain?

By Akmal Chaudhri · May 31, 2024 · Tutorial

In a previous article, we used Ollama with LangChain and SingleStore. LangChain provided an efficient and compact solution for integrating Ollama with SingleStore. However, what if we were to remove LangChain? In this article, we’ll demonstrate an example of using Ollama with SingleStore without relying on LangChain. We’ll see that while we can achieve the same results described in the previous article, the amount of code increases, requiring us to manage more of the plumbing that LangChain normally handles.

The notebook file used in this article is available on GitHub.

Introduction

We’ll follow the same steps from the previous article to set up our test environment, as described in these sections:

  • Introduction
    • Use a Virtual Machine or venv.
  • Create a SingleStoreDB Cloud account
    • Use Ollama Demo Group as the Workspace Group Name and ollama-demo as the Workspace Name. Make a note of the password and host name. Temporarily allow access from anywhere by configuring the firewall under Ollama Demo Group > Firewall.
  • Create a Database
    • CREATE DATABASE IF NOT EXISTS ollama_demo;
  • Install Jupyter
    • pip install notebook
  • Install Ollama
    • curl -fsSL https://ollama.com/install.sh | sh
  • Environment Variable
    • export SINGLESTOREDB_URL="admin:<password>@<host>:3306/ollama_demo"
      Replace <password> and <host> with the values for your environment.
  • Launch Jupyter
    • jupyter notebook
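
Before launching Jupyter, a quick sanity check on the connection string can save a failed connection later. This is a small sketch of our own making (the pattern below is illustrative, not part of the official SingleStore client):

```python
import os
import re

def looks_like_singlestore_url(url: str) -> bool:
    # Expect the form user:password@host:port/database
    return re.fullmatch(r"[^:@]+:[^@]+@[^:@]+:\d+/\w+", url) is not None

url = os.environ.get("SINGLESTOREDB_URL", "")
print("SINGLESTOREDB_URL set and well-formed:", looks_like_singlestore_url(url))
```

If this prints False, recheck the export command above before proceeding.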

Fill Out the Notebook

First, some packages:

Shell
 
!pip install ollama numpy pandas sqlalchemy-singlestoredb --quiet --no-warn-script-location


Next, we’ll import some libraries:

Python
 
import ollama
import os
import numpy as np
import pandas as pd
from sqlalchemy import create_engine, text


We’ll create embeddings using all-minilm (45 MB at the time of writing):

Python
 
ollama.pull("all-minilm")


Example output:

Plain Text
 
{'status': 'success'}


For our LLM we’ll use llama2 (3.8 GB at the time of writing): 

Python
 
ollama.pull("llama2")


Example output:

Plain Text
 
{'status': 'success'}


Next, we’ll use the example text from the Ollama website:

Python
 
documents = [
    "Llamas are members of the camelid family meaning they're pretty closely related to vicuñas and camels",
    "Llamas were first domesticated and used as pack animals 4,000 to 5,000 years ago in the Peruvian highlands",
    "Llamas can grow as much as 6 feet tall though the average llama between 5 feet 6 inches and 5 feet 9 inches tall",
    "Llamas weigh between 280 and 450 pounds and can carry 25 to 30 percent of their body weight",
    "Llamas are vegetarians and have very efficient digestive systems",
    "Llamas live to be about 20 years old, though some only live for 15 years and others live to be 30 years old"
]

df_data = []

for doc in documents:
    response = ollama.embeddings(
        model = "all-minilm",
        prompt = doc
    )
    embedding = response["embedding"]
    embedding_array = np.array(embedding).astype(np.float32)
    df_data.append({"content": doc, "vector": embedding_array})

df = pd.DataFrame(df_data)

dimensions = len(df.at[0, "vector"])


We’ll generate embeddings with all-minilm, iterating through each document to build up the content for a Pandas DataFrame. We’ll convert the embeddings to a 32-bit format, as this is SingleStore’s default for the VECTOR data type. Lastly, we’ll determine the number of embedding dimensions from the first document in the Pandas DataFrame.
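
The 32-bit conversion step can be seen in isolation with a dummy vector standing in for the ollama.embeddings() output (a standalone sketch; the values here are invented for illustration):

```python
import numpy as np

# A dummy 4-dimensional embedding standing in for ollama.embeddings() output
embedding = [0.12, -0.45, 0.33, 0.07]

# NumPy defaults to float64, so we convert explicitly:
# 32-bit is the default element type for SingleStore's VECTOR column
embedding_array = np.array(embedding).astype(np.float32)

print(embedding_array.dtype)  # float32
print(len(embedding_array))   # 4
```

In the notebook, len() applied this way to the first row's vector is what gives us the dimensions value used for the table definition.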

Next, we’ll create a connection to our SingleStore instance:

Python
 
connection_url = "singlestoredb://" + os.environ.get("SINGLESTOREDB_URL")

db_connection = create_engine(connection_url)


Now we’ll create a table with the vector column using the dimensions we previously determined:

Python
 
query = text("""
CREATE TABLE IF NOT EXISTS pandas_docs (
    id BIGINT AUTO_INCREMENT NOT NULL,
    content LONGTEXT,
    vector VECTOR(:dimensions) NOT NULL,
    PRIMARY KEY(id)
);
""")

with db_connection.connect() as conn:
    conn.execute(query, {"dimensions": dimensions})


We’ll now write the Pandas DataFrame to the table:

Python
 
df.to_sql(
    "pandas_docs",
    con = db_connection,
    if_exists = "append",
    index = False,
    chunksize = 1000
)


Example output:

Plain Text
 
6


We'll now create an index to match the one we created in the previous article:

Python
 
query = text("""
ALTER TABLE pandas_docs ADD VECTOR INDEX (vector)
    INDEX_OPTIONS '{
          "metric_type": "EUCLIDEAN_DISTANCE"
     }';
""")

with db_connection.connect() as conn:
    conn.execute(query)
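
The INDEX_OPTIONS payload passed to ALTER TABLE is a JSON string, so a quick parse before running the DDL can catch quoting mistakes early (an optional check of our own, not something SingleStore requires):

```python
import json

# json.loads raises a ValueError on malformed input, so a clean parse
# confirms the options string is safe to embed in the DDL statement
index_options = '{"metric_type": "EUCLIDEAN_DISTANCE"}'

options = json.loads(index_options)
print(options["metric_type"])  # EUCLIDEAN_DISTANCE
```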


We’ll now ask a question, as follows:

Python
 
prompt = "What animals are llamas related to?"

response = ollama.embeddings(
    prompt = prompt,
    model = "all-minilm"
)

embedding = response["embedding"]
embedding_array = np.array(embedding).astype(np.float32)

query = text("""
SELECT content
FROM pandas_docs
ORDER BY vector <-> :embedding_array ASC
LIMIT 1;
""")

with db_connection.connect() as conn:
    results = conn.execute(query, {"embedding_array": embedding_array})
    row = results.fetchone()

data = row[0]
print(data)


We’ll convert the prompt to embeddings, ensure that the embeddings are converted to a 32-bit format, and then execute the SQL query which uses the infix notation <-> for Euclidean Distance.
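
If we wanted the top k matches rather than a single row, the same pattern extends naturally. This is a sketch of our own making: the distance alias and LIMIT value are illustrative choices, and the resulting string would be wrapped in text() and executed exactly like the query above:

```python
top_k = 3

# Same infix <-> (Euclidean distance) operator, but returning the
# distance alongside each match and keeping the k nearest rows
top_k_sql = f"""
SELECT content, vector <-> :embedding_array AS distance
FROM pandas_docs
ORDER BY distance ASC
LIMIT {top_k};
"""

print(top_k_sql)
```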

Example output:

Plain Text
 
Llamas are members of the camelid family meaning they're pretty closely related to vicuñas and camels


Next, we’ll use the LLM, as follows:

Python
 
output = ollama.generate(
    model = "llama2",
    prompt = f"Using this data: {data}. Respond to this prompt: {prompt}"
)

print(output["response"])


Example output:

Plain Text
 
Llamas are members of the camelid family, which means they are closely related to other animals such as:

1. Vicuñas: Vicuñas are small, wild camelids that are native to South America. They are known for their soft, woolly coats and are considered an endangered species due to habitat loss and poaching.
2. Camels: Camels are large, even-toed ungulates that are native to Africa and the Middle East. They are known for their distinctive humps on their backs, which store water and food for long periods of time.

Both llamas and vicuñas are classified as members of the family Camelidae, while camels are classified as belonging to the family Dromedaryae. Despite their differences in size and habitat, all three species share many similarities in terms of their physical characteristics and behavior.


Summary

In this article, we've replicated the steps we followed in the previous article and achieved similar results. However, we had to write a series of SQL statements and manage several steps that LangChain would have handled for us. Additionally, there may be more time and cost involved in maintaining the code base long-term compared to the LangChain solution. 

Using LangChain instead of writing custom code for database access provides several advantages, such as efficiency, scalability, and reliability. 

LangChain offers a library of prebuilt modules for database interaction, reducing development time and effort. Developers can use these modules to quickly implement various database operations without starting from scratch.

LangChain abstracts many of the complexities involved in database management, allowing developers to focus on high-level tasks rather than low-level implementation details. This improves productivity and time-to-market for database-driven applications.

LangChain has a large, active, and growing community of developers, is available on GitHub, and provides extensive documentation and examples. 

In summary, LangChain offers developers a powerful, efficient, and reliable platform for building database-driven applications, enabling them to focus on business problems using higher-level abstractions rather than reinventing the wheel with custom code. Comparing the example in this article with the example we used in the previous article, we can see the benefits.


Published at DZone with permission of Akmal Chaudhri. See the original article here.

Opinions expressed by DZone contributors are their own.
