
Ollama + SingleStore - LangChain = :-(

Previously, we saw how LangChain provided an efficient and compact solution for integrating Ollama with SingleStore. But what if we were to remove LangChain?

By Akmal Chaudhri · May 31, 2024 · Tutorial

In a previous article, we used Ollama with LangChain and SingleStore. LangChain provided an efficient and compact solution for integrating Ollama with SingleStore. However, what if we were to remove LangChain? In this article, we’ll demonstrate an example of using Ollama with SingleStore without relying on LangChain. We’ll see that while we can achieve the same results described in the previous article, the amount of code increases, requiring us to manage more of the plumbing that LangChain normally handles.

The notebook file used in this article is available on GitHub.

Introduction

We’ll follow the same steps from the previous article to set up our test environment, as described in these sections:

  • Introduction
    • Use a Virtual Machine or venv.
  • Create a SingleStoreDB Cloud account
    • Use Ollama Demo Group as the Workspace Group Name and ollama-demo as the Workspace Name. Make a note of the password and host name. Temporarily allow access from anywhere by configuring the firewall under Ollama Demo Group > Firewall.
  • Create a Database
    • CREATE DATABASE IF NOT EXISTS ollama_demo;
  • Install Jupyter
    • pip install notebook
  • Install Ollama
    • curl -fsSL https://ollama.com/install.sh | sh
  • Environment Variable
    • export SINGLESTOREDB_URL="admin:<password>@<host>:3306/ollama_demo"
      Replace <password> and <host> with the values for your environment.
  • Launch Jupyter
    • jupyter notebook
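
Before launching Jupyter, a quick sanity check on the connection string can save a failed connection later. This is a small sketch of our own making (the pattern below is illustrative, not part of the official SingleStore client):

```python
import os
import re

def looks_like_singlestore_url(url: str) -> bool:
    # Expect the form user:password@host:port/database
    return re.fullmatch(r"[^:@]+:[^@]+@[^:@]+:\d+/\w+", url) is not None

url = os.environ.get("SINGLESTOREDB_URL", "")
print("SINGLESTOREDB_URL set and well-formed:", looks_like_singlestore_url(url))
```

If this prints False, recheck the export command above before proceeding.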

Fill Out the Notebook

First, some packages:

Shell
 
!pip install ollama numpy pandas sqlalchemy-singlestoredb --quiet --no-warn-script-location


Next, we’ll import some libraries:

Python
 
import ollama
import os
import numpy as np
import pandas as pd
from sqlalchemy import create_engine, text


We’ll create embeddings using all-minilm (45 MB at the time of writing):

Python
 
ollama.pull("all-minilm")


Example output:

Plain Text
 
{'status': 'success'}


For our LLM we’ll use llama2 (3.8 GB at the time of writing): 

Python
 
ollama.pull("llama2")


Example output:

Plain Text
 
{'status': 'success'}


Next, we’ll use the example text from the Ollama website:

Python
 
documents = [
    "Llamas are members of the camelid family meaning they're pretty closely related to vicuñas and camels",
    "Llamas were first domesticated and used as pack animals 4,000 to 5,000 years ago in the Peruvian highlands",
    "Llamas can grow as much as 6 feet tall though the average llama between 5 feet 6 inches and 5 feet 9 inches tall",
    "Llamas weigh between 280 and 450 pounds and can carry 25 to 30 percent of their body weight",
    "Llamas are vegetarians and have very efficient digestive systems",
    "Llamas live to be about 20 years old, though some only live for 15 years and others live to be 30 years old"
]

df_data = []

for doc in documents:
    response = ollama.embeddings(
        model = "all-minilm",
        prompt = doc
    )
    embedding = response["embedding"]
    embedding_array = np.array(embedding).astype(np.float32)
    df_data.append({"content": doc, "vector": embedding_array})

df = pd.DataFrame(df_data)

dimensions = len(df.at[0, "vector"])


We’ll generate embeddings with all-minilm, iterating through each document to build up the content for a Pandas DataFrame. We’ll convert the embeddings to a 32-bit format, as this is SingleStore’s default for the VECTOR data type. Lastly, we’ll determine the number of embedding dimensions from the first document in the Pandas DataFrame.
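
The 32-bit conversion step can be seen in isolation with a dummy vector standing in for the ollama.embeddings() output (a standalone sketch; the values here are invented for illustration):

```python
import numpy as np

# A dummy 4-dimensional embedding standing in for ollama.embeddings() output
embedding = [0.12, -0.45, 0.33, 0.07]

# NumPy defaults to float64, so we convert explicitly:
# 32-bit is the default element type for SingleStore's VECTOR column
embedding_array = np.array(embedding).astype(np.float32)

print(embedding_array.dtype)  # float32
print(len(embedding_array))   # 4
```

In the notebook, len() applied this way to the first row's vector is what gives us the dimensions value used for the table definition.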

Next, we’ll create a connection to our SingleStore instance:

Python
 
connection_url = "singlestoredb://" + os.environ.get("SINGLESTOREDB_URL")

db_connection = create_engine(connection_url)


Now we’ll create a table with the vector column using the dimensions we previously determined:

Python
 
query = text("""
CREATE TABLE IF NOT EXISTS pandas_docs (
    id BIGINT AUTO_INCREMENT NOT NULL,
    content LONGTEXT,
    vector VECTOR(:dimensions) NOT NULL,
    PRIMARY KEY(id)
);
""")

with db_connection.connect() as conn:
    conn.execute(query, {"dimensions": dimensions})


We’ll now write the Pandas DataFrame to the table:

Python
 
df.to_sql(
    "pandas_docs",
    con = db_connection,
    if_exists = "append",
    index = False,
    chunksize = 1000
)


Example output:

Plain Text
 
6


We'll now create an index to match the one we created in the previous article:

Python
 
query = text("""
ALTER TABLE pandas_docs ADD VECTOR INDEX (vector)
    INDEX_OPTIONS '{
          "metric_type": "EUCLIDEAN_DISTANCE"
     }';
""")

with db_connection.connect() as conn:
    conn.execute(query)
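
The INDEX_OPTIONS payload passed to ALTER TABLE is a JSON string, so a quick parse before running the DDL can catch quoting mistakes early (an optional check of our own, not something SingleStore requires):

```python
import json

# json.loads raises a ValueError on malformed input, so a clean parse
# confirms the options string is safe to embed in the DDL statement
index_options = '{"metric_type": "EUCLIDEAN_DISTANCE"}'

options = json.loads(index_options)
print(options["metric_type"])  # EUCLIDEAN_DISTANCE
```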


We’ll now ask a question, as follows:

Python
 
prompt = "What animals are llamas related to?"

response = ollama.embeddings(
    prompt = prompt,
    model = "all-minilm"
)

embedding = response["embedding"]
embedding_array = np.array(embedding).astype(np.float32)

query = text("""
SELECT content
FROM pandas_docs
ORDER BY vector <-> :embedding_array ASC
LIMIT 1;
""")

with db_connection.connect() as conn:
    results = conn.execute(query, {"embedding_array": embedding_array})
    row = results.fetchone()

data = row[0]
print(data)


We’ll convert the prompt to embeddings, ensure that the embeddings are converted to a 32-bit format, and then execute the SQL query which uses the infix notation <-> for Euclidean Distance.
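
If we wanted the top k matches rather than a single row, the same pattern extends naturally. This is a sketch of our own making: the distance alias and LIMIT value are illustrative choices, and the resulting string would be wrapped in text() and executed exactly like the query above:

```python
top_k = 3

# Same infix <-> (Euclidean distance) operator, but returning the
# distance alongside each match and keeping the k nearest rows
top_k_sql = f"""
SELECT content, vector <-> :embedding_array AS distance
FROM pandas_docs
ORDER BY distance ASC
LIMIT {top_k};
"""

print(top_k_sql)
```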

Example output:

Plain Text
 
Llamas are members of the camelid family meaning they're pretty closely related to vicuñas and camels


Next, we’ll use the LLM, as follows:

Python
 
output = ollama.generate(
    model = "llama2",
    prompt = f"Using this data: {data}. Respond to this prompt: {prompt}"
)

print(output["response"])


Example output:

Plain Text
 
Llamas are members of the camelid family, which means they are closely related to other animals such as:

1. Vicuñas: Vicuñas are small, wild camelids that are native to South America. They are known for their soft, woolly coats and are considered an endangered species due to habitat loss and poaching.
2. Camels: Camels are large, even-toed ungulates that are native to Africa and the Middle East. They are known for their distinctive humps on their backs, which store water and food for long periods of time.

Both llamas and vicuñas are classified as members of the family Camelidae, while camels are classified as belonging to the family Dromedaryae. Despite their differences in size and habitat, all three species share many similarities in terms of their physical characteristics and behavior.


Summary

In this article, we've replicated the steps we followed in the previous article and achieved similar results. However, we had to write a series of SQL statements and manage several steps that LangChain would have handled for us. Additionally, there may be more time and cost involved in maintaining the code base long-term compared to the LangChain solution. 

Using LangChain instead of writing custom code for database access provides several advantages, such as efficiency, scalability, and reliability. 

LangChain offers a library of prebuilt modules for database interaction, reducing development time and effort. Developers can use these modules to quickly implement various database operations without starting from scratch.

LangChain abstracts many of the complexities involved in database management, allowing developers to focus on high-level tasks rather than low-level implementation details. This improves productivity and time-to-market for database-driven applications.

LangChain has a large, active, and growing community of developers, is available on GitHub, and provides extensive documentation and examples. 

In summary, LangChain offers developers a powerful, efficient, and reliable platform for building database-driven applications, enabling them to focus on business problems using higher-level abstractions rather than reinventing the wheel with custom code. Comparing the example in this article with the example we used in the previous article, we can see the benefits.


Published at DZone with permission of Akmal Chaudhri. See the original article here.

Opinions expressed by DZone contributors are their own.
