OpenAI vs Ollama Using LangChain's SQLDatabaseToolkit

Learn about LangChain's SQLDatabaseToolkit for NL-to-SQL queries and compare OpenAI and Ollama results, highlighting setup, examples, and tool performance.

By Akmal Chaudhri · Mar. 19, 25 · Analysis

Disclaimer

The stock data used in this article is entirely fictitious. It is purely for demo purposes. Please do not use this data for making any financial decisions.

In a previous article, we saw the benefits of using Ollama locally for a RAG application. In this article, we'll extend our evaluation of Ollama by testing natural language (NL) queries against a database system, using LangChain's SQLDatabaseToolkit. SQL will serve as the baseline system for comparison as we explore the quality of results provided by OpenAI and Ollama.

The notebook files used in this article are available on GitHub.

Introduction

LangChain's SQLDatabaseToolkit is a powerful tool designed to integrate NL processing capabilities with relational database systems. It enables users to query databases with NL inputs, drawing on the capabilities of large language models (LLMs) to generate SQL queries dynamically. This makes it especially useful for applications where non-technical users or automated systems need to interact with structured data.

A number of LLMs are well supported by LangChain, and Ollama is among them. In this article, we'll evaluate how well LangChain integrates with Ollama and the feasibility of using the SQLDatabaseToolkit in a local setup.
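
To make this concrete, here is a minimal sketch of the wiring the toolkit relies on: a SQLDatabase connection, an LLM, and a SQL agent built from the toolkit. The "singlestoredb://" URL prefix, the OpenAI model name, and the example question are assumptions for illustration, and exact import paths can vary between LangChain versions.

Python
 
import os

from langchain_community.agent_toolkits import SQLDatabaseToolkit, create_sql_agent
from langchain_community.utilities import SQLDatabase
from langchain_openai import ChatOpenAI

# Build a SQLAlchemy-backed connection to the target database.
# The "singlestoredb://" prefix assumes the SingleStore SQLAlchemy dialect is
# installed; the SINGLESTOREDB_URL environment variable is set up later in
# this article.
db = SQLDatabase.from_uri("singlestoredb://" + os.environ["SINGLESTOREDB_URL"])

# Any chat model with tool-calling support can be plugged in here
# (the model name is just an example).
llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)

# The toolkit gives the agent tools for listing tables, inspecting schemas,
# checking queries, and running SQL.
toolkit = SQLDatabaseToolkit(db=db, llm=llm)
agent_executor = create_sql_agent(llm=llm, toolkit=toolkit, verbose=True)

# Ask a natural language question; the agent generates and executes the SQL.
result = agent_executor.invoke({"input": "How many rows are in the tick table?"})
print(result["output"])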

Create a SingleStore Cloud Account

A previous article showed the steps to create a free SingleStore Cloud account. We'll use the Free Shared Tier.

Selecting the Starter Workspace > Connect > CLI Client will give us the details we need later, such as username, password, host, port and database.

Create Database Tables

For our test environment, we'll use SingleStore running in the cloud as our target database system, and we'll connect to it securely from Jupyter notebooks running on a local system.

From the left navigation pane in the SingleStore cloud portal, we'll select DEVELOP > Data Studio > Open SQL Editor. We'll create three tables, as follows:

SQL
 
CREATE TABLE IF NOT EXISTS tick (
    symbol VARCHAR(10),
    ts     DATETIME SERIES TIMESTAMP,
    open   NUMERIC(18, 2),
    high   NUMERIC(18, 2),
    low    NUMERIC(18, 2),
    price  NUMERIC(18, 2),
    volume INT,
    KEY(ts)
);

CREATE TABLE IF NOT EXISTS portfolio (
    symbol         VARCHAR(10),
    shares_held    INT,
    purchase_date  DATE,
    purchase_price NUMERIC(18, 2)
);

CREATE TABLE IF NOT EXISTS stock_sentiment (
    headline  VARCHAR(250),
    positive  FLOAT,
    negative  FLOAT,
    neutral   FLOAT,
    url       TEXT,
    publisher VARCHAR(30),
    ts        DATETIME,
    symbol    VARCHAR(10)
);


We'll load the portfolio table with the following fictitious data:

SQL
 
INSERT INTO portfolio (symbol, shares_held, purchase_date, purchase_price) VALUES
('AAPL', 100, '2022-01-15',  150.25),
('MSFT',  50, '2021-12-10',  305.50),
('GOOGL', 25, '2021-11-05', 2800.75),
('AMZN',  10, '2020-07-20', 3200.00),
('TSLA',  40, '2022-02-18',  900.60),
('NFLX',  15, '2021-09-01',  550.00);


For the stock_sentiment table, we'll download the stock_sentiment.sql.zip file and unpack it. We'll load the data into the table using a MySQL client, as follows:

Shell
 
mysql -u "<username>" -p"<password>" -h "<host>" -P <port> -D <database> < stock_sentiment.sql


We'll use the values for <username>, <password>, <host>, <port> and <database> that we saved earlier.

Finally, for the tick table, we'll create a pipeline:

SQL
 
CREATE PIPELINE tick
AS LOAD DATA KAFKA 'public-kafka.memcompute.com:9092/stockticker'
BATCH_INTERVAL 45000
INTO TABLE tick
FIELDS TERMINATED BY ','
(symbol,ts,open,high,low,price,volume);


We'll adjust the pipeline so that it starts reading from the earliest available offset:

SQL
 
ALTER PIPELINE tick SET OFFSETS EARLIEST;


And test the pipeline:

SQL
 
TEST PIPELINE tick LIMIT 1;


Example output:

Plain Text
 
+--------+---------------------+--------+--------+--------+--------+--------+
| symbol | ts                  | open   | high   | low    | price  | volume |
+--------+---------------------+--------+--------+--------+--------+--------+
| MMM    | 2025-01-23 21:40:32 | 178.34 | 178.43 | 178.17 | 178.24 |  38299 |
+--------+---------------------+--------+--------+--------+--------+--------+


And then we'll start the pipeline:

SQL
 
START PIPELINE tick;


After a few minutes, we'll check the quantity of data loaded so far:

SQL
 
SELECT COUNT(*)
FROM tick;


Local Test Environment

Following a previous article, we'll set up our local test environment using the same steps described in these sections:

  • Introduction. Use a Virtual Machine or venv.
  • Create a SingleStore Cloud account. This step was completed above.
  • Create a database. The Free Shared Tier already provides a database and we just need to note down the database name.
  • Install Jupyter.
    Plain Text
     
    pip install notebook
  • Install Ollama.
    Plain Text
     
    curl -fsSL https://ollama.com/install.sh | sh
  • Environment variables.
    Plain Text
     
    export SINGLESTOREDB_URL="<username>:<password>@<host>:<port>/<database>"

    Replace <username>, <password>, <host>, <port> and <database> with the values for your environment.

    Plain Text
     
    export OPENAI_API_KEY="<OpenAI API Key>"

    Replace <OpenAI API Key> with your key.

  • Launch Jupyter.
    Plain Text
     
    jupyter notebook

We'll use the Jupyter notebooks from GitHub. These notebooks are configured to use OpenAI and Ollama. For Ollama, we'll use one of the LLMs listed with Tools support. We'll test the following four queries.
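
To switch a notebook from OpenAI to Ollama, only the LLM object changes. Here is a hedged sketch, assuming the langchain_ollama package and a locally pulled, tools-capable model (the model name is just an example); the question is the NL input from the first query below.

Python
 
import os

from langchain_community.agent_toolkits import SQLDatabaseToolkit, create_sql_agent
from langchain_community.utilities import SQLDatabase
from langchain_ollama import ChatOllama

# Same SingleStore connection as before (the dialect prefix assumes the
# SingleStore SQLAlchemy driver is installed).
db = SQLDatabase.from_uri("singlestoredb://" + os.environ["SINGLESTOREDB_URL"])

# A locally served Ollama model listed with Tools support (example name only).
llm = ChatOllama(model="llama3.1", temperature=0)

toolkit = SQLDatabaseToolkit(db=db, llm=llm)
agent_executor = create_sql_agent(llm=llm, toolkit=toolkit, verbose=True)

# One of the four natural language test questions.
question = (
    "For each stock symbol, calculate the volatility as the difference\n"
    "between the highest recorded price and the lowest recorded price over time.\n"
    "Which stock symbol has the least volatility?"
)
result = agent_executor.invoke({"input": question})
print(result["output"])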

First Query

SQL

SQL
 
SELECT symbol, (MAX(high) - MIN(low)) AS volatility
FROM tick
GROUP BY symbol
ORDER BY volatility ASC
LIMIT 1;


Natural Language

Plain Text
 
"For each stock symbol, calculate the volatility as the difference\n"
"between the highest recorded price and the lowest recorded price over time.\n"
"Which stock symbol has the least volatility?"


Results

SQL

Plain Text
 
+--------+------------+
| symbol | volatility |
+--------+------------+
| FTR    |       0.55 |
+--------+------------+


OpenAI

Plain Text
 
The stock symbol with the least volatility is FTR, with a volatility of 0.55.


Ollama

Plain Text
 
To find the stock with the highest price, we need to compare the prices of all the given stocks and find the maximum value. However, I don't have real-time access to financial data or the ability to execute code that interacts with external APIs. Therefore, I can't directly calculate the highest price from this list.

However, if you provide me with a specific date or time range, I can help you find the stock with the highest price during that period. For example, if you want to know the stock with the highest price on a particular day in the past, I can assist you with that.

If you're looking for the current highest-priced stock, I recommend checking a financial news website or an API that provides real-time stock data, such as Yahoo Finance, Google Finance, or a service like Alpha Vantage or Finnhub.io. These platforms can give you the most up-to-date information on stock prices.`
For troubleshooting, visit: https://python.langchain.com/docs/troubleshooting/errors/OUTPUT_PARSING_FAILURE


Second Query

SQL

SQL
 
SELECT COUNT(*)
FROM tick;


Natural Language

Plain Text
 
"How many rows are in the tick table?"


Results

SQL

Plain Text
 
+----------+
| COUNT(*) |
+----------+
| 22367162 |
+----------+


OpenAI

Plain Text
 
There are 22,367,162 rows in the tick table.


Ollama

Plain Text
 
The "tick" table has 3 rows.


Third Query

SQL

SQL
 
-- Step 1: Get the latest price for each symbol
WITH latest_prices AS (
    SELECT symbol, price
    FROM tick t1
    WHERE ts = (
        SELECT MAX(ts)
        FROM tick t2
        WHERE t2.symbol = t1.symbol
    )
)

-- Step 2: Calculate the total portfolio value
SELECT SUM(p.shares_held * lp.price) AS total_portfolio_value
FROM portfolio p, latest_prices lp
WHERE p.symbol = lp.symbol;


Natural Language

Plain Text
 
"Taking all the stock symbols from the portfolio table,\n"
"and using the latest value for each stock symbol from the tick table,\n"
"calculate the grand total value of all the shares listed in the portfolio table."


Results

SQL

Plain Text
 
+-----------------------+
| total_portfolio_value |
+-----------------------+
|              44540.60 |
+-----------------------+


OpenAI

Plain Text
 
$44,540.60


Ollama

Plain Text
 
I don't know. The provided SQL query does not contain any of the common mistakes listed. It uses the correct join conditions, functions, and data types. The query also properly quotes identifiers and uses the correct number of arguments for functions. Therefore, no changes are needed. However, without access to the actual database schema and table data, I cannot provide a specific answer to the question.


Fourth Query

SQL

SQL
 
SELECT
    (
        SELECT ts
        FROM stock_sentiment
        WHERE symbol = 'AAPL'
        ORDER BY positive DESC
        LIMIT 1
    ) AS sentiment_date,
    (
        SELECT positive
        FROM stock_sentiment
        WHERE symbol = 'AAPL'
        ORDER BY positive DESC
        LIMIT 1
    ) AS most_positive_sentiment,
    (
        SELECT price
        FROM tick
        WHERE symbol = 'AAPL'
        ORDER BY ts DESC
        LIMIT 1
    ) AS current_best_price;


Natural Language

Plain Text
 
"Using the symbol AAPL, show me the date and the most\n"
"positive sentiment in the stock sentiment table and the\n"
"current best price for this symbol from the tick table."


Results

SQL

Plain Text
 
+---------------------+-------------------------+--------------------+
| sentiment_date      | most_positive_sentiment | current_best_price |
+---------------------+-------------------------+--------------------+
| 2020-05-28 00:00:00 |                0.331509 |             116.27 |
+---------------------+-------------------------+--------------------+


OpenAI

Plain Text
 
On 2020-05-28, the most positive sentiment for AAPL was 0.331509, and the current best price for AAPL is 116.27.


Ollama

Plain Text
 
The provided list contains decimal numbers, which appear to be the results of some kind of experiment or analysis. Without additional context, it's difficult to determine the exact nature of these results. However, we can observe that the majority of the numbers are between 116.85 and 117.27, with a few outliers at 115.99 and 117.30. The smallest number in the list is 115.99, and the largest is 117.30.`
For troubleshooting, visit: https://python.langchain.com/docs/troubleshooting/errors/OUTPUT_PARSING_FAILURE


Summary

Analyzing the results, we see that SQL and OpenAI produce consistent outputs across all four queries. However, Ollama presents clear issues. A discussion thread on GitHub highlights that even though an LLM should support tool calling, this functionality is not natively available in Ollama.

If you are able to get this LangChain functionality working with Ollama in one of the supported LLMs, please send me a message, and I'll update the article and acknowledge your help.


Published at DZone with permission of Akmal Chaudhri. See the original article here.

Opinions expressed by DZone contributors are their own.
