Analyzing Stock Tick Data in SingleStoreDB Using LangChain and OpenAI's Whisper

Talk to SingleStoreDB about stock tick data using LangChain and OpenAI's Whisper.

By Akmal Chaudhri · Sep. 17, 2023 · Tutorial

Disclaimer

The stock data used in this article is entirely fictitious. It is purely for demo purposes. Please do not use this data for making any financial decisions.

This article will show how to connect a Kafka broker, streaming example stock tick data, to SingleStoreDB. We'll then query the data using English sentences through LangChain, which provides a basic question-and-answer capability for the tick data. We'll build a Python application, through several design iterations, that uses OpenAI's Whisper to ask questions through speech and speech synthesis to reply.

The notebook file, SQL, and Python code are available on GitHub.

Introduction

The ability to ask questions of a database system using natural language is not a new idea. However, it has become much easier to implement with modern tools like LangChain and OpenAI's Whisper. In this article, we'll see how.

Create a SingleStoreDB Cloud Account

A previous article showed the steps to create a free SingleStoreDB Cloud account. We'll use the following settings:

  • Workspace Group Name: Whisper Demo Group
  • Cloud Provider: AWS
  • Region: US East 1 (N. Virginia)
  • Workspace Name: whisper-demo
  • Size: S-00
  • Settings: None selected

Once the workspace is available, we'll make a note of our password and host. The host will be available from Whisper Demo Group > Overview > Workspaces > whisper-demo > Connect > Connect Directly > SQL IDE > Host. We'll need this information later for a Python application. We'll temporarily allow access from anywhere by configuring the firewall under Whisper Demo Group > Firewall.

Create a Database and Tables

From the left navigation pane, we'll select DEVELOP > SQL Editor to create a timeseries_db database, tick and stock_sentiment tables, as follows:

SQL
 
CREATE DATABASE IF NOT EXISTS timeseries_db;

USE timeseries_db;

CREATE TABLE IF NOT EXISTS tick (
  symbol VARCHAR(10),
  ts DATETIME SERIES TIMESTAMP,
  open NUMERIC(18, 2),
  high NUMERIC(18, 2),
  low NUMERIC(18, 2),
  price NUMERIC(18, 2),
  volume INT,
  KEY(ts)
);

CREATE TABLE IF NOT EXISTS stock_sentiment (
  headline VARCHAR(250),
  positive FLOAT,
  negative FLOAT,
  neutral FLOAT,
  url TEXT,
  publisher VARCHAR(30),
  ts DATETIME,
  symbol VARCHAR(10)
);

Create a Pipeline

Pipelines allow us to create streaming ingest feeds from various sources, such as Kafka, S3, and HDFS, using a single command. With pipelines, we can also perform ETL operations. For our use case, we'll create a simple pipeline in SingleStoreDB as follows:

SQL
 
CREATE PIPELINE tick
AS LOAD DATA KAFKA 'public-kafka.memcompute.com:9092/stockticker'
BATCH_INTERVAL 2500
INTO TABLE tick
FIELDS TERMINATED BY ','
(symbol,ts,open,high,low,price,volume);

We'll control the rate of data ingestion using the BATCH_INTERVAL. Initially, we'll set this to 2500 milliseconds.

We'll configure the pipeline to start from the earliest offset, as follows:

SQL
 
ALTER PIPELINE tick SET OFFSETS EARLIEST;

and we'll test the pipeline before we start running it, as follows:

SQL
 
TEST PIPELINE tick LIMIT 1;

The output should be similar to the following:

Plain Text
 
+--------+---------------------+-------+-------+-------+-------+--------+
| symbol | ts                  | open  | high  | low   | price | volume |
+--------+---------------------+-------+-------+-------+-------+--------+
| AIV    | 2023-09-05 06:47:53 | 44.89 | 44.89 | 44.88 | 44.89 | 719    |
+--------+---------------------+-------+-------+-------+-------+--------+

Remember, this is fictitious data.

We'll now start the pipeline:

SQL
 
START PIPELINE tick;

Load Stock Sentiment Data

We'll now load the data into the stock_sentiment table. The data are derived from a GitHub Gist and are a small subset of the data from Kaggle. The CSV file already includes the sentiment values. We'll download this file and then use a MySQL client to connect to SingleStoreDB Cloud, as follows:

Shell
 
mysql --local-infile -u admin -h <host> -P 3306 --default-auth=mysql_native_password -p

Replace <host> with the value we obtained from SingleStoreDB Cloud earlier. We'll load the data into SingleStoreDB from the CSV file, as follows:

SQL
 
USE timeseries_db;

LOAD DATA LOCAL INFILE '/path/to/stock_sentiment.csv'
INTO TABLE stock_sentiment
COLUMNS TERMINATED BY '\t'
IGNORE 1 LINES;

Replace /path/to/ with the actual path to where the CSV file is located.

Load the Notebook

We'll use the notebook file available on GitHub. From the left navigation pane in SingleStoreDB Cloud, we'll select Notebooks. In the top right of the web page is a New Notebook button with a pulldown that offers two options:

  1. New Notebook
  2. Import From File

We'll select the second option, locate the notebook file we downloaded from GitHub and load it into SingleStoreDB Cloud.

Run the Notebook

We'll start by selecting the Connection (whisper-demo) and Database (timeseries_db) using the drop-down menus above the notebook.

Analyzing Time Series Data

Part 1 of the notebook contains a set of Time Series operations on the data in the tick table. These operations were described in greater detail in a previous article.
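
To give a flavor of those operations, here is a minimal, hypothetical sketch of the kind of aggregation Part 1 performs. It assumes the connection_url variable that SingleStoreDB Cloud notebooks provide and that pandas and SQLAlchemy are available in the notebook environment:

Python

import pandas as pd
from sqlalchemy import create_engine

# Hypothetical Part 1-style aggregation: summarize each symbol's trading
# range and volatility. connection_url is provided by the SingleStoreDB
# Cloud notebook environment.
engine = create_engine(connection_url)

df = pd.read_sql("""
    SELECT symbol,
           MIN(price) AS min_price,
           MAX(price) AS max_price,
           STDDEV(price) AS volatility
    FROM tick
    GROUP BY symbol
    ORDER BY volatility
""", engine)

print(df.head())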

LangChain OnlinePDFLoader

Part 2 of the notebook loads the contents of a PDF document, creates vector embeddings, and stores them in a table called fintech_docs. These operations were described in greater detail in a previous article. Replace <PDF document URL> with the hyperlink of your chosen FinTech document:

Python
 
from langchain.document_loaders import OnlinePDFLoader

loader = OnlinePDFLoader("<PDF document URL>")

data = loader.load()

The Fully Qualified Domain Name (FQDN) where the PDF file is located must be added to the firewall by selecting the Edit Firewall option in the top right.

We'll use ChatGPT to answer questions on the PDF file, such as:

Plain Text
 
"What are the best investment opportunities in Blockchain?"

LangChain SQL Agent

Part 3 of the notebook contains LangChain agent operations on the data in the tick and stock_sentiment tables. This will be the main focus of this article.

First, we'll install some libraries:

Shell
 
!pip install langchain --quiet
!pip install openai --quiet

Now, we'll enter our OpenAI API Key:

Python
 
import os
import getpass

os.environ["OPENAI_API_KEY"] = getpass.getpass("OpenAI API Key:")

Next, we'll need some imports:

Python
 
from langchain import OpenAI, SQLDatabase
from langchain.agents.agent_toolkits import SQLDatabaseToolkit
from langchain.agents import create_sql_agent

We'll configure the LangChain toolkit and agent, as follows:

Python
 
# connection_url is provided by the SingleStoreDB Cloud notebook environment
db = SQLDatabase.from_uri(connection_url)

llm = OpenAI(temperature = 0, verbose = False)

toolkit = SQLDatabaseToolkit(db = db, llm = llm)

agent_executor = create_sql_agent(
  llm = OpenAI(temperature = 0), toolkit = toolkit, verbose = False
)

We'll now test the code using an example query, as follows:

Python
 
agent_executor.run(
"""
    From the tick table, which stock symbol saw the least volatility in share trading in the dataset?
"""
)

Here is some example output:

Plain Text
 
'The stock symbol with the least volatility in share trading in the dataset is FTR.'

We'll make this a little more interactive using the following code:

Python
 
query = input("Please enter your question:")
agent_executor.run(query)

We'll test this with an example query:

Plain Text
 
Using the symbol A, what is the most positive sentiment in the stock_sentiment table and the current best price for this symbol from the tick table?

Here is some example output:

Plain Text
 
'The most positive sentiment for symbol A is 0.9576 and the current best price is 46.43.'

To see the chain output, we'll set verbose to True, as follows:

Python
 
agent_executor = create_sql_agent(
  llm = OpenAI(temperature = 0), toolkit = toolkit, verbose = True
)

An example of chain output from another database was shown in a previous article.

Bonus: Build a Visual Python Application

We'll now build a simple Python application that uses OpenAI's Whisper to ask questions about the database system. An excellent article inspired this application.

Install the Required Software

For this article, the following software was required on a clean install of Ubuntu 22.04.2 running in a VMware Fusion virtual machine:

Shell
 
sudo apt install ffmpeg
sudo apt install libespeak1
sudo apt install portaudio19-dev
sudo apt install python3-tk
sudo apt install python3-pil python3-pil.imagetk

as well as the following packages:

Plain Text
 
langchain
matplotlib
openai
openai-whisper
pyaudio
pymysql
pyttsx3
wave

These can be found in the requirements.txt file on GitHub. Install them as follows:

Shell
 
pip install -r requirements.txt

openai-whisper may take a while to install.
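
To confirm that the local installation works, a quick hypothetical check lists the available model sizes:

Python

import whisper

# List the Whisper model sizes available locally (tiny, base, small, ...).
print(whisper.available_models())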

We'll need to provide an OpenAI API Key in our environment. For example:

Shell
 
export OPENAI_API_KEY="<OpenAI API Key>" 

Replace <OpenAI API Key> with your key.

Each of the following applications contains this code:

Python
 
s2_password = "<password>"
s2_host = "<host>"

We'll replace <password> and <host> with the values that we saved earlier from SingleStoreDB Cloud.
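
For reference, here is a minimal sketch of how these values could be assembled into the connection_url passed to SQLDatabase.from_uri() earlier, assuming the admin user, port 3306, and the timeseries_db database from the setup above:

Python

from langchain import SQLDatabase

s2_password = "<password>"
s2_host = "<host>"

# Build the SQLAlchemy-style connection URL used by the LangChain agent.
connection_url = f"mysql+pymysql://admin:{s2_password}@{s2_host}:3306/timeseries_db"

db = SQLDatabase.from_uri(connection_url)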

First Iteration

Let's start with a simple visual Python application using Tkinter, with voice recognition provided by OpenAI's Whisper. The application records up to 20 seconds of speech. It can be run as follows:

Shell
 
python3 record-transcribe.py

Example output is shown in Figure 1.

Figure 1. First Iteration.
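
Stripped of the Tkinter user interface, the core record-and-transcribe logic looks roughly like the sketch below. The sample rate, chunk size, and file name are illustrative choices, not necessarily those used in the application:

Python

import wave
import pyaudio
import whisper

RATE, CHUNK, SECONDS, FILENAME = 16000, 1024, 20, "input.wav"

# Record up to 20 seconds of audio from the default microphone.
audio = pyaudio.PyAudio()
stream = audio.open(format = pyaudio.paInt16, channels = 1, rate = RATE,
                    input = True, frames_per_buffer = CHUNK)
frames = [stream.read(CHUNK) for _ in range(int(RATE / CHUNK * SECONDS))]
stream.stop_stream()
stream.close()
audio.terminate()

# Save the recording as a WAV file.
with wave.open(FILENAME, "wb") as wf:
    wf.setnchannels(1)
    wf.setsampwidth(pyaudio.get_sample_size(pyaudio.paInt16))
    wf.setframerate(RATE)
    wf.writeframes(b"".join(frames))

# Transcribe locally with Whisper; the text is then passed to the agent.
model = whisper.load_model("base")
transcript = model.transcribe(FILENAME)
print(transcript["text"])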

Second Iteration

In the next iteration, we'll add an audio waveform display. We'll run the program as follows:

Shell
 
python3 record-transcribe-visualise.py

Example output is shown in Figure 2.

Figure 2. Second Iteration.
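
The waveform display is a standard matplotlib plot of the recorded samples, along the lines of this hypothetical standalone version; the application embeds the figure in the Tkinter window rather than calling plt.show():

Python

import wave
import numpy as np
import matplotlib.pyplot as plt

# Read the recorded WAV file and plot its mono 16-bit samples against time.
with wave.open("input.wav", "rb") as wf:
    rate = wf.getframerate()
    samples = np.frombuffer(wf.readframes(wf.getnframes()), dtype = np.int16)

t = np.arange(len(samples)) / rate
plt.plot(t, samples)
plt.xlabel("Time (s)")
plt.ylabel("Amplitude")
plt.title("Audio Waveform")
plt.show()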

Third Iteration

In the third and final iteration, we'll replace the application's text-based response with speech synthesis. We'll run the program as follows:

Shell
 
python3 record-transcribe-visualise-speak.py

Example output is shown in Figure 3.

Figure 3. Third Iteration.
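
Speech synthesis uses pyttsx3, which relies on the libespeak1 package installed earlier. A minimal, hypothetical example of speaking a reply:

Python

import pyttsx3

# Initialize the text-to-speech engine and speak the agent's answer aloud.
engine = pyttsx3.init()
engine.say("The most positive sentiment for symbol A is 0.9576.")
engine.runAndWait()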

In the code for all three iterations, we can call the OpenAI Whisper API instead of using the local Whisper installation. To do this, we'd uncomment these lines of code in the transcribe_audio function:

Python
 
# transcript = openai.Audio.transcribe(
#     model = "whisper-1",
#     file = audio_file,
#     language = "en"
# )

And comment out this line of code:

Python
 
transcript = model.transcribe(filename)

However, calling the OpenAI Whisper API would incur additional costs. The local Whisper installation already provides excellent results.

Summary

In this article, we've seen that without creating vector embeddings, we've been able to access our data quite effectively using LangChain. However, our queries need to be more focused, and we need to understand the database schema before asking questions. Integrating a speech capability enables our applications to be more widely used.


Published at DZone with permission of Akmal Chaudhri. See the original article here.

Opinions expressed by DZone contributors are their own.
