SingleStore Kai Support for MongoDB $vectorSearch

SingleStore is a high-performance distributed SQL database. SingleStore Kai extends its compatibility to MongoDB's vector search capabilities. Learn more here.

By Akmal Chaudhri · Jul. 31, 24 · Tutorial

SingleStore is a high-performance distributed SQL database. Through SingleStore Kai, it extends its MongoDB compatibility to MongoDB's vector search capabilities. By supporting MongoDB's $vectorSearch operator, SingleStore Kai provides efficient querying and indexing of vector data, such as the embeddings used in ML and AI applications. Users can combine SingleStore's optimized data handling with Kai's vector search to perform similarity searches across large datasets. In this article, we'll see an example of how to use SingleStore Kai with vector data.

The notebook file used in this article is available on GitHub.

Introduction

In a previous article, we discussed SingleStore Kai's support for the $euclideanDistance extension. Support is now available for $vectorSearch from MongoDB.

Create a SingleStoreDB Cloud Account

A previous article showed the steps to create a free SingleStoreDB Cloud account. We'll use the following settings:

  • Workspace Group Name: Iris Demo Group
  • Cloud Provider: AWS
  • Region: US East 1 (N. Virginia)
  • Workspace Name: iris-demo
  • Size: S-00
  • Settings: SingleStore Kai selected

From Deployments > Firewall, we'll temporarily allow access from anywhere.

Import the Notebook

We'll download the notebook from GitHub (linked in the opening section).

From the left navigation pane in the SingleStore cloud portal, we'll select DEVELOP > Data Studio.

In the top right of the web page, we'll select New Notebook > Import From File.

We'll use the wizard to locate and import the notebook we downloaded from GitHub.

Run the Notebook

After checking that we are connected to our SingleStore workspace, we'll run the cells one by one.

In the database, we'll store the Iris flower data set. We'll first download the Iris CSV file into a pandas DataFrame and then reduce it to two columns, a vector column that combines the four measurements and the species column, as follows:

Python
 
pandas_df["vector"] = pandas_df.apply(
    lambda row: [
        row["sepal_length"],
        row["sepal_width"],
        row["petal_length"],
        row["petal_width"]
    ], axis = 1
)

new_df = pandas_df[["vector", "species"]]

new_df.head()


Example output:

Plain Text
 
                 vector      species
0  [5.1, 3.5, 1.4, 0.2]  Iris-setosa
1  [4.9, 3.0, 1.4, 0.2]  Iris-setosa
2  [4.7, 3.2, 1.3, 0.2]  Iris-setosa
3  [4.6, 3.1, 1.5, 0.2]  Iris-setosa
4  [5.0, 3.6, 1.4, 0.2]  Iris-setosa


Next, we'll transform the data into a list of dictionaries, one per row:

Python
 
records = new_df.to_dict(orient = "records")


and get the number of dimensions for the vector column:

Python
 
dimensions = len(new_df.at[0, "vector"])
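
Because the dimension count is read from the first row only, it may be worth confirming that every record carries a vector of the same length before a fixed-size vector column is created. The following is a minimal sketch and not part of the original notebook: `check_dimensions` is a hypothetical helper, and the two sample records simply mirror the first rows of the example output shown earlier.

```python
def check_dimensions(records):
    """Return the shared vector length, or raise if the records disagree."""
    lengths = {len(record["vector"]) for record in records}
    if len(lengths) != 1:
        raise ValueError(f"expected one vector length, found {sorted(lengths)}")
    return lengths.pop()


# Sample records in the shape produced by to_dict(orient = "records").
# The values are copied from the example output above, not loaded from the CSV.
sample_records = [
    {"vector": [5.1, 3.5, 1.4, 0.2], "species": "Iris-setosa"},
    {"vector": [4.9, 3.0, 1.4, 0.2], "species": "Iris-setosa"},
]

sample_dimensions = check_dimensions(sample_records)
print(sample_dimensions)
```

In the notebook, the same helper could be called as `check_dimensions(records)` and its result compared with `dimensions`.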


We'll now create a SingleStore database:

SQL
 
DROP DATABASE IF EXISTS iris_db;
CREATE DATABASE IF NOT EXISTS iris_db;


With the database created, we can connect to it through SingleStore Kai using the connection_url_kai variable, as follows:

Python
 
client = pymongo.MongoClient(connection_url_kai)
db = client["iris_db"]
collection = db["iris"]


This avoids the need to provide a long connection string.

We'll now create a collection:

Python
 
db.create_collection("iris",
    columns = [{
        "id": "vector", "type": f"VECTOR({dimensions}) NOT NULL"
    }],
);


This uses the SingleStore VECTOR type with the number of dimensions previously determined.

Next, we'll insert the data:

Python
 
result = collection.insert_many(records)
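
The object returned by `insert_many` exposes an `inserted_ids` list, which allows a quick count check before reading any rows back. This is a sketch under assumptions: `confirm_insert` is a hypothetical helper, and a `SimpleNamespace` stands in for the real result object so the example runs without a database connection.

```python
from types import SimpleNamespace


def confirm_insert(result, expected_count):
    """Compare the number of inserted ids with the number of records sent."""
    inserted = len(result.inserted_ids)
    if inserted != expected_count:
        raise RuntimeError(f"inserted {inserted} of {expected_count} documents")
    return inserted


# Stand-in for the insert_many result (assumption: only inserted_ids is used).
fake_result = SimpleNamespace(inserted_ids=["a", "b", "c"])
confirmed = confirm_insert(fake_result, 3)
print(confirmed)
```

In the notebook, the equivalent call would be `confirm_insert(result, len(records))`.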


And we'll retrieve a few rows to confirm the data have been stored:

Python
 
cursor = collection.find(projection = {"_id": 0}).limit(5)

table = []

for document in cursor:
    species = document["species"]
    vector = [round(value, 2) for value in document["vector"]]
    table.append([vector, species])

print(tabulate(table, headers = ["vector", "species"]))


Example output:

Plain Text
 
vector                species
--------------------  ---------------
[6.4, 3.2, 4.5, 1.5]  Iris-versicolor
[6.0, 3.0, 4.8, 1.8]  Iris-virginica
[6.7, 3.1, 5.6, 2.4]  Iris-virginica
[7.2, 3.2, 6.0, 1.8]  Iris-virginica
[4.6, 3.4, 1.4, 0.3]  Iris-setosa


Next, we'll create a vector index:

Python
 
db.command({
    "createIndexes": "iris",
    "indexes": [{
        "key": {"vector": "vector"},
        "name": "vector_index",
        "kaiIndexOptions": {
            "index_type": "AUTO",
            "metric_type": "EUCLIDEAN_DISTANCE",
            "dimensions": dimensions
        }
    }],
});


With index_type set to AUTO, the index uses IVF_PQFS. Other indexing options are also available.

Finally, let's use some fictitious data values to make a prediction:

Python
 
query_vector = [5.2, 3.6, 1.5, 0.3]


And query the data to find the closest matches:

Python
 
pipeline = [
    {
        "$vectorSearch": {
            "index": "vector_index",
            "path": "vector",
            "queryVector": query_vector,
            "limit": 5
        }
    }, {
        "$project": {
            "_id": 0,
            "species": 1,
            "score": {
                "$meta": "vectorSearchScore"
            }
        }
    }
]

cursor = collection.aggregate(pipeline)

table = []

for document in cursor:
    species = document["species"]
    score = document["score"]
    table.append([score, species])

print(tabulate(table, headers = ["score", "species"]))


Example output:

Plain Text
 
   score  species
--------  -----------
0.141421  Iris-setosa
0.173205  Iris-setosa
0.173205  Iris-setosa
0.173205  Iris-setosa
0.2       Iris-setosa
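
One simple way to turn the nearest matches into a prediction is a majority vote over their species. The sketch below is an illustration and not part of the original notebook: `predict_species` is a hypothetical helper, and the sample documents reuse the scores and species from the example output above.

```python
from collections import Counter


def predict_species(neighbours):
    """Return the most common species among the nearest neighbours."""
    if not neighbours:
        raise ValueError("no neighbours to vote on")
    counts = Counter(document["species"] for document in neighbours)
    species, _ = counts.most_common(1)[0]
    return species


# Documents in the shape returned by the $project stage,
# with values copied from the example output.
sample_neighbours = [
    {"species": "Iris-setosa", "score": 0.141421},
    {"species": "Iris-setosa", "score": 0.173205},
    {"species": "Iris-setosa", "score": 0.173205},
    {"species": "Iris-setosa", "score": 0.173205},
    {"species": "Iris-setosa", "score": 0.2},
]

prediction = predict_species(sample_neighbours)
print(prediction)
```

A cursor can only be iterated once, so in the notebook the documents would need to be collected into a list (for example with `list(collection.aggregate(pipeline))`) before being both tabulated and passed to the helper.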


Summary

In this article, we've seen how to use $vectorSearch with SingleStore Kai. Comparing these results with those we obtained from similar queries in a previous article, we can see that they are the same. For larger datasets, the results may differ, because a search that uses a vector index such as IVF_PQFS is approximate.
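
To judge how closely an indexed search matches an exact one, a brute-force baseline can be computed in plain Python and compared with the $vectorSearch output. This is a minimal sketch under assumptions: `exact_nearest` is a hypothetical helper, the sample records are rows that appear in the example outputs earlier in the article, and the scores are assumed to be Euclidean distances, in line with the EUCLIDEAN_DISTANCE metric chosen for the index.

```python
import math


def exact_nearest(records, query_vector, limit=5):
    """Rank records by exact Euclidean distance to the query vector."""
    scored = [
        (math.dist(record["vector"], query_vector), record["species"])
        for record in records
    ]
    # Smallest distance first; ties fall back to the species name.
    return sorted(scored)[:limit]


# A handful of rows taken from the example outputs, not the full data set.
sample_records = [
    {"vector": [5.1, 3.5, 1.4, 0.2], "species": "Iris-setosa"},
    {"vector": [5.0, 3.6, 1.4, 0.2], "species": "Iris-setosa"},
    {"vector": [4.6, 3.4, 1.4, 0.3], "species": "Iris-setosa"},
    {"vector": [6.4, 3.2, 4.5, 1.5], "species": "Iris-versicolor"},
]

query_vector = [5.2, 3.6, 1.5, 0.3]
baseline = exact_nearest(sample_records, query_vector, limit=2)

for distance, species in baseline:
    print(round(distance, 6), species)
```

Running the same function over the full `records` list gives the exact top matches, and any difference from the indexed results would indicate the effect of the approximate index.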


Published at DZone with permission of Akmal Chaudhri. See the original article here.

Opinions expressed by DZone contributors are their own.
