DZone
Thanks for visiting DZone today,
Edit Profile
  • Manage Email Subscriptions
  • How to Post to DZone
  • Article Submission Guidelines
Sign Out View Profile
  • Post an Article
  • Manage My Drafts
Over 2 million developers have joined DZone.
Log In / Join
Please enter at least three characters to search
Refcards Trend Reports
Events Video Library
Refcards
Trend Reports

Events

View Events Video Library

Zones

Culture and Methodologies Agile Career Development Methodologies Team Management
Data Engineering AI/ML Big Data Data Databases IoT
Software Design and Architecture Cloud Architecture Containers Integration Microservices Performance Security
Coding Frameworks Java JavaScript Languages Tools
Testing, Deployment, and Maintenance Deployment DevOps and CI/CD Maintenance Monitoring and Observability Testing, Tools, and Frameworks
Culture and Methodologies
Agile Career Development Methodologies Team Management
Data Engineering
AI/ML Big Data Data Databases IoT
Software Design and Architecture
Cloud Architecture Containers Integration Microservices Performance Security
Coding
Frameworks Java JavaScript Languages Tools
Testing, Deployment, and Maintenance
Deployment DevOps and CI/CD Maintenance Monitoring and Observability Testing, Tools, and Frameworks

Modernize your data layer. Learn how to design cloud-native database architectures to meet the evolving demands of AI and GenAI workkloads.

Secure your stack and shape the future! Help dev teams across the globe navigate their software supply chain security challenges.

Releasing software shouldn't be stressful or risky. Learn how to leverage progressive delivery techniques to ensure safer deployments.

Avoid machine learning mistakes and boost model performance! Discover key ML patterns, anti-patterns, data strategies, and more.

Related

  • Recover Distributed Transactions in MySQL
  • Enhancing Database Efficiency With MySQL Views: A Comprehensive Guide and Examples
  • Seamless Transition: Strategies for Migrating From MySQL to SQL Server With Minimal Downtime
  • How To Convert MySQL Database to SQL Server

Trending

  • How to Build Scalable Mobile Apps With React Native: A Step-by-Step Guide
  • Artificial Intelligence, Real Consequences: Balancing Good vs Evil AI [Infographic]
  • Kubeflow: Driving Scalable and Intelligent Machine Learning Systems
  • Measuring the Impact of AI on Software Engineering Productivity
  1. DZone
  2. Data Engineering
  3. Databases
  4. Filtered Vector Search in MyScale

Filtered Vector Search in MyScale

MyScale allows users to handle filtered vector search with both complex conditions and data types, just using the common `WHERE` clause in SQL.

By 
Fangrui Liu user avatar
Fangrui Liu
·
Dec. 07, 23 · Tutorial
Likes (2)
Comment
Save
Tweet
Share
1.7K Views

Join the DZone community and get the full member experience.

Join For Free

Vector databases offer lightning-fast retrieval on similar objects stored in between billions of records. However, you may also be interested in searching for related objects that match a specific set of conditions, known as filtered vector search. With help from MyScale(opens new window), you can boost your filtered vector searches to a new level.

Most vector indexes or vector stores work as dedicated index services. They support a partial filtered vector search implementation of MongoDB query and projection operators (opens a new window where you can input a dictionary of conditions.

Supported data types and comparators differ between implementations, but most interfaces only support strings, integers on equals, and basic value comparisons. Unlike databases, these vector indexes are not designed to handle complex data types and conditions. As a result, you need an external database solution to store this data, but you cannot use this data to perform filtered vector searches. This solution works, but it is complicated and has limitations.

Actually, there can and should be a better solution. Vector search can be integrated with a database to make it more robust than it is now. MyScale can simultaneously handle filtered vector search with complex conditions and data types using the standard WHERE clause.

Pre-Filtering and Post-Filtering

Filtered vector search implementations can be categorized into two types:

  1. Pre-Filtered Vector Search
  2. Post-Filtered Vector Search

For instance:

Imagine you have a table containing the chat history for users Jack, Jan, and John, and you would like to use a filtered vector search query to retrieve Jack’s chat history similar to the given query vector.

Note: Each record has a user mark and feature vector—for simplicity, we turn vectors into numbers.

The following image describes both a NoSQL and SQL query retrieving Jack’s chat history:

 NoSQL and SQL query retrieving Jack’s chat history

Both of those queries contain a filter on the user Jack. However, this filter can be structured differently, depending on the implementation.

  1. Pre-filtered vector search: For the pre-filtered vector search, the engine will first scan the data and only retain records that match the given filter condition. Once this scan is complete, the engine will perform the vector search on the pre-filtered candidates.
  2. Post-filtered vector search: On the other hand, the post-filtered vector search will first perform the vector search and then filter these results based on the given filter condition.

vector search

Between these two methods, pre-filtering is better than post-filtering in accuracy and meets what we expect from a filtered vector search. Most vector databases support pre-filtering with vector search. However, this pre-filtering doesn't come for free. However, pre-filtering isn’t free, increasing computation and dragging the filtered vector search’s performance. Most implementations suffer from either performance or filter limitations, such as data storage and supported comparator limits.

MyScale uses a column-based storage engine (opens new window)adapted from the ClickHouse MergeTree engine(opens fresh wind, which is super-fast on conventional filters, significantly boosting first-stage filtering and making the filtered vector search faster than other implementations. Furthermore, you can use a simple SQLWHERE clause to define filters on any column in your table.

What You Can Do With the Where Clause in MyScale

As MyScale is developed on top of Clickhouse(opens a new window), it offers precisely the same functionality as ClickHouse.

For instance:

Method Others MyScale
eq / neq ✅ ✅
ge / gt / lt / le ✅ ✅
include/exclude ✅ ✅
with string pattern match LIKE ❌ ✅
Timestamps/Geo-Data/JSON ❌ ✅
with function ❌ ✅
with arrayFunction (opens new window) ❌ ✅
with subqueries ❌ ✅

Let’s look at several examples highlighting what MyScale’s WHERE clause is capable of. You can find the code for these examples in our Colab or GitHub spaces: 

Note: Refer to the Clickhouse’s Official Documentation (opens new window)for more information on data types and functions.

Common Value Comparison: =, !=, >, <, >=, <=

Most vector index solutions support these operations on strings or numbers. In MyScale, you can write value comparisons with:

SQL
 
WHERE column = value


Where the column can be any column name in the table, and the operation can be any of =, !=, >, <, >=, <=.

Note: The column type and value must be the same.

If you have multiple conditions to add to the WHERE clause, use logical operators like AND to connect them:

SQL
 
WHERE column_1 = value_1 AND column_2 >= value_2


Common Set Operators: Include, Exclude

MyScale also supports set operations like IN and NOT IN:

SQL
 
WHERE column IN (value_1, value_2, ...)


This is useful when you want to select a set of rows. Similarly, you can use logical operators to connect these set operators with other conditions.

Operators for Arrays

You can check if an element is in an array with the has function:

SQL
 
WHERE has(column, value_1)


String Pattern Matching

You can match string patterns in MyScale with the keyword LIKE:

SQL
 
WHERE column_1 LIKE '%value%'


This condition matches values that contain a value in column_1. This string pattern matching operator is one of many operators offered by MySQL. Others include: NOT LIKE, match with regular expressions and ngramSearch.

Note: See ClickHouse's official documentation (opens new window)for more information on the LIKE operator.

Date-Time Comparison

MyScale also includes a date-time comparison function:

SQL
 
WHERE dateDiff('hour', column_datetime, toDateTime('2018-01-02 23:00:00')) >= 25;


This WHERE clause refers to any rows whose column_datetime is later than the given date time for more than 25 hours. This function also supports seconds, minutes, days and months.

Note: See here (opens new window)for more information.

Geo-Data Comparison

MyScale can handle the H3 Index (opens new window)and S2 Geometry(opens new window, powerful tools for route planning and geometry analysis.

For example, with the H3 Index, you can use the area of a hexagon to filter out geographical data in a given area:

SQL
 
WHERE h3CellAreaM2(column_h3) > 1000


You can also add the distance to a specific H3 Index:

SQL
 
WHERE h3Distance(column_h3, value_h3) > 10


Arbitrary Object With JSON Columns

MyScale allows you to store JSON as an object and filter on its attributes.

You can use the JSON data type to import a JSON string into a table and use the WHERE clause below to filter out results:

SQL
 
WHERE column_json.attr_1 = value_1


You can also filter on nested attributes as follows:

SQL
 
WHERE column_json.attr_1.attr_2 = value_1


Though this is an experimental feature(opens a new window, it is powerful to use. We have used these objects in our LangChain (opens new window)and LlamaIndex (opens new window)vector store implementations.

Value Functions

MyScale includes lots of column data processing functions that you can utilize in WHERE clauses, such as:

SQL
 
WHERE abs(column_1) > 5


You can include multiple columns in your WHERE clause:

SQL
 
WHERE column_1 + column_2 + column_3 > 10


Array Functions

Array functions are really powerful, especially with our vector search. In our documentation(opens a new window, we introduced array functions in MyScale for the final logit computations and gradient computations for our few-shot classifier.

Clickhouse has great documentation on array functions(opens a new window.

Note: If you still need help with array functions in MyScale, please join our discord (opens a new window)and ask.

Subquery Support

Subqueries are queries within queries. You can also write a WHERE clause with another SELECT query as follows:

SQL
 
WHERE column_1 IN (SELECT ... FROM another_table WHERE ...)


Filtered Vector Search Performance

We investigated filtered vector search performance in vector-db-benchmark(opens new window). We used laion-768-5m-ip-probability where a random float as its filter mark during the query. We also tested popular vector database solutions against MyScale. As the following chart describes, MyScale surpasses most other vector database solutions in that we provide better accuracy with higher throughput.

Furthermore, MyScale achieves the best cost-efficiency of all vector databases tested when accuracy >= 90%. When compared to other SQL-integrated vector databases — such as pgvector and pgvector.rs — MyScale stands out as the only SQL and vector-integrated database that achieves production-ready accuracy and throughput for filtered search.

Note: See the following blog comparing pgvector and MyScale (opens a new window)for more information.

In summary, MyScale offers better accuracy with higher throughput at a lower cost. We also support 5 million vectors with more data types and functions within our s1 pod in the product line, which is free for all registered users.

Conclusion

Filtered search is a common type of query in vector databases that allows you to search for similar vectors or data points based on specific criteria or filters, especially when dealing with data that can be represented as vectors, such as text and image embeddings or other structured data.

MyScale embeds SQL power into AI technology; filtered search is a case in point, enabling more sophisticated and flexible querying capabilities for vector databases. By combining AI and SQL, you can perform complex data operations and searches, making extracting valuable insights, discovering patterns, and performing different analytical tasks easier.

Data structure Database MySQL sql

Published at DZone with permission of Fangrui Liu. See the original article here.

Opinions expressed by DZone contributors are their own.

Related

  • Recover Distributed Transactions in MySQL
  • Enhancing Database Efficiency With MySQL Views: A Comprehensive Guide and Examples
  • Seamless Transition: Strategies for Migrating From MySQL to SQL Server With Minimal Downtime
  • How To Convert MySQL Database to SQL Server

Partner Resources

×

Comments
Oops! Something Went Wrong

The likes didn't load as expected. Please refresh the page and try again.

ABOUT US

  • About DZone
  • Support and feedback
  • Community research
  • Sitemap

ADVERTISE

  • Advertise with DZone

CONTRIBUTE ON DZONE

  • Article Submission Guidelines
  • Become a Contributor
  • Core Program
  • Visit the Writers' Zone

LEGAL

  • Terms of Service
  • Privacy Policy

CONTACT US

  • 3343 Perimeter Hill Drive
  • Suite 100
  • Nashville, TN 37211
  • support@dzone.com

Let's be friends:

Likes
There are no likes...yet! 👀
Be the first to like this post!
It looks like you're not logged in.
Sign in to see who liked this post!