DZone
Big Data Zone
Thanks for visiting DZone today,
Edit Profile
  • Manage Email Subscriptions
  • How to Post to DZone
  • Article Submission Guidelines
Sign Out View Profile
  • Post an Article
  • Manage My Drafts
Over 2 million developers have joined DZone.
Log In / Join
  • Refcardz
  • Trend Reports
  • Webinars
  • Zones
  • |
    • Agile
    • AI
    • Big Data
    • Cloud
    • Database
    • DevOps
    • Integration
    • IoT
    • Java
    • Microservices
    • Open Source
    • Performance
    • Security
    • Web Dev
DZone > Big Data Zone > How Bitset Enables the Versatility of Vector Search

How Bitset Enables the Versatility of Vector Search

This post clarifies the concept of bitset in Milvus and how it works to support delete operations, Time Travel, and attribute filtering with three examples.

Jun Gu user avatar by
Jun Gu
CORE ·
Yudong Cai user avatar by
Yudong Cai
·
Apr. 07, 22 · Big Data Zone · Analysis
Like (2)
Save
Tweet
3.30K Views

Join the DZone community and get the full member experience.

Join For Free

Cover image.

Various new essential features of a vector database are provided together with the release of Milvus 2.0. Among the new features, Time Travel, attribute filtering, and delete operations are correlated as these three features are achieved by one common mechanism: bitset.

Therefore, this article aims to clarify the concept of bitset in Milvus and explain how it works to support delete operations, Time Travel, and attribute filtering with three examples.

What Is Bitset?

A bitset is an array of bit numbers ("0" and "1") that can be used to represent certain data information. With bitsets, you can store certain types of data compactly and efficiently as opposed to storing them in Ints, floats, or chars. Bitsets work on boolean logic, according to which the value of output is either valid or invalid, usually denoted by "1" and "0" respectively. "1" stands for valid, and "0" for invalid. Since bitsets are highly efficient and can save storage, they can also be used to achieve many features such as attribute filtering, delete operations, Time Travel, and more.

Starting from version 0.7.0, the concept of bitset has been introduced in Milvus to enable the delete function. More specifically, the bitset is used to mark if each row in the segment is deleted. Deleted entities are marked with "1" in the corresponding bitset, and as a result, the deleted entities will not be computed during a search or query.

In the Milvus 2.0 version, the application of bitset is extended to enable more features, like attribute filtering and Time Travel. The general principle in a bitset remains the same. That is, if an entity is marked with "1" in the corresponding bitset, the entity will be ignored during a search or query. Bitsets are used to enable 3 features in Milvus:

  • Attribute filtering
  • Data deletion
  • Query with Time Travel

How Does Bitset Work in Milvus?

The examples below are used to illustrate how bitset works in Milvus.

Prerequisites

Suppose there is a segment with eight entities and a series of data manipulation language (DML) events that happen in the order shown in the figure below.

  • Four of the entities, whose primary_keys are [1, 2, 3, 4] respectively, are inserted when the timestamp ts equals 100.
  • The remaining four entities, whose primary_keys are [5, 6, 7, 8], are inserted when the timestamp ts equals 200.
  • Entities whose primary_keys are [7, 8] are deleted when the timestamp ts equals 300.
  • Only entities, whose primary_keys are [1, 3, 5, 7], satisfy the conditions of attribute filtering.

Order of DML events

Case One

Suppose the value a user sets for time_travel is 150. In other words, the user conducts a query on the data stored in Milvus when ts = 150. The bitset generation process is illustrated in Figure 1 below.

During the initial filtering stage, the result of the filter_bitset should be [1, 0, 1, 0, 1, 0, 1, 0] as entities [1, 3, 5, 7] are valid filtering results and marked as "1" in the bitset. However, entities [4, 5, 6, 7] were not even inserted into the vector database when ts equals 150. Therefore, these four entities should be marked as "0" regardless of the filtering condition. Now the bitset result should be [1, 0, 1, 0, 0, 0, 0, 0]. Since in Milvus, the general principle of bitset computing is that entities marked with "1" in the bitset are ignored during a search or query, the bitset result after Time Travel and attribute filtering needs to be flipped in order to be combined with the deletion bitmap. The flipped result of filter_bitset should be [0, 1, 0, 1, 1, 1, 1, 1].

As for the deletion bitset del_bitset, the initial value should be [0, 0, 0, 0, 0, 0, 1, 1]. However, entities 7 and 8 are not deleted until ts is 300. Therefore, when ts is 150, entities 7 and 8 are still valid. As a result, the del_bitset value after Time Travel should be [0, 0, 0, 0, 0, 0, 0, 0].

Now we have two bitsets after Time Travel and attribute filtering: filter_bitset [0, 1, 0, 1, 1, 1, 1, 1] and del_bitset [0, 0, 0, 0, 0, 0, 0, 0]. Combine these two bitsets with the "OR" binary logic operator. The ultimate value of result_bitset is [0, 1, 0, 1, 1, 1, 1, 1]. That is to say, only entities 1 and 3 will be computed in the following search or query stage.

Figure 1. Search with Time Travel = 150.

Case Two

Suppose the value the user sets for time_travel is 250. In other words, the user conducts a query on the data stored in Milvus when ts = 250. The bitset generation process is illustrated in Figure 2 below.

Like in case one, the resultant filter_bitset of the initial attribute filtering stage should be [1, 0, 1, 0, 1, 0, 1, 0].

All entities [1, 2, 3, 4, 5, 6, 7, 8] are inserted to the vector database when ts= 250. Therefore, the previous result of filter_bitset remains the same. Again, we need to flip the result of the filter_bitset, and we will get [0, 1, 0, 1, 0, 1, 0, 1].

As for the deletion bitset del_bitset, the initial value should be [0, 0, 0, 0, 0, 0, 1, 1]. However, entities 7 and 8 were not deleted until ts is 300. Therefore, when ts is 250, entities 7 and 8 are still valid. As a result, the del_bitset value after Time Travel should be [0, 0, 0, 0, 0, 0, 0, 0].

Now we have two bitsets after Time Travel and attribute filtering: filter_bitset [0, 1, 0, 1, 0, 1, 0, 1] and del_bitset [0, 0, 0, 0, 0, 0, 0, 0]. Combine these two bitsets with the "OR" binary logic operator. The ultimate value of result_bitset is [0, 1, 0, 1, 0, 1, 0, 1]. That is to say, only entities [1, 3, 5, 7] will be computed in the following search or query stage.

Figure 2. Search with Time Travel = 250.

Case Three

Suppose the value the user sets for time_travel is 350. In other words, the user conducts a query on the data stored in Milvus when ts = 350. The bitset generation process is illustrated in Figure 3 below.

We see the same as in cases one and two: the resultant filter_bitset of the initial attribute filtering stage is [0, 1, 0, 1, 0, 1, 0, 1]. All entities [1, 2, 3, 4, 5, 6, 7, 8] are inserted into the vector database when ts= 350. Therefore, the final flipped result of the filter_bitset is [0, 1, 0, 1, 0, 1, 0, 1], the same as in case two.

As for the deletion bitset del_bitset, since entities 7 and 8 are already deleted when ts=350, therefore, the result of del_bitset should be [0, 0, 0, 0, 0, 0, 1, 1].

Now we have two bitsets after Time Travel and attribute filtering: filter_bitset [0, 1, 0, 1, 0, 1, 0, 1] and del_bitset [0, 0, 0, 0, 0, 0, 1, 1]. Combine these two bitsets with the "OR" binary logic operator. The ultimate value of result_bitset is [0, 1, 0, 1, 0, 1, 1, 1]. That is to say, only entities [1, 3, 5] will be computed in the following search or query stage.

Figure 3. Search with Time Travel = 350.

What's Next?

In this series of blogs, we have previously introduced the design of data deletion in Milvus 2.0. In the upcoming articles in this series, we will introduce the design of data compaction, and dynamic load balance in Milvus 2.0. Please stay tuned.

Published at DZone with permission of Jun Gu. See the original article here.

Opinions expressed by DZone contributors are their own.

Popular on DZone

  • 10 Books Every Senior Engineer Should Read
  • Modern REST API Design Principles and Rules
  • Memory Debugging and Watch Annotations
  • 11 Reasons To Use Selenium for Automation Testing

Comments

Big Data Partner Resources

X

ABOUT US

  • About DZone
  • Send feedback
  • Careers
  • Sitemap

ADVERTISE

  • Advertise with DZone

CONTRIBUTE ON DZONE

  • Article Submission Guidelines
  • MVB Program
  • Become a Contributor
  • Visit the Writers' Zone

LEGAL

  • Terms of Service
  • Privacy Policy

CONTACT US

  • 600 Park Offices Drive
  • Suite 300
  • Durham, NC 27709
  • support@dzone.com
  • +1 (919) 678-0300

Let's be friends:

DZone.com is powered by 

AnswerHub logo