How Bitset Enables the Versatility of Vector Search
This post clarifies the concept of bitset in Milvus and how it works to support delete operations, Time Travel, and attribute filtering with three examples.
Join the DZone community and get the full member experience.
Join For FreeVarious new essential features of a vector database are provided together with the release of Milvus 2.0. Among the new features, Time Travel, attribute filtering, and delete operations are correlated as these three features are achieved by one common mechanism: bitset.
Therefore, this article aims to clarify the concept of bitset in Milvus and explain how it works to support delete operations, Time Travel, and attribute filtering with three examples.
What Is Bitset?
A bitset is an array of bit numbers ("0" and "1") that can be used to represent certain data information. With bitsets, you can store certain types of data compactly and efficiently as opposed to storing them in Ints, floats, or chars. Bitsets work on boolean logic, according to which the value of output is either valid or invalid, usually denoted by "1" and "0" respectively. "1" stands for valid, and "0" for invalid. Since bitsets are highly efficient and can save storage, they can also be used to achieve many features such as attribute filtering, delete operations, Time Travel, and more.
Starting from version 0.7.0, the concept of bitset has been introduced in Milvus to enable the delete function. More specifically, the bitset is used to mark if each row in the segment is deleted. Deleted entities are marked with "1" in the corresponding bitset, and as a result, the deleted entities will not be computed during a search or query.
In the Milvus 2.0 version, the application of bitset is extended to enable more features, like attribute filtering and Time Travel. The general principle in a bitset remains the same. That is, if an entity is marked with "1" in the corresponding bitset, the entity will be ignored during a search or query. Bitsets are used to enable 3 features in Milvus:
- Attribute filtering
- Data deletion
- Query with Time Travel
How Does Bitset Work in Milvus?
The examples below are used to illustrate how bitset works in Milvus.
Prerequisites
Suppose there is a segment with eight entities and a series of data manipulation language (DML) events that happen in the order shown in the figure below.
- Four of the entities, whose
primary_keys
are [1, 2, 3, 4] respectively, are inserted when the timestampts
equals 100. - The remaining four entities, whose
primary_keys
are [5, 6, 7, 8], are inserted when the timestampts
equals 200. - Entities whose
primary_keys
are [7, 8] are deleted when the timestampts
equals 300. - Only entities, whose
primary_keys
are [1, 3, 5, 7], satisfy the conditions of attribute filtering.

Order of DML events
Case One
Suppose the value a user sets for time_travel
is 150. In other words, the user conducts a query on the data stored in Milvus when ts
= 150. The bitset generation process is illustrated in Figure 1 below.
During the initial filtering stage, the result of the filter_bitset
should be [1, 0, 1, 0, 1, 0, 1, 0] as entities [1, 3, 5, 7] are valid filtering results and marked as "1" in the bitset. However, entities [4, 5, 6, 7] were not even inserted into the vector database when ts
equals 150. Therefore, these four entities should be marked as "0" regardless of the filtering condition. Now the bitset result should be [1, 0, 1, 0, 0, 0, 0, 0]. Since in Milvus, the general principle of bitset computing is that entities marked with "1" in the bitset are ignored during a search or query, the bitset result after Time Travel and attribute filtering needs to be flipped in order to be combined with the deletion bitmap. The flipped result of filter_bitset
should be [0, 1, 0, 1, 1, 1, 1, 1].
As for the deletion bitset del_bitset
, the initial value should be [0, 0, 0, 0, 0, 0, 1, 1]. However, entities 7 and 8 are not deleted until ts
is 300. Therefore, when ts
is 150, entities 7 and 8 are still valid. As a result, the del_bitset
value after Time Travel should be [0, 0, 0, 0, 0, 0, 0, 0].
Now we have two bitsets after Time Travel and attribute filtering: filter_bitset
[0, 1, 0, 1, 1, 1, 1, 1] and del_bitset
[0, 0, 0, 0, 0, 0, 0, 0]. Combine these two bitsets with the "OR" binary logic operator. The ultimate value of result_bitset
is [0, 1, 0, 1, 1, 1, 1, 1]. That is to say, only entities 1 and 3 will be computed in the following search or query stage.
Case Two
Suppose the value the user sets for time_travel
is 250. In other words, the user conducts a query on the data stored in Milvus when ts
= 250. The bitset generation process is illustrated in Figure 2 below.
Like in case one, the resultant filter_bitset
of the initial attribute filtering stage should be [1, 0, 1, 0, 1, 0, 1, 0].
All entities [1, 2, 3, 4, 5, 6, 7, 8] are inserted to the vector database when ts
= 250. Therefore, the previous result of filter_bitset
remains the same. Again, we need to flip the result of the filter_bitset
, and we will get [0, 1, 0, 1, 0, 1, 0, 1].
As for the deletion bitset del_bitset
, the initial value should be [0, 0, 0, 0, 0, 0, 1, 1]. However, entities 7 and 8 were not deleted until ts
is 300. Therefore, when ts
is 250, entities 7 and 8 are still valid. As a result, the del_bitset
value after Time Travel should be [0, 0, 0, 0, 0, 0, 0, 0].
Now we have two bitsets after Time Travel and attribute filtering: filter_bitset
[0, 1, 0, 1, 0, 1, 0, 1] and del_bitset
[0, 0, 0, 0, 0, 0, 0, 0]. Combine these two bitsets with the "OR" binary logic operator. The ultimate value of result_bitset
is [0, 1, 0, 1, 0, 1, 0, 1]. That is to say, only entities [1, 3, 5, 7] will be computed in the following search or query stage.
Figure 2. Search with Time Travel = 250.
Case Three
Suppose the value the user sets for time_travel
is 350. In other words, the user conducts a query on the data stored in Milvus when ts
= 350. The bitset generation process is illustrated in Figure 3 below.
We see the same as in cases one and two: the resultant filter_bitset
of the initial attribute filtering stage is [0, 1, 0, 1, 0, 1, 0, 1]. All entities [1, 2, 3, 4, 5, 6, 7, 8] are inserted into the vector database when ts
= 350. Therefore, the final flipped result of the filter_bitset
is [0, 1, 0, 1, 0, 1, 0, 1], the same as in case two.
As for the deletion bitset del_bitset
, since entities 7 and 8 are already deleted when ts
=350, therefore, the result of del_bitset
should be [0, 0, 0, 0, 0, 0, 1, 1].
Now we have two bitsets after Time Travel and attribute filtering: filter_bitset
[0, 1, 0, 1, 0, 1, 0, 1] and del_bitset
[0, 0, 0, 0, 0, 0, 1, 1]. Combine these two bitsets with the "OR" binary logic operator. The ultimate value of result_bitset
is [0, 1, 0, 1, 0, 1, 1, 1]. That is to say, only entities [1, 3, 5] will be computed in the following search or query stage.
What's Next?
In this series of blogs, we have previously introduced the design of data deletion in Milvus 2.0. In the upcoming articles in this series, we will introduce the design of data compaction, and dynamic load balance in Milvus 2.0. Please stay tuned.
Published at DZone with permission of Jun Gu. See the original article here.
Opinions expressed by DZone contributors are their own.
Comments