Essential Techniques for Production Vector Search Systems, Part 3: Filterable HNSW

Proven techniques for production vector search, including when to use each one, how to combine them effectively, and trade-offs to understand before deployment.

Pavan Vemuri

Jan. 30, 26 · Analysis

After implementing vector search systems at multiple companies, I wanted to document the techniques that have proven most useful for successful production deployments.

I present each technique by showing when to apply it, how it complements the others, and the trade-offs it introduces. This is a multi-part series that covers one technique per article. Each article also includes code snippets so you can quickly test the technique.

Before we get into the real details, let us look at the prerequisites and setup.

For ease of understanding and use, I am using the free cloud tier from Qdrant for all of the demonstrations below.

Steps to Set Up Qdrant Cloud

Step 1: Get a Free Qdrant Cloud Cluster

Sign up at https://cloud.qdrant.io.
Create a free cluster.
- Click "Create Cluster."
- Select Free Tier.
- Choose a region closest to you.
- Wait for the cluster to be provisioned.
Capture your credentials.
- Cluster URL: https://xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx.us-east.aws.cloud.qdrant.io:6333
- API Key: Click "API Keys" → "Generate" → Copy the key.

Step 2: Install Python Dependencies

    PowerShell
   
   pip install qdrant-client fastembed numpy python-dotenv

Recommended versions:

qdrant-client >= 1.7.0
fastembed >= 0.2.0
numpy >= 1.24.0
python-dotenv >= 1.0.0

Step 3: Set Environment Variables or Create a `.env` File

    Shell
   
   # Add to your ~/.bashrc or ~/.zshrc
export QDRANT_URL="https://your-cluster-url.cloud.qdrant.io:6333"
export QDRANT_API_KEY="your-api-key-here"

Alternatively, create a .env file in the project directory with the following content. Remember to add .env to your .gitignore to avoid committing credentials.

    Plain Text
   
   # .env file
QDRANT_URL=https://your-cluster-url.cloud.qdrant.io:6333
QDRANT_API_KEY=your-api-key-here

Step 4: Verify Connection

We can verify the connection to the Qdrant cluster with the following script. From this point onward, I am assuming the .env setup is complete.

    Python
   
 

   from qdrant_client import QdrantClient
from dotenv import load_dotenv
import os

# Load environment variables from .env file
load_dotenv()

# Initialize client
client = QdrantClient(
    url=os.getenv("QDRANT_URL"),
    api_key=os.getenv("QDRANT_API_KEY"),
)

# Test connection
try:
    collections = client.get_collections()
    print("Connected successfully!")
    print(f"   Current collections: {len(collections.collections)}")
except Exception as e:
    print(f"Connection failed: {e}")
    print("   Check your .env file has QDRANT_URL and QDRANT_API_KEY")
  

Expected output:

    Plain Text
   
   python verify-connection.py
Connected successfully!
   Current collections: 2

Now that we have the setup out of the way, we can get into the meat of the article.

Before the deep dive into filterable HNSW, let us look at a high-level overview of the techniques we are about to cover in this multi-part series.

Technique	Problem Solved	Performance Impact	Complexity
Hybrid Search	Pure semantic search misses exact matches.	Accuracy improvement of roughly 16%	Medium
Binary Quantization	Memory costs scale linearly with data.	40x memory reduction, 15% faster	Low
Filterable HNSW	Post-filtering wastes computation.	Up to 5x faster filtered queries	Medium
Multi-Vector Search	A single embedding cannot capture the relative importance of different fields.	Handles queries across multiple fields (such as title vs. description) at the cost of 2x storage	Medium
Reranking	Vector search is optimized for speed over precision.	Deeper semantic understanding, 15-20% ranking improvement	High

Keep in mind that production systems typically combine two to four of these techniques.

For example, a typical e-commerce website might use hybrid search, binary quantization, and filterable HNSW.

We covered Hybrid Search in the first part of the series and Binary Quantization in the second part. In this part, we will dive into filterable HNSW.

Filterable HNSW

To understand why filterable HNSW is advantageous, let us look at how the traditional filtering approaches fall short. Post-filtering runs the vector search first and then throws away every result that fails the filter; in the comparison later in this article, that means discarding 90% of the retrieved candidates. Pre-filtering, on the other hand, can shrink the search space so much that vector similarity becomes less significant.

That is where filterable HNSW comes in handy, as it applies filters during the HNSW graph traversal. In other words, the algorithm navigates only through graph nodes that satisfy filter conditions.

Filterable HNSW relies on three components: payload indexes (fast lookup structures for filterable fields), filter-aware traversal (HNSW navigation skips non-matching nodes), and dynamic candidate expansion (more candidates are fetched automatically when filters are restrictive).
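As a rough sketch of what payload indexes and filters look like at the API level, the snippet below builds the JSON bodies that, as far as I know, Qdrant's REST API accepts for creating a payload index and for running a filtered search. The helper names (`payload_index_body`, `to_qdrant_filter`, `filtered_search_body`) and the field types chosen for `category`, `supplier`, and `in_stock` are assumptions for illustration; the `filterable_hnsw` module used in this article may build its requests differently.

```python
import json


def payload_index_body(field_name, field_schema):
    """Body for PUT /collections/{collection_name}/index."""
    return {"field_name": field_name, "field_schema": field_schema}


def to_qdrant_filter(conditions):
    """Convert {"category": "Braking System"} into a Qdrant `must` filter.

    Every key/value pair becomes an exact-match condition, and all
    conditions must hold (logical AND).
    """
    return {
        "must": [
            {"key": key, "match": {"value": value}}
            for key, value in conditions.items()
        ]
    }


def filtered_search_body(vector, conditions=None, limit=5):
    """Body for POST /collections/{collection_name}/points/search."""
    body = {"vector": vector, "limit": limit, "with_payload": True}
    if conditions:
        body["filter"] = to_qdrant_filter(conditions)
    return body


# Assumed field types: keyword for categorical strings, bool for flags.
index_bodies = [
    payload_index_body("category", "keyword"),
    payload_index_body("supplier", "keyword"),
    payload_index_body("in_stock", "bool"),
]

search_body = filtered_search_body(
    vector=[0.1, 0.2, 0.3],  # placeholder; a real query uses the query embedding
    conditions={"category": "Braking System", "in_stock": True},
)

print(json.dumps(index_bodies, indent=2))
print(json.dumps(search_body, indent=2))
```

Because the filter travels in the same request as the query vector, the engine can apply it while it searches instead of after the fact.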

Let us take a look at it in more detail with the code below.

    Python
   
 

   """
Example usage of the filterable_hnsw module.

This demonstrates how to use Filterable HNSW with your own Qdrant collection.
"""

from filterable_hnsw import (
    filterable_search,
    compare_filtered_unfiltered,
    display_filtered_results,
    get_qdrant_client
)
from dotenv import load_dotenv
import os

load_dotenv()

# Initialize client
client = get_qdrant_client()

# Your collection name
COLLECTION_NAME = "automotive_parts"  # Change this to your collection name

# Example 1: Filtered search
print("=" * 80)
print("EXAMPLE 1: Filtered Search (Filterable HNSW)")
print("=" * 80)
print("Searching: 'engine sensor' with category filter")
print("Expected: Finds semantically similar parts within the specified category\n")

query1 = "engine sensor"
# First get unfiltered results to see what categories exist
unfiltered_test1 = filterable_search(
    collection_name=COLLECTION_NAME,
    query=query1,
    filter_conditions=None,
    client=client,
    limit=1
)
# Extract category from first result if available
if unfiltered_test1 and 'category' in unfiltered_test1[0]['payload']:
    actual_category1 = unfiltered_test1[0]['payload']['category']
    filter1 = {"category": actual_category1}
    print(f"Using category from data: '{actual_category1}'\n")
else:
    filter1 = {"category": "Engine Components"}  # Fallback

filtered_results = filterable_search(
    collection_name=COLLECTION_NAME,
    query=query1,
    filter_conditions=filter1,
    client=client,
    limit=5
)
display_filtered_results(
    filtered_results, 
    query1, 
    show_fields=['part_name', 'part_id', 'category', 'description']
)

print("\n\n")

# Example 2: Comparison between Filterable HNSW and Post-Filtering
print("=" * 80)
print("EXAMPLE 2: Filterable HNSW vs Post-Filtering Comparison")
print("=" * 80)
print("Comparing filtering DURING traversal vs filtering AFTER retrieval")
print("Expected: Shows Filterable HNSW is more efficient (no wasted computation)\n")

query2 = "brake system"
# First get unfiltered results to see what categories exist
unfiltered_test2 = filterable_search(
    collection_name=COLLECTION_NAME,
    query=query2,
    filter_conditions=None,
    client=client,
    limit=1
)
# Extract category from first result if available
if unfiltered_test2 and 'category' in unfiltered_test2[0]['payload']:
    actual_category2 = unfiltered_test2[0]['payload']['category']
    filter2 = {"category": actual_category2}
    print(f"Using category from data: '{actual_category2}'\n")
else:
    filter2 = {"category": "Braking System"}  # Fallback

comparison = compare_filtered_unfiltered(
    collection_name=COLLECTION_NAME,
    query=query2,
    filter_conditions=filter2,
    client=client,
    limit=5
)

print("\n\n")

# Example 3: Display detailed comparison
print("=" * 80)
print("EXAMPLE 3: Detailed Result Comparison")
print("=" * 80)
print("Top results from both methods:\n")

print("Post-Filtered Results (Top 3):")
print("-" * 80)
for i, result in enumerate(comparison["post_filtered"]["results"][:3], 1):
    payload = result["payload"]
    name = payload.get('part_name', payload.get('name', 'Unknown'))
    category = payload.get('category', 'N/A')
    print(f"{i}. {name}")
    print(f"   Category: {category}")
    print(f"   Score: {result['score']:.4f}")
    print(f"   ID: {result['id']}")

print("\nFilterable HNSW Results (Top 3):")
print("-" * 80)
for i, result in enumerate(comparison["filtered"]["results"][:3], 1):
    payload = result["payload"]
    name = payload.get('part_name', payload.get('name', 'Unknown'))
    category = payload.get('category', 'N/A')
    print(f"{i}. {name}")
    print(f"   Category: {category}")
    print(f"   Score: {result['score']:.4f}")
    print(f"   ID: {result['id']}")

print("\n" + "=" * 80)
print("SUMMARY:")
print("=" * 80)
print("Filterable HNSW:")
print("  - Filters DURING graph traversal (not before or after)")
print("  - Only navigates through nodes that satisfy filter conditions")
print("  - No wasted computation - doesn't retrieve then discard results")
print("  - More efficient than post-filtering which wastes >90% computation")
print(f"  - In this example: {comparison['overlap_ratio']*100:.1f}% result overlap")


  

Let us now look at filterable HNSW in action with the output of the implementation.

    Plain Text
   
 

   ================================================================================
EXAMPLE 1: Filtered Search (Filterable HNSW)
================================================================================
Searching: 'engine sensor' with category filter
Expected: Finds semantically similar parts within the specified category

Using category from data: 'Safety Systems'


Filtered Search Results for: 'engine sensor'
================================================================================
Found 5 results


1. Safety Sensor Module 237
   Part_name: Safety Sensor Module 237
   Part_id: DEL-0000237
   Category: Safety Systems
   Description: Advanced safety sensor for ADAS applications including collision avoidance and driver assistance fea...
   Score: 0.4092
--------------------------------------------------------------------------------

2. Safety Sensor Module 240
   Part_name: Safety Sensor Module 240
   Part_id: BOS-0000240
   Category: Safety Systems
   Description: Advanced safety sensor for ADAS applications including collision avoidance and driver assistance fea...
   Score: 0.4052
--------------------------------------------------------------------------------

3. Safety Sensor Module 242
   Part_name: Safety Sensor Module 242
   Part_id: VAL-0000242
   Category: Safety Systems
   Description: Advanced safety sensor for ADAS applications including collision avoidance and driver assistance fea...
   Score: 0.4004
--------------------------------------------------------------------------------

4. Safety Sensor Module 246
   Part_name: Safety Sensor Module 246
   Part_id: CON-0000246
   Category: Safety Systems
   Description: Advanced safety sensor for ADAS applications including collision avoidance and driver assistance fea...
   Score: 0.3983
--------------------------------------------------------------------------------

5. Safety Sensor Module 234
   Part_name: Safety Sensor Module 234
   Part_id: ZF-0000234
   Category: Safety Systems
   Description: Advanced safety sensor for ADAS applications including collision avoidance and driver assistance fea...
   Score: 0.3978
--------------------------------------------------------------------------------



================================================================================
EXAMPLE 2: Filterable HNSW vs Post-Filtering Comparison
================================================================================
Comparing filtering DURING traversal vs filtering AFTER retrieval
Expected: Shows Filterable HNSW is more efficient (no wasted computation)

Using category from data: 'Braking System'


Comparing Filterable HNSW vs Post-Filtering for: 'brake system'
Filters: {'category': 'Braking System'}
================================================================================

1. Post-Filtering (Inefficient)
   Retrieves many results, then filters AFTER retrieval
--------------------------------------------------------------------------------

2. Filterable HNSW (Efficient)
   Filters DURING graph traversal - only navigates matching nodes
--------------------------------------------------------------------------------

================================================================================
COMPARISON SUMMARY
================================================================================
Post-Filtering (Traditional Approach):
  Time: 126.94 ms
  Results: 5
  Approach: Retrieves 50 candidates, discards 45
  Top Score: 0.6419

Filterable HNSW:
  Time: 79.26 ms
  Results: 5
  Approach: Only navigates through nodes matching filter conditions
  Top Score: 0.6419

Overlap:
  Common Results: 5 / 5 (100.0%)

Filterable HNSW is 1.60x faster

Key Difference:
  Post-Filtering: Wastes computation by retrieving and discarding results
  Filterable HNSW: Filters during graph traversal - no wasted computation

================================================================================



================================================================================
EXAMPLE 3: Detailed Result Comparison
================================================================================
Top results from both methods:

Post-Filtered Results (Top 3):
--------------------------------------------------------------------------------
1. Brake Control Component 168
   Category: Braking System
   Score: 0.6419
   ID: 1794233379
2. Brake Control Component 154
   Category: Braking System
   Score: 0.6396
   ID: 3151300734
3. Brake Control Component 176
   Category: Braking System
   Score: 0.6394
   ID: 1517692434

Filterable HNSW Results (Top 3):
--------------------------------------------------------------------------------
1. Brake Control Component 168
   Category: Braking System
   Score: 0.6419
   ID: 1794233379
2. Brake Control Component 154
   Category: Braking System
   Score: 0.6396
   ID: 3151300734
3. Brake Control Component 176
   Category: Braking System
   Score: 0.6394
   ID: 1517692434

================================================================================
SUMMARY:
================================================================================
Filterable HNSW:
  - Filters DURING graph traversal (not before or after)
  - Only navigates through nodes that satisfy filter conditions
  - No wasted computation - doesn't retrieve then discard results
  - More efficient than post-filtering which wastes >90% computation
  - In this example: 100.0% result overlap
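
The overlap figure in the output above comes from the `filterable_hnsw` module, whose source is not shown here. One plausible way to compute such a ratio, assuming each result is a dict with an `id` key as in the Example 3 output, is the share of IDs common to both result lists. The function name and the choice of denominator are assumptions, not the module's actual implementation.

```python
def overlap_ratio(results_a, results_b):
    """Fraction of result IDs shared by two result lists.

    Divides by the size of the larger ID set so that a short list cannot
    inflate the ratio. Returns 0.0 when both lists are empty.
    """
    ids_a = {r["id"] for r in results_a}
    ids_b = {r["id"] for r in results_b}
    denom = max(len(ids_a), len(ids_b))
    if denom == 0:
        return 0.0
    return len(ids_a & ids_b) / denom


# IDs taken from the Example 3 output above.
post_filtered = [{"id": 1794233379}, {"id": 3151300734}, {"id": 1517692434}]
filtered = [{"id": 1794233379}, {"id": 3151300734}, {"id": 1517692434}]

print(f"{overlap_ratio(post_filtered, filtered) * 100:.1f}% result overlap")
```

A ratio below 1.0 would indicate that post-filtering missed matching items that lay outside its candidate window.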

  

Benefits

As the results show, filterable HNSW is computationally more efficient: in this run it was about 1.6 times faster (79.26 ms versus 126.94 ms). It also avoids wasted computation. Post-filtering retrieved 50 candidates and discarded 45 of them, whereas filterable HNSW navigated only nodes matching the "Braking System" category. Result quality was preserved as well, since all 5 results are identical between the two methods.

Costs

Filterable HNSW requires payload indexes on the filtered fields, in this case category, supplier, and in_stock. For a million parts, expect an overhead of at least 6%. There is also a maintenance cost, because every newly indexed part must update the payload indexes. Keep in mind that complex OR conditions may degrade filtering performance. Finally, payload indexes are kept in RAM for faster access, so this memory must be accounted for in capacity planning.
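For capacity planning, a back-of-the-envelope estimate can help. The sketch below assumes float32 vectors with 384 dimensions (an assumption about the embedding model, not a figure from this article) and applies the 5-10% overhead range quoted in this article to the raw vector memory. Real usage also depends on the HNSW graph, payload storage, and field cardinality, so treat the output as a rough starting point only.

```python
def vector_memory_bytes(num_points, dim, bytes_per_value=4):
    """Raw memory for dense vectors (float32 by default)."""
    return num_points * dim * bytes_per_value


def payload_index_overhead_bytes(base_bytes, overhead_ratio):
    """Extra RAM for payload indexes, as a fraction of a base footprint."""
    return base_bytes * overhead_ratio


NUM_POINTS = 1_000_000  # "a million parts" from the text
DIM = 384               # assumption: 384-dimensional embeddings

base = vector_memory_bytes(NUM_POINTS, DIM)
for ratio in (0.05, 0.06, 0.10):
    extra = payload_index_overhead_bytes(base, ratio)
    print(f"{ratio:.0%} overhead: {extra / 1e6:.1f} MB on top of {base / 1e9:.3f} GB")
```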

When to Use

When the results are frequently filtered
When the filters are selective (reduce results by more than 50%)
When the data has categorical/structured metadata

When Not to Use

When filters are rarely used
When filters are not selective (remove less than 20% of results)
When the dataset is very small (fewer than 10,000 items)
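
The rules of thumb in the two lists above can be written down as a small helper. This is only a sketch: the function name and return strings are made up for illustration, the thresholds are the ones quoted in this article (including the 30% share of filtered queries from the conclusion), and anything that falls between them is reported as a case to benchmark.

```python
def recommend_filtering_strategy(num_points, filtered_query_share, filter_selectivity):
    """Apply the article's rules of thumb.

    filtered_query_share: fraction of queries that carry a filter (0..1).
    filter_selectivity: fraction of points a typical filter removes (0..1).
    """
    if num_points < 10_000:
        return "skip: the dataset is very small"
    if filter_selectivity < 0.20:
        return "skip: filters are not selective"
    if filtered_query_share > 0.30 and filter_selectivity > 0.50:
        return "use: filters are frequent and selective"
    return "benchmark: borderline case, measure before deciding"


print(recommend_filtering_strategy(1_000_000, 0.8, 0.9))
print(recommend_filtering_strategy(5_000, 0.8, 0.9))
```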

Efficiency Comparison

Approach	Candidates Retrieved	Results Returned	Wasted Work	CPU Efficiency
Post-Filtering	50	5	45 (90%)	10%
Filterable HNSW	5	5	0 (0%)	100%
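
The wasted-work column can be reproduced on toy data. The sketch below is not HNSW and does not use Qdrant: it scores random vectors by brute force, with a made-up dataset in which roughly 10% of points belong to the target category. `post_filter_search` fetches the top 50 by similarity and then drops non-matching points, while `filter_first_search` scores only matching points, standing in for the result a filter-aware index aims to return. All names and parameters are assumptions for illustration.

```python
import heapq
import math
import random


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def make_points(n=1000, dim=8, match_ratio=0.1, seed=7):
    """Random toy points; roughly match_ratio of them are in the target category."""
    rng = random.Random(seed)
    return [
        {
            "id": i,
            "vector": [rng.gauss(0, 1) for _ in range(dim)],
            "category": "Braking System" if rng.random() < match_ratio else "Other",
        }
        for i in range(n)
    ]


def post_filter_search(points, query, category, k, fetch):
    """Fetch the top `fetch` points by similarity, then drop non-matching ones."""
    candidates = heapq.nlargest(fetch, points, key=lambda p: cosine(query, p["vector"]))
    results = [p for p in candidates if p["category"] == category][:k]
    wasted = len(candidates) - len(results)
    return results, wasted


def filter_first_search(points, query, category, k):
    """Score only matching points (exact scan, not a graph traversal)."""
    matching = [p for p in points if p["category"] == category]
    return heapq.nlargest(k, matching, key=lambda p: cosine(query, p["vector"]))


points = make_points()
query = [1.0] * 8
post_results, wasted = post_filter_search(points, query, "Braking System", k=5, fetch=50)
exact_results = filter_first_search(points, query, "Braking System", k=5)

print(f"post-filtering: {len(post_results)} results, {wasted} of 50 candidates discarded")
print(f"filter-aware:   {len(exact_results)} results, 0 candidates discarded")
```

Note that post-filtering can also come back with fewer than `k` results when too few of the fetched candidates match the filter, which is the situation dynamic candidate expansion is meant to handle.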

Performance Characteristics

Based on the results, let us now look at the performance characteristics.

Metric	Post-Filtering	Filterable HNSW	Evidence from the Data
Query Latency	126.94 ms	79.26 ms	1.6 times faster
Wasted Computation	90%	0%	No wasted computation with filterable HNSW
Result Quality	0.6419 (top score)	0.6419 (top score)	100% overlap
Memory Overhead	Baseline	+5-10%	Payload indexes for the category and other fields
Scalability	Degrades with selectivity	Constant performance	The more selective the filter, the bigger the speedup for filterable HNSW
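
The latency figures in this table come from a single run of each method, so they can be noisy. A minimal sketch of a more robust measurement is shown below: it repeats a call several times after a warm-up and reports the median and an approximate 95th percentile. The helper names are assumptions, and the stand-in workload should be replaced with a real search call.

```python
import statistics
import time


def measure_latency_ms(fn, runs=20, warmup=3):
    """Return per-call latencies in milliseconds after a few warm-up calls."""
    for _ in range(warmup):
        fn()
    latencies = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        latencies.append((time.perf_counter() - start) * 1000)
    return latencies


def summarize(latencies):
    """Median and an approximate 95th percentile of the latencies."""
    ordered = sorted(latencies)
    p95_index = min(len(ordered) - 1, int(0.95 * len(ordered)))
    return {"median_ms": statistics.median(ordered), "p95_ms": ordered[p95_index]}


def speedup(baseline_ms, candidate_ms):
    """How many times faster the candidate is than the baseline."""
    return baseline_ms / candidate_ms


# Single-run figures from the comparison output in this article.
print(f"speedup: {speedup(126.94, 79.26):.2f}x")

# Stand-in workload; replace the lambda with a real search call.
stats = summarize(measure_latency_ms(lambda: sum(range(10_000))))
print(stats)
```

Comparing medians over repeated runs reduces the influence of network jitter and cold caches, both of which matter when the cluster is a remote free-tier instance.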

Conclusion

We have looked at the concept behind filterable HNSW and at its results, and concluded that the more selective the filters are, the bigger the gain. The bottom line is that if more than 30% of your queries use filters, filterable HNSW is well worth adopting: in the comparison above it returned identical results faster, and its main costs are the payload index memory and maintenance overhead described in the Costs section.

In the next part of the series, we will look at multi-vector search and its advantages and disadvantages.
