Essential Techniques for Production Vector Search Systems, Part 3: Filterable HNSW

Proven techniques for production vector search, including when to use each one, how to combine them effectively, and trade-offs to understand before deployment.

By Pavan Vemuri · DZone Core · Jan. 30, 26 · Analysis

After implementing vector search systems at multiple companies, I wanted to document the techniques that have proven most useful in production deployments.

I present each technique by showing when to apply it, how it complements the others, and the trade-offs it introduces. This is a multi-part series that covers one technique per article. I have also included code snippets so you can quickly test each technique.

Before we get into the real details, let us look at the prerequisites and setup.

For ease of understanding and use, I am using the free cloud tier from Qdrant for all of the demonstrations below.

Steps to Set Up Qdrant Cloud

Step 1: Get a Free Qdrant Cloud Cluster

  • Sign up at https://cloud.qdrant.io.
  • Create a free cluster
    • Click "Create Cluster."
    • Select Free Tier.
    • Choose a region closest to you.
    • Wait for the cluster to be provisioned.
  • Capture your credentials.
    • Cluster URL: https://xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx.us-east.aws.cloud.qdrant.io:6333.
    • API Key: Click "API Keys" → "Generate" → Copy the key.

Step 2: Install Python Dependencies

PowerShell
 
pip install qdrant-client fastembed numpy python-dotenv


Recommended versions:

  • qdrant-client >= 1.7.0
  • fastembed >= 0.2.0
  • numpy >= 1.24.0
  • python-dotenv >= 1.0.0

Step 3: Set Environment Variables or Create a .env File

Shell
 
# Add to your ~/.bashrc or ~/.zshrc
export QDRANT_URL="https://your-cluster-url.cloud.qdrant.io:6333"
export QDRANT_API_KEY="your-api-key-here"


Alternatively, create a .env file in the project directory with the following content. Remember to add .env to your .gitignore to avoid committing credentials.

Plain Text
 
# .env file
QDRANT_URL=https://your-cluster-url.cloud.qdrant.io:6333
QDRANT_API_KEY=your-api-key-here


Step 4: Verify Connection

We can verify the connection to the Qdrant cluster with the following script, saved here as verify-connection.py. From this point onward, I am assuming the .env setup is complete.

Python
 
from qdrant_client import QdrantClient
from dotenv import load_dotenv
import os

# Load environment variables from .env file
load_dotenv()

# Initialize client
client = QdrantClient(
    url=os.getenv("QDRANT_URL"),
    api_key=os.getenv("QDRANT_API_KEY"),
)

# Test connection
try:
    collections = client.get_collections()
    print("Connected successfully!")
    print(f"   Current collections: {len(collections.collections)}")
except Exception as e:
    print(f"Connection failed: {e}")
    print("   Check your .env file has QDRANT_URL and QDRANT_API_KEY")


Expected output (the collection count depends on your cluster):

Plain Text
 
python verify-connection.py
Connected successfully!
   Current collections: 2


Now that we have the setup out of the way, we can get into the meat of the article.

Before the deep dive into filterable HNSW, let us look at a high-level overview of the techniques we are about to cover in this multi-part series.

| Technique | Problem solved | Performance impact | Complexity |
|---|---|---|---|
| Hybrid Search | Pure semantic search misses exact matches. | Accuracy improvement of close to 16% | Medium |
| Binary Quantization | Memory costs scale linearly with data. | 40X memory reduction, 15% faster | Low |
| Filterable HNSW | Post-filtering wastes computation. | 5X faster filtered queries | Medium |
| Multi Vector Search | A single embedding cannot capture the importance of various fields. | Handles queries across multiple fields, such as title vs description; requires 2X the storage | Medium |
| Reranking | Vector search is optimized for speed over precision. | Deeper semantic understanding, 15-20% ranking improvement | High |


Keep in mind that production systems typically combine two to four of these techniques.

For example, a typical e-commerce website might use hybrid search, binary quantization, and filterable HNSW.

We covered Hybrid Search in the first part of the series and Binary Quantization in the second part. In this part, we will dive into filterable HNSW.

Filterable HNSW

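To understand why filterable HNSW is advantageous, let us look at how the traditional filtering approaches fall short. Post-filtering runs the vector search first and then discards every candidate that fails the filter; in the example later in this article, 45 of the 50 retrieved candidates (90%) are thrown away. Pre-filtering narrows the candidates before the vector search, and with restrictive filters it can shrink the search space so much that vector similarity becomes less significant.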
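To make the post-filtering waste concrete, here is a minimal sketch that uses brute-force cosine similarity over a made-up in-memory catalogue. The data, the `post_filter_search` name, and the `overfetch` parameter are illustrative assumptions; none of them come from Qdrant or from the helper module used later in this article.

```python
import math

# Hypothetical toy catalogue (not real data): 20 parts, every fifth one is a brake part.
# Each embedding is a 2-D unit vector whose angle grows with the id.
PARTS = [
    (i, "Braking System" if i % 5 == 0 else "Engine Components",
     (math.cos(0.05 * i), math.sin(0.05 * i)))
    for i in range(20)
]


def cosine(a, b):
    """Cosine similarity between two 2-D vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def post_filter_search(query, category, limit, overfetch):
    """Retrieve limit * overfetch nearest parts, then filter AFTER retrieval.

    Returns the ids that survive the filter and the number of candidates
    that were scored and retrieved but not returned.
    """
    ranked = sorted(PARTS, key=lambda part: cosine(query, part[2]), reverse=True)
    candidates = ranked[: limit * overfetch]
    kept = [part[0] for part in candidates if part[1] == category][:limit]
    return kept, len(candidates) - len(kept)


# Over-fetching 5x fills the result list but throws most candidates away.
print(post_filter_search((1.0, 0.0), "Braking System", limit=2, overfetch=5))
# ([0, 5], 8)

# Without over-fetching, the filter leaves the result list under-filled.
print(post_filter_search((1.0, 0.0), "Braking System", limit=2, overfetch=1))
# ([0], 1)
```

The second call shows why post-filtering has to over-fetch in the first place: with a restrictive filter, retrieving only `limit` candidates can leave fewer matches than requested.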

That is where filterable HNSW comes in handy, as it applies filters during the HNSW graph traversal. In other words, the algorithm navigates only through graph nodes that satisfy filter conditions.

Filterable HNSW combines three components: payload indexes (fast lookup structures for filterable fields), filter-aware traversal (HNSW navigation skips non-matching nodes), and dynamic candidate expansion (more candidates are fetched automatically when filters are restrictive).
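The idea of a payload index feeding a filter-aware traversal can be sketched in plain Python. This is a deliberately simplified toy, not Qdrant's implementation: it uses a single-layer graph instead of the HNSW hierarchy, one-dimensional "embeddings", and made-up data. The extra links between points of the same category are an assumption of this sketch so that matching points stay reachable from one another, and all function names are hypothetical.

```python
import heapq
from collections import defaultdict

# Hypothetical toy data: 20 parts on a 1-D "embedding" axis (position == id).
CATEGORIES = {i: "Braking System" if i % 5 == 0 else "Engine Components"
              for i in range(20)}
VECTORS = {i: float(i) for i in range(20)}


def build_payload_index(categories):
    """Payload index: category value -> set of point ids that carry it."""
    index = defaultdict(set)
    for point_id, category in categories.items():
        index[category].add(point_id)
    return index


def build_graph(vectors, payload_index):
    """Single-layer proximity graph plus extra same-category links."""
    graph = defaultdict(set)
    ordered = sorted(vectors, key=vectors.get)
    for a, b in zip(ordered, ordered[1:]):      # base links: adjacent points
        graph[a].add(b)
        graph[b].add(a)
    for ids in payload_index.values():          # extra links per category
        same = sorted(ids, key=vectors.get)
        for a, b in zip(same, same[1:]):
            graph[a].add(b)
            graph[b].add(a)
    return graph


def filtered_search(graph, vectors, allowed, query, limit):
    """Best-first search that only visits and scores points in `allowed`."""
    entry = min(allowed)
    visited = {entry}
    candidates = [(abs(vectors[entry] - query), entry)]   # min-heap by distance
    best = []                                             # max-heap (negated distance)
    scored = 1
    while candidates:
        dist, node = heapq.heappop(candidates)
        if len(best) == limit and dist > -best[0][0]:
            break                                         # nothing closer remains
        heapq.heappush(best, (-dist, node))
        if len(best) > limit:
            heapq.heappop(best)                           # drop the current worst
        for neighbour in graph[node]:
            if neighbour in allowed and neighbour not in visited:
                visited.add(neighbour)
                scored += 1
                heapq.heappush(candidates, (abs(vectors[neighbour] - query), neighbour))
    ranked = sorted((-neg_dist, node) for neg_dist, node in best)
    return [node for _, node in ranked], scored


index = build_payload_index(CATEGORIES)
graph = build_graph(VECTORS, index)
print(filtered_search(graph, VECTORS, index["Braking System"], query=11.0, limit=2))
# ([10, 15], 4)
```

Only the four brake parts are ever scored, even though the collection holds 20 points. Non-matching neighbours are skipped during traversal instead of being retrieved and discarded afterwards.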

Let us take a look at it in more detail with the code below.

Python
 
"""
Example usage of the filterable_hnsw module.

This demonstrates how to use Filterable HNSW with your own Qdrant collection.
"""

from filterable_hnsw import (
    filterable_search,
    compare_filtered_unfiltered,
    display_filtered_results,
    get_qdrant_client
)
from dotenv import load_dotenv
import os

load_dotenv()

# Initialize client
client = get_qdrant_client()

# Your collection name
COLLECTION_NAME = "automotive_parts"  # Change this to your collection name

# Example 1: Filtered search
print("=" * 80)
print("EXAMPLE 1: Filtered Search (Filterable HNSW)")
print("=" * 80)
print("Searching: 'engine sensor' with category filter")
print("Expected: Finds semantically similar parts within the specified category\n")

query1 = "engine sensor"
# First get unfiltered results to see what categories exist
unfiltered_test1 = filterable_search(
    collection_name=COLLECTION_NAME,
    query=query1,
    filter_conditions=None,
    client=client,
    limit=1
)
# Extract category from first result if available
if unfiltered_test1 and 'category' in unfiltered_test1[0]['payload']:
    actual_category1 = unfiltered_test1[0]['payload']['category']
    filter1 = {"category": actual_category1}
    print(f"Using category from data: '{actual_category1}'\n")
else:
    filter1 = {"category": "Engine Components"}  # Fallback

filtered_results = filterable_search(
    collection_name=COLLECTION_NAME,
    query=query1,
    filter_conditions=filter1,
    client=client,
    limit=5
)
display_filtered_results(
    filtered_results, 
    query1, 
    show_fields=['part_name', 'part_id', 'category', 'description']
)

print("\n\n")

# Example 2: Comparison between Filterable HNSW and Post-Filtering
print("=" * 80)
print("EXAMPLE 2: Filterable HNSW vs Post-Filtering Comparison")
print("=" * 80)
print("Comparing filtering DURING traversal vs filtering AFTER retrieval")
print("Expected: Shows Filterable HNSW is more efficient (no wasted computation)\n")

query2 = "brake system"
# First get unfiltered results to see what categories exist
unfiltered_test2 = filterable_search(
    collection_name=COLLECTION_NAME,
    query=query2,
    filter_conditions=None,
    client=client,
    limit=1
)
# Extract category from first result if available
if unfiltered_test2 and 'category' in unfiltered_test2[0]['payload']:
    actual_category2 = unfiltered_test2[0]['payload']['category']
    filter2 = {"category": actual_category2}
    print(f"Using category from data: '{actual_category2}'\n")
else:
    filter2 = {"category": "Braking System"}  # Fallback

comparison = compare_filtered_unfiltered(
    collection_name=COLLECTION_NAME,
    query=query2,
    filter_conditions=filter2,
    client=client,
    limit=5
)

print("\n\n")

# Example 3: Display detailed comparison
print("=" * 80)
print("EXAMPLE 3: Detailed Result Comparison")
print("=" * 80)
print("Top results from both methods:\n")

print("Post-Filtered Results (Top 3):")
print("-" * 80)
for i, result in enumerate(comparison["post_filtered"]["results"][:3], 1):
    payload = result["payload"]
    name = payload.get('part_name', payload.get('name', 'Unknown'))
    category = payload.get('category', 'N/A')
    print(f"{i}. {name}")
    print(f"   Category: {category}")
    print(f"   Score: {result['score']:.4f}")
    print(f"   ID: {result['id']}")

print("\nFilterable HNSW Results (Top 3):")
print("-" * 80)
for i, result in enumerate(comparison["filtered"]["results"][:3], 1):
    payload = result["payload"]
    name = payload.get('part_name', payload.get('name', 'Unknown'))
    category = payload.get('category', 'N/A')
    print(f"{i}. {name}")
    print(f"   Category: {category}")
    print(f"   Score: {result['score']:.4f}")
    print(f"   ID: {result['id']}")

print("\n" + "=" * 80)
print("SUMMARY:")
print("=" * 80)
print("Filterable HNSW:")
print("  - Filters DURING graph traversal (not before or after)")
print("  - Only navigates through nodes that satisfy filter conditions")
print("  - No wasted computation - doesn't retrieve then discard results")
print("  - More efficient than post-filtering which wastes >90% computation")
print(f"  - In this example: {comparison['overlap_ratio']*100:.1f}% result overlap")

Let us now look at filterable HNSW in action with the output of the script above:

Plain Text
 
================================================================================
EXAMPLE 1: Filtered Search (Filterable HNSW)
================================================================================
Searching: 'engine sensor' with category filter
Expected: Finds semantically similar parts within the specified category

Using category from data: 'Safety Systems'


Filtered Search Results for: 'engine sensor'
================================================================================
Found 5 results


1. Safety Sensor Module 237
   Part_name: Safety Sensor Module 237
   Part_id: DEL-0000237
   Category: Safety Systems
   Description: Advanced safety sensor for ADAS applications including collision avoidance and driver assistance fea...
   Score: 0.4092
--------------------------------------------------------------------------------

2. Safety Sensor Module 240
   Part_name: Safety Sensor Module 240
   Part_id: BOS-0000240
   Category: Safety Systems
   Description: Advanced safety sensor for ADAS applications including collision avoidance and driver assistance fea...
   Score: 0.4052
--------------------------------------------------------------------------------

3. Safety Sensor Module 242
   Part_name: Safety Sensor Module 242
   Part_id: VAL-0000242
   Category: Safety Systems
   Description: Advanced safety sensor for ADAS applications including collision avoidance and driver assistance fea...
   Score: 0.4004
--------------------------------------------------------------------------------

4. Safety Sensor Module 246
   Part_name: Safety Sensor Module 246
   Part_id: CON-0000246
   Category: Safety Systems
   Description: Advanced safety sensor for ADAS applications including collision avoidance and driver assistance fea...
   Score: 0.3983
--------------------------------------------------------------------------------

5. Safety Sensor Module 234
   Part_name: Safety Sensor Module 234
   Part_id: ZF-0000234
   Category: Safety Systems
   Description: Advanced safety sensor for ADAS applications including collision avoidance and driver assistance fea...
   Score: 0.3978
--------------------------------------------------------------------------------



================================================================================
EXAMPLE 2: Filterable HNSW vs Post-Filtering Comparison
================================================================================
Comparing filtering DURING traversal vs filtering AFTER retrieval
Expected: Shows Filterable HNSW is more efficient (no wasted computation)

Using category from data: 'Braking System'


Comparing Filterable HNSW vs Post-Filtering for: 'brake system'
Filters: {'category': 'Braking System'}
================================================================================

1. Post-Filtering (Inefficient)
   Retrieves many results, then filters AFTER retrieval
--------------------------------------------------------------------------------

2. Filterable HNSW (Efficient)
   Filters DURING graph traversal - only navigates matching nodes
--------------------------------------------------------------------------------

================================================================================
COMPARISON SUMMARY
================================================================================
Post-Filtering (Traditional Approach):
  Time: 126.94 ms
  Results: 5
  Approach: Retrieves 50 candidates, discards 45
  Top Score: 0.6419

Filterable HNSW:
  Time: 79.26 ms
  Results: 5
  Approach: Only navigates through nodes matching filter conditions
  Top Score: 0.6419

Overlap:
  Common Results: 5 / 5 (100.0%)

Filterable HNSW is 1.60x faster

Key Difference:
  Post-Filtering: Wastes computation by retrieving and discarding results
  Filterable HNSW: Filters during graph traversal - no wasted computation

================================================================================



================================================================================
EXAMPLE 3: Detailed Result Comparison
================================================================================
Top results from both methods:

Post-Filtered Results (Top 3):
--------------------------------------------------------------------------------
1. Brake Control Component 168
   Category: Braking System
   Score: 0.6419
   ID: 1794233379
2. Brake Control Component 154
   Category: Braking System
   Score: 0.6396
   ID: 3151300734
3. Brake Control Component 176
   Category: Braking System
   Score: 0.6394
   ID: 1517692434

Filterable HNSW Results (Top 3):
--------------------------------------------------------------------------------
1. Brake Control Component 168
   Category: Braking System
   Score: 0.6419
   ID: 1794233379
2. Brake Control Component 154
   Category: Braking System
   Score: 0.6396
   ID: 3151300734
3. Brake Control Component 176
   Category: Braking System
   Score: 0.6394
   ID: 1517692434

================================================================================
SUMMARY:
================================================================================
Filterable HNSW:
  - Filters DURING graph traversal (not before or after)
  - Only navigates through nodes that satisfy filter conditions
  - No wasted computation - doesn't retrieve then discard results
  - More efficient than post-filtering which wastes >90% computation
  - In this example: 100.0% result overlap


Benefits

As the results show, filterable HNSW was 1.6 times faster in this run (79.26 ms vs 126.94 ms). It also avoided wasted computation: post-filtering retrieved 50 candidates and discarded 45 of them, whereas filterable HNSW only navigated nodes matching the "Braking System" category. Result quality was preserved as well, since all 5 results are identical between the two methods.
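The headline figures can be recomputed from the raw numbers in the output. The sketch below does that arithmetic. The helper names are hypothetical, and the overlap formula is an assumption about how a value like `overlap_ratio` could be derived, since the comparison helper's source is not shown in this article.

```python
def speedup(baseline_ms, candidate_ms):
    """How many times faster the candidate is than the baseline."""
    return baseline_ms / candidate_ms


def wasted_fraction(retrieved, returned):
    """Share of retrieved candidates that were discarded."""
    return (retrieved - returned) / retrieved


def overlap_ratio(ids_a, ids_b):
    """Assumed definition: shared ids divided by the larger result set."""
    a, b = set(ids_a), set(ids_b)
    if not a and not b:
        return 0.0
    return len(a & b) / max(len(a), len(b))


# Timings and candidate counts reported in the comparison summary.
print(f"Speedup: {speedup(126.94, 79.26):.2f}x")          # Speedup: 1.60x
print(f"Wasted:  {wasted_fraction(50, 5):.0%}")           # Wasted:  90%

# The three point ids printed for each method in Example 3.
top3 = [1794233379, 3151300734, 1517692434]
print(f"Overlap: {overlap_ratio(top3, top3):.0%}")        # Overlap: 100%
```

Keep in mind that these timings come from a single run against a cloud cluster and therefore include network latency, so the 1.6x figure is indicative rather than a benchmark.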

Costs

Filterable HNSW requires payload indexes on the fields we filter by, in this case category, supplier, and in_stock. For a million parts, we are looking at a minimum of 6% memory overhead. There is also a maintenance cost, as every newly indexed part must update the payload indexes. Keep in mind that complex OR conditions may degrade filtering performance. Finally, payload indexes are kept in RAM for faster access, so they must be accounted for in capacity planning.
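For capacity planning, the overhead percentages quoted in this article can be turned into a rough RAM estimate. The sketch below rests on assumptions that the article does not state: that the overhead is measured relative to raw vector memory, and that vectors are 384-dimensional float32 values. Substitute your own dimensions and baseline.

```python
def vector_memory_bytes(num_points, dims, bytes_per_value=4):
    """Raw memory for float32 vectors (4 bytes per value by default)."""
    return num_points * dims * bytes_per_value


def payload_index_overhead_bytes(num_points, dims, overhead_fraction):
    """Extra RAM for payload indexes, ASSUMING overhead is relative to vector memory."""
    return vector_memory_bytes(num_points, dims) * overhead_fraction


MB = 1_000_000
NUM_PARTS = 1_000_000   # "a million parts", as in the text
DIMS = 384              # assumption: embedding size is not given in the article

vectors_mb = vector_memory_bytes(NUM_PARTS, DIMS) / MB
for fraction in (0.05, 0.06, 0.10):
    extra_mb = payload_index_overhead_bytes(NUM_PARTS, DIMS, fraction) / MB
    print(f"{fraction:.0%} overhead: about {extra_mb:.1f} MB on top of {vectors_mb:.0f} MB of vectors")
```

Under these assumptions the extra memory stays in the range of tens to low hundreds of megabytes, but it grows with every additional indexed field, so measure it on your own cluster before sizing.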

When to Use

  • When the results are frequently filtered
  • When the filters are selective (reduce results by more than 50%)
  • When the data has categorical/structured metadata

When Not to Use

  • When filters are rarely used
  • When filters are not selective (remove less than 20% of results)
  • When the dataset is very small (fewer than 10,000 items)
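The two lists above can be folded into a small decision helper. This is only a sketch of the rules of thumb: the function name and return values are hypothetical, and the 30% threshold for "frequently filtered" is borrowed from the conclusion of this article rather than from the lists themselves.

```python
def filterable_hnsw_advice(total_points, matching_points, filtered_query_share,
                           frequent_threshold=0.30):
    """Apply the article's rules of thumb and return 'use', 'skip', or 'benchmark'.

    total_points:         number of items in the collection
    matching_points:      items that pass a typical filter
    filtered_query_share: fraction of queries that apply a filter (0.0 to 1.0)
    """
    removed = 1 - matching_points / total_points   # share of results the filter removes

    if total_points < 10_000:
        return "skip"        # very small dataset
    if filtered_query_share < frequent_threshold:
        return "skip"        # filters are rarely used
    if removed < 0.20:
        return "skip"        # filter is not selective
    if removed > 0.50:
        return "use"         # frequent and selective filters
    return "benchmark"       # grey zone between 20% and 50%: measure both approaches


print(filterable_hnsw_advice(1_000_000, 100_000, 0.6))   # use
print(filterable_hnsw_advice(5_000, 500, 0.9))           # skip
print(filterable_hnsw_advice(1_000_000, 900_000, 0.9))   # skip
print(filterable_hnsw_advice(1_000_000, 700_000, 0.9))   # benchmark
```

The "benchmark" outcome covers filters that remove between 20% and 50% of results, a range the lists leave open.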

Efficiency Comparison

| Approach | Candidates retrieved | Results returned | Wasted work | CPU efficiency |
|---|---|---|---|---|
| Post-filtering | 50 | 5 | 45 (90%) | 10% |
| Filterable HNSW | 5 | 5 | 0 (0%) | 100% |


Performance Characteristics 

Based on the results, let us now look at the performance characteristics.

| Metric | Post-filtering | Filterable HNSW | Evidence from the data |
|---|---|---|---|
| Query latency | 126.94 ms | 79.26 ms | 1.6 times faster |
| Wasted computation | 90% | 0% | No wasted computation with filterable HNSW |
| Result quality | 0.6419 (top score) | 0.6419 (top score) | 100% overlap |
| Memory overhead | Baseline | +5-10% | Payload indexes for the category and other fields |
| Scalability | Degrades with selectivity | Constant performance | The more selective the filter, the bigger the speedup for filterable HNSW |


Conclusion

We have looked at both the concept and the results for filterable HNSW and seen that the more selective the filters are, the larger the gain. The bottom line is that if more than 30% of your queries use filters, filterable HNSW is close to pure gain compared with the previous two techniques discussed in the series: its main cost is the payload index overhead covered in the Costs section.

In the next part of the series, we will look at multi-vector search and its advantages and disadvantages.
