Essential Techniques for Production Vector Search Systems, Part 3: Filterable HNSW
Proven techniques for production vector search, including when to use each one, how to combine them effectively, and trade-offs to understand before deployment.
After implementing vector search systems at multiple companies, I wanted to document the techniques that have proven most helpful for successful production deployments of vector search.
I want to present these techniques by showing when to apply each one, how they complement each other, and the trade-offs they introduce. This is a multi-part series, and each article introduces one technique. I have also included code snippets so you can quickly test each technique.
Before we get into the real details, let us look at the prerequisites and setup.
For ease of understanding and use, I am using the free cloud tier from Qdrant for all of the demonstrations below.
Steps to Set Up Qdrant Cloud
Step 1: Get a Free Qdrant Cloud Cluster
- Sign up at https://cloud.qdrant.io.
- Create a free cluster:
  - Click "Create Cluster."
  - Select Free Tier.
  - Choose the region closest to you.
  - Wait for the cluster to be provisioned.
- Capture your credentials:
  - Cluster URL: `https://xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx.us-east.aws.cloud.qdrant.io:6333`
  - API Key: Click "API Keys" → "Generate" → copy the key.
Step 2: Install Python Dependencies
```bash
pip install qdrant-client fastembed numpy python-dotenv
```
Recommended versions:
- qdrant-client >= 1.7.0
- fastembed >= 0.2.0
- numpy >= 1.24.0
- python-dotenv >= 1.0.0
Step 3: Set Environment Variables or Create a .env File
```bash
# Add to your ~/.bashrc or ~/.zshrc
export QDRANT_URL="https://your-cluster-url.cloud.qdrant.io:6333"
export QDRANT_API_KEY="your-api-key-here"
```

Alternatively, create a .env file in the project directory with the following content. Remember to add .env to your .gitignore to avoid committing credentials.

```text
# .env file
QDRANT_URL=https://your-cluster-url.cloud.qdrant.io:6333
QDRANT_API_KEY=your-api-key-here
```
Step 4: Verify Connection
We can verify the connection to the Qdrant cluster with the following script. From this point onward, I assume the .env setup is complete.
```python
# verify-connection.py
import os

from dotenv import load_dotenv
from qdrant_client import QdrantClient

# Load QDRANT_URL and QDRANT_API_KEY from the .env file
load_dotenv()

# Initialize client
client = QdrantClient(
    url=os.getenv("QDRANT_URL"),
    api_key=os.getenv("QDRANT_API_KEY"),
)

# Test the connection by listing the cluster's collections
try:
    collections = client.get_collections()
    print("Connected successfully!")
    print(f"Current collections: {len(collections.collections)}")
except Exception as e:
    print(f"Connection failed: {e}")
    print("Check that your .env file has QDRANT_URL and QDRANT_API_KEY")
```
Expected output (the collection count reflects your own cluster):

```text
$ python verify-connection.py
Connected successfully!
Current collections: 2
```
Now that we have the setup out of the way, we can get into the meat of the article.
Before the deep dive into filterable HNSW, let us look at a high-level overview of the techniques we are about to cover in this multi-part series.
| Technique | Problem Solved | Performance Impact | Complexity |
|---|---|---|---|
| Hybrid Search | Pure semantic search misses exact matches. | Large accuracy gain, close to 16% | Medium |
| Binary Quantization | Memory costs scale linearly with data. | Up to 32X smaller vectors in memory (1 bit instead of a 32-bit float per dimension), 15% faster | Low |
| Filterable HNSW | Post-filtering wastes computation on results that are later discarded. | 5X faster filtered queries | Medium |
| Multi Vector Search | A single embedding cannot capture the relative importance of different fields. | Handles queries that target different fields (e.g., title vs. description); requires twice the storage | Medium |
| Reranking | Vector search is optimized for speed rather than precision. | Deeper semantic understanding, 15-20% ranking improvement | High |
Keep in mind that production systems typically combine two to four of these techniques.
For example, a typical e-commerce website might use hybrid search, binary quantization, and filterable HNSW.
We covered Hybrid Search in the first part of the series and Binary Quantization in the second part. In this part, we will dive into filterable HNSW.
Filterable HNSW
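To see why filterable HNSW helps, let us first look at how the traditional approaches, pre-filtering and post-filtering, waste computation. Post-filtering runs the vector search first and then discards every result that fails the filter (45 of 50 candidates, or 90%, in the example below). Pre-filtering applies the filter first; when the filter is restrictive, the remaining points no longer form a well-connected part of the HNSW graph, so the search either falls back to a brute-force scan of the matching subset or loses accuracy.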
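To make the waste concrete, here is a minimal, self-contained sketch of both traditional approaches. It uses a brute-force cosine ranking as a stand-in for the vector index (an assumption: this is neither HNSW nor Qdrant's implementation), and the toy dataset, the `oversample` factor, and the function names are hypothetical. With 10% of points matching, post-filtering fetches 50 candidates to return about 5, while pre-filtering compares the query against all 100 matching points.

```python
import math
import random

# Hypothetical toy data: 1,000 random 8-dim vectors, every tenth tagged "Braking System".
random.seed(42)
DIM = 8
points = [
    {
        "id": i,
        "vector": [random.gauss(0, 1) for _ in range(DIM)],
        "category": "Braking System" if i % 10 == 0 else "Other",
    }
    for i in range(1000)
]


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def rank(query, candidates):
    # Brute-force ranking stands in for the vector index (not HNSW).
    return sorted(candidates, key=lambda p: cosine(query, p["vector"]), reverse=True)


def post_filter_search(query, category, k=5, oversample=10):
    """Search first, filter second: fetch k * oversample neighbours, drop misses."""
    candidates = rank(query, points)[: k * oversample]
    kept = [p for p in candidates if p["category"] == category]
    return kept[:k], len(candidates), len(candidates) - len(kept)


def pre_filter_search(query, category, k=5):
    """Filter first, search second: scan every point that passes the filter."""
    subset = [p for p in points if p["category"] == category]
    return rank(query, subset)[:k], len(subset)


query = [random.gauss(0, 1) for _ in range(DIM)]
post, retrieved, discarded = post_filter_search(query, "Braking System")
pre, scanned = pre_filter_search(query, "Braking System")
print(f"post-filter: retrieved {retrieved}, discarded {discarded}, returned {len(post)}")
print(f"pre-filter: compared against {scanned} matching points, returned {len(pre)}")
```

If fewer than 5 of the 50 candidates happen to match, post-filtering in this sketch simply returns fewer results; raising `oversample` fixes that only by wasting even more work.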
That is where filterable HNSW comes in: it applies filters during the HNSW graph traversal itself. In other words, the algorithm navigates only through graph nodes that satisfy the filter conditions.
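Filtering during traversal can be illustrated on a toy graph. The sketch below is a simplified, single-layer best-first search, not Qdrant's implementation: the graph, the extra same-category links (loosely mimicking the additional links Qdrant can build for indexed payload values), and the names `build_graph` and `filtered_search` are assumptions for illustration. Non-matching neighbors are skipped before any distance is computed, so the search only spends work on nodes that satisfy the filter.

```python
import heapq
import math


def build_graph(coords, categories, m=2, payload_m=2):
    """Toy single-layer graph: m nearest neighbours overall, plus payload_m
    nearest neighbours sharing the node's category (hypothetical stand-in for
    per-value payload links). Every link is stored in both directions."""
    graph = {i: set() for i in coords}
    for i in coords:
        others = sorted((j for j in coords if j != i),
                        key=lambda j: math.dist(coords[i], coords[j]))
        same = [j for j in others if categories[j] == categories[i]]
        for j in others[:m] + same[:payload_m]:
            graph[i].add(j)
            graph[j].add(i)
    return graph


def filtered_search(graph, coords, categories, query, category, k=3, ef=3):
    """Best-first search that never expands or returns non-matching nodes."""
    # A payload index would hand us a matching entry point directly.
    entry = next(n for n in graph if categories[n] == category)
    distance_calls = 0

    def dist(n):
        nonlocal distance_calls
        distance_calls += 1
        return math.dist(coords[n], query)

    d0 = dist(entry)
    visited = {entry}
    candidates = [(d0, entry)]  # min-heap: closest unexplored node first
    best = [(-d0, entry)]       # max-heap (negated distances) of the ef best matches
    while candidates:
        d, node = heapq.heappop(candidates)
        if len(best) >= ef and d > -best[0][0]:
            break  # no unexplored node can improve the current best set
        for nb in graph[node]:
            if nb in visited:
                continue
            visited.add(nb)
            if categories[nb] != category:
                continue  # filter check happens before any distance computation
            dn = dist(nb)
            if len(best) < ef or dn < -best[0][0]:
                heapq.heappush(candidates, (dn, nb))
                heapq.heappush(best, (-dn, nb))
                if len(best) > ef:
                    heapq.heappop(best)  # drop the farthest match
    top = sorted((-neg, n) for neg, n in best)[:k]
    return [n for _, n in top], distance_calls


coords = {i: (float(i), 0.0) for i in range(40)}  # 40 points on a line
categories = {i: "Braking System" if i % 4 == 0 else "Other" for i in coords}
graph = build_graph(coords, categories)
ids, calls = filtered_search(graph, coords, categories, (13.2, 0.0), "Braking System")
print(ids, calls)  # [12, 16, 8] 6
```

Here the search computes 6 distances: none for the 30 non-matching points and only 6 of the 10 matching ones. Without the same-category links, the matching points in this toy graph would have no edges between them, and a traversal that skips non-matching nodes would stop at the entry point.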
Filterable HNSW relies on three components: payload indexes (fast lookup structures for filterable fields), filter-aware traversal (HNSW navigation skips non-matching nodes), and dynamic candidate expansion (automatically fetching more candidates when filters are restrictive). Together, they make it the better choice for filtered queries.
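The code below passes filters as plain dicts such as `{"category": "Braking System"}`, and the `filterable_hnsw` module itself is not shown in this article. A minimal sketch, assuming each key/value pair is meant as an exact match, shows how such a dict could map onto Qdrant's JSON filter syntax (a `must` list of `match.value` conditions) and travel inside a request body for Qdrant's Query API (`POST /collections/{collection_name}/points/query`). The helper names are hypothetical; in the Python client, the same filter is expressed as `models.Filter(must=[models.FieldCondition(key=..., match=models.MatchValue(value=...))])`.

```python
import json


def to_qdrant_filter(conditions):
    """Turn {"field": value} pairs into Qdrant's JSON filter syntax.

    Hypothetical helper: every pair becomes an exact-match condition, and
    conditions listed under "must" are combined with logical AND.
    """
    return {
        "must": [{"key": key, "match": {"value": value}} for key, value in conditions.items()]
    }


def build_query_request(vector, conditions, limit=5):
    """Hypothetical body for POST /collections/{collection_name}/points/query.

    The filter travels with the query, so the server applies it during the
    search instead of the client discarding results afterwards.
    """
    return {
        "query": vector,
        "filter": to_qdrant_filter(conditions),
        "limit": limit,
        "with_payload": True,
    }


# Placeholder vector; a real query uses the collection's full dimensionality.
body = build_query_request([0.12, -0.08, 0.33], {"category": "Braking System"})
print(json.dumps(body, indent=2))
```

OR conditions would go in a `should` list instead of `must`; as the Costs section notes, complex OR conditions can slow filtering down.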
Let us look at it in more detail with the code below, which relies on helper functions imported from a `filterable_hnsw` module.

```python
"""
Example usage of the filterable_hnsw module.
This demonstrates how to use Filterable HNSW with your own Qdrant collection.
"""
from dotenv import load_dotenv

from filterable_hnsw import (
    filterable_search,
    compare_filtered_unfiltered,
    display_filtered_results,
    get_qdrant_client,
)

load_dotenv()

# Initialize client
client = get_qdrant_client()

# Your collection name
COLLECTION_NAME = "automotive_parts"  # Change this to your collection name


def pick_category_filter(query, fallback_category):
    """Build a category filter from the top unfiltered hit so it matches real data."""
    top_hit = filterable_search(
        collection_name=COLLECTION_NAME,
        query=query,
        filter_conditions=None,
        client=client,
        limit=1,
    )
    if top_hit and "category" in top_hit[0]["payload"]:
        category = top_hit[0]["payload"]["category"]
        print(f"Using category from data: '{category}'\n")
        return {"category": category}
    return {"category": fallback_category}  # Fallback


def print_top_results(title, results, n=3):
    """Print name, category, score, and ID for the first n results."""
    print(title)
    print("-" * 80)
    for i, result in enumerate(results[:n], 1):
        payload = result["payload"]
        name = payload.get("part_name", payload.get("name", "Unknown"))
        print(f"{i}. {name}")
        print(f" Category: {payload.get('category', 'N/A')}")
        print(f" Score: {result['score']:.4f}")
        print(f" ID: {result['id']}")


# Example 1: Filtered search
print("=" * 80)
print("EXAMPLE 1: Filtered Search (Filterable HNSW)")
print("=" * 80)
print("Searching: 'engine sensor' with category filter")
print("Expected: Finds semantically similar parts within the specified category\n")
query1 = "engine sensor"
filter1 = pick_category_filter(query1, "Engine Components")
filtered_results = filterable_search(
    collection_name=COLLECTION_NAME,
    query=query1,
    filter_conditions=filter1,
    client=client,
    limit=5,
)
display_filtered_results(
    filtered_results,
    query1,
    show_fields=["part_name", "part_id", "category", "description"],
)
print("\n\n")

# Example 2: Comparison between Filterable HNSW and Post-Filtering
print("=" * 80)
print("EXAMPLE 2: Filterable HNSW vs Post-Filtering Comparison")
print("=" * 80)
print("Comparing filtering DURING traversal vs filtering AFTER retrieval")
print("Expected: Shows Filterable HNSW is more efficient (no wasted computation)\n")
query2 = "brake system"
filter2 = pick_category_filter(query2, "Braking System")
comparison = compare_filtered_unfiltered(
    collection_name=COLLECTION_NAME,
    query=query2,
    filter_conditions=filter2,
    client=client,
    limit=5,
)
print("\n\n")

# Example 3: Display detailed comparison
print("=" * 80)
print("EXAMPLE 3: Detailed Result Comparison")
print("=" * 80)
print("Top results from both methods:\n")
print_top_results("Post-Filtered Results (Top 3):", comparison["post_filtered"]["results"])
print_top_results("\nFilterable HNSW Results (Top 3):", comparison["filtered"]["results"])

print("\n" + "=" * 80)
print("SUMMARY:")
print("=" * 80)
print("Filterable HNSW:")
print(" - Filters DURING graph traversal (not before or after)")
print(" - Only navigates through nodes that satisfy filter conditions")
print(" - No wasted computation - doesn't retrieve then discard results")
print(" - More efficient than post-filtering which wastes >90% computation")
print(f" - In this example: {comparison['overlap_ratio']*100:.1f}% result overlap")
```
Let us now look at filterable HNSW in action. Running the script produces the following output:

```text
================================================================================
EXAMPLE 1: Filtered Search (Filterable HNSW)
================================================================================
Searching: 'engine sensor' with category filter
Expected: Finds semantically similar parts within the specified category
Using category from data: 'Safety Systems'
Filtered Search Results for: 'engine sensor'
================================================================================
Found 5 results
1. Safety Sensor Module 237
Part_name: Safety Sensor Module 237
Part_id: DEL-0000237
Category: Safety Systems
Description: Advanced safety sensor for ADAS applications including collision avoidance and driver assistance fea...
Score: 0.4092
--------------------------------------------------------------------------------
2. Safety Sensor Module 240
Part_name: Safety Sensor Module 240
Part_id: BOS-0000240
Category: Safety Systems
Description: Advanced safety sensor for ADAS applications including collision avoidance and driver assistance fea...
Score: 0.4052
--------------------------------------------------------------------------------
3. Safety Sensor Module 242
Part_name: Safety Sensor Module 242
Part_id: VAL-0000242
Category: Safety Systems
Description: Advanced safety sensor for ADAS applications including collision avoidance and driver assistance fea...
Score: 0.4004
--------------------------------------------------------------------------------
4. Safety Sensor Module 246
Part_name: Safety Sensor Module 246
Part_id: CON-0000246
Category: Safety Systems
Description: Advanced safety sensor for ADAS applications including collision avoidance and driver assistance fea...
Score: 0.3983
--------------------------------------------------------------------------------
5. Safety Sensor Module 234
Part_name: Safety Sensor Module 234
Part_id: ZF-0000234
Category: Safety Systems
Description: Advanced safety sensor for ADAS applications including collision avoidance and driver assistance fea...
Score: 0.3978
--------------------------------------------------------------------------------
================================================================================
EXAMPLE 2: Filterable HNSW vs Post-Filtering Comparison
================================================================================
Comparing filtering DURING traversal vs filtering AFTER retrieval
Expected: Shows Filterable HNSW is more efficient (no wasted computation)
Using category from data: 'Braking System'
Comparing Filterable HNSW vs Post-Filtering for: 'brake system'
Filters: {'category': 'Braking System'}
================================================================================
1. Post-Filtering (Inefficient)
Retrieves many results, then filters AFTER retrieval
--------------------------------------------------------------------------------
2. Filterable HNSW (Efficient)
Filters DURING graph traversal - only navigates matching nodes
--------------------------------------------------------------------------------
================================================================================
COMPARISON SUMMARY
================================================================================
Post-Filtering (Traditional Approach):
Time: 126.94 ms
Results: 5
Approach: Retrieves 50 candidates, discards 45
Top Score: 0.6419
Filterable HNSW:
Time: 79.26 ms
Results: 5
Approach: Only navigates through nodes matching filter conditions
Top Score: 0.6419
Overlap:
Common Results: 5 / 5 (100.0%)
Filterable HNSW is 1.60x faster
Key Difference:
Post-Filtering: Wastes computation by retrieving and discarding results
Filterable HNSW: Filters during graph traversal - no wasted computation
================================================================================
================================================================================
EXAMPLE 3: Detailed Result Comparison
================================================================================
Top results from both methods:
Post-Filtered Results (Top 3):
--------------------------------------------------------------------------------
1. Brake Control Component 168
Category: Braking System
Score: 0.6419
ID: 1794233379
2. Brake Control Component 154
Category: Braking System
Score: 0.6396
ID: 3151300734
3. Brake Control Component 176
Category: Braking System
Score: 0.6394
ID: 1517692434
Filterable HNSW Results (Top 3):
--------------------------------------------------------------------------------
1. Brake Control Component 168
Category: Braking System
Score: 0.6419
ID: 1794233379
2. Brake Control Component 154
Category: Braking System
Score: 0.6396
ID: 3151300734
3. Brake Control Component 176
Category: Braking System
Score: 0.6394
ID: 1517692434
================================================================================
SUMMARY:
================================================================================
Filterable HNSW:
- Filters DURING graph traversal (not before or after)
- Only navigates through nodes that satisfy filter conditions
- No wasted computation - doesn't retrieve then discard results
- More efficient than post-filtering which wastes >90% computation
- In this example: 100.0% result overlap
```
Benefits
As the results show, filterable HNSW is more computationally efficient: in this run, it was 1.6 times faster than post-filtering (79.26 ms vs. 126.94 ms). It also wastes no retrieval work: post-filtering retrieved 50 candidates and discarded 45 of them, whereas filterable HNSW navigated only nodes matching the "Braking System" category. Result quality does not suffer either: all five results are identical between the two methods (100% overlap).
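The headline figures follow directly from the Example 2 output. A few lines of arithmetic reproduce them; the numbers are copied from that output, and nothing new is measured here.

```python
# Figures copied from the Example 2 output; nothing new is measured here.
post_filter_ms = 126.94
filterable_hnsw_ms = 79.26
candidates_retrieved = 50
results_returned = 5

speedup = post_filter_ms / filterable_hnsw_ms
discarded = candidates_retrieved - results_returned
discard_ratio = discarded / candidates_retrieved

print(f"Speedup: {speedup:.2f}x")  # Speedup: 1.60x
print(f"Discarded: {discarded} of {candidates_retrieved} ({discard_ratio:.0%})")  # Discarded: 45 of 50 (90%)
```

A single timing like this is noisy; averaging over many queries gives a more reliable speedup figure.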
Costs
To run filterable HNSW, we pay for payload indexes on the filtered fields: here, category, supplier, and in_stock. For a million parts, expect a memory overhead of at least 6%. There is also a maintenance cost, since every newly indexed part must update the payload indexes as well. Keep in mind, too, that complex OR conditions may degrade filtering performance. Finally, payload indexes are kept in RAM for fast access, so they must be accounted for in capacity planning.
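The payload indexes mentioned above are created once per field. A minimal sketch, assuming `category` and `supplier` hold exact-match strings and `in_stock` is a boolean (the article does not state the types), builds the request bodies for Qdrant's `PUT /collections/{collection_name}/index` endpoint. The type mapping and the `payload_index_requests` helper are hypothetical; with the Python client, each entry corresponds to a `client.create_payload_index(collection_name=..., field_name=..., field_schema=...)` call.

```python
import json

# Hypothetical schema: the article names these fields but not their types.
INDEXED_FIELDS = {
    "category": "keyword",  # exact-match string values
    "supplier": "keyword",
    "in_stock": "bool",
}


def payload_index_requests(collection_name, fields):
    """Return one (method, path, body) tuple per field to index."""
    path = f"/collections/{collection_name}/index"
    return [
        ("PUT", path, {"field_name": name, "field_schema": schema})
        for name, schema in fields.items()
    ]


for method, path, body in payload_index_requests("automotive_parts", INDEXED_FIELDS):
    print(method, path, json.dumps(body))
```

Each indexed field adds to the RAM footprint, so it is worth indexing only the fields your queries actually filter on.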
When to Use
- When queries are frequently filtered
- When the filters are selective (they reduce results by more than 50%)
- When the data has categorical or structured metadata
When Not to Use
- When filters are rarely used
- When filters are not selective (they remove less than 20% of results)
- When the dataset is very small (fewer than 10,000 items)
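The rules of thumb in the two lists above can be folded into one hypothetical helper. The thresholds come from the article (filters in more than 30% of queries, per the conclusion; more than 50% of results removed; at least 10,000 items). The article gives no rule for selectivity between 20% and 50%, so this sketch recommends benchmarking that range, which is an assumption.

```python
def filterable_hnsw_recommended(filtered_query_share, removed_share, num_items):
    """Hypothetical helper encoding the article's rules of thumb.

    filtered_query_share: fraction of queries that carry a filter (0-1)
    removed_share: fraction of results a typical filter removes (0-1)
    num_items: number of points in the collection
    """
    if num_items < 10_000:
        return "no: dataset is very small"
    if filtered_query_share <= 0.30:
        return "no: filters are rarely used"
    if removed_share < 0.20:
        return "no: filters are not selective"
    if removed_share > 0.50:
        return "yes"
    # Assumption: the article gives no rule for 20-50% selectivity, so benchmark it.
    return "benchmark: selectivity between 20% and 50%"


print(filterable_hnsw_recommended(0.6, 0.9, 1_000_000))  # yes
print(filterable_hnsw_recommended(0.6, 0.1, 1_000_000))  # no: filters are not selective
```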
Efficiency Comparison
| Approach | Candidates Retrieved | Results Returned | Discarded Candidates | Candidates Kept |
|---|---|---|---|---|
| Post-Filtering | 50 | 5 | 45 (90%) | 10% |
| Filterable HNSW | 5 | 5 | 0 (0%) | 100% |
Performance Characteristics
Based on the results above, here are the performance characteristics of both approaches.
| Metric | Post-Filtering | Filterable HNSW | Notes |
|---|---|---|---|
| Query Latency | 126.94 ms | 79.26 ms | 1.6x faster in this run |
| Discarded Candidates | 90% | 0% | Filterable HNSW discards nothing after retrieval |
| Result Quality | 0.6419 (top score) | 0.6419 (top score) | 100% overlap in the top 5 |
| Memory Overhead | Baseline | +5-10% | Payload indexes for the category and other filtered fields |
| Scalability | Degrades as filters become more selective | More stable across filter selectivity | The more selective the filter, the bigger the expected speedup for filterable HNSW |
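The scalability row can be made concrete with a back-of-the-envelope model. Assuming matching points are spread evenly through the ranked neighbors (a simplification; real data is rarely that uniform), post-filtering needs roughly k / f candidates to end up with k matches when a fraction f of points match. With k = 5 and f = 10%, the model gives the 50-retrieved, 45-discarded pattern from Example 2; at f = 1%, it predicts 500 retrieved and 495 discarded. The helper name is hypothetical.

```python
import math


def post_filter_cost(k, match_percent):
    """Candidates post-filtering must fetch to keep about k matches.

    Assumes matching points are spread evenly through the ranked neighbours,
    so only match_percent% of any retrieved batch passes the filter.
    Returns (candidates to retrieve, candidates discarded).
    """
    retrieve = math.ceil(k * 100 / match_percent)
    return retrieve, retrieve - k


for pct in (50, 10, 1):
    retrieve, discard = post_filter_cost(5, pct)
    print(f"{pct:>3}% match -> retrieve {retrieve}, discard {discard}")
```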
Conclusion
We have looked at the concept and the results for filterable HNSW: the more selective the filters, the bigger its advantage over post-filtering. The bottom line is that if more than 30% of your queries use filters, filterable HNSW is worth enabling. Unlike the previous two techniques discussed in the series, it does not change which results you get (100% overlap in this test), and its main cost is the memory and write overhead of the payload indexes described under Costs.
In the next part of the series, we will look at multi-vector search and its advantages and disadvantages.
Opinions expressed by DZone contributors are their own.