Essential Techniques for Production Vector Search Systems Part 1 - Hybrid Search

Proven techniques for production vector search, including when to use each one, how to combine them effectively, and the trade-offs to understand before deployment.

Pavan Vemuri

Jan. 08, 26 · Analysis


After implementing vector search systems at multiple companies, I wanted to document the techniques that have proven most useful for deploying them successfully in production.

I want to present these techniques, showing when to apply each one, how they complement each other, and the trade-offs they introduce. This is a multi-part series in which each article covers one technique, and each includes code snippets so you can quickly test the technique yourself.

Before we get into the details, let us look at the prerequisites and setup.

For ease of understanding and use, I am using the free cloud tier from Qdrant for all of the demonstrations below.

Steps to Set Up Qdrant Cloud

Step 1: Get a Free Qdrant Cloud Cluster

Sign up at https://cloud.qdrant.io
Create a free cluster
- Click Create Cluster
- Select Free Tier
- Choose a region closest to you
- Wait for the cluster to be provisioned
Capture your credentials
- Cluster URL: https://xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx.us-east.aws.cloud.qdrant.io:6333
- API Key: Click API Keys → Generate → Copy the key

Step 2: Install Python Dependencies

    PowerShell

pip install qdrant-client python-dotenv sentence-transformers fastembed numpy

Recommended versions:

qdrant-client >= 1.10.0 (query_points, used below, was added in 1.10)
fastembed >= 0.2.0
numpy >= 1.24.0
python-dotenv >= 1.0.0

Step 3: Set Environment Variables or Create a `.env` File

    Bash

# Add to your ~/.bashrc or ~/.zshrc
export QDRANT_URL="https://your-cluster-url.cloud.qdrant.io:6333"
export QDRANT_API_KEY="your-api-key-here"

Alternatively, create a .env file in the project directory with the following content.

Remember to add .env to your .gitignore to avoid committing credentials.

    Plain Text

# .env file
QDRANT_URL=https://your-cluster-url.cloud.qdrant.io:6333
QDRANT_API_KEY=your-api-key-here

Step 4: Verify the Connection

Save the following script as verify-connection.py and run it to confirm that you can reach your Qdrant cluster. From this point on, I assume the .env setup.

    Python

from qdrant_client import QdrantClient
from dotenv import load_dotenv
import os

# Load environment variables from .env file
load_dotenv()

# Initialize client
client = QdrantClient(
    url=os.getenv("QDRANT_URL"),
    api_key=os.getenv("QDRANT_API_KEY"),
)

# Test connection
try:
    collections = client.get_collections()
    print("Connected successfully!")
    print(f"   Current collections: {len(collections.collections)}")
except Exception as e:
    print(f"Connection failed: {e}")
    print("   Check your .env file has QDRANT_URL and QDRANT_API_KEY")
  

Expected output:

    Plain Text

python verify-connection.py
Connected successfully!
   Current collections: 2

Now that we have the setup out of the way, we can get into the meat of the article.

Before the deep dive, let us look at a high-level overview of the techniques we are about to cover.

Technique	Problem Solved	Performance Impact	Complexity
Hybrid Search	Pure semantic search misses exact matches	Large accuracy gain, close to 16%	Medium
Binary Quantization	Memory costs scale linearly with data	40X memory reduction, 15% faster	Low
Filterable HNSW	Post-filtering wastes computation	5X faster filtered queries	Medium
Multi-Vector	Advanced models need multiple embeddings per document	Enables ColBERT and multimodal search	High
Distributed Architecture	A single node limits throughput and availability	32X throughput and 99.99% uptime	High

Keep in mind that production systems typically combine two to four of these techniques.

For example, a typical e-commerce website might use Hybrid Search, Binary Quantization, and Filterable HNSW.

Now that we have the high-level overview, we will look at each technique in detail in this multi-part series, starting with Hybrid Search.

Hybrid Search

Across the many search applications I have developed, one lesson stands out: neither semantic search nor keyword search is sufficient on its own. We need a hybrid approach.

When users search for specific product names, SKUs, or technical specifications, pure semantic search often returns semantically similar but incorrect results.

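Let us look at an example. If we use only semantic search and query with a part ID such as "BOS-0000240", the exact part may not show up in the results at all, because a dense embedding captures the meaning of the text rather than the exact characters of an arbitrary code.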

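You can test this yourself with the following hybrid_search.py skeleton. To run it against your own data, set QDRANT_COLLECTION (or edit collection_name), adjust keyword_fields to match your payload, and either edit the __main__ block or write a small driver script that calls hybrid_search(). The skeleton assumes the collection's dense vectors were created with the same all-MiniLM-L6-v2 model it uses to embed queries.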

    Python

"""Generic Hybrid Search Implementation for Qdrant"""

from qdrant_client import QdrantClient
from dotenv import load_dotenv
import os
from typing import List, Dict, Any, Optional
from collections import Counter
import re

# Cache the embedding model globally
_embedding_model = None

def get_embedding_model():
    """Get or create the embedding model (cached)."""
    global _embedding_model
    if _embedding_model is None:
        try:
            from sentence_transformers import SentenceTransformer
            _embedding_model = SentenceTransformer('all-MiniLM-L6-v2')
        except ImportError:
            raise ImportError(
                "sentence-transformers not installed. Install it with: pip install sentence-transformers"
            )
    return _embedding_model


def get_qdrant_client() -> QdrantClient:
    """Initialize and return Qdrant client."""
    load_dotenv()
    return QdrantClient(
        url=os.getenv("QDRANT_URL"),
        api_key=os.getenv("QDRANT_API_KEY"),
    )


def tokenize(text: str) -> List[str]:
    """Simple tokenization - split on whitespace and punctuation."""
    text = re.sub(r'[^\w\s]', ' ', text.lower())
    return [word for word in text.split() if len(word) > 2]


def calculate_tf_score(query_words: List[str], text: str, boost_exact: bool = True) -> float:
    """Calculate Term Frequency score for keyword matching."""
    text_words = tokenize(text)
    if not text_words or not query_words:
        return 0.0
    
    word_counts = Counter(text_words)
    query_word_counts = Counter(query_words)
    
    score = 0.0
    total_words = len(text_words)
    matched_words = 0
    
    for word, query_freq in query_word_counts.items():
        text_freq = word_counts.get(word, 0)
        if text_freq > 0:
            matched_words += 1
            tf = text_freq / total_words
            boost = 2.0 if boost_exact else 1.0
            score += tf * query_freq * boost
    
    match_ratio = matched_words / len(set(query_words)) if query_words else 0.0
    
    if match_ratio == 1.0:
        score *= 1.5
    
    base_score = score / len(set(query_words)) if query_words else 0.0
    return base_score * match_ratio


def hybrid_search(
    collection_name: str,
    query: str,
    client: Optional[QdrantClient] = None,
    vector_weight: float = 0.7,
    keyword_weight: float = 0.3,
    limit: int = 10,
    keyword_fields: Optional[List[str]] = None
) -> List[Dict[str, Any]]:
    """Perform hybrid search combining vector and keyword search."""
    if client is None:
        client = get_qdrant_client()
    
    if keyword_fields is None:
        keyword_fields = ['text', 'description', 'name', 'title', 'content']
    
    query_words = tokenize(query)
    vector_results = []
    try:
        model = get_embedding_model()
        query_vector = model.encode(query).tolist()
        
        vector_search_response = client.query_points(
            collection_name=collection_name,
            query=query_vector,
            limit=limit * 3,
            with_payload=True
        )
        vector_search = vector_search_response.points
        
        if vector_search:
            scores = [r.score for r in vector_search]
            min_score = min(scores)
            max_score = max(scores)
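            score_range = max_score - min_score
            
            # Min-max normalize dense scores to [0, 1] so they are comparable with
            # keyword scores. The lowest dense hit maps to 0.0; if every hit has
            # the same score, treat them all as 1.0 instead of 0.0.
            vector_results = [
                {
                    "id": r.id,
                    "score": (r.score - min_score) / score_range if score_range > 0 else 1.0,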
                    "payload": r.payload,
                    "raw_score": r.score
                }
                for r in vector_search
            ]
    except Exception as e:
        print(f"Vector search error: {e}")
    
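    # Keyword pass: a client-side scan of at most 1,000 points. This is fine for a
    # small demo collection; for larger collections, use Qdrant sparse vectors or a
    # full-text payload index so keyword matching happens server-side.
    keyword_results = []
    try:
        collection_info = client.get_collection(collection_name)
        # points_count is typed Optional in qdrant-client, so guard against None
        total_points = collection_info.points_count or 0
        scroll_limit = max(1, min(total_points, 1000))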
        
        all_points = client.scroll(
            collection_name=collection_name,
            limit=scroll_limit,
            with_payload=True,
            with_vectors=False
        )
        
        for point in all_points[0]:
            keyword_score = 0.0
            exact_match_found = False
            query_lower = query.lower().strip()
            query_upper = query.upper().strip()
            
            for field in keyword_fields:
                field_value = point.payload.get(field, "")
                if isinstance(field_value, str):
                    field_lower = field_value.lower().strip()
                    field_upper = field_value.upper().strip()
                    
                    if query_lower == field_lower or query_upper == field_upper:
                        exact_match_found = True
                        keyword_score = 1.0
                        break
            
            if not exact_match_found:
                for field in keyword_fields:
                    field_value = point.payload.get(field, "")
                    if isinstance(field_value, str):
                        if query_lower in field_value.lower():
                            exact_match_found = True
                        
                        score = calculate_tf_score(query_words, field_value, boost_exact=True)
                        if field in ['name', 'title', 'id', 'part_id', 'part_name']:
                            keyword_score += score * 3.0
                        else:
                            keyword_score += score * 1.0
                
                if exact_match_found:
                    keyword_score *= 5.0
                    keyword_score = min(keyword_score / 10.0, 1.0)
                else:
                    keyword_score = min(keyword_score / 10.0, 1.0)
            
            if keyword_score > 0:
                keyword_results.append({
                    "id": point.id,
                    "score": keyword_score,
                    "payload": point.payload
                })
        
        keyword_results.sort(key=lambda x: x["score"], reverse=True)
        keyword_results = keyword_results[:limit * 3]
        
    except Exception as e:
        print(f"Keyword search error: {e}")
    
    fused_results = {}
    
    for result in vector_results:
        point_id = result["id"]
        fused_results[point_id] = {
            "id": point_id,
            "payload": result["payload"],
            "vector_score": result["score"],
            "keyword_score": 0.0,
            "combined_score": 0.0,
            "raw_vector_score": result.get("raw_score", 0.0)
        }
    
    for result in keyword_results:
        point_id = result["id"]
        if point_id not in fused_results:
            fused_results[point_id] = {
                "id": point_id,
                "payload": result["payload"],
                "vector_score": 0.0,
                "keyword_score": 0.0,
                "combined_score": 0.0,
                "raw_vector_score": 0.0
            }
        fused_results[point_id]["keyword_score"] = result["score"]
    
    for point_id, result in fused_results.items():
        if result["keyword_score"] >= 0.99:
            result["combined_score"] = 1.0
        elif result["vector_score"] > 0 and result["keyword_score"] > 0:
            result["combined_score"] = (
                vector_weight * result["vector_score"] +
                keyword_weight * result["keyword_score"]
            )
        elif result["vector_score"] > 0:
            result["combined_score"] = vector_weight * result["vector_score"]
        elif result["keyword_score"] > 0:
            result["combined_score"] = keyword_weight * result["keyword_score"]
    
    sorted_results = sorted(
        fused_results.values(),
        key=lambda x: x["combined_score"],
        reverse=True
    )
    
    return sorted_results[:limit]


def display_results(results: List[Dict[str, Any]], query: str, show_fields: Optional[List[str]] = None):
    """Display search results."""
    if show_fields is None:
        show_fields = ['name', 'title', 'description', 'text']
    
    print(f"\nHybrid Search Results for: '{query}'")
    print("=" * 80)
    
    if not results:
        print("No results found.")
        return
    
    for i, result in enumerate(results, 1):
        payload = result["payload"]
        display_name = "Result"
        for field in ['name', 'title', 'part_name', 'id', 'part_id']:
            if field in payload:
                display_name = str(payload[field])
                break
        
        print(f"\n{i}. {display_name}")
        
        for field in show_fields:
            if field in payload and field not in ['name', 'title']:
                value = payload[field]
                if isinstance(value, str):
                    print(f"   {field.capitalize()}: {value[:100]}{'...' if len(value) > 100 else ''}")
        
        print(f"   Scores: Vector={result['vector_score']:.3f}, "
              f"Keyword={result['keyword_score']:.3f}, "
              f"Combined={result['combined_score']:.3f}")
        
        if result.get('raw_vector_score'):
            print(f"   Raw Vector Score: {result['raw_vector_score']:.4f}")
        
        print("-" * 80)


if __name__ == "__main__":
    collection_name = os.getenv("QDRANT_COLLECTION", "your_collection")
    query1 = "example query for exact match"
    
    try:
        client = get_qdrant_client()
        results1 = hybrid_search(
            collection_name=collection_name,
            query=query1,
            client=client,
            vector_weight=0.7,
            keyword_weight=0.3,
            limit=3,
            keyword_fields=['name', 'description', 'text']
        )
        display_results(results1, query1, show_fields=['name', 'description'])
    except Exception as e:
        print(f"Error: {e}")
        print("\nTo use this script:")
        print("1. Set QDRANT_URL and QDRANT_API_KEY in your .env file")
        print("2. Set QDRANT_COLLECTION environment variable or update collection_name in code")
        print("3. Adjust keyword_fields to match your collection's payload structure")

  

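Here is the output from running the code above against one of my collections (a parts catalog with part_id, part_name, category, and description fields), through a small driver script, example_usage.py (not shown), that prints a few extra annotations.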

    Plain Text

python example_usage.py
================================================================================
EXAMPLE 1: Exact Match Search (Keyword Search)
================================================================================
Searching by Part ID: 'BOS-0000240'
Expected: Exact match via keyword search


Hybrid Search Results for: 'BOS-0000240'
================================================================================

1. Safety Sensor Module 240
   Part_name: Safety Sensor Module 240
   Part_id: BOS-0000240
   Category: Safety Systems
   Description: Advanced safety sensor for ADAS applications including collision avoidance and driver assistance fea...
   Scores: Vector=0.000, Keyword=1.000, Combined=1.000
   ✓ Keyword match found


================================================================================
EXAMPLE 2: Semantic Search (Vector Search)
================================================================================
Searching by meaning: 'collision detection device'
Expected: Finds semantically similar parts, even though exact words don't match


Hybrid Search Results for: 'collision detection device'
================================================================================
 Note: No strong keyword matches found. Results are based on semantic similarity.
   (The collection may not contain exact matches for your query)


1. Safety Sensor Module 239
   Part_name: Safety Sensor Module 239
   Part_id: NXP-0000239
   Category: Safety Systems
   Description: Advanced safety sensor for ADAS applications including collision avoidance and driver assistance fea...
   Scores: Vector=1.000, Keyword=0.000, Combined=0.700
   → Semantic similarity match
   Raw Vector Score: 0.4632
--------------------------------------------------------------------------------

2. Safety Sensor Module 211
   Part_name: Safety Sensor Module 211
   Part_id: AMP-0000211
   Category: Safety Systems
   Description: Advanced safety sensor for ADAS applications including collision avoidance and driver assistance fea...
   Scores: Vector=0.789, Keyword=0.000, Combined=0.552
   → Semantic similarity match
   Raw Vector Score: 0.4609
--------------------------------------------------------------------------------

3. Safety Sensor Module 242
   Part_name: Safety Sensor Module 242
   Part_id: VAL-0000242
   Category: Safety Systems
   Description: Advanced safety sensor for ADAS applications including collision avoidance and driver assistance fea...
   Scores: Vector=0.752, Keyword=0.004, Combined=0.527
   → Semantic similarity match
   Raw Vector Score: 0.4605
--------------------------------------------------------------------------------

  

Let Us Understand the Output

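Search 1: Searched by part number and retrieved the exact part through keyword matching alone. Vector=0.000 means dense search gave it no credit: the part was either outside the dense candidates or the lowest-scoring of them after min-max normalization.
Search 2: Searched by meaning and retrieved parts that match the intent of the query. The displayed Vector scores are min-max normalized within the candidate set, so the top hit shows 1.000 even though its raw similarity score is only 0.4632.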

Dense embeddings struggle with retrieving information based on SKUs, part numbers, and exact specifications. This is where hybrid search comes in handy by combining semantic understanding with exact keyword matching.
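The hybrid_search() function above fuses the two result lists with a weighted sum of min-max normalized scores. A common alternative, which Qdrant also offers as a fusion option for its native hybrid queries, is Reciprocal Rank Fusion (RRF): it ignores raw scores entirely and uses only each result's rank in each list. The sketch below is a plain-Python illustration of RRF, not Qdrant's implementation; the constant k=60 is a commonly used default and is an assumption here, and the rankings in the demo are hypothetical lists built from part IDs in the output above.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence, Tuple


def reciprocal_rank_fusion(
    ranked_lists: Sequence[Sequence[Hashable]],
    k: int = 60,
    limit: int = 10,
) -> List[Tuple[Hashable, float]]:
    """Fuse ranked lists of document IDs with Reciprocal Rank Fusion (RRF).

    Every list a document appears in contributes 1 / (k + rank), where rank
    is 1-based. Raw scores are never used, so no normalization is needed.
    """
    scores: Dict[Hashable, float] = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by ID so the order is stable.
    fused = sorted(scores.items(), key=lambda item: (-item[1], str(item[0])))
    return fused[:limit]


if __name__ == "__main__":
    # Hypothetical rankings: dense search misses the exact part ID,
    # keyword search puts it first.
    dense_ranking = ["NXP-0000239", "AMP-0000211", "VAL-0000242"]
    keyword_ranking = ["BOS-0000240", "VAL-0000242"]
    for doc_id, score in reciprocal_rank_fusion([dense_ranking, keyword_ranking]):
        print(f"{doc_id}: {score:.4f}")
```

In this demo, VAL-0000242 comes out on top because it appears in both lists, while BOS-0000240 ties with NXP-0000239. That is the design trade-off: RRF rewards agreement between retrievers but will not by itself force an exact ID match to rank first, which is why hybrid_search() short-circuits exact matches to a combined score of 1.0.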

Now let us look at the benefits and costs of using hybrid search.

Benefits

Exact match accuracy: Dense embeddings do not reliably match arbitrary codes or SKUs; sparse vectors help by matching exact tokens.
Specification-based search: Dense embeddings treat numbers as soft preferences rather than requirements, while sparse vectors reward exact numeric tokens.
Robustness: Users search in multiple ways, sometimes contextually and sometimes with exact terms. Hybrid search handles both.
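To make the sparse side concrete, here is a minimal sketch of a bag-of-words sparse vector and why it matches exact tokens. It reuses the tokenization rule from hybrid_search.py and assigns each token an index from a hypothetical in-memory vocabulary (VOCAB); the helper names and the two document strings are made up for illustration, and this is not how any particular sparse encoder such as BM25 or SPLADE builds its vectors.

```python
import re
from collections import Counter
from typing import Dict, List

# Hypothetical shared vocabulary mapping each token to a sparse index.
# A real system would use its sparse encoder's fixed vocabulary instead.
VOCAB: Dict[str, int] = {}


def tokenize(text: str) -> List[str]:
    """Same rule as tokenize() in hybrid_search.py: lowercase, split on
    punctuation and whitespace, keep tokens longer than two characters."""
    text = re.sub(r"[^\w\s]", " ", text.lower())
    return [word for word in text.split() if len(word) > 2]


def to_sparse(text: str) -> Dict[int, float]:
    """Build a term-frequency sparse vector as {index: weight}."""
    counts = Counter(tokenize(text))
    # setdefault gives each previously unseen token the next free index.
    return {VOCAB.setdefault(tok, len(VOCAB)): float(n) for tok, n in counts.items()}


def sparse_dot(a: Dict[int, float], b: Dict[int, float]) -> float:
    """Dot product of two sparse vectors: only shared indices (shared tokens) contribute."""
    if len(a) > len(b):
        a, b = b, a  # iterate over the smaller vector
    return sum((weight * b[index] for index, weight in a.items() if index in b), 0.0)


if __name__ == "__main__":
    query = to_sparse("BOS-0000240")
    docs = {
        "BOS-0000240": "Safety Sensor Module 240, part BOS-0000240",
        "NXP-0000239": "Safety Sensor Module 239, part NXP-0000239",
    }
    for part_id, text in docs.items():
        print(part_id, sparse_dot(query, to_sparse(text)))
```

The query tokenizes to "bos" and "0000240", so the document containing that exact ID scores 2.0 and the near-identical NXP part scores 0.0. This all-or-nothing token matching is what lets the sparse side pin down exact IDs.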

Costs

Storage: Because we store both dense and sparse vectors, each document carries two vectors instead of one. Plan for up to roughly 2× storage and RAM; the actual overhead depends on how many non-zero terms each sparse vector holds.
Indexing complexity: Dense-only search requires one model and one embedding. Hybrid search requires two models, two embeddings, more code, more failure points, and more debugging.
Maintenance burden: As a result, the operational and maintenance burden is also higher.
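For a rough sense of the storage overhead, the sketch below estimates raw vector bytes per document. It assumes float32 dense vectors (all-MiniLM-L6-v2, used in hybrid_search.py, produces 384-dimensional embeddings) and sparse vectors stored as a 4-byte index plus a 4-byte value per non-zero term. Both are simplifying assumptions that ignore HNSW graph links, payloads, and inverted-index overhead, so treat the result as a back-of-the-envelope estimate, not a measurement.

```python
def dense_bytes(dim: int, bytes_per_value: int = 4) -> int:
    """Raw size of one dense vector (float32 by default)."""
    return dim * bytes_per_value


def sparse_bytes(non_zero_terms: int, index_bytes: int = 4, value_bytes: int = 4) -> int:
    """Raw size of one sparse vector: one (index, value) pair per non-zero term."""
    return non_zero_terms * (index_bytes + value_bytes)


def hybrid_overhead(dim: int, non_zero_terms: int) -> float:
    """Ratio of dense+sparse storage to dense-only storage."""
    dense = dense_bytes(dim)
    return (dense + sparse_bytes(non_zero_terms)) / dense


if __name__ == "__main__":
    for nnz in (20, 100, 192):
        print(f"384-dim dense + {nnz} sparse terms: {hybrid_overhead(384, nnz):.2f}x")
```

Under these assumptions, the overhead reaches 2× only when a sparse vector is as large as the dense one (192 non-zero terms for a 384-dimensional float32 vector); with 20 non-zero terms it is about 1.1×.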

When to Use

Searching by SKUs or product numbers to find exact matches
Technical documentation with API names, error codes, and function signatures
Medical or legal search requiring both exact citations and semantic understanding

When Not to Use

Pure recommendation systems where semantic similarity is sufficient
Extremely low-latency requirements, since fusion adds overhead
Simple full-text search where traditional search engines are sufficient

Metrics Overview

Let us look at the results in more detail from a metrics standpoint, which gives a clearer picture of what hybrid search delivers.

Metric	Dense Only	Hybrid Search	Notes
MRR@10	0.60-0.70	0.95+	1.0 for Part ID queries, 0.9+ for semantic queries
Recall@10	0.65	0.90	Finds parts with IDs in their specs that dense search misses
Query Latency	30-35 ms	35-40 ms	About 5 ms of fusion overhead
False Positives	40-60%	<5%	The exact part's dense score was 0.000 in the Part ID search

MRR@10: Mean Reciprocal Rank, the average over all queries of 1/rank of the first correct result, counted as 0 when no correct result appears in the top 10

Recall@10: Fraction of all relevant results that appear in the top 10

Query Latency: Time from query submission to results

False Positives: Rate of incorrect results for exact-term searches
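If you want to compute MRR@10 and Recall@10 on your own query set, here is a minimal sketch. It assumes that, for each test query, you have the ranked list of result IDs your search returned and the set of IDs you judged relevant; the function names and the toy data are hypothetical.

```python
from typing import Dict, Hashable, List, Sequence, Set


def reciprocal_rank(ranked: Sequence[Hashable], relevant: Set[Hashable], k: int = 10) -> float:
    """1 / rank of the first relevant result within the top k, or 0.0 if none."""
    for rank, doc_id in enumerate(ranked[:k], start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0


def recall_at_k(ranked: Sequence[Hashable], relevant: Set[Hashable], k: int = 10) -> float:
    """Fraction of all relevant documents that appear in the top k."""
    if not relevant:
        return 0.0
    return len(set(ranked[:k]) & relevant) / len(relevant)


def evaluate(runs: List[Sequence[Hashable]], judgments: List[Set[Hashable]], k: int = 10) -> Dict[str, float]:
    """Average MRR@k and Recall@k over a set of queries."""
    if not runs:
        return {f"MRR@{k}": 0.0, f"Recall@{k}": 0.0}
    n = len(runs)
    mrr = sum(reciprocal_rank(r, j, k) for r, j in zip(runs, judgments)) / n
    recall = sum(recall_at_k(r, j, k) for r, j in zip(runs, judgments)) / n
    return {f"MRR@{k}": mrr, f"Recall@{k}": recall}


if __name__ == "__main__":
    # Toy data: query 1 finds its answer at rank 1; query 2 finds one of
    # its two relevant parts at rank 3.
    runs = [["BOS-0000240", "NXP-0000239"], ["AMP-0000211", "VAL-0000242", "NXP-0000239"]]
    judgments = [{"BOS-0000240"}, {"NXP-0000239", "BOS-0000240"}]
    print(evaluate(runs, judgments))
```

For the toy data, MRR@10 is (1 + 1/3) / 2 ≈ 0.667 and Recall@10 is (1 + 0.5) / 2 = 0.75.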

Conclusion

Do not start with hybrid search from the outset; introduce it when the situation demands it. As we have seen, hybrid search is particularly useful when queries mix semantic intent and exact terms that are difficult to separate cleanly.

It is always good practice to start simple and add complexity only when metrics justify it.

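The techniques described in this article are database-agnostic, though implementations vary. Qdrant provides native support for hybrid search (dense and sparse vectors in the same collection, fused server-side), whereas the hybrid_search.py skeleton above performs keyword scoring and fusion client-side, which keeps the mechanics easy to follow. Other databases may require workarounds or may not support hybrid search at all.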

In the next part of the series, we will look at Binary Quantization.

Opinions expressed by DZone contributors are their own.
