Build Multimodal RAG Apps With Amazon Bedrock and OpenSearch
Deploying a scalable Multimodal RAG application using Amazon Bedrock for embeddings and language models, and Amazon OpenSearch as a vector store.
Scenario
Customer support tickets with screenshots, technical documentation with diagrams, and a mountain of legacy PDFs — all containing valuable information, but impossible to query efficiently.
"There has to be a better way," I thought. That's when I dove headfirst into the world of multimodal retrieval-augmented generation (RAG).
The Multimodal Revelation
Like many developers, I had experimented with basic RAG systems that worked well enough for text. But our real-world data isn't just text. When a customer sends a screenshot of an error along with a description, or when our medical clients need to cross-reference radiology images with patient notes, text-only RAG falls short.
The revelation came when I realized we could generate embeddings for different types of data — text, images, and potentially audio — and use them together. Let me share what I learned by building a multimodal RAG system with AWS.
The AWS Services That Made It Possible
After several false starts with custom solutions, I landed on a combination that worked surprisingly well:
- Amazon Bedrock for both embeddings and LLMs
- Amazon OpenSearch as our vector database
- AWS Lambda and API Gateway for deployment
Why this stack? Honestly, I'd tried maintaining my own embedding models and it was a nightmare of GPU provisioning and version conflicts. Bedrock's fully managed approach meant I could focus on application logic instead of infrastructure.
How I Built It: The Architecture
Here's the approach that ultimately worked for us:
1. Data Ingestion Pipeline
The first challenge was converting our diverse data into embeddings. For text, the solution was straightforward:
import boto3
import json

# Bedrock runtime client used for both embeddings and text generation
bedrock = boto3.client(service_name='bedrock-runtime')

def get_text_embedding(text):
    # Titan Text Embeddings takes {"inputText": "..."} and returns a 1536-dimension vector
    response = bedrock.invoke_model(
        body=json.dumps({"inputText": text}),
        modelId="amazon.titan-embed-text-v1",
        accept="application/json",
        contentType="application/json"
    )
    return json.loads(response.get('body').read())['embedding']
For images, though, I hit a roadblock. After experimenting with several approaches, I deployed a CLIP model on SageMaker to embed images into CLIP's own shared text-image space, stored in a separate field from the Titan text embeddings. This was tricky to get right: the two models produce vectors of different sizes, so the dimensions had to be carefully managed. A sketch of the endpoint call is below.
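For reference, the image-embedding call ends up looking something like this, assuming a SageMaker endpoint (here called clip-image-encoder, an illustrative name) that accepts raw image bytes and returns a JSON body with a 512-float embedding; your endpoint name and payload contract will differ:

import boto3
import json

# SageMaker runtime client for invoking the hosted CLIP endpoint
sagemaker_runtime = boto3.client('sagemaker-runtime')

def get_image_embedding(image_bytes):
    # Assumes a custom endpoint that accepts raw image bytes and returns
    # {"embedding": [...512 floats...]} -- an illustrative contract, not a fixed API
    response = sagemaker_runtime.invoke_endpoint(
        EndpointName='clip-image-encoder',
        ContentType='application/x-image',
        Body=image_bytes
    )
    return json.loads(response['Body'].read())['embedding']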
2. Setting Up OpenSearch for Vector Search
The trickiest part was configuring OpenSearch correctly. After several failed attempts with incorrect dimensions, this index definition finally worked, with k-NN enabled at the index level, 1536 dimensions for the Titan text embeddings, and 512 for the CLIP image embeddings:
PUT /rag-index
{
  "settings": {
    "index": {
      "knn": true
    }
  },
  "mappings": {
    "properties": {
      "embedding": {
        "type": "knn_vector",
        "dimension": 1536
      },
      "text": {"type": "text"},
      "image_embedding": {
        "type": "knn_vector",
        "dimension": 512
      }
    }
  }
}
A word of caution: make sure your dimension values match your embedding models exactly, or you'll spend hours debugging cryptic errors as I did!
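A cheap safeguard is a quick sanity check that compares a freshly generated vector against the dimension declared in the mapping before you index anything; this snippet is just an illustration of that idea:

# Declared dimensions from the index mapping above
EXPECTED_DIMENSIONS = {"embedding": 1536, "image_embedding": 512}

def check_dimension(field_name, vector):
    # Fail fast instead of letting OpenSearch reject documents at index time
    expected = EXPECTED_DIMENSIONS[field_name]
    if len(vector) != expected:
        raise ValueError(
            f"{field_name}: got {len(vector)} dimensions, expected {expected}"
        )

check_dimension("embedding", get_text_embedding("smoke test"))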
3. The Retrieval Logic That Saved Our Project
The most elegant part of the solution was the retrieval function. Initially, I tried complicated hybrid approaches, but ended up with something simpler that worked better:
def retrieve_documents(query_text, query_image=None):
    # Embed the query text with Titan; image queries are handled in the sketch below
    query_embedding = get_text_embedding(query_text)
    knn_query = {
        "size": 5,
        "query": {
            "knn": {
                "embedding": {
                    "vector": query_embedding,
                    "k": 5
                }
            }
        }
    }
    # "opensearch" is an opensearch-py client connected to the domain
    results = opensearch.search(body=knn_query, index="rag-index")
    return results['hits']['hits']
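For queries that arrive with a screenshot, the natural extension is a second k-NN search against the image_embedding field, merged with the text results. Here is a rough sketch of that path, assuming the get_image_embedding helper from the SageMaker section; the score-based merge is simplified for illustration:

def retrieve_multimodal(query_text, query_image_bytes=None):
    hits = retrieve_documents(query_text)
    if query_image_bytes is not None:
        image_vector = get_image_embedding(query_image_bytes)
        image_query = {
            "size": 5,
            "query": {
                "knn": {
                    "image_embedding": {"vector": image_vector, "k": 5}
                }
            }
        }
        hits += opensearch.search(body=image_query, index="rag-index")['hits']['hits']
    # Deduplicate by document id, keeping the highest-scoring entry for each
    best = {}
    for hit in hits:
        doc_id = hit['_id']
        if doc_id not in best or hit['_score'] > best[doc_id]['_score']:
            best[doc_id] = hit
    return sorted(best.values(), key=lambda h: h['_score'], reverse=True)[:5]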
The real magic happened when combining this with Claude on Bedrock:
def generate_answer(query, context):
    # Claude v2's text-completion API expects the "\n\nHuman: ... \n\nAssistant:" prompt
    # format and requires max_tokens_to_sample in the request body
    prompt = f"\n\nHuman: Answer this query: {query} using the context:\n{context}\n\nAssistant:"
    response = bedrock.invoke_model(
        body=json.dumps({
            "prompt": prompt,
            "max_tokens_to_sample": 1024
        }),
        modelId="anthropic.claude-v2"
    )
    return json.loads(response.get('body').read())['completion']
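Putting the pieces together, a minimal end-to-end call might look like this, using the retrieve_multimodal sketch above (the sample question and the bare-bones context formatting are illustrative; in practice you would also pass source metadata with each chunk):

def answer_question(query_text, query_image_bytes=None):
    hits = retrieve_multimodal(query_text, query_image_bytes)
    # Concatenate the retrieved text chunks into a single context block
    context = "\n\n".join(hit['_source'].get('text', '') for hit in hits)
    return generate_answer(query_text, context)

print(answer_question("Why does the export job fail with a timeout error?"))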
Real-World Impact and Lessons Learned
When we deployed this to production, the results were immediate. Our customer support team, which previously struggled with "I can't describe this error, here's a screenshot" tickets, could now instantly retrieve similar past issues.
Some hard-earned lessons:
- Cost management is crucial. Embedding generation costs can add up quickly. We implemented a caching layer that reduced our API calls by 70% (see the sketch after this list).
- Start with text, then add images. Don't try to solve the multimodal problem all at once. Get the text working perfectly first.
- Latency matters. We initially put everything in Lambda, but for large embedding operations, we moved to dedicated EC2 instances with results cached in ElastiCache.
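The idea behind the caching layer is simple: hash the input text and look it up before calling Bedrock. A minimal in-process sketch of that idea, assuming the get_text_embedding function from earlier (in production the cache would live somewhere like ElastiCache rather than a local dict):

import hashlib

# In production this lives in ElastiCache; a dict keeps the sketch self-contained
_embedding_cache = {}

def get_text_embedding_cached(text):
    key = hashlib.sha256(text.encode('utf-8')).hexdigest()
    if key not in _embedding_cache:
        _embedding_cache[key] = get_text_embedding(text)
    return _embedding_cache[key]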
The Future of Our Multimodal RAG System
I'm most excited about extending this approach to video content. We're experimenting with extracting key frames and generating embeddings that can help retrieve relevant video segments.
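One simple way to prototype this is to sample frames at a fixed interval with OpenCV and push each frame through the same image-embedding path as screenshots. The sketch below shows uniform sampling rather than true key-frame detection, and the one-frame-per-second rate is purely illustrative:

import cv2  # opencv-python

def extract_frames(video_path, every_n_seconds=1):
    # Sample one frame every N seconds and return them as JPEG bytes
    capture = cv2.VideoCapture(video_path)
    fps = capture.get(cv2.CAP_PROP_FPS) or 30
    frames, index = [], 0
    while True:
        success, frame = capture.read()
        if not success:
            break
        if index % int(fps * every_n_seconds) == 0:
            ok, buffer = cv2.imencode('.jpg', frame)
            if ok:
                frames.append(buffer.tobytes())
        index += 1
    capture.release()
    return frames

# Each frame can then be embedded with get_image_embedding() and indexed
# alongside a timestamp so retrieval can point back to the video segment.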
We're also looking at adding speech transcription to bring audio into our multimodal mix — imagine being able to search through customer calls alongside documentation and screenshots.
Try It Yourself
If you're facing similar challenges with disconnected data sources, I encourage you to experiment with multimodal RAG. Start small, perhaps with just text and a few test images. The AWS stack makes iteration relatively painless.