Building a Customer Intelligence AI Agent With OpenSearch and LLMs

Imagine being able to ask a question about a customer and receive a grounded, natural language answer combining all that data.

By Haymang Ahuja · Dec. 02, 25 · Tutorial

The Problem

You have three types of customer data:

Types of customer data


You want to support questions like:

  • “What are the top business goals for Acme Corp?”
  • “How well did we capture their current pain points?”
  • “What are the growth strategies for Globex?”
  • “What is the ratio of business we have from a customer compared to competitors?”

Architecture Overview

Here’s the high-level structure of our AI agent:

Architecture overview


Understanding the Building Blocks

1. What Is BM25?

BM25 is a ranking function used by search engines such as OpenSearch to score documents based on:

  • Term frequency: How often a query term appears in the document.
  • Inverse document frequency: How rare that term is across all documents.
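  • Document length normalization: Long documents are not favored just because they contain more words.

Think of it as a smarter version of keyword matching: it does not capture meaning the way an embedding model does, but it is very fast and works well for exact matches.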
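
To make the three factors concrete, here is a minimal sketch of a textbook BM25 scorer in plain Python. It is for illustration only: OpenSearch computes BM25 internally through Lucene, whose exact formula differs slightly, and the function name `bm25_scores`, the whitespace tokenizer, and the sample documents are assumptions. The defaults `k1=1.2` and `b=0.75` are the commonly used values.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score each document in `docs` against `query` with a textbook BM25 formula."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(tokens) for tokens in tokenized) / n_docs
    # Document frequency: how many documents contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))

    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            # Inverse document frequency: rare terms weigh more.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term frequency saturates (k1) and is normalized by document length (b).
            norm = tf[term] + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


docs = [
    "acme corp wants predictive maintenance",
    "globex growth strategies for next year",
    "acme corp growth goals",
]
scores = bm25_scores("predictive maintenance", docs)
best = max(range(len(docs)), key=scores.__getitem__)
print(docs[best])
```

Documents that contain none of the query terms score zero, which is why BM25 alone misses paraphrases and why the vector search described next is a useful complement.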

2. What Is Vector Search?

Vector search enables semantic retrieval: instead of matching exact words, it matches based on meaning.

The workflow is:

  • Convert text into vectors (lists of floats) using an embedding model.
  • Store those vectors in OpenSearch.
  • At query time, embed the user’s query and find the nearest vectors (the most similar meanings).

Example embedding with sentence-transformers:

Python
 
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")
vector = model.encode("Acme Corp wants predictive maintenance")
print(len(vector))  # e.g. 384 dimensions


Popular models for embedding: all-MiniLM-L6-v2, bge-small-en, or OpenAI’s text-embedding-3-small.
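
What "nearest vectors" means can be sketched with a brute-force cosine similarity search. This is an illustration under assumptions: the three-dimensional vectors are made up, the helper names `cosine_similarity` and `nearest` are hypothetical, and a real k-NN index uses approximate search structures and whichever distance measure the index is configured with, rather than scanning every vector.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest(query_vec, stored, k=2):
    """Return the k (name, similarity) pairs most similar to query_vec."""
    ranked = sorted(
        ((name, cosine_similarity(query_vec, vec)) for name, vec in stored.items()),
        key=lambda pair: pair[1],
        reverse=True,
    )
    return ranked[:k]


# Toy 3-dimensional "embeddings"; real ones have hundreds of dimensions.
stored = {
    "predictive maintenance": [0.9, 0.1, 0.0],
    "reduce downtime": [0.8, 0.3, 0.1],
    "quarterly revenue": [0.0, 0.2, 0.9],
}
top = nearest([1.0, 0.2, 0.0], stored)
print([name for name, _ in top])
```

The two maintenance-related entries rank above the revenue one because their vectors point in nearly the same direction as the query, even though no words are compared.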

Step-by-Step: Indexing the Data

Before you can search, you need to index your data into OpenSearch.

Indexing the data

What Each Component Does

Implementation stack


Create an Index With BM25 + Vector

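We define both text and embedding fields in OpenSearch. The embedding field uses the knn_vector type, which requires the k-NN plugin. Note that dense_vector is an Elasticsearch field type and is not available in OpenSearch, so it cannot serve as a fallback.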

Python
 
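from opensearchpy import OpenSearch

# Local development settings only: default credentials, no certificate verification.
client = OpenSearch(
    "https://localhost:9200",
    http_auth=("admin", "admin"),
    verify_certs=False,
)

mapping = {
    "settings": {"index": {"knn": True}},  # enable k-NN search on this index
    "mappings": {
        "properties": {
            "customer_id": {"type": "keyword"},
            "section_id": {"type": "keyword"},
            "section_text": {"type": "text"},  # scored with BM25
            "section_score": {"type": "float"},
            "insights": {"type": "nested"},
            "embedding": {"type": "knn_vector", "dimension": 384},  # match your embedding model
        }
    },
}

# Create the index only if it does not exist yet, so the script can be re-run.
if not client.indices.exists(index="customers"):
    client.indices.create(index="customers", body=mapping)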


Chunking and Indexing Customer Docs

We split documents into sections, embed them, and store insights + quality scores.

Python
 
from sentence_transformers import SentenceTransformer
from opensearchpy import helpers
import uuid

model = SentenceTransformer("all-MiniLM-L6-v2")


def chunk_text(text, max_words=300, overlap=50):
    """Yield windows of up to max_words words; consecutive windows share `overlap` words."""
    words = text.split()
    for i in range(0, len(words), max_words - overlap):
        yield " ".join(words[i:i + max_words])


def index_customer_doc(customer_id, doc_text, insights, section_scores):
    """Chunk, embed, and bulk-index one customer document.

    section_scores maps chunk position -> quality score; missing entries default to 0.5.
    `client` is the OpenSearch client created in the previous step.
    """
    chunks = list(chunk_text(doc_text))
    if not chunks:
        return  # nothing to index for an empty document
    embeddings = model.encode(chunks, convert_to_numpy=True)
    actions = []
    for i, (chunk, emb) in enumerate(zip(chunks, embeddings)):
        section_id = str(uuid.uuid4())
        actions.append({
            "_index": "customers",
            "_id": f"{customer_id}_{section_id}",
            "_source": {
                "customer_id": customer_id,
                "section_id": section_id,
                "section_text": chunk,
                "insights": insights,
                "section_score": section_scores.get(i, 0.5),
                "embedding": emb.tolist(),  # NumPy array -> JSON-serializable list
            }
        })
    helpers.bulk(client, actions)
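
The sliding-window chunking is easier to follow on a tiny input. The sketch below is a hypothetical variant named `chunk_words` (not the function above) that adds two assumptions of its own: it rejects an overlap that is not smaller than the window, and it stops once a window reaches the end of the text, because with a step of max_words - overlap the final window can otherwise be a fragment already contained in the previous one.

```python
def chunk_words(text, max_words=300, overlap=50):
    """Yield overlapping word windows, stopping once the end of the text is covered."""
    if overlap >= max_words:
        raise ValueError("overlap must be smaller than max_words")
    words = text.split()
    step = max_words - overlap
    for start in range(0, len(words), step):
        yield " ".join(words[start:start + max_words])
        # This window already reached the last word, so a further one would be redundant.
        if start + max_words >= len(words):
            break


text = " ".join(f"w{i}" for i in range(10))
chunks = list(chunk_words(text, max_words=4, overlap=1))
for chunk in chunks:
    print(chunk)
```

Each window starts with the last word of the previous one, so a sentence that straddles a boundary still appears intact in at least one chunk when the overlap is large enough.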


Hybrid Retrieval

At query time, we combine:

  • BM25 -> exact matches
  • Vector similarity -> semantic matches
  • Section score -> quality signal

Python
 
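def bm25_search(customer_id, query, size=5):
    """Keyword search over one customer's sections, scored with BM25."""
    body = {
        "query": {
            "bool": {
                # filter restricts results to the customer without affecting the score
                "filter": [{"term": {"customer_id": customer_id}}],
                "must": [{"match": {"section_text": query}}],
            }
        },
        "size": size,
    }
    return client.search(index="customers", body=body)["hits"]["hits"]


def vector_search(customer_id, query_vector, k=5):
    """Semantic search restricted to one customer's sections."""
    body = {
        "size": k,
        "query": {
            "bool": {
                # Applied after the k-NN search, so fewer than k hits may come back.
                "filter": [{"term": {"customer_id": customer_id}}],
                "must": [{"knn": {"embedding": {"vector": query_vector, "k": k}}}],
            }
        },
    }
    return client.search(index="customers", body=body)["hits"]["hits"]


def combined_retrieval(customer_id, query):
    q_vec = model.encode([query])[0].tolist()
    bm25_hits = bm25_search(customer_id, query)
    vec_hits = vector_search(customer_id, q_vec)
    # Merge and de-duplicate by section_id. Order follows first appearance
    # (BM25 hits first); no re-ranking is applied here.
    results = {}
    for hit in bm25_hits + vec_hits:
        sid = hit["_source"]["section_id"]
        results[sid] = hit["_source"]
    return list(results.values())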
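
The merge step above de-duplicates hits but does not yet use the three signals to order them. One possible way to do that, sketched here as an assumption rather than as the article's method, is a weighted sum of normalized scores. The function name `fuse_hits`, the default weights, and the divide-by-top-score normalization are all illustrative choices; the hit dictionaries follow the `_score` / `_source` shape that OpenSearch returns.

```python
def fuse_hits(bm25_hits, vec_hits, w_bm25=0.4, w_vec=0.4, w_quality=0.2):
    """Rank sections by a weighted sum of normalized BM25, vector, and quality scores."""

    def normalized(hits):
        # Divide by the top score of the list so both result sets land in [0, 1].
        top = max((h["_score"] for h in hits), default=0.0)
        return {
            h["_source"]["section_id"]: (h["_score"] / top if top > 0 else 0.0)
            for h in hits
        }

    bm25_norm = normalized(bm25_hits)
    vec_norm = normalized(vec_hits)
    sources = {h["_source"]["section_id"]: h["_source"] for h in bm25_hits + vec_hits}

    ranked = []
    for sid, source in sources.items():
        score = (
            w_bm25 * bm25_norm.get(sid, 0.0)
            + w_vec * vec_norm.get(sid, 0.0)
            + w_quality * source.get("section_score", 0.0)
        )
        ranked.append((score, source))
    ranked.sort(key=lambda pair: pair[0], reverse=True)
    return [source for _, source in ranked]


bm25_hits = [
    {"_score": 4.0, "_source": {"section_id": "a", "section_score": 0.5}},
    {"_score": 2.0, "_source": {"section_id": "b", "section_score": 0.9}},
]
vec_hits = [
    {"_score": 0.8, "_source": {"section_id": "b", "section_score": 0.9}},
    {"_score": 0.4, "_source": {"section_id": "c", "section_score": 0.1}},
]
ranked = fuse_hits(bm25_hits, vec_hits)
print([s["section_id"] for s in ranked])
```

Section "b" comes first in this toy input because it appears in both result lists and has a high quality score, which is the behavior a hybrid ranker is meant to reward.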


Prompting the LLM

Once we have retrieved insights and snippets, we pass them into an LLM with a grounded prompt.

Python
 
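from openai import OpenAI

# The client reads the API key from the OPENAI_API_KEY environment variable,
# which avoids hard-coding a secret in source code.
llm = OpenAI()


def build_prompt(customer_id, insights, snippets, question):
    """Build a grounded prompt; insights is a list of {"key": ..., "value": ...} dicts."""
    insight_lines = "\n".join(f"- {i['key']}: {i['value']}" for i in insights)
    # Truncate each snippet to 200 characters to keep the prompt short.
    snippet_lines = "\n".join(s["section_text"][:200] for s in snippets)
    return f"""
Customer: {customer_id}
Key Insights:
{insight_lines}
Document Snippets:
{snippet_lines}
User Question: {question}
Answer using only the information above.
"""


def answer_question(prompt):
    # Chat completions call in the style of the openai>=1.0 Python SDK.
    resp = llm.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": "Answer strictly using the provided context."},
            {"role": "user", "content": prompt},
        ],
    )
    return resp.choices[0].message.content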


Example Prompt Sent to the LLM

Plain Text
 
Customer: Acme Corp

Key Insights:
- Goal: Reduce manufacturing downtime by 20%
- Initiative: Implement AI-based predictive maintenance

Document Snippets:
- "Acme Corp is exploring AI to improve their manufacturing lines..."

Section Scoring:
- Business Goals section: Score 0.87
- Implementation Plan: Score 0.65

User Question: What are Acme Corp's business goals?
Answer using only the information above.
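
The example prompt includes a Section Scoring block. A prompt builder that emits that layout could look like the sketch below. The function name `format_prompt` and the shape of `section_scores` (a mapping from section name to score) are assumptions made for illustration.

```python
def format_prompt(customer_id, insights, snippets, section_scores, question):
    """Assemble a grounded prompt that also lists per-section quality scores."""
    insight_lines = [f"- {i['key']}: {i['value']}" for i in insights]
    # Quote each snippet and cap it at 200 characters.
    snippet_lines = ['- "' + s["section_text"][:200] + '"' for s in snippets]
    score_lines = [f"- {name}: Score {score:.2f}" for name, score in section_scores.items()]
    return "\n".join([
        f"Customer: {customer_id}",
        "",
        "Key Insights:",
        *insight_lines,
        "",
        "Document Snippets:",
        *snippet_lines,
        "",
        "Section Scoring:",
        *score_lines,
        "",
        f"User Question: {question}",
        "Answer using only the information above.",
    ])


prompt = format_prompt(
    "Acme Corp",
    [{"key": "Goal", "value": "Reduce manufacturing downtime by 20%"}],
    [{"section_text": "Acme Corp is exploring AI to improve their manufacturing lines..."}],
    {"Business Goals section": 0.87, "Implementation Plan": 0.65},
    "What are Acme Corp's business goals?",
)
print(prompt)
```

Passing the scores to the model lets it qualify answers that rest on weakly documented sections, though whether it does so reliably depends on the model and the instructions.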


REST API With FastAPI

Finally, we expose this pipeline as a REST API.

Python
 
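from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()


class AskRequest(BaseModel):
    customer_id: str
    question: str


@app.post("/ask")
def ask(req: AskRequest):
    snippets = combined_retrieval(req.customer_id, req.question)
    if not snippets:
        # Nothing retrieved: skip the LLM call instead of answering without context.
        return {"answer": "No matching content found for this customer.", "snippets": []}
    # Every section carries its document's insights; take them from the first hit.
    insights = snippets[0].get("insights", [])
    prompt = build_prompt(req.customer_id, insights, snippets, req.question)
    answer = answer_question(prompt)
    return {"answer": answer, "snippets": snippets}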


Shell
 
# Run it with:
uvicorn app:app --reload --port 8000
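
Once the server is running, the endpoint can be called from any HTTP client. The following is a minimal sketch using only the Python standard library. The helper names `build_ask_request` and `ask_agent` are hypothetical, and the base URL assumes the local uvicorn command above.

```python
import json
import urllib.request


def build_ask_request(customer_id, question, base_url="http://localhost:8000"):
    """Build a POST request for the /ask endpoint with a JSON body."""
    payload = json.dumps({"customer_id": customer_id, "question": question}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/ask",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask_agent(customer_id, question):
    """Send the question and return the parsed JSON response (server must be running)."""
    with urllib.request.urlopen(build_ask_request(customer_id, question)) as resp:
        return json.loads(resp.read())


req = build_ask_request("acme", "What are the top business goals?")
print(req.get_method(), req.full_url)
```

Calling `ask_agent("acme", "What are the top business goals?")` would return a dictionary with the `answer` and `snippets` keys produced by the endpoint.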


Real-World Use Case: Enterprise Sales

Sales teams often have:

  • Long narrative notes about a client.
  • Insights from market research.
  • Internal QA scoring of document completeness.

With this setup, you can:

  • Ask, “What are the growth opportunities and expansion strategies for a customer?”
  • Spot where the documentation is weak.
  • Provide auto-generated summaries before meetings.

It turns a messy database into a searchable intelligence layer.

Final Thoughts

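This hybrid retrieval-augmented generation (RAG) setup is a powerful design pattern for enterprise AI. It:

  • Combines old-school search (BM25) with modern embeddings.
  • Uses structured data (insights and quality scores) alongside unstructured text.
  • Gives explainable, LLM-powered answers, since each response is returned with the snippets it was grounded in.
  • Is built mostly on open tooling (OpenSearch, sentence-transformers, FastAPI); the hosted LLM is the exception.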

Want to Try It?

I’m happy to share starter code or help you build your own version. Drop a comment below.

Coming Next

  • Multi-turn conversational agents: Build a dialogue layer on top of this pipeline so users can ask follow-up questions without losing context. This means storing conversation state, retrieving past snippets, and incrementally updating the prompt.
  • Auto-summarization pipelines: Use OpenSearch ingest pipelines (or async workers) to automatically chunk, embed, and summarize new customer documents as soon as they’re ingested. Think of it as a “continuous ETL” flow for customer intelligence.
  • Feedback-driven ranking: Instead of full ML fine-tuning, start with a feedback loop: log which answers users accept/reject, and adjust retrieval weights (bm25, vector, section_score) accordingly. Over time this gives you a simple but effective learning-to-rank system without diving into model training.
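
As a rough illustration of the feedback-driven idea, the sketch below nudges the three retrieval weights after each accepted or rejected answer. It is a simple heuristic written for this example, not a tested learning-to-rank method: the function name `update_weights`, the learning rate, and the shape of `signal_scores` (each signal's normalized contribution to the answer's top snippet) are all assumptions.

```python
def update_weights(weights, signal_scores, accepted, lr=0.1):
    """Move weights toward signals behind accepted answers, away from rejected ones."""
    direction = 1.0 if accepted else -1.0
    updated = {
        name: max(0.0, w + direction * lr * signal_scores.get(name, 0.0))
        for name, w in weights.items()
    }
    total = sum(updated.values())
    if total == 0:
        return dict(weights)  # avoid dividing by zero; keep the previous weights
    # Renormalize so the weights keep summing to 1.
    return {name: w / total for name, w in updated.items()}


weights = {"bm25": 0.4, "vector": 0.4, "section_score": 0.2}
# An accepted answer whose top snippet was found mostly through vector similarity.
feedback = {"bm25": 0.2, "vector": 0.9, "section_score": 0.5}
new_weights = update_weights(weights, feedback, accepted=True)
print(new_weights)
```

After this single update the vector weight exceeds the BM25 weight, reflecting that the accepted answer relied more on semantic matches.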