Keyword vs Semantic Search With AI

Learn how to build keyword-based and semantic search in MariaDB using Python, LangChain, FastAPI, and AI embeddings.

By Alejandro Duarte · Oct. 30, 25 · Analysis

When building search for an application, you typically face two broad approaches:

  • Traditional keyword-based search — match words exactly or with simple variants.
  • Semantic (or vector) search — match meaning or context using AI embeddings.

A hybrid approach also exists, but I'll leave it for a future article. In this post, I'll walk through how the two approaches work in Python using MariaDB and an AI embedding model, highlight where they differ, and show code that you can adapt.

The Key Components

For this example, I used MariaDB Cloud to spin up a free serverless database, which was ready within seconds. I grabbed the host/user/password details, connected with VS Code, created a database called demo, created a products table, and loaded ~500 rows of product names via LOAD DATA LOCAL INFILE. This is an extremely small dataset, but it's enough for learning and experimentation.

Then I built a small Python + FastAPI app. First, I implemented a simple keyword search (by product name) endpoint using a full-text index, then I implemented a semantic (vector) search using AI-generated vector embeddings + MariaDB’s vector support. You can see the whole process in this video.

Keyword-Based Search: Simple and Familiar

For keyword search, I used a full-text index on the name column of the products table. With this index in place, I could search by product name using this SQL query:

SQL
 
SELECT name
FROM products
-- DESC lists the rows with the highest relevance score first
ORDER BY MATCH(name) AGAINST(?) DESC


I exposed this functionality using a FastAPI endpoint as follows:

Python
 
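@app.get("/products/text-search")
def text_search(query: str):
    cursor = connection.cursor()
    # DESC puts the highest full-text relevance scores first
    cursor.execute(
        "SELECT name FROM products ORDER BY MATCH(name) AGAINST(?) DESC LIMIT 10;",
        (query,),
    )
    # Each row is a one-element tuple; return just the product names
    return [name for (name,) in cursor]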


Pros:

  • Runs fast.
  • Works well when users type exact or close terms.
  • Uses built-in SQL features (no external AI model needed).

Cons:

  • Misses synonyms, context, or related meaning.
  • Doesn't understand intent: if a user types "running shoes", a strict keyword search may miss "jogging trainers" or "sneakers".
  • Quality depends heavily on the wording.

In my demo, the endpoint returned several products that were not relevant to “running shoes”.
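
To make the synonym problem concrete, here is a minimal sketch in plain Python. It does not reproduce how MariaDB's full-text index works internally; it only mimics literal term matching. The product names and the keyword_score helper are made up for illustration and are not part of the demo.

```python
def keyword_score(query: str, text: str) -> int:
    """Count how many query words appear literally in the text."""
    text_words = set(text.lower().split())
    return sum(1 for word in query.lower().split() if word in text_words)


# Hypothetical product names, not taken from the demo dataset
products = [
    "Trail running shoes",
    "Lightweight jogging trainers",
    "Leather dress shoes",
    "Running socks",
]

query = "running shoes"
# Highest literal-overlap score first
ranked = sorted(products, key=lambda name: keyword_score(query, name), reverse=True)
for name in ranked:
    print(keyword_score(query, name), name)
```

In this toy ranking, the jogging trainers score 0 and land last, behind dress shoes and socks, even though they are arguably the closest match to what the user wants.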

Semantic (Vector) Search: Matching Meaning

To go beyond keywords, I implemented a second endpoint:

  1. Use an AI embedding model (Google Generative AI via LangChain) to convert each product name into a high-dimensional vector.
  2. Store those vectors in MariaDB with the vector integration for LangChain.
  3. At query time, embed the user's search phrase into a vector with the same embedding model used in step 1, then run a similarity search with MariaDB's HNSW-based vector index (e.g., top 10 nearest vectors) and return the corresponding products.
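
The similarity search in step 3 can be sketched without a database. The example below ranks toy 3-dimensional vectors by cosine similarity using a brute-force scan. Treat it as an illustration under several assumptions: real embeddings come from the model and have far more dimensions, the vectors here are invented by hand, and MariaDB uses an HNSW index instead of comparing the query against every row. The names cosine_similarity and top_k are hypothetical helpers.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec: list[float], store: dict[str, list[float]], k: int = 10) -> list[str]:
    """Brute-force nearest neighbors: score every stored vector, keep the best k."""
    scored = sorted(
        store.items(),
        key=lambda item: cosine_similarity(query_vec, item[1]),
        reverse=True,
    )
    return [name for name, _ in scored[:k]]


# Hand-made vectors (hypothetical); dimensions loosely mean: footwear, sport, formal
store = {
    "Lightweight jogging trainers": [0.9, 0.8, 0.0],
    "Leather dress shoes": [0.9, 0.0, 0.9],
    "Espresso machine": [0.0, 0.1, 0.3],
}
query_vec = [0.8, 0.9, 0.1]  # stands in for the embedding of "running shoes"

print(top_k(query_vec, store, k=2))
```

The trainers rank first even though their name shares no word with the query, because the ranking depends on vector direction and not on literal terms.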

Here’s how I implemented the ingestion endpoint:

Python
 
@app.post("/products/ingest")
def ingest_products():
    cursor = connection.cursor()
    cursor.execute("SELECT name FROM products;")
    vector_store.add_texts([name for (name,) in cursor])
    return "Products ingested successfully"


And this is the semantic search endpoint:

Python
 
@app.get("/products/semantic-search")
def search_products(query: str):
    results = vector_store.similarity_search(query, k=10)
    return [doc.page_content for doc in results]


The LangChain integration for MariaDB makes the whole process extremely easy. The integration creates two tables:

  • langchain_collection: Each row represents a related set of vector embeddings. I have only one in this demo, which corresponds to the product names.
  • langchain_embedding: The vector embeddings. Each vector belongs to a collection (many-to-one to langchain_collection).

When I ran the semantic search endpoint with the same query “running shoes”, the results felt much more relevant: they included products that didn’t match “running” or “shoes” literally but were semantically close.

Keyword vs. Semantic: When to Use Which

Here’s a quick comparison:

| Approach | Pros | Cons |
| --- | --- | --- |
| Keyword search | Quick to set up, uses SQL directly | Limited to literal term matching, less clever |
| Semantic search | Matches meaning and context, more flexible | Requires embedding model + vector support |


Pick keyword search when:

  • Your search domain is small and predictable, or you need exact keyword matches.
  • Users know exactly what they’re looking for (specific codes, exact names).
  • You want minimal dependencies and complexity.

Pick semantic search when:

  • You need to handle synonyms, similar concepts, and user intent.
  • The dataset or domain has natural language variation.
  • You’re willing to integrate an embedding model and manage vector storage/indexing. MariaDB helps with this.

In many real-world apps, you'll combine the two: start with keyword search and fall back to semantic search for higher-value queries or when an exact match fails. Or merge the results of both through hybrid search. MariaDB helps with this, too.
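
The fallback idea can be sketched as a small function. This is one possible arrangement, not the approach used in the demo: keyword_search and semantic_search are hypothetical callables standing in for the two endpoints, and min_results is an arbitrary threshold. The sketch also assumes the keyword search returns only rows that actually match the query, which would require filtering on the MATCH score.

```python
from typing import Callable

SearchFn = Callable[[str], list[str]]


def search_with_fallback(
    query: str,
    keyword_search: SearchFn,
    semantic_search: SearchFn,
    min_results: int = 3,
) -> list[str]:
    """Try keyword search first; top up with semantic results if too few match."""
    results = list(keyword_search(query))
    if len(results) >= min_results:
        return results
    # Too few literal matches: append semantic results, skipping duplicates
    seen = set(results)
    for name in semantic_search(query):
        if name not in seen:
            results.append(name)
            seen.add(name)
    return results


# Stub search functions (hypothetical) so the sketch runs on its own
def fake_keyword(query: str) -> list[str]:
    return ["Trail running shoes"] if "running" in query else []


def fake_semantic(query: str) -> list[str]:
    return ["Trail running shoes", "Lightweight jogging trainers"]


print(search_with_fallback("running shoes", fake_keyword, fake_semantic))
```

Keeping the keyword hits first preserves exact matches at the top, while the semantic results only fill the gap. A real hybrid search would usually merge both rankings instead.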

How Simple the Integration Can Be

In my demo, I triggered vector ingestion via a POST endpoint (/products/ingest). It reads all product names, computes embeddings, and writes them to MariaDB. One line of code (via the LangChain + MariaDB integration) handled the insertion of ~500 rows of vectors.

Once vectors are stored, adding a semantic search endpoint is just a few lines of code. The MariaDB vector support hides most of the complexity.

The Source Code

You can find the code on GitHub. I have one simple, easy-to-follow program in webinar-main.py and another, more elaborate one with good practices in backend.py. Feel free to clone the repository, modify it, experiment with your own datasets, and let us know if there’s anything you’d like to see in the LangChain integration for MariaDB.



Published at DZone with permission of Alejandro Duarte. See the original article here.
