
When Search Started Breaking at Scale: How We Chose the Right Search Engine

Learn about how we scaled our search system, evaluated Solr and Elasticsearch, and redesigned the architecture for better performance and reliability.

By sunil paidi · May. 12, 26 · Analysis

When we first built our search system, everything worked fine.

The data size was manageable, search responses were fast, and updates were happening as expected. Like many teams, we assumed that once a search engine is set up, it will continue to work as the system grows.

But that didn’t last long.

As traffic increased and data volume grew, we started seeing issues that were hard to ignore. Search became slower, updates were delayed, and maintaining the system required more effort.

At that point, it was no longer just a technical choice — it became a decision that directly impacted user experience and system reliability.

In this article, I want to share how we approached this problem and how we evaluated the right search engine when our system started breaking at scale.

The Problem We Faced

As the system grew, several problems became hard to miss:

  • Search responses became slower during peak traffic
  • Newly updated data did not show up in search results immediately
  • Indexing pipelines started lagging behind
  • Scaling the search cluster required manual effort and tuning

These were not small issues. Users expect fast and accurate results, and delays started affecting their experience.

This forced us to step back and ask: should we keep scaling our current setup, or is it time to move to a different search engine?

Looking closer, the issue was not only the search engine. Our architecture had not been designed for scale.

The Decision Moment

Eventually, keeping the existing setup running required more effort than we had planned for.

We had to decide whether to keep optimizing the current system or invest time in redesigning the search architecture.

This was not just a technical decision — it was about choosing the right long-term direction.

What We Needed to Solve

We were not just looking for a better tool.

We needed a system that could:

  • Handle increasing data without performance drops
  • Support near real-time indexing
  • Deliver low-latency search responses
  • Reduce operational overhead
  • Be ready for future improvements like AI-based search

This changed how we evaluated different options.

Our Existing Search Architecture

To understand where the problems were coming from, here’s a simplified view of how our system was structured:

                ┌───────────────────────┐
                │   Source Systems      │
                │  DB / APIs / Events   │
                └──────────┬────────────┘
                           │
                           ▼
                ┌───────────────────────┐
                │ Event / Update Layer  │
                │  CDC / Queue / Stream │
                └──────────┬────────────┘
                           │
                           ▼
                ┌───────────────────────┐
                │   Indexing Service    │
                │ Transform + Enrich    │
                └──────────┬────────────┘
                           │
                           ▼
                ┌───────────────────────┐
                │     Solr Cluster      │
                │   (Primary Search)    │
                └──────────┬────────────┘
                           │
                           ▼
                ┌───────────────────────┐
                │   Search API Layer    │
                │ Query + Ranking       │
                └──────────┬────────────┘
                           │
                           ▼
                ┌───────────────────────┐
                │   Users / Frontend    │
                └───────────────────────┘


This setup worked well initially, but as scale increased, indexing delays and query latency became bottlenecks.

How We Evaluated Our Options

Instead of just comparing features, we focused on real production needs.

1. Data Scale

At smaller scale, most systems work well. But at larger scale, architecture matters.

2. Real-Time Indexing

Delays in indexing meant users were seeing outdated data.
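One way to make "outdated data" measurable is to track indexing lag: the time between a change in the source system and the moment that change becomes searchable. The following is a minimal sketch, not the measurement we ran in production. The function names and the freshness threshold are assumptions made for illustration.

```python
from datetime import datetime


def indexing_lag_seconds(updated_at: datetime, searchable_at: datetime) -> float:
    """Seconds between a source update and the moment it became searchable."""
    return (searchable_at - updated_at).total_seconds()


def summarize_lag(samples, threshold_seconds):
    """Summarize (updated_at, searchable_at) pairs against a freshness target.

    `samples` is a non-empty list of timestamp pairs. How the
    `searchable_at` timestamp is obtained (for example, by polling the
    index for a known document) is left to the caller.
    """
    lags = [indexing_lag_seconds(u, s) for u, s in samples]
    stale = [lag for lag in lags if lag > threshold_seconds]
    return {"max_lag": max(lags), "stale_count": len(stale), "total": len(lags)}
```

Tracking the maximum lag and the count of updates that miss the target, rather than only the average, shows whether the pipeline falls behind during traffic peaks.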

3. Query Performance

Users expect results instantly, especially under heavy traffic.
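Average latency tends to hide the slow requests that users notice under load, so tail percentiles are a common way to evaluate this criterion. Here is a small sketch using the nearest-rank method. The sample values are made up for illustration and are not measurements from our system.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile of a non-empty list of latencies."""
    if not samples:
        raise ValueError("samples must not be empty")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in the range (0, 100]")
    ordered = sorted(samples)
    # Nearest-rank: the smallest value with at least pct% of samples at or below it.
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[rank - 1]


# Hypothetical query latencies in milliseconds, with one slow outlier.
latencies_ms = [12, 15, 11, 14, 250, 13, 16, 12, 14, 13]
mean_ms = sum(latencies_ms) / len(latencies_ms)
p50_ms = percentile(latencies_ms, 50)
p95_ms = percentile(latencies_ms, 95)
```

In this made-up sample the median stays low while the 95th percentile is dominated by the single slow request, which is the pattern worth watching for during peak traffic.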

4. Operational Complexity

Managing clusters and tuning performance required significant effort.

5. Cost Beyond Infrastructure

We considered engineering effort and maintenance, not just infrastructure cost.

6. Future Readiness

We evaluated support for AI search, vector search, and ML integration.

Comparing the Options

Feature           | Solr        | Elasticsearch | OpenSearch | Cloud Search
Setup             | Complex     | Easier        | Easier     | Very easy
Scaling           | Good        | Very good     | Very good  | Managed
Real-time updates | Good        | Very good     | Very good  | Excellent
Maintenance       | High        | Medium        | Medium     | Low
Cost              | Lower infra | Medium        | Medium     | Higher
AI support        | Limited     | Good          | Good       | Strong


At scale, the real difference between these systems is not features — it’s how much operational effort they require and how consistently they perform under load.

Understanding the Trade-Offs

Each option comes with trade-offs.

More flexible systems provide control but require more tuning. Managed solutions reduce operational effort but may increase cost.

There is no perfect choice — only the right choice for your system.
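To see why the "right choice" depends on the system, it can help to turn the trade-offs into a weighted score. The sketch below is an illustration only: the criteria names, the 1 to 5 scale, and any scores fed into it are hypothetical inputs, not ratings of real products.

```python
def weighted_score(scores, weights):
    """Weighted average of per-criterion scores (higher is better)."""
    missing = set(weights) - set(scores)
    if missing:
        raise ValueError(f"missing scores for: {sorted(missing)}")
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(scores[name] * w for name, w in weights.items()) / total_weight


def rank_options(options, weights):
    """Return option names ordered from best to worst for the given weights."""
    return sorted(
        options,
        key=lambda name: weighted_score(options[name], weights),
        reverse=True,
    )
```

With the same hypothetical scores, a team that weights operational effort heavily and a team that weights cost heavily can end up with opposite rankings, which is the point of the exercise.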

What Our Future Architecture Looked Like

We realized that fixing individual components wouldn’t solve the problem — we needed to rethink the architecture itself.

After evaluating different options, we moved toward a more scalable and flexible architecture.

                ┌───────────────────────┐
                │   Source Systems      │
                │  DB / APIs / Events   │
                └──────────┬────────────┘
                           │
                           ▼
                ┌───────────────────────┐
                │ Event Streaming Layer │
                │ Kafka / Queue / CDC   │
                └──────────┬────────────┘
                           │
                           ▼
                ┌───────────────────────┐
                │   Indexing Service    │
                │ Async + Scalable      │
                └──────────┬────────────┘
                           │
                           ▼
                ┌────────────────────────────┐
                │ Distributed / Managed      │
                │ Search Engine              │
                │ (Elastic / Cloud Search)   │
                └──────────┬─────────────────┘
                           │
                           ▼
                ┌───────────────────────┐
                │   Search API Layer    │
                │ Caching + Ranking     │
                └──────────┬────────────┘
                           │
                           ▼
                ┌───────────────────────┐
                │   Users / Frontend    │
                └───────────────────────┘
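The "Caching + Ranking" box in the Search API layer can be pictured as a short-lived cache in front of the search engine. This is a minimal sketch under stated assumptions: `TTLCache`, `normalize_query`, and the `compute` callback are hypothetical names, and a production cache would also need size limits and invalidation.

```python
import time


def normalize_query(query: str) -> str:
    """Collapse case and whitespace so equivalent queries share a cache key."""
    return " ".join(query.lower().split())


class TTLCache:
    """Cache search results per normalized query for a fixed time window."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl_seconds = ttl_seconds
        self.clock = clock  # injectable so tests can control time
        self._entries = {}  # key -> (expires_at, value)

    def get_or_compute(self, query, compute):
        key = normalize_query(query)
        now = self.clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]  # fresh hit: skip the search engine
        value = compute(key)  # miss or expired: query the engine
        self._entries[key] = (now + self.ttl_seconds, value)
        return value
```

The TTL is a trade-off against the real-time indexing goal: a cached result can be stale for up to `ttl_seconds`, so a short window is usually the safer starting point.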


What Changed in the New Architecture

The key improvements were:

  • Moving to an event-driven indexing pipeline
  • Introducing asynchronous processing
  • Using a more scalable distributed search system
  • Reducing manual operational effort
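The first two changes above can be sketched as a queue-backed worker that indexes updates in batches instead of writing to the search engine inline with each source change. This is an illustrative sketch, not our production code: `InMemoryIndex` is a hypothetical stand-in for a search engine's bulk API, an `asyncio.Queue` stands in for Kafka or another event stream, and the batch size and flush interval are arbitrary.

```python
import asyncio


class InMemoryIndex:
    """Hypothetical stand-in for a search engine's bulk indexing API."""

    def __init__(self):
        self.docs = {}
        self.bulk_calls = 0

    async def bulk_upsert(self, batch):
        self.bulk_calls += 1
        for doc in batch:
            self.docs[doc["id"]] = doc  # last write wins per document id


async def indexing_worker(queue, index, batch_size=3, flush_interval=0.5):
    """Drain update events from the queue and index them in batches."""
    batch = []
    running = True
    while running:
        flush_now = False
        try:
            event = await asyncio.wait_for(queue.get(), timeout=flush_interval)
            if event is None:  # sentinel: the producer is done
                running = False
            else:
                batch.append(event)
        except asyncio.TimeoutError:
            flush_now = True  # quiet period: do not hold a partial batch
        if batch and (len(batch) >= batch_size or flush_now or not running):
            await index.bulk_upsert(batch)
            batch = []


async def run_pipeline(events, batch_size=3):
    """Publish events, then let one worker consume them asynchronously."""
    queue = asyncio.Queue()
    index = InMemoryIndex()
    worker = asyncio.create_task(indexing_worker(queue, index, batch_size))
    for event in events:
        await queue.put(event)
    await queue.put(None)  # signal end of stream
    await worker
    return index
```

Because producers only enqueue events, a slow or busy search cluster delays indexing without blocking the source systems, and the batch size can be tuned separately from the write rate.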

Impact After Changes

After moving to this approach, we started seeing noticeable improvements:

  • Faster indexing updates
  • More consistent query response times
  • Better handling of peak traffic
  • Reduced operational overhead

In our case, indexing delays were reduced significantly, and query performance became more stable as the system scaled.

A Common Mistake We Made

One mistake we made early on was focusing too much on initial setup and not enough on long-term scalability.

We also considered continuing with the existing setup and optimizing it further. However, we realized that incremental fixes would not solve the underlying scaling challenges.

What We Learned

The best search engine is not the one that works today — it’s the one that continues to work as your system grows.

How I Would Approach This Today

If I had to make this decision again today, I would start by evaluating:

  • Expected data size in the next 1–2 years
  • Real-time vs batch indexing requirements
  • Operational ownership (team size and expertise)
  • Need for AI or semantic search

This would help avoid rework later and make the system easier to scale from the beginning.

When to Choose What

  • Solr → good for controlled enterprise environments
  • Elasticsearch/OpenSearch → flexible and scalable
  • Cloud search → low operational overhead and AI-ready

Final Thoughts

Choosing a search engine is not just about features — it’s about making a decision that will hold up as your system grows.

In my experience, it’s better to think about scale and future requirements early, rather than trying to fix limitations later.

The right decision early can save a lot of time and effort down the road.


Opinions expressed by DZone contributors are their own.
