Optimizing Search: A Patent-Backed Approach to Perceived Speed
Perceived speed matters as much as latency. Search can be made to appear instant by using prefetching, slow/fast orchestration, and streaming.
So, imagine it’s a Friday night after a long week. The kids are finally asleep, and you’re ready to unwind with the new season of Stranger Things. You open the Netflix app, select the banner on the home page, and press play! And then you see that dreaded loading circle that just won’t go away.
Why?
Well, because in this scenario, Netflix is trying to download the full episode in 4K resolution with its Dolby Digital-enabled soundtrack before it starts playing.
Needless to say, you get frustrated and abandon the idea altogether because no one should have to go through such an experience, or lack thereof. Ever!
The problem that our apocalyptic analogy brings up is solved easily by streaming, which lets one start consuming video content in seconds, while the rest progressively loads in the background.
Search can work the same way — it can appear to be faster than it actually is!
- It can preemptively fetch results by predicting keywords, or as soon as the searcher’s intent can be inferred.
- It can prioritize fetching and rendering content that’s in the searcher’s viewport on the landing page.
- And most importantly, instead of waiting for all results to be collated in the backend before being delivered, it can stream partial results as soon as they are available.
And this forms the foundational principle of how Instant Search Results works (US Patent #11,966,448 B2): change how search results are delivered so that search is perceived as faster, even if the backend services or infrastructure haven’t actually sped up.
The Problem With Traditional Search
Search latency isn’t always about raw backend performance. In fact, many search systems are already optimized with hyper-tuned indexes, caching at multiple layers, and massive fan-out across a fleet of services. The system may be fast enough by engineering metrics, but the experience could still feel sluggish to the human on the other side.
Why? Because latency stacks up unevenly.
- Some calls are cheap in terms of milliseconds — like fetching hits from an inverted index based on an exact keyword match.
- Others are expensive — like semantic matches, personalization models, and extensive graph traversals.
Traditional search engines usually wait until all calls in the critical path return before the results can be shown to the user.
Some components can always be lazily loaded and fetched in parallel, but if getting the first set of bare-bones results on Search’s landing page takes too long, there’s a risk of losing returning users altogether [1].
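To make the uneven stacking concrete, here is a small arithmetic sketch. The call names and latency figures are hypothetical, chosen only for illustration, and are not measurements from any real system.

```python
# Hypothetical per-call latencies in milliseconds (illustrative, not measured).
latencies_ms = {
    "keyword_index": 20,     # cheap: exact match on an inverted index
    "semantic_match": 180,   # expensive
    "personalization": 250,  # expensive
}

# The calls fan out in parallel, so a page that waits for all of them
# is gated by the slowest call.
time_to_first_result_blocking = max(latencies_ms.values())

# If results are shown as each call returns, the fastest call sets the pace.
time_to_first_result_incremental = min(latencies_ms.values())

print(time_to_first_result_blocking, time_to_first_result_incremental)
```

Under these assumed numbers, the total work is unchanged, but the user sees something after 20 ms instead of 250 ms.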
Instant Search Results
The core idea behind Instant Search Results is simple enough: perceived speed matters as much as the actual speed.
Instead of chasing diminishing returns in raw query performance — like squeezing another 10 ms from an already-optimized index — the invention focuses on optimizing how Search results are delivered. It combines three complementary techniques — each useful on its own, but far more powerful when orchestrated together:
1. Prefetching: Search With Foresight
By anticipating Search queries based on factors such as trending topics, typeahead suggestions, and searcher context, the system can prefetch and cache the Search results, even before the user finishes typing. If and when the user enters a prefetched query, the results can then be delivered instantly from the cache.
It’s like Netflix caching the first few seconds of popular shows based on your watch history, unfinished content, and some simple ML-based analysis.
While it comes with the benefit of perceived near-zero latency in some cases, aggressive prefetching risks wasted calls and cache churn. More on this will come later.
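As a rough illustration, prefetching can be sketched as a small cache that is filled from predicted queries and consulted before the backend is called. This is a minimal sketch, not the patented implementation: `PrefetchCache`, `prefetch`, `search`, and `run_search` are hypothetical names, and the size bound and time-to-live are assumptions added here to keep wasted entries and stale results in check.

```python
import time
from collections import OrderedDict
from typing import Callable, List, Optional, Tuple


class PrefetchCache:
    """Small LRU cache with a time-to-live, keyed by normalized query text."""

    def __init__(self, max_entries: int = 100, ttl_seconds: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._entries: "OrderedDict[str, Tuple[float, List[str]]]" = OrderedDict()
        self._max_entries = max_entries
        self._ttl = ttl_seconds
        self._clock = clock

    @staticmethod
    def _key(query: str) -> str:
        # Normalize case and whitespace so "Stranger  Things" matches "stranger things".
        return " ".join(query.lower().split())

    def put(self, query: str, results: List[str]) -> None:
        key = self._key(query)
        self._entries[key] = (self._clock(), results)
        self._entries.move_to_end(key)
        while len(self._entries) > self._max_entries:
            self._entries.popitem(last=False)  # evict the least recently used entry

    def get(self, query: str) -> Optional[List[str]]:
        key = self._key(query)
        entry = self._entries.get(key)
        if entry is None:
            return None
        stored_at, results = entry
        if self._clock() - stored_at > self._ttl:
            del self._entries[key]  # expired: treat as a miss
            return None
        self._entries.move_to_end(key)
        return results


def prefetch(cache: PrefetchCache, predicted_queries: List[str],
             run_search: Callable[[str], List[str]]) -> None:
    """Run the search ahead of time for each predicted query not already cached."""
    for query in predicted_queries:
        if cache.get(query) is None:
            cache.put(query, run_search(query))


def search(cache: PrefetchCache, query: str,
           run_search: Callable[[str], List[str]]) -> Tuple[List[str], bool]:
    """Return (results, served_from_cache)."""
    cached = cache.get(query)
    if cached is not None:
        return cached, True
    return run_search(query), False
```

In this sketch, `predicted_queries` would come from whatever prediction source is available, such as typeahead suggestions, and `run_search` stands in for the real backend call.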
2. Fast/Slow Call Orchestration
Not all queries are equal. Some could be blazing fast, such as the top result returned after clicking on a typeahead suggestion when searching. Retrieving and rendering this top result wouldn’t even require running the Search in the backend, making it the perfect candidate to be returned instantly.
And while the top result is rendered on the screen, the full Search is executed in parallel, fetching the remaining results below the fold, outside the member’s landing viewport.
The effectiveness of this optimization depends upon the coverage of the cases that can be served via this approach.
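The fast/slow split described above might look roughly like the following `asyncio` sketch. The `orchestrate` function, the shape of the `suggestion` dictionary, and the `render` callback are all assumptions made for illustration; the source does not specify an API.

```python
import asyncio
from typing import Awaitable, Callable, Dict, List


async def orchestrate(
    suggestion: Dict[str, str],
    full_search: Callable[[str], Awaitable[List[str]]],
    render: Callable[[str, List[str]], None],
) -> None:
    # Schedule the slow call right away; it does not block the fast path.
    slow_task = asyncio.create_task(full_search(suggestion["query"]))

    # Fast path: the clicked suggestion already identifies the top result,
    # so it can be shown without running a backend search.
    render("top", [suggestion["top_result"]])

    # Slow path: once the full search returns, fill in the results below
    # the fold, skipping the top result that is already on screen.
    remaining = await slow_task
    render("below_fold", [r for r in remaining if r != suggestion["top_result"]])
```

The fast path only helps when the client already holds enough information to render the top result, which is why coverage of such cases determines how much this technique pays off.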
3. Streaming
The final piece here borrows directly from video streaming. Instead of batching all results into one response to be rendered in one go, the system streams the results as they become available in the backend.
Streaming Search results hinges on the principle that different types of Search hits have different processing times. That time depends on the latency of fetching the hits from indexes, merging and ranking them, applying personalization, and decorating them for UI rendering.
Instead of waiting for the most expensive result types to be available in the backend, the results that are retrieved faster, or a Search cluster, can be sent to clients for rendering first, and the rest can be streamed progressively as they become available.
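One way to picture this is an async generator that yields each result type as soon as its fetch completes, rather than after all of them finish. This is a hedged sketch built on `asyncio.as_completed`; `stream_results` and the per-type fetchers are hypothetical stand-ins for real backend calls, and a production system would also need a streaming transport to the client.

```python
import asyncio
from typing import AsyncIterator, Awaitable, Callable, Dict, List, Tuple

Fetcher = Callable[[str], Awaitable[List[str]]]


async def stream_results(
    query: str, fetchers: Dict[str, Fetcher]
) -> AsyncIterator[Tuple[str, List[str]]]:
    """Yield (result_type, hits) pairs in completion order, fastest first."""

    async def tagged(name: str, fetch: Fetcher) -> Tuple[str, List[str]]:
        # Keep the result type with its hits so the client knows what arrived.
        return name, await fetch(query)

    # Start every fetch concurrently.
    tasks = [asyncio.create_task(tagged(name, fetch)) for name, fetch in fetchers.items()]

    # as_completed hands back each task as it finishes, so cheap result
    # types are not held up by expensive ones.
    for finished in asyncio.as_completed(tasks):
        yield await finished
```

A caller would consume it with `async for result_type, hits in stream_results(query, fetchers)` and render each chunk on arrival.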
What’s Next
The next time you hit play on Netflix and the episode starts streaming in seconds, remember: you’re not actually watching something that’s fully ready, but you’re watching something that’s been cleverly optimized to feel instant. Search deserves the same treatment.
That’s exactly what Instant Search Results (US Patent #11,966,448 B2) is designed to achieve. While the impact of each technique on its own might be limited, combining prefetching, fast/slow call orchestration, and streaming lets Search deliver value the moment it can.
In the next article in the series, I’ll dive deeper into the first technique: prefetching. We’ll explore how anticipating a searcher’s needs can make search results appear magically instant.
Appendix