This article will give you a quick glimpse into a test framework built to validate Couchbase’s new full-text search feature. The idea described here can be extended to test any text search engine in general.
Couchbase Full-Text Search
Searching unstructured schema-less JSON documents in Couchbase is now easy thanks to the full-text capability it offers. What this means is that Couchbase users can now search for phrases, words, and date/numeric-ranges inside JSON documents. These searches are essentially “queries” on full-text indexes. Couchbase full-text search is RESTful and distributed and is driven by Bleve, an indexing and search library written in Go. For more about full-text search, refer to the recommended reading section.
When we first started developing a full-text test framework for Couchbase, given the wide spectrum of queries and the limitless possibilities in index mapping, it was obvious that we needed smart result validation in place. Running a limited, pre-defined set of queries and verifying the results against a static set of expected results wasn't going to scale, to say the least. We needed a system that could randomize combinations of token filters and analyzers far beyond what a tester or developer could enumerate by hand, and yet still determine the "right", widely accepted results for each query. Hence, we ended up building a framework that validates Couchbase's results using another well-known search system: Elasticsearch.
Elasticsearch is built on Lucene, a popular Java-based full-text search engine. Bleve, the Go library that powers Couchbase full-text search, is inspired by Lucene, so the two are functionally very similar. Elasticsearch also provides a RESTful web interface for searching JSON documents, similar to Couchbase. Hence, comparing results against Elasticsearch seemed like a great way to verify functional correctness.
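For illustration, the two systems accept structurally similar query payloads over their REST interfaces. A minimal sketch in Python (the function names, field names, and sizes are illustrative, not taken from the framework):

```python
# Sketch: building an equivalent "match" query for Couchbase FTS and
# Elasticsearch. Both payloads would be POSTed as JSON over REST;
# by default Couchbase FTS listens on port 8094
# (/api/index/<index>/query) and Elasticsearch on port 9200
# (/<index>/_search).

def couchbase_fts_query(term, field):
    # Couchbase FTS takes the field name alongside the match term.
    return {"query": {"match": term, "field": field}, "size": 10}

def elasticsearch_query(term, field):
    # Elasticsearch nests the term under the field name.
    return {"query": {"match": {field: term}}, "size": 10}
```

Because the payloads are structurally close, translating a randomly generated Couchbase query into its Elasticsearch equivalent is largely mechanical.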
With that goal set, we built a Python test framework comprising the following key components.
1. Custom Map Generator
A Couchbase full-text search index can be customized to choose which fields must be indexed and how each field should be analyzed. While a default index (one that indexes all fields) is well-suited for testing purposes, a custom index is more performant and ideal for production usage. For more on how to customize indexes, refer to the Couchbase documentation.
We built the Custom Map Generator to address the problem of testing many combinations of type mappings and analyzers with character filters, tokenizers, and token filters. It randomly selects fields to be indexed and applies an analyzer (predefined or custom) to each indexed field. The framework relies on predefined templates for custom analyzer definitions, including definitions for custom character filters, token filters, and tokenizers; it then randomly selects elements from these definitions to compose custom analyzers. The Custom Map Generator doesn't just generate a Couchbase full-text search mapping but also an equivalent Elasticsearch mapping, as shown in Figures 2 and 3.
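The random composition step can be sketched as follows. The component names below are common Bleve/Elasticsearch building blocks used for illustration; the real framework draws them from its predefined templates:

```python
import random

# Illustrative pools of analyzer components (a small subset for the sketch).
CHAR_FILTERS = ["html", "zero_width_spaces"]
TOKENIZERS = ["unicode", "whitespace", "letter"]
TOKEN_FILTERS = ["to_lower", "stop_en", "porter"]

def random_custom_analyzer(rng):
    """Compose a custom analyzer by randomly picking components."""
    return {
        "type": "custom",
        # Zero or more character filters, in random order.
        "char_filters": rng.sample(CHAR_FILTERS, rng.randint(0, len(CHAR_FILTERS))),
        # Exactly one tokenizer.
        "tokenizer": rng.choice(TOKENIZERS),
        # Zero or more token filters.
        "token_filters": rng.sample(TOKEN_FILTERS, rng.randint(0, len(TOKEN_FILTERS))),
    }
```

Seeding the random generator makes a failing combination reproducible, which matters when a randomly composed analyzer exposes a bug.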
The Custom Map Generator also feeds the indexed fields as queryable fields to the Random Query Generator (described below) for queries to be built on. Termed "smart-querying," this ensures that only the indexed fields are queried. The generated queries are then executed against both the Couchbase full-text index and the Elasticsearch index, and the results are validated.
Figure 1: Custom Map Generator
As simple as the above idea might seem, it is very effective in generating complex mappings like the one below. Although Couchbase full-text search behaves very similarly to Elasticsearch, there are minor differences, and we had to work around them in the framework. For example, the Elasticsearch standard analyzer does not remove stop words, while the Couchbase standard analyzer does. To make the systems comparable, we created a custom Elasticsearch index configured with the Couchbase standard analyzer's stop words. Similarly, Couchbase handles indexing of _all fields differently than Elasticsearch.
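The stop-word workaround can be expressed as an Elasticsearch index-settings fragment that configures the standard analyzer with an explicit stop-word list. In this sketch the analyzer name and the word list are a tiny illustrative subset, not the actual Couchbase list:

```python
# Sketch: Elasticsearch index settings defining a standard analyzer with an
# explicit stop-word list, so both systems drop the same words.
es_settings = {
    "settings": {
        "analysis": {
            "analyzer": {
                "standard_with_stops": {
                    "type": "standard",
                    # Illustrative subset; the framework would plug in the
                    # full stop-word list used by the Couchbase analyzer.
                    "stopwords": ["a", "an", "and", "the", "to"],
                }
            }
        }
    }
}
```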
A simple generated custom map for Couchbase is shown below. We have enabled a document type mapping named emp and indexed only the name field in JSON documents of type emp. Other settings, such as analyzer, define how the name field must be analyzed, indexed, and stored; these values are also randomly generated.
Figure 2: Generated index mapping for Couchbase
Figure 3: Generated index mapping for Elasticsearch
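In code form, such a pair of generated mappings might look roughly like the following sketch. The key names follow the two systems' mapping formats, but the exact schema varies by version, so treat this as illustrative:

```python
# Sketch: a Couchbase FTS type mapping that indexes only the "name" field of
# documents of type "emp" (schema details vary by server version).
couchbase_mapping = {
    "types": {
        "emp": {
            "enabled": True,
            "properties": {
                "name": {
                    "fields": [{
                        "name": "name",
                        "type": "text",
                        "analyzer": "en",
                        "index": True,
                        "store": True,
                    }]
                }
            },
        }
    },
    # Disable the default mapping so only "emp" documents are indexed.
    "default_mapping": {"enabled": False},
}

# Sketch: the equivalent Elasticsearch mapping for the same field.
es_mapping = {
    "mappings": {
        "emp": {
            "properties": {"name": {"type": "text", "analyzer": "english"}}
        }
    }
}
```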
2. Dataset Generator/Loader
While testing full-text search, it's good to use datasets that cover multiple data types: date, bool, number, string, long text, list/array, optional fields, and nested objects. In our framework, we plugged in a variety of datasets: Wikipedia dumps (in multiple languages) containing large documents (~2 KB), a perfect fit for testing long-text fields and analyzers, and small generated JSON documents (< 0.5 KB) that help test the rest. The number of documents was made configurable, up to 10M. The ability to update and delete documents across both systems also helped test index updates.
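A minimal sketch of a generated small document covering several of these data types (the field names here are illustrative, not the framework's actual schema):

```python
import random
import string

def generate_doc(rng, doc_id):
    """Generate a small JSON-like document mixing several data types."""
    return {
        "id": f"emp{doc_id}",
        # string
        "name": "".join(rng.choices(string.ascii_lowercase, k=8)),
        # number
        "salary": rng.randint(30000, 200000),
        # date
        "join_date": f"19{rng.randint(70, 99)}-01-{rng.randint(10, 28)}",
        # bool
        "is_manager": rng.random() < 0.5,
        # list/array
        "languages_known": rng.sample(["English", "Spanish", "German", "Hindi"], 2),
        # nested object, present only sometimes (optional field)
        "manager": {"name": "alice"} if rng.random() < 0.5 else None,
    }
```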
Figure 4: JSON CRUD workload
Since queries are closely tied to the data loaded, it helps to define query-ables for every dataset. Query-ables are predicate builders: essentially a set of fields that can be queried, along with their possible values. They are fed to the Random Query Generator (RQG) to build meaningful queries.
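A query-able definition for an employee-style dataset might look like this sketch (the field names, types, and values are illustrative):

```python
# Sketch: query-ables describe which fields of a dataset may be queried and
# what values or ranges make sense, so generated queries are meaningful.
EMP_QUERYABLES = {
    "name": {"type": "text", "values": ["Safiya", "Kimberly", "Deandra"]},
    "salary": {"type": "numeric", "min": 30000, "max": 200000},
    "join_date": {"type": "date", "min": "1970-01-01", "max": "1999-12-31"},
}
```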
3. Random Query Generator
The heart of the test framework is Random Query Generator (RQG). Based on the dataset loaded, it generates a wide range of queries on all or a subset of JSON fields. It supports the following queries:
- numeric range
- date range
It generates any number of queries based on a num_queries parameter passed at runtime. A subset of the above query types can also be generated by specifying them via another parameter, query_types. Like the Custom Map Generator, the Random Query Generator generates Couchbase full-text queries of different types along with their equivalent Elasticsearch queries.
Figure 5: Random Query Generator
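As an illustration of how one random query can target both systems, here is a hedged sketch of a numeric-range generator (the function name, field names, and bounds are illustrative):

```python
import random

def random_numeric_range_query(rng, field, lo, hi):
    """Pick a random [a, b) range and emit a Couchbase FTS query plus its
    Elasticsearch equivalent."""
    a = rng.randint(lo, hi - 1)
    b = rng.randint(a + 1, hi)
    couchbase = {"query": {"field": field, "min": a, "max": b,
                           "inclusive_min": True, "inclusive_max": False}}
    elastic = {"query": {"range": {field: {"gte": a, "lt": b}}}}
    return couchbase, elastic
```

Because both queries are derived from the same random bounds, any mismatch in the result sets points at a behavioral difference between the two engines rather than at the generator.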
4. Result Comparator
The Result Comparator is a very useful tool for finding differences in the results returned by Couchbase and Elasticsearch. It compares the document IDs/keys returned by the two systems for every query and flags queries whose result sets differ; those differences are then examined to identify potential bugs or regressions.
Figure 6: A screenshot of Result Comparison
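The comparison itself reduces to set operations on document IDs. A minimal sketch (the function and key names are illustrative):

```python
def diff_results(query_id, cb_ids, es_ids):
    """Compare the sets of document IDs each system returned for a query."""
    return {
        "query": query_id,
        "match": cb_ids == es_ids,
        # IDs one system returned but the other did not.
        "only_in_couchbase": sorted(cb_ids - es_ids),
        "only_in_elasticsearch": sorted(es_ids - cb_ids),
    }
```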
This test framework has not only helped uncover dissimilarities between the two search systems but has also proven greatly useful in validating search results. In addition, indexing and query latencies and the relevance of hits can also be compared easily. Listed below are some bugs this testing approach has uncovered:
- Querying during swap rebalance does not return correct results for some queries
- Querying during rebalance of the FTS node yields a lesser number of hits
- Prefix query on a string-ified numeral yields 0 results
- FTS tokenization is slightly different from ES
- Bleve's standard analyzer performs stop-word removal by default but ES's doesn't