8 Ways to Customize Couchbase Full Text Search Indexes
In this article, we will go over different ways a Full Text Search index can be customized.
Couchbase Search service supports the creation of special-purpose indexes for Full Text Search, which provide extensive natural-language querying capabilities on JSON documents. Couchbase Full Text Search indexes support a wide range of query types, such as:
- Match, Match Phrase, Doc ID, and Prefix queries
- Conjunction, Disjunction, and Boolean field queries
- Numeric Range and Date Range queries
- Geospatial queries
- Query String queries, which employ a special syntax to express the details of each query
Before you can run a full text search, you must create a Full Text Search index on the bucket you want to search. The search can then target the textual and other contents of the documents in that bucket.
A Full Text Search index can be created quickly by selecting a bucket and indexing all fields with the default settings, or it can be customized in many ways beyond that default. A customized index definition can outperform a default index because it indexes only what your queries need, and it lets you choose how each field is analyzed.
Below are eight ways a Full Text Search index can be customized. Each can be done through either the UI or the REST API; to keep things simple, I will stick to the UI.
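For readers who prefer the REST API, here is a minimal sketch of the "index everything with defaults" case. It assumes a Search node on `localhost` at the default Search port 8094 and the `PUT /api/index/{indexName}` endpoint. The index name and credentials are hypothetical, and the definition is deliberately trimmed: a definition exported from the UI contains more fields. The request is only built, not sent.

```python
import base64
import json
import urllib.request

# Assumption: a Search (FTS) node reachable on localhost at the default port 8094.
FTS_BASE_URL = "http://localhost:8094"


def default_index_definition(index_name: str, bucket: str) -> dict:
    """Minimal definition: index every field of every document in `bucket`."""
    return {
        "type": "fulltext-index",
        "name": index_name,
        "sourceType": "couchbase",
        "sourceName": bucket,
        "params": {
            "mapping": {
                # The default mapping matches all documents; dynamic=True indexes
                # every field it finds using the default analyzer.
                "default_mapping": {"enabled": True, "dynamic": True},
            }
        },
    }


def build_create_request(definition: dict, user: str, password: str) -> urllib.request.Request:
    """Build, but do not send, the PUT request that creates the index."""
    url = f"{FTS_BASE_URL}/api/index/{definition['name']}"
    credentials = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url,
        data=json.dumps(definition).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {credentials}",
        },
        method="PUT",
    )


# Hypothetical index name and credentials; 'travel-sample' is Couchbase's sample bucket.
req = build_create_request(
    default_index_definition("travel-default", "travel-sample"), "Administrator", "password"
)
print(req.get_method(), req.full_url)
# Sending it would be: urllib.request.urlopen(req)
```

Each customization in the list that follows amounts to editing the `params` section of a definition like this one.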
1. Index only the relevant documents. This helps reduce disk space and memory usage. Choose a type mapping to index a specific set of documents rather than every type of document in the bucket. You can also specify how the Full Text Search index identifies a document's type: from the JSON field ‘type,’ or derived from the document ID using a separator or a regular expression. The default is the JSON field ‘type.’
2. Index only the fields you need rather than every field in the documents. Again, the benefit is lower space and memory consumption. Full Text Search indexes can index both simple and nested fields. When you select a field to index, specify its field type, its alias (‘searchable as’), and the analyzer used to analyze it. If you need highlighting in the search results, also choose ‘store’ and ‘include term vectors.’ Select ‘include in _all field’ if the field should be searchable without scoping the query to a field name.
3. Create custom analyzers or use the out-of-the-box ones. Couchbase Search ships with several analyzers, such as en, fr, keyword, standard, simple, and web. These may not be the best fit for your dataset, so Couchbase Search also lets you build your own analyzer, tailored to the characteristics of your data, by choosing from the standard character filters, tokenizers, and token filters.
4. Create custom character filters, tokenizers, and token filters, or use the out-of-the-box ones. As with analyzers, some of each are provided out of the box, and Couchbase Search lets you create your own.
5. A custom analyzer can chain multiple character filters and token filters. You specify these, along with a single tokenizer, when defining the custom analyzer.
6. Choose different analyzers for different fields. When several fields are indexed, each can be analyzed differently. For example, you might use the ‘web’ analyzer for a URL field but the ‘en’ analyzer for a name field.
7. Analyze the same field in more than one way. In some cases, the text in a particular field is multilingual. To get the most out of such text, you can index that field more than once, applying a different language analyzer each time.
8. Choose between index types: Version 5.0 (Moss) and Version 6.0 (Scorch). Version 5.0 (Moss) is the standard index type for test, development, and production. Version 6.0 (Scorch) reduces the index footprint on disk and improves indexing and mutation-handling performance. Version 5.0 may perform better when queries use prefix or wildcard matching; in most other cases, Version 6.0 offers better performance.
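To make the first two customizations (indexing only relevant documents, and only relevant fields) concrete, here is a sketch of a definition that indexes only `hotel` documents and only two of their fields, one top-level and one nested. The key names follow the layout of index definitions exported from the UI as I understand it. The `docid_prefix` and `docid_regexp` alternatives in the comments are assumptions, and the index name is hypothetical.

```python
import json


def hotel_only_index(bucket: str) -> dict:
    """Sketch: index only 'hotel' documents, and only two of their fields."""
    name_field = {
        "name": "name",               # the 'searchable as' alias
        "type": "text",
        "analyzer": "en",
        "index": True,
        "store": True,                # store + term vectors enable highlighting
        "include_term_vectors": True,
        "include_in_all": True,       # searchable without a field name in the query
    }
    review_content_field = {"name": "content", "type": "text", "analyzer": "en", "index": True}
    return {
        "type": "fulltext-index",
        "name": "hotels-only",        # hypothetical index name
        "sourceType": "couchbase",
        "sourceName": bucket,
        "params": {
            # Identify a document's type by its JSON field 'type' (the default).
            # Assumed alternatives that derive the type from a doc ID like 'hotel_10025':
            #   {"mode": "docid_prefix", "docid_prefix_delim": "_"}
            #   {"mode": "docid_regexp", "docid_regexp": "^[a-z]+"}
            "doc_config": {"mode": "type_field", "type_field": "type"},
            "mapping": {
                # Disable the catch-all mapping so non-hotel documents are skipped.
                "default_mapping": {"enabled": False},
                "types": {
                    "hotel": {
                        "enabled": True,
                        "dynamic": False,  # index only the properties listed below
                        "properties": {
                            # A simple top-level field.
                            "name": {"enabled": True, "dynamic": False,
                                     "fields": [name_field]},
                            # A nested field: 'content' inside each element of 'reviews'.
                            "reviews": {
                                "enabled": True,
                                "dynamic": False,
                                "properties": {
                                    "content": {"enabled": True, "dynamic": False,
                                                "fields": [review_content_field]},
                                },
                            },
                        },
                    }
                },
            },
        },
    }


print(json.dumps(hotel_only_index("travel-sample")["params"]["doc_config"]))
```

Setting `dynamic` to false on the type mapping is what keeps unlisted fields out of the index, which is where the space and memory savings come from.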
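The custom-analyzer customizations (a custom analyzer, custom character filters, tokenizers, and token filters, and chaining several filters) can be sketched as the `analysis` section of an index mapping. All component names here are hypothetical. `html`, `to_lower`, and `stop_en` are meant to be built-in components. The `regexp` and `length` component types and their keys (`regexp`, `replace`, `min`, `max`) are assumptions to check against your server's documentation. The second function is a rough local approximation of what the analyzer does to a string. It is for illustration only and is not Couchbase code.

```python
import re


def custom_analysis() -> dict:
    """Sketch of the 'analysis' section of an index mapping (names are hypothetical)."""
    return {
        "char_filters": {
            # Custom regexp char filter: turn underscores into spaces.
            "underscore_to_space": {"type": "regexp", "regexp": "_", "replace": " "},
        },
        "tokenizers": {
            # Custom regexp tokenizer: each run of letters/digits becomes a token.
            "alnum_runs": {"type": "regexp", "regexp": "[\\p{L}\\p{N}]+"},
        },
        "token_filters": {
            # Custom length filter: keep tokens of 2..100 characters.
            "len_2_to_100": {"type": "length", "min": 2, "max": 100},
        },
        "analyzers": {
            "product_text": {
                "type": "custom",
                # Char filters and token filters run in the order listed; there is
                # exactly one tokenizer.
                "char_filters": ["html", "underscore_to_space"],
                "tokenizer": "alnum_runs",
                "token_filters": ["to_lower", "len_2_to_100", "stop_en"],
            }
        },
    }


def simulate_product_text(text: str) -> list:
    """Local approximation of 'product_text' for illustration; not Couchbase code."""
    text = re.sub(r"<[^>]+>", " ", text)                  # ~ built-in 'html' char filter
    text = text.replace("_", " ")                         # ~ 'underscore_to_space'
    tokens = re.findall(r"[^\W_]+", text)                 # ~ 'alnum_runs' tokenizer
    tokens = [t.lower() for t in tokens]                  # ~ 'to_lower'
    tokens = [t for t in tokens if 2 <= len(t) <= 100]    # ~ 'len_2_to_100'
    tiny_stop_list = {"a", "an", "and", "is", "of", "the"}  # stand-in for 'stop_en'
    return [t for t in tokens if t not in tiny_stop_list]


print(simulate_product_text("<b>Blue_Widget</b> is a tool"))
```

The order of the lists matters. For instance, the underscore filter must run before the tokenizer, or `Blue_Widget` would reach the tokenizer as one string.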
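The two per-field customizations (different analyzers for different fields, and analyzing the same field in more than one way) come down to the `fields` list under each property. Here is a sketch, assuming that a field mapping's `name` key is the ‘searchable as’ alias and that one JSON property may carry several field mappings. The `description_en` and `description_fr` aliases are hypothetical.

```python
def text_field(searchable_as: str, analyzer: str) -> dict:
    """One field mapping entry; 'name' is the alias shown as 'searchable as' in the UI."""
    return {"name": searchable_as, "type": "text", "analyzer": analyzer, "index": True}


def mixed_analyzer_mapping() -> dict:
    """Sketch: per-field analyzers, plus one field indexed twice with different analyzers."""
    return {
        "default_mapping": {"enabled": False},
        "types": {
            "hotel": {
                "enabled": True,
                "dynamic": False,
                "properties": {
                    # Different fields, different analyzers.
                    "url": {"enabled": True, "dynamic": False,
                            "fields": [text_field("url", "web")]},
                    "name": {"enabled": True, "dynamic": False,
                             "fields": [text_field("name", "en")]},
                    # Same JSON field, indexed twice under different aliases so that
                    # English and French text are each analyzed appropriately.
                    "description": {"enabled": True, "dynamic": False,
                                    "fields": [text_field("description_en", "en"),
                                               text_field("description_fr", "fr")]},
                },
            }
        },
    }


for prop, spec in mixed_analyzer_mapping()["types"]["hotel"]["properties"].items():
    print(prop, [(f["name"], f["analyzer"]) for f in spec["fields"]])
```

A query can then target whichever alias suits the query text, for example `description_fr` for French input.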
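The choice of index type is recorded in the `store` part of the definition's `params`. The values below are assumptions based on definitions exported from the UI: `upside_down` with `mossStore` for Version 5.0 (Moss), and `scorch` for Version 6.0 (Scorch). The helper simply encodes the rule of thumb given above. It is not an official selection policy.

```python
import copy

# Assumed 'store' settings as they appear in exported index definitions.
INDEX_TYPES = {
    "moss": {"indexType": "upside_down", "kvStoreName": "mossStore"},  # Version 5.0
    "scorch": {"indexType": "scorch"},                                 # Version 6.0
}


def choose_index_type(uses_prefix_or_wildcard: bool) -> str:
    """Rule of thumb from the text: Moss may do better for prefix/wildcard-heavy queries."""
    return "moss" if uses_prefix_or_wildcard else "scorch"


def with_index_type(definition: dict, kind: str) -> dict:
    """Return a copy of `definition` whose params.store selects the given index type."""
    updated = copy.deepcopy(definition)  # leave the caller's definition untouched
    updated.setdefault("params", {})["store"] = dict(INDEX_TYPES[kind])
    return updated


base = {"type": "fulltext-index", "name": "hotels", "sourceType": "couchbase",
        "sourceName": "travel-sample", "params": {"mapping": {}}}
print(with_index_type(base, choose_index_type(uses_prefix_or_wildcard=False))["params"]["store"])
```

Benchmark with your own queries before settling on a type, since the trade-off depends on your query mix.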
Match these customizations to the characteristics of your data, and take full advantage of what the Couchbase Search service offers for Full Text Search indexes to boost query performance.
Further Reading
- A complete description of how to use Couchbase Server Search is available in the official Couchbase Server Search documentation: https://docs.couchbase.com/server/6.0/fts/full-text-intro.html
- The Couchbase blog has many posts on using Couchbase Search efficiently and on its possible use cases: https://blog.couchbase.com/category/full-text-search/
Opinions expressed by DZone contributors are their own.