Multi-lingual search with Lucene and Elasticsearch

Last night I gave a talk at SkillsMatter London on multi-lingual search with Lucene and Elasticsearch. The talk covered the main challenges of indexing text in different languages: tokenization, term normalization and stemming. I started by demonstrating these challenges on individual languages, and ended by discussing whether texts in different languages can be mixed in a single index at all, and how to approach that.

We had some issues with the recording, so I had to repeat the first few slides (this is why I go very quickly in the first minutes...), and the audio quality could be better. Nevertheless, the talk presents real-world issues and offers what I believe to be good paths for solving them. Since this is quite a lot of material to turn into blog posts, I think I will just leave it in video form for now.

The video is available here: https://skillsmatter.com/skillscasts/4968-approaches-to-multi-lingual-text-search-with-elasticsearch-and-lucen.

Published at DZone with permission of Itamar Syn-hershko, DZone MVB. See the original article here.
