Automata Invasion: Finite-State Technology in Lucene
Here's another great presentation from the just-finished Lucene Revolution 2012, with Robert Muir of Lucid Imagination and Michael McCandless (a DZone MVB) from IBM.
Finite-state technology, including automata and weighted finite-state transducers (wFSTs), offers compact data structures well suited to text processing and search applications. Low-level support for both automata and wFSTs is now available in Lucene and has recently enabled a number of surprisingly powerful improvements. In this joint talk, Robert Muir and Michael McCandless provide an overview of finite-state technology and then describe how it's used today in Lucene: synonym filtering, fuzzy queries, respelling/suggesting, the terms dictionary, the in-memory postings format (MemoryPostingsFormat) and Japanese analysis (Kuromoji analyzer).
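To give a rough feel for why automata suit fuzzy queries, here is a minimal Python sketch. It is not Lucene's implementation: it swaps in a plain trie (a simple deterministic automaton over a set of terms) and a dynamic-programming edit-distance row, where Lucene compiles a Levenshtein automaton and intersects it with the terms dictionary. The names `TrieNode`, `build_trie` and `fuzzy_search` are hypothetical and chosen only for illustration.

```python
from typing import Dict, List


class TrieNode:
    """One state of a deterministic automaton that accepts a fixed set of terms."""

    def __init__(self) -> None:
        self.edges: Dict[str, "TrieNode"] = {}  # transitions, one character each
        self.is_final = False                   # True if a term ends at this state


def build_trie(terms: List[str]) -> TrieNode:
    """Build the automaton; terms with a common prefix share states."""
    root = TrieNode()
    for term in terms:
        node = root
        for ch in term:
            node = node.edges.setdefault(ch, TrieNode())
        node.is_final = True
    return root


def fuzzy_search(root: TrieNode, query: str, max_edits: int) -> List[str]:
    """Return every stored term within max_edits (Levenshtein) of query, sorted."""
    results: List[str] = []

    def walk(node: TrieNode, prefix: str, prev_row: List[int]) -> None:
        # prev_row[i] is the edit distance between prefix and query[:i].
        if node.is_final and prev_row[-1] <= max_edits:
            results.append(prefix)
        for ch, child in node.edges.items():
            row = [prev_row[0] + 1]
            for i in range(1, len(query) + 1):
                cost = 0 if query[i - 1] == ch else 1
                row.append(min(row[i - 1] + 1,          # skip a query character
                               prev_row[i] + 1,         # extra character in the term
                               prev_row[i - 1] + cost)) # match or substitution
            # The row minimum never decreases as the prefix grows, so a branch
            # whose minimum exceeds max_edits cannot contain a match.
            if min(row) <= max_edits:
                walk(child, prefix + ch, row)

    walk(root, "", list(range(len(query) + 1)))
    return sorted(results)


if __name__ == "__main__":
    root = build_trie(["lucene", "lucent", "lucid", "solr", "search"])
    print(fuzzy_search(root, "lucene", 1))
```

Because terms share prefixes in the automaton, a whole subtree is skipped as soon as its prefix is already too far from the query, so a fuzzy lookup does not have to compare the query against every term one by one.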