
Lucene is Beautiful



So after I finished telling you how much I don’t like the Lucene.net codebase, what's this post about?

Well, I don’t like the code, but then again, I generally don’t like to read low-level code. The ideas behind Lucene are actually amazingly powerful in their simplicity.

At its core, Lucene is just a set of sorted dictionaries on disk (greatly simplified, I know). Everything else is built on top of that, and if you grok what is going on there, you would be quite amazed at the number of things that this has made possible.
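To make the "sorted dictionaries" idea concrete, here is a minimal in-memory sketch. It is not Lucene's actual on-disk format; the `postings` data and the `lookup` and `prefix_lookup` helpers are hypothetical names invented for illustration. It only shows why keeping terms sorted makes both exact and prefix lookups cheap.

```python
import bisect

# Hypothetical toy data: each term maps to a sorted list of document ids.
postings = {
    "beautiful": [1, 3],
    "dictionary": [2],
    "disk": [2, 3],
    "lucene": [1, 2, 3],
    "sorted": [2],
}

# Keeping the terms sorted is what makes lookups and range scans cheap.
terms = sorted(postings)


def lookup(term):
    """Exact match: binary search over the sorted term list."""
    i = bisect.bisect_left(terms, term)
    if i < len(terms) and terms[i] == term:
        return postings[term]
    return []


def prefix_lookup(prefix):
    """Prefix match: all matching terms sit next to each other once sorted."""
    i = bisect.bisect_left(terms, prefix)
    docs = set()
    while i < len(terms) and terms[i].startswith(prefix):
        docs.update(postings[terms[i]])
        i += 1
    return sorted(docs)
```

With this toy data, `lookup("disk")` finds one entry by binary search, while `prefix_lookup("di")` walks the adjacent entries for "dictionary" and "disk" and merges their document ids.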

Indexing in Lucene is done by a pipeline of documents, fields, and analyzers, which work together to generate those dictionaries. Searching in Lucene is done by traversing those dictionaries in various ways and combining the results in interesting ways.
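The pipeline described above can be sketched in a few lines. This is a simplified illustration and not how Lucene is implemented: `analyze`, `build_index`, `search_all`, and `search_any` are hypothetical names, the analyzer is a bare lowercase-and-split tokenizer, and the sample documents are made up.

```python
import re
from collections import defaultdict


def analyze(text):
    """Hypothetical analyzer: lowercase the text and split it into terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(documents):
    """Indexing: run every field of every document through the analyzer.

    documents maps doc_id -> {field_name: text}. The result maps
    (field, term) -> sorted list of doc ids, with keys in sorted order.
    """
    index = defaultdict(set)
    for doc_id, fields in documents.items():
        for field, text in fields.items():
            for term in analyze(text):
                index[(field, term)].add(doc_id)
    return {key: sorted(ids) for key, ids in sorted(index.items())}


def search_all(index, field, query):
    """AND query: intersect the doc id lists of every query term."""
    result = None
    for term in analyze(query):
        ids = set(index.get((field, term), []))
        result = ids if result is None else result & ids
    return sorted(result or [])


def search_any(index, field, query):
    """OR query: union the doc id lists of every query term."""
    result = set()
    for term in analyze(query):
        result.update(index.get((field, term), []))
    return sorted(result)


# Made-up sample documents, each with two fields.
docs = {
    1: {"title": "Lucene is Beautiful", "body": "Sorted dictionaries on disk"},
    2: {"title": "Reading low-level code", "body": "Lucene builds dictionaries from fields"},
    3: {"title": "Search basics", "body": "Traversing sorted dictionaries"},
}
index = build_index(docs)
```

Here `search_all(index, "body", "sorted dictionaries")` keeps only the documents whose body contains both terms, and `search_any(index, "body", "lucene disk")` keeps those containing either. Both are just different ways of combining the same dictionary lookups.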

I am not going to go into details about how it works; you can read all about that here. The important thing is that once you have grasped the essential structure inside Lucene, the rest is just details.

The concept and the way the implementation fell out are quite beautiful.

 


Published at DZone with permission of Oren Eini, DZone MVB. See the original article here.

