How To Add Search Capability To .NET Applications Using Lucene.Net

One of the key factors that determines whether a software application is usable is whether users can navigate it easily and find relevant information and resources through a search feature. While it is easy to integrate Google or Bing search into a software application, at times it is necessary to use a library that provides in-depth search capabilities while also furnishing developers with search analytics. There are many proprietary search libraries and frameworks in this category, such as Excite. But there is an open source, robust search library that I have found fascinating whenever I need to add search capabilities to .NET applications: Lucene.Net.

Lucene.Net is a .NET port of Lucene, an open source full-text search engine written in Java. When incorporated into a .NET application, Lucene.Net provides full-text search functionality. Lucene.Net can also be used to index Entity Framework objects to make them easy to search, thanks to the LINQ to Lucene project. The source code of Lucene.Net can be obtained from https://github.com/apache/lucene.net and compiled, or the compiled binaries can be installed from NuGet with Install-Package Lucene.Net. The code in this post uses the Lucene.Net 3.0 API (Version.LUCENE_30).

In this blog post, I will quickly demonstrate how Lucene.Net can be used in a .NET application to provide full-text search. Before we start writing code, some Lucene concepts and terms need to be explained.

Field: this is a name/value pair. The name can be, for example, the title or subject of a blog post, while the value is the text. The value of a field can be stored, indexed, and analyzed.

Document: this is a sequence of fields.

Directory: this is where Lucene stores its indexes; it can be a folder on the file system or in memory.

IndexWriter: this is the Lucene component that creates and optimizes indexes and adds documents to them.

Analyzer: this is the Lucene component that extracts index keys, or terms, from text.

IndexReader: this is the Lucene component that allows easy search and retrieval of indexed fields.

I will demonstrate the power of Lucene.Net by using it to index the pages of an ASP.NET MVC application. First, the Lucene.Net binaries need to be added to the MVC application through NuGet. Then a folder needs to be created inside the App_Data folder; I named mine LuceneIndexes. Next, the directory and IndexWriter that Lucene.Net will use need to be initialized. This is best done during application startup, by adding a LuceneSearchConfig class to the App_Start folder, with the code shown below.

// WebActivator (a separate NuGet package) runs these methods at application start and shutdown.
[assembly: WebActivator.PostApplicationStartMethod(typeof(Blog.Web.App_Start.LuceneSearchConfig), "InitializeSearch")]
[assembly: WebActivator.ApplicationShutdownMethodAttribute(typeof(Blog.Web.App_Start.LuceneSearchConfig), "FinalizeSearch")]

namespace Blog.Web.App_Start
{
    using Blog.Domain.Service;
    using Lucene.Net.Analysis;
    using Lucene.Net.Analysis.Standard;
    using Lucene.Net.Documents;
    using Lucene.Net.Index;
    using Lucene.Net.QueryParsers;
    using Lucene.Net.Search;
    using Lucene.Net.Store;
    using System;
    using System.Collections.Generic;
    using System.Linq;
    using System.Threading.Tasks;
    using System.Web;

    public class LuceneSearchConfig
    {
        public static Directory directory;
        public static Analyzer analyzer;
        public static IndexWriter writer;

        public static void InitializeSearch()
        {
            // Path.Combine avoids a doubled separator, since BaseDirectory usually ends with one.
            // System.IO is fully qualified because its Directory class would clash with Lucene's.
            string directoryPath = System.IO.Path.Combine(AppDomain.CurrentDomain.BaseDirectory, "App_Data", "LuceneIndexes");
            directory = FSDirectory.Open(directoryPath);
            analyzer = new StandardAnalyzer(Lucene.Net.Util.Version.LUCENE_30);
            // create: true starts from an empty index, so a restart does not index every page a second time.
            writer = new IndexWriter(directory, analyzer, true, IndexWriter.MaxFieldLength.UNLIMITED);
            // Build the index on a background task so that application startup is not blocked.
            Task.Factory.StartNew(() => CreateIndex());
        }

        private static void CreateIndex()
        {
            // "repository" is not defined in this listing; it is assumed to be supplied elsewhere in the application.
            IBlogService blogService = new BlogService(repository);
            var pages = blogService.GetAllPages();
            foreach (var page in pages)
            {
                var doc = new Document();
                string pageUrl = string.Concat("~/Blog/", page.PageViewName);
                string pagePath = System.IO.Path.Combine(AppDomain.CurrentDomain.BaseDirectory, "Views", "MyBlog", page.PageViewName + ".cshtml");
                // The URL and title are stored for display and indexed as single, unanalyzed terms.
                doc.Add(new Field("postUrl", pageUrl, Field.Store.YES, Field.Index.NOT_ANALYZED));
                doc.Add(new Field("postTitle", page.PageTitle, Field.Store.YES, Field.Index.NOT_ANALYZED));
                // Index the text of the view file rather than its path, so that searches match the page content.
                doc.Add(new Field("postBody", System.IO.File.ReadAllText(pagePath), Field.Store.YES, Field.Index.ANALYZED));
                writer.AddDocument(doc);
            }
            writer.Optimize();
            writer.Commit();
            writer.Dispose();
        }

        public static void FinalizeSearch()
        {
            directory.Dispose();
        }
    }
}

Let me break down the code and explain it. The lines below locate the folder where the Lucene.Net index is stored and instantiate the Analyzer and the IndexWriter. Passing true as the create argument makes the IndexWriter start from an empty index, so pages are not indexed a second time whenever the application restarts.

string directoryPath = System.IO.Path.Combine(AppDomain.CurrentDomain.BaseDirectory, "App_Data", "LuceneIndexes");
directory = FSDirectory.Open(directoryPath);
analyzer = new StandardAnalyzer(Lucene.Net.Util.Version.LUCENE_30);
writer = new IndexWriter(directory, analyzer, true, IndexWriter.MaxFieldLength.UNLIMITED);
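
How the IndexWriter is constructed decides whether an existing index is kept or replaced, which matters when indexing runs at every application start. The sketch below is only an illustration and not part of the application: it assumes the Lucene.Net 3.0.3 package, uses a RAMDirectory so that nothing touches the disk, and the names IndexWriterModes and IndexTwice, along with the sample URL, are hypothetical.

```csharp
using System;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Documents;
using Lucene.Net.Index;
using Lucene.Net.Store;

// Hypothetical helper for illustration; assumes Lucene.Net 3.0.3.
public static class IndexWriterModes
{
    // Runs the same one-document indexing pass twice against one directory
    // and returns how many documents the index holds afterwards.
    public static int IndexTwice(bool recreate)
    {
        using (var directory = new RAMDirectory())
        using (var analyzer = new StandardAnalyzer(Lucene.Net.Util.Version.LUCENE_30))
        {
            for (int run = 0; run < 2; run++)
            {
                // The three-argument constructor appends to an existing index;
                // passing create: true discards what is already there.
                using (var writer = recreate
                    ? new IndexWriter(directory, analyzer, true, IndexWriter.MaxFieldLength.UNLIMITED)
                    : new IndexWriter(directory, analyzer, IndexWriter.MaxFieldLength.UNLIMITED))
                {
                    var doc = new Document();
                    doc.Add(new Field("postUrl", "~/Blog/About", Field.Store.YES, Field.Index.NOT_ANALYZED));
                    writer.AddDocument(doc);
                    writer.Commit();
                }
            }

            // Open a read-only reader to count the documents in the index.
            using (var reader = IndexReader.Open(directory, true))
            {
                return reader.NumDocs();
            }
        }
    }

    public static void Main()
    {
        Console.WriteLine("append:   " + IndexWriterModes.IndexTwice(false));
        Console.WriteLine("recreate: " + IndexWriterModes.IndexTwice(true));
    }
}
```

With appending, the second pass leaves two copies of the same page in the index, while recreating leaves one. An application that re-indexes everything at startup would therefore either recreate the index or remove the old documents first.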

The fields to be indexed are added to a Document object, which is then added to the IndexWriter, as shown in the code below. The postBody field is filled with the text of the view file, not its path, so that searches match the page content.

var doc = new Document();
string pageUrl = string.Concat("~/Blog/", page.PageViewName);
string pagePath = System.IO.Path.Combine(AppDomain.CurrentDomain.BaseDirectory, "Views", "MyBlog", page.PageViewName + ".cshtml");
doc.Add(new Field("postUrl", pageUrl, Field.Store.YES, Field.Index.NOT_ANALYZED));
doc.Add(new Field("postTitle", page.PageTitle, Field.Store.YES, Field.Index.NOT_ANALYZED));
doc.Add(new Field("postBody", System.IO.File.ReadAllText(pagePath), Field.Store.YES, Field.Index.ANALYZED));
writer.AddDocument(doc);
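
The difference between Field.Index.ANALYZED and Field.Index.NOT_ANALYZED is easiest to see with an exact term lookup. The following is a minimal sketch, assuming the Lucene.Net 3.0.3 package; the class FieldIndexModes, the method CountExactTermHits, and the sample text are made up for illustration.

```csharp
using System;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Documents;
using Lucene.Net.Index;
using Lucene.Net.Search;
using Lucene.Net.Store;

// Hypothetical helper for illustration; assumes Lucene.Net 3.0.3.
public static class FieldIndexModes
{
    // Indexes one document whose title and body hold the same text, then
    // counts the documents containing exactly the given term in the field.
    public static int CountExactTermHits(string field, string term)
    {
        using (var directory = new RAMDirectory())
        using (var analyzer = new StandardAnalyzer(Lucene.Net.Util.Version.LUCENE_30))
        {
            using (var writer = new IndexWriter(directory, analyzer, true, IndexWriter.MaxFieldLength.UNLIMITED))
            {
                var doc = new Document();
                // NOT_ANALYZED keeps the whole value as one term, case included.
                doc.Add(new Field("postTitle", "Hello Lucene", Field.Store.YES, Field.Index.NOT_ANALYZED));
                // ANALYZED runs the value through the analyzer, which splits and lowercases it.
                doc.Add(new Field("postBody", "Hello Lucene", Field.Store.YES, Field.Index.ANALYZED));
                writer.AddDocument(doc);
                writer.Commit();
            }

            using (var searcher = new IndexSearcher(directory, true))
            {
                // TermQuery bypasses the analyzer and looks the term up exactly as given.
                return searcher.Search(new TermQuery(new Term(field, term)), 1).TotalHits;
            }
        }
    }

    public static void Main()
    {
        Console.WriteLine(CountExactTermHits("postTitle", "Hello Lucene"));
        Console.WriteLine(CountExactTermHits("postTitle", "hello"));
        Console.WriteLine(CountExactTermHits("postBody", "hello"));
    }
}
```

The unanalyzed title matches only its complete, case-sensitive value, while the analyzed body matches the individual lowercased word. If titles should be searchable by single words, Field.Index.ANALYZED would be the mode to use for postTitle as well.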

After the documents have been added to the index, the Optimize method of the IndexWriter is called to optimize the index for faster access. The Commit and Dispose methods are then called to save the index to the folder and close the IndexWriter.

writer.Optimize();
writer.Commit();
writer.Dispose();
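
Disposing the IndexWriter also releases the write lock it holds on the index, which is why the call should run even when indexing fails part of the way through. One way to guarantee that is a using block, sketched below under the assumption of the Lucene.Net 3.0.3 package; the class WriterLifetime and the method LockStates are hypothetical names, and a RAMDirectory stands in for the folder on disk.

```csharp
using System;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Documents;
using Lucene.Net.Index;
using Lucene.Net.Store;

// Hypothetical helper for illustration; assumes Lucene.Net 3.0.3.
public static class WriterLifetime
{
    // Returns two flags: whether the index was write-locked while the writer
    // was open, and whether it is still locked after the writer is disposed.
    public static bool[] LockStates()
    {
        using (var directory = new RAMDirectory())
        using (var analyzer = new StandardAnalyzer(Lucene.Net.Util.Version.LUCENE_30))
        {
            bool lockedWhileOpen;
            using (var writer = new IndexWriter(directory, analyzer, true, IndexWriter.MaxFieldLength.UNLIMITED))
            {
                var doc = new Document();
                doc.Add(new Field("postUrl", "~/Blog/About", Field.Store.YES, Field.Index.NOT_ANALYZED));
                writer.AddDocument(doc);
                writer.Optimize();
                writer.Commit();
                lockedWhileOpen = IndexWriter.IsLocked(directory);
            } // Dispose runs here, including when an exception leaves the block early.

            bool lockedAfterDispose = IndexWriter.IsLocked(directory);
            return new[] { lockedWhileOpen, lockedAfterDispose };
        }
    }

    public static void Main()
    {
        bool[] states = LockStates();
        Console.WriteLine("locked while open:    " + states[0]);
        Console.WriteLine("locked after dispose: " + states[1]);
    }
}
```

The same pattern applies to an FSDirectory, where a writer that is never disposed can leave the index locked against the next writer.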

Creating the query and running the search is relatively easy once the index has been created. The LuceneIndexes directory needs to be opened once again using FSDirectory, and the resulting directory is passed to the constructor of an IndexSearcher. A QueryParser is then created to parse the search words into a Query, which is passed to the Search method of the IndexSearcher. A POCO class holds the results of the search, which are in turn passed to the view for further processing. The code is shown below.

    public ActionResult Search(FormCollection formCollection)
    {
        var searchResults = new List<SearchResult>();
        string query = formCollection["search"];
        // Nothing to search for: return an empty result list instead of letting the parser throw.
        if (string.IsNullOrWhiteSpace(query))
        {
            return View(searchResults);
        }

        string indexDirectory = Server.MapPath("~/App_Data/LuceneIndexes");
        // Dispose the directory, analyzer, and searcher so that index files are released after each request.
        using (var directory = FSDirectory.Open(indexDirectory))
        using (var analyzer = new StandardAnalyzer(Lucene.Net.Util.Version.LUCENE_30))
        using (var searcher = new IndexSearcher(directory))
        {
            var parser = new QueryParser(Lucene.Net.Util.Version.LUCENE_30, "postBody", analyzer);
            Query searchQuery = parser.Parse(query);
            TopDocs hits = searcher.Search(searchQuery, 200);
            // ScoreDocs holds at most the 200 requested hits, while TotalHits can be larger, so iterate over ScoreDocs.
            foreach (ScoreDoc hit in hits.ScoreDocs)
            {
                Document doc = searcher.Doc(hit.Doc);
                searchResults.Add(new SearchResult
                {
                    PostUrl = doc.Get("postUrl"),
                    PostTitle = doc.Get("postTitle"),
                    // "postIntro" is not added by the indexing code above, so this is null unless such a field is stored.
                    PostIntro = doc.Get("postIntro")
                });
            }
        }
        return View(searchResults);
    }

    public class SearchResult
    {
        public string PostUrl { get; set; }

        public string PostTitle { get; set; }

        public string PostIntro { get; set; }
    }
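
Text typed into a search box can contain characters that the query syntax treats as operators, such as an unbalanced parenthesis, and QueryParser.Parse then throws a ParseException. The sketch below shows one possible way to handle that, by retrying with the input escaped through QueryParser.Escape. It assumes the Lucene.Net 3.0.3 package and runs against a small in-memory index; the class SafeSearchSketch, its methods, and the two sample posts are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Documents;
using Lucene.Net.Index;
using Lucene.Net.QueryParsers;
using Lucene.Net.Search;
using Lucene.Net.Store;

// Hypothetical helper for illustration; assumes Lucene.Net 3.0.3.
public static class SafeSearchSketch
{
    // Indexes two sample posts in memory and returns the titles whose body matches userInput.
    public static List<string> Search(string userInput)
    {
        var titles = new List<string>();
        if (string.IsNullOrWhiteSpace(userInput))
        {
            return titles;
        }

        using (var directory = new RAMDirectory())
        using (var analyzer = new StandardAnalyzer(Lucene.Net.Util.Version.LUCENE_30))
        {
            using (var writer = new IndexWriter(directory, analyzer, true, IndexWriter.MaxFieldLength.UNLIMITED))
            {
                AddPost(writer, "Getting started", "Adding full text search to a web application");
                AddPost(writer, "Deployment notes", "Publishing the site to a server");
                writer.Commit();
            }

            var parser = new QueryParser(Lucene.Net.Util.Version.LUCENE_30, "postBody", analyzer);
            Query query;
            try
            {
                query = parser.Parse(userInput);
            }
            catch (ParseException)
            {
                // The input is not valid query syntax: treat it as literal text instead.
                query = parser.Parse(QueryParser.Escape(userInput));
            }

            using (var searcher = new IndexSearcher(directory, true))
            {
                foreach (ScoreDoc hit in searcher.Search(query, 10).ScoreDocs)
                {
                    titles.Add(searcher.Doc(hit.Doc).Get("postTitle"));
                }
            }
        }
        return titles;
    }

    private static void AddPost(IndexWriter writer, string title, string body)
    {
        var doc = new Document();
        doc.Add(new Field("postTitle", title, Field.Store.YES, Field.Index.NOT_ANALYZED));
        doc.Add(new Field("postBody", body, Field.Store.NO, Field.Index.ANALYZED));
        writer.AddDocument(doc);
    }

    public static void Main()
    {
        foreach (string title in Search("(search"))
        {
            Console.WriteLine(title);
        }
    }
}
```

Falling back to the escaped text keeps the query syntax available to users who type valid queries, at the cost of parsing twice when the input is malformed. Escaping every input up front would be simpler but would turn operators such as quotes into literal text.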

In this blog post, I have demonstrated how Lucene.Net can quickly be used in an ASP.NET MVC application to provide full-text search.

Published at DZone with permission of Ayobami Adewole, DZone MVB. See the original article here.
