Apache Lucene and Solr 3.6 Release! New Language Analysis, Joins, and Finite-State APIs

By Robert Muir · Apr. 13, 2012


Lucene/Solr 3.6 has been released and is available for download.

As release manager, here’s my take on the new features:

  • Language analysis:
    • Newly added morphological analysis and part-of-speech tagger for the Japanese language, geared toward search, contributed by Christian Moen.
    • CJK analysis improvements inspired by the folks at HathiTrust, who are indexing terabytes of text across hundreds of languages with Apache Solr. Their blog is worth reading if you are interested in large-scale search challenges.
    • Lucene/Solr analysis for many languages was tuned and simplified. For instance, to get started with the new Japanese capabilities described above, simply use the text_ja field type defined in Solr’s example schema.xml. We configured this for 30 languages out of the box.
  • Joins:
    • Ability to do index-time block-joins in the opposite direction, useful when you have indexed a parent-child relationship already but sometimes want ungrouped child documents as the result.
    • Addition of query-time joins: an alternative when index-time joins are not feasible.
    • Important bugfixes to index-time joins.
  • Auto-suggest and finite-state APIs:
    • New Weighted FST suggester that offers more fine-grained ranking for suggestions.
    • FST APIs were extended to support reverse-lookups for monotonically increasing outputs, and support n-shortest-path algorithms by weight.
    • Improved suggester API that exploits our incremental automata construction to build suggester FSTs from huge amounts of data.
    • FST compression support, based on research by Lucene/Solr committer Dawid Weiss.
    • Additions to Apache Solr for easier integration of phrase-based auto-suggest, e.g. for previous phrases recorded from query logs.
  • Miscellaneous:
    • A new index pruning module with configurable policies supports faster and smaller indexes that give similar relevance to a complete index.
    • Added a phonetic analysis module for sounds-like search. It supports several algorithms and languages via Apache’s commons-codec project.
    • Performance improvements for index splitter tools.
  • Solr improvements:
    • Better defaults and configuration for multi-term queries. Queries such as wildcard queries have better interaction with the analysis chain, especially regarding case- or accent-insensitivity.
    • Distributed date and number range-faceting support.
    • Improved concurrency control for distributed search.
    • SolrJ support for latest HttpComponents release.
    • Clustering improvements: new support for clustering multilingual search results and for clustering on multiple fields.
    • Upgraded Tika integration to 1.0, with improved RTF, Word, and PDF parsing support.
  • Highlighting improvements:
    • A new HTMLStripCharFilter implementation that is faster and more reliable at matching result snippets to the underlying raw HTML.
    • Performance improvements for FastVectorHighlighter.
    • Bugfixes to many analysis components that would cause corner-case highlighting bugs.
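
To make the query-time join idea from the Joins section a little more concrete, here is a minimal Python sketch. It is not Lucene or Solr code and does not use their APIs; it only illustrates the general two-pass idea, assuming a toy in-memory "index" of dicts. The field names (id, type, name, product_id, text) and the function query_time_join are hypothetical, invented for this example.

```python
from typing import Callable, Dict, List

Doc = Dict[str, str]

# Toy "index": every document is a flat dict. All field names are
# hypothetical and exist only for this illustration.
DOCS: List[Doc] = [
    {"id": "p1", "type": "product", "name": "espresso machine"},
    {"id": "p2", "type": "product", "name": "tea kettle"},
    {"id": "r1", "type": "review", "product_id": "p1", "text": "great crema"},
    {"id": "r2", "type": "review", "product_id": "p2", "text": "boils fast"},
    {"id": "r3", "type": "review", "product_id": "p1", "text": "noisy pump"},
]


def query_time_join(
    docs: List[Doc],
    from_query: Callable[[Doc], bool],
    from_field: str,
    to_field: str,
) -> List[Doc]:
    """Return docs whose to_field matches a from_field value of any doc
    accepted by from_query. Nothing special is required at index time."""
    # Pass 1: run the "from" query and collect the join keys it yields.
    keys = {d[from_field] for d in docs if from_field in d and from_query(d)}
    # Pass 2: keep every document whose "to" field holds one of those keys.
    return [d for d in docs if d.get(to_field) in keys]


# Products that have at least one review mentioning "crema".
products = query_time_join(
    DOCS, lambda d: "crema" in d.get("text", ""), "product_id", "id"
)

# The opposite direction: reviews of products whose name mentions "kettle".
reviews = query_time_join(
    DOCS, lambda d: "kettle" in d.get("name", ""), "id", "product_id"
)
```

With the toy data above, products contains only the document with id p1 and reviews contains only r2. The trade-off this sketch hints at is that the relationship is resolved at search time, so documents can be indexed independently, whereas an index-time block join needs parents and children indexed together.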

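The weighted auto-suggest idea can be sketched in a similar spirit. The Python below is a toy, not the Lucene suggester: it swaps the FST for a sorted list with binary search, and the class name WeightedSuggester, the sample phrases, and the weights (imagined query-log counts) are all assumptions made for illustration. What it does show is the behavior the feature is about: completions of a prefix are ranked by weight rather than alphabetically.

```python
import bisect
import heapq
from typing import Iterable, List, Tuple

Entry = Tuple[str, int]  # (phrase, weight)


class WeightedSuggester:
    """Toy prefix suggester backed by a sorted list, not an FST."""

    def __init__(self, entries: Iterable[Entry]) -> None:
        # Sorting by phrase makes all completions of a prefix contiguous.
        self._entries: List[Entry] = sorted(entries)
        self._keys: List[str] = [phrase for phrase, _ in self._entries]

    def lookup(self, prefix: str, k: int) -> List[Entry]:
        """Return up to k completions of prefix, highest weight first."""
        # Binary search for the first phrase that is >= the prefix.
        lo = bisect.bisect_left(self._keys, prefix)
        hi = lo
        # Walk forward while phrases still start with the prefix.
        while hi < len(self._keys) and self._keys[hi].startswith(prefix):
            hi += 1
        # Pick the k heaviest entries from the matching range.
        return heapq.nlargest(k, self._entries[lo:hi], key=lambda e: e[1])


# Hypothetical phrases with weights, e.g. counts taken from a query log.
suggester = WeightedSuggester(
    [
        ("lucene", 50),
        ("lucene revolution", 20),
        ("lucid", 5),
        ("solr", 40),
        ("solr cloud", 45),
    ]
)
top_luc = suggester.lookup("luc", 2)
top_solr = suggester.lookup("solr", 1)
```

Here top_luc holds "lucene" followed by "lucene revolution", and top_solr holds "solr cloud", because it outweighs the shorter "solr". A real FST-based implementation shares prefixes and suffixes between entries, so it stays compact on far larger inputs than a plain list would.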


If you want to hear more about these features, many of the committers who worked on them will be giving talks at Lucene Revolution in Boston, including:

  • Mark Miller will explain the SolrCloud architecture for distributed indexing.
  • Grant Ingersoll will tie together Solr, Hadoop, and Mahout.
  • Martijn van Groningen will be giving a talk about grouping and join features.
  • Erick Erickson will talk about SolrCloud from the user perspective.
  • Christian Moen will be giving a talk introducing Lucene/Solr’s new Japanese language capabilities.
  • Andrzej Bialecki will share adventures into Lucene 4.0’s codec APIs, including updateable fields.
  • Simon Willnauer will discuss some of the challenges of implementing high-performance search in Java.
  • Uwe Schindler will be talking about the refactoring of the upcoming Lucene 4.0 IndexReader API.
  • Mike McCandless and I will discuss current and future improvements related to finite-state technology.
  • Chris Hostetter will play the chump; please try to stump him with your questions!


Hope to see you there!


Published at DZone with permission of Robert Muir. See the original article here.
