Hibernate Search expands the well known Hibernate and JPA annotations to let you define how your Java POJOs should be indexed.
An entity will be indexed if it’s marked with the @Indexed annotation; the second step will be to pick properties for index inclusion via the @Field annotation; an entry in the Lucene index is represented by a Lucene Document, which has several named fields. By default an entity property will be mapped to a Lucene Field with the same name.
Basic mapping example
import java.time.LocalDate;
import javax.persistence.Entity;
import javax.persistence.Id;
import org.hibernate.search.annotations.Field;
import org.hibernate.search.annotations.Indexed;
@Entity
@Indexed
public class Explorer {
@Id int id;
@Field String name;
@Field LocalDate dateOfBirth;
}
The @Indexed annotation allows - among others - to pick an index name.
The @Field annotation has several interesting attributes:
- name: to customize the name of the Field in the Lucene Document.
- store: allows to store the value in the index, allowing loading over Projection
- analyze: when disabled, the value will be treated as a single keyword rather than being processed by the Analyzer chain (see Analyzers).
- bridge: allows customizing the encoding/decoding to and from the index format. Not needed for all basic Java types and dates, but required for custom types.
Make it sortable
If you want to be able to sort on a field, you have to prepare the index format for the operation using the @Sortable field:
@Entity
@Indexed
public class Explorer {
@Id int id;
@SortableField
@Field(store=Store.YES)
String name;
@Field LocalDate dateOfBirth;
}
Indexing spatial locations
Using the @Spatial annotation, you can index pairs of coordinates to search for entries within a distance from a point, or sort by distance from a point.
This feature should not be confused with Hibernate Spatial, which integrates with RDBMs based types; the Hibernate Search support for Spatial integration is limited to distances from points but shines when this factor has to be combined with a full text query.
For example you might want to search for a restaurant by approximate name and by not being too far away:
@Indexed @Spatial @Entity
public class Restaurant {
@Id id;
@Field name;
@Latitude
Double latitude;
@Longitude
Double longitude;
}
The Spatial feature of Hibernate Search is extensive and we suggest reading the reference documentation (Chapter 9) to better understand its features. For example, it can use different indexing techniques, you can have multiple sets of spatial coordinates per entity, or represent points by implementing the org.hibernate.search.spatial.Coordinates interface.
Index associations
The index technology is not a good fit for “join queries”, and generally to represent associations. Hibernate Search allows to embed fields from related entities into its main document, but remember that while these fields are useful to narrow down your queries it is not a true association: you’re in practice creating a denormalized representation of your object graph.
In the following example, the Movie entity will generate an index having two indexed fields: “title” and “actors.name”, in addition to the id field which is always indexed. The “actors.name” field might be repeated multiple times in the index, and therefore match simultaneously various different names.
@Indexed @Entity
public class Movie {
@Id @GeneratedValue private Long id;
@Field private String title;
@ManyToMany
@IndexedEmbedded
private Set<Actor> actors;
...
}
@Entity
public class Actor {
@Id @GeneratedValue private Integer id;
@Field private String name;
@ManyToMany(mappedBy=”actors”)
@ContainedIn
private Set<Movie> movies;
...
}
Quick reference of all Hibernate Search annotations
Table of all Hibernate Search annotations - the package name is “org.hibernate.search.annotations” :
@AnalyzerDef |
Define an Analyzer and associate it with an Analyzer name. Requires a Tokenizer, and optionally CharFilters and TokenFilters, and a unique name to bind to. |
@AnalyzerDefs |
Contains multiple @AnalyzerDef annotations. |
@AnalyzerDiscriminator |
Used to choose an Analyzer dynamically based on an instance’s state. |
@Analyzer |
Defines which Analyzer shall be used; can be applied on the entity, a method or attribute, or within a @Field annotation. |
@Boost |
Statically “boost” a property or an entity, to have more (or less) influence on the score during queries. |
@CalendarBridge |
Set indexing options on a property of type Calendar. |
@CharFilterDef |
Create a character filter definition; used as parameter of @AnalyzerDef. You can list multiple character filters, they will be applied in order. |
@ClassBridge |
Customizes the conversion process from entity to indexed fields by implementing a “org.hibernate.search.bridge.FieldBridge”. |
@ClassBridges |
Contains multiple @ClassBridge annotations. |
@ContainedIn |
Marks the inverse association of an @IndexedEmbedded relation. |
@DateBridge |
Set indexing options on a property of type Date. |
@DocumentId |
Override the property being used as primary identifier in the index. Defaults to the property with JPA’s @Id. |
@DynamicBoost |
Similar to @Boost but allows dynamic selection of a boost value, based on the instance properties. |
@Facet |
Marks a property to allow faceting queries. |
@Facets |
Contains multiple @Facet annotations. |
@FieldBridge |
Specifies a custom FieldBridge implementation class to use to convert a property type into its index representation. |
@Field |
Marks an entity property to be indexed. Supports various options to customize the indexing format. |
@Fields |
Contains multiple @Field annotations. Indexes the annotated property multiple times: each @Field shall have a different name and can have different options. |
@FullTextFilterDef |
Define a filter, which can be applied to full-text queries. |
@FullTextFilterDefs |
Define multiple filters by repeating @FullTextFilterDef |
@IndexedEmbedded |
Recurses into associations to include indexed fields in the parent index document. Allows a prefix name, and strategies to limit the recursion. Often to be used with @ContainedIn on the inverse side of the relation. |
@Indexed |
Marks which entities shall be indexed; allows to override the index name. Only @Indexed entities can be searched. |
@Latitude |
Marks a numeric property as being the “latitude” component of a coordinates pair for a spatial index. |
@Longitude |
Marks a numeric property as being the “longitude” component of a coordinates pair for a spatial index. |
@NumericField |
Applies on a property having a numeric value, allows customizing the index format such as the precision of its representation. |
@NumericFields |
Allows indexing a single numeric property into multiple different numeric index fields, and customize each, by listing several @NumericField annotations. |
@Parameter |
Used to pass an initialization parameter to tokenizers and filters in the scope of an @AnalyzerDef, or to pass parameters to your custom bridge implementations in the scope of a @FieldBridge. |
@SortableField |
Marks a property so that it will be indexed in such way to allow efficient sorting on it. |
@SortableFields |
When a property is indexed generating multiple fields, this allows to choose multiple of these fields to be good candidates for sorting. Container for @SortableField. |
@Spatial |
Defines a named couple of coordinates. You can have it on an entity, or on a getter returning an implementation of org.hibernate.search.spatial.Coordinates |
@Spatials |
Used to define multiple @Spatial pairs of coordinates; assign a unique name to each of them. |
@TikaBridge |
Applied on an indexed property it will be treated as a resource processed with Apache Tika to extract meaningful text from it, for example metadata from music or plain text from office documents. The property can point to the resource (String containing a path, URI) or contain the resource (byte[], java.sql.Blob). |
@TokenFilterDef |
Defines the Token Filter(s) in an @AnalyzerDef. You can list multiple filters, they will be applied in order. |
@TokenizerDef |
Chooses the Tokenizer to be used in an @AnalyzerDev. |
A simple Analyzer, for example, could apply this process:The Analyzer is the Lucene component that processes a stream of text into signals of interest to be stored in the index.
- split the input string on any whitespace character
- lowercase each of these tokens
- replace accented characters with the simpler form
The problem might seem simple when limited to whitespace chunking and lowercasing, but your application can provide much better results - and therefore be much more useful - if you adjust your analyzer chain to your specific needs.
For example you might want to use N-Grams to minimize the effect of user typos, an exotic language might require special handling for some lower casing, you might want to highlight specific keywords, cleanup HTML tags, replace each generic noun with a canonical synonym, expand domain specific acronyms, apply stemming.
While an internet search is targeting a wide audience and indexing disparate content, when building your own application you can embed it with domain specific, specialized search engine just by picking the right analyzers.
The Apache Lucene project documentation provides a great introduction to the Analyzer chain here: http://lucene.apache.org/core/5_3_0/core/org/apache/lucene/analysis/package-summary.html
Do not be scared by the examples of how to use an Analyzer though, as with Hibernate Search you merely have to pick the Tokenizer(s) and Filter(s) implementations you want to use.
Let’s now make a fairly complex example. We will define two analyzers, named “en” and “de”. We define these using the @AnalyzerDef annotation on an indexed entity, and this makes both Analyzers available to Hibernate Search, and we can refer to it by name:
@Entity
@Indexed
@AnalyzerDefs({
@AnalyzerDef(name = "en",
tokenizer = @TokenizerDef(factory = StandardTokenizerFactory.class),
filters = {
@TokenFilterDef(factory = LowerCaseFilterFactory.class),
@TokenFilterDef(factory = SnowballPorterFilterFactory.class, params = {
@Parameter(name = "language", value = "English")
})
}),
@AnalyzerDef(name = "de",
tokenizer = @TokenizerDef(factory = StandardTokenizerFactory.class),
filters = {
@TokenFilterDef(factory = LowerCaseFilterFactory.class),
@TokenFilterDef(factory = GermanStemFilterFactory.class)
})
})
public class Article {
private Integer id;
private String language;
private String text;
@Id @GeneratedValue
public Integer getId() {
return id;
}
public void setId(Integer id) {
this.id = id;
}
@Field(store = Store.YES)
@AnalyzerDiscriminator(impl = LanguageDiscriminator.class)
public String getLanguage() {
return language;
}
public void setLanguage(String language) {
this.language = language;
}
@Field(store = Store.YES)
public String getText() {
return text;
}
...
}
The “en” Analyzer is defined as: tokenize all input using the StandardTokenizerFactory, then process it with LowerCaseFilterFactory and finally process it with the SnowballPorterFilterFactory, initialized with the “language” parameter set to “English”.
The “de” Analyzer is similar as it also begins with a StandardTokenizerFactory followed by lower casing thanks to LowerCaseFilterFactory, but is then processed by the GermanStemFilterFactory, which requires no parameters.
In the above example, we also introduced the rather advanced annotation @AnalyzerDiscriminator. This annotation refers to a custom implementation of org.hibernate.search.analyzer.Discriminator; a suitable implementation for the above example would have to return either “de” or “en” and this would allow the Article entity to be processed with the corresponding Analyzer depending on its language property:
public class LanguageDiscriminator implements Discriminator {
@Override
public String getAnalyzerDefinitionName(Object value, Object entity, String field) {
if ( value == null || !( entity instanceof Article ) ) {
return null;
}
return (String) value;
}
}
{{ parent.title || parent.header.title}}
{{ parent.tldr }}
{{ parent.linkDescription }}
{{ parent.urlSource.name }}