OpenNLP Named Entity Recognition

Learn about named entity recognition to extract a named entity from a text with OpenNLP in a Java project using pre-trained model files.

In this article, we will discuss how to extract named entities from text using Apache OpenNLP. We will create a sample Maven-based Java project and configure OpenNLP in it. We will use pre-trained model files such as en-ner-location.bin, en-ner-person.bin, and en-ner-organization.bin, which OpenNLP provides for this purpose.

Named-entity recognition (NER) is a subtask of information extraction that seeks to locate and classify named entities in text into predefined categories such as the names of persons, organizations, locations, expressions of times, quantities, monetary values, percentages, etc.

The very first requirement to get started with NER is to download the required model files from the OpenNLP models download page. I downloaded en-ner-location.bin, en-ner-person.bin, and the tokenizer model en-token.bin and kept them in my local workspace under the /resources folder.

The next step is to update the Maven dependencies required for this setup.

<dependency>
  <groupId>org.apache.opennlp</groupId>
  <artifactId>opennlp-tools</artifactId>
  <version>1.8.1</version>
</dependency>

OpenNLP provides a common way to detect all of these entity types. First, we load a pre-trained model file as a stream and use it to instantiate a TokenNameFinderModel object.

Let's get started with en-ner-person.bin.

InputStream inputStream = getClass().getResourceAsStream("/en-ner-person.bin");
TokenNameFinderModel model = new TokenNameFinderModel(inputStream);
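
One thing to watch with this snippet: getResourceAsStream() returns null when the file is not on the classpath, and the stream is never closed. A minimal sketch of a more defensive loader follows, using only the JDK. The class name ModelResources and the method open are hypothetical helpers for illustration, not part of OpenNLP.

```java
import java.io.FileNotFoundException;
import java.io.InputStream;

// Hypothetical helper (not part of OpenNLP): opens a classpath resource
// and fails with a clear message instead of returning null.
class ModelResources {

    static InputStream open(String resourcePath) throws FileNotFoundException {
        InputStream in = ModelResources.class.getResourceAsStream(resourcePath);
        if (in == null) {
            throw new FileNotFoundException("Model not found on classpath: " + resourcePath);
        }
        return in;
    }

    public static void main(String[] args) throws Exception {
        // Intended use (assumes opennlp-tools is on the classpath):
        //   try (InputStream in = ModelResources.open("/en-ner-person.bin")) {
        //       TokenNameFinderModel model = new TokenNameFinderModel(in);
        //   }
        try (InputStream in = open("/en-ner-person.bin")) {
            System.out.println("Model file found");
        } catch (FileNotFoundException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

The try-with-resources block closes the stream once the model has been read, which is safe because the model object does not need the stream afterwards.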

After the model is loaded, we instantiate the NameFinderME class and call its find() method to locate the entities. The method takes the tokens of a text rather than the raw string, so we first need to tokenize the text. You can visit my other post about OpenNLP tokenization to learn more about tokenization. Following is an example that extracts person names from tokens.

NameFinderME nameFinder = new NameFinderME(model);
// tokenize() stands for whichever tokenizer you use; it is not defined in this post
String[] tokens = tokenize(paragraph);

Span[] nameSpans = nameFinder.find(tokens);
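
The tokenize(paragraph) helper is not shown in the article. As a rough stand-in, assuming no tokenizer model is at hand, the sketch below splits text into runs of letters, runs of digits, and single punctuation characters using only the JDK. The class name SimpleSplit is hypothetical, and a trained OpenNLP tokenizer will generally handle abbreviations and contractions better.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-in for the article's tokenize() helper (JDK only).
class SimpleSplit {

    // A token is a run of letters, a run of digits, or one other non-space character.
    private static final Pattern TOKEN = Pattern.compile("\\p{L}+|\\p{N}+|\\S");

    static String[] tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(text);
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens.toArray(new String[0]);
    }

    public static void main(String[] args) {
        String[] tokens = tokenize("John Smith lives in New York.");
        // prints: John|Smith|lives|in|New|York|.
        System.out.println(String.join("|", tokens));
    }
}
```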

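The find() method above returns an array of Span objects. Each span holds the start index (inclusive) and end index (exclusive) of the tokens that make up one entity, so a name can cover more than one token. To get the actual text of the named entity, we loop over the spans and join the tokens in each range. Following is an example to read each span and extract the named entity.

 for (Span s : nameSpans) {
     // Join tokens from getStart() (inclusive) to getEnd() (exclusive); needs java.util.Arrays
     System.out.println(String.join(" ", Arrays.copyOfRange(tokens, s.getStart(), s.getEnd())));
 }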
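
To make the mapping from spans to text concrete without needing the library or model files, the sketch below uses a small stand-in class, TokenSpan. It is hypothetical and only mimics the start, end, and type values that OpenNLP's Span exposes through getStart(), getEnd(), and getType(). The token and span values are made up for illustration. With the real library, Span.spansToStrings(nameSpans, tokens) does the same joining.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class SpanText {

    // Hypothetical stand-in for opennlp.tools.util.Span: covers tokens [start, end).
    static final class TokenSpan {
        final int start;
        final int end;
        final String type;

        TokenSpan(int start, int end, String type) {
            this.start = start;
            this.end = end;
            this.type = type;
        }
    }

    // Joins the tokens covered by each span and prefixes the entity type.
    static List<String> entities(String[] tokens, TokenSpan[] spans) {
        List<String> out = new ArrayList<>();
        for (TokenSpan s : spans) {
            String text = String.join(" ", Arrays.copyOfRange(tokens, s.start, s.end));
            out.add(s.type + ": " + text);
        }
        return out;
    }

    public static void main(String[] args) {
        String[] tokens = {"John", "Smith", "met", "Mary", "in", "Paris", "."};
        TokenSpan[] spans = {
            new TokenSpan(0, 2, "person"), // covers "John" and "Smith"
            new TokenSpan(3, 4, "person")  // covers "Mary" only
        };
        for (String line : entities(tokens, spans)) {
            System.out.println(line);
        }
    }
}
```

Reading only tokens[start] would report "John" for the first span and drop the surname, which is why the whole range is joined.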

This will print the names of any persons found in the text. Similarly, we can load en-ner-location.bin or en-ner-organization.bin and follow the same approach to extract location and organization names from any text.

This article has been all about named entity recognition using OpenNLP in a Java project. In the next article, we will look into named entity recognition using Stanford NLP.

Topics:
ai, nlp, entity recognition, opennlp, tutorial

Published at DZone with permission of
