How to Use the Apache OpenNLP POS Tagger

The Apache OpenNLP POS tagger marks up text with part-of-speech tags so that it can be processed further by natural language processing (NLP) tools. Read on to learn how to use it!

As per Wikipedia, POS tagging is "the process of marking up a word in a text (corpus) as corresponding to a particular part of speech, based on both its definition and its context — i.e. its relationship with adjacent and related words in a phrase, sentence, or paragraph. A simplified form of this is commonly taught to school-age children, in the identification of words as nouns, verbs, adjectives, adverbs, etc."

To begin, the text is tokenized: it is split into tokens (words and punctuation), and a trained model then assigns each token a part-of-speech tag for further NLP processing. POS tagging is a basic pre-processing step for tasks such as text retrieval and text indexing. An Apache OpenNLP tokenization example is available in a separate article.
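To make the tokenization step concrete, here is a minimal plain-Java sketch. It is an illustration only, not OpenNLP's code: `whitespaceTokenize` mimics what OpenNLP's `WhitespaceTokenizer` does (split on whitespace only), while `punctuationAwareTokenize` is a hypothetical regex-based alternative that gives punctuation its own tokens, roughly in the spirit of OpenNLP's `SimpleTokenizer`. The class and method names are made up for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical plain-Java sketch of tokenization; NOT OpenNLP's implementation.
final class TokenizerSketch {

    // Splits on runs of whitespace only, so punctuation stays attached
    // to the neighbouring word (e.g. "coding." remains a single token).
    static String[] whitespaceTokenize(String text) {
        String trimmed = text.trim();
        return trimmed.isEmpty() ? new String[0] : trimmed.split("\\s+");
    }

    // A run of letters, a run of digits, or a single punctuation character.
    private static final Pattern TOKEN =
            Pattern.compile("\\p{L}+|\\p{N}+|[^\\p{L}\\p{N}\\s]");

    // Emits letter runs, digit runs, and each punctuation mark as separate tokens.
    static String[] punctuationAwareTokenize(String text) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(text);
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens.toArray(new String[0]);
    }

    public static void main(String[] args) {
        String sentence = "Otri is from Mars and she loves coding.";
        // Otri | is | from | Mars | and | she | loves | coding.
        System.out.println(String.join(" | ", whitespaceTokenize(sentence)));
        // Otri | is | from | Mars | and | she | loves | coding | .
        System.out.println(String.join(" | ", punctuationAwareTokenize(sentence)));
    }
}
```

The difference matters for tagging: the tagger sees exactly the tokens it is given, so with whitespace tokenization a trailing period arrives glued to the last word.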

To get started with OpenNLP tagging, first include the following dependency in the pom.xml file:

<dependency>
    <groupId>org.apache.opennlp</groupId>
    <artifactId>opennlp-tools</artifactId>
    <version>1.8.1</version>
</dependency>

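OpenNLP provides a pre-trained English model, en-pos-maxent.bin, for POS tagging. The model file is not bundled with the opennlp-tools artifact, so download it separately and place it on the classpath (for example, in src/main/resources). The following code loads the model and creates a tagger from it: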

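import java.io.IOException;
import java.io.InputStream;
import opennlp.tools.postag.POSModel;
import opennlp.tools.postag.POSTaggerME;

private POSModel model;      // pre-trained maxent model
private POSTaggerME tagger;  // tagger built from the model
public void initialize() {
 try (InputStream modelStream = getClass().getResourceAsStream("/en-pos-maxent.bin")) {
  if (modelStream == null) {
   throw new IOException("en-pos-maxent.bin not found on the classpath");
  }
  model = new POSModel(modelStream);
  tagger = new POSTaggerME(model);
 } catch (IOException e) {
  System.out.println(e.getMessage());
 }
}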

Once the tagger is initialized, we tokenize the sentence and pass the tokens to the tagger, which returns one tag per token. Here is an example:

import opennlp.tools.tokenize.WhitespaceTokenizer;

public void tag(String sentence) {
 // Load the model only once; reloading it on every call is expensive
 if (tagger == null) {
  initialize();
 }
 if (tagger == null) {
  return; // initialize() could not load the model
 }
 // Split the sentence into tokens on whitespace
 String[] tokens = WhitespaceTokenizer.INSTANCE.tokenize(sentence);
 // tag() returns one POS tag per token, in the same order
 String[] tags = tagger.tag(tokens);
 for (int i = 0; i < tokens.length; i++) {
  System.out.print(tags[i] + ":" + tokens[i] + "  ");
 }
 System.out.println();
}
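The printing loop in tag() couples formatting with console output. As a sketch of one possible refactoring (the `TagFormatter` class and its `format` method are hypothetical, not part of OpenNLP), the pairing of tokens and tags can live in a small pure-Java helper that is easy to unit-test without loading a model:

```java
import java.util.StringJoiner;

// Hypothetical helper: pairs each token with its tag in the "TAG:word"
// format used above. Pure Java, so it can be tested without a model file.
final class TagFormatter {

    static String format(String[] tokens, String[] tags) {
        // The tagger returns exactly one tag per token, so a length mismatch
        // means the two arrays did not come from the same tagging call.
        if (tokens.length != tags.length) {
            throw new IllegalArgumentException("expected one tag per token: "
                    + tokens.length + " tokens, " + tags.length + " tags");
        }
        StringJoiner out = new StringJoiner(" ");
        for (int i = 0; i < tokens.length; i++) {
            out.add(tags[i] + ":" + tokens[i]);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String[] tokens = {"Otri", "is", "from", "Mars"};
        String[] tags = {"NNP", "VBZ", "IN", "NNP"};
        System.out.println(format(tokens, tags)); // NNP:Otri VBZ:is IN:from NNP:Mars
    }
}
```

Inside tag(), the loop could then be replaced by a single `System.out.println(TagFormatter.format(...))` call on the token and tag arrays.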

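For a sentence such as "Otri is from Mars and she loves coding", the output is similar to the following. Each token is printed as TAG:word, using Penn Treebank tags (for example, NNP for a proper noun and VBZ for a third-person singular present-tense verb):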

NNP:Otri VBZ:is IN:from NNP:Mars CC:and PRP:she VBZ:loves .:coding.
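The tags in this output are fine-grained Penn Treebank tags. If an application only needs the coarse categories taught in school (noun, verb, adjective, and so on), a small post-processing step can collapse them. The sketch below is a hypothetical helper, not an OpenNLP API, and its mapping covers only the common tag families; anything else falls through to "other".

```java
// Hypothetical post-processing step: collapses fine-grained Penn Treebank
// tags into coarse school-grammar categories. Not part of OpenNLP.
final class CoarsePos {

    static String coarse(String pennTag) {
        if (pennTag.startsWith("NN")) return "noun";       // NN, NNS, NNP, NNPS
        if (pennTag.startsWith("VB")) return "verb";       // VB, VBD, VBG, VBN, VBP, VBZ
        if (pennTag.startsWith("JJ")) return "adjective";  // JJ, JJR, JJS
        if (pennTag.startsWith("RB")) return "adverb";     // RB, RBR, RBS
        if (pennTag.startsWith("PRP") || pennTag.startsWith("WP")) {
            return "pronoun";                              // PRP, PRP$, WP, WP$
        }
        if (pennTag.equals("IN")) return "preposition";    // also subordinating conjunctions
        if (pennTag.equals("CC")) return "conjunction";    // coordinating conjunctions
        return "other";                                    // punctuation, determiners, ...
    }

    public static void main(String[] args) {
        // Tokens and tags taken from the sample output above
        String[] tokens = {"Otri", "is", "from", "Mars", "and", "she", "loves"};
        String[] tags = {"NNP", "VBZ", "IN", "NNP", "CC", "PRP", "VBZ"};
        for (int i = 0; i < tokens.length; i++) {
            System.out.println(tokens[i] + " -> " + coarse(tags[i]));
        }
    }
}
```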

And that's it! Next time, we'll look at the Stanford NLP POS tagger with Maven.

Published at DZone with permission of Dhiraj Ray. See the original article here.