
How to Use the Apache Open NLP POS Tagger


The Apache OpenNLP POS tagger marks up each word in a text with its part of speech so that the text can be processed further by natural language processing (NLP) tools. Read on to learn how to use it!


As per Wikipedia, POS tagging is "the process of marking up a word in a text (corpus) as corresponding to a particular part of speech, based on both its definition and its context — i.e. its relationship with adjacent and related words in a phrase, sentence, or paragraph. A simplified form of this is commonly taught to school-age children, in the identification of words as nouns, verbs, adjectives, adverbs, etc."

To begin, the input text is tokenized: it is divided into tokens, and each token is then tagged with its part of speech for further processing. Tagging is a basic pre-processing step for text retrieval and text indexing. You can see an Apache Open NLP POS tokenization example here.
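
To make the tokenization step concrete, here is a minimal sketch in plain Java that splits a sentence on whitespace. It is only an illustration of the idea and does not use OpenNLP; the class name WhitespaceSplitSketch and its tokenize method are hypothetical names chosen for this example.

```java
import java.util.Arrays;

public class WhitespaceSplitSketch {
    // Splits a sentence on runs of whitespace.
    // Punctuation is not separated, so it stays attached to the neighboring word.
    public static String[] tokenize(String sentence) {
        String trimmed = sentence.trim();
        if (trimmed.isEmpty()) {
            return new String[0];
        }
        return trimmed.split("\\s+");
    }

    public static void main(String[] args) {
        String[] tokens = tokenize("Otri is from Mars and she loves coding.");
        // prints [Otri, is, from, Mars, and, she, loves, coding.]
        System.out.println(Arrays.toString(tokens));
    }
}
```

Note that the final period remains part of the last token. A tokenizer that splits purely on whitespace behaves this way, which matters later when each token receives exactly one tag.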

To get started with OpenNLP tagging, first include the following dependency in the pom.xml file.

<dependency>
    <groupId>org.apache.opennlp</groupId>
    <artifactId>opennlp-tools</artifactId>
    <version>1.8.1</version>
</dependency>

OpenNLP provides a pre-trained English model called en-pos-maxent.bin for POS tagging. Before tagging any text, we first load en-pos-maxent.bin from the classpath. The following lines of code load this model.

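// Imports: java.io.IOException, java.io.InputStream,
// opennlp.tools.postag.POSModel, opennlp.tools.postag.POSTaggerME
private POSModel model;
private POSTaggerME tagger;

public void initialize() {
 // Load the pre-trained model from the classpath; the stream is closed automatically.
 try (InputStream modelStream = getClass().getResourceAsStream("/en-pos-maxent.bin")) {
  if (modelStream == null) {
   throw new IOException("en-pos-maxent.bin not found on the classpath");
  }
  model = new POSModel(modelStream);
  tagger = new POSTaggerME(model);
 } catch (IOException e) {
  System.out.println(e.getMessage());
 }
}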

After the tagger is initialized, we tokenize the input sentence and apply tags to the resulting tokens. Here is an example:

public void tag(String sentence) {
 // Load the model once; later calls reuse the same tagger.
 if (tagger == null) {
  initialize();
 }
 if (tagger == null) {
  return; // the model could not be loaded
 }
 // Split on whitespace (import opennlp.tools.tokenize.WhitespaceTokenizer), then tag.
 // tags[i] is the tag assigned to tokens[i].
 String[] tokens = WhitespaceTokenizer.INSTANCE.tokenize(sentence);
 String[] tags = tagger.tag(tokens);
 for (int i = 0; i < tokens.length; i++) {
  System.out.print(tags[i] + ":" + tokens[i] + "  ");
 }
}

The output will be similar to the following for a sentence like "Otri is from Mars and she loves coding." (the whitespace tokenizer leaves the final period attached to the last word):

NNP:Otri VBZ:is IN:from NNP:Mars CC:and PRP:she VBZ:loves .:coding.
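
The tags in this output may be unfamiliar, so here is a small sketch that maps them to short descriptions. It assumes the model uses the Penn Treebank tag set, which these tag names match; the class TagLegendSketch, its LEGEND map, and the describe method are hypothetical helpers written for this illustration and are not part of OpenNLP.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class TagLegendSketch {
    // Penn Treebank tags that appear in the sample output, with short descriptions.
    // Assumption: the model emits Penn Treebank tags.
    public static final Map<String, String> LEGEND = new LinkedHashMap<>();
    static {
        LEGEND.put("NNP", "proper noun, singular");
        LEGEND.put("VBZ", "verb, 3rd person singular present");
        LEGEND.put("IN", "preposition or subordinating conjunction");
        LEGEND.put("CC", "coordinating conjunction");
        LEGEND.put("PRP", "personal pronoun");
        LEGEND.put(".", "sentence-final punctuation");
    }

    // Turns one "TAG:word" pair into "word -> description".
    // The tag is everything before the first colon.
    public static String describe(String pair) {
        int sep = pair.indexOf(':');
        String tag = pair.substring(0, sep);
        String word = pair.substring(sep + 1);
        return word + " -> " + LEGEND.getOrDefault(tag, "unknown tag " + tag);
    }

    public static void main(String[] args) {
        String output = "NNP:Otri VBZ:is IN:from NNP:Mars CC:and PRP:she VBZ:loves .:coding.";
        for (String pair : output.trim().split("\\s+")) {
            System.out.println(describe(pair));
        }
    }
}
```

Running this prints one line per token, for example "Otri -> proper noun, singular". The last token, "coding.", carries the "." tag in the sample output, which suggests that separating punctuation before tagging would likely give a more useful tag for the word itself.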

And that's it! Next time, we'll look at the Stanford NLP POS tagger with Maven.


Topics:
nlp, ai, tutorial, apache, pos tagger

Published at DZone with permission of

Opinions expressed by DZone contributors are their own.
