
How to Use the Apache OpenNLP POS Tagger


The Apache OpenNLP POS tagger marks up the words in a text with their parts of speech so that the text can be processed further by natural language processing (NLP) tools. Read on to learn how to use it!


As per Wikipedia, POS tagging is "the process of marking up a word in a text (corpus) as corresponding to a particular part of speech, based on both its definition and its context — i.e. its relationship with adjacent and related words in a phrase, sentence, or paragraph. A simplified form of this is commonly taught to school-age children, in the identification of words as nouns, verbs, adjectives, adverbs, etc."

To begin, the text is tokenized: it is divided into tokens, and each token is then tagged with its part of speech for further processing. Tagging is a basic pre-processing step for text retrieval and text indexing. You can see an Apache OpenNLP tokenization example here.
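To make the tokenization step concrete, here is a minimal sketch in plain Java that splits a sentence on whitespace. The class and method names are hypothetical and are not part of OpenNLP; the sketch only approximates what a whitespace-based tokenizer does.

```java
import java.util.Arrays;

// Hypothetical helper for illustration only; not part of OpenNLP.
final class WhitespaceSplitSketch {

    // Splits a sentence on runs of whitespace, ignoring leading and trailing blanks.
    static String[] tokenize(String sentence) {
        String trimmed = sentence.trim();
        if (trimmed.isEmpty()) {
            return new String[0];
        }
        return trimmed.split("\\s+");
    }

    public static void main(String[] args) {
        String[] tokens = tokenize("Otri is from Mars and she loves coding.");
        // Punctuation stays attached to its word, so the last token is "coding."
        System.out.println(Arrays.toString(tokens));
    }
}
```

Splitting on whitespace alone is the simplest possible strategy: it never separates punctuation from the word it follows, which matters when the tokens are handed to a tagger.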

To get started with OpenNLP tagging, first include the following dependencies in the pom.xml file.
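A minimal sketch of such a dependency declaration, assuming the opennlp-tools artifact published under the org.apache.opennlp group on Maven Central. The version number shown is an assumption; substitute the release your project actually targets.

```xml
<dependencies>
  <dependency>
    <groupId>org.apache.opennlp</groupId>
    <artifactId>opennlp-tools</artifactId>
    <!-- Assumed version for illustration; pick the release you need -->
    <version>1.8.4</version>
  </dependency>
</dependencies>
```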


OpenNLP provides a pre-trained English model called en-pos-maxent.bin for POS tagging. Before tagging any text, we first load en-pos-maxent.bin. The following lines of code load this model.

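// needs opennlp.tools.postag.POSModel and POSTaggerME, plus java.io.InputStream and IOException
public void initialize() {
 // try-with-resources closes the stream once the model has been read
 try (InputStream modelStream = getClass().getResourceAsStream("/en-pos-maxent.bin")) {
  model = new POSModel(modelStream);
  tagger = new POSTaggerME(model);
 } catch (IOException e) {
  // the model file could not be read; report the failure
  System.err.println("Could not load POS model: " + e.getMessage());
 }
}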

After the tagger is initialized, we tokenize the input sentence and apply tags to the resulting tokens. Here is an example:

// needs opennlp.tools.tokenize.WhitespaceTokenizer
public void tag(String sentence) {
 if (model == null) {
  // initialize() must load the model before tagging
  return;
 }
 POSTaggerME tagger = new POSTaggerME(model);
 // split the sentence on whitespace, then tag every token
 String[] tokens = WhitespaceTokenizer.INSTANCE.tokenize(sentence);
 String[] tags = tagger.tag(tokens);
 for (int i = 0; i < tokens.length; i++) {
  // tags[i] is the part-of-speech tag assigned to tokens[i]
  System.out.print(tags[i] + ":" + tokens[i] + "  ");
 }
 System.out.println();
}

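The output will be similar to the following for a sentence like "Otri is from Mars and she loves coding." Because the whitespace tokenizer does not separate punctuation from words, the final token is "coding." with the period still attached.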

NNP:Otri VBZ:is IN:from NNP:Mars CC:and PRP:she VBZ:loves .:coding.
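The tags in this output come from the Penn Treebank tag set. As a reading aid, the sketch below maps the tags that appear in the sample output to short descriptions and prints one line per word. The class, its legend, and the `describe` method are hypothetical helpers written for this illustration; they are not part of OpenNLP, and the legend covers only the tags seen above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for illustration only; not part of OpenNLP.
final class TagLegendSketch {

    // Penn Treebank tags that appear in the sample output.
    static final Map<String, String> LEGEND = new LinkedHashMap<>();
    static {
        LEGEND.put("NNP", "proper noun, singular");
        LEGEND.put("VBZ", "verb, 3rd person singular present");
        LEGEND.put("IN", "preposition or subordinating conjunction");
        LEGEND.put("CC", "coordinating conjunction");
        LEGEND.put("PRP", "personal pronoun");
        LEGEND.put(".", "sentence-final punctuation");
    }

    // Turns one "TAG:word" pair into a readable line, splitting on the first colon.
    static String describe(String pair) {
        int colon = pair.indexOf(':');
        if (colon < 0) {
            throw new IllegalArgumentException("Expected TAG:word but got " + pair);
        }
        String tag = pair.substring(0, colon);
        String word = pair.substring(colon + 1);
        return word + " -> " + LEGEND.getOrDefault(tag, "unknown tag " + tag);
    }

    public static void main(String[] args) {
        String output = "NNP:Otri VBZ:is IN:from NNP:Mars CC:and PRP:she VBZ:loves .:coding.";
        for (String pair : output.split("\\s+")) {
            System.out.println(describe(pair));
        }
    }
}
```

Splitting on the first colon assumes the tag itself contains no colon, which holds for every tag in the sample but not for the whole tag set.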

And that's it! Next time, we'll look at the Stanford NLP POS tagger with Maven.



Published at DZone with permission of

