
How to Use the Apache OpenNLP POS Tagger

The Apache OpenNLP POS tagger marks up each word in a text with its part of speech so that the text can be processed further by natural language processing (NLP) tools. Read on to learn how to use it!

by Dhiraj Ray · Jul. 15, 17 · AI Zone · Tutorial

As per Wikipedia, POS tagging is "the process of marking up a word in a text (corpus) as corresponding to a particular part of speech, based on both its definition and its context — i.e. its relationship with adjacent and related words in a phrase, sentence, or paragraph. A simplified form of this is commonly taught to school-age children, in the identification of words as nouns, verbs, adjectives, adverbs, etc."

To begin, the text is tokenized, that is, divided into tokens, and each token is then tagged with its part of speech (POS) for further processing. Tagging is a basic pre-processing step for text retrieval and text indexing. You can see an Apache OpenNLP tokenization example here.
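To make the tokenization step concrete, here is a minimal sketch in plain Java that splits a sentence on whitespace. It is only a stand-in for illustration: the class WhitespaceSplitSketch and its tokenize method are hypothetical names, not part of OpenNLP, and the library's own tokenizers may behave differently in edge cases.

```java
import java.util.Arrays;

// Hypothetical helper for illustration only; not part of OpenNLP.
final class WhitespaceSplitSketch {

    // Splits a sentence on runs of whitespace, ignoring leading and trailing blanks.
    static String[] tokenize(String sentence) {
        String trimmed = sentence.trim();
        if (trimmed.isEmpty()) {
            return new String[0];
        }
        return trimmed.split("\\s+");
    }

    public static void main(String[] args) {
        String[] tokens = tokenize("Otri is from Mars and she loves coding.");
        // Punctuation stays attached to its word, so the last token is "coding."
        System.out.println(Arrays.toString(tokens));
    }
}
```

Splitting on whitespace alone keeps punctuation attached to the neighbouring word, which is worth remembering when reading the tags assigned to the last token of a sentence.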

To get started with OpenNLP tagging, first include the following dependency in the pom.xml file.

<dependency>
    <groupId>org.apache.opennlp</groupId>
    <artifactId>opennlp-tools</artifactId>
    <version>1.8.1</version>
</dependency>

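OpenNLP provides a pre-trained English model, en-pos-maxent.bin, for POS tagging. The model must be loaded before any text can be tagged. The code below reads it from the root of the classpath, so the file has to be available there (in a Maven project, typically under src/main/resources). The following lines of code load the model and create the tagger.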

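// Imports needed: java.io.IOException, java.io.InputStream,
// opennlp.tools.postag.POSModel, opennlp.tools.postag.POSTaggerME
private POSModel model;
private POSTaggerME tagger;

public void initialize() {
 // Load the pre-trained model from the classpath; the stream is closed automatically
 try (InputStream modelStream = getClass().getResourceAsStream("/en-pos-maxent.bin")) {
  if (modelStream == null) {
   System.out.println("en-pos-maxent.bin was not found on the classpath");
   return;
  }
  model = new POSModel(modelStream);
  tagger = new POSTaggerME(model);
 } catch (IOException e) {
  System.out.println(e.getMessage());
 }
}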
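One detail of the loading step is easy to miss: getResourceAsStream returns null instead of throwing when the resource is not on the classpath. The sketch below shows one way to fail fast with a clear message in that case. The class ModelResourceSketch and its open method are hypothetical names introduced for this example; they are not OpenNLP APIs.

```java
import java.io.FileNotFoundException;
import java.io.InputStream;

// Hypothetical helper for illustration only; not part of OpenNLP.
final class ModelResourceSketch {

    // Opens a classpath resource, or throws if it cannot be found.
    static InputStream open(String resourcePath) throws FileNotFoundException {
        InputStream in = ModelResourceSketch.class.getResourceAsStream(resourcePath);
        if (in == null) {
            // getResourceAsStream signals a missing resource with null, not an exception
            throw new FileNotFoundException("Classpath resource not found: " + resourcePath);
        }
        return in;
    }
}
```

With such a helper, a call like ModelResourceSketch.open("/en-pos-maxent.bin") either returns a usable stream to hand to the model constructor or reports exactly which file is missing.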

After the tagger is initialized, we tokenize the input sentence and ask the tagger for one tag per token. Here is an example:

// Import needed: opennlp.tools.tokenize.WhitespaceTokenizer
public void tag(String sentence) {
 // Load the model and build the tagger once, then reuse them on later calls
 if (tagger == null) {
  initialize();
 }
 if (tagger == null) {
  return; // the model could not be loaded
 }
 // Split on whitespace only; punctuation stays attached to the adjacent word
 String[] tokens = WhitespaceTokenizer.INSTANCE.tokenize(sentence);
 // tags[i] is the part-of-speech tag for tokens[i]
 String[] tags = tagger.tag(tokens);
 for (int i = 0; i < tokens.length; i++) {
  System.out.print(tags[i] + ":" + tokens[i] + "  ");
 }
}

The output will be similar to the following for a sentence like "Otri is from Mars and she loves coding.":

NNP:Otri VBZ:is IN:from NNP:Mars CC:and PRP:she VBZ:loves .:coding.
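The tags in this output appear to follow the Penn Treebank tag set. As a reading aid, the sketch below maps the tags seen above to short descriptions and decodes one TAG:word entry at a time. The class TagLegendSketch, its LEGEND map, and its describe method are hypothetical names for this example, and the legend covers only the tags shown in the output, not the full tag set.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for illustration only; not part of OpenNLP.
final class TagLegendSketch {

    // Descriptions for the Penn Treebank tags that occur in the sample output.
    static final Map<String, String> LEGEND = new LinkedHashMap<>();
    static {
        LEGEND.put("NNP", "proper noun, singular");
        LEGEND.put("VBZ", "verb, third person singular present");
        LEGEND.put("IN", "preposition or subordinating conjunction");
        LEGEND.put("CC", "coordinating conjunction");
        LEGEND.put("PRP", "personal pronoun");
        LEGEND.put(".", "sentence-final punctuation");
    }

    // Turns one "TAG:word" entry into "word -> description".
    static String describe(String taggedToken) {
        int colon = taggedToken.indexOf(':');
        if (colon <= 0) {
            return taggedToken + " -> not in TAG:word form";
        }
        String tag = taggedToken.substring(0, colon);
        String word = taggedToken.substring(colon + 1);
        return word + " -> " + LEGEND.getOrDefault(tag, "unknown tag " + tag);
    }

    public static void main(String[] args) {
        for (String entry : "NNP:Otri VBZ:is IN:from NNP:Mars".split(" ")) {
            System.out.println(describe(entry));
        }
    }
}
```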

And that's it! Next time, we'll look at the Stanford NLP POS tagger with Maven.

Published at DZone with permission of Dhiraj Ray. See the original article here.