How to Extract Sentences and Entities From a String in Java
Perform sentence segmentation or extract entities from an input string using Natural Language Processing APIs.
In this article, we will look at two more ways to use Natural Language Processing (NLP). As discussed in previous articles, NLP combines linguistics and artificial intelligence to analyze large amounts of natural language data. Essentially, this technology simplifies the scanning of content by categorizing and organizing it through machine learning. Whereas the rules for processing language were formerly coded by hand, machine learning now uses statistical inference algorithms to produce models that can handle unfamiliar or erroneous input.
The two tasks we will cover today are extracting sentences and extracting entities from a string in Java. Splitting large chunks of text into sentences by hand can be incredibly time-consuming, but with the help of an NLP API it becomes a quick and easy step. The API scans the input string and returns the separated sentences as individual strings, making the text more readable for you or your customers.
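To make the shape of that result concrete, here is a minimal local sketch of sentence segmentation. It does not use the NLP API described in this article; it swaps in the JDK's rule-based `java.text.BreakIterator` as a stand-in, and the class and method names (`SentenceSplitSketch`, `splitSentences`) are hypothetical. A rule-based splitter will likely mishandle cases such as abbreviations that a trained model copes with, so treat it as an illustration only.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical stand-in for an NLP sentence segmentation call (not the API's SDK).
public class SentenceSplitSketch {

    // Splits the input into sentences using the JDK's rule-based BreakIterator.
    public static List<String> splitSentences(String text) {
        List<String> sentences = new ArrayList<>();
        BreakIterator iterator = BreakIterator.getSentenceInstance(Locale.US);
        iterator.setText(text);

        int start = iterator.first();
        for (int end = iterator.next(); end != BreakIterator.DONE; start = end, end = iterator.next()) {
            // Trim the whitespace that trails each sentence boundary.
            String sentence = text.substring(start, end).trim();
            if (!sentence.isEmpty()) {
                sentences.add(sentence);
            }
        }
        return sentences;
    }

    public static void main(String[] args) {
        List<String> sentences = splitSentences("The sky is blue. Is it raining? No, it is not!");
        System.out.println("Sentence count: " + sentences.size());
        for (String sentence : sentences) {
            System.out.println(sentence);
        }
    }
}
```

Each sentence comes back as its own string, and the size of the list gives the sentence count.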
Extracting entities from a string is a similar process with a different target. Entities are more complex than sentences: they are characterized not only by relationships but also by additional attributes, which include various identifiers. In most relational databases, for example, there is a representation of an instance of an entity type and a representation of each attribute type. As with sentence extraction, entity extraction can be tedious to do by hand, but the information carried by entities such as addresses, URLs, or phone numbers is invaluable for businesses. Using an NLP API cuts down processing time while still giving you that information.
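As a rough illustration of what "an entity with a type and a text" looks like, the sketch below pulls URLs and simple US-style phone numbers out of a string with regular expressions. This is a swapped-in technique, not the NLP API covered in this article: the patterns are deliberately naive, and every name in it (`EntityExtractSketch`, `Entity`, `extract`, the `URL` and `PHONE` labels) is an assumption made for the example.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-in for an NLP entity extraction call (not the API's SDK).
public class EntityExtractSketch {

    // One extracted entity: a type label plus the matched text.
    public static final class Entity {
        public final String type;
        public final String text;

        public Entity(String type, String text) {
            this.type = type;
            this.text = text;
        }
    }

    // Naive patterns, kept in insertion order so results are grouped by type.
    private static final Map<String, Pattern> PATTERNS = new LinkedHashMap<>();
    static {
        PATTERNS.put("URL", Pattern.compile("https?://\\S+"));
        PATTERNS.put("PHONE", Pattern.compile("\\b\\d{3}-\\d{3}-\\d{4}\\b"));
    }

    // Returns every match of every pattern as a typed entity.
    public static List<Entity> extract(String text) {
        List<Entity> found = new ArrayList<>();
        for (Map.Entry<String, Pattern> entry : PATTERNS.entrySet()) {
            Matcher matcher = entry.getValue().matcher(text);
            while (matcher.find()) {
                found.add(new Entity(entry.getKey(), matcher.group()));
            }
        }
        return found;
    }

    public static void main(String[] args) {
        String input = "Visit https://example.com or call 555-123-4567 today.";
        for (Entity entity : extract(input)) {
            System.out.println(entity.type + ": " + entity.text);
        }
    }
}
```

Regular expressions only catch entities with a rigid surface form. Addresses, names, and other free-form entities are where a trained NLP model earns its keep.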
Before we can use either API, we need to install the SDK with Maven. To do this, add a reference to the repository:
Then, add a reference to the dependency:
This is where the two tasks diverge. To perform sentence segmentation, add the imports to the top of the controller and call the function with the following code:
This will instantly return your separated sentence strings as well as a sentence count.
Now, we will move on to the entity extraction function. This API will draw out the identified entities from the string and will return both the entity type and text as individual strings. To perform this operation, we will once again add our imports and call our function:
Both operations require only two parameters:
- Input String – the string you wish to perform the extraction on.
- API Key – to retrieve your free API key, head to the Cloudmersive website and register for an account that will give you access to 800 calls/month across our library of APIs.
And that about does it for this tutorial!