Chatbot for eCommerce
When a user sends a message, it might contain invalid characters. Therefore, we need to remove them to get the actual keywords to help us reply to the user correctly.
Chatbots cover various scenarios these days, for example customer support, which I covered in a previous article. In this article, you will learn how a chatbot replies to a search message.
The use case is a user who is looking for an item and requests it via a chatbot on your website or mobile app. The chatbot parses the message and, based on the keywords, replies with a search result from which the user can choose one of the items.
I used Java and Apache OpenNLP to build this chatbot. In the following steps, you will learn how the chatbot parses a message:
Remove Invalid Characters From the Message
When a user sends a message, it might contain some invalid characters. Therefore, we need to remove them to get the actual keywords that help us to reply to the user correctly.
Here is an example of a cleaned-up message:
In Java you can use this regular expression to remove invalid characters:
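A minimal sketch of this cleaning step, using only the Java standard library. The class name MessageCleaner and the rule for what counts as "invalid" (anything that is not a letter, a digit, or whitespace) are assumptions made for illustration, not necessarily the expression used in the chatbot itself.

```java
import java.util.regex.Pattern;

final class MessageCleaner {
    // Assumption: every character that is not a letter, digit, or whitespace is invalid.
    private static final Pattern INVALID = Pattern.compile("[^\\p{L}\\p{N}\\s]");
    private static final Pattern SPACES = Pattern.compile("\\s+");

    // Replaces invalid characters with spaces, then collapses runs of whitespace.
    static String clean(String message) {
        String stripped = INVALID.matcher(message).replaceAll(" ");
        return SPACES.matcher(stripped).replaceAll(" ").trim();
    }
}
```

For example, MessageCleaner.clean("Hi!!! I need a #blue jacket, please :)") returns "Hi I need a blue jacket please". Replacing with a space rather than an empty string keeps words apart when punctuation is the only thing separating them.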
Then we need to tokenize the message using OpenNLP. Tokenization is the process of chopping a given sentence into smaller parts (tokens). In general, the raw text is tokenized based on a set of delimiters (mostly whitespace). Tokenization is used in tasks such as:
- Processing searches.
- Identifying parts of speech.
- Sentence detection.
- Document classification.
In the following code, we first train the tokenizer, using TokenizerME and TokenizerModel.
TokenizerME — This class converts raw text into separate tokens. It uses Maximum Entropy to make its decisions.
Entropy in machine learning is a measure of uncertainty: for a binary outcome measured in bits, 0 means completely certain and 1 means maximally uncertain.
Then we tokenize the input message:
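OpenNLP's TokenizerME needs a model file, so the sketch below is a standard-library stand-in that splits on whitespace only. It illustrates the shape of the step (one string in, an array of tokens out) and is not the OpenNLP tokenizer; the class name WhitespaceSplitter is hypothetical. With OpenNLP, the corresponding call is tokenize(String) on the tokenizer instance, which also returns a String[].

```java
final class WhitespaceSplitter {
    // Splits a cleaned message into tokens on runs of whitespace.
    // Stand-in for a model-based tokenizer; it does not separate punctuation.
    static String[] tokenize(String message) {
        String trimmed = message.trim();
        if (trimmed.isEmpty()) {
            return new String[0];
        }
        return trimmed.split("\\s+");
    }
}
```

For example, WhitespaceSplitter.tokenize("I need a blue jacket") yields the five tokens I, need, a, blue, and jacket.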
The following image demonstrates the tokenized message:
After tokenizing the message, we need to detect the type of each token and remove the tokens that are not helpful. I explain that in the next step.
OpenNLP Part of Speech
This step detects the parts of speech in a given sentence and tags each token with its type: noun, verb, adverb, adjective, and so on.
Here is the code I used:
POSTaggerME — This class predicts the parts of speech of the given raw text. It uses Maximum Entropy to make its decisions.
After we specify the type of each token, we remove those that are not necessary:
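A minimal sketch of this filtering step, assuming the tagger produced Penn Treebank-style tags (for example NN for a noun, JJ for an adjective) in an array parallel to the tokens. The class name KeywordFilter and the choice to keep only nouns and adjectives are assumptions for illustration; which tags are "necessary" depends on your catalog.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

final class KeywordFilter {
    // Keeps tokens whose tag marks a noun (NN*) or an adjective (JJ*).
    // Assumption: tokens[i] was tagged as tags[i], Penn Treebank style.
    static List<String> keep(String[] tokens, String[] tags) {
        if (tokens.length != tags.length) {
            throw new IllegalArgumentException("tokens and tags must have the same length");
        }
        List<String> keywords = new ArrayList<>();
        for (int i = 0; i < tokens.length; i++) {
            if (tags[i].startsWith("NN") || tags[i].startsWith("JJ")) {
                keywords.add(tokens[i].toLowerCase(Locale.ROOT));
            }
        }
        return keywords;
    }
}
```

With the hand-written tags PRP, VBP, JJ, NNS for the tokens I, want, red, shoes, the method keeps only red and shoes.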
Now we have the actual keywords and can return the relevant results to the user:
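How the keywords are matched against products depends on your store's search backend. As a sketch under simple assumptions, the hypothetical CatalogSearch class below treats the catalog as an in-memory list of product names and returns the names that contain every keyword, ignoring case.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

final class CatalogSearch {
    // Returns the catalog entries that contain every keyword (case-insensitive).
    // An empty keyword list matches nothing.
    static List<String> search(List<String> catalog, List<String> keywords) {
        List<String> matches = new ArrayList<>();
        if (keywords.isEmpty()) {
            return matches;
        }
        for (String item : catalog) {
            String lower = item.toLowerCase(Locale.ROOT);
            boolean containsAll = true;
            for (String keyword : keywords) {
                if (!lower.contains(keyword.toLowerCase(Locale.ROOT))) {
                    containsAll = false;
                    break;
                }
            }
            if (containsAll) {
                matches.add(item);
            }
        }
        return matches;
    }
}
```

Searching the made-up catalog "Red Running Shoes", "Blue Jacket", "Red Scarf" with the keywords red and shoes returns only "Red Running Shoes". A real store would more likely pass the keywords to a database or search engine query.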