Here’s Why Developing Natural Language Interface Is Hard
Look at why a natural language interface (NLI) is often so hard to build, using a semi-trivial example that still turns out to be a formidable problem to solve.
In this article, I would like to show why a natural language interface (NLI) is often so hard to build. To illustrate, I’ll use a semi-trivial example and show that even for this simple use case the natural language interface is a formidable problem to solve.
Any Chance of Rain?
For our example, let’s imagine we want to build (yet another) weather bot. Our bot will answer weather-related questions for a given city and date range, and it will support past, present, and future (forecast) weather requests. Our goal is a natural language interface that is as close to human cognition as possible, i.e. as if our users were talking to a real human being. In other words, we are trying to achieve that elusive free-form natural language comprehension.
Let’s start with simple and obvious examples of the requests we need to support:
"What’s the current weather in New York?"
"Show me San Jose's forecast for the next 5 days."
These requests seem rather trivial to encode using common intent-based matching. You basically have to detect three main entities:
A weather request indicator
A city name
A date range
Once you’ve built the model (in whatever tool you prefer) to detect all three types of entities, you can relatively quickly build an action (i.e. an intent callback) that returns weather information for the given city and date range.
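As a rough illustration of this kind of intent-based matching, here is a minimal Python sketch. The word lists, the date pattern, and the names `WeatherRequest` and `match_weather_intent` are all assumptions made up for this example; a real toolkit would use a far richer model.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical vocabularies, kept tiny on purpose.
WEATHER_WORDS = {"weather", "forecast", "rain", "snow", "temperature"}
CITIES = ("new york", "san jose", "chicago", "moscow")
DATE_PATTERN = re.compile(r"\b(today|tomorrow|current|next \d+ days)\b")


@dataclass
class WeatherRequest:
    city: Optional[str]
    date_range: Optional[str]


def match_weather_intent(text: str) -> Optional[WeatherRequest]:
    """Return the detected entities, or None if this is not a weather request."""
    lowered = text.lower()
    words = set(re.findall(r"[a-z]+", lowered))
    # Entity 1: the weather request indicator.
    if not words & WEATHER_WORDS:
        return None
    # Entity 2: the city name (first known city found in the text).
    city = next((c for c in CITIES if c in lowered), None)
    # Entity 3: the date range.
    date_match = DATE_PATTERN.search(lowered)
    return WeatherRequest(city, date_match.group(1) if date_match else None)


print(match_weather_intent("What's the current weather in New York?"))
print(match_weather_intent("Show me San Jose's forecast for the next 5 days."))
```

The intent callback would then take the returned `WeatherRequest` and query whatever weather service the bot uses.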
Most of the tutorials and examples stop right here. However, this is not even remotely close to how real humans converse about the weather.
A pretty obvious initial improvement is to treat the city and date range elements as optional. Indeed, if the city isn’t present, the user is likely asking about her current location, and if the date isn’t present, she’s asking about the current date. These seem to be reasonable assumptions:
What’s my current weather?
-> result for the current date and current location
What’s Chicago's weather?
-> result for the current date and city of Chicago
Any chance of snow this Friday?
-> result for the current location and this Friday
However, these assumptions have to be processed in a special way through conversation management.
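The defaulting itself is the easy part. A minimal sketch, assuming a hypothetical `WeatherQuery` type and assuming the bot already knows the user's location and today's date from somewhere else:

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class WeatherQuery:
    city: Optional[str] = None   # None means "not mentioned in the sentence"
    day: Optional[date] = None   # None means "not mentioned in the sentence"


def apply_defaults(query: WeatherQuery, user_city: str, today: date) -> WeatherQuery:
    """Fill missing elements with the user's location and the current date."""
    return WeatherQuery(city=query.city or user_city, day=query.day or today)


# "What's Chicago's weather?" asked on some day by a user located in Boston.
print(apply_defaults(WeatherQuery(city="Chicago"), "Boston", date.today()))
```

The hard part is deciding when these defaults should apply and when a missing element should instead come from the conversation.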
Another thing you’ll notice right away is that you need to support conversational context. Frequently, when people inquire about the weather, they don’t just ask a single question but often have follow-ups. For example:
What’s the current Moscow weather?
-> result for Moscow today
Hm, what about tomorrow?
-> result for Moscow tomorrow (Moscow is taken from the conversation)
Any chance of rain?
-> result for precipitation in Moscow tomorrow (Moscow and date are taken from the conversation)
While in everyday life these seem rather trivial, the programmatic logic for supporting this type of conversation management is far from trivial. For example:
- When does the conversation switch context, so that the previous context should be “forgotten”?
- When does the conversation time out?
- Which parts can or cannot be taken from previous sentences?
Depending on the framework you use, this can be a significant project on its own.
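To make these questions concrete, here is one possible (and deliberately naive) sketch of conversation memory in Python. The `Conversation` class, the 5-minute timeout, and the rule that a new city resets the remembered date are all assumptions chosen for illustration, not recommendations.

```python
import time
from dataclasses import dataclass
from typing import Optional, Tuple

CONTEXT_TIMEOUT_SECONDS = 300  # assumed: forget the context after 5 idle minutes


@dataclass
class Conversation:
    city: Optional[str] = None
    day: Optional[str] = None
    last_seen: float = 0.0

    def resolve(self, city: Optional[str], day: Optional[str],
                now: Optional[float] = None) -> Tuple[Optional[str], Optional[str]]:
        """Merge the entities of the current sentence with the remembered ones."""
        now = time.time() if now is None else now
        # Timeout: a stale conversation starts from scratch.
        if now - self.last_seen > CONTEXT_TIMEOUT_SECONDS:
            self.city, self.day = None, None
        # Context switch (assumed rule): a different city drops the remembered date.
        if city is not None and city != self.city:
            self.day = None
        self.city = city or self.city
        self.day = day or self.day
        self.last_seen = now
        return self.city, self.day


conv = Conversation()
print(conv.resolve("Moscow", "today", now=1000.0))   # What's the current Moscow weather?
print(conv.resolve(None, "tomorrow", now=1010.0))    # Hm, what about tomorrow?
print(conv.resolve(None, None, now=1020.0))          # Any chance of rain?
```

A result of `None` for either element means there is nothing left to inherit, so the location and date defaults would kick in. Even this toy version already hard-codes answers to all three questions above, and each of those answers is debatable.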
Yet another problem you’ll discover pretty quickly as you let users play with your bot is that your current model doesn’t distinguish between these two sentences:
What’s the local Moscow weather?
What’s the local weather?
In the first example, the user is clearly asking about the current Moscow weather, while in the second she’s asking about the weather at her current location. But this conflicts with the conversation support we discussed above: the city element is optional, and we can pick “Moscow” up from the conversation context, which would make the second example equivalent to the first! We have a contradiction.
That’s where things get complicated and naive conversation management doesn’t cut it anymore. One rule you can come up with to resolve this dilemma is this: if the sentence contains the word “local” (or one of its semantic siblings) and no city, the user is asking about the weather at her current location; otherwise, fall back to default conversation management.
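Expressed as code, the rule could look like the following sketch. The `LOCAL_WORDS` set and the `resolve_city` function are hypothetical names, and the list of “semantic siblings” is only a guess at what such a set might contain.

```python
from typing import List, Optional

# Hypothetical "semantic siblings" of the word "local".
LOCAL_WORDS = {"local", "here", "nearby"}


def resolve_city(tokens: List[str], sentence_city: Optional[str],
                 context_city: Optional[str], user_city: str) -> str:
    """Pick the city for a request, applying the "local" override."""
    # A city named in the sentence itself always wins.
    if sentence_city is not None:
        return sentence_city
    # "local" with no city in the sentence: ignore the conversation context.
    if LOCAL_WORDS & set(tokens):
        return user_city
    # Default conversation management: context first, then the user's location.
    return context_city or user_city


# The previous question was about Moscow; the user is located in Chicago.
print(resolve_city("what's the local moscow weather".split(), "Moscow", "Moscow", "Chicago"))
print(resolve_city("what's the local weather".split(), None, "Moscow", "Chicago"))
```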
As a side note, your NLP toolkit should clearly disambiguate between New York (state) and New York (city), Moscow (Russia) and Moscow (Idaho, USA), etc. It should also support common slang and abbreviations like LA (for Los Angeles and not for the state of Louisiana), Big Apple, NYC, SF, etc.
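The slang and abbreviation part can be approximated with a simple alias table, as in the sketch below. The `CITY_ALIASES` table and `normalize_city` are illustrative assumptions; this lookup does nothing for the genuinely ambiguous cases such as the two Moscows, which need additional signals (user location, country, population) to resolve.

```python
# Hypothetical alias table; a real toolkit would ship a much larger gazetteer.
CITY_ALIASES = {
    "la": "Los Angeles",
    "big apple": "New York City",
    "nyc": "New York City",
    "sf": "San Francisco",
}


def normalize_city(mention: str) -> str:
    """Map slang or an abbreviation to a canonical city name, if one is known."""
    cleaned = mention.strip()
    return CITY_ALIASES.get(cleaned.lower(), cleaned)


print(normalize_city("LA"))
print(normalize_city("Big Apple"))
```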
Another, more subtle, problem arises when we try to deal with date ranges. Look at these examples:
What’s my current forecast?
-> result for the default 5-day forecast from today
What’s the precipitation forecast for Sep 25 - Sep 30th?
-> result for given date range
What was the ice storm forecast last week?
-> result for the last week
All three examples contain the word “forecast”, which implies the future by default. However, the second example also specifies an explicit date range, and the third contains the word “forecast” but asks about a past date range. The situation gets even more confusing when we account for conversational context.
You can probably come up with some basic set of rules:
- The word “forecast” implies a standard 5-day forward weather forecast
- The word “past” implies some date range looking back, say the past 3 days
- The default forecast/past date ranges are overridden if there’s an explicit date range in the user request
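These three rules can be sketched in a few lines of Python. The function name `resolve_date_range` is hypothetical, and the exact boundaries (whether “5-day” includes today, whether “past 3 days” ends yesterday) are assumptions that the rules above leave open.

```python
from datetime import date, timedelta
from typing import List, Optional, Tuple

FORECAST_DAYS = 5  # default forward-looking window
PAST_DAYS = 3      # default backward-looking window

DateRange = Tuple[date, date]  # (first day, last day), both inclusive


def resolve_date_range(words: List[str], explicit: Optional[DateRange],
                       today: date) -> DateRange:
    """Apply the three rules: an explicit range wins, then "forecast", then "past"."""
    if explicit is not None:
        return explicit
    if "forecast" in words:
        # Assumed: the 5-day window starts today.
        return today, today + timedelta(days=FORECAST_DAYS - 1)
    if "past" in words:
        # Assumed: the 3-day window ends yesterday.
        return today - timedelta(days=PAST_DAYS), today - timedelta(days=1)
    return today, today


today = date(2024, 9, 20)
print(resolve_date_range("what's my current forecast".split(), None, today))
print(resolve_date_range("what was the ice storm forecast".split(),
                         (date(2024, 9, 9), date(2024, 9, 15)), today))
```

Note that the third example from above only works here because “last week” was already parsed into an explicit range before the word “forecast” was considered.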
Another complication concerns the weather request indicator we mentioned at the very beginning.
Essentially, a weather request is some form of a question about a meteorological condition. In my own model for this type of bot, I have almost 10,000 different ways to express that, which makes it almost impossible to just “train” the model in a supervised fashion. You need some formalized way to encode this model effectively, one that allows for proper versioning, testing, future extension, etc.
Make sure that whatever tool you select to build this bot does not require you to list all 10,000 utterances manually!
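One common way to avoid manual listing is to describe utterances as combinations of smaller interchangeable parts. The sketch below is only an illustration of that idea, not the formalism behind my own model; the three word lists and the `expand_utterances` function are made up for the example.

```python
from itertools import product
from typing import List

# Hypothetical building blocks: 4 + 3 + 5 = 12 entries to maintain.
QUESTION_STARTS = ["what's", "what is", "show me", "tell me"]
QUALIFIERS = ["the", "the current", "the local"]
CONDITIONS = ["weather", "forecast", "temperature", "chance of rain", "chance of snow"]


def expand_utterances() -> List[str]:
    """Generate every combination of the building blocks."""
    return [" ".join(parts)
            for parts in product(QUESTION_STARTS, QUALIFIERS, CONDITIONS)]


utterances = expand_utterances()
print(len(utterances))  # 4 * 3 * 5 = 60 utterances from 12 entries
print(utterances[0])
```

Because the definition is small and declarative, it can live in version control and be covered by tests, while the number of generated utterances grows multiplicatively with every new list or entry.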
If you are somewhat confused by now, that’s absolutely fine. You should be. The point is that even for this trivialized example, a free-form natural language interface is a decidedly non-trivial task. A lot of people jump head first into creating NLI/NLU apps and chatbots, only to realize that users hate the interaction experience because it doesn’t match human cognition by a long shot. The technology in this space is still developing.