The Problem With Conversational Interfaces: The Conversation
Let's explore the problem with conversational interfaces. First, look at what the conversation is, then look at the types of conversations, and lastly, look at examples.
This title probably looks contrarian at a glance (as did my last post), but I truly believe we largely misunderstand what a natural language interface to our applications should look like. Here are my thoughts on the role of conversation in NLU/P systems.
What Is the Conversation?
Let’s define what we mean by conversation in the context of NLU/P systems. First off, conversation happens between two or more participants (computers talking to themselves at night is outside of the scope of this blog). Second, the conversation is a sequence of two or more sentences that are tightly coupled to each other by their context and time.
For the sake of simplicity, we leave out the human ability to hold multiple conversations at the same time with different participants, as well as our ability to hold the “same” conversation over a long period of time (“I’ve been talking to my mom for years…”).
We also leave out trivial confirmations (“Are you sure you want to close your account?”) and clarifications (“Do you mean London, OH or London, UK?”). Although technically they constitute a conversation, they are of little interest here. Note that as humans, we also don’t regard these as real conversations.
Types of Conversations
There are many different types of conversations that academic linguists can name, but for the purpose of our argument, let’s simply separate all conversations into two large categories:
- (1) Drill-down questions and answers. A typical example would be a rule-based medical or support chatbot where the system needs to gather a significant amount of hierarchical or otherwise organized information that cannot be expressed in a single sentence. It would ramble on and on with tedious questions and clarifications until it gets all the necessary information. UX is typically very linear and cannot be changed.
- (2) All others. Just about any conversation you have on a daily basis with your friends, kids and spouses, co-workers, customers and business counterparties. These conversations are not usually about strict information gathering. UX is typically non-linear and such conversations can change direction at any time.
This categorization may sound crude at first glance, but it is telling: as humans, we have dozens of conversations every day and almost none of them qualify as type (1). Even when we go to a doctor’s office (to reuse the medical example), we typically have a much more nuanced conversation than type (1), involving emotions, small talk, general information exchange, etc. In other words, almost none of the conversations we have in the real world are straight-down question-and-answer (Q&A) ones.
Yet, the vast majority of conversational NLU/P systems today are (mis)designed for exactly this straight-down Q&A, type (1), kind of conversation. By doing so, they force end users into awkward robotic Q&A sequences, often mimicking the famous 1-800 telephone menus with their press “0” for that, press “1” for this, etc. In fact, most of today’s chatbots are nothing more than textualized versions of 1-800 telephone menus.
Do We Really Need the Conversation?
The answer is yes, of course. In modern NLU/P systems, we need to support type (2) conversations and try to minimize type (1) conversations.
It’s been shown that conversations are more of a social and emotional mechanism than an information-gathering one. Surprisingly, from a purely technical point of view, the conversation is often a communication crutch, i.e., a mechanism to gather the necessary information missing from the previous sentence. In an ideal world, everybody would always speak in fully loaded, properly constructed sentences without any need for additional information. In the real world, however, that doesn’t happen every time.
The key point I’d like to make is this:
NLU/P systems need to fall back to Q&A conversation only when the original sentence is missing necessary data. By default, however, NLU/P systems should be able to understand whatever sentence is given, in whatever form and grammar, as long as a human could understand it equally well, and extract all the information from it.
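The fallback rule above can be sketched in a few lines of Python. This is only an illustration under loud assumptions: the slot names, the regular expressions, and the `respond` function are all hypothetical stand-ins for whatever a real NLU engine would provide, and a production system would use a trained model rather than regexes. The point is the control flow: extract everything you can from the first sentence, and ask a question only about what is actually missing.

```python
import re

# Hypothetical slot patterns for a toy "book a ride" intent.
# A real NLU engine would replace these regexes with a trained model.
SLOT_PATTERNS = {
    "product": re.compile(r"\b(UberXL|UberX)\b", re.IGNORECASE),
    "origin": re.compile(r"\bfrom (my home|JFK)\b", re.IGNORECASE),
    "destination": re.compile(r"\bto (my home|JFK)\b", re.IGNORECASE),
    "time": re.compile(r"\bat (\d{1,2}:\d{2}\s?(?:am|pm))\b", re.IGNORECASE),
}


def extract_slots(sentence):
    """Pull every slot we can find out of a single free-form sentence."""
    slots = {}
    for name, pattern in SLOT_PATTERNS.items():
        match = pattern.search(sentence)
        if match:
            slots[name] = match.group(1)
    return slots


def respond(sentence):
    """Answer directly when possible; ask only about the missing slots."""
    slots = extract_slots(sentence)
    missing = [name for name in SLOT_PATTERNS if name not in slots]
    if not missing:
        return ("Done. Booking {product} from {origin} "
                "to {destination} at {time}.".format(**slots))
    # Fall back to Q&A, but only for the data that is really absent.
    return "I still need: " + ", ".join(missing) + "."


print(respond("I need UberXL from my home to JFK scheduled for tomorrow at 7:15am"))
# Done. Booking UberXL from my home to JFK at 7:15am.
print(respond("I need UberXL to JFK"))
# I still need: origin, time.
```

The fully specified sentence is handled in one turn, while the incomplete one triggers a question limited to the two missing pieces rather than a fixed script of questions.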
There’s also an important psychological point here. For the foreseeable future, humans will easily detect when they are interacting with a computer. Ask 10 people on the street and all 10 will tell you that they absolutely abhor having a conversation with a computer or a robot. We automatically reject these “conversations” because we feel they are fake and made up. That will probably change, but we are likely a few generations away from that time.
A Few Examples
Here are some good examples of a spoken or textual natural language interface with a minimal amount of conversation. These are very simple intents that can be readily implemented using IBM Watson, Microsoft LUIS, Amazon Lex or DataLingvo:
“I need UberXL from my home to JFK scheduled for tomorrow at 7:15am”
“Done. Your ride will be $95. We’ll notify you when your ride is ready.”
“Ugh… actually n/m, cancel this ride”
“Your ride is cancelled.”
“What’s the price for UberX from JFK to my home on Wednesday night?”
“Your ride will be between $70 and $95 depending on conditions.”
“Can I get all my transactions from Amazon for the last 6 months above 20 dollars (excl. prime membership fees)?”
“Click on this <<link>> to get the requested transactions list.”
“I need to change asap my flight to tomorrow the same time”
“Flight UA1234 to Houston. Rescheduling fee is $250. Using the same card you purchased with?”
“No, use my points for that”
“1000 points, ok?”
“Hm, no, use the card”
“Your flight is rescheduled. You’ll receive an email with confirmation.”
NOTE: in the last example, it is reasonable to ask for payment clarification instead of relying on a default. If the user had said “… using my points”, no clarification would have been required.
Here is a more complex example, which would stress any NLU/P system. Imagine a voice-controlled marketing/sales analytics system:
“What’s the average user retention for the east coast on 2-week cohorts?”
“What about monthly cohorts and how does it compare to the west coast for the same time period?”
“Correlate with opportunities in $100K-$500K range from US Q118 pipeline.”
NOTE: in the above example, the system understands fairly complex intents and doesn’t burden the user with unnecessary and redundant Q&A. It just takes normal spoken language and replies. At the same time, it is smart enough to maintain the context of the conversation.
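One simple way to picture “maintaining the context” is parameter carry-over: each follow-up only states what changed, and everything else is inherited from earlier turns. The sketch below assumes a hypothetical NLU layer has already turned each utterance into a dictionary of parameters; the `DialogContext` class and the parameter names are illustrative, not part of any of the products mentioned above.

```python
from dataclasses import dataclass, field


@dataclass
class DialogContext:
    """Remembers the parameters resolved so far in the conversation."""
    params: dict = field(default_factory=dict)

    def resolve(self, new_params):
        """Fill gaps in a follow-up from earlier turns, then remember the result."""
        merged = {**self.params, **new_params}  # newer values win
        self.params = merged
        return dict(merged)  # return a copy so callers cannot mutate the context


ctx = DialogContext()

# Turn 1: "What's the average user retention for the east coast on 2-week cohorts?"
ctx.resolve({"metric": "user retention", "region": "east coast", "cohort": "2-week"})

# Turn 2: "What about monthly cohorts...?" only the cohort is restated.
follow_up = ctx.resolve({"cohort": "monthly"})
print(follow_up)
# {'metric': 'user retention', 'region': 'east coast', 'cohort': 'monthly'}
```

The user never has to repeat the metric or the region. A real system would also need rules for when context expires or when a new topic resets it, which this sketch deliberately leaves out.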
Here’s the bottom line: we need to strive to minimize the amount of “noisy conversations”, the ones I called type (1) Q&A. To achieve that, we need to improve our support for free-form natural language comprehension, which would naturally eliminate the need for Q&A noise in the first place.
Published at DZone with permission of Aaron Radzinski, DZone MVB.