Debating with an AI...Really?
Does the reality of Google Duplex cause you angst as you try to untangle it from all your unrealistically hopeful dreams and desires? Now IBM is demonstrating a system that debates human beings. How should we think about this?
On Monday, June 18, in an IBM office in San Francisco, California, IBM demonstrated an AI-based system that purports to hold its own in a debate with a human being. They named the system (wait for it) IBM Debater. For the human opponent, they picked a proven, skilled debater: Noa Ovadia, a college senior who won a debating championship in 2016.
[Me stepping on a soapbox...feel free to skip]
I don't know if this qualifies as a spoiler alert, but before we delve into what actually happened, I would like to suggest that the reader take a cool and analytical perspective. In particular, be careful not to overload the interaction with your hopes and dreams for AI. For the record, I have the same hopes and dreams as you (and possibly more), but my concern is that we often hype these sorts of demonstrations to our friends and associates (who are usually less tech savvy than you, the reader). We unintentionally induce our friends to imagine utterly fantastical and totally unrealistic capabilities. I am not in any way implying that these demonstrations are fake. I'm absolutely convinced that they are real interactions. I also believe that these demonstrations (Google Duplex, IBM Debater, etc.) are real advances in the world of AI conversational interaction, but they are incremental advances. They are not giant, radical, revolutionary steps into the sci-fi future that we are longing for. My disclaimer is not meant to dampen the excitement of new innovations, but rather to suggest that we tend the slow fire that leads to widespread implementation. The hype around conversational AI is becoming so hot that it will raise expectations in the general population that will surely be disappointed. This will lead to yet another AI winter, and I don't want to buy another AI parka. Thank you, I will now resume the regular programming.
[Me stepping off a soapbox]
The IBM Debater program (presumably built around Watson components) was trained/designed to argue on approximately 100 different topics, and the rules for the debate were highly restricted. In fact, they seem more like: make a long-winded comment on your opponent's long-winded statement, then sum it up. The rules were:
make a four-minute opening statement
listen to the opponent's rebuttal
rebut the rebuttal
make a concluding summary statement of your position.
Clearly nothing like the recent US presidential primary debates!
The topic chosen (by IBM) for this debate was "government support for space exploration." The IBM Debater system argued in favor of support. During the interaction, the program did seem to connect to Ms. Ovadia’s arguments (albeit somewhat disjointedly and tenuously) and rendered three short speeches/rebuttals using a synthesized (TTS) voice. Here's a snippet from part of the long rebuttal:
“Another point that I believe my opponent made is that there are more important things than space exploration to spend money on. It is very easy to say there are more important things to spend money on, and I dispute this. No one is claiming that this is the only item on our expense list.”
Here is a video clip of some of the actual interaction:
This work has been quietly under development for the past six years and has its roots in the same technology (Watson) that played Jeopardy (beating the human champions). Other subprojects under this Watson umbrella include efforts in education, such as interactive tutoring for children. All of these projects involve language understanding but require different approaches because their modes of interaction differ.
Jeopardy was a one-shot, best-fit answer, which is a lot like what Siri and Google do with their phone-based command/response agents.
IBM Debater analyzes larger chunks of language, attempts to understand a richer cluster of details of what is being communicated, and ultimately generates a longer, richer summary/response (in a more rhetorical style).
Watson Intelligent Tutor involves more interactions which are shorter and must support a great deal of variation in the navigational path toward the goal (this is more like the Google Duplex system). Here's a short video of the tutor using a natural language, text-based interaction.
There's also another class of conversational interaction systems: the chatbot. Most people use the term chatbot to refer to any text-based or spoken interaction with the computer, but I reserve the term for aimless and usually inconsequential chit-chat (e.g., talking to a stranger on the elevator or the person next to you in the grocery line). A good example of a real system is Xiaoice in China, which is built on Microsoft technology and is quite good at a couple of minutes of inane conversation. Here's a two-minute video.
One IBM researcher, Noam Slonim, who was helping oversee IBM Debater, admitted that this current technology would only have a meaningful debate about 40% of the time. Witnesses to the debate thought it was noticeable how the underlying system took phrases and sentence structures from its training materials and wove them together to present arguments and rebuttals. I presume there were a huge number of human-human debate transcripts dating back to the early 1960s that were focused on whether or not the government should help subsidize a space program.
Another snippet, from IBM Debater's opening statement:
“...inspires our children to pursue education and careers in science and technology and mathematics. It is more important than good roads or improved schools or better health care.”
Google Duplex is also limited to narrow tasks. (It can “schedule hair salon appointments” or “get holiday hours” as well as book restaurant reservations.) Because Google has revealed the system only in brief demonstrations, it is unclear how well it really performs. [My take on Google Duplex here] Certainly, systems like Xiaoice are a long way from passing the Turing Test, the challenge laid down by the British computing pioneer Alan Turing in the 1950s that asks whether a machine can play “the imitation game” to mimic humans. No one would mistake these systems for a human, at least not after a lengthy conversation.
One of the interesting things about this demonstration project is that it's tackling a problem that does not seem to have a business model. I have not noticed much, if any, buzz about building products that can debate with humans. In fact, I personally have been rebuffed when proposing ideas to corporations that even approach the idea of extended, unguided exploration through conversation. Virtual assistant searches and commands eventually lead to services (paid for with money or your privacy) or advertising (paid for with your attention), so it's easy to make a business case for them. However, engaging in an intellectual argument (a debate) or an exploration of a topic does not currently present enough payment "hooks" to invite corporate-sponsored projects or start-up funding. I speak from personal experience with one of my digital creations, Cassandra.
Perhaps she is doomed by the same curse as her mythological namesake?
I do think the time for this kind of interaction with synthetic beings will eventually come, and I applaud IBM for actively plowing the terrain for this technology. Even Mr. Slonim was willing to admit that IBM Debater had no direct path to a new product or service by stating unambiguously “Debating is not a business.” Future progress in this field may expose new and better ways to understand and process all information such that it can be used in natural and effortless conversation.
Natural conversation between the computer and a human still presents many hurdles.
How the computer:
Ingests the raw data that represents the underlying topic
Interprets what the human says in the context of that topic
Models the human state of mind as it changes
Makes its own subgoals and handles them dynamically
Decides the important concepts for its reply
Generates the natural language response
Applies prosody (speed/pitch/amplitude/duration) to aid in comprehension
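For readers who think in code, the hurdles above can be pictured as stages in a single pipeline. What follows is only a toy sketch under loud assumptions: every name in it (DialogueState, ingest, interpret, respond, and so on) is hypothetical, each stage is a crude keyword-matching stand-in, and none of it reflects how IBM Debater or Google Duplex actually work. Its only purpose is to show how many separate pieces have to hand off to one another for a single reply.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, List, Set


def content_words(text: str) -> Set[str]:
    """Crude tokenizer: lowercase words longer than three letters."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if len(w) > 3}


@dataclass
class DialogueState:
    topic_facts: List[str]                      # what the system "knows"
    history: List[str] = field(default_factory=list)
    user_stance: str = "unknown"                # toy model of the human


def ingest(raw_text: str) -> List[str]:
    """Stage 1: turn raw topic text into a list of sentence-sized facts."""
    return [s.strip() for s in raw_text.split(".") if s.strip()]


def interpret(utterance: str, state: DialogueState) -> List[str]:
    """Stage 2: find the known facts that share words with the utterance."""
    words = content_words(utterance)
    return [f for f in state.topic_facts if words & content_words(f)]


def update_user_model(utterance: str, state: DialogueState) -> None:
    """Stage 3: guess the human's stance from a few negative cue words."""
    cues = {"not", "no", "never", "waste"}
    spoken = set(re.findall(r"[a-z]+", utterance.lower()))
    state.user_stance = "against" if spoken & cues else "for"
    state.history.append(utterance)


def choose_subgoal(state: DialogueState) -> str:
    """Stage 4: pick what the next reply should try to achieve."""
    return "rebut" if state.user_stance == "against" else "reinforce"


def select_concepts(relevant: List[str], limit: int = 2) -> List[str]:
    """Stage 5: keep only a few concepts for the reply."""
    return relevant[:limit]


def generate(subgoal: str, concepts: List[str]) -> str:
    """Stage 6: template-based text generation (a placeholder for real NLG)."""
    if not concepts:
        return "Could you say more about that?"
    opener = "I dispute that." if subgoal == "rebut" else "I agree."
    return f"{opener} Consider: {'; '.join(concepts)}."


def apply_prosody(text: str) -> Dict[str, object]:
    """Stage 7: attach made-up speech hints; longer replies are slowed down."""
    rate = 0.9 if len(text.split()) > 12 else 1.0
    return {"text": text, "rate": rate, "pitch_shift": 0.0}


def respond(utterance: str, state: DialogueState) -> Dict[str, object]:
    """Run one conversational turn through all of the stages in order."""
    relevant = interpret(utterance, state)
    update_user_model(utterance, state)
    subgoal = choose_subgoal(state)
    concepts = select_concepts(relevant)
    return apply_prosody(generate(subgoal, concepts))


if __name__ == "__main__":
    facts = ingest("Space exploration inspires students. Roads need funding too.")
    state = DialogueState(topic_facts=facts)
    print(respond("Space exploration is not worth the money", state))
```

Even in this toy form, every stage depends on the output of the ones before it, which is why progress on any single hurdle does not by itself produce natural conversation.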
Many of these and numerous other related problems are being tackled independently, but for true conversation to occur, all of them must be integrated into a cohesive, coordinated system. I (the eternal optimist) believe that the next couple of years will show serious progress in this direction.
In the meantime, try not to overhype this to your friends. The next AI winter is likely to be very cold, and as I said before, I absolutely refuse to buy an AI parka!