Mastering the Game of Go Is Easy: Conversing Like a Kid Remains Intractable
We're often in awe of ML systems that outclass us at chess or the game of Go. So, why do our conversational AI systems fail in ways that a child never would?
It wasn't that long ago that DeepMind's AlphaGo proved it could play Go better than the best humans. Given the range of possible future moves, Go cannot be solved by exhaustive search: its search space is astronomically larger than that of chess. Yet the individual moves are far simpler and more atomic than in chess (and almost any other game) because the rules are incredibly simple. That simplicity, combined with a giant catalog of hundreds of thousands of human-played games and the relative ease of having the system play a large number (countless millions) of games against itself, makes the game a good fit for deep learning. DeepMind's network learned where best to place the next stone on the board. In fact, by playing games against itself, it uncovered moves that human experts deemed unwise or even mistaken. Humans generally do not consider such odd moves because they could never play enough games in a lifetime to discover that one-in-a-million useful move. AlphaGo did have the time. AlphaGo found new moves. AlphaGo beat the humans.
Board games have a lot of turns. A game is almost like an extended conversation in a limited, well-defined symbolic language. In the game of Go, the language is (almost) equivalent to grunts: white stone here, black stone there, white stone, etc. In fact, it feels as if it were inherently a binary language.
Chess is a little more sophisticated. I like to imagine each of the different pieces as words, and since some of the pieces move arbitrary distances, I imagine that as adding tense or plural/singular modifications to those words. It is still a very abstract conversation, but I think we can agree that it is richer than the language of Go. It's also more like a conversation because the moves lend themselves to storytelling: my rook is blocking your queen, and if you move your queen, my knight will come to the defense of my bishop. It almost sounds like an improvisational fairytale.
When the first chess programs were written, the idea of a program learning how to play was in its infancy. Before neural nets were considered for the chess problem, there were straightforward, algorithmic, rule-based, procedural programs that were pretty good and could easily beat the average player (me). It's not surprising that the same sort of approach was taken for the first human-computer conversational systems. Almost all of the automated phone systems you've ever spoken with were simple rule-based systems, not very different from the algorithmic chess interaction style.
If we imagine the trajectory from the game of Go through chess into the realm of role-playing board games that are popular today, it becomes clear that there is a progressive qualitative difference. There is far less abstraction, and the nature of a win is often vague and ambiguous. The world of the game is much more like the world we live in. The experience doesn't feel as though it's governed by rigid, mathematical rules; it feels governed by values and judgments. There seems to be an infinitude of these role/strategy board games. (I believe that between my daughter and her posse they own a third of them!) Many of them focus on building some sort of community such as a medieval village, a space colony, or a business. They all center around allocating resources (e.g., people, money, corn, rocket fuel, magic crystals). They also include some randomization of outcomes (e.g., how many people get sick, what percentage of the crop fails, whether the rocket ship blows up), which relegates the predictability of future moves to the realm of statistics. It's almost like quantum mechanics: The player doesn't know if their cat token on the board is alive or dead until they roll the dice.
[Semi-relevant aside: Roger Penrose believes human consciousness involves quantum effects.]
These role-playing games remind me much more of real-life human-to-human conversations. Each player attempts to make moves that serve their long-term objectives while also addressing their short-term issues. Sometimes the long-term and short-term are in contention. Often, the other players move to insert themselves into your narrative. The randomized results lead to modified strategies. In a human conversation, the counterpart of that randomness is the subtle misunderstandings we make as we try to convert our conversational partner's acoustic output into words and then convert those words into meanings. Even if you're not aware of it during an actual conversation, it is painfully obvious if you've ever read through a protracted Twitter thread. There, people have more time to craft a reply based on their misunderstanding (honest misunderstanding, not trolling), and the response is usually longer because they have not been interrupted as they would be in a real-time conversation. A nerdy way of thinking of this is that the conversational prediction vector veers at a greater angle and with a larger magnitude: The prediction of where the conversation is going has a larger error band.
Until recently, the only way we could address the complexity of natural conversation was to limit it unnaturally and constrain it to a finite number of well-defined paths. We couldn't support too many paths because of the combinatorial explosion of possible states at any future turn. These constrained interactions seemed pretty mechanical...because they were.
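To put a rough number on that combinatorial explosion, here is a minimal sketch. It assumes, purely for illustration, that every turn offers the same fixed number of possible responses; real conversations are messier, and the function name and the branching factor of 5 are my own inventions, not measurements.

```python
def count_dialogue_paths(branching_factor: int, turns: int) -> int:
    """Count distinct paths through a dialogue tree.

    Assumes (hypothetically) that every turn offers the same
    number of possible responses, so the count is b ** t.
    """
    if branching_factor < 0 or turns < 0:
        raise ValueError("branching_factor and turns must be non-negative")
    return branching_factor ** turns


if __name__ == "__main__":
    # Illustrative only: 5 possible responses per turn.
    for turns in (1, 5, 10, 20):
        print(f"{turns:>2} turns -> {count_dialogue_paths(5, turns):,} paths")
```

Even with a handful of choices per turn, the number of paths a designer would have to script by hand grows exponentially with the length of the conversation, which is why the early systems kept both numbers small.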
One of the (big) missing pieces of scientific understanding for the study of natural conversation is a workable formalism with which to model an open-ended conversation governed by a vague yet purposeful agenda. Some methodologies have been around for years that purport to digest the individual exchanges (phrases) of the conversational participants into more abstract categories (e.g., speech acts). Current technology, which drives Google Home and Amazon's Alexa, relies heavily on generalizing ranges of phrases into individual categories, which in today's technical vernacular are called intents (Google Dialogflow is a small example of the tech for this). Even these methodologies still lead to systems that use highly procedural decision systems to guide the conversation.
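To make the idea of intents concrete, here is a toy sketch of mapping a range of phrases onto a single category. It is not how Dialogflow or Alexa work internally; it is a word-overlap (Jaccard similarity) matcher, and the intent names, example phrases, and threshold are all hypothetical.

```python
import re

# Hypothetical intents, each with a few example phrases.
# Production systems learn from far more examples than this.
INTENT_EXAMPLES = {
    "check_balance": ["what is my balance", "how much money do I have"],
    "play_music": ["play some music", "put on a song"],
    "set_timer": ["set a timer for ten minutes", "start a timer"],
}


def tokenize(text):
    """Lowercase the text and return its set of words."""
    return set(re.findall(r"[a-z']+", text.lower()))


def classify_intent(utterance, threshold=0.3):
    """Return the intent whose example phrase best overlaps the utterance.

    Overlap is Jaccard similarity between word sets. If no example
    reaches the (arbitrary) threshold, return "fallback".
    """
    words = tokenize(utterance)
    best_intent, best_score = "fallback", 0.0
    for intent, examples in INTENT_EXAMPLES.items():
        for example in examples:
            example_words = tokenize(example)
            union = words | example_words
            score = len(words & example_words) / len(union) if union else 0.0
            if score > best_score:
                best_intent, best_score = intent, score
    return best_intent if best_score >= threshold else "fallback"


if __name__ == "__main__":
    for phrase in ("please play some music", "tell me a joke"):
        print(phrase, "->", classify_intent(phrase))
```

Notice that once the phrase has been collapsed into a category, what happens next is still decided by hand-written procedural logic, which is exactly the limitation described above.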
[Some research is being done on statistical predictions for transition states, but these only work well for local interaction: the next state. If you want the statistical, machine-learnable approach to work for longer coherent conversations, then you need a huge number of similar conversations centered around the same domain and agenda. That's the rub (a big rub, and the topic for another article, if not an entire book). This is why mundane, repetitive, recorded call center interactions are the focus of this type of research.]
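A minimal sketch of what "statistical prediction of the next state" can look like is a first-order transition model: count which state follows which in recorded conversations, then predict the most frequent successor. This is my own illustration rather than any particular research system, and the state labels and the three tiny dialogues are made up.

```python
from collections import Counter, defaultdict

# Hypothetical call-center dialogues, each reduced to a sequence
# of state labels. Real training sets would be vastly larger.
DIALOGUES = [
    ["greet", "ask_account", "verify", "resolve", "goodbye"],
    ["greet", "ask_account", "verify", "escalate", "goodbye"],
    ["greet", "ask_account", "verify", "resolve", "goodbye"],
]


def train_transitions(dialogues):
    """Count how often each state is immediately followed by each other state."""
    counts = defaultdict(Counter)
    for dialogue in dialogues:
        for current, following in zip(dialogue, dialogue[1:]):
            counts[current][following] += 1
    return counts


def predict_next(counts, state):
    """Return (most likely next state, estimated probability), or None if unseen."""
    following = counts.get(state)
    if not following:
        return None
    next_state, count = following.most_common(1)[0]
    return next_state, count / sum(following.values())


if __name__ == "__main__":
    model = train_transitions(DIALOGUES)
    print(predict_next(model, "verify"))
```

The model only ever looks one step ahead, and it only knows transitions it has seen many times in the same domain, which is the rub described above.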
I've touched on this topic before with Siri, Google Duplex, and the IBM Watson debating system, but all of these conversational systems sadly lack most of the capabilities we need. One key capability is our ability to use logic, but not too logically. How many science fiction stories employ the premise of an evil or dangerous computer that destroys itself when the human confronts it with its own illogical behavior? Another capability humans have is applying models built from previous experience to similar but novel experiences. A three-year-old child can drift effortlessly between the concept of a bird "swimming in the air" and a penguin "flying in the water."
Conversations we find interesting and influential are also one-of-a-kind. We enjoy watching a talk show host engage with a celebrity, and even if on another occasion the same talk show host speaks with the same celebrity, the conversation is different. It is unique. As a human, you or I could listen to 10 or 20 recorded call center interactions and be able to do a pretty good job as the call center agent for that domain. We somehow can generalize and abstract from those few examples to an unimaginable number of variable future interactions and handle them correctly. Conversational AI doesn't have anything like that.
Conversational AI needs this.
Opinions expressed by DZone contributors are their own.