Imagine you have a bright young child (I'm sure those of you with children already do). You hope they will broaden their experience by talking to people and learning new things. They will learn quickly, and they can apply what they've learned to future conversations, getting better and learning more all the time. How do you help them learn? What is the best way for them to gain the knowledge that will make them a better person? How do you help them grow into a person you can be proud of?
Perhaps you could go to the center of your city, to the municipal park. And, just in case it wasn't obvious from their childlike countenance, you could pin a sign on your child saying something like "Gullible Innocent Being". Then you instruct your child to talk with any and all of the people who approach, and of course you tell your child to learn whatever those people teach. Finally, you sit your child down on a bench where a diverse group of people is gathered, and then you leave. Perhaps you look back and see their smiling, innocent face just before you turn the corner and leave the park. What could go wrong?
By now everyone has heard about Microsoft's Tay and how quickly she was turned into a racist druggie. But unless you've been working on virtual agents for decades (like me), you may not be aware that this sort of machine-learning chatbot disaster has happened many times. The speed at which an innocent AI is transformed into a vile, unpleasant being is mostly a function of how popular the hosting site is. It is never a question of whether the AI will become antisocial (or at least bizarre); it is simply a function of how many encounters it has in the "wild". Combine the media presence of Microsoft and Twitter and, as we discovered, it is possible to generate enough encounters for things to go very wrong in a single day.
It is one of the conceits of big-data/machine-learning that given enough data and enough computing resources, anything can be learned. I don't hold out much hope for any unsupervised learning system being capable of "learning" to be an acceptable social being. Just think about your own personal experience. I'm sure most of you have toyed with at least one "chatbot" or "virtual assistant" demo. Most of you try a couple of straightforward interactions just to test whether the system works at all. Then, once you realize that it falls short of your expectations (or hopes and desires), you test it with a few interactions that range from fanciful to mischievous (and quite often malicious). The interaction corpus that is collected thus contains a significant admixture of intentionally confounding examples.
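One implication is that a raw interaction corpus should be screened before any learning happens, rather than fed to the system wholesale. Here is a minimal sketch of that idea; the blocklist, the threshold, and the function names are purely illustrative placeholders, not a real moderation policy or anything from a production system:

```python
# Minimal sketch: screen a raw interaction corpus before it is used
# for training. The blocklist and threshold below are illustrative
# placeholders standing in for a real lexicon and policy.

BLOCKLIST = {"slur1", "slur2", "drugs"}  # stand-ins for a real lexicon

def is_trainable(utterance, blocklist=BLOCKLIST, max_hit_ratio=0.0):
    """Return True if the utterance's blocklisted-token ratio is acceptable."""
    tokens = utterance.lower().split()
    if not tokens:
        return False  # discard empty utterances outright
    hits = sum(1 for t in tokens if t in blocklist)
    return hits / len(tokens) <= max_hit_ratio

raw_corpus = [
    "what's the weather like today",
    "tell me about drugs",        # would be screened out
    "read my new email please",
]
clean_corpus = [u for u in raw_corpus if is_trainable(u)]
```

Of course a keyword filter only catches the crudest mischief; the point is simply that curation has to sit between collection and learning.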
One reason people behave this way is that the "chatbot" is not embodied. In fact, you don't even "chat" (in the sense of speaking, which is the origin of the word) with the vast majority of chatbots in existence today. So "chatting" with these bots feels much more like a Twitter exchange, and bilious rancor is the norm in many Twitter exchanges.
I work predominantly on bots that people can actually "speak" to, and in some cases these bots have 3-D emotive avatars. They exist in a domain about halfway between a text-based bot and a talking robot. One of the more interesting experiments my group did over a decade ago involved a voice-only agent that provided a small range of services (weather, stocks, email reading, placing calls, etc.) for the mobile phone platform. We experimented with a number of personas, two of which had interesting (but perhaps not surprising) results.
For one of the personas we gave the adult female synthetic agent a noticeable (but not unintelligibly thick) French accent. This persona subtly changed the way people interacted with the agent:
People articulated more clearly when they spoke to the agent (we could measure this by the somewhat higher speech recognition scores).
People were more accepting of the occasional oddly formed responses the agent made (due to imperfect language generation algorithms in our system). One of our test subjects volunteered that they attributed the oddness to the agent's foreignness.
For another of the personas we created a childlike synthetic agent with an apprehensive (slightly fearful) voice. This also led to some interesting changes in the interactions:
People tended to use simpler words and shorter, clearer sentences.
People were more inclined to help the conversation get back on track (part of the experiment was to inject a minor communication breakdown).
People were more willing to divide a task into smaller tasks to accomplish the goal.
Below is an example of our 3-D rendered synthetic agent.
Studies have shown that any sort of embodiment of an AI in a tactile physical form dramatically changes how people interact with it. People of all ages are far less inclined to do "hurtful" things to an embodied AI. While the example below does not speak or listen, it does perceive (touch) and emote (body language and sounds).
Let's go back to the example of a child learning how to converse. Typically a child learns the structure and mechanics of conversation as a set of basic primitives by interacting with their mother, then increasingly with their immediate family (father and siblings), and later with extended family (aunts, uncles, trusted family friends). Coupled with the innate trust of a young child, the inherent benevolence of these people rapidly imprints a broad base of behaviors and beliefs that guide the child throughout their life.
If we are going to create quality chatbots, then we need to use quality interactions. The people who provide examples and education for the chatbots need to be motivated to care about the outcome. One way to make people care more is to embody the chatbots, to make them seem more real. People are likely to be kinder and more supportive if they think they might physically or emotionally damage the agent. A maternal/paternal perspective could be a good thing.
Well, on that note, I'm going to ask Cassandra if she trusts me!