
3 AI Fails and Why They Happened

Advancements in AI are accelerating — but failure is still king. Take a look at 3 major AI fails and learn what went wrong.


In little more than a decade, AI has advanced by leaps and bounds. Every day, new headlines showcase the most recent advancement in AI. In fact, advancements are accelerating:

  • 2004: DARPA sponsors a driverless car grand challenge. Technology developed by the participants eventually allows Google to develop a driverless automobile and modify existing transportation laws.
  • 2005: Honda's ASIMO humanoid robot is able to walk as fast as a human, delivering trays to customers in a restaurant setting. The same technology is now used in military robots.
  • 2007: Computers learned to play a perfect game of checkers, and in the process opened the door for algorithms capable of searching vast databases of information.
  • 2011: IBM’s Watson wins Jeopardy against top human champions. It is currently training to provide medical advice to doctors. It is capable of mastering any domain of knowledge.
  • 2012: Google releases its Knowledge Graph, a semantic search knowledge base, likely to be the first step toward true artificial intelligence.
  • 2013: Facebook releases Graph Search, a semantic search engine with intimate knowledge about Facebook’s users, essentially making it impossible for us to hide anything from the intelligent algorithms.
  • 2013: BRAIN initiative aimed at reverse engineering the human brain receives $3 billion in funding by the White House, following an earlier billion euro European initiative to accomplish the same.
  • 2014: Chatbot convinced 33% of the judges that it was human and by doing so passed a restricted version of a Turing Test.
  • 2015: Single piece of general software learns to outperform human players in dozens of Atari video games.
  • 2016: Go-playing deep neural network beats world champion.

Source: Artificial Intelligence Safety and Cybersecurity: a Timeline of AI Failures.

Failure Is King

Failure is at the core of human advancement. For example, the microwave oven was an accidental by-product of military radar work during WW2. Percy Spencer noticed a chocolate bar melting in his pocket while working on magnetrons for Raytheon, a major U.S. defense contractor.

“From that embarrassing accident came a multimillion dollar industry — and one of the great twin blessings and curses of the American kitchen.” Wired

More recently, major corporations have begun to embrace the value of failure. Unsurprisingly, Gatorade’s brand strategy is all about winning. It’s been at the core of their marketing efforts for the past decade. What’s more surprising is their latest campaign:

AI Is No Different

Because AI is a recent advancement in which many efforts are still considered R&D investments, notable failures are emerging. This is a normal path toward improvement. To list a few:

While most of these accidents or mishaps are not directly caused by the AI itself failing, they are all linked to AI in some way. For example, in the self-driving car accident, it is impossible to state whether the AI is responsible, but we can ask the question: Could the accident have been avoided if the AI had been better trained? It goes to show that as we move forward with AI development, we must be extremely confident in the algorithms making decisions on our behalf, particularly when these decisions involve complex variables and the consequences can be fatal, as in the case of driverless cars.

Let’s take three recent and widely covered examples of AI failures.

Tay, Microsoft’s (Racist and Bigoted) Chatbot

The most recognized failure in AI this past year has easily been Tay:

“[...] an artificially intelligent chatbot developed by Microsoft's Technology and Research and Bing teams to experiment with and conduct research on conversational understanding. Tay is designed to engage and entertain people where they connect with each other online through casual and playful conversation. The more you chat with Tay the smarter she gets, so the experience can be more personalized for you.”

Tay was an attempt at Natural Language Understanding (NLU). Basically, its learning algorithms were set to read, interpret, and adapt to the written content you fed it. The goal was to personalize and personify interactions with a robot, a key strategic advancement many tech giants would like to see accomplished. The aim was something along the lines of the film Her, and we can clearly see why. In the high tech sector, there are usually three pillars to commercial success: acquisition, engagement, and conversion. Having a fully human and personal experience that can pass a rigorous Turing test would redefine how we go about creating engagement.

Alexa Mistakenly Offers Porn to Child

When the child in the video tells Alexa to "play ‘Digger, Digger,’" Alexa answers, "You want to hear a station for porn detected...hot chick amateur girl sexy..." (full article).

Some would argue that this is a failure of voice command recognition rather than of AI. While those people would be right, keep in mind that Alexa's voice recognition is trained with machine learning.

InspiroBot Gives Questionable Advice

My personal favorite, InspiroBot, is designed to provide you with a daily dose of inspiring quotes. Ironically, the fact that it regularly fails at creating motivational messages will most likely brighten your day. At Arcbees, we enjoy the occasional dark humor joke. We had a couple of good laughs when these came up:

It is only appropriate to follow up these quotes with another:

“Those who cannot remember the past are condemned to repeat it.” George Santayana

In cases like these, AI fails can be entertaining, but I’m more interested in the reasons they failed and what we can learn from them.

Why Did They Fail?

The failures come down to three factors: precision, context, and training.

Precision


AI has shown it can produce results in all industries, from predicting insurance sales opportunities and reducing medicine research times to automating production lines, optimizing transportation routes, and much more. While its domains of application are far reaching, successfully putting AI into production requires having a very specific problem to solve.

For example, fraud detection can be framed very precisely when used with a neural network that has few inputs and outputs. The output for each transaction can be limited to fraudulent or non-fraudulent. Remember, in this type of situation, you are training a model to classify data into two classes. With such a limited number of possible outputs, it is easier to build a model that classifies transactions efficiently.
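
To make the two-class idea concrete, here is a minimal sketch in Python. It assumes just two made-up input features per transaction (an amount and a foreign-transaction flag) and a single-neuron model trained by gradient descent. The data, feature names, and function names are all hypothetical; this only illustrates the shape of the problem and is not a production fraud model.

```python
import math

# Hypothetical, hand-made training data: (amount_in_thousands, is_foreign) -> label.
# Label 1 = fraudulent, 0 = non-fraudulent. All values are invented for illustration.
TRAINING_DATA = [
    ((0.02, 0.0), 0),
    ((0.05, 0.0), 0),
    ((0.10, 1.0), 0),
    ((0.08, 0.0), 0),
    ((4.50, 1.0), 1),
    ((3.80, 1.0), 1),
    ((5.20, 0.0), 1),
    ((4.10, 1.0), 1),
]


def sigmoid(z):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def score(features, weights, bias):
    """Weighted sum of the inputs plus a bias term."""
    return sum(w * x for w, x in zip(weights, features)) + bias


def train(data, epochs=2000, learning_rate=0.1):
    """Fit a single-neuron classifier with stochastic gradient descent."""
    weights = [0.0, 0.0]
    bias = 0.0
    for _ in range(epochs):
        for features, label in data:
            error = sigmoid(score(features, weights, bias)) - label
            weights = [w - learning_rate * error * x
                       for w, x in zip(weights, features)]
            bias -= learning_rate * error
    return weights, bias


def classify(features, weights, bias):
    """Map the model output onto one of exactly two classes."""
    probability = sigmoid(score(features, weights, bias))
    return "fraudulent" if probability >= 0.5 else "non-fraudulent"


weights, bias = train(TRAINING_DATA)
print(classify((0.03, 0.0), weights, bias))
print(classify((4.80, 1.0), weights, bias))
```

Because the output is limited to two labels, the whole model fits in a few lines, and every prediction is easy to check against the expected class.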

Tay failed in part because of its lack of precision. The desired output, other than grammatically correct interactions, was not bound to clearly defined parameters. That's the challenge, though; human interaction is not precise. People participating in the Tay experiment used different vocabulary and syntax, producing dispersed and highly variable input data and making it difficult to build coherent results.

Context


For all three examples and for AI in general, context remains a challenge. Context is an extension of precision in some regards but still merits a place in the conversation, especially in cases where humans interact with AI. If you chat with Tay, ask Alexa for information, or look to InspiroBot for motivation, you are set in a context where time, place, emotion, weather, identity, company, etc. will affect how you interpret and appreciate the outcome.

A classic example would be, "Hey Siri, call me an ambulance," and she replies: "OK, from now on, I will call you Ambulance." It succeeds in automating a task but fails in understanding the context in which the task is given.

Tay failed to act as a respectful conversational virtual agent because both its training and interactions were subject to unlimited contexts. It was able to identify words and build minimally coherent responses but wasn’t able to understand the meaning of those words and the weight they carried throughout a conversation. Virtual agents do work, however, when they are specific to a context. When reporting an accident to your insurer, for example, the subject matter and the possible questions and answers are far less ambiguous. That said, businesses usually opt for a decision tree model when creating these agents.
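
A decision tree agent of this kind could look roughly like the sketch below. Every question, answer, and outcome is invented for illustration, and the function name is hypothetical. The point is only that a narrow context lets you enumerate the conversation in advance.

```python
# Hypothetical decision tree for an insurance accident report.
# All questions, answers, and outcomes are invented for illustration.
ACCIDENT_REPORT_TREE = {
    "question": "Was anyone injured?",
    "answers": {
        "yes": {"outcome": "Escalate to a human agent."},
        "no": {
            "question": "Is the vehicle drivable?",
            "answers": {
                "yes": {"outcome": "Open a claim and schedule a repair estimate."},
                "no": {"outcome": "Open a claim and dispatch a tow truck."},
            },
        },
    },
}


def run_agent(tree, replies):
    """Walk the tree with a list of user replies and return the final message.

    Replies outside the expected answers are rejected rather than guessed at.
    """
    node = tree
    remaining = iter(replies)
    while "outcome" not in node:
        reply = next(remaining, None)
        if reply is None:
            return "Conversation ended before the report was complete."
        normalized = reply.strip().lower()
        if normalized not in node["answers"]:
            options = ", ".join(node["answers"])
            return f"Sorry, please answer one of: {options}."
        node = node["answers"][normalized]
    return node["outcome"]


print(run_agent(ACCIDENT_REPORT_TREE, ["No", "yes"]))
print(run_agent(ACCIDENT_REPORT_TREE, ["call me an ambulance"]))
```

The agent never has to interpret open-ended language. Anything it does not expect is simply refused, which is far less impressive than Tay but also far harder to derail.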

Similarly, InspiroBot fails because positivity is in the realm of context. Its content, while generic, is sufficiently rich and descriptive to nourish our interpretation of the possible applications of its advice. It successfully creates quotes but lacks the intelligence to understand the content, meaning, and possible interpretation of its quotes.

Training


While neural networks use backpropagation to adjust their weights toward the desired result, they can only do so with the data and parameters they are trained with.

You’ve probably heard the expression: garbage in, garbage out. With Tay, this played a major role in its failure. Instead of being trained behind closed doors in a controlled environment before its release, Tay was designed to learn while interacting with the public. Everything went haywire in less than 24 hours because tech-savvy communities (notably 4chan and 8chan) thought it would be interesting to feed the learning algorithms questionable content. Needless to say, they succeeded.
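
Garbage in, garbage out can be shown with a toy example. The sketch below is not how Tay worked; it assumes a hypothetical bot that "learns" by storing whatever users send it, plus a placeholder blocklist standing in for a real vetting step. All names and messages are invented.

```python
# Placeholder blocklist; real content moderation is far more involved.
BLOCKED_TERMS = {"badword", "slur"}


class EchoLearner:
    """Toy bot that 'learns' by storing the phrases users send it."""

    def __init__(self, moderate):
        self.moderate = moderate
        self.learned_phrases = []

    def is_acceptable(self, message):
        """Return True when no blocked term appears in the message."""
        words = set(message.lower().split())
        return not (words & BLOCKED_TERMS)

    def learn(self, message):
        """Store the message unless moderation is on and rejects it."""
        if self.moderate and not self.is_acceptable(message):
            return False  # rejected: never enters the training data
        self.learned_phrases.append(message)
        return True


public_messages = ["hello there", "you are a badword", "nice weather today"]

open_bot = EchoLearner(moderate=False)    # learns from everything
vetted_bot = EchoLearner(moderate=True)   # learns only from vetted input
for message in public_messages:
    open_bot.learn(message)
    vetted_bot.learn(message)

print(open_bot.learned_phrases)
print(vetted_bot.learned_phrases)
```

The unmoderated bot ends up with exactly what the crowd gave it. A word list would obviously not have saved Tay, but the sketch shows why a model that learns in the open inherits whatever its users decide to feed it.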

With Alexa, it’s a little different. Commands are already set to trigger the appropriate response. Alexa’s training aims to determine which command to trigger based on the audio clip it captures. To be of any value, it must match commands across a wide range of vocabulary, syntax, pitch, tones, rhythms, accents, and pronunciation. The hard part is balance: using a large enough variety of audio patterns to match the world’s diversity while being specific enough to match them with the correct commands. Pushing for this balance may also mean a bigger margin of error, and this is why Alexa failed in this case. With more training, Alexa could be taught to identify a child’s voice and prompt a parental control if necessary.
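
The balance between tolerance and specificity can be sketched with plain text matching. This is only a stand-in for acoustic matching and says nothing about how Alexa is actually built. It assumes a hypothetical command list and uses the similarity ratio from Python's standard difflib module with an adjustable threshold.

```python
from difflib import SequenceMatcher

# Hypothetical command list, invented for illustration.
KNOWN_COMMANDS = ["play music", "stop", "set timer"]


def similarity(heard, command):
    """Similarity between two strings, from 0.0 (nothing shared) to 1.0 (identical)."""
    return SequenceMatcher(None, heard, command).ratio()


def match_command(heard, threshold):
    """Return the closest known command, or None if nothing clears the threshold."""
    best = max(KNOWN_COMMANDS, key=lambda command: similarity(heard, command))
    if similarity(heard, best) >= threshold:
        return best
    return None


# A strict threshold still accepts a near miss...
print(match_command("play musik", threshold=0.8))
# ...and rejects input that resembles no command.
print(match_command("xyz", threshold=0.5))
# A threshold that is too tolerant triggers a command on almost anything.
print(match_command("xyz", threshold=0.0))
```

Lowering the threshold lets more accents and phrasings through, but it also lets more wrong matches through. That is the margin of error described above.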

If InspiroBot were to use fewer words, template sentences, and pre-validated optimistic vocabulary, it would be easier to improve its performance at creating what we consider to be a motivational quote. However, it would also defeat the purpose of using AI. Oversimplifying the parameters negates the use of machine learning, as it becomes simpler to write the algorithm without it.
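
Here is roughly what that oversimplified version would look like. The templates and vocabulary are hypothetical and have nothing to do with InspiroBot's real implementation. The sketch only shows that once the inputs are this constrained, no learning is involved at all.

```python
import itertools
import random

# Hypothetical templates and pre-validated vocabulary, invented for illustration.
TEMPLATES = [
    "Every {noun} begins with {virtue}.",
    "Choose {virtue}, and {noun} will follow.",
]
NOUNS = ["success", "growth", "progress"]
VIRTUES = ["courage", "patience", "kindness"]


def make_quote(rng):
    """Fill a random template with random pre-approved words."""
    template = rng.choice(TEMPLATES)
    return template.format(noun=rng.choice(NOUNS), virtue=rng.choice(VIRTUES))


def all_quotes():
    """Enumerate every quote this generator can ever produce."""
    return [
        template.format(noun=noun, virtue=virtue)
        for template, noun, virtue in itertools.product(TEMPLATES, NOUNS, VIRTUES)
    ]


quote = make_quote(random.Random(0))
print(quote)
print(len(all_quotes()), "possible quotes in total")
```

Every output can be listed and reviewed by hand ahead of time, so nothing embarrassing slips through. Nothing surprising does either, which is exactly the trade-off described above.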

Embracing Failure

We learn from our mistakes. It’s for this reason that we should embrace our failures in AI. If we really do believe in the advancement of AI as a community, we should share, discuss, analyze, and experiment with failure. Have you seen or experienced any failures in AI? Feel free to share them in the comments section or hit me up on Twitter.

