Avoiding AI/Chatbot Failures

The ecosystem around AI and chatbots is growing rapidly. As you begin to define your AI strategy, consider your quality requirements at the beginning to ensure success.

New AI-driven user interfaces are the most recent trend in digital transformation. Not surprisingly, serious companies are leveraging them to provide new, innovative, customized, and streamlined services to their audience while learning more about the end user. For end users, chatbots represent a streamlined interface for accomplishing a task in a fun, engaging, informative manner "on my time and my device." Over 50% of users would rather interact with a brand through messaging than over the phone.

Growing User Expectations

Of course, the expectation of a correct, responsive, engaging experience grows along with adoption. Users think nothing of the challenges of customizing the experience and making it continuous for each user. Maintaining context across conversations may be trivial for humans, but it is much less so for machines!

Therefore, failure to deliver on these expectations carries a significant penalty. It's one thing to have chatbots fail once in a while; twice can be deadly. There are broad dependencies involved: the AI engine, its ability to maintain and tie context together, speech recognition, and (one of the more difficult areas) handling sarcasm and slang.

Avoid Quality Failures

As another interface into the application, chatbots/AI require a methodical testing approach to avoid embarrassing brand news. Whether you're a newbie or an expert, here are some tips for planning your process.

Standard Stuff 

When designing and selecting your test tool, keep the following considerations in mind.


Your tool will need to provide access to all the smartphones, tablets, desktop browsers, IoT devices, and whatever other forms of UI your app will have. Within all those interfaces, all relevant functions will need to be supported. For example, if the user needs to use fingerprint authentication to log in to the app before starting to use the chatbot, the tool needs to support that.


To avoid delaying the development process and to ensure defects are fixed quickly, you want the phones, tablets, and browsers to be reliably available 24/7 so you can run your tests at any time, maybe even on every code commit!

Test Automation

Realistically, if you want quality, you want test cases to be created together with the requirements at the start of the sprint. This means eliminating manual testing as far as possible and automating as much as you can, ideally on standard open-source interfaces such as Appium and Selenium.

Scalability and Parallel Executions

If you're successful, your AI/chatbot does a lot of things, so there's a lot of testing to achieve in a limited timeframe. Your lab should allow parallel executions in a way that scales reliably (meaning that failures don't increase just because you've scaled up the executions).
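
As a rough illustration of what "scales reliably" can mean in practice, the sketch below runs the same small set of chatbot checks with different numbers of workers and expects identical results each time. Everything here is hypothetical: `fake_chatbot`, `run_case`, and `CASES` are stand-ins invented for the example, and a real lab would drive device or browser sessions instead.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def fake_chatbot(prompt):
    """Placeholder bot (assumption); a real suite would call a device or browser session."""
    answers = {"balance": "Your balance is $100", "hours": "We are open 9 to 5"}
    return answers.get(prompt, "Sorry, I did not understand")


def run_case(case):
    """Run one check: send the prompt and look for the expected fragment in the reply."""
    prompt, expected = case
    reply = fake_chatbot(prompt)
    return prompt, expected in reply


def run_parallel(cases, workers=4):
    """Run all cases on a thread pool and return {prompt: passed}."""
    results = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(run_case, case) for case in cases]
        for future in as_completed(futures):
            prompt, passed = future.result()
            results[prompt] = passed
    return results


# Hypothetical test data: (prompt, text expected somewhere in the reply).
CASES = [("balance", "$100"), ("hours", "9 to 5"), ("weather", "sunny")]

if __name__ == "__main__":
    print(run_parallel(CASES, workers=1))
    print(run_parallel(CASES, workers=8))
```

If the pass/fail picture changes between the one-worker and eight-worker runs, the extra failures come from the lab or the test design rather than from the chatbot.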

Reporting at Scale

If you have many scaled executions, you need a solution that will address two challenges: 

  1. See the forest for the trees. Let's say you run 10K scripts nightly (that's not uncommon) with a 5% failure rate. That's 500 failures to plow through. You need to quickly identify the defects they have in common and then drill in to find the root cause.

  2. Visibility into quality across the organization. The more the app becomes the front of the brand, the more people will be impacted by the quality of the app. The reporting solution needs to cater to business owners, marketers, product owners, and others so they can make go/no-go decisions on launching the next build, decisions about business investments, and so on.
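
One possible way to get from hundreds of individual failures to a handful of common defects is to normalize each failure message into a signature and count how often each signature occurs. The sketch below assumes failures arrive as `(script_name, message)` pairs; the `signature` rules and the sample `FAILURES` data are made up for illustration and would need tuning against real logs.

```python
import re
from collections import Counter


def signature(message):
    """Normalize a failure message so similar failures share one key."""
    msg = re.sub(r"0x[0-9a-fA-F]+", "<hex>", message)  # mask addresses and ids
    msg = re.sub(r"\d+", "<n>", msg)                   # mask counts and durations
    return msg.strip().lower()


def group_failures(failures):
    """Return (signature, count) pairs, most frequent first."""
    counts = Counter(signature(message) for _, message in failures)
    return counts.most_common()


# Hypothetical nightly results: (script name, failure message).
FAILURES = [
    ("login_01", "Timeout after 30 s waiting for reply"),
    ("login_02", "Timeout after 45 s waiting for reply"),
    ("pay_07", "Expected 'balance' in response"),
    ("pay_09", "Timeout after 31 s waiting for reply"),
]

if __name__ == "__main__":
    for sig, count in group_failures(FAILURES):
        print(f"{count:4d}  {sig}")
```

With the sample data, the three timeout failures collapse into a single signature, which is the one to investigate first.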

Fun Stuff

This is the more unique stuff!

Speech Functions

Generally, chatbots will handle both text and voice input/output. Clearly, the former can be tested using standard testing tools with limited effort and in a scalable and reliable manner. When looking at speech input/output, consider both scale and coverage.

  • Speech testing solution. You want a solution that can handle both text (string) input and output, as well as an actual voice file. Text can be converted to speech and injected into the chatbot; the chatbot's voice response can be recorded and converted to text; and the resulting text can then be validated.

  • Priority. When you're not doing straight text entry/validation, I recommend using the text/string approach (and having the solution convert the text to speech on the fly) for the majority of test cases. Validation can be done on the text the chatbot presents as a response, or you can record the audio, turn it into text, and validate that. In this way, you can build a dictionary of sentences that drives data-driven testing and allows you to scale the testing.

  • Testing voice imperfections. Unavoidably, you will need to test stutters, accents, and other unusual scenarios. For that purpose, ensure your test solution accepts voice files as input. As it is more time-consuming both to create and maintain those files and to execute the test suites, I recommend allocating a minority of testing time to these scenarios.
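
The dictionary-driven approach described in the list above could be sketched roughly as follows. The speech functions here are stubs that just pass text through; `text_to_speech`, `speech_to_text`, `chatbot_voice`, and the `SENTENCES` dictionary are all hypothetical names, and a real lab would substitute its own speech engine and the actual chatbot.

```python
def text_to_speech(text):
    """Stub (assumption): a real lab would return synthesized audio."""
    return ("audio", text)


def speech_to_text(audio):
    """Stub (assumption): a real lab would run speech recognition on the audio."""
    _kind, text = audio
    return text


def chatbot_voice(audio):
    """Hypothetical chatbot under test: audio in, audio out."""
    heard = speech_to_text(audio).lower()
    if "balance" in heard:
        return text_to_speech("Your balance is 100 dollars")
    return text_to_speech("Sorry, can you rephrase that?")


# Dictionary of sentences: spoken input -> fragment expected in the reply.
SENTENCES = {
    "What is my balance?": "balance is",
    "Show me my balance please": "balance is",
    "Tell me a joke": "rephrase",
}


def run_dictionary(sentences):
    """Speak each sentence to the bot and return the (sentence, reply) pairs that failed."""
    failures = []
    for sentence, expected in sentences.items():
        reply_audio = chatbot_voice(text_to_speech(sentence))
        reply_text = speech_to_text(reply_audio)
        if expected not in reply_text:
            failures.append((sentence, reply_text))
    return failures


if __name__ == "__main__":
    print(run_dictionary(SENTENCES))
```

Scaling the suite then means adding rows to the dictionary rather than writing new scripts; recorded voice files for stutters and accents would replace the `text_to_speech` step for the smaller set of cases that need them.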

Figure: Speech-enabled test lab

Advanced Dictionary for Sarcasm, Slang, and Random Statements

Unlike almost-predictable, somewhat controlled input, interactions over open text fields can include text, numbers, currency, images, and media. Here are some thoughts on how to address those, ordered by priority.

  • Functional testing of what's risky and sensitive. For example, if you're working for a bank, you want to match your chatbot/AI test priorities to those of the app. These are functional tests that ensure the chatbot drives the right functions as needed.

  • Security. Secure your app/backend against malicious text-entry "strings," etc.

  • Test the main flows according to the provided functionality, expected text entry, and response.

  • Test session context maintenance. There are two approaches here: 

    1. Extend the interval between user entries to see if the chatbot is losing context and requires restarting the conversation.

    2. Mix queries by starting a new conversation in the midst of another to see whether the context is maintained as expected.

  • Slang/sarcasm/random statements. This is the trickiest piece. Unfortunately, there is no good "black box" testing for this scenario (the test needs to know the implementation). The recommended approach is to have very small unit tests for each scenario, and expect a significant number of those. As far as random statements go, use analytics to define popular scenarios and define test cases for those.
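
The two session-context approaches listed above (stretching the interval between entries, and interrupting one conversation with another) can be expressed as small tests. The sketch below uses a toy bot with an injectable clock so no real waiting is needed; `FakeBot`, its 300-second timeout, and the phrases it understands are assumptions made up for the example, not a description of any real chatbot.

```python
class FakeBot:
    """Hypothetical chatbot that keeps one pending intent per session."""

    def __init__(self, timeout_s=300):
        self.timeout_s = timeout_s
        self.pending = None     # intent still waiting for more input
        self.last_seen = None   # timestamp of the previous message

    def send(self, text, now):
        # Drop the context if the user was idle longer than the timeout.
        if self.last_seen is not None and now - self.last_seen > self.timeout_s:
            self.pending = None
        self.last_seen = now
        if text == "transfer money":
            self.pending = "transfer"
            return "How much?"
        if text == "what are your hours":
            return "9 to 5"     # side question; the pending intent is kept
        if text.isdigit() and self.pending == "transfer":
            self.pending = None
            return f"Transferring {text}"
        return "Sorry, what would you like to do?"


def context_survives_gap(gap_s, timeout_s=300):
    """Approach 1: extend the interval between entries."""
    bot = FakeBot(timeout_s)
    bot.send("transfer money", now=0)
    return bot.send("50", now=gap_s) == "Transferring 50"


def context_survives_interruption():
    """Approach 2: start another query in the middle of the first one."""
    bot = FakeBot()
    bot.send("transfer money", now=0)
    bot.send("what are your hours", now=10)
    return bot.send("50", now=20) == "Transferring 50"


if __name__ == "__main__":
    print(context_survives_gap(60), context_survives_gap(301))
    print(context_survives_interruption())
```

Passing the timestamp in explicitly is a design choice for the sketch: it lets the long-gap case run instantly. Against a real chatbot, the test would have to wait or the lab would need a way to control session timing.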

Get Started

The entire ecosystem around AI and chatbots is going to grow rapidly. Businesses, call centers, offerings, customer interactions, and commerce are all going to be impacted. Beyond functionality and a streamlined experience, the quality of the service will be the key to success or failure. As you begin to define your AI strategy, make sure to consider your quality requirements from the beginning to ensure success.

Opinions expressed by DZone contributors are their own.