This Is How You Test Voice Apps in an Automated Fashion
Using the grocery app Bring! Shopping List as an example, learn how to develop tests for voice-based apps.
Voice-First Testing — The Big Picture
We face different challenges when testing voice apps than when we test GUI apps. For instance, GUI apps limit the number of possible interactions a user might perform. Voice, on the other hand, allows a much richer and more complex set of spoken interactions, which increases the difficulty of testing. Additionally, the backend behind voice apps includes several components not owned by developers. These AI-powered elements are constantly learning and evolving by gathering insights from the myriad interactions they receive. This is why they get constant updates and improvements, which requires us to keep up on our side with continuous verification to be sure nothing has broken and that our app continues to deliver great voice experiences to our users.
We are witnessing a notable increase in the complexity of voice applications as a result of the effort companies make to provide enriched experiences that allow users to solve real, day-to-day problems. In this scenario, testing voice apps is a must.
It doesn’t matter if your approach to testing follows the popular waterfall model (requirements, analysis, design, coding, testing, deployment) or test-driven development practices (TDD). In any case, it's vital that you find the bugs in your code before your customers do. A voice app free from errors is the key to ensuring that your users enjoy the content you’re offering.
In this article, we will introduce tools that support all aspects of quality assurance for Alexa skills and Google Actions. We’ll start with a brief introduction to the tools and what to expect from each type of test you have to perform. Then we’ll explore a real-life example to help you get started with voice app testing. Let’s begin!
The Voice App Testing Layers
Unit Testing
This type of test is targeted at voice app developers. They need to do unit testing to ensure the code is working correctly in isolation. As you perform unit testing while coding, you need it to be fast, so it doesn’t interrupt your coding pace. Unit testing is focused on making sure your code and logic are correct, so there is no need to hit the cloud (where most voice app backends live) or call real external services. It is important that the unit testing tool you choose supports mocks and preferably runs locally.
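With Bespoken’s tooling, for example, unit tests reuse the same YAML syntax shown later in this article; the difference is that the testing.json sets "type" to "unit" and points a "handler" entry at your local code instead of a Virtual Device, so everything runs on your machine with external services mocked. Here is a minimal sketch for a hypothetical skill (the prompt, file names, and configuration keys are illustrative; check the Bespoken documentation for the exact settings):

---
configuration:
  locale: en-US

---
# Illustrative unit test for a hypothetical skill whose handler lives in ./index.js.
# It runs locally, so it is fast and never hits the cloud or real external services.
- test: Launch request returns the welcome prompt
- LaunchRequest: welcome to my skill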
End-to-End Testing
This type of test is targeted at QA teams. It ensures the entire system is working correctly — AI (ASR + NLU), code, and external services. The tests should be comprehensive and, if possible, easy to write, read, and maintain. We want to imitate, as closely as possible, what real users do in their everyday interactions with voice apps.
Continuous Testing (Monitoring)
This type of test is targeted at Ops people and ensures that a service, once deployed, keeps working flawlessly. This kind of testing verifies your voice app at regular intervals. It should be easy to set up, and the results should be delivered immediately when the voice app stops behaving as expected.
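As a rough sketch of the idea (the paths and notification script below are hypothetical), the same end-to-end suite shown later in this article could be scheduled from a cron job or a CI server, alerting whenever a run fails:

# Run the voice test suite every 6 hours; bst test is Jest-based, so a failed
# test should produce a non-zero exit code, which triggers the (hypothetical)
# notification script.
0 */6 * * * cd /home/ops/bring-e2e-tests && bst test || ./notify-oncall.sh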
Usability Performance Testing
Usability Performance Testing makes sure that the AI components of the app are working correctly. It is targeted at Product Managers and developers. The main goal is to identify issues with the speech recognition and NLU behavior of the assistant and your app. It consists of comprehensive testing of the interaction model, creating a baseline set of results. These results, in turn, are the basis for making improvements to the interaction model and the code of the skill; once revisions are completed, additional tests are run to ensure everything is working as expected.
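As a small sketch of what such a baseline could look like, reusing the end-to-end syntax covered later in this article (the utterances and expectations here are illustrative), you can fire item names that speech recognition often mishears and verify they still end up on the list:

---
# Item names like "figs" and "lettuce" are frequently misrecognized (see the
# homophones section of testing.json later in this article); these checks give
# a baseline for how well the interaction model handles them.
- test: figs is recognized and added despite being easy to mishear
- open bring and add figs: "*"
- open bring and read my list: figs

---
- test: lettuce is recognized and added despite being easy to mishear
- open bring and add lettuce: "*"
- open bring and read my list: lettuce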
End-To-End Testing, a Practical Example
Let’s continue with an example of how to perform end-to-end testing for voice apps. But first, some prerequisites.
Installing Bespoken CLI
To be able to use Bespoken’s automatic Voice Test Scripts, you first need to install our CLI. To install, run this on your command line (you need npm first; get it here):
$ npm install -g bespoken-tools
Get a Bespoken Virtual Device
Bespoken Virtual Devices are the core components to perform end-to-end testing and monitoring. A Virtual Device works like a physical device, such as an Amazon Echo or Google Home, but with the important difference that it exists only as software. With the Bespoken Dashboard, you can create and manage a fleet of Virtual Devices. Follow this guide to create your Virtual Device.
Now we are all set, let's continue.
Bring! Shopping List is an excellent voice app for managing your grocery lists. The skill is super intuitive and very well done, and it is available in English (US/GB) and German. There are also Android and iPhone apps and a web interface, so you can check your lists whenever and wherever you go. Let’s use it to show you how to get started with end-to-end (e2e) testing.
The main functionalities Bring! offers are:
Add articles with “Open Bring and add two liters of milk.”
Remove articles with “Open Bring and remove milk.”
Let Alexa read the articles on your list with “Open Bring and read my list.”
Change the list with “Open Bring and change my list to Home.”
Let's focus on the first three: Add/remove items from the list and read the contents of it.
Based on these three key functionalities we create our test plan: three test scripts, which as you can imagine are linked to three intents in the skill (we can also create one script to test the Launch Request).
Setting Up the Voice Test Scripts Project
In order to make the test scripts easy to maintain, we recommend creating a project for them. You can use your favorite IDE.
If your voice app supports multiple languages, you can create a folder for each locale. Inside each folder, we have the test scripts for each intent (or functionality).
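For reference, here is roughly what that layout looks like (the file names, except removeItems.e2e.yml, which we run later, are illustrative):

bring-e2e-tests/
├── testing.json
├── en-US/
│   ├── launchRequest.e2e.yml
│   ├── addItems.e2e.yml
│   ├── removeItems.e2e.yml
│   └── readList.e2e.yml
├── en-GB/
│   └── ...
└── de-DE/
    └── ...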
Then we have the testing.json file, which is the configuration file for the test execution:
{
  "type": "e2e",
  "homophones": {
    "bring": ["ring", "bing"],
    "lettuce": ["let us"],
    "figs": ["six", "6", "vicks"],
    "carrots": ["carried"]
  },
  "platform": "alexa",
  "trace": false,
  "jest": {
    "silent": false,
    "testMatch": ["**/test/*.yml", "**/tests/*.yml", "**/*.e2e.yml", "**/*.spec.yml"]
  },
  "virtualDeviceToken": "alexa-xxxx-xxxx-xxxx-xxxx-xxxx"
}
In this file, we tell Bespoken Tools this is an end-to-end test script ("type": "e2e"), and we define the Virtual Device token to be used, the platform for the test execution, and some other configuration options that can be reviewed here. The homophones section maps words that speech recognition commonly confuses (for example, "bring" heard as "ring" or "bing") back to the term we actually expect, so those misrecognitions don’t cause false test failures.
Note: To help you get started please download or clone this e2e test project from our GitHub repository.
Creating End-to-End Test Scripts
This is what a test script looks like:
---
configuration:
  locale: en-US
  voiceId: Joey

---
- test: Launch request, no further interaction
- open bring:
  - prompt:
    - welcome to bring
    - how should we start
    - how can i assist you
    - how can i help
    - what can i do for you

---
- test: Launch request, one interaction (card not present in skill, just for demo purposes)
- open bring:
  - prompt:
    - welcome to bring
    - how should we start
    - how can i assist you
    - how can i help
    - what can i do for you
  - reprompt: undefined
  - card: undefined
- what is on my list:
  - prompt: you have the following items on your list
  - cardTitle: "This is the content of your list:"
  - cardContent: Milk\nEggs
  - cardImageURL: "https://s3-eu-west-1.amazonaws.com/bring-images/images/1200x800.jpg"
It begins with a configuration section where you define your locale and the Amazon Polly voice to use when doing Text-To-Speech (check the available voices here; some voices work better than others in certain locales).
Then we have sequences of interactions, each sequence represents a test scenario, meaning a piece of functionality we want to verify.
As you will notice, each interaction line has two parts separated by a colon (:). The left part is the utterance we execute against the skill, and the right part is the expected result (also called the transcript or prompt).
When we run this script we will execute each intent and compare the actual result with the expected one, and then show the outcome of the test.
The syntax we use is based on YAML and gives you several nice features when defining the transcripts, such as:
Wildcards: By default, we do a partial match. What does this mean? If the expected result is “Welcome to my skill” and the actual is “Hi, welcome to my skill, hope you enjoy it,” the test will succeed. It will also succeed if we have “welcome * skill.” This is a very useful way to focus only on specific words or phrases when running the tests.
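For example, given the welcome prompt shown earlier ("welcome to bring"), this interaction passes as long as the response contains "welcome", anything in between, and then "bring":

- test: Launch request matched with a wildcard
- open bring: welcome * bring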
Lists: Knowing it’s a good practice to make skills conversational, it’s common to have variable responses — any of them should succeed when compared with the actual response. To allow this we have Lists. For example, the next interaction shows the prompt with several valid options; if the actual response contains any of them, the test will succeed.
- test: Launch request, no further interaction
- open bring:
  - prompt:
    - welcome to bring
    - how should we start
    - how can i assist you
    - how can i help
    - what can i do for you
Cards: If your skill generates a card in the response, we can also test that. It is as simple as adding the card fields and their expected results to the transcript, as in this example:
---
- test: Launch request, one interaction (card not present in skill, just for demo purposes)
- open bring:
  - prompt:
    - welcome to bring
    - how should we start
    - how can i assist you
    - how can i help
    - what can i do for you
  - reprompt: undefined
  - card: undefined
- what is on my list:
  - prompt: you have the following items on your list
  - cardTitle: "This is the content of your list:"
  - cardContent: Milk\nEggs
  - cardImageURL: "https://s3-eu-west-1.amazonaws.com/bring-images/images/1200x800.jpg"
As you can see, the Card object is located below the prompt. Take into account that here the match is exact, so capitalization and punctuation count. To add a line break, use “\n” as shown in the example.
Executing the Voice Test Scripts
To run a single test script, pass its path to the bst test command:
$ bst test en-US\removeItems.e2e.yml
The result of the execution looks like the image below (green means success, red means there was an issue):
If you prefer to run the entire set of scripts for the en-US locale you can execute this:
$ bst test en-US\
Start Testing Your Voice Apps!
Now it is time for you to create test scripts for your Alexa skills or Google Actions. While this article should provide an overview, we hope it only whets your appetite for a deeper dive into testing and automation. Reach out with questions or to learn more.
Happy testing!