Implement a Custom Alexa Skill Using Spring Cloud Microservices (Part 1)
Dive into this series teaching you how to create and implement your own custom skills for Alexa with the help of Spring Cloud and microservices.
In this series of articles, we will implement a custom skill for the Amazon Alexa virtual assistant using microservices. The microservices system will be implemented with Spring Cloud.
Amazon Echo is a smart speaker system developed by Amazon. It provides a voice-driven user interface to interact with speech applications. For example, Echo can play music from streaming services, read audio books, control smart home devices, and provide weather, traffic, and news.
A basic element of Echo is the Alexa virtual assistant. In addition to Amazon Echo, Alexa is available on several other Amazon devices, such as Fire TV and Fire HD 8. Various manufacturers, such as Pebble and LG, are also partnering with Amazon to integrate Alexa into their products.
A built-in capability in Alexa is called a skill. The Alexa Skills Kit is a set of APIs to extend capabilities of Alexa by developing custom skills. A custom skill could integrate with an external software service. The diagram below shows a user interacting with a software service via Amazon Echo and Alexa.
The user speaks to Echo to initiate the custom skill deployed in Alexa. Then, the user and Alexa enter a conversation regarding the custom skill. For example, the skill could report game scores, request a taxi cab on behalf of the user, or turn on lights in the house. The skill maintains a conversation with the user according to the user's utterances. At the same time, the skill executes custom logic based on the conversation and communicates with the external software service. Based on responses from the service, the skill creates text responses, which are read to the user by Echo.
In this article, we will implement a custom skill that integrates with a medical record management service to record or query patient vitals such as temperature, pulse and blood pressure. The skill will support three main conversations:
User requests Alexa to read the most recent vitals of a specific patient. Alexa obtains that information from the medical record management service and reads it back to the user via Echo.
User requests Alexa to read the names of the patients who have abnormal vitals, i.e. vitals outside the normal range. Alexa obtains that information from the medical record management service and reads it back to the user via Echo.
User requests Alexa to record vitals for a specific patient. Then, the user starts reading the vitals for the patient. Alexa records the vitals and sends them to the medical record management service for saving those records in permanent storage.
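The second conversation depends on deciding when a vital is "outside the normal range". A minimal sketch of such a check follows. The class and field names (VitalsCheck, Range, Vitals) are hypothetical, blood pressure is left out for brevity, and the ranges are supplied by the caller; the numbers in main are placeholders for illustration, not clinical guidance and not values taken from the sample application.

```java
import java.util.ArrayList;
import java.util.List;

class VitalsCheck {
    /** Inclusive numeric range; the bounds are supplied by the caller. */
    static final class Range {
        final double low;
        final double high;

        Range(double low, double high) {
            this.low = low;
            this.high = high;
        }

        boolean contains(double value) {
            return value >= low && value <= high;
        }
    }

    /** Hypothetical holder for one patient's latest measurements. */
    static final class Vitals {
        final String patientName;
        final double temperature;
        final int pulse;

        Vitals(String patientName, double temperature, int pulse) {
            this.patientName = patientName;
            this.temperature = temperature;
            this.pulse = pulse;
        }
    }

    /** Returns the names of patients with at least one vital outside its range. */
    static List<String> abnormalPatients(List<Vitals> latest, Range temperatureRange, Range pulseRange) {
        List<String> names = new ArrayList<>();
        for (Vitals v : latest) {
            boolean normal = temperatureRange.contains(v.temperature) && pulseRange.contains(v.pulse);
            if (!normal) {
                names.add(v.patientName);
            }
        }
        return names;
    }

    public static void main(String[] args) {
        // Placeholder ranges, chosen only to exercise the code.
        Range temperature = new Range(97.0, 99.0);
        Range pulse = new Range(60, 100);
        List<Vitals> latest = List.of(new Vitals("Patient A", 98.6, 72), new Vitals("Patient B", 98.4, 130));
        System.out.println(abnormalPatients(latest, temperature, pulse));
    }
}
```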
The skill will be implemented in Java and deployed to Amazon Web Services (AWS) as a Lambda function. The medical record management service will be implemented as a Java microservice using Spring Cloud and deployed to an Amazon Elastic Compute Cloud (Amazon EC2) environment.
The code can be downloaded from https://github.com/kunyelio/Amazon-Alexa-Custom-Skill.
Technical Overview
Creating a Custom Skill With the Alexa Skills Kit
A custom skill consists of an interaction model and its functional implementation. Below we introduce those concepts. (For a complete guide to creating custom Alexa skills with the Alexa Skills Kit, see this Amazon reference.)
Interaction Model
The interaction model defines the intents and sample utterances Alexa should expect from the user. An intent can be viewed as a request or response during a conversation. Because the user could express the same intent in many different ways, the interaction model should include as many sample utterances as possible for each intent, so that Alexa recognizes the intent correctly. For example, an intent could request the list of patients in the hospital whose latest vital measurements are outside the normal range. Let us assume the name of the intent is AbnormalVitalsIsIntent. The following could be sample utterances for the intent:
Read me the patients with abnormal vitals.
Get me the patients with abnormal vitals.
Who are the patients with abnormal vitals?
Are there patients with abnormal vitals?
An intent could contain parameters, called slots, that represent specific information, e.g. a patient's medical record number that uniquely identifies the patient in the hospital. A slot is represented by a name and a data type. Utterances for the intent should indicate where a slot is expected.
As an example, assume that an intent named ReadVitalIsIntent requests the latest vital measurements for a particular patient. A patient is identified by a medical record number, mrn. The following could be sample utterances for the intent:
Read me the vitals of patient {mrn}.
Read me the vitals of {mrn}.
What are the vitals of patient {mrn}?
What are the vitals of {mrn}?
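To make the role of a slot concrete, the sketch below turns a sample utterance into a pattern and extracts the slot value from a sentence. It is only an illustration: Alexa performs its own speech recognition and intent matching, and the class name SlotTemplate, the digits-only slot pattern, and the matching logic are assumptions made for this example, not part of the Alexa Skills Kit.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SlotTemplate {
    // Finds placeholders such as {mrn} in a sample utterance.
    private static final Pattern SLOT = Pattern.compile("\\{(\\w+)\\}");

    private final Pattern compiled;
    private final List<String> slotNames = new ArrayList<>();

    SlotTemplate(String template) {
        StringBuilder regex = new StringBuilder();
        Matcher m = SLOT.matcher(template);
        int last = 0;
        while (m.find()) {
            // Literal text is lower-cased and quoted; each slot becomes a digits group.
            regex.append(Pattern.quote(template.substring(last, m.start()).toLowerCase(Locale.ROOT)));
            regex.append("(\\d+)");
            slotNames.add(m.group(1));
            last = m.end();
        }
        regex.append(Pattern.quote(template.substring(last).toLowerCase(Locale.ROOT)));
        compiled = Pattern.compile(regex.toString());
    }

    /** Returns slot name to value if the utterance fits the template, else empty. */
    Optional<Map<String, String>> match(String utterance) {
        Matcher m = compiled.matcher(utterance.trim().toLowerCase(Locale.ROOT));
        if (!m.matches()) {
            return Optional.empty();
        }
        Map<String, String> slots = new LinkedHashMap<>();
        for (int i = 0; i < slotNames.size(); i++) {
            slots.put(slotNames.get(i), m.group(i + 1));
        }
        return Optional.of(slots);
    }

    public static void main(String[] args) {
        SlotTemplate template = new SlotTemplate("Read me the vitals of patient {mrn}.");
        // prints Optional[{mrn=1234}]
        System.out.println(template.match("Read me the vitals of patient 1234."));
        // prints Optional.empty
        System.out.println(template.match("Who are the patients with abnormal vitals?"));
    }
}
```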
Amazon provides a web-based configuration console for configuring the interaction model of a custom skill. There, the intent schema, a JSON-formatted list of intents and their slots, can be entered manually. A list of sample utterances should also be supplied in the same web interface. Below is a portion of the intent schema for the sample application in this article. Notice that ReadVitalIsIntent contains a slot, mrn, which is of type AMAZON.NUMBER.
{
  "intents": [
    {
      "intent": "AbnormalVitalsIsIntent"
    },
    {
      "intent": "ReadVitalIsIntent",
      "slots": [
        {
          "name": "mrn",
          "type": "AMAZON.NUMBER"
        }
      ]
    },
    ...
  ]
}
The following figure shows the configuration console for the custom skill where the intent schema is entered.
The figure below shows the configuration page for the custom skill where sample utterances are supplied. Each line corresponds to one utterance, which follows the name of the intent it is associated with. (The sample utterances for AbnormalVitalsIsIntent and ReadVitalIsIntent are highlighted.)
Other Attributes of a Custom Skill
When configuring a custom skill the following information is supplied.
Name of the skill.
Application ID, which is required to link the skill's interaction model with the functional implementation.
Invocation name of the skill. For related information see this Amazon reference.
Type of the skill, one of custom skill, smart home skill and flash briefing skill. In this article, our focus is the custom skill. Additional information on skill types can be found in this Amazon document.
Language(s) supported by the skill. In the sample application, we will support English (U.S.) only.
Whether or not the skill supports audio player directives, for streaming audio and monitoring playback progression. The sample application does not support audio player directives. For a detailed discussion of this topic, see the AudioPlayer interface documentation.
The following figure shows the configuration screen in Amazon's web-based interface where those attributes are set for our sample application.
Functional Implementation
The functional implementation of a custom skill handles the user requests defined by the interaction model and generates the corresponding responses to be read back to the user via Echo. The implementation can be either a web service or an AWS Lambda function. A Lambda function can be written in Node.js, Java or Python. In this article, we will use a Java-based Lambda function.
Amazon provides a Java library to streamline implementation of a custom skill as a Lambda function. The implementation utilizes a SpeechletRequestStreamHandler as an entry point to receive the requests. Then, SpeechletRequestStreamHandler dispatches the request to a Speechlet, which provides the main business logic for the custom skill, maintains a conversation with the user, and interfaces with an external software service if needed.
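The dispatch step described above can be sketched in plain Java. The example deliberately does not use the Alexa Skills Kit library types, so that it runs on its own: PatientMonitorDispatcher, the VitalsService interface, and the response wording are all hypothetical stand-ins, and only the intent names and the mrn slot come from the intent schema shown earlier in this article.

```java
import java.util.List;
import java.util.Map;

class PatientMonitorDispatcher {
    /** Hypothetical stand-in for the external medical record management service. */
    interface VitalsService {
        String latestVitals(String mrn);

        List<String> patientsWithAbnormalVitals();
    }

    private final VitalsService service;

    PatientMonitorDispatcher(VitalsService service) {
        this.service = service;
    }

    /** Maps an intent name and its slot values to the text to be spoken back. */
    String onIntent(String intentName, Map<String, String> slots) {
        switch (intentName) {
            case "AbnormalVitalsIsIntent": {
                List<String> names = service.patientsWithAbnormalVitals();
                if (names.isEmpty()) {
                    return "No patients have abnormal vitals.";
                }
                return "Patients with abnormal vitals are: " + String.join(", ", names) + ".";
            }
            case "ReadVitalIsIntent": {
                String mrn = slots.get("mrn");
                if (mrn == null) {
                    // The slot was not filled, so ask the user to continue the conversation.
                    return "Which patient do you mean?";
                }
                return "The latest vitals of patient " + mrn + " are: " + service.latestVitals(mrn) + ".";
            }
            default:
                return "Sorry, I did not understand that request.";
        }
    }
}
```

In a real skill, the same branching would sit inside the Speechlet implementation, with the intent name and slot values read from the request object that the library passes in.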
For further details on functional implementation of a custom skill using web services or Lambda functions see this Amazon reference documentation.
In many cases, a custom skill may provide a voice-based front end to an already existing software service. In our sample application, a medical record management service, implemented with microservices, queries and updates patient vitals. It exposes its API via the following REST endpoints:
Obtain vitals of one or more patients via name matching.
Obtain vitals of one particular patient via medical record number.
Obtain patients with abnormal vitals.
Set vitals of a particular patient.
The functional implementation of the custom skill will introduce a new, voice-driven user interface for the medical record management service utilizing the existing REST API.
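As a rough illustration of how the skill might address those four endpoints, the sketch below builds one request URI per operation. The article does not list the actual paths, so every path, the query parameter name, the host in main, and the class name PatientApiUris are assumptions made for this example.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

class PatientApiUris {
    private final String baseUrl;

    PatientApiUris(String baseUrl) {
        // Drop a trailing slash so that paths can be appended uniformly.
        this.baseUrl = baseUrl.endsWith("/") ? baseUrl.substring(0, baseUrl.length() - 1) : baseUrl;
    }

    /** Vitals of one or more patients via name matching (hypothetical path). */
    URI vitalsByName(String name) {
        return URI.create(baseUrl + "/patients/vitals?name=" + URLEncoder.encode(name, StandardCharsets.UTF_8));
    }

    /** Vitals of one patient via medical record number (hypothetical path). */
    URI vitalsByMrn(long mrn) {
        return URI.create(baseUrl + "/patients/" + mrn + "/vitals");
    }

    /** Patients with abnormal vitals (hypothetical path). */
    URI abnormalVitals() {
        return URI.create(baseUrl + "/patients/abnormal");
    }

    /** Target for setting vitals; assumed to share the per-patient path with a write method. */
    URI setVitals(long mrn) {
        return vitalsByMrn(mrn);
    }

    public static void main(String[] args) {
        PatientApiUris uris = new PatientApiUris("http://web.example.internal:8080/");
        System.out.println(uris.vitalsByName("John Smith"));
        System.out.println(uris.vitalsByMrn(1234));
        System.out.println(uris.abnormalVitals());
    }
}
```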
Microservices
Microservices are a programming paradigm to build distributed software systems. A microservices architecture focuses on loose coupling and localized, incremental changes. A microservice is a small unit of application code and related configuration that can be quickly developed, tested and moved to production. Although a microservice could depend on some other service, it should be possible to build, unit test, package, deploy and start/stop a microservice independently. Some basic features of microservices are the following. (For a more detailed discussion of those topics see this whitepaper.)
Microservices can be brought from concept to production quickly.
Microservices provide visibility and transparency to their intrinsic state so that their application state and health can be easily monitored via external tools, e.g. Spring Boot Actuator.
Microservices support configuration via an external, centralized configuration service, e.g. Spring Cloud Configuration Server.
Microservices can register themselves with a service registry, e.g. Netflix Eureka, so that they can be searched and located by their clients via discovery services.
Multiple instances of a microservice can be deployed to support fault tolerance. A microservice architecture enables client-side load balancing, e.g. using Netflix Ribbon, for clients of a microservice to access multiple instances of the microservice in a load-balanced manner. A microservice itself could utilize client-side load balancing while accessing other services it has a dependency on.
In addition, microservice architectures support:
integration with cluster configuration and coordination services, e.g. Apache Zookeeper, for managing microservices in a clustered environment.
synchronization of message distribution and processing to allow asynchronous communication between microservices, e.g. via Spring Integration.
commonly used authentication and authorization technologies such as OpenID, OAuth and SAML.
In this article, the medical record management service (the external software service to be used by our custom Alexa skill) will be implemented using Spring Cloud microservices architecture.
Application Architecture
The following diagram shows the interaction between components of our sample application.
The Alexa custom skill is named Patient Monitor and is implemented as a Java-based Lambda function. It is deployed as a jar file and consists of the following main classes:
MonitorSpeechletRequestStreamHandler extends com.amazon.speech.speechlet.lambda.SpeechletRequestStreamHandler.
MonitorSpeechlet implements com.amazon.speech.speechlet.Speechlet interface.
Patient is a plain old Java object (POJO) that represents a patient's data.
The medical record management service consists of three modules, web, patient and registration, that are all implemented as Spring Cloud applications.
The registration module comprises a Eureka discovery server. It functions as a service registry where services can register themselves and service clients can look up services.
The patient module is a microservice that encapsulates access to a patient database. It provides the core business functionality to query and update patient vitals. The individual instances of the patient module register themselves with the registration server. The patient module exposes its services as REST calls.
The web module encapsulates access to the patient service. It queries the registration module to obtain a list of instances implementing the patient service. Then, it uses a load-balanced HTTP client, Netflix Ribbon, to choose an instance from that list while accessing the patient service.
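The interplay of the three modules (register, look up, choose an instance) can be sketched with plain Java collections. This is not the Eureka or Ribbon API: ServiceRegistry and RoundRobinClient are hypothetical classes, the instance URLs are made up, and round-robin is used here only as one simple selection policy.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.atomic.AtomicInteger;

/** Plays the role of the registration module: services register, clients look up. */
class ServiceRegistry {
    private final Map<String, List<String>> instances = new ConcurrentHashMap<>();

    void register(String serviceName, String url) {
        instances.computeIfAbsent(serviceName, key -> new CopyOnWriteArrayList<>()).add(url);
    }

    List<String> lookup(String serviceName) {
        return List.copyOf(instances.getOrDefault(serviceName, List.of()));
    }
}

/** Plays the role of the web module's load-balanced client. */
class RoundRobinClient {
    private final ServiceRegistry registry;
    private final AtomicInteger next = new AtomicInteger();

    RoundRobinClient(ServiceRegistry registry) {
        this.registry = registry;
    }

    /** Picks the next registered instance of the service, cycling through the list. */
    String choose(String serviceName) {
        List<String> candidates = registry.lookup(serviceName);
        if (candidates.isEmpty()) {
            throw new IllegalStateException("No instances registered for " + serviceName);
        }
        int index = Math.floorMod(next.getAndIncrement(), candidates.size());
        return candidates.get(index);
    }

    public static void main(String[] args) {
        ServiceRegistry registry = new ServiceRegistry();
        // Two instances of the patient module register themselves (made-up addresses).
        registry.register("patient-service", "http://10.0.0.11:3333");
        registry.register("patient-service", "http://10.0.0.12:3333");

        RoundRobinClient client = new RoundRobinClient(registry);
        for (int i = 0; i < 3; i++) {
            System.out.println(client.choose("patient-service"));
        }
    }
}
```

Because the client asks the registry on every call, an instance registered later becomes eligible without any change to the client.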
We show the deployment diagram below.
Each of the registration, web and patient modules is deployed on its own server in an Amazon VPC (virtual private cloud). Those modules are inside the same network and are not directly accessible from the public internet. The Lambda function Patient Monitor is hosted in an environment proprietary to Amazon. We configured the security rules for the VPC to let Patient Monitor access the same network as the registration, web and patient modules. (For configuring a Lambda function to access resources in a VPC, see this Amazon reference.)
Now that you've gotten a look at the methodology and the architecture, stay tuned for our next post, which will dive into the code itself.