Over a million developers have joined DZone.
{{announcement.body}}
{{announcement.title}}

Using the Actions SDK for Google Assistant Development

DZone's Guide to

Using the Actions SDK for Google Assistant Development

Let's dive into the Actions SDK to see how you can use it to build Google Home actions. Build an app from start to finish to see how it works.

· IoT Zone
Free Resource

Discover why Bluetooth mesh is the next evolution of IoT solutions. Download the mesh overview.

In today's post, I'm going to show how to develop a simple app for the Google Assistant. For developing this app, I will be using the Actions SDK. My next post will use Dialogflow (formerly api.ai) instead. After reading both posts you hopefully will know enough to decide which approach is better suited for you.

About the App

In my hometown, I usually take the bus when taking our youngest to kindergarten before heading off to work. In winter and also due to some long-lasting street works, one of the two lines near our home is quite late all the time. But naturally unreliably so. So I wonder every morning when to leave for the bus. We do have an app for bus information in Münster — but some limitations of this app make it not really suitable for the task.

Thus an assistant app that tells me when the next buses are due and whether we should hurry up or whether we can idle around a bit more is what I'm going to build. The first version in this post with the Actions SDK. Then, in the next post, I'll build an improved version with Dialogflow.

Here's a sketch of what a conversation with this app should look like:

Sample dialog of the app
Sample dialog of the app — Made with botsociety.io


On the right is what I'm going to speak. On the left are the assistant's answers. The first is by the assistant itself — I changed the color for that one to emphasize the difference. The second and third are responses of the app. After the last response, the app finishes — and with this also the current assistant's action. Thanks to botsociety.io, you can even see this website showcasing this prototype.

What Is the Actions SDK?

The Actions SDK is one way to develop apps for the Google Assistant. As the name of the SDK implies, you can create actions which basically are speech snippets of your user and your response to them. Let's go a bit more into the base concepts here.

The relevant things you need to know are:

  • Actions
  • Intents
  • Fulfillment

Actions

An action is a wrapper combining an intent and the fulfillment for this intent. For a truly conversational app, where there's a dialog between the user and the assistant, apps have multiple actions. But often it makes sense to create a one-shot app. In this case, the app has only one action and answers directly with the response the user is looking for. The sample app would work well as a one-shot app. I only added one more action to make it a better sample for this post.

Intents

We already know intents from Android. And with Actions on Google, they are not much different. An intent specifies what the user intends your app to do.

In contrast to Android, though, you are very limited when it comes to intents with the Actions SDK.

Every app has to define one actions.intent.MAIN intent, that is used to start up your app, if no custom intent is better suited to the invocation phrase. Depending on your kind of app, this intent might directly provide the answer and then stop the app. Or it might kind of introduce the app to the user in some way.

For the start of your app you can also define custom intents and which user utterances trigger that intent. You might even add some parameters to that (for example numbers or a date). And if the Assistant detects that one of those phrases was used by the user, it starts up your app using this intent.

If you were to build a more complex app with the Actions SDK, you would use one intent most of the time: actions.intent.TEXT. Meaning: You have one intent that has to deal with nearly all possible user input. And thus your app would have to decide what to do based on the text spoken or typed in by the user.

There are a few more intents available. For example, android.intent.OPTION, which is useful if you provide a set of options to the user, of which they might select one. But when dealing with user input, most of the time, you would have to deal with actions.intent.TEXT.

This limitation is the reason why I've written in my intro post about the Assistant, that the Action SDK is mostly useful when you are proficient with natural language processing or have a simple project with a command line interface.

Unless you have some language processing superpowers, it boils down to this:

  • Use the Actions SDK for one-shot apps. These are apps that provide the required answer directly after being invoked and then stop. They give you just this one result. A typical example would be setting a timer to ten minutes.

  • For all other apps, for those that are really conversational, where there are multiple paths to follow and where you want your user to provide more information during the conversation, I recommend using Dialogflow (api.ai) instead of the Actions SDK. I will cover Dialogflow in my next post.

Fulfillments

Fulfillment is just another word for your backend. Every action must be backed by some backend of yours. This is where you generate the text spoken to the user. Or — on devices that have a graphical user interface — that's where you create visual cues for your user.

Of course, the response is based on which intent was triggered and what additional cues the user gave in her question.

When Google calls your backend, it provides a JSON file that contains plenty of information as you will see later in this post. Some of the things you get access to:

  • The spoken text as Google has detected it
  • The intent invoked
  • The parameters to the intent
  • The session id
  • Information about the device's capabilities

In this post, I'm only going to make use of some of them, but I'm going to cover more aspects in future posts.

The next picture shows where your app — or more precisely, your fulfillment code — fits into the overall picture. The assistant knows which app to call based on the invocation phrase used by the user. It also provides voice recognition of the user utterances and text to speech transformation of your answers to the user. Your app then takes the recognized text and creates an appropriate response for that.

Action SDK workflow
Action SDK workflow — picture Google, Inc.


Preparations

When you want to use Actions on Google, your account must be prepared for that. You have to do three things:

  1. Create a project within the Actions on Google Dev Console
  2. Install the gactions command line tool
  3. Initialize the project using the gactions tool

In the next paragraphs, I'm going to cover those steps.

Create a Project Within the Action on Google Dev Console

The very first step is to use the Dev Console to create a project. If it's your first project check the terms of the services — since by clicking the "Create Project" button you signify to adhere to them:

Creating a project on the dev console
Creating a project on the Actions on Google Dev Console


Afterwards, you have to decide which API to use to build your project:

Actions on Google Api / SDK Selection
Actions on Google Api / SDK Selection


As you can see, there are three prominent ways to develop apps: The Actions SDK or one of the two services Dialogflow or Converse AI. If you click on "Actions SDK", you will see a tiny popup where you can copy a gactions command. So the next step is to install the gactions command line tool.

Install the gactions Command Line Tool

You can download the gactions tool from Google's site. It's a binary file you can store, wherever you see fit. Make sure to make this file executable (if you're using Linux or Mac OS X) and - since you're going to use this tool a lot — I recommend to add it to your PATH.

If you already had this tool installed, you can use the selfupdate option to update it:

gactions selfupdate


That's all that is needed in this step.

Initialize the Project Using the gactions Tool

The last step is to simply initialize your project. Afterwards you're set to start developing actions using the Actions SDK.

Simply create some folder in which you whish to create the project and then switch to this folder.

Within this folder enter:

gactions init


If you have never run gactions before, it will require authorization to access the Actions Console. Simply follow the steps outlined on the command line. After pasting your authorization code into the command line, Google gets your access token which is used for all subsequent communication between the gactions tool and the Actions Console. If you ever want to use another account, simply remove the creds.data file.

The gactions init command will create a file named action.json. This file contains the configuration of your project. In its current state, its content is not valid. It contains plenty of <INSERT YOUR ... HERE> markers. You are going to fill those in in the remainder of this post.

With this, you have finished the preparations and are good to go. So let's start creating an action.

The Config File Explained

Let's start by looking at the actions.json file that was just created for us:


{
	"actions": [
		{
			"description": "Default Welcome Intent",
			"name": "MAIN",
			"fulfillment": {
				"conversationName": "<INSERT YOUR CONVERSATION NAME HERE>"
			},
			"intent": {
				"name": "actions.intent.MAIN",
				"trigger": {
					"queryPatterns": [
						"talk to <INSERT YOUR NAME HERE>"
					]
				}
			}
		}
	],
	"conversations": {
		"<INSERT YOUR CONVERSATION NAME HERE>": {
			"name": "<INSERT YOUR CONVERSATION NAME HERE>",
			"url": "<INSERT YOUR FULLFILLMENT URL HERE>"
		}
	}
}


As you can see, some objects match the concepts explained above. At the top you have the actions object, which contains an array of action objects - each of which contains one intent. The second object is named conversations, which contains a map of conversation objects. I don't know why it's named "conversations", but basically this object covers the description of where to find the app's fulfillment.

You can see the full specs in the reference section for Actions on Google.

The Config File of the Sample App

The sample app has one intent to deal with the start of the app — the actions.intent.MAIN intent.

This app is a good example for a one-shot app - meaning it would provide the answer directly with the actions.intent.MAIN intent and finish itself directly after giving the answer.

To make this tutorial more useful, I've added the possibility for the user to acknowledge the answer of the app before it ends itself and gives the control back to the assistant. And - if the user didn't understand the answer - she can ask for a repetition.

Thus there is one intent of type actions.intent.TEXT that deals with both of these follow-up responses by the user and that also provides an answer in case the app doesn't understand the user's intention.

The resulting actions part looks now like this:

{
	"actions": [
		{
			"description": "Default Welcome Intent",
			"name": "MAIN",
			"fulfillment": {
				"conversationName": "do-i-have-to-rush"
			},
			"intent": {
				"name": "actions.intent.MAIN",
				"trigger": {
					"queryPatterns": [
						"talk to let's rush",
						"let's rush",
						"do i have to rush",
						"do i have to hurry",
						"start let's rush"
					]
				}
			}
		},
		{
			"description": "Everything Else Intent",
			"name": "allElse",
			"fulfillment": {
				"conversationName": "do-i-have-to-rush"
			},
			"intent": {
				"name": "actions.intent.TEXT"
			}
		}
	],
	"conversations": {
		"do-i-have-to-rush": {
			"name": "do-i-have-to-rush",
			"url": "<INSERT YOUR FULLFILLMENT URL HERE>"
		}
	}
}


The intent represents the type of information the user tells the app. You first have to specify a name for this intent and then you have to define how this intent can be triggered by the user. So basically what the user has to say, in order for the assistant to select this intent.

As you can see from the JSON above, the trigger contains an array of possible phrases. You can use it for custom intents — but also for the initial, the main intent. My sample app has the name "Do I have to Rush". But the user might use other phrases to trigger your app. In the prototype shown at the beginning of this post, I am using "Let's rush". And that is working as well because I added that phrase to the trigger section of the intent.

For actions.intent.TEXT intents you do not need to specify trigger phrases — since this intent gets called automatically for follow-up phrases of the user.

The above file is still incomplete. While the fulfillments link to a conversation, the conversation's URL is still not set. I will add that information after I've created the function in the next section. I am using Cloud Functions for Firebase — and the URL can only be set after deploying for the first time.

About time to delve into the fulfillment code.

The Fulfillment Code

In the config file, you specified how many and which intents you have. In your fulfillment, you have to provide the responses to these intents. That is: You tell the assistant what to say to the user and — if the device is capable of those — what visual elements to show the user.

For the fulfillment's backend, I am going to use Firebase's Cloud Functions with Javascript. Please see my introduction tp Cloud Functions for Firebase and especially the section on setting up your Cloud Functions project to create the necessary files and project structure.

If you want to follow along set up your project as described in my previous post.

The Actions on Google Client Library

Google provides us with an Actions on Google client library for JavaScript. For the code in this tutorial, I'm making use of that lib.

Note that there is also an unofficial Kotlin port available. I haven't had a look at it since I prefer to use Cloud Functions. Anyway: If anyone has experience with it, please let me know in the comments or on social media.

Assuming you have created the necessary structure with firebase init the next step is to add Google's library to your project.

The simplest way to do so is to use npm install within the functions folder of your project:

npm install actions-on-google


You will get a result telling you the number of installed packages. And your package.json file afterwards contains this additional dependency:

{
    //...
    "dependencies": {
        "actions-on-google": "^1.5.0",
        //...
    },
    //...
}


1.5.0 is the most recent version while writing this post. But when you read this, the newest version might very well have a higher version number since Google is eagerly working on everything related to the Google Assistant.

If you want to update to a newer version, you can update the version using

npm update actions-on-google


The Fulfillment Function

When it comes to the actual function code, you have to do three things:

  1. Initialize the Assistant library
  2. Create a mapping from intents to functions
  3. Write a function for every intent

Your Function After firebase init

You should already have code similar to this in your project if you set up Cloud Functions correctly:

const functions = require('firebase-functions');

exports.shouldIRush = functions.https.onRequest((request, response) => {
   
}


Add the Google Actions on Google SDK

The very first step is to add the actions-on-google dependency at the top and to create the ActionsSdkApp object within the function:

const functions = require('firebase-functions');
var ActionsSdkApp = require('actions-on-google').ActionsSdkApp;


exports.shouldIRush = functions.https.onRequest((request, response) => {
    let app = new ActionsSdkApp({request, response});
    
}


Note that you are using a specific part of the actions-on-google lib. You only need the ActionsSdkApp. In my next post about Dialogflow, this will change and you will use the DialogflowApp instead. You must not forget the specific part you are interested in nor must you use the wrong one. One thing you need to know: In its samples, Google usually uses an ActionsSdk alias for the ActionsSdkApp class.

When you have added the lib and create the object, you can use the app object to query for information about the request and to provide the answer to the user.

Create the Mapping From Intents to Your Handler Functions

As you've seen in the action.json file, the sample project has a main and an allElse action. So first, let's create empty functions for those two action's intents. Next, we create a mapping from the two intents to these two functions. Note that the mapping is based on the intent name and not the action name. Finally, call app.handleRequest() with this mapping to allow the lib to do its magic:

exports.shouldIRush = functions.https.onRequest((request, response) => {
            let app = new ActionsSdkApp({
                request, response
            });

            function handleMainIntent() {}

            function handleTextIntent() {}



            let actionMap = new Map();
            actionMap.set(app.StandardIntents.MAIN, handleMainIntent);


            actionMap.set(app.StandardIntents.TEXT, handleTextIntent);



            app.handleRequest(actionMap);
        }


Write a Function for Every Intent

Now I'm not going to show all I've done within those functions. I'm simply highlighting some specific aspects that are important to understanding the Actions SDK.

For my intents, I simply call the use case:

function handleMainIntent() {
    console.log(`starting mainIntent - ${getDebugInfo()} - at: ${new Date()}`);
    usecase.callApiAndPrepareResponse(new UsecaseCallback());
}

function handleTextIntent() {
    console.log(`starting textIntent - ${getDebugInfo()} - at: ${new Date()}`);
    usecase.handleFollowUpPhrases(app.getRawInput(), new UsecaseCallback());
}


The use case generates the response that the assistant should present to the user. With those texts, it then calls the appropriate methods of the callback object. The callback object passed to the use case is of this type:

class UsecaseCallback {
    answerNormalRequestSuccessfully(answerText) {
        app.ask(answerText);
    }
    answerNormalRequestWithErrorMessage(errorMessage) {
        app.ask(errorMessage);
    }
    endConversation() {
        app.tell('See you!');
    }
    reactToUnknownPhrase() {
        
        
        app.ask('I\'m sorry, but I\'m not able to help you with this.');
    }
}


So sometimes I call app.ask() with a text and in one case I call app.tell(). Even though the former is named "ask", it doesn't necessarily mean it has to be a question. In my case, the answer is presented using ask(). But ask() keeps the conversation alive. It basically expects another user reaction. The tell() method, on the other hand, doesn't expect a user reaction. In fact, it closes the conversation and passes the baton back to the Assistant itself.

Of course, there are more options available than just to ask or to tell, but I won't go into them.

From the code above you can see, that I also have a function to generate some debugging information:

function getDebugInfo() { 
    return `user: ${app.getUser().userId} - conversation: ${app.getConversationId()}`;
}


While the usefulness of the userId depends on a few factors, the conversationId is nearly always useful. For one, you can use it to store conversation state on your backend. Since you only have limited possibilities to keep a conversational state with Actions on Google this id can be used as a key for that.

A second good use for the conversationId is for tracking down problems by using this id within all of your log messages and thus enabling you to join all log messages of one conversation together.

Method / Attribute

Type

Use

 A few important methods and objects of the ActionSdkApp object 

ask

method

Presents text to the user and awaits a textual answer

tell

method

Presents text to the user and ends the conversation

getAvailableSurfaces

method

Returns a set of Surface objects. Those currently can be

SurfaceCapabilites

.AUDIO_OUTPUT or

SurfaceCapabilites

.SCREEN_OUTPUT

askForPermission

method

Requests the specified permissions from the user

getDeviceLocation

method

The location of the device or

null

if no permission has been requested/granted

buildCard

method

One of the many methods to present a visual output if the devices

supports

that

StandardIntents

object

Contains all available standard intents as constants (e.g. StandardIntents.TEXT)

SupportedPermissions

object

Contains all available permissions as constants (e.g. SupportedPermissions.NAME)


Deploy the Fulfillment Function to Firebase

To deploy your fulfillment function, simply call:

firebase deploy


The deployment lasts a few seconds and when it's done, you get the function URL, you can use in the conversations section of the action.json file. Simply copy this URL and replace the <INSERT YOUR FULFILLMENT URL HERE> placeholder with your URL:

{
    //...
    "conversations": {
        "do-i-have-to-rush": {
            "name": "do-i-have-to-rush",
            "url": "https://us-central1-gcloinudProjectId.cloudfunctions.net/yourFunction"
        }
    }
    //...
}


Deploying an App to the Actions Console

If you have deployed your code to Firebase and configured your action.json file, all you have to do is to deploy your app to the Actions Console.

Actually, all Google needs is your actions.json file — or files in case of internationalized apps. It's not as if you deploy code to the Actions Console. You only define your actions and where to find your code.

So to upload the newest version of your file simply issue this command:

gactions update --action_package action.json --project yourActionsConsoleId


Now if you do not know the Action Console id anymore, simply go the Actions Console, select the project, and click on the gear icon. Then select "Project settings" to see the project id:

Find out the id via project settings
Select the project settings to find out the id of your project


Testing the Actions SDK-Based App

Google provides a web-based testing tool for actions which is really a good way to test and demonstrate your app:

Running your app in the emulator
Running an assistant app in the web-based emulator


You can test your app also on your Google Home, your Android or iOS devices and on any other device that offers the assistant (for example the AIY based devices). To do so, those devices must use the same account and must match the locale of your test.

But before you can test, you must enable the test. The easiest way is to use the command line:

gactions test --action_package action.json --project yourActionsConsoleId


But you can also enable testing in the Actions Console. Simply go to your project and click on the "Simulator" link in the menu on the left and than click the "Start Testing" button.

Enable testing via the Actions Console
Enable testing via the Actions Console


While writing this, I do not see this button for one project or get an error when trying to enable testing via command line due to a glitch on Google's side. As I've mentioned, Google is working a lot on Actions on Google. The downside of this is that there are occasional hiccups. One this week, one last week. In both cases, though, the response by the Google team (alerted via the Actions on Google community on Google+) was quick and helpful.

Sample Project

For more detailed code see my ActionsSDK sample project on GitHub. On GitHub, you find the full implementation of the sample app outlined at the beginning of this post.

After cloning the sample project, you have to install all the packages. Switch into the functions folder of the project and issue the following command:

npm install


The project only contains the cloud function's code and the action.json file. So be sure to follow the relevant steps of this tutorial for deploying the function and deploying the project to the Actions Console.

Summary

Creating this project was fun. But it also made obvious the current limitations with the Actions SDK approach. To judge for yourself have a look at the timetable-usecase.js file. That's not the best and most flexible way to react to user phrases

Thus for anything other than one-shot apps I recommend to use Dialogflow. In one of my next posts, I'm going to improve upon this app by switching to Dialogflow.

Ok,  Google  dear reader, what's your take on this? What are you going to do with the Actions SDK? Please let me know.

Until next time!

Take a deep dive into Bluetooth mesh. Read the tech overview and discover new IoT innovations.

Topics:
iot ,google home actions ,actions sdk ,intents ,tutorial

Published at DZone with permission of Wolfram Rittmeyer, DZone MVB. See the original article here.

Opinions expressed by DZone contributors are their own.

{{ parent.title || parent.header.title}}

{{ parent.tldr }}

{{ parent.urlSource.name }}