
Building an Action for Google Assistant: Getting Started

In this tutorial, you’ll learn how to create a conversational experience with Google Assistant.


Version

  • Kotlin 1.3, Android 10.0, Android Studio 3.5

Update note: This article has been updated to use the Google Actions Library for Java and Kotlin and Android Studio 3.5. The original article was also written by Jenn Bailey.

Are you a fan of Google Assistant? Do you say things like, “OK Google, pass the salt!” at the dinner table? Have you asked Google Assistant random questions to see what it would say in reply?

Google Assistant is a voice assistant from Google found on mobile devices, Google Home, and more. It lets users perform tasks and get information hands-free via a Voice User Interface, or VUI, and it gives users another way to interact with your brand beyond your Android app. Google Assistant is on more than 500 million devices worldwide, and Google expects that number to reach one billion soon.

Did you know you can write your own actions for Google Assistant? In this tutorial, you’ll learn how to create your own conversational experience with Google Assistant.

VUI and Conversational Design

VUI design lets you create a natural and intuitive conversation for the user that can scale across different surfaces, such as screen-only or speaker-only devices. When designing a conversation, it’s important to consider the following:

  • Is VUI an appropriate means of accomplishing the task the action performs? Here’s a quiz to help you determine whether VUI is a good fit.
  • Who is the audience for this action?
  • What is the personality of your action? For example: An action created for a trendy surfboard shop will have a much different persona than an action created for an upscale professional clothing company. Look at the difference between these two conversations:

"Aloha! Surf’s Up! Would you like to hear the status of  your custom surfboard order?" "That would be awesome!"

"Good afternoon,  what can I help you find today?" "I’m interested in ladies blouses,  size medium."

Designing an Action’s Conversation

A good conversation design includes the following components:

  • A Happy Path, or the shortest path through the conversation that accomplishes the task.
  • Conversation repair scenarios that allow the conversation to recover and continue in cases where the user says something unexpected or the user is not properly understood.
  • Opportunities for the user to exit the conversation gracefully.
  • Varied greetings when starting the action to keep the experience new and spontaneous for the user.

Getting Started

Get the projects by clicking the Download Materials button at the top or bottom of this tutorial. In this tutorial, you’ll write an action that plays the raywenderlich.com podcast right from the Assistant!

You need:

  • An active Google account. Sign up for one here.
  • An Android phone/tablet logged in with the same Google account that you’ll use in this tutorial.

Checking Permissions Settings

First, you need to check some permissions for your setup. Go to the Google Wide Controls page and enable the following permissions:

Expand Web & App Activity, and check Include Chrome history and activity from sites, apps, and devices that use Google services and Include voice and audio recordings:
Web and App Activity

Also be sure to enable Voice & Audio Activity and Device Information.

Note: It’s helpful, but not required, to have some understanding of Kotlin.

Creating the Action Project

Start by creating a project to work with. Go to the Actions Console and log in with the Google account you’d like to use for development. Click New project:
New Actions Console Project

Then, accept the terms if prompted, name the project RWPodcast and click Create project:
Create Project Actions Console

Scroll down to the More Options section. Select Conversational:
More Options Conversational

Expand Build your Action and click Add Action(s):
Add Actions Actions Console

Click Add your first action.
Add Your First Action Actions Console

Select the Custom category on the left-hand pane and click Build:
Select custom category

This will take you to the Dialogflow console webpage. If prompted, select or log in with your Google account and click Allow. You might also need to accept the terms of service.

Creating the Dialogflow Agent

Next, create your agent. The Dialogflow page will prompt you to create an auto-configured Dialogflow agent. Click Create.

Create Dialogflow Agent

Agents and Intents

Agents are Natural Language Understanding (NLU) modules that translate what the user says into actionable data. When a user’s utterance matches one of the agent’s intents, the agent extracts the actionable data from the request and returns a result to the user.

Intents match user input to the appropriate responses. Within each intent, you:

  • Define examples of user utterances that can trigger the intent.
  • Specify what to extract from the utterance.
  • Specify how to respond.

Generally, an intent represents a single ‘turn’ in the conversation.

Intents consist of four different components:

  • Name: An identifier for the intent that the fulfillment references.
  • Training Phrases: A collection of example phrases that invoke a particular intent. Dialogflow automatically matches similar phrases to the ones you provide.
  • Actions and Parameters: Define which parts of the user utterances to extract. These often include information such as dates, times, quantities and places.
  • Response: The utterance displayed or spoken back to the user.

Note: Dialogflow supports a feature related to intents called contexts. Contexts give you more control over intent matching and let you manage the state of the conversation across multiple intents. To learn more, check the documentation.

A typical agent has several intents that address different user intentions. When a Dialogflow agent hears an utterance from the user, it attempts to match that utterance to one of the training phrases defined in the intents. Then, the agent returns the response from that intent.

There are special types of intents. By default, a new agent includes two of them: the fallback intent and the welcome intent.

The agent invokes a fallback intent when the user says something that the agent can’t recognize.

The agent invokes a welcome intent when the user starts a conversation with the agent. The welcome intent informs the user what the action does or how to start a conversation.

Click on the Intents tab on the left pane. Then select Default Welcome Intent to see the default welcome intent:

Click Default Welcome Intent

Notice the predefined training phrases:
Pre defined training phrases

Also, notice the predefined responses lower on the page:
Predefined responses

You can also create custom intents, choose your own training phrases and define your own responses.

Lastly, there are follow-up intents. Follow-up intents nest below a parent intent. Use them to gather follow-up information.

Running your Action

Run the action in the Simulator by clicking Integrations ▸ Integration Settings:
Test Action Integrations

Select the Auto-Preview Changes option and click Test:
Select Auto-Preview Changes and Click Test

Another tab will open. When Dialogflow finishes updating the action in the Actions on Google Console, you’ll see it loaded in the Actions Simulator.

The default welcome and fallback intents are operational. To see what I mean, have a conversation with your action!

Select the Talk to my Test App Suggestion Chip to begin. You’ll see a friendly greeting randomly selected from the list of responses each time the action runs.
Test the App in the Simulator

Modifying the Welcome Intent

Time to try adding your first custom response! Return to the Dialogflow web browser tab. Select Intents ▸ Default Welcome Intent and scroll to the Responses section.

First, delete all the default welcome intent’s responses by selecting the trashcan:
Delete The Welcome Responses

Now, add your own responses. Make sure you click Save when you’re done!
Enter Your Own Welcome Responses

Finally, click Integrations ▸ Integration Settings ▸ Test to run the action. Click the Talk to my test app suggestion chip, then Cancel, a few times to see how the action randomly chooses among your custom welcome responses.
Test Action in Simulator with Custom Welcome Response

Testing Actions on a Device

In addition to the Simulator, Google Assistant allows you to run your action on a device. On most Android devices, open Google Assistant by long-pressing the home button, then swipe up to expand it.

Note: This guide explains how to use Google Assistant on different platforms including iOS.

Make sure the Assistant is logged in with the account you’re using for development. To change accounts, click your account avatar in the top right corner. Then, click Account and select the appropriate account. If the development account isn’t on your device, add it through the device settings and try again.
Change Accounts In Assistant

Note: Make sure your phone’s language is set to English (US) to find your app.

By typing or speaking, tell Google Assistant to Talk to my test app.

Talk To My Test Action Device

Google Assistant will run your test action.
Test Action Running In Device

Now that you’ve modified your welcome intent and tested your action, you need to upload the Dialogflow project from the starter project to start developing your own intents. Keep reading to learn how!

Uploading the Dialogflow Agent

To upload the preconfigured agent in the sample project, select Gear Icon ▸ Export and Import ▸ Restore From Zip.
Upload Dialogflow Starter

Drag and drop to attach or browse to the file RWPodcast.zip from the starter materials for this tutorial. If you haven’t downloaded them yet, you can download them using the Download Materials button at the top or bottom of this tutorial. Then, type RESTORE in the appropriate text box and click Restore. After it uploads, click Done.
Upload agent window

Now that you know how to get started building a Dialogflow agent, it’s time to learn how to fulfill more complicated user natural language requests by developing Fulfillment.

Fulfillment

Fulfillment for Google Assistant is code deployed as a webhook. Each intent in the agent has corresponding business logic in the webhook. Information extracted by the agent can generate dynamic responses or trigger actions on the back end.

Most actions require fulfillment to extend an agent’s capabilities and perform tasks such as returning information from a database, implementing game logic or placing orders for the customer.

You can implement a simple webhook inside Dialogflow by utilizing the Inline Editor. More complicated actions benefit from a different approach.
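For reference, the starter project’s webhook is built with the Actions on Google client library for Java and Kotlin. Conceptually, a webhook built with this library is a class with one annotated handler per Dialogflow intent. Here’s a minimal sketch; the class name and intent name below are illustrative, not the starter’s actual code:

import com.google.actions.api.ActionRequest
import com.google.actions.api.ActionResponse
import com.google.actions.api.DialogflowApp
import com.google.actions.api.ForIntent

class ExampleWebhookApp : DialogflowApp() {

  // The annotation value must match the intent name defined in Dialogflow.
  @ForIntent("Example Intent")
  fun exampleIntent(request: ActionRequest): ActionResponse {
    val responseBuilder = getResponseBuilder(request)
    responseBuilder.add("A response built dynamically in code")
    return responseBuilder.build()
  }
}

Each @ForIntent method receives the matched request and returns a built ActionResponse, which is exactly the pattern you’ll see in RWPodcastApp.kt.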

Setting up the Local Development Environment

Installing and Configuring Google Cloud SDK

Due to limitations of the Inline Editor, it’s best to develop the webhook locally. First, download and install the Google Cloud SDK. Then configure it for App Engine by running the following commands at the command line:

  1. Run gcloud auth application-default login and log in with your Google account.
  2. Install the App Engine component by running gcloud components install app-engine-java.
  3. Update the other components with gcloud components update.

Configuring the Project

Still at the command line, run gcloud init. If you’ve run this command before, you can select [1] Re-initialize this configuration [default] with new settings. Select your Google account, then pick a cloud project to use by choosing the Actions Console project you created in the previous steps.

Opening the Project and Deploying to the App Engine

Open Android Studio, select Open an Existing Android Studio Project, then locate and select the Starter folder for this tutorial. Once the project loads in Android Studio, open the Gradle tab near the top right and click Execute Gradle Task.

Execute Gradle Task Gradle Tab

Now run the appengineDeploy task to deploy the action to App Engine.

App Engine Deploy

Now you’re ready for the next steps!

Implementing the Webhook

In Dialogflow, find your project’s ID by selecting the gear icon ▸ General ▸ Google Project section ▸ Project ID. Keep this project ID handy for the next step. Then, under the Fulfillment tab, enable Webhook and enter https://your_project_id.appspot.com, replacing your_project_id with the ID you just found. Finally, click Save.

Enter Function URL

Open the Welcome Intent again under Intents, scroll down and make sure the webhook is enabled under the Fulfillment section.
Welcome Intent Fullfillment Enabled

This tells the Dialogflow agent to run the code in the Welcome Intent handler in RWPodcastApp.kt instead of using the default text responses defined in Dialogflow. Open Starter ▸ src ▸ main ▸ java ▸ com.raywenderlich ▸ RWPodcastApp.kt in Android Studio and find this code:

@ForIntent("Welcome Intent")
fun welcome(request: ActionRequest): ActionResponse {
  // Download the podcast RSS feed and convert it to a JSON object for later use
  xml = URL(rssUrl).readText()
  jsonRSS = XML.toJSONObject(xml)
  val responseBuilder = getResponseBuilder(request)
  // This output appears in the Stackdriver logs, not in the conversation
  println("Hello! Welcome to the Action!")
  responseBuilder.add("Hello World!")
  return responseBuilder.build()
}

The code above retrieves the RSS feed and stores it in a JSONObject for later use. It then adds the words Hello World! to the responseBuilder and returns the built response. Test the action in the Simulator or on your device. This time, it replies “Hello World!”.
Test App Local Dev Environment
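In case you’re wondering where xml, jsonRSS and rssUrl come from: they’re declared elsewhere in RWPodcastApp.kt rather than inside the handler itself. Below is a rough sketch of what such declarations could look like; the URL is only a placeholder and the starter project defines the real values.

import org.json.JSONObject

// Sketch only -- the starter project defines the actual feed URL and properties.
// welcome() also relies on java.net.URL and org.json.XML for URL() and XML.toJSONObject().
private val rssUrl = "https://example.com/feed/podcast" // placeholder feed URL
private lateinit var xml: String                        // raw RSS XML, filled in by welcome()
private lateinit var jsonRSS: JSONObject                // the feed converted to JSON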

Now you’ve set up your local development environment. Each time you change RWPodcastApp.kt, deploy the app to App Engine using the Execute Gradle Task button on the Gradle tab in Android Studio to run appengineDeploy. After the first deploy, you can also simply hit the run button in Android Studio. Then relaunch the simulator. On a device, the most current version of the action loads automatically.

Viewing the Stackdriver Logs

To see the output of the println statement, view the Stackdriver logs. You can reach them from the simulator by expanding the menu at the top left and selecting View Stackdriver Logs.
View Stack Driver Logs

Then use the filtering mechanism to find your text.
Stack Driver Logs

Note: There can only be 15 versions of the app in production, and each time you deploy the app to App Engine it creates a new version by default. To manage them, expand the sidebar menu and select Versions under App Engine.
Versions
From here, you can delete all versions except the active one if you’ve reached or exceeded 15.
Delete Old Versions

Creating and Fulfilling a Custom Intent

Creating the Intent

In Dialogflow, go to Intents ▸ Create Intent or press the plus sign next to Intents. Enter the name play_the_latest_episode and add the Training Phrases shown below:
CustomIntent Add Training Phrases
Enable webhook call for this intent under Fulfillment and Save:
Enable Webhook

Fulfilling the Intent

Add this intent handler to RWPodcastApp.kt below the welcome intent:

@ForIntent("play_the_latest_episode")
fun playLatestEpisode(request: ActionRequest): ActionResponse {
  val responseBuilder = getResponseBuilder(request)
  val episode1 = jsonRSS.getJSONObject(rss).getJSONObject(channel)
    .getJSONArray(item).getJSONObject(0)
  val mediaObjects = ArrayList<MediaObject>()
  mediaObjects.add(
    MediaObject()
      .setName(episode1.getString("title"))
      .setDescription(episode1.getString(summary))
      .setContentUrl(episode1.getJSONObject(enclosure).getString(audioUrl))
      .setIcon(
        Image()
          .setUrl(logoUrl)
          .setAccessibilityText(requestBundle.getString("media_image_alt_text"))))
  responseBuilder
    .add(requestBundle.getString("latest_episode"))
    .addSuggestions(suggestions)
    .add(MediaResponse().setMediaObjects(mediaObjects).setMediaType("AUDIO"))
  return responseBuilder.build()
}

What’s going on in the code above?

  1. The code creates a responseBuilder to return the response.
  2. The latest episode is parsed from the RSS feed into episode1.
  3. The text string called latest_episode is fetched from resources_en_US.properties in the resources folder and added to the response. Strings are stored in this location much like in an Android app’s strings.xml file, making it easier to translate the action later (see the sketch after this list).
  4. A MediaObject is created and populated with fields from episode1, including the description, a URL to an image and the link to the audio file.
  5. Suggestion chips are created and added to the response to make it easy for the user to select a valid option.
  6. The response is built and returned.
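A quick note on step three: the strings come from a standard Java resource bundle. Here’s a minimal sketch of how such a bundle might be loaded, assuming the properties file sits on the webhook’s classpath with the base name resources; the starter project may load it differently.

import java.util.Locale
import java.util.ResourceBundle

// Loads resources_en_US.properties from the classpath for US English.
val requestBundle: ResourceBundle = ResourceBundle.getBundle("resources", Locale.US)

// Keys such as latest_episode map to individual lines in the properties file.
val latestEpisodePrompt: String = requestBundle.getString("latest_episode")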

Save RWPodcastApp.kt and deploy it to App Engine. Once you’ve done this from the Gradle tab, you can use the run button at the top of Android Studio to launch the same task. Now, run the test action again on your phone or in the simulator. When the action welcomes you, respond with the words Latest episode. The latest episode of the podcast should play. Notice the media player that displays.

Run the Action Latest Episode

Detecting Surfaces

You can detect what sort of device an action was initiated from. Replace the body of welcome in RWPodcastApp.kt with the following:

xml = URL(rssUrl).readText()
jsonRSS = XML.toJSONObject(xml)
val responseBuilder = getResponseBuilder(request)
val episodes = jsonRSS.getJSONObject(rss).getJSONObject(channel)
  .getJSONArray(item)
if (!request.hasCapability(Capability.SCREEN_OUTPUT.value)) {
  if (!request.hasCapability(Capability.MEDIA_RESPONSE_AUDIO.value)) {
    // 1
    responseBuilder.add(requestBundle.getString("msg_no_media"))
  } else {
    // 2
    responseBuilder
      .add(requestBundle.getString("conf_placeholder"))
      .add(Confirmation()
        .setConfirmationText(requestBundle.getString("conf_text")))
  }
}
return responseBuilder.build()

Here are some things to notice about the code above:

  1. If the current conversation has neither the SCREEN_OUTPUT nor the MEDIA_RESPONSE_AUDIO capability, the device doesn’t support media playback, so the user is informed the episode can’t be played.
  2. If the device supports audio but has no screen, ask the user whether or not to play the latest episode. This is to make the action more convenient on a hands-free, screen-free surface that relies on voice control only. Confirmation is a Helper that asks for a yes or no response.

Note: The play_latest_episode_confirmation Intent in Dialogflow has an actions_intent_confirmation Event. An Event is another way to trigger an intent with predefined values such as yes or no.

Find the play_latest_episode_confirmation intent in the Dialogflow Intents list. Enable the webhook call for this intent and Save the intent.

Handling a Confirmation Event

Since you created a Confirmation above, you need to handle the response. Add the code shown below to RWPodcastApp.kt inside the play_latest_episode_confirmation intent:

   
val episode1 = jsonRSS.getJSONObject(rss).getJSONObject(channel)
  .getJSONArray(item).getJSONObject(0)
if (request.getUserConfirmation()) {
  val mediaObjects = ArrayList<MediaObject>()
  mediaObjects.add(
    MediaObject()
      .setName(episode1.getString(title))
      .setDescription(getSummary(episode1))
      .setContentUrl(episode1.getJSONObject(enclosure).getString(audioUrl))
      .setIcon(Image().setUrl(logoUrl)
        .setAccessibilityText(requestBundle.getString("media_image_alt_text"))))
  responseBuilder
    .add(requestBundle.getString("latest_episode"))
    .addSuggestions(suggestions)
    .add(MediaResponse().setMediaObjects(mediaObjects).setMediaType("AUDIO"))
  return responseBuilder.build()
}

Here’s what’s happening above:

  1. If the confirmation parameter is true, the user confirmed they want to play the podcast. You then fetch and play the episode the same way you do in the play_the_latest_episode intent.
  2. If not, the user doesn’t want to play the podcast. Prompt the user to trigger other intents instead, as sketched below.
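The starter project already contains the rest of this handler, but here’s a minimal sketch of what that else branch could look like. The msg_not_playing key is hypothetical; use whichever prompt string your resource bundle actually defines. responseBuilder, requestBundle and suggestions are already in scope in the handler, and the handler’s existing return at the end builds the response.

else {
  // Hypothetical prompt inviting the user to try something else instead.
  responseBuilder
    .add(requestBundle.getString("msg_not_playing"))
    .addSuggestions(suggestions)
}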

Save RWPodcastApp.kt and press run to deploy the changes to App Engine. Open the test app in the Simulator and Talk to my Test App.

Select Speaker as the Surface and walk through the conversation to play the latest episode. You should also try walking through saying “no” when prompted:
Run the Action Speaker

Using an Option List

You can also set up an option list to give the user some choices of what to do. Add the following else block after the outer if block in the welcome intent:

else {
  // episodes was already parsed from the feed above the if block
  val items = ArrayList<ListSelectListItem>()
  var item: ListSelectListItem
  for (i in 0..9) {
    item = ListSelectListItem()
    item.setTitle(episodes.getJSONObject(i).getString(title))
      .setDescription(getSummary(episodes.getJSONObject(i)))
      .setImage(
        Image()
          .setUrl(logoUrl)
          .setAccessibilityText(requestBundle.getString("list_image_alt_text")))
      .optionInfo = OptionInfo().setKey(i.toString())
    items.add(item)
  }
  responseBuilder
    .add(requestBundle.getString("recent_episodes"))
    .add(SelectionList().setItems(items))
    .addSuggestions(suggestions)
}

The code above creates a list that displays the ten most recent podcast episodes. It then uses Suggestions to provide some suggestion chips.

get_episode_option, which you’ll implement next, handles the event fired when the user selects an episode from the list. The user may also select one of the suggestion chips, which you’re required to provide after a list.

Note: In Dialogflow, be sure to add actions_intent_option as an Intent and enable the webhook. Also, add the Google Assistant Option event in the Events category.
Get Episode Option

Add the following code inside the get_episode_option intent before the return statement:

val selectedItem = request.getSelectedOption()
val option = selectedItem!!.toInt()
val episode = jsonRSS.getJSONObject(rss).getJSONObject(channel)
  .getJSONArray(item).getJSONObject(option)
val mediaObjects = ArrayList<MediaObject>()
mediaObjects.add(
  MediaObject()
    .setName(episode.getString(title))
    .setDescription(episode.getString(summary))
    .setContentUrl(episode.getJSONObject(enclosure).getString(audioUrl))
    .setIcon(
      Image()
        .setUrl(logoUrl)
        .setAccessibilityText(requestBundle.getString("media_image_alt_text"))))
responseBuilder
  .add(episode.getString(title))
  .addSuggestions(suggestions)
  .add(MediaResponse().setMediaObjects(mediaObjects).setMediaType("AUDIO"))

In the code above, the selected option’s value is read so the MediaObject can be created for the chosen episode based on its index. The code then adds the title, media object and suggestion chips to responseBuilder, and the handler returns the built response.

Save and deploy the changes. Then, test the app on a phone to see the list.
List On Screen Surface

When the user selects a podcast from the list, it plays. Nice, right?
Selected Podcast From List

Entities and Parameters

A Training Phrase often contains useful data, such as words or phrases that specify a quantity or a date. You use Parameters to represent this information. Each parameter in a training phrase has an Entity type. Entities identify and extract useful data from natural language input in Dialogflow.

Entities

Entities extract information such as date, time, color, ordinal number and unit.

The Entity Type defines the type of extracted data. Each parameter will have a corresponding Entity.

For each type of entity, there can be many Entity Entries. Entries are a set of equivalent words and phrases. Sample entity entries for the subject of a technical podcast might be iOS, iPhone, and iPad.

System Entities are built in to Dialogflow. Some system entities include date, time, airports, music artists and colors.

For a generic entity type, use Sys.Any, representing any type of information. If the system entities are not specific enough, define your own Developer Entities.

Below is an example of a developer entity defined to represent the subject of a technical podcast.

Podcast Subject Entity

You can see the @Subject Entity created when you uploaded the sample project by selecting Entities in the left pane.

Action and Parameters

When part of a training phrase is recognized as an entity, it’s highlighted and designated as a parameter, and it then appears in a table below the training phrases.

Dialogflow automatically identifies and tags some more common types of system entities as parameters.

In the example below, Dialogflow identifies the time and date parameters and assigns them the correct type.

Training Phrases with Actions and Parameters

You can see an example using podcast subjects by going to Intents ▸ play_an_episode_about.

In Dialogflow, Action and Parameters are below the training phrases of an intent.

Parameters have several attributes:

  • Required: A checkbox that determines whether the parameter is necessary.
  • Parameter Name: Identifies the parameter.
  • Entity: The type of the parameter.
  • Value: How you refer to the parameter in responses.

Actions and Parameters in Dialogflow

Parameters can appear as part of a text response:
Parameters in text response

Using an Entity and a Parameter

Time to try using the Entities and Parameters you’ve learned about! In Dialogflow, make sure the webhook on the play_an_episode_about intent is enabled. Then add the following code to the intent handler in RWPodcastApp.kt:

val platform = request.getParameter(subject) as String
val episodes = jsonRSS.getJSONObject(rss).getJSONObject(channel)
  .getJSONArray(item)
var episode: JSONObject? = null
for (i in 0 until episodes.length()) {
  val currentEpisode = episodes.getJSONObject(i)
  if(platform.toUpperCase() in currentEpisode.optString(title).toUpperCase()) {
    episode = currentEpisode
    break
  }
}
if(episode != null) {
  val mediaObjects = ArrayList<MediaObject>()
  mediaObjects.add(
    MediaObject()
      .setName(episode.getString(title))
      .setDescription(getSummary(episode))
      .setContentUrl(episode.getJSONObject(enclosure).getString(audioUrl))
      .setIcon(
        Image()
          .setUrl(logoUrl)
          .setAccessibilityText(requestBundle.getString("media_image_alt_text"))))
  responseBuilder
    .add(episode.getString(title))
    .addSuggestions(suggestions)
    .add(MediaResponse().setMediaObjects(mediaObjects).setMediaType("AUDIO"))
} else {
  responseBuilder.add("There are no episodes about $platform")
    .addSuggestions(suggestions)
}

The code above searches the feed for the newest episode about the requested subject. Here’s how:

  • The code reads the value of the Subject parameter.
  • The parameter comes from the user’s utterance.
  • The Entity defined in Dialogflow tells the agent which words in the utterance to extract as the subject.

The first podcast episode whose title contains the Subject keyword is loaded into the MediaObject and played.

Selected Podcast By SubjectSuggsetionChip

Where to Go From Here?

Wow, that was a lot of work! You’re awesome!

In this tutorial, you created a Google Action in the Google Actions Console and utilized Dialogflow to design a conversation for that action. You provided fulfillment for the action by implementing a webhook.

Get the final project by clicking the Download Materials button at the top or bottom of this tutorial. If you want to keep moving forward with this, here are some suggestions:

  • You can learn more about publishing your action here.
  • Within 24 hours of publishing your action, the Analytics section of the Actions Console will display the collected data. Use the Analytics section to perform a health check and get information about the usage, health and discovery of the action. Learn more here.
  • Want to go even deeper into Google Assistant? Google has an extensive Conversation Design Guide. Find Google Codelabs about the Assistant here and a few Google samples here. Google also has complete documentation.

If you have any questions or comments, please join the forum discussion below!
