How to Build a Phone Assistant Using Voximplant Connector for Dialogflow


Learn how to build a phone assistant using Voximplant Connector for Dialogflow.


Dialogflow, Google’s solution for building AI-powered voice and text-based conversational interfaces, has released an updated API that allows Dialogflow agents to handle speech recognition, speech synthesis, and natural language understanding (NLU) at the same time. It’s now possible to stream audio directly to a Dialogflow agent rather than sending text, and to receive audio replies generated by the built-in Text-to-Speech engine with WaveNet synthetic voices.

Voximplant, a cloud platform for developing real-time communication apps, has used this new API to create Dialogflow Connector, which allows developers to connect a dynamic full-voice Dialogflow agent for use on Voximplant calls in just minutes.

For an example of how this works, let’s create a bot that can take pizza delivery orders and connect it to process inbound calls to a specific phone number.

(Flowchart of call fulfillment using Voximplant and Dialogflow)

Dialogflow Setup

Set up the Dialogflow agent to use API V2, since API V1 doesn’t support streaming. Under Settings -> Enable beta features and APIs, turn on beta features so that speech synthesis (a beta feature at the time of writing) is available.

Next, create and download the service account JSON file associated with the agent from the GCP console (read Dialogflow’s Setting up authentication article for assistance). Select Dialogflow API Client as the role when creating the service account. The JSON file authorizes Voximplant to send audio data to the agent.
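Before uploading the key, a quick sanity check can catch a wrong file early. The sketch below assumes the usual layout of a Google service account key file (string fields such as type, project_id, private_key, and client_email). The helper name missingServiceAccountFields and the sample values are hypothetical and not part of any Voximplant or Google library.

```javascript
// Fields that a Google service account key file is expected to carry (assumption).
const REQUIRED_KEY_FIELDS = ["type", "project_id", "private_key", "client_email"];

// Returns a list of problems found in a parsed key file (empty if it looks usable).
function missingServiceAccountFields(key) {
  const problems = REQUIRED_KEY_FIELDS.filter(
    (field) => typeof key[field] !== "string" || key[field].length === 0
  );
  if (key.type !== undefined && key.type !== "service_account") {
    problems.push('type must be "service_account"');
  }
  return problems;
}

// Placeholder object; a real key would come from JSON.parse on the downloaded file.
const sampleKey = {
  type: "service_account",
  project_id: "my-pizza-agent", // hypothetical project ID
  client_email: "voximplant@my-pizza-agent.iam.gserviceaccount.com", // hypothetical
};
console.log(missingServiceAccountFields(sampleKey)); // reports that private_key is missing
```

An empty result only means the file has the expected shape; whether the account actually has the Dialogflow API Client role still has to be checked in the GCP console.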

To set up speech synthesis and select options, click the Speech tab in the agent settings.

(Screenshot: the Speech tab in the Dialogflow agent settings)

In these settings, enable Automatic Text to Speech, and choose MP3 or OGG (the only currently supported formats) for Output Audio Encoding. Also select an available voice; the WaveNet-powered voices generally sound more natural. Then click Save.
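For readers who later call the Dialogflow API directly, the same choices map onto the API's output audio configuration. The fragment below is only a sketch of that structure and is not something the connector requires you to write. The enum strings follow the Dialogflow API's OutputAudioEncoding values, the voice name is just an example, and buildOutputAudioConfig is a hypothetical helper.

```javascript
// The two formats named above, mapped to Dialogflow API OutputAudioEncoding names.
const SUPPORTED_ENCODINGS = new Map([
  ["MP3", "OUTPUT_AUDIO_ENCODING_MP3"],
  ["OGG", "OUTPUT_AUDIO_ENCODING_OGG_OPUS"],
]);

// Builds an outputAudioConfig-shaped object, rejecting any other format.
function buildOutputAudioConfig(format, voiceName) {
  const audioEncoding = SUPPORTED_ENCODINGS.get(format);
  if (audioEncoding === undefined) {
    throw new Error(`Unsupported output format: ${format}`);
  }
  return {
    audioEncoding,
    synthesizeSpeechConfig: { voice: { name: voiceName } },
  };
}

// Example voice name; pick one that is listed for your agent's language.
console.log(JSON.stringify(buildOutputAudioConfig("MP3", "en-US-Wavenet-D")));
```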

It’s important to know that speech recognition and speech synthesis are both paid features, and your GCP account will be billed as you use the agent. Alternatively, a free plan with usage limits is available. Your Voximplant account will be charged as well, as you use Dialogflow Connector.

Next, we’ll import the “Pizza Order & Delivery” agent demo, available from Voximplant as a zip file here. In Dialogflow, open the Export and Import tab, click IMPORT FROM ZIP, and upload the zip file. With the agent deployed, we’ll now set up Voximplant.

Voximplant Setup

From the Voximplant control panel, navigate to Settings->Dialogflow Connector.

(Screenshot: the Dialogflow Connector section of the Voximplant control panel)

From the left menu, choose Add Service Account. Then click on “Choose file”, select your JSON file from GCP console, and click Add. With the file uploaded, you’ll see this screen:

(Screenshot: the Dialogflow Connector screen after the service account is added)

Now click Create test app, which will deploy an application with a test phone number connected to your Dialogflow agent. You’ll then see this screen:

(Screenshot: the test application screen with calling instructions)

Follow the listed instructions to make a test call and speak with your Dialogflow agent.

If an error occurs and the test call doesn’t work, it may be necessary to delete the test application (under Applications in the control panel, with a name starting with df-) and the test scenario (under Scenarios, with a name starting with DF_) before creating the test application again. For Dialogflow agents that use a language other than English, it’s also necessary to change the language in the test VoxEngine scenario to match.
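In the scenario shown later in this article, the language appears in two places: the lang option passed to AI.createDialogflow and the language_code of the WELCOME event. The sketch below isolates the second one as a plain function so it can be checked outside VoxEngine. buildWelcomeQuery is a hypothetical helper, and the matching DialogflowLanguage constant for your language should be taken from the VoxEngine reference rather than from this example.

```javascript
// Builds the event query that asks the agent to play its welcome message.
// languageCode is a Dialogflow language tag such as "en" or "es".
function buildWelcomeQuery(languageCode) {
  if (typeof languageCode !== "string" || languageCode.trim() === "") {
    throw new TypeError("languageCode must be a non-empty string");
  }
  return { event: { name: "WELCOME", language_code: languageCode.trim() } };
}

// Inside a VoxEngine scenario the result would be passed to dialogflow.sendQuery(...).
console.log(JSON.stringify(buildWelcomeQuery("es")));
// prints {"event":{"name":"WELCOME","language_code":"es"}}
```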

Connecting Voximplant to a Real Phone Number

Once you’ve successfully tried the test phone number, select Phone numbers from the menu and click Buy phone number. Next, select the phone number’s location (country, state/region, city) and type (landline, toll-free, etc.), and then click Buy. With a number purchased, navigate to My phone numbers and bind that number to the test application.

(Screenshot: binding the purchased phone number to the test application)

If your number is from a country with no special verification requirements, such as the U.S., you can now call the number and communicate with the Dialogflow agent. Otherwise, it’s necessary to upload verification documents required by the country in question. The number will then become active after a verification period (assuming the documents are correct).

Voximplant Advanced Setup

To review, creating the test application involves these steps: creating the Voximplant application, creating the VoxEngine scenario, purchasing the phone number, and connecting the number to the application through an application rule that forwards all calls from the phone number to the scenario. The service account JSON file is bound to the application. At the code level, the VoxEngine scenario connecting inbound calls with a Dialogflow agent looks like this:


require(Modules.AI)

var dialogflow, call

// Inbound call processing
VoxEngine.addEventListener(AppEvents.CallAlerting, (e) => {
  call = e.call
  call.addEventListener(CallEvents.Connected, onCallConnected)
  call.addEventListener(CallEvents.Disconnected, VoxEngine.terminate)
  // Answer the inbound call; onCallConnected runs once it is connected
  call.answer()
})

function onCallConnected(e) {
  // Create Dialogflow object
  dialogflow = AI.createDialogflow({
    lang: DialogflowLanguage.ENGLISH_US
  })
  dialogflow.addEventListener(AI.Events.DialogflowResponse, onDialogflowResponse)
  // Send the WELCOME event so that the agent says its welcome message
  dialogflow.sendQuery({event : {name: "WELCOME", language_code:"en"}})
  // Playback marker used for better user experience
  // (a negative offset places the marker shortly before the end of playback)
  dialogflow.addMarker(-300)
  // Start sending media from Dialogflow to the call
  dialogflow.sendMediaTo(call)
  dialogflow.addEventListener(AI.Events.DialogflowPlaybackFinished, (e) => {
    // Dialogflow TTS playback finished
  })
  dialogflow.addEventListener(AI.Events.DialogflowPlaybackStarted, (e) => {
    // Dialogflow TTS playback started
  })
  dialogflow.addEventListener(AI.Events.DialogflowPlaybackMarkerReached, (e) => {
    // Playback marker reached - start sending audio from the call to Dialogflow
    call.sendMediaTo(dialogflow)
  })
}

// Handle Dialogflow responses
function onDialogflowResponse(e) {
  // Once a DialogflowResponse with queryResult is received, the call stops sending media to Dialogflow.
  // If the response has queryResult but no responseId, we can continue sending media to Dialogflow.
  if (e.response.queryResult !== undefined && e.response.responseId === undefined) {
    call.sendMediaTo(dialogflow)
  } else if (e.response.queryResult !== undefined && e.response.responseId !== undefined) {
    // Do whatever is required with e.response.queryResult or e.response.webhookStatus
    // Telephony messages arrive in the fulfillmentMessages array
    if (e.response.queryResult.fulfillmentMessages != undefined) {
      e.response.queryResult.fulfillmentMessages.forEach((msg) => {
        if (msg.platform !== undefined && msg.platform === "TELEPHONY") processTelephonyMessage(msg)
      })
    }
  }
}

// Process telephony messages from Dialogflow
function processTelephonyMessage(msg) {
  // Transfer call to msg.telephonyTransferCall.phoneNumber
  if (msg.telephonyTransferCall !== undefined) {
    /**
    * Example:
    * dialogflow.stop()
    * let newcall = VoxEngine.callPSTN(msg.telephonyTransferCall.phoneNumber, "put verified CALLER_ID here")
    * VoxEngine.easyProcess(call, newcall)
    */
  }
  // Synthesize speech from msg.telephonySynthesizeSpeech.text
  if (msg.telephonySynthesizeSpeech !== undefined) {
    // See the list of available TTS languages at https://voximplant.com/docs/references/voxengine/language
    // Example: call.say(msg.telephonySynthesizeSpeech.text, Premium.US_ENGLISH_FEMALE)
  }
  // Play audio file located at msg.telephonyPlayAudio.audioUri
  if (msg.telephonyPlayAudio !== undefined) {
    // audioUri contains a Google Storage URI (gs://); transform it into a URL (https://)
    let url = msg.telephonyPlayAudio.audioUri.replace("gs://", "https://storage.cloud.google.com/")
    // Example: call.startPlayback(url)
  }
}


This scenario sends media from an inbound call to the Dialogflow agent and receives the agent’s replies in DialogflowResponse events, whose queryResult field represents Dialogflow’s QueryResult object. Because speech synthesis is enabled, a DialogflowPlaybackStarted event follows each response whose queryResult contains fulfillmentText, showing that the audio stream with generated speech is indeed being played.
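The branching in onDialogflowResponse and processTelephonyMessage can be pulled out into plain functions and exercised without a live call. The sketch below mirrors the scenario's checks on queryResult, responseId, and platform, and reuses its gs:// to https:// substitution. The function names, the sample response, and the bucket name are hypothetical.

```javascript
// Classifies a Dialogflow response the way the scenario's handler does.
// "interim": queryResult without responseId, so the call can keep streaming audio.
// "final":   queryResult with responseId, so the result can be processed.
// "other":   no queryResult at all.
function classifyResponse(response) {
  if (response.queryResult === undefined) return "other";
  return response.responseId === undefined ? "interim" : "final";
}

// Picks out the fulfillment messages addressed to the TELEPHONY platform.
function telephonyMessages(response) {
  const messages = (response.queryResult || {}).fulfillmentMessages || [];
  return messages.filter((msg) => msg.platform === "TELEPHONY");
}

// Converts a Google Storage URI (gs://bucket/path) into an https URL,
// using the same host as the scenario.
function gsToHttps(audioUri) {
  return audioUri.replace("gs://", "https://storage.cloud.google.com/");
}

// Hypothetical final response with one plain-text and one telephony message.
const finalResponse = {
  responseId: "hypothetical-id",
  queryResult: {
    fulfillmentText: "What size pizza would you like?",
    fulfillmentMessages: [
      { text: { text: ["What size pizza would you like?"] } },
      { platform: "TELEPHONY", telephonyPlayAudio: { audioUri: "gs://my-bucket/hello.mp3" } },
    ],
  },
};

console.log(classifyResponse(finalResponse)); // prints: final
for (const msg of telephonyMessages(finalResponse)) {
  if (msg.telephonyPlayAudio !== undefined) {
    console.log(gsToHttps(msg.telephonyPlayAudio.audioUri));
    // prints: https://storage.cloud.google.com/my-bucket/hello.mp3
  }
}
```

Keeping these decisions in small pure functions makes it easier to test the routing logic separately from the VoxEngine calls that act on it.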

To experience the automated pizza ordering demonstration of Dialogflow Connector directly, visit https://demos05.voximplant.com/dialogflow-connector.

