
Voice Recognition, Translation, and Text-to-Speech on Mobile (Video)


In this article, we apply machine learning-based natural language processing to a mobile app by using API-driven managed cloud services like Amazon Translate and Amazon Polly.


Not multilingual? That’s okay, there’s an app for that. Check out this video for a full walkthrough!

You can now add a professional translator and friendly voice to any mobile app using Amazon Translate and Amazon Polly. If you haven’t tried AWS yet, these two services are possibly the easiest API implementation I’ve seen to date.

In this article, we’ll build a mobile app that will recognize our voice and convert it to text (speech-to-text), translate the text to a language of our choice, and convert our translated text to synthesized speech (text-to-speech).

By building this solution, we’re applying machine learning (ML) natural language processing to our mobile app by using the built-in iOS Speech API and API-driven managed cloud services. We’ll simply call APIs to make our app smarter.

Here’s an architectural diagram of this solution:

Speech Translator Architecture Diagram

There are two easy steps to building this solution:

  1. Configure the backend by creating an Amazon Cognito Identity Pool and IAM Role(s), and adding permissions to those roles for accessing Amazon Translate and Amazon Polly directly from a mobile app.
  2. Create a mobile app to showcase natural language processing by cloning my sample app from GitHub and configuring it to use the values created in step 1.

Part 1: Configure Backend (1 minute)

I created a CloudFormation Stack to automate the creation of the Cognito Identity Pool, IAM Roles, and permissions so we can start playing with the app! The other services do not require any backend configuration and will be called directly from our mobile app.

  1. Click on the Launch Stack button
  2. Click Next on the Select Template page
  3. Click Next
  4. On the Options page, leave all the defaults and click Next
  5. On the Review page, check the box to acknowledge that CloudFormation will create IAM resources and click Create
  6. Wait for the speechtranslator-stack stack to reach a status of CREATE_COMPLETE
  7. With the speechtranslator-stack selected, click on the Outputs tab, and you should see three rows.
  8. Copy the Value for each of the three resources, as we’ll be pasting those values into our service config in the AppDelegate of our mobile app

For this application, we’ll utilize Amazon Cognito, Amazon Translate, and Amazon Polly.

Amazon Translate and Amazon Polly require no backend configuration. However, we do need to create an Amazon Cognito Identity Pool to allow our mobile users to call Amazon Translate and Amazon Polly directly from the app. With an identity pool, the app can obtain temporary AWS credentials, with permissions defined in IAM Roles, to access AWS services directly.
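As a rough illustration of how an identity pool is wired into an iOS app, here is a minimal sketch of a service config using the AWS Mobile SDK for iOS (`AWSCore`). The placeholder pool ID, the `.USEast1` region, and the `configureAWS` function name are assumptions for illustration only; the sample app's actual AppDelegate may be organized differently.

```swift
import AWSCore

// Hypothetical placeholder: replace with the identity pool ID copied
// from the stack's Outputs tab.
let identityPoolId = "us-east-1:00000000-0000-0000-0000-000000000000"

// Call once at launch, for example from the AppDelegate.
// The region is an assumption; use the region your stack was created in.
func configureAWS() {
    // Exchanges the identity pool ID for temporary AWS credentials
    // scoped by the IAM Roles attached to the pool.
    let credentialsProvider = AWSCognitoCredentialsProvider(
        regionType: .USEast1,
        identityPoolId: identityPoolId
    )

    // Make these credentials the default for all AWS service clients,
    // including Amazon Translate and Amazon Polly.
    let configuration = AWSServiceConfiguration(
        region: .USEast1,
        credentialsProvider: credentialsProvider
    )
    AWSServiceManager.default().defaultServiceConfiguration = configuration
}
```

Because the credentials are temporary and limited by the IAM Roles, no long-lived AWS keys need to be embedded in the app.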

That’s it for the backend configuration! Let’s move on to the mobile app.

Part 2: Create a Mobile App (3 1/2 minutes)

To get you going quickly, I uploaded a complete iOS Swift app to GitHub.

Follow the instructions in the README and you’ll be up and running in just a few minutes.


Now that we’ve configured our backend API resources and got the app running, let me explain how the voice interaction works. On the surface, this seems like a very simple application, but it only turned out that way because we use some powerful yet simple cloud services and built-in Apple APIs.

For voice recognition, we’re using the Apple Speech API to turn our voice into text. The app then passes the transcribed text to Amazon Translate, which translates it into the language of our choice and returns the translated text to the app. Once the app receives the translated text, it passes it to Amazon Polly, which then provides synthesized speech as streamed .mp3 audio.
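The three-stage flow described above can be sketched in plain Swift. This is not the sample app's code: the protocol and type names are hypothetical, and the stub implementations merely stand in for the Apple Speech API, Amazon Translate, and Amazon Polly. The real services are asynchronous; the sketch is synchronous to keep the data flow easy to follow.

```swift
import Foundation

// Hypothetical protocols, one per stage of the flow.
protocol SpeechRecognizing {
    func transcribe(_ audio: Data) -> String
}

protocol TextTranslating {
    func translate(_ text: String, from source: String, to target: String) -> String
}

protocol SpeechSynthesizing {
    func synthesize(_ text: String, languageCode: String) -> Data
}

// Chains the stages: speech-to-text, translation, text-to-speech.
struct SpeechTranslationPipeline {
    let recognizer: SpeechRecognizing
    let translator: TextTranslating
    let synthesizer: SpeechSynthesizing

    func run(audio: Data, from source: String, to target: String)
        -> (transcript: String, translation: String, speech: Data) {
        let transcript = recognizer.transcribe(audio)
        let translation = translator.translate(transcript, from: source, to: target)
        let speech = synthesizer.synthesize(translation, languageCode: target)
        return (transcript, translation, speech)
    }
}

// Stub: treats the "audio" bytes as UTF-8 text instead of real speech.
struct StubRecognizer: SpeechRecognizing {
    func transcribe(_ audio: Data) -> String {
        String(decoding: audio, as: UTF8.self)
    }
}

// Stub: looks words up in a tiny phrasebook instead of calling a service.
struct StubTranslator: TextTranslating {
    let phrasebook = ["hello": "hola"]
    func translate(_ text: String, from source: String, to target: String) -> String {
        phrasebook[text.lowercased()] ?? text
    }
}

// Stub: returns the text bytes instead of real .mp3 audio.
struct StubSynthesizer: SpeechSynthesizing {
    func synthesize(_ text: String, languageCode: String) -> Data {
        Data(text.utf8)
    }
}
```

Keeping each stage behind its own protocol means a stub can be swapped for a real service client without changing the pipeline itself, which also makes the flow easy to test offline.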

Final Thoughts

Pretty straightforward, right? In this article, we applied machine learning-based natural language processing to our mobile app by using API-driven managed cloud services like Amazon Translate and Amazon Polly. We skipped model training entirely and quickly deployed our mobile app using managed cloud services for translation and speech. Oh, and the best part is that we only pay for what we use and don’t have to manage any servers.



translation, speech-to-text, text-to-speech, speech synthesis, natural language processing, aws, mobile, artificial intelligence, ai

Published at DZone with permission of

Opinions expressed by DZone contributors are their own.
