Voice Recognition (NOT Speech Recognition) Is Here

Voice vs. speech: Computers can recognize the words we speak, and now they can recognize who spoke those words.

A lot of people seem to use the terms "voice recognition" and "speech recognition" interchangeably, but they're really not interchangeable. They are very different things. As a researcher and developer of speech technology (almost since its beginnings), I feel compelled to ask anyone who uses the term "voice recognition," "Do you mean speech recognition?" It seems like it should be obvious:

  • Speech recognition recognizes speech

  • Voice recognition recognizes voices

Speech recognition recognizes the words that were spoken, whereas voice recognition identifies the voice that is speaking those words. Some of you may have used a system (usually a banking system) that incorporates voice verification technology, which gives a measure of confidence that the person speaking is the same person who set up the account.
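To make the distinction concrete, here is a minimal Python sketch of voice verification. It assumes that some upstream speaker-embedding model (not shown) has already turned each utterance into a fixed-length vector, or "voiceprint." The sketch scores a new attempt against the voiceprint stored at enrollment using cosine similarity, a common way to compare speaker embeddings, and accepts the caller if the score clears a threshold. The vectors and the 0.8 threshold are illustrative assumptions, not values from any real banking system.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two voiceprint vectors (1.0 = same direction)."""
    if len(a) != len(b):
        raise ValueError("voiceprints must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # a zero vector carries no voice information
    return dot / (norm_a * norm_b)


def verify_speaker(enrolled: list[float], attempt: list[float],
                   threshold: float = 0.8) -> tuple[bool, float]:
    """Accept the caller if the attempt is close enough to the enrolled voiceprint.

    The 0.8 threshold is an illustrative assumption; real systems tune it
    to trade off false accepts against false rejects.
    """
    score = cosine_similarity(enrolled, attempt)
    return score >= threshold, score


# Hypothetical 3-dimensional voiceprints (real embeddings are much longer).
enrolled = [0.9, 0.1, 0.3]
print(verify_speaker(enrolled, [0.85, 0.15, 0.28]))  # same speaker: accepted
print(verify_speaker(enrolled, [0.1, 0.9, 0.2]))     # different speaker: rejected
```

Note that verification answers a yes/no question about one claimed identity; it never has to decide which of several people is speaking.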

Keep reading: there is a point to this rant, and I'm not just in a crabby mood. Some of you Alexa application developers may have noticed that Alexa already uses voice recognition in some of its built-in applications, and this week Amazon announced that developers can now access the voice recognition features in their own applications. Those of you who have Alexa and multiple users (family, roommates, etc.) may have noticed the following scenario:

Me: Alexa, call Madelyn at her work.

Alexa: OK, Emmett, I'll call Madelyn at work.

Maybe you didn't even notice it; it does seem quite natural. In fact, I was a little surprised until I realized that the first time I tried to make that call after I had set up Madelyn's Alexa app on her phone, the dialogue was a little different:

Me: Alexa, call Madelyn at work.

Alexa: Is this Emmett?

Me: Yes.

Alexa: Calling Madelyn at work.

I didn't think much about it; again, it seemed quite natural. But in retrospect, it seems obvious that Alexa had associated my voice characteristics with my name. Now Alexa can casually insert my name into its utterances, which makes the whole interaction a bit more personal. And because the system only has to distinguish between the voices registered for the device in your home, the accuracy rate is quite high. In our house, I have a rather low, booming voice, whereas my wife (Madelyn) has a relatively high, soft voice, so Alexa gets it right every time. But I suspect it could be a problem for same-gendered siblings who are close in age. My developer brain guesses that Alexa simply skips the personalization when she's not sure.
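The guess that Alexa only personalizes when she's confident can be sketched as closed-set speaker identification: compare the utterance's voiceprint against each enrolled household member and return a name only when the best match is both strong and clearly ahead of the runner-up. This is a hypothetical illustration, not Amazon's implementation; the voiceprints are assumed to come from some unspecified embedding model, and `min_score` and `min_margin` are made-up tuning knobs.

```python
import math
from typing import Optional


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two voiceprints; 0.0 if either is a zero vector."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def identify_speaker(profiles: dict[str, list[float]], utterance: list[float],
                     min_score: float = 0.75, min_margin: float = 0.1) -> Optional[str]:
    """Return the best-matching enrolled name, or None when the match is not convincing."""
    ranked = sorted(
        ((cosine_similarity(voiceprint, utterance), name)
         for name, voiceprint in profiles.items()),
        reverse=True,
    )
    if not ranked:
        return None  # nobody enrolled on this device
    best_score, best_name = ranked[0]
    runner_up = ranked[1][0] if len(ranked) > 1 else -1.0
    # Too weak a match, or too close to another household member: skip personalization.
    if best_score < min_score or best_score - runner_up < min_margin:
        return None
    return best_name


household = {"Emmett": [1.0, 0.0], "Madelyn": [0.0, 1.0]}  # hypothetical voiceprints
print(identify_speaker(household, [0.95, 0.1]))  # Emmett
print(identify_speaker(household, [0.7, 0.7]))   # None: equally close to both
```

The margin check is what covers the similar-sibling case: two nearly identical voiceprints produce nearly equal scores, so the function returns `None`, and the caller can fall back to asking something like "Is this Emmett?"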

I think this opens up a lot of interesting conversational possibilities. Certainly, a dialogue can insert a simple <username> token into its text-to-speech utterances, as we saw with the phone-calling app. But speaker identity could also be used in more subtle, inferential ways. Imagine that you asked about the traffic, and the application inferred that you might be planning to drive somewhere soon, while also knowing that your shopping list has milk and eggs on it. The conversation might go something like:

Human: What's the traffic like?

Computer: Currently, the traffic is light to moderate. Are you planning to pick up things on your grocery list?

Human: Not sure.

Computer: If you do, the traffic to the Holiday Market is very light, and you wanted milk and eggs.

Human: Thanks!

Computer: No problem, Emmett!

Since the system knows me, it could know where I usually drive at that time of day, where my local grocery store is, and what is on my personal grocery list, and fold that all into one easy interaction.
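One way to picture this is a per-user profile that becomes available once the speaker has been identified, combined with the simple `<username>` substitution mentioned earlier. The sketch below is purely hypothetical: `UserProfile`, `fill_username`, and `grocery_follow_up` are illustrative names, and the traffic description is assumed to come from some separate traffic service that is not shown.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class UserProfile:
    """Hypothetical per-user data an assistant could look up after identifying the speaker."""
    name: str
    usual_store: str
    grocery_list: list[str] = field(default_factory=list)


def fill_username(template: str, speaker: Optional[str]) -> str:
    """Substitute <username>, or drop it (and its leading comma) if the speaker is unknown."""
    if speaker:
        return template.replace("<username>", speaker)
    return template.replace(", <username>", "").replace("<username>", "")


def grocery_follow_up(profile: Optional[UserProfile], traffic_to_store: str) -> Optional[str]:
    """Fold the user's store and list into a traffic answer; add nothing if the user is unknown."""
    if profile is None or not profile.grocery_list:
        return None
    items = " and ".join(profile.grocery_list)  # simple join; fine for short lists
    return (f"If you do, the traffic to {profile.usual_store} is {traffic_to_store}, "
            f"and you wanted {items}.")


emmett = UserProfile("Emmett", "the Holiday Market", ["milk", "eggs"])
print(grocery_follow_up(emmett, "very light"))
print(fill_username("No problem, <username>!", emmett.name))  # No problem, Emmett!
print(fill_username("No problem, <username>!", None))         # No problem!
```

Returning `None` when the speaker is unknown keeps the assistant from guessing: it simply answers the traffic question without the personal follow-up.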

In all fairness to early developers of speech systems, applications have taken advantage of voice recognition for a long time. In fact, over a decade ago, I developed a voice-based warehouse picking system that could recognize which employee had put on the headset and started the application simply by listening to their first command. Also, Google, one of Alexa's competitors, announced multiuser voice recognition technology in April of this year, although it has not yet made a third-party developer API available.

Voice recognition is new to the masses, and I believe it will contribute substantially to the next level of natural interaction with computers!