Who's Speaking? Speaker Recognition With Watson Speech-to-Text API
Learn how to take advantage of IBM Watson’s speaker diarization feature, which distinguishes between speakers in real time.
Distinguishing between two people in a conversation can be difficult, especially when you are hearing them virtually or for the first time. The same is true when multiple voices interact with AI/cognitive systems, virtual assistants, and home assistants like Alexa or Google Home. To overcome this, Watson’s Speech to Text API has been enhanced to support real-time speaker diarization.
On a post about building “WatBot,” a popular chatbot that uses Watson services, there were a couple of requests to include the SpeakerLabels setting in our code sample.
So, What Is Speaker Diarization?
According to Wikipedia, "Speaker diarisation (or diarization) is the process of partitioning an input audio stream into homogeneous segments according to the speaker identity. It can enhance the readability of an automatic speech transcription by structuring the audio stream into speaker turns and, when used together with speaker recognition systems, by providing the speaker’s true identity."
Real-time Speaker Diarization with Watson Speech-to-Text API.
Why Speaker Diarization?
Real-time speaker diarization is a need we’ve heard about from many businesses across the world that rely on transcribing large volumes of voice conversations collected every day. Imagine you operate a call center and regularly take action as customer and agent conversations happen: providing product-related help, alerting a supervisor about negative feedback, or flagging calls based on customer promotional activities. Until now, calls were typically transcribed and analyzed after they ended. Now, Watson’s speaker diarization capability enables access to that data immediately.
To experience speaker diarization via the Watson Speech to Text API on IBM Bluemix, head to this demo and click to play sample audio 1 or 2. In the input JSON below, note the “speaker_labels” line: we are setting this optional parameter to true. This is what lets us distinguish between speakers in a conversation.
{
"continuous": true,
"timestamps": true,
"content-type": "audio/wav",
"interim_results": true,
"keywords": ["IBM", "admired", "AI", "transformations", "cognitive", "Artificial Intelligence", "data", "predict", "learn"],
"keywords_threshold": 0.01,
"word_alternatives_threshold": 0.01,
"smart_formatting": true,
"speaker_labels": true,
"action": "start"
}
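As a minimal sketch, the start message above could be assembled in Python before being sent to the service. The function name `build_start_message` and its arguments are hypothetical, and only a subset of the parameters shown above is included; the snippet builds the JSON text and does not open a connection.

```python
import json


def build_start_message(content_type="audio/wav", speaker_labels=True, keywords=None):
    """Return the JSON text of a start message like the one shown above.

    Hypothetical helper for illustration; it only builds the payload.
    """
    message = {
        "action": "start",
        "content-type": content_type,
        "interim_results": True,
        "timestamps": True,
        # The optional parameter that turns on speaker diarization.
        "speaker_labels": speaker_labels,
    }
    if keywords:
        # Keyword spotting is optional; include it only when keywords are given.
        message["keywords"] = list(keywords)
        message["keywords_threshold"] = 0.01
    return json.dumps(message)


if __name__ == "__main__":
    print(build_start_message(keywords=["IBM", "AI"]))
```

Keeping `speaker_labels` as an explicit argument makes it easy to switch diarization on or off per request.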
Part of the output JSON after real-time speech-to-text conversion:
{....
"confidence": 0.927, "transcript": "So thank you very much for coming Dave it's good to have you here. "
}], "final": true, "speaker": 0
}
You can see that a speaker label is assigned to each speaker in the conversation.
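Once speaker labels are available, an application will usually want to group words into speaker turns. The sketch below assumes a response layout that is not fully shown in the excerpt above: a `results` list whose first alternative carries `timestamps` entries of the form `[word, start, end]`, and a separate `speaker_labels` list with `from`, `to`, and `speaker` fields. Treat that layout, the helper name `words_by_speaker`, and the made-up sample data as assumptions to check against the actual service output.

```python
def words_by_speaker(response):
    """Group timestamped words into consecutive speaker turns.

    Assumes each word's start time equals the "from" value of exactly
    one speaker label (an assumption about the response layout).
    """
    # Map start time -> speaker id.
    speaker_at = {
        label["from"]: label["speaker"]
        for label in response.get("speaker_labels", [])
    }
    turns = []  # list of (speaker, [words]) in spoken order
    for result in response.get("results", []):
        alternatives = result.get("alternatives", [])
        if not alternatives:
            continue
        for word, start, _end in alternatives[0].get("timestamps", []):
            speaker = speaker_at.get(start)  # None when no label matches
            if turns and turns[-1][0] == speaker:
                turns[-1][1].append(word)  # same speaker keeps talking
            else:
                turns.append((speaker, [word]))  # a new turn starts
    return [(speaker, " ".join(words)) for speaker, words in turns]


# Made-up sample data, for illustration only.
sample = {
    "results": [{
        "final": True,
        "alternatives": [{
            "transcript": "hello Dave hi there ",
            "timestamps": [
                ["hello", 0.0, 0.4], ["Dave", 0.4, 0.8],
                ["hi", 1.2, 1.4], ["there", 1.4, 1.8],
            ],
        }],
    }],
    "speaker_labels": [
        {"from": 0.0, "to": 0.4, "speaker": 0},
        {"from": 0.4, "to": 0.8, "speaker": 0},
        {"from": 1.2, "to": 1.4, "speaker": 1},
        {"from": 1.4, "to": 1.8, "speaker": 1},
    ],
}

if __name__ == "__main__":
    for speaker, text in words_by_speaker(sample):
        print(f"Speaker {speaker}: {text}")
```

Matching on the start time keeps the sketch short; a more defensive version might match on overlapping time ranges instead.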
Steps to Enable Speaker Diarization
- Watson Speech to Text is available as a service on Bluemix, IBM's cloud platform. Create an instance of the service to use it in your application.
- If you are taking the REST API approach, don’t forget to include the optional parameter “speaker_labels: true” in your request JSON.
- Depending on the programming language your application is written in, use any of the SDKs available on Watson Developer Cloud, ranging from Python to Node, Java, Swift, etc.
Refer to the WatBot repository to see how to enable or add speaker diarization in an existing Android app. Similarly, you can use other SDKs to achieve speaker diarization.
Note: Speaker labels are not enabled by default. Check the TODOs in the code and uncomment the relevant lines.
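For the REST approach mentioned in the steps above, one possible shape is to pass the option as a query parameter on the recognize call. The sketch below only builds the URL. The `/v1/recognize` path and the query-parameter form are assumptions about the HTTP interface rather than something stated in this article, and the base URL is a placeholder, so check both against the service documentation and your own credentials.

```python
from urllib.parse import urlencode


def recognize_url(base_url, speaker_labels=True, timestamps=True):
    """Build a recognize URL with diarization options as query parameters.

    Hypothetical helper: base_url is a placeholder for your service
    endpoint, and the "/v1/recognize" path is an assumption.
    """
    params = {
        # Booleans are sent as lowercase strings in the query string.
        "speaker_labels": str(speaker_labels).lower(),
        "timestamps": str(timestamps).lower(),
    }
    return f"{base_url.rstrip('/')}/v1/recognize?{urlencode(params)}"


if __name__ == "__main__":
    # Placeholder host, not a real endpoint.
    print(recognize_url("https://example.invalid/speech-to-text/api"))
```

The audio itself would then be sent in the request body with the matching content type, using whichever HTTP client or SDK your application already relies on.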
Use Cases
Use cases range from integrating into chatbots and interacting with home assistants like Alexa and Google Home to call centers and medical services. The possibilities are endless.
Published at DZone with permission of Vidyasagar Machupalli, DZone MVB. See the original article here.