
Adding Watson Speech-to-Text to Your Android App


Integrating AI into mobile apps is becoming more popular. Learn how to use IBM's Watson to add speech-to-text capability to your app.


This post is about adding Watson Speech-to-Text to a native Android app. Speech-to-Text is available as a service on IBM Cloud, i.e., Bluemix. You will integrate the Bluemix service into our favorite chatbot, “The WatBOT,” using the Watson Developer Cloud Android SDK with minimal lines of code.

Why Watson Speech-to-Text?

The IBM® Speech to Text service provides an Application Programming Interface (API) that lets you add speech transcription capabilities to your applications. To transcribe the human voice accurately, the service leverages machine intelligence to combine information about grammar and language structure with knowledge of the composition of the audio signal. The service continuously returns and retroactively updates the transcription as more speech is heard.

This overview for developers introduces the three interfaces provided by the service: a WebSocket interface, an HTTP REST interface, and an asynchronous HTTP interface (beta).
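
For example, the HTTP REST interface can be exercised with a single blocking call through the Watson Java SDK (the same speech-to-text dependency used later in this post). The snippet below is only a sketch: the credentials and audio file name are placeholders, and you should confirm the method signatures against the SDK version you use.

import java.io.File;

import com.ibm.watson.developer_cloud.speech_to_text.v1.SpeechToText;
import com.ibm.watson.developer_cloud.speech_to_text.v1.model.RecognizeOptions;
import com.ibm.watson.developer_cloud.speech_to_text.v1.model.SpeechResults;

// Sketch of a one-shot transcription over the HTTP interface; credentials and file name are placeholders.
SpeechToText service = new SpeechToText();
service.setUsernameAndPassword("<username>", "<password>");

RecognizeOptions options = new RecognizeOptions.Builder()
        .contentType("audio/wav")  // must match the format of the audio file below
        .build();

// Blocks until the full transcription is returned.
SpeechResults results = service.recognize(new File("sample.wav"), options).execute();
System.out.println(results);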

Input Features

  • Languages: Supports Brazilian Portuguese, French, Japanese, Mandarin Chinese, Modern Standard Arabic, Spanish, UK English, and US English.
  • Models: For most languages, supports both broadband (for audio that is sampled at a minimum rate of 16 kHz) and narrowband (for audio that is sampled at a minimum rate of 8 kHz) models.
  • Audio Formats: Transcribes Free Lossless Audio Codec (FLAC), Linear 16-bit Pulse-Code Modulation (PCM), Waveform Audio File Format (WAV), Ogg format with the opus codec, mu-law (or u-law) audio data, or basic audio.
  • Audio Transmission: Lets the client pass as much as 100 MB of audio to the service as a continuous stream of data chunks or as a one-shot delivery, passing all of the data at one time. With streaming, the service enforces various timeouts to preserve resources.
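
These input options map directly onto the RecognizeOptions builder used later in this post. As a rough sketch (the model name and content type strings are examples only; verify them against the Speech to Text documentation for your SDK version), selecting a narrowband model and linear PCM audio could look like this:

private RecognizeOptions narrowbandPcmOptions() {
    // Sketch only: the model name and content type are illustrative values.
    return new RecognizeOptions.Builder()
            .model("en-US_NarrowbandModel")       // narrowband model for 8 kHz audio
            .contentType("audio/l16; rate=8000")  // 16-bit linear PCM sampled at 8 kHz
            .build();
}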

Output Features

  • Speaker Labels (beta): Recognizes different speakers from narrowband audio in US English, Spanish, or Japanese. This feature provides a transcription that labels each speaker’s contributions to a multi-participant conversation.
  • Keyword Spotting (beta): Identifies spoken phrases from the audio that match specified keyword strings with a user-defined level of confidence. This feature is especially useful when individual words or topics from the input are more important than the full transcription. For example, it can be used with a customer support system to determine how to route or categorize a customer request.
  • Word Alternatives (beta), Confidence, and Timestamps: Reports alternative words that are acoustically similar to the words that it transcribes, confidence levels for each of the words that it transcribes, and timestamps for the start and end of each word.
  • Maximum Alternatives and Interim Results: Returns alternative and interim transcription results. The former provide different possible hypotheses; the latter represent interim hypotheses as the transcription progresses. In both cases, the service indicates final results in which it has the greatest confidence.
  • Profanity Filtering: Censors profanity from US English transcriptions by default. You can use the filtering to sanitize the service’s output.
  • Smart Formatting (beta): Converts dates, times, numbers, phone numbers, and currency values in final transcripts of US English audio into more readable, conventional forms.
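
Most of these output features are toggled through the same RecognizeOptions builder. A hedged sketch, assuming your SDK version exposes these builder methods (the keyword list and threshold values are illustrative):

private RecognizeOptions outputFeatureOptions() {
    // Sketch only: assumes these builder methods exist in your SDK version; values are illustrative.
    return new RecognizeOptions.Builder()
            .contentType("audio/wav")
            .keywords(new String[]{"refund", "cancel"})  // keyword spotting (beta)
            .keywordsThreshold(0.5)                      // minimum confidence for a keyword match
            .maxAlternatives(3)                          // return up to three alternative transcripts
            .interimResults(true)                        // stream interim hypotheses as audio arrives
            .smartFormatting(true)                       // readable dates, times, and numbers (beta)
            .build();
}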

Integrating STT Into an Existing Android App

Add the RECORD_AUDIO permission to AndroidManifest.xml:

<uses-permission android:name="android.permission.RECORD_AUDIO"/>


Open build.gradle (app) and add the entries below under dependencies:

compile 'com.squareup.okhttp3:okhttp-ws:3.4.2'
compile 'com.ibm.watson.developer_cloud:android-sdk:0.2.3'
compile 'com.ibm.watson.developer_cloud:speech-to-text:3.5.3'


Add a mic icon image as a resource under res/mipmap (it is referenced as @mipmap/ic_mic below).

Open res/layout/content_chat_room.xml and add the code below:

<android.support.v7.widget.AppCompatImageButton
    android:id="@+id/btn_record"
    android:layout_width="wrap_content"
    android:layout_height="match_parent"
    android:layout_marginBottom="10dp"
    android:background="@null"
    android:elevation="0dp"
    android:paddingLeft="10dp"
    android:scaleType="fitCenter"
    android:src="@mipmap/ic_mic" />


Add these entries to MainActivity.java to request permission from the user to access the microphone and record audio:

int permission = ContextCompat.checkSelfPermission(this,
        Manifest.permission.RECORD_AUDIO);

if (permission != PackageManager.PERMISSION_GRANTED) {
    Log.i(TAG, "Permission to record denied");
    makeRequest();
}


// Speech-to-Text Record Audio permission
@Override
public void onRequestPermissionsResult(int requestCode, @NonNull String[] permissions, @NonNull int[] grantResults) {
    super.onRequestPermissionsResult(requestCode, permissions, grantResults);
    switch (requestCode){
        case REQUEST_RECORD_AUDIO_PERMISSION:
            permissionToRecordAccepted  = grantResults[0] == PackageManager.PERMISSION_GRANTED;
            break;
        case RECORD_REQUEST_CODE: {

            if (grantResults.length == 0
                    || grantResults[0] !=
                    PackageManager.PERMISSION_GRANTED) {

                Log.i(TAG, "Permission has been denied by user");
            } else {
                Log.i(TAG, "Permission has been granted by user");
            }
            return;
        }
    }
    if (!permissionToRecordAccepted ) finish();

}

protected void makeRequest() {
    ActivityCompat.requestPermissions(this,
            new String[]{Manifest.permission.RECORD_AUDIO},
            RECORD_REQUEST_CODE);
}
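
The permission snippets above reference a few members of MainActivity that are not shown. A minimal sketch of the declarations they assume (the request-code values are arbitrary integers you choose):

// Fields assumed by the permission-handling code above; values are illustrative.
private static final String TAG = "MainActivity";
private static final int RECORD_REQUEST_CODE = 101;
private static final int REQUEST_RECORD_AUDIO_PERMISSION = 102;
private boolean permissionToRecordAccepted = false;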


Add the code below to MainActivity.java, outside onCreate:

//Record a message via Watson Speech to Text
private void recordMessage() {
    //mic.setEnabled(false);
    speechService = new SpeechToText();
    speechService.setUsernameAndPassword(STT_username, STT_password);

    if (!listening) {
        capture = new MicrophoneInputStream(true);
        new Thread(new Runnable() {
            @Override public void run() {
                try {
                    speechService.recognizeUsingWebSocket(capture, getRecognizeOptions(), new MicrophoneRecognizeDelegate());
                } catch (Exception e) {
                    showError(e);
                }
            }
        }).start();
        listening = true;
        Toast.makeText(MainActivity.this,"Listening....Click to Stop", Toast.LENGTH_LONG).show();

    } else {
        try {
            capture.close();
            listening = false;
            Toast.makeText(MainActivity.this,"Stopped Listening....Click to Start", Toast.LENGTH_LONG).show();
        } catch (Exception e) {
            e.printStackTrace();
        }

    }
}
Then, inside onCreate, wire the mic button to recordMessage():

btnRecord.setOnClickListener(new View.OnClickListener() {
    @Override public void onClick(View v) {
        recordMessage();
    }
});
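
recordMessage() and the click listener also rely on a handful of class-level members. A sketch of what they might look like (the credential values come from your Speech to Text service on Bluemix; the view fields are bound in onCreate via findViewById):

// Members assumed by recordMessage() and the mic button wiring above; values are placeholders.
private SpeechToText speechService;
private MicrophoneInputStream capture;
private boolean listening = false;
private ImageButton btnRecord;   // bound to R.id.btn_record in onCreate
private EditText inputMessage;   // the chat input field updated by showMicText()
private String STT_username = "<service-username>";
private String STT_password = "<service-password>";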


Add these private methods to complete the story:

//Private Methods - Speech to Text
private RecognizeOptions getRecognizeOptions() {
    return new RecognizeOptions.Builder()
            .continuous(true)
            .contentType(ContentType.OPUS.toString())
            //.model("en-UK_NarrowbandModel")
            .interimResults(true)
            .inactivityTimeout(2000)
            .build();
}

//Watson Speech to Text Methods.
private class MicrophoneRecognizeDelegate implements RecognizeCallback {

    @Override
    public void onTranscription(SpeechResults speechResults) {
        System.out.println(speechResults);
        if(speechResults.getResults() != null && !speechResults.getResults().isEmpty()) {
            String text = speechResults.getResults().get(0).getAlternatives().get(0).getTranscript();
            showMicText(text);
        }
    }

    @Override public void onConnected() {

    }

    @Override public void onError(Exception e) {
        showError(e);
        enableMicButton();
    }

    @Override public void onDisconnected() {
        enableMicButton();
    }
}

private void showMicText(final String text) {
    runOnUiThread(new Runnable() {
        @Override public void run() {
            inputMessage.setText(text);
        }
    });
}

private void enableMicButton() {
    runOnUiThread(new Runnable() {
        @Override public void run() {
            btnRecord.setEnabled(true);
        }
    });
}

private void showError(final Exception e) {
    runOnUiThread(new Runnable() {
        @Override public void run() {
            Toast.makeText(MainActivity.this, e.getMessage(), Toast.LENGTH_SHORT).show();
            e.printStackTrace();
        }
    });
}

To see where each of the above entries goes, check the complete MainActivity.java here.

Also, check how to integrate Text-to-Speech here.


Topics:
mobile ,speech-to-text ,android app development ,watson

Published at DZone with permission of Vidyasagar Machupalli, DZone MVB. See the original article here.

Opinions expressed by DZone contributors are their own.
