Ultra-Simple Integration With ML Kit to Implement Word Broadcasting
Learn how to use the general text recognition and speech synthesis functions of ML Kit to implement an automatic voice broadcast app
I believe that we all start to learn a language by speaking it. When primary school students learn a language, an important part of their homework is dictation of new words, an experience many parents know well.
Reading the words aloud is simple enough, but parents' time is precious. There are already many dictation recordings on the market: broadcasters record the after-class word lists from language textbooks for parents to download. However, such recordings are not flexible. If the teacher assigns a few extra words that are not in the textbook's word list, the recordings no longer meet the needs of parents and children.
This article describes how to use the general text recognition and speech synthesis functions of ML Kit to implement an automatic voice broadcast app. You only need to take a photo of the dictation words or text, and the text in the photo is automatically read aloud. The speed and tone of the voice can also be adjusted.
Development Preparations
Open the project-level build.gradle file:
Choose allprojects > repositories and configure the Maven repository address of the HMS SDK:
allprojects {
    repositories {
        google()
        jcenter()
        maven { url 'http://developer.huawei.com/repo/' }
    }
}
Choose buildscript > repositories and configure the Maven repository address of the HMS SDK:
buildscript {
    repositories {
        google()
        jcenter()
        maven { url 'http://developer.huawei.com/repo/' }
    }
}
Choose buildscript > dependencies and configure the AGC plug-in:
dependencies {
    classpath 'com.huawei.agconnect:agcp:1.2.1.301'
}
Adding Compilation Dependencies
Open the application-level build.gradle file:
SDK integration:
dependencies {
    implementation 'com.huawei.hms:ml-computer-voice-tts:1.0.4.300'
    implementation 'com.huawei.hms:ml-computer-vision-ocr:1.0.4.300'
    implementation 'com.huawei.hms:ml-computer-vision-ocr-cn-model:1.0.4.300'
}
Add the AGC plug-in to the file header:
apply plugin: 'com.huawei.agconnect'
Specify the required permissions and features by declaring them in AndroidManifest.xml:
<uses-permission android:name="android.permission.CAMERA" />
<uses-permission android:name="android.permission.READ_EXTERNAL_STORAGE" />
<uses-permission android:name="android.permission.WRITE_EXTERNAL_STORAGE" />
<uses-feature android:name="android.hardware.camera" />
<uses-feature android:name="android.hardware.camera.autofocus" />
Key Development Steps
The app has two main functions: recognizing the text of the homework and reading it aloud, implemented in OCR + TTS mode. After taking a photo, click the play button to have the text read aloud.
1. Dynamic permission application:
private static final int PERMISSION_REQUESTS = 1;

@Override
public void onCreate(Bundle savedInstanceState) {
    super.onCreate(savedInstanceState);
    // Check the camera and storage permissions; request any that are missing.
    if (!allPermissionsGranted()) {
        getRuntimePermissions();
    }
}
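The snippet above relies on allPermissionsGranted() and getRuntimePermissions(), which the original code does not show. Below is a minimal sketch of what they could look like, using the standard ContextCompat and ActivityCompat helpers; getRequiredPermissions() is a hypothetical helper that returns the permissions declared in the manifest.

private String[] getRequiredPermissions() {
    // The runtime permissions declared in AndroidManifest.xml.
    return new String[]{
            Manifest.permission.CAMERA,
            Manifest.permission.READ_EXTERNAL_STORAGE,
            Manifest.permission.WRITE_EXTERNAL_STORAGE};
}

private boolean allPermissionsGranted() {
    // Return true only if every required permission has already been granted.
    for (String permission : getRequiredPermissions()) {
        if (ContextCompat.checkSelfPermission(this, permission)
                != PackageManager.PERMISSION_GRANTED) {
            return false;
        }
    }
    return true;
}

private void getRuntimePermissions() {
    // Ask the user for all required permissions in a single request.
    ActivityCompat.requestPermissions(this, getRequiredPermissions(), PERMISSION_REQUESTS);
}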
2. Start the reading interface:
public void takePhoto(View view) {
    Intent intent = new Intent(MainActivity.this, ReadPhotoActivity.class);
    startActivity(intent);
}
3. Invoke createLocalTextAnalyzer() in the onCreate() method to create a device-side text recognizer:
private void createLocalTextAnalyzer() {
    MLLocalTextSetting setting = new MLLocalTextSetting.Factory()
            .setOCRMode(MLLocalTextSetting.OCR_DETECT_MODE)
            .setLanguage("zh")
            .create();
    this.textAnalyzer = MLAnalyzerFactory.getInstance().getLocalTextAnalyzer(setting);
}
4. Invoke createTtsEngine() in the onCreate() method to create a TTS engine:
private void createTtsEngine() {
    MLTtsConfig mlConfigs = new MLTtsConfig()
            .setLanguage(MLTtsConstants.TTS_ZH_HANS)
            .setPerson(MLTtsConstants.TTS_SPEAKER_FEMALE_ZH)
            .setSpeed(0.2f)
            .setVolume(1.0f);
    this.mlTtsEngine = new MLTtsEngine(mlConfigs);
    MLTtsCallback callback = new MLTtsCallback() {
        @Override
        public void onError(String taskId, MLTtsError err) {
        }

        @Override
        public void onWarn(String taskId, MLTtsWarn warn) {
        }

        @Override
        public void onRangeStart(String taskId, int start, int end) {
        }

        @Override
        public void onEvent(String taskId, int eventName, Bundle bundle) {
            // Notify the user when playback finishes without being interrupted.
            if (eventName == MLTtsConstants.EVENT_PLAY_STOP) {
                if (!bundle.getBoolean(MLTtsConstants.EVENT_PLAY_STOP_INTERRUPTED)) {
                    Toast.makeText(ReadPhotoActivity.this.getApplicationContext(), R.string.read_finish, Toast.LENGTH_SHORT).show();
                }
            }
        }
    };
    mlTtsEngine.setTtsCallback(callback);
}
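Both the recognizer and the TTS engine hold resources, so it is worth releasing them when the activity is destroyed. A minimal sketch, assuming the textAnalyzer and mlTtsEngine fields created above; stop() and shutdown() are the MLTtsEngine release calls, and MLTextAnalyzer.stop() may throw an IOException.

@Override
protected void onDestroy() {
    super.onDestroy();
    if (this.mlTtsEngine != null) {
        // Stop any ongoing playback and release the engine.
        this.mlTtsEngine.stop();
        this.mlTtsEngine.shutdown();
    }
    if (this.textAnalyzer != null) {
        try {
            // Release the on-device text recognizer.
            this.textAnalyzer.stop();
        } catch (IOException e) {
            // Nothing sensible to do if release fails.
        }
    }
}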
5. Set up the buttons for loading photos, taking photos, and reading aloud. The first two listeners call helper methods, sketched after this snippet.
this.relativeLayoutLoadPhoto.setOnClickListener(new View.OnClickListener() {
    @Override
    public void onClick(View v) {
        ReadPhotoActivity.this.selectLocalImage(ReadPhotoActivity.this.REQUEST_CHOOSE_ORIGINPIC);
    }
});
this.relativeLayoutTakePhoto.setOnClickListener(new View.OnClickListener() {
    @Override
    public void onClick(View v) {
        ReadPhotoActivity.this.takePhoto(ReadPhotoActivity.this.REQUEST_TAKE_PHOTO);
    }
});
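The listeners above call selectLocalImage() and takePhoto(), which the original code does not show. A minimal sketch using the standard Android intents; the request-code arguments are assumed to be constants defined in ReadPhotoActivity.

private void selectLocalImage(int requestCode) {
    // Let the user pick an image from the gallery.
    Intent intent = new Intent(Intent.ACTION_GET_CONTENT);
    intent.setType("image/*");
    this.startActivityForResult(intent, requestCode);
}

private void takePhoto(int requestCode) {
    // Launch the system camera app. Without an EXTRA_OUTPUT URI the camera
    // returns only a thumbnail; a production app would pass a file URI here
    // to get a full-resolution image, which makes OCR far more reliable.
    Intent intent = new Intent(MediaStore.ACTION_IMAGE_CAPTURE);
    this.startActivityForResult(intent, requestCode);
}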
6. Start the text analyzer in the callback of the photographing and photo-loading operations (a sketch of that callback follows this snippet):
private void startTextAnalyzer() {
    if (this.isChosen(this.originBitmap)) {
        MLFrame mlFrame = new MLFrame.Creator().setBitmap(this.originBitmap).create();
        Task<MLText> task = this.textAnalyzer.asyncAnalyseFrame(mlFrame);
        task.addOnSuccessListener(new OnSuccessListener<MLText>() {
            @Override
            public void onSuccess(MLText mlText) {
                // Recognition succeeded: hand the result over for display and playback.
                if (mlText != null) {
                    ReadPhotoActivity.this.remoteDetectSuccess(mlText);
                } else {
                    ReadPhotoActivity.this.displayFailure();
                }
            }
        }).addOnFailureListener(new OnFailureListener() {
            @Override
            public void onFailure(Exception e) {
                // Recognition failed: show a failure message.
                ReadPhotoActivity.this.displayFailure();
            }
        });
    } else {
        Toast.makeText(this.getApplicationContext(), R.string.please_select_picture, Toast.LENGTH_SHORT).show();
    }
}
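Neither the photo callback nor remoteDetectSuccess() appears above. One way they could fit together, sketched under the assumption that the field names match the earlier snippets: the activity result decodes the chosen or captured image into originBitmap and starts recognition, and the success handler concatenates the recognized blocks into sourceText for the TTS engine.

@Override
protected void onActivityResult(int requestCode, int resultCode, Intent data) {
    super.onActivityResult(requestCode, resultCode, data);
    if (resultCode != RESULT_OK || data == null) {
        return;
    }
    if (requestCode == REQUEST_CHOOSE_ORIGINPIC) {
        try {
            // Decode the gallery image chosen by the user.
            this.originBitmap = MediaStore.Images.Media.getBitmap(
                    this.getContentResolver(), data.getData());
        } catch (IOException e) {
            this.displayFailure();
            return;
        }
    } else if (requestCode == REQUEST_TAKE_PHOTO) {
        // Without an EXTRA_OUTPUT URI the camera returns a thumbnail bitmap.
        this.originBitmap = (Bitmap) data.getExtras().get("data");
    }
    this.startTextAnalyzer();
}

private void remoteDetectSuccess(MLText mlText) {
    // Concatenate the recognized text blocks into a single string for TTS.
    StringBuilder result = new StringBuilder();
    for (MLText.Block block : mlText.getBlocks()) {
        result.append(block.getStringValue()).append("\n");
    }
    this.sourceText = result.toString();
}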
7. After the recognition is successful, click the play button to start the playback:
this.relativeLayoutRead.setOnClickListener(new View.OnClickListener() {
    @Override
    public void onClick(View v) {
        if (ReadPhotoActivity.this.sourceText == null) {
            Toast.makeText(ReadPhotoActivity.this.getApplicationContext(), R.string.please_select_picture, Toast.LENGTH_SHORT).show();
        } else {
            ReadPhotoActivity.this.mlTtsEngine.speak(ReadPhotoActivity.this.sourceText, MLTtsEngine.QUEUE_APPEND);
            Toast.makeText(ReadPhotoActivity.this.getApplicationContext(), R.string.read_start, Toast.LENGTH_SHORT).show();
        }
    }
});
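One design note: MLTtsEngine.QUEUE_APPEND queues the new text behind whatever is already playing, so tapping the button twice reads the text twice in a row. If you would rather cut off the current playback and start over, the engine also accepts MLTtsEngine.QUEUE_FLUSH as the mode argument.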
Demo