
Building AR Applications With Unity and IBM Watson

See how to build AR applications using Unity and the IBM Watson SDK. Here, we'll take a look at using speech to text in an AR app.

Over the last few days, I’ve enjoyed playing with Unity and the IBM Watson SDK, which allows using cognitive services like speech recognition in Unity projects. With this technology, you can not only build games, but also other exciting scenarios. I’ve extended an augmented reality application from my colleague, Amara Keller, which allows iOS users to have conversations with a virtual character.

The picture shows a printed piece of paper with a pattern. When using the app, the 3D character shows up on the paper. Users can have conversations with the character, for example:

  • User: How is the weather?
  • Virtual character: In which location?
  • User: Munich
  • Virtual character: The temperature in Munich is currently 24 degrees.
  • User: How is the weather in Berlin?
  • Virtual character: The temperature in Berlin is currently 28 degrees.

Check out the video for a short demo.

Get the code from GitHub.

Technically, the following services and tools are used:

The main logic is in this file. Let’s take a look at how to use the Speech To Text service as an example. First, initialize the service with credentials that you can get from the IBM Cloud. The Lite account offers access to the Watson services, is free of charge, and doesn’t even require a credit card.

SpeechToText _speechToText;
Credentials credentials = new Credentials(WATSON_SPEECH_TO_TEXT_USER, WATSON_SPEECH_TO_TEXT_PASSWORD, "https://stream.watsonplatform.net/speech-to-text/api");
_speechToText = new SpeechToText(credentials);

Next, set a few options and start listening by invoking StartListening:

_speechToText.DetectSilence = true;
_speechToText.EnableWordConfidence = false;
_speechToText.EnableTimestamps = false;
_speechToText.SilenceThreshold = 0.03f;
_speechToText.MaxAlternatives = 1;
_speechToText.StartListening(OnSpeechToTextResultReceived, OnRecognizeSpeaker);

The callback OnSpeechToTextResultReceived gets the spoken text as input:

private void OnSpeechToTextResultReceived(SpeechRecognitionEvent result, Dictionary<string, object> customData) {
    if (result == null || result.results.Length == 0) return;
    foreach (var res in result.results) {
        foreach (var alt in res.alternatives) {
            Debug.Log(alt.transcript); // placeholder: the original handler body is truncated in the source
        }
    }
}
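
To make the nested loops over results and alternatives more concrete, here is a minimal sketch of one way to reduce a recognition event to a single string. It is plain C# with stand-in types: `RecognitionResult`, `Alternative`, and `TranscriptPicker` are hypothetical names for illustration, not the Watson SDK classes. It also assumes that the service may deliver interim (non-final) results and a confidence value per alternative, which depends on how the service is configured.

```csharp
using System;
using System.Collections.Generic;

// Stand-in types for illustration only; they are NOT the Watson SDK classes.
public sealed class Alternative
{
    public string Transcript = "";
    public double Confidence;
}

public sealed class RecognitionResult
{
    public bool IsFinal;
    public List<Alternative> Alternatives = new List<Alternative>();
}

public static class TranscriptPicker
{
    // Returns the highest-confidence transcript among final results,
    // or null when nothing usable has arrived yet.
    public static string PickTranscript(IEnumerable<RecognitionResult> results)
    {
        Alternative best = null;
        foreach (var res in results)
        {
            if (!res.IsFinal) continue; // skip interim hypotheses
            foreach (var alt in res.Alternatives)
            {
                if (string.IsNullOrWhiteSpace(alt.Transcript)) continue;
                if (best == null || alt.Confidence > best.Confidence) best = alt;
            }
        }
        return best == null ? null : best.Transcript.Trim();
    }
}
```

With MaxAlternatives set to 1, as in the options shown earlier, the confidence comparison never has to choose between candidates; it only matters if more alternatives are requested.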

The application also showcases how to use Watson Assistant and Watson Text To Speech in addition to Watson Speech To Text. Check out the open source project for details.

One important thing to keep in mind when using the three Watson services together is the timing. For example, you should stop recording before playing an audio clip received from Watson Text To Speech so that Watson doesn’t listen to itself. Also, you need to make sure to only play one clip at a time.

I’m neither a Unity nor a C# expert, so I’m sure there are better ways to do this. Here is how I solved it: I start recording again only after the audio clip has finished playing.

private void OnSynthesize(AudioClip clip, Dictionary<string, object> customData) {
    // Play the synthesized clip on a temporary GameObject.
    GameObject audioObject = new GameObject("AudioObject");
    AudioSource source = audioObject.AddComponent<AudioSource>();
    source.loop = false;
    source.clip = clip;
    source.Play();
    // Resume recording only once the clip has finished, then clean up.
    Invoke("RecordAgain", source.clip.length);
    Destroy(audioObject, clip.length);
}
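
The two timing rules described above (stop listening before audio plays, and play only one clip at a time) can also be kept in one small state holder. The following is a rough sketch under assumptions, not code from the project: `TurnCoordinator` is a hypothetical helper written in plain C# with no Unity or Watson dependency, and the three callbacks are placeholders that the caller would wire to the real start/stop listening and playback calls.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not part of the Watson SDK or the original project.
// It only tracks state; the callbacks do the actual listening and playback.
public sealed class TurnCoordinator<TClip>
{
    private readonly Queue<TClip> _pending = new Queue<TClip>();
    private readonly Action _startListening;
    private readonly Action _stopListening;
    private readonly Action<TClip> _play;

    public bool IsSpeaking { get; private set; }

    public TurnCoordinator(Action startListening, Action stopListening, Action<TClip> play)
    {
        _startListening = startListening;
        _stopListening = stopListening;
        _play = play;
    }

    // Call when Text To Speech delivers a clip.
    public void Enqueue(TClip clip)
    {
        _pending.Enqueue(clip);
        if (!IsSpeaking)
        {
            _stopListening(); // stop recording before any audio plays
            PlayNext();
        }
    }

    // Call when the current clip has finished, e.g. after clip.length seconds.
    public void OnClipFinished()
    {
        if (!IsSpeaking) return; // ignore stray notifications
        if (_pending.Count > 0)
        {
            PlayNext(); // keep the microphone off between queued clips
        }
        else
        {
            IsSpeaking = false;
            _startListening(); // nothing left to say, record again
        }
    }

    private void PlayNext()
    {
        IsSpeaking = true;
        _play(_pending.Dequeue());
    }
}
```

In a Unity script, the play callback could assign the clip to an AudioSource and schedule a call to OnClipFinished after the clip's length, in the same spirit as the Invoke call above. Clips that arrive while another one is playing wait in the queue instead of overlapping.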

Want to run this sample yourself? Try it out on the IBM Cloud.


Published at DZone with permission of
