Speech recognition and note taking are common uses of audio capture, depending on your profession or usage. For example, most of today's mobile devices offer some kind of speech recognition and application control, whether for getting directions, dialing a number or, most commonly, playing music.
Note taking is another useful feature, letting you record findings in the field for later reference.
The very first step in all of these applications is capturing the audio from the microphone and processing it as needed.
So today we will look at the microphone API provided in Mango and put together a very simple sample to get you started.
What is involved?
At the very minimum, we need to record the audio and save it for playback. This gives the user the very basic ability to take notes.
The following key points should be noted:
- Microphone class: This is the class in the Microsoft.Xna.Framework.Audio namespace that gives us access to the microphone API.
- public event EventHandler<EventArgs> BufferReady: This event is raised when the microphone has a buffer of captured audio ready to be read. We need to handle this event and store the audio for playback.
- Microphone.Start: As the name suggests, we call this to start recording.
- Microphone.Stop: We call this to stop recording. One key point to note here is that calling Microphone.Stop immediately clears out the buffer.
As you will see in the application, we don't call Stop immediately when the user toggles the microphone or clicks the play button. Instead, we let the microphone raise the BufferReady event at the selected buffer duration so that we capture the last bit of audio data before we stop recording.
- using Microsoft.Xna.Framework: As you must have guessed from the microphone namespace, we require a reference to the XNA framework. The microphone API is part of the XNA framework and requires simulating the XNA game loop. If you are not familiar with XNA, it is a rich framework provided by Microsoft for game and graphics based applications.
Understanding the sample
Prerequisites: Install the Mango tools from http://create.msdn.com. This should give you Visual Studio Express 2010 and the Windows Phone SDK that you need to develop applications for Windows Phone.
1. Launch Visual Studio and browse to the solution file and open it. The application is built using the Silverlight "Windows Phone Application" template. Run the project and deploy to the emulator. You will see the following screen when the application finishes loading:
2. Screen elements:
a. The microphone button is a toggle which starts and stops the microphone.
b. The play button is used to play back the recorded sound.
c. There are three slider controls to adjust the volume, pitch and pan of the sound being played back. These properties can be adjusted only before starting playback.
3. How it works:
a. Touch the microphone to start recording. You can stop recording by touching the microphone again or alternatively touching the play button.
b. Adjust the volume, pitch and pan one by one and test the effect of each change by playing the recorded sound.
Understanding the code
Declarations: Here is a screenshot of the declarations.
We will be using an object of the SoundEffectInstance class to play back the recorded audio. We could also have used a SoundEffect; however, using the SoundEffectInstance class allows us to track the state (playing or stopped).
The other declaration here is for a MemoryStream object. The microphone buffer is constantly written to the MemoryStream until playback is desired. At that time, we submit the contents of the MemoryStream to the SoundEffectInstance object to play.
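As a rough illustration, declarations along these lines would support the approach described above. This is only a sketch: the class name RecorderState and every field name are hypothetical, and the three boolean flags are an assumption about how the deferred stop and playback could be tracked.

```csharp
using System.IO;
using System.Windows.Threading;
using Microsoft.Xna.Framework.Audio;

// Hypothetical container for the recorder's state; in a real page these
// would typically be private fields of the page class itself.
public class RecorderState
{
    // Default capture device; null when the device has no microphone.
    public Microphone Microphone = Microphone.Default;

    // Reusable buffer that receives each chunk of captured audio.
    public byte[] Buffer;

    // Accumulates all captured audio until playback is requested.
    public MemoryStream Stream = new MemoryStream();

    // Plays the recording and exposes its state (playing or stopped).
    public SoundEffectInstance SoundInstance;

    // Timer that stands in for the XNA game loop.
    public DispatcherTimer GameLoopTimer = new DispatcherTimer();

    // Flags (assumed names) used to defer the stop and playback actions.
    public bool StopRequested;
    public bool PlayRequested;
    public bool StreamFlushed;
}
```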
The key point to note here is the game loop we have created using the DispatcherTimer. This loop is essential to capturing the audio from the microphone.
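A minimal sketch of such a loop is shown here, assuming a 33 ms interval. The class name XnaGameLoop is hypothetical; FrameworkDispatcher.Update is the XNA call that pumps framework messages, which is what lets the microphone raise its events in a Silverlight application.

```csharp
using System;
using System.Windows.Threading;
using Microsoft.Xna.Framework;

// Hypothetical helper that simulates the XNA game loop in a Silverlight app.
public class XnaGameLoop
{
    private readonly DispatcherTimer timer = new DispatcherTimer();

    public XnaGameLoop()
    {
        // Roughly 30 updates per second.
        timer.Interval = TimeSpan.FromMilliseconds(33);

        // Each tick lets the XNA framework process its pending messages,
        // including the microphone's buffer notifications.
        timer.Tick += (sender, e) => FrameworkDispatcher.Update();
    }

    public void Start()
    {
        timer.Start();
    }

    public void Stop()
    {
        timer.Stop();
    }
}
```

The timer must be created and started on the UI thread, for example from the page constructor.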
We set the image for the play icon based on the light or dark theme used in the phone.
At this point we also set up the microphone defaults for our application as follows:
The buffer duration is set to half a second, and we then pass that duration to the GetSampleSizeInBytes method to get the matching buffer size in bytes. Sizing the buffer to hold exactly one buffer duration of audio is important to ensure smooth audio capture.
We wire up the buffer ready event and set the default stopped microphone image.
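The setup described above might look roughly like the following sketch. The class name MicrophoneSetup, the field names and the empty handler are placeholders; BufferDuration, GetSampleSizeInBytes and BufferReady are members of the XNA Microphone class.

```csharp
using System;
using Microsoft.Xna.Framework.Audio;

// Hypothetical helper showing only the microphone defaults.
public class MicrophoneSetup
{
    private readonly Microphone microphone = Microphone.Default;
    private byte[] buffer;

    public void Configure()
    {
        // Microphone.Default is null when no capture device is available.
        if (microphone == null)
        {
            return;
        }

        // Ask for a BufferReady notification every half second.
        microphone.BufferDuration = TimeSpan.FromMilliseconds(500);

        // Size the buffer to hold exactly one buffer duration of audio.
        int size = microphone.GetSampleSizeInBytes(microphone.BufferDuration);
        buffer = new byte[size];

        // Wire up the event that delivers the captured audio.
        microphone.BufferReady += OnBufferReady;
    }

    private void OnBufferReady(object sender, EventArgs e)
    {
        // Placeholder: read the audio with microphone.GetData(buffer) here.
    }
}
```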
We are ready to start recording!
When the user clicks the microphone button the following code is called:
There are a few things happening here:
1. Microphone is stopped: If the microphone is stopped, we need to start it to begin recording. We set the background of the microphone button. Then we reset the MemoryStream to clear out previously recorded audio. We also check whether the recorded sound is playing and stop it if so.
Fairly straightforward steps. At a minimum we need to call Microphone.Start(). The rest of the steps are based on the UI and application design.
2. Microphone is recording: At a minimum we have to stop recording. If you recall the note from above, we cannot call Microphone.Stop immediately because not all recorded data has been flushed to the MemoryStream yet. So we use boolean variables to keep track and defer the stop action to the DispatcherTimer tick event.
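The two branches could be sketched as follows. The class name RecordToggle, the member names and the StopRequested flag are hypothetical, and the UI updates (button background and images) are left out.

```csharp
using System.IO;
using Microsoft.Xna.Framework.Audio;

// Hypothetical helper covering the microphone button's toggle logic.
public class RecordToggle
{
    private readonly Microphone microphone = Microphone.Default;
    private readonly MemoryStream stream = new MemoryStream();

    // Instance currently used for playback, if any.
    public SoundEffectInstance SoundInstance { get; set; }

    // Set when the user asks to stop; the actual stop is deferred.
    public bool StopRequested { get; private set; }

    // Call this from the microphone button's click handler.
    // Assumes a microphone is present (Microphone.Default is not null).
    public void OnMicrophoneButtonClick()
    {
        if (microphone.State == MicrophoneState.Stopped)
        {
            // Discard any previously recorded audio.
            stream.SetLength(0);

            // Stop playback if the last recording is still playing.
            if (SoundInstance != null && SoundInstance.State == SoundState.Playing)
            {
                SoundInstance.Stop();
            }

            StopRequested = false;
            microphone.Start();
        }
        else
        {
            // Do not call Stop here: that would drop the audio still held
            // in the microphone's buffer. Only record the request.
            StopRequested = true;
        }
    }
}
```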
Two things need to happen here. First, we need to let the microphone's buffer ready event fire so we can read the last bit of audio. Then we need to trigger playback. We handle it as follows:
In the buffer ready event, we check if recording has been stopped using our boolean variables and then we call Stream.Flush(). This flushes the remaining data to the MemoryStream. Then we stop the microphone.
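A sketch of such a handler is given below. The class and flag names are hypothetical, while GetData, Stop and BufferReady are members of the XNA Microphone class. Each buffer is appended to the stream first, so the final chunk is saved before the microphone is stopped.

```csharp
using System;
using System.IO;
using Microsoft.Xna.Framework.Audio;

// Hypothetical helper showing the BufferReady handling on its own.
public class BufferReadyHandler
{
    private readonly Microphone microphone = Microphone.Default;
    private readonly MemoryStream stream = new MemoryStream();
    private byte[] buffer;

    // Set by the UI when the user asks to stop recording.
    public bool StopRequested { get; set; }

    // Tells the timer tick that the recording is complete.
    public bool StreamFlushed { get; private set; }

    // Assumes a microphone is present and BufferDuration is already set.
    public void Attach()
    {
        buffer = new byte[microphone.GetSampleSizeInBytes(microphone.BufferDuration)];
        microphone.BufferReady += OnBufferReady;
    }

    private void OnBufferReady(object sender, EventArgs e)
    {
        // Copy the captured audio out of the microphone and append it.
        int bytesRead = microphone.GetData(buffer);
        stream.Write(buffer, 0, bytesRead);

        if (StopRequested)
        {
            // The last chunk is now in the stream, so stopping is safe.
            stream.Flush();
            microphone.Stop();
            StreamFlushed = true;
        }
    }
}
```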
However, we cannot trigger playback in this event. That is handled by the DispatcherTimer tick event as follows:
The tick event is raised every 33 ms, so it fires very soon after the user selects playback, with no noticeable lag. The advantage, of course, is that we can play back the entire audio rather than letting it get cut off.
We check whether it is time to play the recording and whether the stream has been flushed. If so, we start a new thread for audio playback.
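The check could be sketched like this. The class name PlaybackTrigger, the flag names and the PlayRecording placeholder are hypothetical; OnTick is meant to be attached to the DispatcherTimer's Tick event.

```csharp
using System;
using System.Threading;
using Microsoft.Xna.Framework;

// Hypothetical helper showing the tick handler's playback check.
public class PlaybackTrigger
{
    // Set when the user presses play.
    public bool PlayRequested { get; set; }

    // Set by the BufferReady handler once the last audio is saved.
    public bool StreamFlushed { get; set; }

    // Attach to DispatcherTimer.Tick.
    public void OnTick(object sender, EventArgs e)
    {
        // Keep the XNA framework's message pump running.
        FrameworkDispatcher.Update();

        if (PlayRequested && StreamFlushed)
        {
            // Clear the flags so playback starts only once per request.
            PlayRequested = false;
            StreamFlushed = false;

            // Run playback off the UI thread so the UI can keep updating.
            new Thread(new ThreadStart(PlayRecording)).Start();
        }
    }

    private void PlayRecording()
    {
        // Placeholder: build the sound from the recorded stream and play it.
    }
}
```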
We trigger audio playback on a different thread to allow the UI to keep updating. This is an important point: any code inside our playback routine that touches UI elements has to do so through Dispatcher.BeginInvoke.
We create a SoundEffect by feeding it the captured audio data, the sample rate of the microphone and the audio channel, and then obtain a SoundEffectInstance object from it.
Since the sliders that hold the volume, pitch and pan values belong to the UI thread and our playback routine runs on a different thread, we have to read them through Dispatcher.BeginInvoke.
Finally we call Play.
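A playback routine along these lines might look like the sketch below. The class name, method signature and slider parameters are hypothetical, and mono audio is assumed. The sliders are assumed to be configured with ranges that match the XNA properties: 0 to 1 for Volume, and -1 to 1 for Pitch and Pan.

```csharp
using System.IO;
using System.Windows;
using System.Windows.Controls;
using Microsoft.Xna.Framework.Audio;

// Hypothetical helper that plays a recorded stream from a worker thread.
public class RecordingPlayer
{
    public SoundEffectInstance SoundInstance { get; private set; }

    // Assumes the stream holds at least one buffer of captured audio;
    // SoundEffect rejects an empty buffer.
    public void Play(MemoryStream stream, int sampleRate,
                     Slider volume, Slider pitch, Slider pan)
    {
        // Build the sound from the raw captured audio (mono assumed).
        SoundEffect effect = new SoundEffect(
            stream.ToArray(), sampleRate, AudioChannels.Mono);
        SoundEffectInstance instance = effect.CreateInstance();
        SoundInstance = instance;

        // The sliders belong to the UI thread, so read them there.
        Deployment.Current.Dispatcher.BeginInvoke(() =>
        {
            instance.Volume = (float)volume.Value;
            instance.Pitch = (float)pitch.Value;
            instance.Pan = (float)pan.Value;

            // Start playback only after the settings are applied.
            instance.Play();
        });
    }
}
```

In this sketch Play is called inside the dispatched delegate so that the slider values are applied before the sound starts. That ordering is a design choice of the sketch rather than something required by the API.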
So, it's fairly simple to create an application to record audio. We can extend this application to save the recorded audio to isolated storage and give it a title chosen by the user. We can add a list view of the recordings from isolated storage and turn this into a note taking application.
The basic steps to record audio are:
a) Wire an event to the default microphone to capture the audio.
b) Write the audio to a stream.
c) When the user stops recording, flush the stream and either store it or play it back.
We can also extend this sample by sending the recorded audio to a speech translation service to capture commands.
To download a full Windows Phone application that uses all of the code and examples from above, click the Download Code button below.
Tomorrow, Jeff Fansler will be covering the Media Library class, and how we can use it to learn more about a user’s music library on their phone. See you then!