Ambient babble and audio distortion are enough of a problem for regular people like you and me to endure. Our human auditory system is remarkably good at filtering out the aspects of our sonic environment that we are not focused on directly: we can understand people speaking while standing next to noisy machinery or walking through a large, echoey train station. These kinds of noise, categorized in the signal-processing world as "stationary noise" and "reverberant noise," have been tackled with some measure of success by modern noise-cancellation algorithms. A decade ago, most cell phones transmitted all of the sound their microphones picked up; talking to someone driving on the freeway in a convertible with the top down was a torturous auditory experience. Today such a call can be eerily quiet and tranquil. Often the only clue the listener has about the environment is the Lombard effect audible in the speaker's voice.
But there is another, more insidious kind of environmental noise that humans deal with relatively easily: the cocktail party effect. It is a serious problem for all speech recognizers, yet humans can just "magically" hear what one person in a crowd is saying, another of those amazingly complex computational tasks our brains perform seemingly effortlessly. This kind of noise is more insidious because all of the noise is itself speech. As humans, we can focus on a single voice in a sea of babble: we're aware that people are speaking all around us, but we listen to the person we're having a conversation with.
Now imagine having to hold a conversation in a babble-filled environment, but hearing it only through the cell phone technology of a decade ago. The experience would range from very difficult to impossible. That is the experience people with cochlear implants (and, to varying degrees, conventional hearing aids) have to cope with.
This is the problem that Roozbeh Soleymani, an electrical engineering doctoral student at NYU, decided to tackle. The centerpiece of his idea is the observation that if you decompose an audio signal into wavelets, remove the more spectrally complex ones (those with the most wiggles), and recombine the simpler wavelets into a new audio signal, the speech embedded in that new signal is more intelligible, because the babble noise is more heavily represented in the complex wavelets. And with today's tiny but computationally capable devices, it is practical to put this kind of analysis directly into an assistive hearing device. Below is a diagram of how this is done.
A simplified flowchart demonstrates NYU Tandon graduate student Roozbeh Soleymani's algorithmic solution for reducing babble in signals sent to hearing aids.
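The details of Soleymani's decomposition are not given here, but the general recipe the article describes (decompose into wavelet bands, discard the most oscillatory ones, recombine) can be sketched with a toy Haar wavelet transform. Everything below is an illustrative assumption, not the actual SEDA algorithm: the function names, the number of levels, and the crude complexity proxy of treating the finest detail bands as the "wiggliest."

```python
import numpy as np

def haar_dwt(x):
    """One level of the Haar wavelet transform: split a signal into a
    smooth approximation band and a 'wiggly' detail band."""
    x = np.asarray(x, dtype=float)
    x = x[: len(x) // 2 * 2]                  # ensure even length
    approx = (x[0::2] + x[1::2]) / np.sqrt(2)
    detail = (x[0::2] - x[1::2]) / np.sqrt(2)
    return approx, detail

def haar_idwt(approx, detail):
    """Invert one level of the Haar transform (perfect reconstruction)."""
    x = np.empty(2 * len(approx))
    x[0::2] = (approx + detail) / np.sqrt(2)
    x[1::2] = (approx - detail) / np.sqrt(2)
    return x

def denoise(x, levels=4, keep_levels=2):
    """Decompose, zero out the most oscillatory (finest) detail bands,
    and recombine -- a toy stand-in for the idea described above."""
    approx = np.asarray(x, dtype=float)
    details = []
    for _ in range(levels):
        approx, d = haar_dwt(approx)
        details.append(d)                      # details[0] is the finest band
    for i in range(levels - keep_levels):
        details[i] = np.zeros_like(details[i]) # drop the 'wiggliest' bands
    for d in reversed(details):
        approx = haar_idwt(approx, d)
    return approx
```

For example, a low-frequency tone contaminated by a high-frequency interferer comes out of `denoise` much closer to the clean tone, because the interferer's energy lives mostly in the discarded fine-detail bands. A real system would of course need a smarter complexity measure than band index, which is exactly where approaches like SEDA differ from this sketch.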
This noise reduction technology is called Speech Enhancement using Decomposition Approach (SEDA) and is being developed at NYU in the Langone Department of Otolaryngology and the Tandon Department of Electrical and Computer Engineering.
In an interesting turn of events, this technology, developed for the hard of hearing, is now being considered for the cell phone platform. It seems quite likely that it will help make our calls clearer in the future. But I think even more important is that it will help with the cocktail party effect that speech recognizers suffer from. Speech recognition has gotten quite good lately. I suspect it's about to get a little bit better.
Imagine being able to do a voice search in the middle of an actual cocktail party!