Using AI to Help Improve Lip Reading
Artificial intelligence is becoming more common and pervasive. Find out the latest domain where it's making a difference.
AI has made some impressive strides in understanding and processing speech in recent times. For instance, I wrote recently about a startup that is using machine learning to analyze our conversations for signs of neurological disorders such as Alzheimer's and Parkinson's.

Another fascinating project has emerged from researchers at Oxford University. They have developed a new system, called LipNet, which they believe is capable of lip reading considerably better than existing software.
Read My Lips
The software, which was documented in a recent paper, claims to be the most accurate yet at understanding what someone is saying just by tracking their lips. The team reports an accuracy of 93.4%, which is not only considerably better than most other lip reading applications on the market, but also far superior to the best human lip readers, who manage an accuracy of around 52%.
"Lipreading is the task of decoding text from the movement of a speaker's mouth. Traditional approaches separated the problem into two stages: designing or learning visual features, and prediction. More recent deep lipreading approaches are end-to-end trainable (Wand et al., 2016; Chung & Zisserman, 2016a). All existing works, however, perform only word classification, not sentence-level sequence prediction. Studies have shown that human lipreading performance increases for longer words (Easton & Basala, 1982), indicating the importance of features capturing temporal context in an ambiguous communication channel. Motivated by this observation, we present LipNet, a model that maps a variable-length sequence of video frames to text, making use of spatiotemporal convolutions, an LSTM recurrent network, and the connectionist temporal classification loss, trained entirely end-to-end," the authors write.
In other words, rather than classifying isolated words, the researchers targeted whole sentences, using that surrounding context together with deep learning so the system can analyze an entire sentence and decipher the individual words as it goes.
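The connectionist temporal classification (CTC) loss the authors mention is what lets a variable number of video frames map onto a shorter text sequence. As an illustrative sketch (not LipNet's actual code), the decoding rule works by collapsing consecutive repeated labels and then dropping a special "blank" symbol; the frame labels below are invented for the example.

```python
# Illustrative sketch of CTC-style greedy decoding -- the rule a model
# like LipNet uses to turn one prediction per video frame into text.
# The blank symbol and the example frame labels are assumptions.

BLANK = "_"  # CTC's special "no output" symbol

def ctc_greedy_decode(frame_labels):
    """Collapse consecutive repeats, then drop blanks (the CTC decoding rule)."""
    decoded = []
    prev = None
    for label in frame_labels:
        if label != prev and label != BLANK:
            decoded.append(label)
        prev = label
    return "".join(decoded)

# Each frame yields one label; repeats absorb frames where the mouth
# lingers on a sound, and a blank between two identical labels is how
# the model emits genuine double letters like the "ll" in "hello".
frames = ["h", "h", "_", "e", "l", "l", "_", "l", "o", "o"]
print(ctc_greedy_decode(frames))  # prints "hello"
```

The key design point is that the network never needs a frame-by-frame transcript during training; CTC sums over all frame alignments that decode to the target sentence, which is what makes the whole pipeline end-to-end trainable.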
The team believes that the technology could eventually be offered to hearing-impaired people as a smartphone-based service to help them lip read more effectively.
It's certainly a fascinating project and a further indication of the progress that's being made. Check out the video below to see LipNet in action.
Published at DZone with permission of Adi Gaskell, DZone MVB. See the original article here.