
Using AI to Help Improve Lip Reading


Artificial intelligence is becoming more common and pervasive. Find out the latest domain where it's making a difference.


AI has made some impressive strides recently in understanding and processing speech. For instance, I recently wrote about a startup that is using machine learning to analyze our conversations for signs of neurological disorders such as Alzheimer's and Parkinson's.

Another fascinating project has emerged from researchers at Oxford University. They have developed a new system, called LipNet, which they believe can lip read considerably better than existing software.

Read My Lips

The software, documented in a recent paper, is claimed to be the most accurate yet at understanding what someone is saying just by tracking their lips. The team reports an accuracy of 93.4%, which is not only considerably better than most other lip reading applications on the market, but also far superior to the best human lip readers, who manage around 52%.

“Lipreading is the task of decoding text from the movement of a speaker’s mouth. Traditional approaches separated the problem into two stages: designing or learning visual features, and prediction. More recent deep lipreading approaches are end-to-end trainable (Wand et al., 2016; Chung & Zisserman, 2016a). All existing works, however, perform only word classification, not sentence-level sequence prediction. Studies have shown that human lipreading performance increases for longer words (Easton & Basala, 1982), indicating the importance of features capturing temporal context in an ambiguous communication channel. Motivated by this observation, we present LipNet, a model that maps a variable-length sequence of video frames to text, making use of spatiotemporal convolutions, an LSTM recurrent network, and the connectionist temporal classification loss, trained entirely end-to-end,” the authors write.

In other words, rather than classifying isolated words, the researchers trained the system on entire sentences, using that temporal context together with deep learning to decipher the individual words as it goes.

The team believes the service could eventually help the hearing impaired, perhaps as a smartphone app that assists with lip reading.

It’s certainly a fascinating project and a further indication of the progress that’s being made.  Check out the video below to see LipNet in action.



Topics:
artificial intelligence, big data, healthcare

Published at DZone with permission of Adi Gaskell, DZone MVB. See the original article here.

Opinions expressed by DZone contributors are their own.
