
Using AI to Help Improve Lip Reading


Artificial intelligence is becoming more common and pervasive. Find out the latest domain where it's making a difference.




AI has made impressive strides in understanding and processing speech in recent years. For instance, I wrote recently about a startup that is using machine learning to analyze our conversations for signs of neurological disorders such as Alzheimer’s and Parkinson’s.

Another fascinating project has emerged from researchers at Oxford University. They have developed a new system, called LipNet, which they believe can lip read considerably more accurately than existing software.

Read My Lips

The system, which was documented in a recent paper, is claimed to be the most accurate yet at understanding what someone is saying just by tracking their lips. The team reports an accuracy of 93.4%, which is not only considerably better than most other lip-reading applications on the market but also far superior to the best human lip readers, who manage around 52%.

“Lipreading is the task of decoding text from the movement of a speaker’s mouth. Traditional approaches separated the problem into two stages: designing or learning visual features, and prediction. More recent deep lipreading approaches are end-to-end trainable (Wand et al., 2016; Chung & Zisserman, 2016a). All existing works, however, perform only word classification, not sentence-level sequence prediction. Studies have shown that human lipreading performance increases for longer words (Easton & Basala, 1982), indicating the importance of features capturing temporal context in an ambiguous communication channel. Motivated by this observation, we present LipNet, a model that maps a variable-length sequence of video frames to text, making use of spatiotemporal convolutions, an LSTM recurrent network, and the connectionist temporal classification loss, trained entirely end-to-end,” the authors write.

In other words, rather than classifying isolated words, the researchers trained the model on whole sentences, using that surrounding context, combined with deep learning, to let the system analyze an entire sentence and decipher the individual words as it goes.
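To make the quoted architecture concrete, here is a minimal, illustrative sketch of a LipNet-style model in PyTorch. This is my own simplification, not the authors' code: it pairs 3D spatiotemporal convolutions over the video frames with a recurrent layer and per-frame character logits suitable for the connectionist temporal classification (CTC) loss the paper mentions. All layer sizes and input dimensions are toy values chosen for illustration.

```python
import torch
import torch.nn as nn

class LipNetSketch(nn.Module):
    """Simplified LipNet-style model: 3D convolutions capture lip motion
    across frames, an LSTM models sentence-level context, and a linear
    layer emits per-frame character logits for CTC training."""
    def __init__(self, vocab_size=28, hidden=64):
        super().__init__()
        # 3D conv over (time, height, width) so features span multiple frames
        self.conv = nn.Sequential(
            nn.Conv3d(3, 16, kernel_size=(3, 5, 5), padding=(1, 2, 2)),
            nn.ReLU(),
            nn.MaxPool3d(kernel_size=(1, 2, 2)),  # pool space, preserve time
        )
        self.rnn = nn.LSTM(input_size=16 * 16 * 16, hidden_size=hidden,
                           batch_first=True, bidirectional=True)
        self.fc = nn.Linear(2 * hidden, vocab_size + 1)  # +1 for CTC blank

    def forward(self, x):            # x: (batch, channels, time, H, W)
        feats = self.conv(x)         # (batch, 16, time, H/2, W/2)
        b, c, t, h, w = feats.shape
        feats = feats.permute(0, 2, 1, 3, 4).reshape(b, t, c * h * w)
        out, _ = self.rnn(feats)     # (batch, time, 2 * hidden)
        return self.fc(out)          # per-frame logits, (batch, time, vocab+1)

model = LipNetSketch()
# Toy batch: two 75-frame clips of 32x32 RGB mouth crops
video = torch.randn(2, 3, 75, 32, 32)
logits = model(video)               # shape (2, 75, 29)
```

At training time the logits would be fed to `nn.CTCLoss`, which lets the model learn frame-to-character alignments from sentence transcripts alone, matching the end-to-end, sentence-level framing the authors describe.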

The team believes the system could eventually power a smartphone-based service that helps hearing-impaired users lip read more effectively.

It’s certainly a fascinating project and a further indication of the progress that’s being made.  Check out the video below to see LipNet in action.

